| Type: | Package |
| Title: | R Functions Useful for Community Ecology |
| Version: | 0.2.1 |
| Description: | A collection of utility functions for community ecology analyses, with emphasis on workflows using the 'phyloseq' and 'vegan' packages. Includes functions for normalizing OTU tables, computing alpha diversity via rarefaction (using a fast C++ implementation), differential abundance comparisons with compact letter displays, primer checking for amplicon sequencing, plotting QIIME 2/DADA2 generated transition stats and miscellaneous helpers for ordination plots and taxonomic name formatting. |
| Depends: | R (≥ 4.1.0) |
| Imports: | agricolae, ape, Biostrings, dada2, data.table, dplyr, ggplot2, insect, multcompView, magrittr, phyloseq, Rcpp, readr, scales, ShortRead, SRS, stringr, tibble, vegan (≥ 2.4-6) |
| LinkingTo: | Rcpp |
| License: | GPL-2 |
| Encoding: | UTF-8 |
| LazyData: | true |
| RoxygenNote: | 7.3.3 |
| Suggests: | dunn.test, knitr, rmarkdown, reshape2, testthat (≥ 3.0.0) |
| VignetteBuilder: | knitr |
| URL: | https://github.com/jfq3/QsRutils |
| BugReports: | https://github.com/jfq3/QsRutils/issues |
| Config/testthat/edition: | 3 |
| Additional_repositories: | https://bioconductor.org/packages/release/bioc |
| NeedsCompilation: | yes |
| Packaged: | 2026-05-09 20:49:14 UTC; johnq |
| Author: | John Quensen [aut, cre, cph] |
| Maintainer: | John Quensen <quensenj@msu.edu> |
| Repository: | CRAN |
| Date/Publication: | 2026-05-13 19:30:02 UTC |
QsRutils: R Functions Useful for Community Ecology
Description
A collection of utility functions for community ecology analyses, with emphasis on workflows using the 'phyloseq' and 'vegan' packages. Includes functions for normalizing OTU tables, computing alpha diversity via rarefaction (using a fast C++ implementation), differential abundance comparisons with compact letter displays, primer checking for amplicon sequencing, plotting QIIME 2/DADA2 generated transition stats and miscellaneous helpers for ordination plots and taxonomic name formatting.
Author(s)
Maintainer: John Quensen quensenj@msu.edu [copyright holder]
See Also
Useful links:
https://github.com/jfq3/QsRutils
Report bugs at https://github.com/jfq3/QsRutils/issues
Remove elements from a vector
Description
Removes elements present in 'y' from 'x'.
Usage
x %wo% y
Arguments
x |
A vector. |
y |
A vector of elements to remove from 'x'. |
Value
A vector containing elements of 'x' that are not in 'y'.
Examples
c(1, 2, 3, 4, 5) %wo% c(2, 4)
letters[1:5] %wo% c("b", "d")
QsRutils: R Functions Useful for Community Ecology
Description
The QsRutils package contains functions I have written to make some aspects of using phyloseq and vegan simpler. I originally called the package MyRutils, but that does not make much sense if I am posting it publicly!
arc_sine
Description
Transform a percentage to its arc sine.
Usage
arc_sine(x)
Arguments
x |
A percentage. |
Value
The arcsine transformation of x.
Examples
arc_sine(30.1)
Indicate Significance with Stars
Description
Indicate Significance with Stars
Usage
asterix(prob)
Arguments
prob |
p value |
Details
Returns '***' for p < 0.001, '**' for p < 0.01, '*' for p < 0.05.
Value
Character vector of asterisks indicating significance level.
Examples
asterix(0.039)
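The thresholds listed in Details can be illustrated with a small base R sketch. This is not the package's code: the name stars_sketch is hypothetical, and returning an empty string for p >= 0.05 is an assumption.

```r
# Hypothetical helper: map p values to significance stars using the
# thresholds from Details (p < 0.001, p < 0.01, p < 0.05).
stars_sketch <- function(p) {
  # right = FALSE gives half-open intervals [a, b), so p = 0.05 gets no star
  as.character(cut(p,
                   breaks = c(-Inf, 0.001, 0.01, 0.05, Inf),
                   labels = c("***", "**", "*", ""),
                   right = FALSE))
}

stars_sketch(c(0.0005, 0.005, 0.039, 0.2))
```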
Average Alpha Diversity (faster implementation)
Description
Calculates alpha-diversity metrics from n samplings of an OTU table to a constant number of counts per sample.
Usage
avg_alpha(
otu,
sampling_depth,
iterations = 100,
sum_method = c("median", "mean"),
ncores = 1
)
Arguments
otu |
An OTU table as a data frame or matrix with samples as rows and taxa as columns. |
sampling_depth |
The number of counts per sample in the sampled OTU table |
iterations |
The number of times the OTU table should be sampled. |
sum_method |
Method ("median" or "mean") for summarizing replication results. |
ncores |
Number of cores to use for parallel execution. Default 1 (no parallelism). |
Details
This implementation focuses on speed:
- vectorizes the alpha-metric calculations (avoids repeated calls to vegan::diversity and vegan::specnumber)
- uses base R operations (rowSums, logical ops, arithmetic), which are much faster in tight loops
- optionally parallelizes replicates across cores (platform-aware).
The OTU data frame supplied must be in typical vegan format: samples as row names and taxa as column names. The minimum row sum must be greater than or equal to the sampling depth.
By default the sum_method is mean. For a similar function in QIIME2, the default sum_method is median.
Value
A data frame with one row per sample (row names taken from the OTU table) and columns for Shannon, Observed, Pielou, Simpson and Inverse Simpson.
Examples
{
data(BCI, package = "vegan")
otu <- BCI[rowSums(BCI) > 400, ]
avg_alpha(otu, sampling_depth = 400, iterations = 100)
}
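The kind of vectorized metric calculation described in Details can be sketched with base R alone. This is an illustration, not the package's implementation: the name alpha_sketch and the column name InvSimpson are hypothetical, and the conventions assumed are natural-log Shannon, Simpson as 1 - sum(p^2), and Pielou as Shannon / log(Observed).

```r
# Hypothetical sketch: alpha metrics for every sample (row) at once.
alpha_sketch <- function(counts) {
  counts <- as.matrix(counts)
  p <- counts / rowSums(counts)             # relative abundances per sample
  plogp <- ifelse(p > 0, p * log(p), 0)     # treat 0 * log(0) as 0
  shannon <- -rowSums(plogp)
  observed <- rowSums(counts > 0)           # number of taxa present
  sum_p2 <- rowSums(p^2)
  data.frame(Shannon = shannon,
             Observed = observed,
             Pielou = shannon / log(observed),  # NaN when only one taxon is present
             Simpson = 1 - sum_p2,
             InvSimpson = 1 / sum_p2)
}

m <- matrix(c(5, 5, 0,
              10, 0, 0),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("s1", "s2"), c("a", "b", "c")))
alpha_sketch(m)
```

In a rarefaction workflow, a function like this would be applied to each subsampled table and the results summarized by mean or median across iterations.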
Check Primer Hits
Description
Determine hits of all orientations of primers to paired sequence files.
Usage
check_primer_hits(
path,
fwd_pattern = "_R1.fastq",
rev_pattern = "_R2.fastq",
fwd_primer = "GGAAGTAAAAGTCGTAACAAGG",
rev_primer = "GCTGCGTTCTTCATCGATGC"
)
Arguments
path |
Path to paired sequence files |
fwd_pattern |
Portion of file name that distinguishes forward read files. The default is "_R1.fastq" |
rev_pattern |
Portion of the file name that distinguishes the reverse file. The default is "_R2.fastq" |
fwd_primer |
Nucleotide sequence of the forward primer |
rev_primer |
Nucleotide sequence of the reverse primer |
Details
This function is for checking the effectiveness of primer trimming of ITS sequences. Because the ITS region varies in length, it is possible for forward sequences to extend past the reverse primer region and vice versa, leading to what Robert Edgar calls staggered pairs if the sequences are merged.
The fwd_pattern and rev_pattern must contain the file extension. The defaults are "_R1.fastq" and "_R2.fastq".
Default primers are ITS5 (forward) and ITS2 (reverse) from White et al. 1990.
ITS5: "GGAAGTAAAAGTCGTAACAAGG"
ITS2: "GCTGCGTTCTTCATCGATGC"
Value
A table of hits to the sequences by all primer orientations
Examples
{
# Copy example fastq files to a writable temp directory
src <- system.file("extdata", package = "QsRutils")
path <- file.path(tempdir(), "QsRutils_example")
dir.create(path, showWarnings = FALSE)
file.copy(list.files(src, pattern = "\\.fastq\\.gz$", full.names = TRUE), path)
check_primer_hits(
path = path,
fwd_pattern = "raw_1.fastq.gz",
rev_pattern = "raw_2.fastq.gz",
fwd_primer = "GGAAGTAAAAGTCGTAACAAGG",
rev_primer = "GCTGCGTTCTTCATCGATGC"
)
}
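"All orientations" of a primer usually means the sequence as given plus its complement, reverse, and reverse complement. The base R sketch below shows how those four strings could be generated; it is only an illustration, the name all_orients_sketch is hypothetical, and the package itself may rely on Biostrings for this step.

```r
# Hypothetical helper: the four orientations of a primer sequence.
all_orients_sketch <- function(primer) {
  rev_string <- function(s) paste(rev(strsplit(s, "")[[1]]), collapse = "")
  # Complement, including IUPAC ambiguity codes
  comp <- function(s) chartr("ACGTMRWSYKVHDBN", "TGCAKYWSRMBDHVN", s)
  c(Forward = primer,
    Complement = comp(primer),
    Reverse = rev_string(primer),
    RevComp = rev_string(comp(primer)))
}

all_orients_sketch("GCTGCGTTCTTCATCGATGC")  # ITS2, the default reverse primer
```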
Check Variance
Description
Tests for Heterogeneity of Variances in make_comparisons Result
Usage
check_var(otu.pc.transformed, group.vector)
## S3 method for class 'check_var'
print(x, ...)
Arguments
otu.pc.transformed |
An OTU matrix of transformed data. Taxa are rows. |
group.vector |
A vector of treatments. |
x |
An object of class "check_var". |
... |
Arguments passed to |
Value
A data frame of class "check_var" with one row per taxon and columns taxon, statistic, df, and p.value from the Fligner-Killeen test. The object can be printed with print().
See Also
make_comparisons
Examples
# Transform species matrix to proportion;
# Check variances for the first three species
# in the dune data set grouped by Management.
data(dune, package = "vegan")
data(dune.env, package = "vegan")
dune <- vegan::decostand(dune, method = "total")
dune <- dune[, 1:3]
dune <- t(dune)
check_var(dune, dune.env$Management)
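The per-taxon Fligner-Killeen test named in Value is available in base R as stats::fligner.test. A minimal sketch of applying it row by row follows; the name fligner_by_taxon_sketch is hypothetical, the data are simulated, and check_var may differ in detail.

```r
# Hypothetical sketch: one Fligner-Killeen test per taxon (row).
fligner_by_taxon_sketch <- function(mat, groups) {
  rslt <- lapply(rownames(mat), function(taxon) {
    ft <- stats::fligner.test(x = as.numeric(mat[taxon, ]), g = factor(groups))
    data.frame(taxon = taxon,
               statistic = unname(ft$statistic),
               df = unname(ft$parameter),
               p.value = ft$p.value)
  })
  do.call(rbind, rslt)
}

set.seed(42)
mat <- rbind(taxonA = rnorm(12), taxonB = rnorm(12, sd = 3))  # simulated data
grp <- rep(c("a", "b", "c"), each = 4)
fligner_by_taxon_sketch(mat, grp)
```

Small p values suggest unequal variances among groups for that taxon, in which case not pooling standard deviations in the downstream comparisons may be the safer choice.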
CLDs from DUNN
Description
Generate compact letter displays from dunn.test results
Usage
cld_dunn(dunn_rslt, significance = 0.05)
Arguments
dunn_rslt |
The result of the function dunn.test::dunn.test |
significance |
The alpha level for statistical significance |
Value
A list of two items. The first, p_adj_matrix, is an object of class 'dist' giving p values adjusted for multiple comparisons. The second, clds, is a character vector of compact letter displays (CLDs) for each treatment.
Examples
# Example cribbed and modified from the kruskal.test documentation
x <- c(2.9, 3.0, 2.5, 2.6, 3.2) # normal subjects
y <- c(3.8, 2.7, 4.0, 2.4) # with obstructive airway disease
z <- c(2.8, 3.4, 3.7, 2.2, 2.0) # with asbestosis
x <- c(x, y, z)
g <- factor(rep(1:3, c(5, 4, 5)),
labels = c("Normal",
"COPD",
"Asbestosis"))
dunn_rslt <- dunn.test::dunn.test(x, g)
cld_dunn(dunn_rslt, significance = 0.05)
Make CLD tibble from Tukey HSD Results
Description
Makes a tibble for adding compact letter assignments to a boxplot using HSD.test results.
Usage
cld_hsd(hsd_rslt, y_pos = "boxtop")
Arguments
hsd_rslt |
The result of the HSD.test function of package agricolae |
y_pos |
The y-position in relation to the boxplots. Choices are at the top of the box ("boxtop", the default) or at the maximum group value ("max"). |
Details
hsd_rslt must be created with agricolae::HSD.test
Value
A tibble with columns for treatment groups (x), the y-positions of the treatment CLD (y), and the CLD letters indicating significantly different treatments.
Examples
data("iris")
model <- lm(Petal.Length ~ Species, data = iris)
hsd_rslt <- agricolae::HSD.test(model, trt="Species")
cld_hsd(hsd_rslt)
Clear Warnings
Description
Clears all warning messages from the base environment.
Usage
clear_warnings()
Details
Sometimes when working in the console R retains a list of warnings such that they keep being reported after the function call which originated them. This function removes them so that they are not a nuisance.
Value
No return value, called for side effects. Clears the stored warning list so that stale warnings are no longer reported in the console.
Examples
{
as.numeric(c("1", "2", "apples"))
summary(warnings())
clear_warnings()
summary(warnings())
}
comb
Description
Calculates the number of combinations of n things drawn r at a time.
Usage
comb(n, r, repetition = FALSE)
Arguments
n |
The total number of items. |
r |
The number of items to be drawn. |
repetition |
A logical, whether or not repetitions are allowed. FALSE by default. |
Value
An integer giving the number of ways a set of r items can be drawn from a set of n items.
Examples
comb(5, 3)
comb(5, 3, repetition = TRUE)
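The two counts follow the standard formulas: choose(n, r) without repetition and choose(n + r - 1, r) with repetition. A base R sketch (the name comb_sketch is hypothetical; the package's own implementation may differ):

```r
# Hypothetical sketch of the two combination counts.
comb_sketch <- function(n, r, repetition = FALSE) {
  if (repetition) choose(n + r - 1, r) else choose(n, r)
}

comb_sketch(5, 3)                     # 5! / (3! * 2!) = 10
comb_sketch(5, 3, repetition = TRUE)  # choose(7, 3) = 35
```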
Assemble Comparison Parts
Description
Assembles Comparison Data Frame
Usage
comp_assemble(part1, part2, part3)
Arguments
part1 |
Result from comp_means_sd |
part2 |
Result from comp_make_f_tests |
part3 |
Result from comp_comparisons |
Details
The data frame returned has taxa as row names. The first three column names are mean (relative abundance of the taxa), sd and F_value for comparisons among groups. The remaining column names are the groups. The group columns show the mean relative abundance for the group plus/minus the standard deviation and a compact letter display for the group. See also the vignette "Compare Relative Abundances Among Treatments."
Value
A summary data frame of differential abundances by taxon and treatment.
Examples
{
data("its.root")
temp1 <- comp_prepare_phyloseq(its.root)
temp2 <- comp_prepare_otu_table(temp1$expt.taxon.pc,
grps = "Label",
transformation = "sqrt_arc_sine")
temp3 <- comp_means_sd(temp2$otu.pc)
temp4 <- comp_make_f_tests(temp2$otu.pc.trans,
grps = temp2$groups,
var.equal = TRUE)
temp5 <- comp_comparisons(otu.pc = temp2$otu.pc,
otu.pc.trans = temp2$otu.pc.trans,
grps = temp2$groups,
p.adjust.method = "BH",
pool.sd = TRUE)
comp_assemble(temp3, temp4, temp5) |>
dplyr::arrange(dplyr::desc(mean))
}
Make Comparisons
Description
Calculates the treatment comparison portion of a table comparing relative abundances of each taxon among treatments.
Usage
comp_comparisons(
otu.pc,
otu.pc.trans,
grps,
p.adjust.method = "BH",
pool.sd = FALSE
)
Arguments
otu.pc |
An OTU table of percentages. |
otu.pc.trans |
An OTU table of transformed data. |
grps |
A vector of treatment groups for which to make comparisons. |
p.adjust.method |
Adjustment method for multiple comparisons. |
pool.sd |
A logical, whether or not to pool standard deviations. |
Details
The row names of the data frame returned are taxa. The columns are of type character and their names are the group names. For each group, the entry gives the mean relative abundance +/- the standard deviation and a compact letter display (CLD) for the group. See also the vignette "Compare Relative Abundances Among Treatments."
Value
A data frame of differences in relative abundances among treatments.
Examples
{
data("its.root")
temp1 <- comp_prepare_phyloseq(its.root)
temp2 <- comp_prepare_otu_table(temp1$expt.taxon.pc,
grps = "Label",
transformation = "sqrt_arc_sine")
comp_comparisons(otu.pc = temp2$otu.pc,
otu.pc.trans = temp2$otu.pc.trans,
grps = temp2$groups,
p.adjust.method = "BH",
pool.sd = TRUE)
}
Make F Tests
Description
Calculates omnibus F tests to be included in a table comparing relative abundances of each taxon among treatments.
Usage
comp_make_f_tests(otu.pc.trans, grps, var.equal = FALSE)
Arguments
otu.pc.trans |
An OTU table of transformed data from comp_prepare_otu_table. |
grps |
A vector of treatment groups for which to make comparisons. |
var.equal |
Logical, whether or not to assume variances equal. |
Details
The row names of the data frame returned are taxa. The column names are 'F', 'Prob', 'sig' and 'F_value'. The column 'sig' includes asterisks indicating the degree of significance and the 'F_value' column is the F value to 2 decimal places plus the asterisk(s) from 'sig'. See also the vignette "Compare Relative Abundances Among Treatments."
Value
A data frame of the F-test results.
Examples
{
data("its.root")
temp1 <- comp_prepare_phyloseq(its.root)
temp2 <- comp_prepare_otu_table(temp1$expt.taxon.pc,
grps = "Label",
transformation = "sqrt_arc_sine")
temp4 <- comp_make_f_tests(temp2$otu.pc.trans,
grps = temp2$groups,
var.equal = TRUE)
temp4
}
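For a single taxon, an omnibus F test with a var.equal switch can be run in base R with stats::oneway.test. The sketch below assumes, without confirmation from the source, that this is the kind of test involved; the name f_test_sketch is hypothetical and the iris data stand in for one row of a transformed OTU table.

```r
# Hypothetical sketch: one omnibus F test, formatted like the columns in Details.
f_test_sketch <- function(values, groups, var.equal = TRUE) {
  ft <- stats::oneway.test(values ~ groups, var.equal = var.equal)
  p <- ft$p.value
  stars <- if (p < 0.001) "***" else if (p < 0.01) "**" else if (p < 0.05) "*" else ""
  data.frame(F = unname(ft$statistic),
             Prob = p,
             sig = stars,
             F_value = paste0(sprintf("%.2f", ft$statistic), stars))
}

f_test_sketch(iris$Petal.Length, iris$Species)
```

With var.equal = FALSE, oneway.test applies the Welch correction instead of the classical one-way ANOVA F test.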
Calculate Means and Standard Deviations
Description
Calculates means and standard deviation for each taxon to be included in a table comparing relative abundances of each taxon among treatments.
Usage
comp_means_sd(otu.pc)
Arguments
otu.pc |
An OTU table with data as percentages. |
Details
The OTU table should be created with comp_prepare_otu_table. The data frame returned has taxa as row names. For each taxon, 'mean' is the mean relative abundance and 'sd' is the standard deviation. See also the vignette "Compare Relative Abundances Among Treatments."
Value
A data frame with mean relative abundance and standard deviations by taxon.
Examples
{
data("its.root")
temp1 <- comp_prepare_phyloseq(its.root)
temp2 <- comp_prepare_otu_table(temp1$expt.taxon.pc,
grps = "Label",
transformation = "sqrt_arc_sine")
temp3 <- comp_means_sd(temp2$otu.pc)
temp3
}
Prepare OTU Table
Description
Make OTU tables for making comparisons of relative abundances among treatments.
Usage
comp_prepare_otu_table(
expt.taxon.pc,
grps = "Treatment",
transformation = "sqrt_arc_sine"
)
Arguments
expt.taxon.pc |
Phyloseq object from comp_prepare_phyloseq with percentages in the otu_table. |
grps |
Factor in sample data for which to make comparisons. |
transformation |
Transformation function to use. |
Details
The transformation applied may be "none", a user-supplied function name in quotation marks, or any of the built-in transformations ("arc_sine", "log_arc_sine", or "sqrt_arc_sine"). The "sqrt_arc_sine" transformation has generally proven most effective. See also the vignette "Compare Relative Abundances Among Treatments."
Value
A list consisting of an OTU table with percentages, an OTU table with transformed data, and a vector of treatment groups.
Examples
{
data("its.root")
temp1 <- comp_prepare_phyloseq(its.root)
temp2 <- comp_prepare_otu_table(temp1$expt.taxon.pc,
grps = "Label",
transformation = "sqrt_arc_sine")
temp2
}
Prepare Phyloseq
Description
Prepares a phyloseq object for making comparisons of relative abundances among treatments.
Usage
comp_prepare_phyloseq(expt, taxrank = "Phylum", pc.filter = 0.01)
Arguments
expt |
Experiment level phyloseq object. |
taxrank |
Taxonomic rank for which to make comparisons. |
pc.filter |
Minimum percentage of total counts to include rank in result. |
Details
For both returned phyloseq objects, the OTU table has been filtered to include only OTUs initially present at >= pc.filter times the original total count and only taxrank is included in the taxonomy table. For the second object in the list, the OTU table has been transformed to percentages of the total counts per sample. See also the vignette "Compare Relative Abundances Among Treatments."
Value
A list of two modified experiment level phyloseq objects.
Examples
{
data(its.root)
temp1 <- comp_prepare_phyloseq(its.root)
temp1
}
deg2rad
Description
Transform an angle in degrees to radians.
Usage
deg2rad(x)
Arguments
x |
Angle in degrees |
Value
Angle in radians.
Examples
deg2rad(90)
A 16S Experiment Level phyloseq Object
Description
Based on 16S rRNA gene sequences from Iowa loess soil
Usage
expt
Format
A phyloseq object with otu_table, sample_data, tax_table, phylogenetic tree and refseqs.
- Depth
0-20 cm, 60-80 cm
- Site
HNC
- Slope
Top, Middle, Bottom
Format a taxon
Description
Formats a taxon so that it will be properly italicized in a ggplot
Usage
format_taxon(x)
Arguments
x |
A string representing a phylum, class, etc. |
Details
If a taxon begins with an upper case letter followed by lower case letters and does not contain an underscore, it is wrapped in asterisks. If it begins with an upper case letter followed by lower case letters and contains an underscore, the portion before the underscore is wrapped in asterisks, the underscore removed and any letters following the underscore left alone. If a taxon contains all upper case letters or digits it is not a proper taxon and is left alone.
Value
The string with asterisks properly inserted.
Examples
format_taxon("UAB")
format_taxon("Pseudomonas_B")
format_taxon("Pseudomonas")
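The rules in Details can be sketched with a regular expression and a split on the underscore. This is an illustration only: the name format_taxon_sketch is hypothetical, and joining the two parts with a space after removing the underscore is an assumption that the source does not confirm.

```r
# Hypothetical sketch of the formatting rules described in Details.
format_taxon_sketch <- function(x) {
  # Not a proper taxon (e.g. all upper case or digits): leave alone
  if (!grepl("^[A-Z][a-z]+", x)) return(x)
  parts <- strsplit(x, "_", fixed = TRUE)[[1]]
  # Wrap the part before any underscore in asterisks (markdown italics);
  # the space separator is an assumption
  paste(c(paste0("*", parts[1], "*"), parts[-1]), collapse = " ")
}

format_taxon_sketch("UAB")
format_taxon_sketch("Pseudomonas_B")
format_taxon_sketch("Pseudomonas")
```

Asterisk-wrapped text renders as italics only when the plot uses a markdown-aware text element.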
Generate a Password
Description
Generates a random character string of specified length.
Usage
generate_password(n, type = "alpha_numeric")
Arguments
n |
Number of characters in password. |
type |
c("alpha_numeric", "anything_else") |
Details
If type equals "alpha_numeric" (the default), only alpha-numeric characters are used to generate the password. If type does not equal "alpha_numeric" then at least one non-alpha-numeric symbol will be included in the password. In either case, the alpha characters used are both upper and lower case.
Value
A character string.
Examples
generate_password(8)
get_groups
Description
Assign treatment groups based on pairwise t-tests.
Usage
get_groups(ptt.rslt, alpha = 0.05, rm.subset = FALSE)
Arguments
ptt.rslt |
Result from the stats function pairwise.t.test. |
alpha |
Significance level (alpha) for the pairwise comparisons. |
rm.subset |
A logical; remove group subsets if true. |
Details
This function aids in making letter assignments as to which treatments are significantly different. It also returns a square matrix of p values for all pairwise differences. This square matrix can serve as input to the multcompLetters function of the multcompView package, which provides letter assignments. If rm.subset is FALSE, then groups such as {A, B} and {A, B, C} may both be reported. This is redundant in the sense that {A, B} is a subset of {A, B, C}. If rm.subset is TRUE, the group {A, B} is not reported.
Value
A list consisting of groups of treatment groups that are not significantly different and a matrix of p values.
See Also
make_letter_assignments
Examples
attach(airquality)
Month <- factor(Month, labels = month.abb[5:9])
ptt.rslt <- pairwise.t.test(Ozone, Month)
detach(airquality)
get_groups(ptt.rslt, alpha = 0.05, rm.subset = FALSE)
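pairwise.t.test returns its p values as a lower-triangular matrix with one fewer row and column than there are groups, so building the square matrix mentioned in Details means adding the missing level and mirroring the values. A base R sketch of that step follows; the name square_p_matrix_sketch is hypothetical, filling the diagonal with 1 is an assumption, and get_groups may do this differently.

```r
# Hypothetical sketch: expand pairwise.t.test p values to a full square matrix.
square_p_matrix_sketch <- function(ptt.rslt) {
  pv <- ptt.rslt$p.value
  lv <- c(colnames(pv)[1], rownames(pv))       # all group names, in order
  full <- matrix(1, nrow = length(lv), ncol = length(lv),
                 dimnames = list(lv, lv))      # diagonal of 1 is an assumption
  # Both index sets run column by column, so the values line up
  full[lower.tri(full)] <- pv[lower.tri(pv, diag = TRUE)]
  full[upper.tri(full)] <- t(full)[upper.tri(full)]  # mirror to upper triangle
  full
}

Month <- factor(airquality$Month, labels = month.abb[5:9])
ptt <- pairwise.t.test(airquality$Ozone, Month)
square_p_matrix_sketch(ptt)
```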
Get ggplot Plot Limits
Description
Gets the ranges for the width and height of a ggplot panel.
Usage
get_plot_limits(plot)
Arguments
plot |
A plot created with ggplot2 |
Details
Sometimes when adding text to a ggplot, the text is cut off if it extends beyond the limits of the plot panel. This function provides information enabling the user to extend the panel limits so that the text is not cut off.
Value
A list: xmin, xmax, ymin, ymax giving the coordinates of the limits of a ggplot panel.
Examples
library(ggplot2)
data(iris)
plt <- ggplot(data=iris, aes(x=Species, y=Petal.Length)) + geom_boxplot()
get_plot_limits(plt)
Calculate Good's Coverage
Description
Calculates Good's coverage from a community data matrix with samples as rows and OTUs as columns.
Usage
goods(com)
Arguments
com |
a vegan compatible community data matrix. |
Value
A table with the headings number of singletons, number of sequences, and Good's coverage for each sample in rows.
References
Good, I. J. 1953. The Population Frequencies of Species and the Estimation of Population Parameters. Biometrika 40:237-264.
Examples
{
data(dune, package = "vegan")
goods(dune)
}
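Good's coverage is commonly computed per sample as 1 - (number of singletons / number of sequences). The base R sketch below expresses it as a percentage, which is an assumption about the scale goods() reports; the function name and the column names are hypothetical.

```r
# Hypothetical sketch: Good's coverage for each sample (row).
goods_sketch <- function(com) {
  com <- as.matrix(com)
  singletons <- rowSums(com == 1)   # OTUs seen exactly once in the sample
  n_seqs <- rowSums(com)            # total sequences in the sample
  data.frame(no.sing = singletons,
             no.seqs = n_seqs,
             goods = 100 * (1 - singletons / n_seqs))
}

com <- matrix(c(1, 1, 8,
                0, 5, 5),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("s1", "s2"), c("otu1", "otu2", "otu3")))
goods_sketch(com)
```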
Hash DNA sequences
Description
Renames sequences in a multi-fasta file with the MD5 hash of each sequence.
Usage
hash_dna_seqs(seqs)
Arguments
seqs |
A list of DNA sequences |
Details
Taxa names for the ASV table returned by the R version of DADA2 are the sequences themselves. This function renames them with the MD5 hash of each sequence so that the results are directly comparable to QIIME2/DADA2 results.
Value
DNA sequences renamed with their hashes.
Examples
seqs <- ">ASV1
AGTTTGATCATGGCTCAGATTGAACGCTGGCGGCAGGCCTAACACATGCAAGTCGAACGGTAACAGGAAG"
hash_dna_seqs(seqs)
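For readers who want to see what an MD5 hash of a sequence string looks like without extra packages, the sketch below uses tools::md5sum on a temporary file holding the raw bytes of the string. The name md5_of_string_sketch is hypothetical, and hash_dna_seqs may compute the digest by other means.

```r
# Hypothetical sketch: MD5 digest of a character string using base R only.
md5_of_string_sketch <- function(s) {
  tmp <- tempfile()
  on.exit(unlink(tmp))
  writeBin(charToRaw(s), tmp)   # exact bytes, no trailing newline
  unname(tools::md5sum(tmp))
}

md5_of_string_sketch("AGTTTGATCATGGCTCAGATTGAACGCTGGCGGCAGGCCTAACACATGCAAGTCGAACGGTAACAGGAAG")
```

The digest depends on the exact characters, so case and any stray whitespace in the sequence change the result.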
An ITS Experiment Level phyloseq Object
Description
Based on ITS2 sequences amplified from corn roots.
Usage
its.root
Format
A phyloseq object with otu_table, sample_data and tax_table. The sample_data variables are:
- P
Phosphorus level, H or L
- Genotype
One of three: 2, 3, and C
- Label
A code for treatments: 2HR, 2LR, 3HR, 3LR, CHR, CLR
log_arc_sine
Description
Transform a percentage to the log of its arc sine.
Usage
log_arc_sine(x)
Arguments
x |
A percentage. |
Value
The common logarithm of the arcsine transformation of x.
Examples
log_arc_sine(x = 30.1)
Make Multiple Comparisons on Transformed Data
Description
Makes multiple comparisons of the relative abundances of taxa between treatment groups using pairwise.t.test. Data may be transformed by a user-supplied function. Three transformations are included in this package.
Usage
make_comparisons(
expt,
taxrank = "Phylum",
grps = "Treatment",
transformation = "none",
pc.filter = 0.01,
p.adjust.method = "BH",
pool.sd = FALSE
)
Arguments
expt |
Experiment level phyloseq object. |
taxrank |
Rank for which to make comparisons. |
grps |
Factor in sample data for which to make comparisons. |
transformation |
Transformation function to use. |
pc.filter |
Minimum percentage of total counts to include rank in result. |
p.adjust.method |
Adjustment method for multiple comparisons. |
pool.sd |
Logical, whether or not to pool standard deviations. |
Details
This is essentially a wrapper around the functions comp_prepare_phyloseq(), comp_prepare_otu_table(), comp_means_sd(), comp_make_f_tests(), comp_comparisons() and comp_assemble(). Transformation may be "none", a user-supplied function name in quotation marks, or any of the built-in transformations ("arc_sine", "log_arc_sine", or "sqrt_arc_sine"). The "sqrt_arc_sine" transformation has generally proven most effective.
Value
A list consisting of three data frames and a vector of treatment groups. The first data frame is comparison.table, giving for each row (taxon) the mean relative abundance, standard deviation, F statistic with significance indicated by asterisks, and then for each treatment group the mean +/- sd with a CLD indicating group membership. The second data frame is taxa.pc, giving for each row (taxon) the relative abundance of each treatment group. The third data frame is taxa.pc.transformed. It is like the second data frame, but the data have been transformed using the specified function.
See Also
arc_sine, log_arc_sine, sqrt_arc_sine, check_var
Examples
{
data(its.root)
make_comparisons(its.root,
taxrank = "Phylum",
grps = "Label",
transformation = "sqrt_arc_sine",
pc.filter = 0.01,
p.adjust.method ="BH",
pool.sd = TRUE)
}
Make Letter Assignments
Description
Makes letter assignments for treatment groups that are not significantly different.
Usage
make_letter_assignments(ptt.rslt, significance = 0.05)
Arguments
ptt.rslt |
Output from the pairwise.t.test function. |
significance |
Alpha level to be declared a significant difference. |
Details
Letter assignments are made using Piepho's algorithm.
Value
A named vector of letter assignments. Names are treatment groups.
References
Piepho, H. P. 2004. An algorithm for a letter-based representation of all-pairwise comparisons. Journal of Computational and Graphical Statistics **13**:456-466.
Examples
{
data(iris, package = "datasets")
ptt.rslt <- with(iris, pairwise.t.test(Petal.Length, Species, pool.sd = FALSE))
make_letter_assignments(ptt.rslt, significance = 0.05)
}
Merge Two Data Frames
Description
Merge two data frames by their row names.
Usage
merge_2_frames(one, two)
Arguments
one |
A data frame. |
two |
A second data frame. |
Details
Merges data frames by common row names. This function differs from merge.data.frame in that the merged data frame returned has row names and not a new column of the row names.
Value
A merged data frame.
Examples
{
common_rows <- paste0("ID_", 1:5)
df1 <- data.frame(
Value_A = runif(5),
Category = sample(c("X", "Y"), 5, replace = TRUE),
row.names = common_rows
)
df2 <- data.frame(
Value_B = rnorm(5),
Flag = sample(c(TRUE, FALSE), 5, replace = TRUE),
row.names = common_rows
)
merge_2_frames(df1, df2)
}
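The difference from base merge described in Details can be seen in a short sketch: merging by "row.names" produces a Row.names column, which is then moved back into the row names. The name merge_by_rownames_sketch is hypothetical, and merge_2_frames may be implemented differently.

```r
# Hypothetical sketch: merge on row names and keep them as row names.
merge_by_rownames_sketch <- function(one, two) {
  merged <- merge(one, two, by = "row.names")  # adds a 'Row.names' column
  rownames(merged) <- merged$Row.names
  merged$Row.names <- NULL
  merged
}

a <- data.frame(x = 1:3, row.names = c("ID_1", "ID_2", "ID_3"))
b <- data.frame(y = c("p", "q", "r"), row.names = c("ID_3", "ID_1", "ID_2"))
merge_by_rownames_sketch(a, b)
```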
Make Ordination Axis Labels
Description
Makes ordination axis labels that include, if appropriate, the % of the total variance explained by each axis.
Usage
ord_labels(ord)
Arguments
ord |
A vegan ordination object. |
Details
If there are no eigenvalues in ord, or if any eigenvalues are less than 0, each element of the vector returned has the form "DIMn" where n is the axis number. Otherwise, each element of the vector returned has the form Axisn xx.x % where Axis is taken from the vector of eigenvalues in ord if they are named or simply "DIM" if they are not, n is the number of the axis, and xx.x is the % of total variance explained by the axis.
For this function to work correctly, ord should be created in one of the following ways:
As an unconstrained ordination using vegan::rda. In this case the labels have the form PCAn xx.x %
As a PCoA made with stats::cmdscale. In this case the labels have the form DIMn xx.x %.
As a CA made with vegan::ca. In this case the labels have the form CAn xx.x %.
Value
A character vector, each element of which can be used to label the corresponding axis of an ordination plot.
Examples
{
# For PCA using rda:
data("dune", package = "vegan")
dune_hel <- vegan::decostand(dune, method = "hellinger")
pca <- vegan::rda(dune_hel)
print("For the PCA case:")
print(ord_labels(pca)[1:2])
cat("\n")
# For PCoA with negative eigenvalues
d <- vegan::vegdist(dune)
pcoa <- stats::cmdscale(d, k = nrow(dune)-1, eig = TRUE, add = FALSE)
print("For the PCoA case with negative eigenvalues:")
print(ord_labels(pcoa)[1:2])
cat("\n")
# For PCoA without negative eigenvalues
pcoa <- stats::cmdscale(d, k = nrow(dune)-1, eig = TRUE, add = TRUE)
print("For the PCoA case without negative eigenvalues:")
print(ord_labels(pcoa)[1:2])
cat("\n")
# For correspondence analysis
ca_ord <- vegan::ca(dune)
print("For the CA case:")
print(ord_labels(ca_ord))
}
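The labeling rule in Details amounts to dividing each eigenvalue by the sum of all eigenvalues, unless any eigenvalue is negative. A base R sketch working directly on a vector of eigenvalues follows; the name axis_labels_sketch and its prefix argument are hypothetical, and ord_labels itself extracts the eigenvalues and prefix from the ordination object.

```r
# Hypothetical sketch: axis labels from a vector of eigenvalues.
axis_labels_sketch <- function(eig, prefix = "DIM") {
  n <- seq_along(eig)
  # With negative eigenvalues, percentages are not meaningful: plain labels
  if (any(eig < 0)) return(paste0(prefix, n))
  pct <- 100 * eig / sum(eig)
  sprintf("%s%d %.1f %%", prefix, n, pct)
}

axis_labels_sketch(c(6, 3, 1))    # percentages of total variance
axis_labels_sketch(c(5, 1, -1))   # negative eigenvalue: no percentages
```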
Make PCA Axis Labels
Description
Makes PCA axis labels that include the % of total variance explained by each axis.
Usage
pca_labels(pca)
Arguments
pca |
Object containing the results of vegan's rda function. |
Details
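Each element of the vector returned has the form "PCAn xx.x %", where n is the number of the PCA axis and xx.x is the % of total variance explained by the axis.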
Value
A character vector, each element of which can be used to label the corresponding axis of a PCA plot.
Examples
{
data(dune, package = "vegan")
dune_hel <- vegan::decostand(dune, method = "hellinger")
pca_ord <- vegan::rda(dune_hel)
pca_labels(pca_ord)
}
Permutations
Description
Returns the number of permutations of n things taken r at a time.
Usage
perm(n, r, repetition = FALSE)
Arguments
n |
Total number of items. |
r |
Number of items drawn. |
repetition |
A logical, whether or not repetitions are allowed. FALSE by default. |
Value
An integer giving the number of ordered arrangements of r items drawn from a set of n items.
Examples
perm(10, 5)
perm(10, 5, repetition = TRUE)
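The standard formulas are n! / (n - r)! without repetition and n^r with repetition. A base R sketch (the name perm_sketch is hypothetical; the package's own implementation may differ):

```r
# Hypothetical sketch of the two permutation counts.
perm_sketch <- function(n, r, repetition = FALSE) {
  if (repetition) n^r else factorial(n) / factorial(n - r)
}

perm_sketch(10, 5)                     # 10 * 9 * 8 * 7 * 6 = 30240
perm_sketch(10, 5, repetition = TRUE)  # 10^5 = 100000
```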
A Data File in Long Format
Description
Used in Case 3 of the vignette make_comparisons
Usage
plot_df
Format
A data file in long format used for a ggplot. The variables are:
- Treatment
A code for genotype (2, 3, or C), P level (H or L) and sample type (R)
- Family
One of the families in Gigasporaceae
- Percent
Percent of total counts for family and treatment combination.
Plot DADA2 Transition Stats
Description
Extracts a QIIME2/DADA2 transition stats file and returns a plot.
Usage
plot_transition_stats(trans_stats.qza)
Arguments
trans_stats.qza |
The transition stats file output by QIIME2 DADA2 |
Details
Beginning with QIIME2 version 2025.7 the DADA2 plugin requires output of a compressed (qza) file of the transition stats. This function makes a ggplot plot from the data in that file. It is useful in determining how well DADA2 has corrected read errors.
Value
A ggplot of the transition probabilities
Examples
qza <- system.file("extdata", "base-transition-stats.qza", package = "QsRutils")
plot_transition_stats(qza)
Filter OTUs by Abundance
Description
Allows subsetting of a phyloseq object according to the relative abundance of OTUs in a minimal number of samples. Returns a logical vector of OTUs that are at least n% of the sequences in at least m samples.
Usage
prop_filter(x, n, m)
Arguments
x |
A phyloseq object. |
n |
Minimum percentage to keep OTU. |
m |
Minimum number of samples. |
Details
The function creates a logical vector to be used in subsetting a phyloseq object according to the relative abundance of OTUs in a given number of samples. For example, if n = 1 and m = 2, then the OTUs to be kept must represent at least 1% of the sequences in at least 2 samples. The vector is then used as an argument to the phyloseq function 'prune_taxa'.
Value
A logical vector of OTUs to keep.
Examples
{
data("its.root")
prop_filter(x = its.root, n = 1, m = 5)
}
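The filtering rule can be illustrated on a plain count matrix with samples as rows and OTUs as columns. This sketch is not the package's code: the name prop_filter_sketch is hypothetical, and prop_filter itself works on a phyloseq object.

```r
# Hypothetical sketch: keep OTUs that are >= n percent in >= m samples.
prop_filter_sketch <- function(counts, n, m) {
  counts <- as.matrix(counts)
  pct <- 100 * counts / rowSums(counts)  # percent of each sample's sequences
  colSums(pct >= n) >= m                 # TRUE for OTUs to keep
}

counts <- matrix(c( 96,  2, 2,
                    50, 50, 0,
                   990,  5, 5),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(paste0("s", 1:3), paste0("otu", 1:3)))
prop_filter_sketch(counts, n = 1, m = 2)
```

Here otu3 reaches 1% only in sample s1, so it is the one OTU flagged for removal.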
rad2deg
Description
Transform an angle in radians to degrees.
Usage
rad2deg(x)
Arguments
x |
Angle in radians |
Value
Angle in degrees.
Examples
rad2deg(pi * 0.5)
Make RDA Axis Labels
Description
Makes RDA axis labels that include the % of variance explained by each axis, that is, by the constrained portion of the analysis.
Usage
rda_labels(rda)
Arguments
rda |
A constrained ordination object made with vegan::rda() or vegan::cca(). |
Details
Each element of the vector returned has the form "RDAn xx.x %", where n is the number of the RDA axis and xx.x is the % of variance explained by the axis. The percent of total variance is for the constrained portion only.
Value
A character vector, each element of which can be used to label the corresponding axis of an RDA plot.
See Also
ord_labels()
Examples
{
# Using vegan::rda()
data(dune, package = "vegan")
data(dune.env, package = "vegan")
dune.Manure <- vegan::rda(dune ~ Manure, dune.env)
print("For the rda case:")
print(rda_labels(dune.Manure)[1:2])
cat("\n")
# Using vegan::cca()
data(varespec, package = "vegan")
data(varechem, package = "vegan")
vare.cca <- vegan::cca(varespec ~ Al + P*(K + Baresoil), data=varechem)
print("For the cca case:")
print(rda_labels(vare.cca)[1:2])
}
Root Tree in phyloseq Object
Description
Roots an unrooted tree in a phyloseq object
Usage
root_phyloseq_tree(phylo)
Arguments
phylo |
A phyloseq object containing an unrooted tree |
Details
The tree is rooted by the longest terminal branch.
Value
The same phyloseq object with a rooted tree
Examples
{
data("expt")
expt.rooted <- root_phyloseq_tree(expt)
ape::is.rooted(phyloseq::phy_tree(expt.rooted))
}
Fast rarefaction via sequential hypergeometric sampling
Description
For each row, taxa counts are subsampled without replacement to exactly
depth total counts using the sequential hypergeometric algorithm:
iterate over taxa and draw from Hypergeometric(m = count_j,
n = remaining_pool - count_j, k = remaining_depth), updating both
running totals after each taxon. This is O(rows * cols) and produces
the same distribution as drawing depth items uniformly at random
without replacement from the pool of individuals.
Usage
rrarefy_cpp(otu, depth)
Arguments
otu |
Numeric matrix of OTU counts (samples x taxa). Row sums must
be >= depth. |
depth |
Integer rarefaction depth. |
Value
Integer matrix of rarefied counts with the same dimensions and
dimnames as otu.
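The sequential hypergeometric scheme described above can be sketched in plain R. This is only an illustration of the idea, not the package's C++ code: the helper name rarefy_row and the toy matrix are hypothetical, and base R's rhyper() stands in for whatever sampler the compiled code uses.

```r
# Hypothetical helper (not part of QsRutils): rarefy one sample by
# sequential hypergeometric draws, one taxon at a time.
rarefy_row <- function(counts, depth) {
  stopifnot(sum(counts) >= depth)
  out <- integer(length(counts))
  remaining_pool  <- sum(counts)  # individuals not yet considered
  remaining_depth <- depth        # draws still to be allocated
  for (j in seq_along(counts)) {
    if (remaining_depth == 0) break
    # m = individuals of taxon j, n = all other remaining individuals,
    # k = number of draws still to allocate
    out[j] <- rhyper(1, m = counts[j],
                     n = remaining_pool - counts[j],
                     k = remaining_depth)
    remaining_pool  <- remaining_pool - counts[j]
    remaining_depth <- remaining_depth - out[j]
  }
  out
}

# Made-up counts: 2 samples x 4 taxa
set.seed(42)
otu <- matrix(c(10,  0, 5, 25,
                 3, 12, 8, 17),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("s1", "s2"), paste0("otu", 1:4)))
rarefied <- t(apply(otu, 1, rarefy_row, depth = 20))
dimnames(rarefied) <- dimnames(otu)
rowSums(rarefied)  # every row sums to the requested depth
```

Each taxon is visited once per sample, which is where the O(rows * cols) cost comes from. No draw can exceed the taxon's original count.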
Standard error
Description
Calculates the standard error of a numeric vector
Usage
se(x)
Arguments
x |
A numeric vector |
Details
NA values are ignored.
Value
The standard error of the numeric vector
Examples
x <- c(1,2,3,4,5, NA)
se(x)
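For comparison, a minimal base R sketch of the usual standard error of the mean with NA values dropped. It assumes se() computes sd / sqrt(n) over the non-missing values, which this page does not state explicitly; se_sketch is a hypothetical name.

```r
# Hypothetical stand-in for se(): standard error of the mean, ignoring NAs.
se_sketch <- function(x) {
  x <- x[!is.na(x)]        # drop missing values first
  sd(x) / sqrt(length(x))  # n counts only the non-missing values
}

x <- c(1, 2, 3, 4, 5, NA)
se_sketch(x)  # sqrt(2.5 / 5), about 0.707
```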
sqrt_arc_sine
Description
Transform a percentage to the square root of its arc sine.
Usage
sqrt_arc_sine(x)
Arguments
x |
A percentage. |
Value
The square root of the arcsine transformation of x.
Examples
sqrt_arc_sine(30.1)
srs_p
Description
Normalize sample counts using scaling with ranked subsampling (SRS)
Usage
srs_p(p)
Arguments
p |
a phyloseq object containing an OTU table |
Details
This is an alternative to "rarefying" an OTU table to a constant sample size. The phyloseq object submitted must be pruned to the desired sample size before using this function.
Value
a phyloseq object including an OTU table, all sample sums equal.
Author(s)
John Quensen
References
Beule L, Karlovsky P. Improved normalization of species count data in ecology by scaling with ranked subsampling (SRS): application to microbial communities. PeerJ. 2020;8:e9593.
Examples
{
data("expt")
print("Sample sums before srs_p")
print(phyloseq::sample_sums(expt))
cat("\n")
expt_srs <- srs_p(p = expt)
print("Sample sums after srs_p")
print(phyloseq::sample_sums(expt_srs))
}
Subset physeq by refseq lengths
Description
Subset physeq by refseq lengths
Usage
subset_by_refseq_lengths(p, min_len = 252, max_len = 255)
Arguments
p |
An experiment level phyloseq object with reference sequences |
min_len |
The minimum reference sequence length to keep |
max_len |
The maximum reference sequence length to keep |
Details
Sometimes, due to sequencing errors, reference sequences have lengths less than and/or greater than the expected amplicon length. This function offers an easy way of removing such extraneous reference sequences from an experiment level phyloseq object. The default values for min_len and max_len are for the V4 region of the 16S rRNA gene.
Value
A phyloseq object filtered to have reference sequences within the length range specified.
Examples
{
data("expt")
print("Refseq length range before subsetting:")
expt@refseq@ranges@width |>
summary() |>
print()
cat("\n")
expt_filt <- subset_by_refseq_lengths(p = expt,
min_len = 400,
max_len = 420)
print("Refseq length range after subsetting:")
expt_filt@refseq@ranges@width |>
summary() |>
print()
}
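The length filter itself can be illustrated without phyloseq. The sketch below uses a named character vector of made-up sequences in place of the refseq slot. Passing the retained names to a pruning step such as phyloseq::prune_taxa() is an assumption about how the filter might be applied, not a description of this function's internals.

```r
# Made-up sequences standing in for the reference sequences of a phyloseq object
seqs <- c(asv1 = "ACGTACGTAC",        # length 10
          asv2 = "ACGT",              # length 4
          asv3 = "ACGTACGTACGTACGT",  # length 16
          asv4 = "ACGTACGTACG")       # length 11
min_len <- 9
max_len <- 12

lens <- nchar(seqs)
# Names of the sequences whose length falls inside [min_len, max_len]
keep <- names(seqs)[lens >= min_len & lens <= max_len]
keep
```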
Subset Distance Matrix
Description
Subsets a distance matrix.
Usage
subset_dist(physeq, d.matrix)
Arguments
physeq |
An experiment level phyloseq object. |
d.matrix |
A distance matrix. |
Details
Some distance matrices take a long time to calculate for large data sets. This is especially true of UniFrac and generalized UniFrac distances calculated by GUniFrac. If distances are first calculated from data in a large experiment level phyloseq object and PERMANOVA (with adonis) is then to be performed on a subset of that object, this function provides a means of subsetting the distance matrix so that it does not have to be calculated again for the subset data. The arguments are the smaller phyloseq object subset from the original and the distance matrix for the original phyloseq object.
Value
A distance matrix of smaller dimensions.
References
Chen J, Bittinger K, Charlson ES et al. (2012) Associating microbiome composition with environmental covariates using generalized UniFrac distances. Bioinformatics, 28, 2106-2113.
Examples
{
otu <- veganotu(its.root)
d <- vegan::vegdist(otu)
print("Before subsetting:")
print(dim(as.matrix(d)))
cat("\n")
p <- phyloseq::subset_samples(its.root, P_Location == "LR")
d_sub <- subset_dist(p, d)
print("After subsetting:")
print(dim(as.matrix(d_sub)))
}
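The idea behind subsetting a distance matrix by sample name can be sketched in base R. This is a minimal illustration with made-up counts and a Euclidean dist object. It assumes the retained sample names are known, whereas subset_dist() takes them from the phyloseq object.

```r
# Toy data: 5 samples x 3 taxa (made-up counts)
set.seed(1)
counts <- matrix(rpois(15, lambda = 10), nrow = 5,
                 dimnames = list(paste0("s", 1:5), paste0("otu", 1:3)))
d <- dist(counts)  # any labelled "dist" object would do

# Samples to retain (hypothetical subset)
keep <- c("s2", "s4", "s5")

# Convert to a full matrix, index rows and columns by name, convert back
d_sub <- as.dist(as.matrix(d)[keep, keep])
dim(as.matrix(d_sub))
```

Because the pairwise distances are copied rather than recomputed, the values for the retained samples are identical to those in the original matrix.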
Standardize a Phyloseq OTU Table
Description
Applies any vegan decostand standardization method to a phyloseq OTU table.
Usage
vegan_stand(physeq, method = "hellinger", ...)
Arguments
physeq |
A phyloseq object containing at least an OTU table. |
method |
A method from vegan's decostand function. |
... |
Other parameters passed to vegan's decostand function. |
Value
Returns a phyloseq object with transformed OTU table.
Examples
{
data("expt")
print("Before standardization, first 5 rows and columns:")
print(veganotu(expt)[1:5, 1:5])
cat("\n")
expt_mod <- vegan_stand(expt, method = "hellinger")
print("After standardization, first 5 rows and columns:")
print(veganotu(expt_mod)[1:5, 1:5])
}
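As a reference point for the default method, the Hellinger standardization is the square root of each count divided by its sample (row) total. The base R sketch below applies that formula to a made-up samples x taxa matrix. It illustrates the formula only, not how vegan_stand() handles the phyloseq object.

```r
# Made-up counts: samples in rows, taxa in columns
counts <- matrix(c(4, 0, 12,
                   9, 9, 18),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("s1", "s2"), c("otu1", "otu2", "otu3")))

# Hellinger: square root of the row-wise relative abundances
hellinger <- sqrt(counts / rowSums(counts))
hellinger

# Squared Hellinger-transformed values sum to 1 within each sample
rowSums(hellinger^2)
```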
Extract Vegan OTU Table
Description
Extracts a vegan compatible OTU table from a phyloseq object.
Usage
veganotu(physeq)
Arguments
physeq |
A phyloseq object containing at least an OTU table. |
Value
A matrix with samples in rows and OTUs in columns.
Examples
{
data("expt")
# Show only first 5 columns and rows:
veganotu(physeq = expt)[1:5, 1:5]
}
Extract Sample Data Table
Description
Extracts a sample data table from a phyloseq object.
Usage
vegansam(physeq)
Arguments
physeq |
A phyloseq object containing sample_data. |
Value
A data frame with samples in rows and factors and/or variables in columns.
Examples
{
data("expt")
vegansam(physeq = expt)
}