Type: | Package |
Title: | Read/Write, Analyze, and Visualize 'BIOM' Data |
Version: | 2.2.1 |
Date: | 2025-06-27 |
Description: | A toolkit for working with Biological Observation Matrix ('BIOM') files. Read/write all 'BIOM' formats. Compute rarefaction, alpha diversity, and beta diversity (including 'UniFrac'). Summarize counts by taxonomic level. Subset based on metadata. Generate visualizations and statistical analyses. CPU intensive operations are coded in C for speed. |
URL: | https://cmmr.github.io/rbiom/, https://github.com/cmmr/rbiom |
BugReports: | https://github.com/cmmr/rbiom/issues |
License: | MIT + file LICENSE |
Encoding: | UTF-8 |
LazyData: | true |
Depends: | R (≥ 4.2.0) |
RoxygenNote: | 7.3.2 |
Config/Needs/website: | rmarkdown, phyloseq, npregfast, withr |
Config/testthat/edition: | 3 |
Imports: | methods, mgcv, stats, utils, ape, dplyr, emmeans, fillpattern, ggbeeswarm, ggnewscale, ggplot2, ggrepel, ggtext, jsonlite, magrittr, parallelly, patchwork, pillar, plyr, readr, readxl, slam, vegan |
Suggests: | cli, crayon, ggdensity, glue, labeling, lifecycle, openxlsx, optparse, pkgconfig, prettycode, R6, rlang, scales, testthat, tibble, tsne, uwot |
NeedsCompilation: | yes |
Packaged: | 2025-06-27 19:52:14 UTC; Daniel |
Author: | Daniel P. Smith |
Maintainer: | Daniel P. Smith <dansmith01@gmail.com> |
Repository: | CRAN |
Date/Publication: | 2025-06-27 20:10:02 UTC |
rbiom: Read/Write, Transform, and Summarize BIOM Data
Description
A toolkit for working with Biological Observation Matrix (BIOM) files. Features include reading/writing all BIOM formats, rarefaction, alpha diversity, beta diversity (including UniFrac), summarizing counts by taxonomic level, and sample subsetting. Standalone functions for reading, writing, and subsetting phylogenetic trees are also provided. All CPU intensive operations are implemented in C with multithreading support.
Multithreading
Many rbiom functions support multithreading.
The default behavior of these functions is to run on as many cores as are
available in the local compute environment. If you wish to limit the number
of simultaneous threads, set RcppParallel's numThreads option.
For instance:
RcppParallel::setThreadOptions(numThreads = 4)
Author(s)
Maintainer: Daniel P. Smith dansmith01@gmail.com (ORCID)
Other contributors:
Alkek Center for Metagenomics and Microbiome Research [copyright holder, funder]
See Also
Useful links:
Report bugs at https://github.com/cmmr/rbiom/issues
Visualize alpha diversity with boxplots.
Description
Visualize alpha diversity with boxplots.
Usage
adiv_boxplot(
biom,
x = NULL,
adiv = "Shannon",
layers = "x",
stat.by = x,
facet.by = NULL,
colors = TRUE,
shapes = TRUE,
patterns = FALSE,
flip = FALSE,
stripe = NULL,
ci = "ci",
level = 0.95,
p.adj = "fdr",
outliers = NULL,
xlab.angle = "auto",
p.label = 0.05,
transform = "none",
caption = TRUE,
...
)
Arguments
biom |
An rbiom object, such as from |
x |
A categorical metadata column name to use for the x-axis. Or
|
adiv |
Alpha diversity metric(s) to use. Options are: |
layers |
One or more of
|
stat.by |
Dataset field with the statistical groups. Must be
categorical. Default: |
facet.by |
Dataset field(s) to use for faceting. Must be categorical.
Default: |
colors |
How to color the groups. Options are:
See "Aesthetics" section below for additional information.
Default: |
shapes |
Shapes for each group.
Options are similar to |
patterns |
Patterns for each group.
Options are similar to |
flip |
Transpose the axes, so that taxa are present as rows instead
of columns. Default: |
stripe |
Shade every other x position. Default: same as flip |
ci |
How to calculate min/max of the crossbar,
errorbar, linerange, and pointrange layers.
Options are: |
level |
The confidence level for calculating a confidence interval.
Default: |
p.adj |
Method to use for multiple comparisons adjustment of
p-values. Run |
outliers |
Show boxplot outliers? |
xlab.angle |
Angle of the labels at the bottom of the plot.
Options are |
p.label |
Minimum adjusted p-value to display on the plot with a bracket.
If a numeric vector with more than one value is
provided, they will be used as breaks for asterisk notation.
Default: |
transform |
Transformation to apply. Options are:
|
caption |
Add methodology caption beneath the plot.
Default: |
... |
Additional parameters to pass along to ggplot2 functions.
Prefix a parameter name with a layer name to pass it to only that
layer. For instance, |
Value
A ggplot2 plot. The computed data points, ggplot2 command,
stats table, and stats table commands are available as $data, $code,
$stats, and $stats$code, respectively.
Aesthetics
All built-in color palettes are colorblind-friendly. The available
categorical palette names are: "okabe", "carto", "r4", "polychrome", "tol",
"bright", "light", "muted", "vibrant", "tableau", "classic", "alphabet",
"tableau20", "kelly", and "fishy".
Patterns are added using the fillpattern R package. Options are "brick",
"chevron", "fish", "grid", "herringbone", "hexagon", "octagon", "rain", "saw",
"shingle", "rshingle", "stripe", and "wave", optionally abbreviated and/or
suffixed with modifiers. For example, "hex10_sm" for the hexagon pattern
rotated 10 degrees and shrunk by 2x.
See fillpattern::fill_pattern() for complete documentation of options.
Shapes can be given as per base R: numbers 0 through 17 for various shapes, or the decimal value of an ASCII character (e.g. A-Z = 65:90; a-z = 97:122) to use letters instead of shapes on the plot. Character strings may be used as well.
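As a quick check of the ASCII codes for letters, base R's utf8ToInt() and intToUtf8() convert between characters and their decimal values. This is a minimal sketch using only base R; the variable names are illustrative and nothing here calls rbiom itself.

```r
# Decimal ASCII codes for the first and last letter of each case.
upper_codes <- utf8ToInt("AZ")   # endpoints of the uppercase range
lower_codes <- utf8ToInt("az")   # endpoints of the lowercase range

# Convert a run of codes back into the letters they represent.
first_five <- intToUtf8(65:69, multiple = TRUE)

print(upper_codes)
print(lower_codes)
print(first_five)
```

Looking the codes up this way avoids mixing up the uppercase and lowercase ranges when choosing letters as plot symbols.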
See Also
Other alpha_diversity: adiv_corrplot(), adiv_stats(), adiv_table()
Other visualization: adiv_corrplot(), bdiv_boxplot(), bdiv_corrplot(), bdiv_heatmap(), bdiv_ord_plot(), plot_heatmap(), rare_corrplot(), rare_multiplot(), rare_stacked(), stats_boxplot(), stats_corrplot(), taxa_boxplot(), taxa_corrplot(), taxa_heatmap(), taxa_stacked()
Examples
library(rbiom)
biom <- rarefy(hmp50)
adiv_boxplot(biom, x="Body Site", stat.by="Body Site")
adiv_boxplot(biom, x="Sex", stat.by="Body Site", adiv=c("otu", "shan"), layers = "bld")
adiv_boxplot(biom, x="body", stat.by="sex", adiv=".all", flip=TRUE, layers="p")
# Each plot object includes additional information.
fig <- adiv_boxplot(biom, x="Body Site")
## Computed Data Points -------------------
fig$data
## Statistics Table -----------------------
fig$stats
## ggplot2 Command ------------------------
fig$code
Visualize alpha diversity with scatterplots and trendlines.
Description
Visualize alpha diversity with scatterplots and trendlines.
Usage
adiv_corrplot(
biom,
x,
adiv = "Shannon",
layers = "tc",
stat.by = NULL,
facet.by = NULL,
colors = TRUE,
shapes = TRUE,
test = "emmeans",
fit = "gam",
at = NULL,
level = 0.95,
p.adj = "fdr",
transform = "none",
alt = "!=",
mu = 0,
caption = TRUE,
check = FALSE,
...
)
Arguments
biom |
An rbiom object, such as from |
x |
Dataset field with the x-axis values. Equivalent to the |
adiv |
Alpha diversity metric(s) to use. Options are: |
layers |
One or more of
|
stat.by |
Dataset field with the statistical groups. Must be
categorical. Default: |
facet.by |
Dataset field(s) to use for faceting. Must be categorical.
Default: |
colors |
How to color the groups. Options are:
See "Aesthetics" section below for additional information.
Default: |
shapes |
Shapes for each group.
Options are similar to |
test |
Method for computing p-values: |
fit |
How to fit the trendline. |
at |
Position(s) along the x-axis where the means or slopes should be
evaluated. Default: |
level |
The confidence level for calculating a confidence interval.
Default: |
p.adj |
Method to use for multiple comparisons adjustment of
p-values. Run |
transform |
Transformation to apply. Options are:
|
alt |
Alternative hypothesis direction. Options are |
mu |
Reference value to test against. Default: |
caption |
Add methodology caption beneath the plot.
Default: |
check |
Generate additional plots to aid in assessing data normality.
Default: |
... |
Additional parameters to pass along to ggplot2 functions.
Prefix a parameter name with a layer name to pass it to only that
layer. For instance, |
Value
A ggplot2 plot. The computed data points, ggplot2 command,
stats table, and stats table commands are available as $data, $code,
$stats, and $stats$code, respectively.
Aesthetics
All built-in color palettes are colorblind-friendly. The available
categorical palette names are: "okabe", "carto", "r4", "polychrome", "tol",
"bright", "light", "muted", "vibrant", "tableau", "classic", "alphabet",
"tableau20", "kelly", and "fishy".
Shapes can be given as per base R: numbers 0 through 17 for various shapes, or the decimal value of an ASCII character (e.g. A-Z = 65:90; a-z = 97:122) to use letters instead of shapes on the plot. Character strings may be used as well.
See Also
Other alpha_diversity: adiv_boxplot(), adiv_stats(), adiv_table()
Other visualization: adiv_boxplot(), bdiv_boxplot(), bdiv_corrplot(), bdiv_heatmap(), bdiv_ord_plot(), plot_heatmap(), rare_corrplot(), rare_multiplot(), rare_stacked(), stats_boxplot(), stats_corrplot(), taxa_boxplot(), taxa_corrplot(), taxa_heatmap(), taxa_stacked()
Examples
library(rbiom)
p <- adiv_corrplot(babies, "age", stat.by = "deliv", fit = "gam")
p
p$stats
p$code
Create a matrix of samples x alpha diversity metrics.
Description
Create a matrix of samples x alpha diversity metrics.
Usage
adiv_matrix(biom, adiv = ".all", transform = "none", cpus = NULL)
Arguments
biom |
An rbiom object, such as from |
adiv |
Alpha diversity metric(s) to use. Options are: |
transform |
Transformation to apply. Options are:
|
cpus |
The number of CPUs to use. Set to |
Value
A numeric matrix with samples as rows. The first column is
Depth. Remaining columns are the alpha diversity metric names
given by adiv: one or more of OTUs, Shannon,
Chao1, Simpson, and InvSimpson.
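To make the metric columns concrete, the sketch below computes four of them by hand for a single sample using the common textbook definitions. It assumes natural-log Shannon and the 1 - sum(p^2) form of Simpson; rbiom's C implementation is not shown in this manual and may differ in detail, so treat these formulas as assumptions. The counts vector is made up for illustration.

```r
# Hypothetical OTU counts for one sample.
counts <- c(OTU1 = 10, OTU2 = 10, OTU3 = 0, OTU4 = 20)

# Work only with OTUs that were actually observed.
present <- counts[counts > 0]
p <- present / sum(present)       # relative abundances

depth       <- sum(counts)        # total observations in the sample
otus        <- length(present)    # number of observed OTUs
shannon     <- -sum(p * log(p))   # Shannon index (natural log assumed)
simpson     <- 1 - sum(p^2)       # Simpson index (assumed form)
inv_simpson <- 1 / sum(p^2)       # inverse Simpson

print(round(c(Depth = depth, OTUs = otus, Shannon = shannon,
              Simpson = simpson, InvSimpson = inv_simpson), 4))
```

Comparing such hand-computed values against one row of adiv_matrix() output is a simple way to confirm which formula variant is in use.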
Examples
library(rbiom)
biom <- slice_head(hmp50, n = 5)
adiv_matrix(biom)
Test alpha diversity for associations with metadata.
Description
A convenience wrapper for adiv_table() + stats_table().
Usage
adiv_stats(
biom,
regr = NULL,
stat.by = NULL,
adiv = "Shannon",
split.by = NULL,
transform = "none",
test = "emmeans",
fit = "gam",
at = NULL,
level = 0.95,
alt = "!=",
mu = 0,
p.adj = "fdr"
)
Arguments
biom |
An rbiom object, such as from |
regr |
Dataset field with the x-axis (independent; predictive)
values. Must be numeric. Default: |
stat.by |
Dataset field with the statistical groups. Must be
categorical. Default: |
adiv |
Alpha diversity metric(s) to use. Options are: |
split.by |
Dataset field(s) that the data should be split by prior to
any calculations. Must be categorical. Default: |
transform |
Transformation to apply. Options are:
|
test |
Method for computing p-values: |
fit |
How to fit the trendline. |
at |
Position(s) along the x-axis where the means or slopes should be
evaluated. Default: |
level |
The confidence level for calculating a confidence interval.
Default: |
alt |
Alternative hypothesis direction. Options are |
mu |
Reference value to test against. Default: |
p.adj |
Method to use for multiple comparisons adjustment of
p-values. Run |
Value
A tibble data.frame with fields from the table below. This tibble
object provides the $code operator to print the R code used to generate
the statistics.
Field | Description |
.mean | Estimated marginal mean. See emmeans::emmeans() . |
.mean.diff | Difference in means. |
.slope | Trendline slope. See emmeans::emtrends() . |
.slope.diff | Difference in slopes. |
.h1 | Alternate hypothesis. |
.p.val | Probability of observing a result at least this extreme if the null hypothesis is true. |
.adj.p | .p.val after adjusting for multiple comparisons. |
.effect.size | Effect size. See emmeans::eff_size() . |
.lower | Confidence interval lower bound. |
.upper | Confidence interval upper bound. |
.se | Standard error. |
.n | Number of samples. |
.df | Degrees of freedom. |
.stat | Wilcoxon or Kruskal-Wallis rank sum statistic. |
.t.ratio | .mean / .se |
.r.sqr | Percent of variation explained by the model. |
.adj.r | .r.sqr , taking degrees of freedom into account. |
.aic | Akaike Information Criterion (predictive models). |
.bic | Bayesian Information Criterion (descriptive models). |
.loglik | Log-likelihood goodness-of-fit score. |
.fit.p | P-value for observing this fit by chance. |
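The relationship between the .p.val and .adj.p fields can be illustrated with base R. This sketch assumes that p.adj values such as "fdr" follow the method names accepted by stats::p.adjust(); the p-values below are made up and the object names are illustrative.

```r
# Four hypothetical raw p-values from pairwise comparisons.
p_val <- c(0.01, 0.04, 0.03, 0.20)

# "fdr" is an alias for the Benjamini-Hochberg method in stats::p.adjust().
adj_p <- p.adjust(p_val, method = "fdr")

print(data.frame(.p.val = p_val, .adj.p = adj_p))
```

Adjusted values are never smaller than the raw ones, so a comparison that passes a 0.05 cutoff on .p.val may fail it on .adj.p.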
See Also
Other alpha_diversity: adiv_boxplot(), adiv_corrplot(), adiv_table()
Other stats_tables: bdiv_stats(), distmat_stats(), stats_table(), taxa_stats()
Examples
library(rbiom)
biom <- rarefy(hmp50)
adiv_stats(biom, stat.by = "Sex")[,1:6]
adiv_stats(biom, stat.by = "Sex", split.by = "Body Site")[,1:6]
adiv_stats(biom, stat.by = "Body Site", test = "kruskal")
Calculate the alpha diversity of each sample.
Description
Calculate the alpha diversity of each sample.
Usage
adiv_table(
biom,
adiv = "Shannon",
md = ".all",
transform = "none",
cpus = NULL
)
Arguments
biom |
An rbiom object, such as from |
adiv |
Alpha diversity metric(s) to use. Options are: |
md |
Dataset field(s) to include in the output data frame, or |
transform |
Transformation to apply. Options are:
|
cpus |
The number of CPUs to use. Set to |
Value
A data frame of alpha diversity values.
Each combination of sample/depth/adiv has its own row.
Column names are .sample, .depth, .adiv,
and .diversity, followed by any metadata fields requested by md.
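Because each sample/metric combination occupies its own row, the result is in "long" format. The sketch below builds a small stand-in data frame with the same column names (the values are made up) and pivots it to one row per sample with base R, which can be handy for quick side-by-side comparison.

```r
# Hypothetical long-format table mimicking the documented column names.
long <- data.frame(
  .sample    = rep(c("S1", "S2"), each = 2),
  .depth     = rep(c(1000, 1200), each = 2),
  .adiv      = rep(c("Shannon", "Simpson"), times = 2),
  .diversity = c(1.04, 0.62, 2.10, 0.85)
)

# Each sample/metric cell holds exactly one value, so mean() passes it through.
wide <- tapply(long$.diversity, list(long$.sample, long$.adiv), FUN = mean)

print(wide)
```

When a samples-by-metrics matrix is wanted from the start, adiv_matrix() returns that layout directly.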
See Also
Other alpha_diversity: adiv_boxplot(), adiv_corrplot(), adiv_stats()
Examples
library(rbiom)
# Subset to 10 samples.
biom <- slice(hmp50, 1:10)
adiv_table(biom)
biom <- rarefy(biom)
adiv_table(biom, md = NULL)
Convert an rbiom object to a base R list.
Description
Convert an rbiom object to a base R list.
Usage
## S3 method for class 'rbiom'
as.list(x, ...)
Arguments
x |
An rbiom object, such as from |
... |
Not used. |
Value
A list with names
c('counts', 'metadata', 'taxonomy', 'tree', 'sequences', 'id', 'comment', 'date', 'generated_by').
See Also
Other conversion:
as.matrix.rbiom()
Convert an rbiom object to a simple count matrix.
Description
Identical to running as.matrix(biom$counts).
Usage
## S3 method for class 'rbiom'
as.matrix(x, ...)
Arguments
x |
An rbiom object, such as from |
... |
Not used. |
Value
A base R matrix with OTUs as rows and samples as columns.
See Also
Other conversion:
as.list.rbiom()
Examples
library(rbiom)
as.matrix(hmp50)[1:5,1:5]
Convert a variety of data types to an rbiom object.
Description
Construct an rbiom object. The returned object is an R6 reference class.
Use b <- a$clone() to create copies, not b <- a.
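R6 objects are built on environments, so plain assignment copies a reference rather than the data. The sketch below uses a bare base R environment as a stand-in for an rbiom object to show why b <- a does not create a copy. The field name id and the helper copy_env() are illustrative only; with a real rbiom object, a$clone() plays the role of the explicit copy.

```r
# A plain environment standing in for a reference-class object.
a <- new.env()
a$id <- "original"

# Plain assignment: b and a are the same underlying object.
b <- a
b$id <- "changed"
print(a$id)   # the change made through b is visible through a

# Hypothetical helper that makes an explicit, independent copy.
copy_env <- function(env) {
  list2env(as.list(env, all.names = TRUE), envir = new.env())
}

a_copy <- copy_env(a)
a_copy$id <- "independent"
print(a$id)   # unaffected by edits to the copy
```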
Usage
as_rbiom(biom, ...)
Arguments
biom |
Object which can be coerced to an rbiom-class object. For example:
|
... |
Properties to overwrite in biom: |
Value
An rbiom object.
Examples
library(rbiom)
# create a simple matrix ------------------------
mtx <- matrix(
data = floor(runif(24) * 1000),
nrow = 6,
dimnames = list(paste0("OTU", 1:6), paste0("Sample", 1:4)) )
mtx
# and some sample metadata ----------------------
df <- data.frame(
.sample = paste0("Sample", 1:4),
treatment = c("A", "B", "A", "B"),
days = c(12, 3, 7, 8) )
# convert data set to rbiom ---------------------
biom <- as_rbiom(mtx, metadata = df, id = "My BIOM")
biom
Longitudinal Stool Samples from Infants (n = 2,684)
Description
Longitudinal Stool Samples from Infants (n = 2,684)
Usage
babies
Format
An rbiom object with 2,684 samples. Includes metadata and taxonomy.
- Subject ID -
ID1, ID2, ..., ID12
- Sex -
Male or Female
- Age (days) -
1 - 266
- Child's diet -
"Breast milk", "Breast milk and formula", or "Formula"
- Sample collection -
"Frozen upon collection" or "Stored in alcohol"
- Antibiotic exposure -
Yes or No
- Antifungal exposure -
Yes or No
- Delivery mode -
Cesarean or Vaginal
- Solid food introduced (Age) -
116 - 247
Source
https://www.nature.com/articles/s41467-018-04641-7 and doi:10.1038/s41467-017-01973-8
Visualize BIOM data with boxplots.
Description
Visualize BIOM data with boxplots.
Usage
bdiv_boxplot(
biom,
x = NULL,
bdiv = "Bray-Curtis",
layers = "x",
weighted = TRUE,
tree = NULL,
within = NULL,
between = NULL,
stat.by = x,
facet.by = NULL,
colors = TRUE,
shapes = TRUE,
patterns = FALSE,
flip = FALSE,
stripe = NULL,
ci = "ci",
level = 0.95,
p.adj = "fdr",
outliers = NULL,
xlab.angle = "auto",
p.label = 0.05,
transform = "none",
caption = TRUE,
...
)
Arguments
biom |
An rbiom object, such as from |
x |
A categorical metadata column name to use for the x-axis. Or
|
bdiv |
Beta diversity distance algorithm(s) to use. Options are:
|
layers |
One or more of
|
weighted |
Take relative abundances into account. When
|
tree |
A |
within , between |
Dataset field(s) for intra- or inter- sample
comparisons. Alternatively, dataset field names given elsewhere can
be prefixed with |
stat.by |
Dataset field with the statistical groups. Must be
categorical. Default: |
facet.by |
Dataset field(s) to use for faceting. Must be categorical.
Default: |
colors |
How to color the groups. Options are:
See "Aesthetics" section below for additional information.
Default: |
shapes |
Shapes for each group.
Options are similar to |
patterns |
Patterns for each group.
Options are similar to |
flip |
Transpose the axes, so that taxa are present as rows instead
of columns. Default: |
stripe |
Shade every other x position. Default: same as flip |
ci |
How to calculate min/max of the crossbar,
errorbar, linerange, and pointrange layers.
Options are: |
level |
The confidence level for calculating a confidence interval.
Default: |
p.adj |
Method to use for multiple comparisons adjustment of
p-values. Run |
outliers |
Show boxplot outliers? |
xlab.angle |
Angle of the labels at the bottom of the plot.
Options are |
p.label |
Minimum adjusted p-value to display on the plot with a bracket.
If a numeric vector with more than one value is
provided, they will be used as breaks for asterisk notation.
Default: |
transform |
Transformation to apply. Options are:
|
caption |
Add methodology caption beneath the plot.
Default: |
... |
Additional parameters to pass along to ggplot2 functions.
Prefix a parameter name with a layer name to pass it to only that
layer. For instance, |
Value
A ggplot2 plot. The computed data points, ggplot2 command,
stats table, and stats table commands are available as $data, $code,
$stats, and $stats$code, respectively.
Aesthetics
All built-in color palettes are colorblind-friendly. The available
categorical palette names are: "okabe", "carto", "r4", "polychrome", "tol",
"bright", "light", "muted", "vibrant", "tableau", "classic", "alphabet",
"tableau20", "kelly", and "fishy".
Patterns are added using the fillpattern R package. Options are "brick",
"chevron", "fish", "grid", "herringbone", "hexagon", "octagon", "rain", "saw",
"shingle", "rshingle", "stripe", and "wave", optionally abbreviated and/or
suffixed with modifiers. For example, "hex10_sm" for the hexagon pattern
rotated 10 degrees and shrunk by 2x.
See fillpattern::fill_pattern() for complete documentation of options.
Shapes can be given as per base R: numbers 0 through 17 for various shapes, or the decimal value of an ASCII character (e.g. A-Z = 65:90; a-z = 97:122) to use letters instead of shapes on the plot. Character strings may be used as well.
See Also
Other beta_diversity: bdiv_clusters(), bdiv_corrplot(), bdiv_heatmap(), bdiv_ord_plot(), bdiv_ord_table(), bdiv_stats(), bdiv_table(), distmat_stats()
Other visualization: adiv_boxplot(), adiv_corrplot(), bdiv_corrplot(), bdiv_heatmap(), bdiv_ord_plot(), plot_heatmap(), rare_corrplot(), rare_multiplot(), rare_stacked(), stats_boxplot(), stats_corrplot(), taxa_boxplot(), taxa_corrplot(), taxa_heatmap(), taxa_stacked()
Examples
library(rbiom)
biom <- rarefy(hmp50)
bdiv_boxplot(biom, x="==Body Site", bdiv="UniFrac", stat.by="Body Site")
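For readers unfamiliar with the default bdiv = "Bray-Curtis", the sketch below computes the standard Bray-Curtis dissimilarity between two samples in base R. The function bray_curtis() and the sample vectors are hypothetical, and treating weighted = FALSE as a presence/absence calculation is an assumption; rbiom's C implementation may differ in detail.

```r
# Hypothetical helper: Bray-Curtis dissimilarity between two count vectors.
bray_curtis <- function(x, y, weighted = TRUE) {
  if (!weighted) {
    # Assumed unweighted form: reduce counts to presence (1) / absence (0).
    x <- as.numeric(x > 0)
    y <- as.numeric(y > 0)
  }
  sum(abs(x - y)) / sum(x + y)
}

# Made-up counts for three OTUs in two samples.
s1 <- c(10, 0, 5)
s2 <- c(5, 5, 5)

bc_weighted   <- bray_curtis(s1, s2)
bc_unweighted <- bray_curtis(s1, s2, weighted = FALSE)

print(c(weighted = bc_weighted, unweighted = bc_unweighted))
```

A value of 0 means the two samples are identical under the metric, and 1 means they share no OTUs.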
Cluster samples by beta diversity k-means.
Description
Cluster samples by beta diversity k-means.
Usage
bdiv_clusters(
biom,
bdiv = "Bray-Curtis",
weighted = TRUE,
normalized = TRUE,
tree = NULL,
k = 5,
...
)
Arguments
biom |
An rbiom object, such as from |
bdiv |
Beta diversity distance algorithm(s) to use. Options are:
|
weighted |
Take relative abundances into account. When
|
normalized |
Only changes the "Weighted UniFrac" calculation.
Divides result by the total branch weights. Default: |
tree |
A |
k |
Number of clusters. Default: |
... |
Passed on to |
Value
A numeric factor assigning samples to clusters.
See Also
Other beta_diversity: bdiv_boxplot(), bdiv_corrplot(), bdiv_heatmap(), bdiv_ord_plot(), bdiv_ord_table(), bdiv_stats(), bdiv_table(), distmat_stats()
Other clustering: taxa_clusters()
Examples
library(rbiom)
biom <- rarefy(hmp50)
biom$metadata$bray_cluster <- bdiv_clusters(biom)
pull(biom, 'bray_cluster')[1:10]
bdiv_ord_plot(biom, stat.by = "bray_cluster")
Visualize beta diversity with scatterplots and trendlines.
Description
Visualize beta diversity with scatterplots and trendlines.
Usage
bdiv_corrplot(
biom,
x,
bdiv = "Bray-Curtis",
layers = "tc",
weighted = TRUE,
tree = NULL,
within = NULL,
between = NULL,
stat.by = NULL,
facet.by = NULL,
colors = TRUE,
shapes = TRUE,
test = "emmeans",
fit = "gam",
at = NULL,
level = 0.95,
p.adj = "fdr",
transform = "none",
ties = "random",
seed = 0,
alt = "!=",
mu = 0,
caption = TRUE,
check = FALSE,
...
)
Arguments
biom |
An rbiom object, such as from |
x |
Dataset field with the x-axis values. Equivalent to the |
bdiv |
Beta diversity distance algorithm(s) to use. Options are:
|
layers |
One or more of
|
weighted |
Take relative abundances into account. When
|
tree |
A |
within , between |
Dataset field(s) for intra- or inter- sample
comparisons. Alternatively, dataset field names given elsewhere can
be prefixed with |
stat.by |
Dataset field with the statistical groups. Must be
categorical. Default: |
facet.by |
Dataset field(s) to use for faceting. Must be categorical.
Default: |
colors |
How to color the groups. Options are:
See "Aesthetics" section below for additional information.
Default: |
shapes |
Shapes for each group.
Options are similar to |
test |
Method for computing p-values: |
fit |
How to fit the trendline. |
at |
Position(s) along the x-axis where the means or slopes should be
evaluated. Default: |
level |
The confidence level for calculating a confidence interval.
Default: |
p.adj |
Method to use for multiple comparisons adjustment of
p-values. Run |
transform |
Transformation to apply. Options are:
|
ties |
When |
seed |
Random seed for permutations. Must be a non-negative integer.
Default: |
alt |
Alternative hypothesis direction. Options are |
mu |
Reference value to test against. Default: |
caption |
Add methodology caption beneath the plot.
Default: |
check |
Generate additional plots to aid in assessing data normality.
Default: |
... |
Additional parameters to pass along to ggplot2 functions.
Prefix a parameter name with a layer name to pass it to only that
layer. For instance, |
Value
A ggplot2 plot. The computed data points, ggplot2 command,
stats table, and stats table commands are available as $data, $code,
$stats, and $stats$code, respectively.
Aesthetics
All built-in color palettes are colorblind-friendly. The available
categorical palette names are: "okabe", "carto", "r4", "polychrome", "tol",
"bright", "light", "muted", "vibrant", "tableau", "classic", "alphabet",
"tableau20", "kelly", and "fishy".
Shapes can be given as per base R: numbers 0 through 17 for various shapes, or the decimal value of an ASCII character (e.g. A-Z = 65:90; a-z = 97:122) to use letters instead of shapes on the plot. Character strings may be used as well.
See Also
Other beta_diversity: bdiv_boxplot(), bdiv_clusters(), bdiv_heatmap(), bdiv_ord_plot(), bdiv_ord_table(), bdiv_stats(), bdiv_table(), distmat_stats()
Other visualization: adiv_boxplot(), adiv_corrplot(), bdiv_boxplot(), bdiv_heatmap(), bdiv_ord_plot(), plot_heatmap(), rare_corrplot(), rare_multiplot(), rare_stacked(), stats_boxplot(), stats_corrplot(), taxa_boxplot(), taxa_corrplot(), taxa_heatmap(), taxa_stacked()
Examples
library(rbiom)
biom <- rarefy(hmp50)
bdiv_corrplot(biom, "Age", stat.by = "Sex", layers = "tcp")
Display beta diversities in an all vs all grid.
Description
Display beta diversities in an all vs all grid.
Usage
bdiv_heatmap(
biom,
bdiv = "Bray-Curtis",
weighted = TRUE,
tree = NULL,
tracks = NULL,
grid = "devon",
label = TRUE,
label_size = NULL,
rescale = "none",
clust = "complete",
trees = TRUE,
asp = 1,
tree_height = 10,
track_height = 10,
legend = "right",
title = TRUE,
xlab.angle = "auto",
underscores = FALSE,
...
)
Arguments
biom |
An rbiom object, such as from |
bdiv |
Beta diversity distance algorithm(s) to use. Options are:
|
weighted |
Take relative abundances into account. When
|
tree |
A |
tracks |
A character vector of metadata fields to display as tracks
at the top of the plot. Or, a list as expected by the |
grid |
Color palette name, or a list with entries for |
label |
Label the matrix rows and columns. You can supply a list
or logical vector of length two to control row labels and column
labels separately, for example
|
label_size |
The font size to use for the row and column labels. You
can supply a numeric vector of length two to control row label sizes
and column label sizes separately, for example
|
rescale |
Rescale rows or columns to all have a common min/max.
Options: |
clust |
Clustering algorithm for reordering the rows and columns by
similarity. You can supply a list or character vector of length two to
control the row and column clustering separately, for example
Default: |
trees |
Draw a dendrogram for rows (left) and columns (top). You can
supply a list or logical vector of length two to control the row tree
and column tree separately, for example
|
asp |
Aspect ratio (height/width) for entire grid.
Default: |
tree_height , track_height |
The height of the dendrogram or annotation
tracks as a percentage of the overall grid size. Use a numeric vector
of length two to assign |
legend |
Where to place the legend. Options are: |
title |
Plot title. Set to |
xlab.angle |
Angle of the labels at the bottom of the plot.
Options are |
underscores |
When parsing the tree, should underscores be kept as
is? By default they will be converted to spaces (unless the entire ID
is quoted). Default |
... |
Additional arguments to pass on to ggplot2::theme().
For example, |
Value
A ggplot2 plot. The computed data points and ggplot
command are available as $data and $code, respectively.
Annotation Tracks
Metadata can be displayed as colored tracks above the heatmap. Common use cases are provided below, with more thorough documentation available at https://cmmr.github.io/rbiom.
## Categorical ----------------------------
tracks = "Body Site"
tracks = list('Body Site' = "bright")
tracks = list('Body Site' = c('Stool' = "blue", 'Saliva' = "green"))
## Numeric --------------------------------
tracks = "Age"
tracks = list('Age' = "reds")
## Multiple Tracks ------------------------
tracks = c("Body Site", "Age")
tracks = list('Body Site' = "bright", 'Age' = "reds")
tracks = list(
  'Body Site' = c('Stool' = "blue", 'Saliva' = "green"),
  'Age' = list('colors' = "reds") )
The following entries in the track definitions are understood:
- colors -
A pre-defined palette name or custom set of colors to map to.
- range -
The c(min,max) to use for scale values.
- label -
Label for this track. Defaults to the name of this list element.
- side -
Options are "top" (default) or "left".
- na.color -
The color to use for NA values.
- bins -
Bin a gradient into this many bins/steps.
- guide -
A list of arguments for guide_colorbar() or guide_legend().
All built-in color palettes are colorblind-friendly.
Categorical palette names: "okabe", "carto", "r4", "polychrome", "tol",
"bright", "light", "muted", "vibrant", "tableau", "classic", "alphabet",
"tableau20", "kelly", and "fishy".
Numeric palette names: "reds", "oranges", "greens", "purples", "grays",
"acton", "bamako", "batlow", "bilbao", "buda", "davos", "devon", "grayC",
"hawaii", "imola", "lajolla", "lapaz", "nuuk", "oslo", "tokyo", "turku",
"bam", "berlin", "broc", "cork", "lisbon", "roma", "tofino", "vanimo",
and "vik".
See Also
Other beta_diversity: bdiv_boxplot(), bdiv_clusters(), bdiv_corrplot(), bdiv_ord_plot(), bdiv_ord_table(), bdiv_stats(), bdiv_table(), distmat_stats()
Other visualization: adiv_boxplot(), adiv_corrplot(), bdiv_boxplot(), bdiv_corrplot(), bdiv_ord_plot(), plot_heatmap(), rare_corrplot(), rare_multiplot(), rare_stacked(), stats_boxplot(), stats_corrplot(), taxa_boxplot(), taxa_corrplot(), taxa_heatmap(), taxa_stacked()
Examples
library(rbiom)
# Keep and rarefy the 10 most deeply sequenced samples.
hmp10 <- rarefy(hmp50, n = 10)
bdiv_heatmap(hmp10, tracks=c("Body Site", "Age"))
bdiv_heatmap(hmp10, bdiv="uni", weighted=c(TRUE,FALSE), tracks="sex")
Ordinate samples and taxa on a 2D plane based on beta diversity distances.
Description
Ordinate samples and taxa on a 2D plane based on beta diversity distances.
Usage
bdiv_ord_plot(
biom,
bdiv = "Bray-Curtis",
ord = "PCoA",
weighted = TRUE,
layers = "petm",
stat.by = NULL,
facet.by = NULL,
colors = TRUE,
shapes = TRUE,
tree = NULL,
test = "adonis2",
seed = 0,
permutations = 999,
rank = -1,
taxa = 4,
p.top = Inf,
p.adj = "fdr",
unc = "singly",
caption = TRUE,
underscores = FALSE,
...
)
Arguments
biom |
An rbiom object, such as from |
bdiv |
Beta diversity distance algorithm(s) to use. Options are:
|
ord |
Method for reducing dimensionality. Options are:
Multiple/abbreviated values allowed. Default: |
weighted |
Take relative abundances into account. When
|
layers |
One or more of
|
stat.by |
The categorical or numeric metadata field over which statistics should be calculated. Required. |
facet.by |
Dataset field(s) to use for faceting. Must be categorical.
Default: |
colors |
How to color the groups. Options are:
See "Aesthetics" section below for additional information.
Default: |
shapes |
Shapes for each group.
Options are similar to |
tree |
A |
test |
Permutational test for assessing significance. Options are:
Abbreviations are allowed. Default: |
seed |
Random seed for permutations. Must be a non-negative integer.
Default: |
permutations |
Number of random permutations to use.
Default: |
rank |
What rank(s) of taxa to display. E.g. |
taxa |
Which taxa to display. An integer value will show the top n
most abundant taxa. A value 0 <= n < 1 will show any taxa with that
mean abundance or greater (e.g. |
p.top |
Only display taxa with the most significant differences in
abundance. If |
p.adj |
Method to use for multiple comparisons adjustment of
p-values. Run |
unc |
How to handle unclassified, uncultured, and similarly ambiguous taxa names. Options are:
Abbreviations are allowed. Default: |
caption |
Add methodology caption beneath the plot.
Default: |
underscores |
When parsing the tree, should underscores be kept as
is? By default they will be converted to spaces (unless the entire ID
is quoted). Default |
... |
Parameters for layer geoms (e.g. |
Value
A ggplot2 plot.
The computed sample coordinates and ggplot command
are available as $data and $code, respectively.
If stat.by is given, then $stats and $stats$code are set.
If rank is given, then $data$taxa_coords, $taxa_stats,
and $taxa_stats$code are set.
See Also
Other beta_diversity: bdiv_boxplot(), bdiv_clusters(), bdiv_corrplot(), bdiv_heatmap(), bdiv_ord_table(), bdiv_stats(), bdiv_table(), distmat_stats()
Other ordination: bdiv_ord_table(), distmat_ord_table()
Other visualization: adiv_boxplot(), adiv_corrplot(), bdiv_boxplot(), bdiv_corrplot(), bdiv_heatmap(), plot_heatmap(), rare_corrplot(), rare_multiplot(), rare_stacked(), stats_boxplot(), stats_corrplot(), taxa_boxplot(), taxa_corrplot(), taxa_heatmap(), taxa_stacked()
Examples
library(rbiom)
biom <- rarefy(hmp50)
bdiv_ord_plot(biom, layers="pemt", stat.by="Body Site", rank="g")
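PCoA, the default ord, is classical multidimensional scaling of a distance matrix. As a rough illustration of the idea, the sketch below runs base R's stats::cmdscale() on a small made-up Euclidean distance matrix. This is a stand-in for explanation only: which routine rbiom calls internally is not stated here, and the toy matrix, axis labels, and object names are hypothetical.

```r
set.seed(1)

# Made-up abundance table: samples as rows, OTUs as columns.
abund <- matrix(
  runif(12), nrow = 4,
  dimnames = list(paste0("Sample", 1:4), paste0("OTU", 1:3)) )

# Pairwise sample-to-sample distances (Euclidean, purely for illustration).
d <- dist(abund)

# Classical multidimensional scaling down to two ordination axes.
coords <- cmdscale(d, k = 2)
colnames(coords) <- c("Axis.1", "Axis.2")

print(coords)
```

Each row gives one sample's position on the 2D plane; samples that are close in the distance matrix land close together in the plot.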
Calculate PCoA and other ordinations, including taxa biplots and statistics.
Description
The biplot parameters (taxa, unc, p.top, and p.adj) only have an effect when rank is not NULL.
Usage
bdiv_ord_table(
biom,
bdiv = "Bray-Curtis",
ord = "PCoA",
weighted = TRUE,
md = NULL,
k = 2,
stat.by = NULL,
split.by = NULL,
tree = NULL,
test = "adonis2",
seed = 0,
permutations = 999,
rank = NULL,
taxa = 6,
p.top = Inf,
p.adj = "fdr",
unc = "singly",
underscores = FALSE,
...
)
Arguments
biom |
An rbiom object, such as from |
bdiv |
Beta diversity distance algorithm(s) to use. Options are:
|
ord |
Method for reducing dimensionality. Options are:
Multiple/abbreviated values allowed. Default: |
weighted |
Take relative abundances into account. When
|
md |
Dataset field(s) to include in the output data frame, or |
k |
Number of ordination dimensions to return. Either |
stat.by |
The categorical or numeric metadata field over which statistics should be calculated. Required. |
split.by |
Dataset field(s) that the data should be split by prior to
any calculations. Must be categorical. Default: |
tree |
A |
test |
Permutational test for assessing significance. Options are:
Abbreviations are allowed. Default: |
seed |
Random seed for permutations. Must be a non-negative integer.
Default: |
permutations |
Number of random permutations to use.
Default: |
rank |
What rank(s) of taxa to compute biplot coordinates and
statistics for, or |
taxa |
Which taxa to display. An integer value will show the top n
most abundant taxa. A value 0 <= n < 1 will show any taxa with that
mean abundance or greater (e.g. |
p.top |
Only display taxa with the most significant differences in
abundance. If |
p.adj |
Method to use for multiple comparisons adjustment of
p-values. Run |
unc |
How to handle unclassified, uncultured, and similarly ambiguous taxa names. Options are:
Abbreviations are allowed. Default: |
underscores |
When parsing the tree, should underscores be kept as
is? By default they will be converted to spaces (unless the entire ID
is quoted). Default |
... |
Additional arguments to pass on to |
Value
A data.frame with columns .sample, .weighted, .bdiv, .ord, .x, .y, and (optionally) .z. Any columns given by md, split.by, and stat.by are included as well.
If stat.by is given, then $stats and $stats$code are set.
If rank is given, then $taxa_coords, $taxa_stats, and $taxa_stats$code are set.
See Also
Other beta_diversity: bdiv_boxplot(), bdiv_clusters(), bdiv_corrplot(), bdiv_heatmap(), bdiv_ord_plot(), bdiv_stats(), bdiv_table(), distmat_stats()
Other ordination: bdiv_ord_plot(), distmat_ord_table()
Examples
library(rbiom)
ord <- bdiv_ord_table(hmp50, "bray", "pcoa", stat.by="Body Site", rank="g")
head(ord)
ord$stats
ord$taxa_stats
Test beta diversity for associations with metadata.
Description
A convenience wrapper for bdiv_table() + stats_table().
Usage
bdiv_stats(
biom,
regr = NULL,
stat.by = NULL,
bdiv = "Bray-Curtis",
weighted = TRUE,
tree = NULL,
within = NULL,
between = NULL,
split.by = NULL,
transform = "none",
test = "emmeans",
fit = "gam",
at = NULL,
level = 0.95,
alt = "!=",
mu = 0,
p.adj = "fdr"
)
Arguments
biom |
An rbiom object, such as from |
regr |
Dataset field with the x-axis (independent; predictive)
values. Must be numeric. Default: |
stat.by |
Dataset field with the statistical groups. Must be
categorical. Default: |
bdiv |
Beta diversity distance algorithm(s) to use. Options are:
|
weighted |
Take relative abundances into account. When
|
tree |
A |
within , between |
Dataset field(s) for intra- or inter- sample
comparisons. Alternatively, dataset field names given elsewhere can
be prefixed with |
split.by |
Dataset field(s) that the data should be split by prior to
any calculations. Must be categorical. Default: |
transform |
Transformation to apply. Options are:
|
test |
Method for computing p-values: |
fit |
How to fit the trendline. |
at |
Position(s) along the x-axis where the means or slopes should be
evaluated. Default: |
level |
The confidence level for calculating a confidence interval.
Default: |
alt |
Alternative hypothesis direction. Options are |
mu |
Reference value to test against. Default: |
p.adj |
Method to use for multiple comparisons adjustment of
p-values. Run |
Value
A tibble data.frame with fields from the table below. This tibble object provides the $code operator to print the R code used to generate the statistics.
Field | Description |
.mean | Estimated marginal mean. See emmeans::emmeans() . |
.mean.diff | Difference in means. |
.slope | Trendline slope. See emmeans::emtrends() . |
.slope.diff | Difference in slopes. |
.h1 | Alternate hypothesis. |
.p.val | Probability of a result at least this extreme if the null hypothesis is true. |
.adj.p | .p.val after adjusting for multiple comparisons. |
.effect.size | Effect size. See emmeans::eff_size() . |
.lower | Confidence interval lower bound. |
.upper | Confidence interval upper bound. |
.se | Standard error. |
.n | Number of samples. |
.df | Degrees of freedom. |
.stat | Wilcoxon or Kruskal-Wallis rank sum statistic. |
.t.ratio | .mean / .se |
.r.sqr | Percent of variation explained by the model. |
.adj.r | .r.sqr , taking degrees of freedom into account. |
.aic | Akaike Information Criterion (predictive models). |
.bic | Bayesian Information Criterion (descriptive models). |
.loglik | Log-likelihood goodness-of-fit score. |
.fit.p | P-value for observing this fit by chance. |
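The relationship between .p.val and .adj.p can be sketched with base R. This is a minimal illustration, assuming that p.adj = "fdr" corresponds to the method of the same name in stats::p.adjust(); the p-values are made up for the example and do not come from any rbiom dataset.

```r
# Hypothetical raw p-values from five comparisons (illustration only)
p_val <- c(0.001, 0.012, 0.030, 0.045, 0.600)

# "fdr" is the Benjamini-Hochberg adjustment in stats::p.adjust()
adj_p <- stats::p.adjust(p_val, method = "fdr")

# Side-by-side view, using the field names from the table above
print(data.frame(.p.val = p_val, .adj.p = adj_p))
```

Adjusted values are never smaller than the raw ones, so a comparison that is borderline before adjustment may no longer pass a significance threshold afterwards.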
See Also
Other beta_diversity: bdiv_boxplot(), bdiv_clusters(), bdiv_corrplot(), bdiv_heatmap(), bdiv_ord_plot(), bdiv_ord_table(), bdiv_table(), distmat_stats()
Other stats_tables: adiv_stats(), distmat_stats(), stats_table(), taxa_stats()
Examples
library(rbiom)
biom <- rarefy(hmp50)
bdiv_stats(biom, stat.by = "Sex", bdiv = c("bray", "unifrac"))[,1:7]
biom <- subset(biom, `Body Site` %in% c('Saliva', 'Stool', 'Buccal mucosa'))
bdiv_stats(biom, stat.by = "Body Site", split.by = "==Sex")[,1:6]
Distance / dissimilarity between samples.
Description
Distance / dissimilarity between samples.
Usage
bdiv_table(
biom,
bdiv = "Bray-Curtis",
weighted = TRUE,
normalized = TRUE,
tree = NULL,
md = ".all",
within = NULL,
between = NULL,
delta = ".all",
transform = "none",
ties = "random",
seed = 0,
cpus = NULL
)
bdiv_matrix(
biom,
bdiv = "Bray-Curtis",
weighted = TRUE,
normalized = TRUE,
tree = NULL,
within = NULL,
between = NULL,
transform = "none",
ties = "random",
seed = 0,
cpus = NULL,
underscores = FALSE
)
bdiv_distmat(
biom,
bdiv = "Bray-Curtis",
weighted = TRUE,
normalized = TRUE,
tree = NULL,
within = NULL,
between = NULL,
transform = "none",
cpus = NULL
)
Arguments
biom |
An rbiom object, such as from |
bdiv |
Beta diversity distance algorithm(s) to use. Options are:
|
weighted |
Take relative abundances into account. When
|
normalized |
Only changes the "Weighted UniFrac" calculation.
Divides result by the total branch weights. Default: |
tree |
A |
md |
Dataset field(s) to include in the output data frame, or |
within , between |
Dataset field(s) for intra- or inter- sample
comparisons. Alternatively, dataset field names given elsewhere can
be prefixed with |
delta |
For numeric metadata, report the absolute difference in values
for the two samples, for instance |
transform |
Transformation to apply. Options are:
|
ties |
When |
seed |
Random seed for permutations. Must be a non-negative integer.
Default: |
cpus |
The number of CPUs to use. Set to |
underscores |
When parsing the tree, should underscores be kept as
is? By default they will be converted to spaces (unless the entire ID
is quoted). Default |
Value
- bdiv_matrix() - An R matrix of samples x samples.
- bdiv_distmat() - A dist-class distance matrix.
- bdiv_table() - A tibble data.frame with columns named .sample1, .sample2, .weighted, .bdiv, .distance, and any fields requested by md. Numeric metadata fields will be returned as abs(x - y); categorical metadata fields as "x", "y", or "x vs y".
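The three return shapes described above (square matrix, dist object, long table) can be sketched in base R. This is an illustration only: it assumes the textbook Bray-Curtis formula, sum(|x - y|) / sum(x + y), on a made-up count matrix, and does not reproduce rbiom's C implementation or its handling of the weighted and transform arguments.

```r
# Hypothetical counts: 3 OTUs (rows) x 3 samples (columns)
counts <- matrix(
  c(10, 0, 5,    # sample S1
     2, 8, 5,    # sample S2
     0, 4, 6),   # sample S3
  nrow = 3,
  dimnames = list(c("OTU1", "OTU2", "OTU3"), c("S1", "S2", "S3"))
)
samples <- colnames(counts)

# Textbook Bray-Curtis dissimilarity between two count vectors
bray <- function(x, y) sum(abs(x - y)) / sum(x + y)

# Square samples x samples matrix (the bdiv_matrix() shape)
mtx <- outer(
  seq_along(samples), seq_along(samples),
  Vectorize(function(i, j) bray(counts[, i], counts[, j]))
)
dimnames(mtx) <- list(samples, samples)

# dist-class object (the bdiv_distmat() shape)
dm <- stats::as.dist(mtx)

# Long format, one row per sample pair (the bdiv_table() shape)
idx <- which(lower.tri(mtx), arr.ind = TRUE)
tbl <- data.frame(
  .sample1  = samples[idx[, "col"]],
  .sample2  = samples[idx[, "row"]],
  .distance = mtx[idx],
  stringsAsFactors = FALSE
)
print(tbl)
```

The long format is the convenient one for attaching per-pair metadata, while the dist object is what functions such as hclust() expect.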
Metadata Comparisons
Prefix metadata fields with == or != to limit comparisons to within or between groups, respectively. For example, stat.by = '==Sex' will run calculations only for intra-group comparisons, returning "Male" and "Female", but NOT "Female vs Male". Similarly, setting stat.by = '!=Body Site' will only show the inter-group comparisons, such as "Saliva vs Stool", "Anterior nares vs Buccal mucosa", and so on.
The same effect can be achieved by using the within and between parameters. stat.by = '==Sex' is equivalent to stat.by = 'Sex', within = 'Sex'.
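The pairwise labels and the == / != filters described above can be sketched in base R. The metadata values, the helper variable names, and the alphabetical ordering inside "x vs y" labels are assumptions made for illustration; they are not taken from rbiom's source.

```r
# Hypothetical metadata for four samples (illustration only)
md <- data.frame(
  sample = c("S1", "S2", "S3", "S4"),
  Sex    = c("Male", "Female", "Male", "Female"),
  Age    = c(25, 31, 40, 22),
  stringsAsFactors = FALSE
)

# All unordered sample pairs
idx   <- utils::combn(nrow(md), 2)
pairs <- data.frame(
  .sample1 = md$sample[idx[1, ]],
  .sample2 = md$sample[idx[2, ]],
  stringsAsFactors = FALSE
)

# Categorical field: "x" when both samples agree, otherwise "x vs y"
a <- md$Sex[idx[1, ]]
b <- md$Sex[idx[2, ]]
pairs$Sex <- ifelse(a == b, a, paste(pmin(a, b), "vs", pmax(a, b)))

# Numeric field: absolute difference between the two samples
pairs$Age <- abs(md$Age[idx[1, ]] - md$Age[idx[2, ]])

# Analogous to '==Sex' (intra-group) and '!=Sex' (inter-group)
within_sex  <- pairs[a == b, ]
between_sex <- pairs[a != b, ]

print(within_sex)
print(between_sex)
```

With these four samples, the intra-group filter keeps the Male/Male and Female/Female pairs, and the inter-group filter keeps the four "Female vs Male" pairs.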
See Also
Other beta_diversity: bdiv_boxplot(), bdiv_clusters(), bdiv_corrplot(), bdiv_heatmap(), bdiv_ord_plot(), bdiv_ord_table(), bdiv_stats(), distmat_stats()
Examples
library(rbiom)
# Subset to four samples
biom <- hmp50$clone()
biom$counts <- biom$counts[,c("HMP18", "HMP19", "HMP20", "HMP21")]
# Return in long format with metadata
bdiv_table(biom, 'unifrac', md = ".all")
# Only look at distances among the stool samples
bdiv_table(biom, 'unifrac', md = c("==Body Site", "Sex"))
# Or between males and females
bdiv_table(biom, 'unifrac', md = c("Body Site", "!=Sex"))
# All-vs-all matrix
bdiv_matrix(biom, 'unifrac')
# All-vs-all distance matrix
dm <- bdiv_distmat(biom, 'unifrac')
dm
plot(hclust(dm))
Apply a function to each subset of an rbiom object.
Description
blply() and bdply() let you divide your biom dataset into smaller pieces, run a function on those smaller rbiom objects, and return the results as a data.frame or list.
Usage
bdply(biom, vars, FUN, ..., iters = list(), prefix = FALSE)
blply(biom, vars, FUN, ..., iters = list(), prefix = FALSE)
Arguments
biom |
An rbiom object, such as from |
vars |
A character vector of metadata fields. Each unique combination
of values in these columns will be used to create a subsetted
rbiom object to pass to |
FUN |
The function to execute on each subset of |
... |
Additional arguments to pass on to |
iters |
A named list of values to pass to |
prefix |
When |
Details
You can also specify additional variables for your function to iterate over in unique combinations.
Calls plyr::ddply() or plyr::dlply() internally.
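The "unique combinations" of iters can be pictured as a Cartesian product. The sketch below uses base R's expand.grid() on the same iters list as the Examples section; it assumes this is what the iteration amounts to, and the stand-in function is hypothetical rather than anything bdply() calls internally.

```r
# Same iteration variables as in the Examples section
iters <- list(w = c(TRUE, FALSE), d = c("bray", "euclid"))

# One row per unique combination of w and d
combos <- expand.grid(iters, stringsAsFactors = FALSE)
print(combos)

# Stand-in for FUN: called once per combination
# (in bdply(), it would also be called once per rbiom subset)
results <- Map(
  function(w, d) paste(d, if (w) "weighted" else "unweighted"),
  combos$w, combos$d
)
print(unlist(results))
```

Two values of w and two of d give four calls per subset, so the output grows multiplicatively with each variable added to iters.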
Value
For bdply(), a tibble data.frame comprising the accumulated outputs of FUN, along with the columns specified by vars and iters. For blply(), a named list that has details about vars and iters in attr(,'split_labels').
See Also
Other metadata:
glimpse.rbiom()
Other biom:
biom_merge()
Examples
library(rbiom)
bdply(hmp50, "Sex", `$`, 'n_samples')
blply(hmp50, "Sex", `$`, 'n_samples') %>% unlist()
bdply(hmp50, c("Body Site", "Sex"), function (b) {
adm <- adiv_matrix(b)[,c("Shannon", "Simpson")]
apply(adm, 2L, mean)
})
iters <- list(w = c(TRUE, FALSE), d = c("bray", "euclid"))
bdply(hmp50, "Sex", iters = iters, function (b, w, d) {
r <- range(bdiv_distmat(biom = b, bdiv = d, weighted = w))
round(data.frame(min = r[[1]], max = r[[2]]))
})
Combine several rbiom objects into one.
Description
WARNING: It is generally ill-advised to merge BIOM datasets, as OTU mappings depend on upstream clustering and are not equivalent between BIOM files.
Usage
biom_merge(
...,
metadata = NA,
taxonomy = NA,
tree = NULL,
sequences = NA,
id = NA,
comment = NA
)
Arguments
... |
Any number of rbiom objects (e.g. from |
metadata , taxonomy , tree , sequences , id , comment |
Replace the corresponding
data in the merged rbiom object with these values. Set to |
Value
An rbiom object.
See Also
Other biom:
bdply()
Examples
library(rbiom)
b1 <- as_rbiom(hmp50$counts[,1:4])
b2 <- as_rbiom(hmp50$counts[,5:8])
biom <- biom_merge(b1, b2)
print(biom)
biom$tree <- hmp50$tree
biom$metadata <- hmp50$metadata
print(biom)
Convert biom data to an external package class.
Description
Requires the relevant Bioconductor R package to be installed:
- convert_to_phyloseq - phyloseq
- convert_to_SE - SummarizedExperiment
- convert_to_TSE - TreeSummarizedExperiment
Usage
convert_to_SE(biom, ...)
convert_to_TSE(biom, ...)
convert_to_phyloseq(biom, ...)
Arguments
biom |
An rbiom object, such as from |
... |
Not Used. |
Details
A SummarizedExperiment object includes counts, metadata, and taxonomy.
phyloseq and TreeSummarizedExperiment objects additionally include the tree and sequences.
Value
A phyloseq, SummarizedExperiment, or TreeSummarizedExperiment object.
Examples
## Not run:
library(rbiom)
print(hmp50)
# Requires 'phyloseq', a Bioconductor R package
if (nzchar(system.file(package = "phyloseq"))) {
physeq <- convert_to_phyloseq(hmp50)
print(physeq)
}
# Requires 'SummarizedExperiment', a Bioconductor R package
if (nzchar(system.file(package = "SummarizedExperiment"))) {
se <- convert_to_SE(hmp50)
print(se)
}
# Requires 'TreeSummarizedExperiment', a Bioconductor R package
if (nzchar(system.file(package = "TreeSummarizedExperiment"))) {
tse <- convert_to_TSE(hmp50)
print(tse)
}
## End(Not run)
Run ordinations on a distance matrix.
Description
Run ordinations on a distance matrix.
Usage
distmat_ord_table(dm, ord = "PCoA", k = 2L, ...)
Arguments
dm |
A |
ord |
Method for reducing dimensionality. Options are:
Multiple/abbreviated values allowed. Default: |
k |
Number of ordination dimensions to return. Either |
... |
Additional arguments for |
Value
A data.frame with columns .sample, .ord, .x, .y, and (optionally) .z.
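The shape of this return value can be sketched with classical multidimensional scaling, which is one standard way to compute a PCoA. The example assumes stats::cmdscale() as a stand-in; this section does not state which routine distmat_ord_table() uses, and the count matrix is randomly generated for illustration.

```r
set.seed(1)

# Hypothetical counts: 6 samples (rows) x 8 OTUs (columns)
counts <- matrix(
  rpois(6 * 8, lambda = 10), nrow = 6,
  dimnames = list(paste0("Sample", 1:6), paste0("OTU", 1:8))
)

# Any dist-class object works here; Euclidean keeps the sketch simple
dm <- stats::dist(counts, method = "euclidean")

# Classical MDS (PCoA) down to k = 2 dimensions
k   <- 2L
pts <- stats::cmdscale(dm, k = k)

# Arrange the coordinates like the documented output columns
ord <- data.frame(
  .sample = rownames(pts),
  .ord    = "PCoA",
  .x      = pts[, 1],
  .y      = pts[, 2],
  row.names = NULL,
  stringsAsFactors = FALSE
)
head(ord)
```

Requesting a third dimension would add one more coordinate column, which corresponds to the optional .z column above.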
See Also
Other ordination: bdiv_ord_plot(), bdiv_ord_table()
Examples
library(rbiom)
dm <- bdiv_distmat(hmp50, "bray")
ord <- distmat_ord_table(dm, "PCoA")
head(ord)
Run statistics on a distance matrix vs a categorical or numeric variable.
Description
Run statistics on a distance matrix vs a categorical or numeric variable.
Usage
distmat_stats(dm, groups, test = "adonis2", seed = 0, permutations = 999)
Arguments
dm |
A |
groups |
A named vector of grouping values. The names should
correspond to |
test |
Permutational test for assessing significance. Options are:
Abbreviations are allowed. Default: |
seed |
Random seed for permutations. Must be a non-negative integer.
Default: |
permutations |
Number of random permutations to use.
Default: |
Value
A data.frame with summary statistics from vegan::permustats(). The columns are:
- .n - The size of the distance matrix.
- .stat - The observed statistic. For mrpp, this is the overall weighted mean of group mean distances.
- .z - The difference between the observed statistic and the mean of the permutations, divided by the standard deviation of the permutations (also known as z-values). Evaluated from permuted values without the observed statistic.
- .p.val - Probability calculated by test.
R commands for reproducing the results are in $code.
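The .z and .p.val definitions above can be sketched with a hand-rolled permutation test. The statistic used here (mean between-group distance minus mean within-group distance) is a simplified stand-in chosen for illustration, not the adonis2 or mrpp statistic, and the p-value formula is a common permutation-test convention assumed for this sketch. All data are simulated.

```r
set.seed(0)

# Hypothetical data: 10 samples in two groups with shifted means
x      <- c(rnorm(5, mean = 0), rnorm(5, mean = 3))
groups <- rep(c("A", "B"), each = 5)
dm     <- as.matrix(stats::dist(x))

# Stand-in statistic: mean between-group minus mean within-group distance
stat_fn <- function(g) {
  same <- outer(g, g, "==")
  ut   <- upper.tri(dm)
  mean(dm[ut & !same]) - mean(dm[ut & same])
}

permutations <- 999
observed <- stat_fn(groups)

# Shuffle the group labels to build the null distribution
permuted <- replicate(permutations, stat_fn(sample(groups)))

# z-value: computed from the permuted values only, as described above
z <- (observed - mean(permuted)) / stats::sd(permuted)

# One-sided permutation p-value (assumed convention)
p_val <- (sum(permuted >= observed) + 1) / (permutations + 1)

print(c(.stat = observed, .z = z, .p.val = p_val))
```

Because the labels are shuffled at random, fixing the seed is what makes the result reproducible, which is the role of the seed argument.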
See Also
Other beta_diversity: bdiv_boxplot(), bdiv_clusters(), bdiv_corrplot(), bdiv_heatmap(), bdiv_ord_plot(), bdiv_ord_table(), bdiv_stats(), bdiv_table()
Other stats_tables: adiv_stats(), bdiv_stats(), stats_table(), taxa_stats()
Examples
library(rbiom)
hmp10 <- hmp50$clone()
hmp10$counts <- hmp10$counts[,1:10]
dm <- bdiv_distmat(hmp10, 'unifrac')
distmat_stats(dm, groups = pull(hmp10, 'Body Site'))
distmat_stats(dm, groups = pull(hmp10, 'Age'))
# See the R code used to calculate these statistics:
stats <- distmat_stats(dm, groups = pull(hmp10, 'Age'))
stats$code
documentation_biom.rbiom
Description
documentation_biom.rbiom
Arguments
biom |
An rbiom object, such as from |
.data |
An rbiom object, such as from |
x |
An rbiom object, such as from |
object |
An rbiom object, such as from |
data |
An rbiom object, such as from |
documentation_clusters
Description
documentation_clusters
Arguments
k |
Number of clusters. Default: |
rank |
Which taxa rank to use. E.g. |
Value
A numeric factor assigning samples to clusters.
documentation_cmp
Description
documentation_cmp
Metadata Comparisons
Prefix metadata fields with == or != to limit comparisons to within or between groups, respectively. For example, stat.by = '==Sex' will run calculations only for intra-group comparisons, returning "Male" and "Female", but NOT "Female vs Male". Similarly, setting stat.by = '!=Body Site' will only show the inter-group comparisons, such as "Saliva vs Stool", "Anterior nares vs Buccal mucosa", and so on.
The same effect can be achieved by using the within and between parameters. stat.by = '==Sex' is equivalent to stat.by = 'Sex', within = 'Sex'.
documentation_default
Description
documentation_default
Arguments
biom |
An rbiom object, such as from |
mtx |
A matrix-like object. |
tree |
A |
underscores |
When parsing the tree, should underscores be kept as
is? By default they will be converted to spaces (unless the entire ID
is quoted). Default |
md |
Dataset field(s) to include in the output data frame, or |
adiv |
Alpha diversity metric(s) to use. Options are: |
bdiv |
Beta diversity distance algorithm(s) to use. Options are:
|
taxa |
Which taxa to display. An integer value will show the top n
most abundant taxa. A value 0 <= n < 1 will show any taxa with that
mean abundance or greater (e.g. |
ord |
Method for reducing dimensionality. Options are:
Multiple/abbreviated values allowed. Default: |
weighted |
Take relative abundances into account. When
|
normalized |
Only changes the "Weighted UniFrac" calculation.
Divides result by the total branch weights. Default: |
delta |
For numeric metadata, report the absolute difference in values
for the two samples, for instance |
rank |
What rank(s) of taxa to display. E.g. |
lineage |
Include all ranks in the name of the taxa. For instance,
setting to |
unc |
How to handle unclassified, uncultured, and similarly ambiguous taxa names. Options are:
Abbreviations are allowed. Default: |
other |
Sum all non-itemized taxa into an "Other" taxa. When
|
sparse |
If |
p.top |
Only display taxa with the most significant differences in
abundance. If |
y.transform |
The transformation to apply to the y-axis. Visualizing differences of both high- and low-abundance taxa is best done with a non-linear axis. Options are:
These methods allow visualization of both high- and low-abundance
taxa simultaneously, without complaint about 'zero' count
observations. Default: |
flip |
Transpose the axes, so that taxa are present as rows instead
of columns. Default: |
stripe |
Shade every other x position. Default: same as flip |
ci |
How to calculate min/max of the crossbar,
errorbar, linerange, and pointrange layers.
Options are: |
p.label |
Minimum adjusted p-value to display on the plot with a bracket.
If a numeric vector with more than one value is
provided, they will be used as breaks for asterisk notation.
Default: |
level |
The confidence level for calculating a confidence interval.
Default: |
caption |
Add methodology caption beneath the plot.
Default: |
outliers |
Show boxplot outliers? |
xlab.angle |
Angle of the labels at the bottom of the plot.
Options are |
k |
Number of ordination dimensions to return. Either |
split.by |
Dataset field(s) that the data should be split by prior to
any calculations. Must be categorical. Default: |
dm |
A |
groups |
A named vector of grouping values. The names should
correspond to |
df |
The dataset (data.frame or tibble object). "Dataset fields"
mentioned below should match column names in |
regr |
Dataset field with the x-axis (independent; predictive)
values. Must be numeric. Default: |
resp |
Dataset field with the y-axis (dependent; response) values,
such as taxa abundance or alpha diversity.
Default: |
stat.by |
Dataset field with the statistical groups. Must be
categorical. Default: |
color.by |
Dataset field with the group to color by. Must be
categorical. Default: |
shape.by |
Dataset field with the group for shapes. Must be
categorical. Default: |
facet.by |
Dataset field(s) to use for faceting. Must be categorical.
Default: |
colors |
How to color the groups. Options are:
See "Aesthetics" section below for additional information.
Default: |
shapes |
Shapes for each group.
Options are similar to |
patterns |
Patterns for each group.
Options are similar to |
test |
Method for computing p-values: |
fit |
How to fit the trendline. |
at |
Position(s) along the x-axis where the means or slopes should be
evaluated. Default: |
alt |
Alternative hypothesis direction. Options are |
mu |
Reference value to test against. Default: |
check |
Generate additional plots to aid in assessing data normality.
Default: |
within , between |
Dataset field(s) for intra- or inter- sample
comparisons. Alternatively, dataset field names given elsewhere can
be prefixed with |
seed |
Random seed for permutations. Must be a non-negative integer.
Default: |
cpus |
The number of CPUs to use. Set to |
permutations |
Number of random permutations to use.
Default: |
p.adj |
Method to use for multiple comparisons adjustment of
p-values. Run |
depths |
Rarefaction depths to show in the plot, or |
rline |
Where to draw a horizontal line on the plot, intended to show
a particular rarefaction depth. Set to |
clone |
Create a copy of |
labels |
Show sample names under each bar. Default: |
transform |
Transformation to apply. Options are:
|
ties |
When |
documentation_dist_test
Description
documentation_dist_test
Arguments
stat.by |
The categorical or numeric metadata field over which statistics should be calculated. Required. |
test |
Permutational test for assessing significance. Options are:
Abbreviations are allowed. Default: |
documentation_heatmap
Description
documentation_heatmap
Arguments
grid |
Color palette name, or a list with entries for |
label |
Label the matrix rows and columns. You can supply a list
or logical vector of length two to control row labels and column
labels separately, for example
|
label_size |
The font size to use for the row and column labels. You
can supply a numeric vector of length two to control row label sizes
and column label sizes separately, for example
|
rescale |
Rescale rows or columns to all have a common min/max.
Options: |
trees |
Draw a dendrogram for rows (left) and columns (top). You can
supply a list or logical vector of length two to control the row tree
and column tree separately, for example
|
clust |
Clustering algorithm for reordering the rows and columns by
similarity. You can supply a list or character vector of length two to
control the row and column clustering separately, for example
Default: |
dist |
Distance algorithm to use when reordering the rows and columns
by similarity. You can supply a list or character vector of length
two to control the row and column clustering separately, for example
Default: |
tree_height , track_height |
The height of the dendrogram or annotation
tracks as a percentage of the overall grid size. Use a numeric vector
of length two to assign |
asp |
Aspect ratio (height/width) for entire grid.
Default: |
legend |
Where to place the legend. Options are: |
title |
Plot title. Set to |
... |
Additional arguments to pass on to ggplot2::theme(). |
Value
A ggplot2 plot. The computed data points and ggplot command are available as $data and $code, respectively.
documentation_plot_return
Description
documentation_plot_return
Value
A ggplot2 plot. The computed data points and ggplot command are available as $data and $code, respectively.
documentation_rank.2
Description
documentation_rank.2
Arguments
rank |
What rank(s) of taxa to compute biplot coordinates and
statistics for, or |
documentation_rank.NULL
Description
documentation_rank.NULL
Arguments
rank |
What rank(s) of taxa to compute biplot coordinates and
statistics for, or |
documentation_return.biom
Description
documentation_return.biom
Value
An rbiom object.
documentation_taxa.4
Description
documentation_taxa.4
Arguments
taxa |
Which taxa to display. An integer value will show the top n
most abundant taxa. A value 0 <= n < 1 will show any taxa with that
mean abundance or greater (e.g. |
Export data to QIIME 2 or mothur.
Description
Populates a directory with the following files, formatted according to QIIME 2 or mothur's specifications.
- biom_counts.tsv
- biom_metadata.tsv
- biom_taxonomy.tsv
- biom_tree.nwk
- biom_seqs.fna
biom_counts.tsv will always be created. The others depend on whether the content is present in the biom argument.
Usage
write_mothur(biom, dir = tempfile(), prefix = "biom_")
write_qiime2(biom, dir = tempfile(), prefix = "biom_")
Arguments
biom |
An rbiom object, such as from |
dir |
Where to save the files. If the directory doesn't exist, it will
be created. Default: |
prefix |
A string to prepend to each file name. Default: |
Value
The normalized directory path that was written to (invisibly).
Examples
library(rbiom)
tdir <- tempfile()
write_qiime2(hmp50, tdir, 'qiime2_')
write_mothur(hmp50, tdir, 'mothur_')
list.files(tdir)
readLines(file.path(tdir, 'qiime2_metadata.tsv'), n = 4)
readLines(file.path(tdir, 'mothur_taxonomy.tsv'), n = 3)
unlink(tdir, recursive = TRUE)
Global Enteric Multicenter Study (n = 1,006)
Description
Global Enteric Multicenter Study (n = 1,006)
Usage
gems
Format
An rbiom object with 1,006 samples. Includes metadata and taxonomy.
- diarrhea -
Case or Control
- age -
0 - 4.8 (years old)
- country -
Bangladesh, Gambia, Kenya, or Mali
Source
doi:10.1186/gb-2014-15-6-r76 and doi:10.1093/nar/gkx1027
Get a glimpse of your metadata.
Description
Get a glimpse of your metadata.
Usage
## S3 method for class 'rbiom'
glimpse(x, width = NULL, ...)
Arguments
x |
An rbiom object, such as from |
width |
Width of output. See |
... |
Not used. |
Value
The original biom, invisibly.
See Also
Other metadata:
bdply()
Examples
library(rbiom)
glimpse(hmp50)
Human Microbiome Project - demo dataset (n = 50)
Description
Human Microbiome Project - demo dataset (n = 50)
Usage
hmp50
Format
An rbiom object with 50 samples. Includes metadata, taxonomy, phylogeny, and sequences.
- Sex -
Male or Female
- Body Site -
Anterior nares, Buccal mucosa, Mid vagina, Saliva, or Stool
- Age -
21 - 40
- BMI -
19 - 32
Source
Create, modify, and delete metadata fields.
Description
mutate() creates new fields in $metadata that are functions of existing metadata fields. It can also modify fields (if the name is the same as an existing field) and delete fields (by setting their value to NULL).
Usage
## S3 method for class 'rbiom'
mutate(.data, ..., clone = TRUE)
## S3 method for class 'rbiom'
rename(.data, ..., clone = TRUE)
Arguments
.data |
An rbiom object, such as from |
... |
Passed on to |
clone |
Create a copy of |
Value
An rbiom object.
See Also
Other transformations: rarefy(), rarefy_cols(), slice_metadata, subset(), with()
Examples
library(rbiom)
biom <- slice_max(hmp50, BMI, n = 6)
biom$metadata
# Add a new field to the metadata
biom <- mutate(biom, Obese = BMI >= 30)
biom$metadata
# Rename a metadata field
biom <- rename(biom, 'Age (years)' = "Age")
biom$metadata
Create a heatmap with tracks and dendrograms from any matrix.
Description
Create a heatmap with tracks and dendrograms from any matrix.
Usage
plot_heatmap(
mtx,
grid = list(label = "Grid Value", colors = "imola"),
tracks = NULL,
label = TRUE,
label_size = NULL,
rescale = "none",
trees = TRUE,
clust = "complete",
dist = "euclidean",
asp = 1,
tree_height = 10,
track_height = 10,
legend = "right",
title = NULL,
xlab.angle = "auto",
...
)
Arguments
mtx |
A numeric |
grid |
Color palette name, or a list with entries for |
tracks |
List of track definitions. See details below.
Default: |
label |
Label the matrix rows and columns. You can supply a list
or logical vector of length two to control row labels and column
labels separately, for example
|
label_size |
The font size to use for the row and column labels. You
can supply a numeric vector of length two to control row label sizes
and column label sizes separately, for example
|
rescale |
Rescale rows or columns to all have a common min/max.
Options: |
trees |
Draw a dendrogram for rows (left) and columns (top). You can
supply a list or logical vector of length two to control the row tree
and column tree separately, for example
|
clust |
Clustering algorithm for reordering the rows and columns by
similarity. You can supply a list or character vector of length two to
control the row and column clustering separately, for example
Default: |
dist |
Distance algorithm to use when reordering the rows and columns
by similarity. You can supply a list or character vector of length
two to control the row and column clustering separately, for example
Default: |
asp |
Aspect ratio (height/width) for entire grid.
Default: |
tree_height , track_height |
The height of the dendrogram or annotation
tracks as a percentage of the overall grid size. Use a numeric vector
of length two to assign |
legend |
Where to place the legend. Options are: |
title |
Plot title. Default: |
xlab.angle |
Angle of the labels at the bottom of the plot.
Options are |
... |
Additional arguments to pass on to ggplot2::theme(). |
Value
A ggplot2 plot. The computed data points and ggplot command are available as $data and $code, respectively.
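The rescale, clust, and dist arguments above describe row/column rescaling and similarity-based reordering. A rough base R sketch of those two steps follows, assuming that clust = "complete" and dist = "euclidean" correspond to the same-named methods in stats::hclust() and stats::dist(); the per-row min/max rescaling is one possible reading of "a common min/max", not a confirmed description of plot_heatmap() internals.

```r
set.seed(123)
mtx <- matrix(
  runif(5 * 8), nrow = 5,
  dimnames = list(LETTERS[1:5], letters[1:8])
)

# Rescale each row so its values span 0 to 1
rescaled <- t(apply(mtx, 1, function(v) (v - min(v)) / (max(v) - min(v))))

# Cluster rows and columns, then reorder by the dendrogram leaf order
row_hc <- stats::hclust(stats::dist(rescaled,    method = "euclidean"),
                        method = "complete")
col_hc <- stats::hclust(stats::dist(t(rescaled), method = "euclidean"),
                        method = "complete")
reordered <- rescaled[row_hc$order, col_hc$order]

print(round(reordered, 2))
```

The row_hc and col_hc objects are also what a dendrogram for each margin would be drawn from.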
Track Definitions
One or more colored tracks can be placed on the left and/or top of the heatmap grid to visualize associated metadata values.
## Categorical ----------------------------
cat_vals <- sample(c("Male", "Female"), 10, replace = TRUE)
tracks <- list('Sex' = cat_vals)
tracks <- list('Sex' = list(values = cat_vals, colors = "bright"))
tracks <- list('Sex' = list(
  values = cat_vals,
  colors = c('Male' = "blue", 'Female' = "red")) )
## Numeric --------------------------------
num_vals <- sample(25:40, 10, replace = TRUE)
tracks <- list('Age' = num_vals)
tracks <- list('Age' = list(values = num_vals, colors = "greens"))
tracks <- list('Age' = list(values = num_vals, range = c(0,50)))
tracks <- list('Age' = list(
  label = "Age (Years)",
  values = num_vals,
  colors = c("azure", "darkblue", "darkorchid") ))
## Multiple Tracks ------------------------
tracks <- list('Sex' = cat_vals, 'Age' = num_vals)
tracks <- list(
  list(label = "Sex", values = cat_vals, colors = "bright"),
  list(label = "Age", values = num_vals, colors = "greens") )
mtx <- matrix(sample(1:50), ncol = 10)
dimnames(mtx) <- list(letters[1:5], LETTERS[1:10])
plot_heatmap(mtx = mtx, tracks = tracks)
The following entries in the track definitions are understood:
- values - The metadata values. When unnamed, order must match matrix.
- range - The c(min,max) to use for scale values.
- label - Label for this track. Defaults to the name of this list element.
- side - Options are "top" (default) or "left".
- colors - A pre-defined palette name or custom set of colors to map to.
- na.color - The color to use for NA values.
- bins - Bin a gradient into this many bins/steps.
- guide - A list of arguments for guide_colorbar() or guide_legend().
All built-in color palettes are colorblind-friendly. See Mapping Metadata to Aesthetics for images of the palettes.
Categorical palette names: "okabe", "carto", "r4", "polychrome", "tol", "bright", "light", "muted", "vibrant", "tableau", "classic", "alphabet", "tableau20", "kelly", and "fishy".
Numeric palette names: "reds", "oranges", "greens", "purples", "grays", "acton", "bamako", "batlow", "bilbao", "buda", "davos", "devon", "grayC", "hawaii", "imola", "lajolla", "lapaz", "nuuk", "oslo", "tokyo", "turku", "bam", "berlin", "broc", "cork", "lisbon", "roma", "tofino", "vanimo", and "vik".
See Also
Other visualization: adiv_boxplot(), adiv_corrplot(), bdiv_boxplot(), bdiv_corrplot(), bdiv_heatmap(), bdiv_ord_plot(), rare_corrplot(), rare_multiplot(), rare_stacked(), stats_boxplot(), stats_corrplot(), taxa_boxplot(), taxa_corrplot(), taxa_heatmap(), taxa_stacked()
Examples
library(rbiom)
set.seed(123)
mtx <- matrix(runif(5*8), nrow = 5, dimnames = list(LETTERS[1:5], letters[1:8]))
plot_heatmap(mtx)
plot_heatmap(mtx, grid="oranges")
plot_heatmap(mtx, grid=list(colors = "oranges", label = "Some %", bins = 5))
tracks <- list(
'Number' = sample(1:ncol(mtx)),
'Person' = list(
values = factor(sample(c("Alice", "Bob"), ncol(mtx), TRUE)),
colors = c('Alice' = "purple", 'Bob' = "darkcyan") ),
'State' = list(
side = "left",
values = sample(c("TX", "OR", "WA"), nrow(mtx), TRUE),
colors = "bright" )
)
plot_heatmap(mtx, tracks=tracks)
Map sample names to metadata field values.
Description
Map sample names to metadata field values.
Usage
## S3 method for class 'rbiom'
pull(.data, var = -1, name = ".sample", ...)
Arguments
.data |
An rbiom object, such as from |
var |
The metadata field name specified as:
Default: |
name |
The column to be used as names for a named vector.
Specified in a similar manner as var. Default: |
... |
Not used. |
Value
A vector of metadata values, named with sample names.
See Also
taxa_map()
Other samples:
sample_sums()
Examples
library(rbiom)
pull(hmp50, 'Age') %>% head()
pull(hmp50, 'bod') %>% head(4)
Visualize rarefaction curves with scatterplots and trendlines.
Description
Visualize rarefaction curves with scatterplots and trendlines.
Usage
rare_corrplot(
biom,
adiv = "Shannon",
layers = "tc",
rline = TRUE,
stat.by = NULL,
facet.by = NULL,
colors = TRUE,
shapes = TRUE,
test = "none",
fit = "log",
at = NULL,
level = 0.95,
p.adj = "fdr",
transform = "none",
alt = "!=",
mu = 0,
caption = TRUE,
check = FALSE,
...
)
Arguments
biom |
An rbiom object, such as from |
adiv |
Alpha diversity metric(s) to use. Options are: |
layers |
One or more of
|
rline |
Where to draw a horizontal line on the plot, intended to show
a particular rarefaction depth. Set to |
stat.by |
Dataset field with the statistical groups. Must be
categorical. Default: |
facet.by |
Dataset field(s) to use for faceting. Must be categorical.
Default: |
colors |
How to color the groups. Options are:
See "Aesthetics" section below for additional information.
Default: |
shapes |
Shapes for each group.
Options are similar to |
test |
Method for computing p-values: |
fit |
How to fit the trendline.
Options are |
at |
Position(s) along the x-axis where the means or slopes should be
evaluated. Default: |
level |
The confidence level for calculating a confidence interval.
Default: |
p.adj |
Method to use for multiple comparisons adjustment of
p-values. Run |
transform |
Transformation to apply. Options are:
|
alt |
Alternative hypothesis direction. Options are |
mu |
Reference value to test against. Default: |
caption |
Add methodology caption beneath the plot.
Default: |
check |
Generate additional plots to aid in assessing data normality.
Default: |
... |
Additional parameters to pass along to ggplot2 functions.
Prefix a parameter name with a layer name to pass it to only that
layer. For instance, |
Value
A ggplot2 plot. The computed data points, ggplot2 command, stats table, and stats table commands are available as $data, $code, $stats, and $stats$code, respectively.
Aesthetics
All built-in color palettes are colorblind-friendly. The available categorical palette names are: "okabe", "carto", "r4", "polychrome", "tol", "bright", "light", "muted", "vibrant", "tableau", "classic", "alphabet", "tableau20", "kelly", and "fishy".
Shapes can be given as per base R: numbers 0 through 17 for various shapes, or the decimal value of an ASCII character (e.g. A-Z = 65:90; a-z = 97:122) to use letters instead of shapes on the plot. Character strings may be used as well.
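The character codes mentioned above can be looked up directly. The following is a small sketch using base R only (no rbiom functions are involved); the variable names are illustrative.

```r
# Decimal ASCII codes for letters, using base R only.
upper_codes <- utf8ToInt(paste(LETTERS, collapse = ""))  # codes for A-Z
lower_codes <- utf8ToInt(paste(letters, collapse = ""))  # codes for a-z

range(upper_codes)  # 65 90
range(lower_codes)  # 97 122

# Convert a code back to its character.
intToUtf8(65)       # "A"
```

Whether a given code renders as a letter on the plot still depends on the graphics device and on ggplot2.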
See Also
Other rarefaction:
rare_multiplot(), rare_stacked(), rarefy(), rarefy_cols(), sample_sums()
Other visualization:
adiv_boxplot(), adiv_corrplot(), bdiv_boxplot(), bdiv_corrplot(), bdiv_heatmap(), bdiv_ord_plot(), plot_heatmap(), rare_multiplot(), rare_stacked(), stats_boxplot(), stats_corrplot(), taxa_boxplot(), taxa_corrplot(), taxa_heatmap(), taxa_stacked()
Examples
library(rbiom)
biom <- subset(hmp50, `Body Site` %in% c('Saliva', 'Stool'))
rare_corrplot(biom, stat.by = "body", adiv = c("sh", "o"), facet.by = "Sex")
Combines rare_corrplot and rare_stacked into a single figure.
Description
Combines rare_corrplot and rare_stacked into a single figure.
Usage
rare_multiplot(
biom,
adiv = "Shannon",
layers = "tc",
rline = TRUE,
stat.by = NULL,
facet.by = NULL,
colors = TRUE,
shapes = TRUE,
test = "none",
fit = "log",
at = NULL,
level = 0.95,
p.adj = "fdr",
transform = "none",
alt = "!=",
mu = 0,
caption = TRUE,
check = FALSE,
...
)
Arguments
biom |
An rbiom object, such as from |
adiv |
Alpha diversity metric(s) to use. Options are: |
layers |
One or more of
|
rline |
Where to draw a horizontal line on the plot, intended to show
a particular rarefaction depth. Set to |
stat.by |
Dataset field with the statistical groups. Must be
categorical. Default: |
facet.by |
Dataset field(s) to use for faceting. Must be categorical.
Default: |
colors |
How to color the groups. Options are:
See "Aesthetics" section below for additional information.
Default: |
shapes |
Shapes for each group.
Options are similar to |
test |
Method for computing p-values: |
fit |
How to fit the trendline.
Options are |
at |
Position(s) along the x-axis where the means or slopes should be
evaluated. Default: |
level |
The confidence level for calculating a confidence interval.
Default: |
p.adj |
Method to use for multiple comparisons adjustment of
p-values. Run |
transform |
Transformation to apply. Options are:
|
alt |
Alternative hypothesis direction. Options are |
mu |
Reference value to test against. Default: |
caption |
Add methodology caption beneath the plot.
Default: |
check |
Generate additional plots to aid in assessing data normality.
Default: |
... |
Additional parameters to pass along to ggplot2 functions.
Prefix a parameter name with a layer name to pass it to only that
layer. For instance, |
Value
A ggplot2 plot. The computed data points, ggplot2 command, stats table, and stats table commands are available as $data, $code, $stats, and $stats$code, respectively.
Aesthetics
All built-in color palettes are colorblind-friendly. The available categorical palette names are: "okabe", "carto", "r4", "polychrome", "tol", "bright", "light", "muted", "vibrant", "tableau", "classic", "alphabet", "tableau20", "kelly", and "fishy".
Shapes can be given as per base R: numbers 0 through 17 for various shapes, or the decimal value of an ASCII character (e.g. A-Z = 65:90; a-z = 97:122) to use letters instead of shapes on the plot. Character strings may be used as well.
See Also
Other rarefaction:
rare_corrplot(), rare_stacked(), rarefy(), rarefy_cols(), sample_sums()
Other visualization:
adiv_boxplot(), adiv_corrplot(), bdiv_boxplot(), bdiv_corrplot(), bdiv_heatmap(), bdiv_ord_plot(), plot_heatmap(), rare_corrplot(), rare_stacked(), stats_boxplot(), stats_corrplot(), taxa_boxplot(), taxa_corrplot(), taxa_heatmap(), taxa_stacked()
Examples
library(rbiom)
rare_multiplot(hmp50, stat.by = "Body Site")
Visualize the number of observations per sample.
Description
Visualize the number of observations per sample.
Usage
rare_stacked(
biom,
rline = TRUE,
counts = TRUE,
labels = TRUE,
y.transform = "log10",
...
)
Arguments
biom |
An rbiom object, such as from |
rline |
Where to draw a horizontal line on the plot, intended to show
a particular rarefaction depth. Set to |
counts |
Display the number of samples and reads remaining after
rarefying to |
labels |
Show sample names under each bar. Default: |
y.transform |
Y-axis transformation. Options are |
... |
Additional parameters to pass along to ggplot2 functions.
Prefix a parameter name with |
Value
A ggplot2 plot. The computed data points and ggplot command are available as $data and $code, respectively.
See Also
Other rarefaction:
rare_corrplot(), rare_multiplot(), rarefy(), rarefy_cols(), sample_sums()
Other visualization:
adiv_boxplot(), adiv_corrplot(), bdiv_boxplot(), bdiv_corrplot(), bdiv_heatmap(), bdiv_ord_plot(), plot_heatmap(), rare_corrplot(), rare_multiplot(), stats_boxplot(), stats_corrplot(), taxa_boxplot(), taxa_corrplot(), taxa_heatmap(), taxa_stacked()
Examples
library(rbiom)
rare_stacked(hmp50)
rare_stacked(hmp50, rline = 500, r.linewidth = 2, r.linetype = "twodash")
fig <- rare_stacked(hmp50, counts = FALSE)
fig$code
Rarefy OTU counts.
Description
Sub-sample OTU observations such that all samples have an equal number.
If called on data with non-integer abundances, values will be re-scaled to
integers between 1 and depth such that they sum to depth.
Usage
rarefy(biom, depth = 0.1, n = NULL, seed = 0, clone = TRUE, cpus = NULL)
Arguments
biom |
An rbiom object, such as from |
depth |
How many observations to keep per sample. When
|
n |
The number of samples to keep. When |
seed |
An integer seed for randomizing which observations to keep or drop. If you need to create different random rarefactions of the same data, set the seed to a different number each time. |
clone |
Create a copy of |
cpus |
The number of CPUs to use. Set to |
Value
An rbiom object.
See Also
Other rarefaction:
rare_corrplot(), rare_multiplot(), rare_stacked(), rarefy_cols(), sample_sums()
Other transformations:
modify_metadata, rarefy_cols(), slice_metadata, subset(), with()
Examples
library(rbiom)
sample_sums(hmp50) %>% head()
biom <- rarefy(hmp50)
sample_sums(biom) %>% head()
Transform a counts matrix.
Description
Rarefaction subsets counts so that all samples have the same number of observations. Rescaling rows or columns scales the matrix values so that row sums or column sums equal 1.
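To make the two operations concrete, here is a minimal base R sketch. It is not rbiom's C implementation; the matrix and the helper `rarefy_one()` are hypothetical and exist only for illustration.

```r
set.seed(1)

# Hypothetical counts matrix: 4 OTUs (rows) by 3 samples (columns).
mtx <- matrix(
  c(10,  0,  5, 25,
    40, 30, 20, 10,
     7,  3,  0, 90),
  nrow = 4,
  dimnames = list(paste0("OTU", 1:4), paste0("S", 1:3)))

# Rarefy one column: expand counts into individual observations,
# then keep `depth` of them, chosen at random without replacement.
rarefy_one <- function(counts, depth) {
  obs  <- rep(seq_along(counts), times = counts)
  kept <- obs[sample.int(length(obs), depth)]
  tabulate(kept, nbins = length(counts))
}

depth    <- 20
rarefied <- apply(mtx, 2, rarefy_one, depth = depth)
dimnames(rarefied) <- dimnames(mtx)
colSums(rarefied)   # every sample now sums to `depth`

# Rescale so that each column (or each row) sums to 1.
col_scaled <- sweep(mtx, 2, colSums(mtx), "/")
row_scaled <- sweep(mtx, 1, rowSums(mtx), "/")
colSums(col_scaled)
rowSums(row_scaled)
```

The sketch requires `depth` to be no larger than the smallest column sum; how rbiom handles samples below the requested depth is not shown here.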
Usage
rarefy_cols(mtx, depth = 0.1, n = NULL, seed = 0L, cpus = NULL)
rescale_cols(mtx)
rescale_rows(mtx)
Arguments
mtx |
A matrix-like object. |
depth |
How many observations to keep per sample. When
|
n |
The number of samples to keep. When |
seed |
A positive integer to use for seeding the random number generator. If you need to create different random rarefactions of the same matrix, set this seed value to a different number each time. |
cpus |
The number of CPUs to use. Set to |
Value
The rarefied or rescaled matrix.
See Also
Other rarefaction:
rare_corrplot(), rare_multiplot(), rare_stacked(), rarefy(), sample_sums()
Other transformations:
modify_metadata, rarefy(), slice_metadata, subset(), with()
Examples
library(rbiom)
# rarefy_cols --------------------------------------
biom <- hmp50$clone()
sample_sums(biom) %>% head(10)
biom$counts %<>% rarefy_cols(depth=1000)
sample_sums(biom) %>% head(10)
# rescaling ----------------------------------------
mtx <- matrix(sample(1:20), nrow=4)
mtx
rowSums(mtx)
rowSums(rescale_rows(mtx))
colSums(mtx)
colSums(rescale_cols(mtx))
Deprecated functions in package rbiom.
Description
The functions listed below are deprecated and will be defunct in
the near future. When possible, alternative functions with similar
functionality are also mentioned. Help pages for deprecated functions are
available at help("<function>-deprecated").
Usage
alpha.div(biom, rarefy = FALSE)
beta.div(
biom,
method = "Bray-Curtis",
weighted = TRUE,
tree = NULL,
long = FALSE,
md = FALSE
)
counts(biom)
info(biom)
metadata(biom, field = NULL, cleanup = FALSE)
nsamples(biom)
ntaxa(biom)
phylogeny(biom)
read.biom(src, tree = "auto", prune = FALSE)
read.fasta(file, ids = NULL)
read.tree(src)
sample.names(biom)
## S3 method for class 'rbiom'
select(
.data,
samples = NULL,
nTop = NULL,
nRandom = NULL,
seed = 0,
biom = NULL,
...
)
sequences(biom)
subtree(tree, tips)
taxa.names(biom)
taxa.ranks(biom)
taxa.rollup(
biom,
rank = "OTU",
map = NULL,
lineage = FALSE,
sparse = FALSE,
taxa = NULL,
long = FALSE,
md = FALSE
)
taxonomy(biom, ranks = NULL, unc = "asis")
tips(x)
unifrac(biom, weighted = TRUE, tree = NULL)
write.biom(biom, file, format = "json")
write.fasta(seqs, outfile = NULL)
write.tree(tree, file = NULL)
write.xlsx(biom, outfile, depth = 0.1, seed = 0)
as.percent(biom)
comments(biom)
depth(biom)
depths_barplot(
biom,
rline = TRUE,
counts = TRUE,
labels = TRUE,
transform = "log10",
...
)
has.phylogeny(biom)
has.sequences(biom)
id(biom)
is.rarefied(biom)
repair(biom)
sample_subset(x, ...)
sample.sums(biom, long = FALSE, md = FALSE)
taxa_max(biom, rank = -1, lineage = FALSE, unc = "singly")
taxa.means(biom, rank = NULL)
taxa.sums(biom, rank = NULL)
top.taxa(biom, rank = "OTU", n = Inf)
top_taxa(biom, rank = "OTU", n = Inf)
comments(x) <- value
counts(x) <- value
id(x) <- value
metadata(x) <- value
phylogeny(x) <- value
sample.names(x) <- value
sequences(x) <- value
taxa.names(x) <- value
taxa.ranks(x) <- value
taxonomy(x) <- value
alpha.div
Use adiv_matrix() or adiv_table() instead.
beta.div
Use bdiv_table() or bdiv_distmat() instead.
counts
Use $counts instead.
info
Use biom$id, biom$comment, etc. instead.
metadata
Use biom$metadata or pull(biom, field) instead.
nsamples
Use biom$n_samples instead.
ntaxa
Use biom$n_otus instead.
phylogeny
Use biom$tree instead.
read.biom
Use as_rbiom() instead.
read.fasta
Use read_fasta() instead.
read.tree
Use read_tree() instead.
sample.names
Use biom$samples instead.
select
Use slice() instead.
sequences
Use biom$sequences instead.
subtree
Use tree_subset() instead.
taxa.names
Use biom$otus instead.
taxa.ranks
Use biom$ranks instead.
taxa.rollup
Use taxa_table() or taxa_matrix() instead.
taxonomy
Use $taxonomy instead.
tips
Use tree$tip.label instead.
unifrac
Use bdiv_distmat() or bdiv_table() instead. For weighted=TRUE, returns non-normalized values.
write.biom
Use write_biom() instead.
write.fasta
Use write_fasta() instead.
write.tree
Use write_tree() instead.
write.xlsx
Use write_xlsx() instead.
as.percent
Use biom$counts %<>% rescale_cols() instead.
comments
Use biom$comment instead.
depth
Use sample_sums() instead.
depths_barplot
Use rare_stacked() instead.
has.phylogeny
Use !is.null(biom$tree) instead.
has.sequences
Use !is.null(biom$sequences) instead.
id
Use biom$id instead.
is.rarefied
Use !is.null(biom$depth) instead.
repair
Use as_rbiom(as.list(biom)) instead.
sample_subset
Use biom$metadata %<>% base::subset() instead.
sample.sums
Use sample_sums() or adiv_table() instead.
taxa_max
Use taxa_apply(biom, max, sort = 'desc') instead.
taxa.means
Use taxa_means() instead.
taxa.sums
Use taxa_sums() instead.
top.taxa
Use taxa_sums() instead.
top_taxa
Use taxa_sums() instead.
comments-set
Use biom$comment <- instead.
counts-set
Use biom$counts <- instead.
id-set
Use biom$id <- instead.
metadata-set
Use biom$metadata <- instead.
phylogeny-set
Use biom$tree <- instead.
sample.names-set
Use biom$samples <- instead.
sequences-set
Use biom$sequences <- instead.
taxa.names-set
Use biom$otus <- instead.
taxa.ranks-set
Use biom$ranks <- instead.
taxonomy-set
Use biom$taxonomy <- instead.
Working with rbiom Objects.
Description
Rbiom objects make it easy to access and manipulate your BIOM data, ensuring
all the disparate components remain in sync. These objects behave largely
like lists, in that you can access and assign to them using the $ operator.
The sections below list all the fields which can be read and/or
written, and the helper functions for common tasks like rarefying and
subsetting. To create an rbiom object, see as_rbiom().
Use $clone() to create a copy of an rbiom object. This is necessary
because rbiom objects are passed by reference. The usual <- assignment
operator will simply create a second reference to the same object; it will
not create a second object. See speed ups for more details.
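The difference between a second reference and a true copy can be seen with a plain R environment, which is also passed by reference. This is an analogy only: the environment below stands in for an rbiom object, and the copying step is a stand-in for `$clone()`, not rbiom's own code.

```r
# An environment stands in for a reference-style object (analogy only).
a <- new.env()
a$id <- "original"

b <- a               # a second reference to the same object, not a copy
b$id <- "changed"
a$id                 # "changed": the edit made through `b` is visible through `a`

# An explicit copy, analogous in spirit to <rbiom>$clone().
copy <- as.environment(as.list(a, all.names = TRUE))
copy$id <- "copy only"
a$id                 # still "changed": the copy is independent
```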
Readable Fields
Reading from fields will not change the rbiom object.
Accessor | Content |
$counts | Abundance of each OTU in each sample. |
$metadata | Sample mappings to metadata (treatment, patient, etc). |
$taxonomy | OTU mappings to taxonomic ranks (genus, phylum, etc). |
$otus , $n_otus | OTU names. |
$samples , $n_samples | Sample names. |
$fields , $n_fields | Metadata field names. |
$ranks , $n_ranks | Taxonomic rank names. |
$tree , $sequences | Phylogenetic tree / sequences for the OTUs, or NULL . |
$id , $comment | Arbitrary strings for describing the dataset. |
$depth | Rarefaction depth, or NULL if unrarefied. |
$date | Date from BIOM file. |
Writable Fields
Assigning new values to these components will trigger validation checks and inter-component synchronization.
Component | What can be assigned. |
$counts | Matrix of abundances; OTUs (rows) by samples (columns). |
$metadata | Data.frame with '.sample' column, or a file name. |
$taxonomy | Data.frame with '.otu' as the first column. |
$otus | Character vector with new names for the OTUs. |
$samples | Character vector with new names for the samples. |
$tree | Phylo object with the phylogenetic tree for the OTUs. |
$sequences | Named character vector of OTU reference sequences. |
$id , $comment | String with dataset's title or comment. |
$date | Date-like object, or "%Y-%m-%dT%H:%M:%SZ" string. |
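As an illustration of the "%Y-%m-%dT%H:%M:%SZ" string format accepted by $date, the base R sketch below formats a UTC timestamp and parses it back. The timestamp value is arbitrary, and the assignment to an rbiom object itself is not exercised here.

```r
# Format a timestamp as "%Y-%m-%dT%H:%M:%SZ" in UTC.
stamp <- as.POSIXct("2025-06-27 20:10:02", tz = "UTC")
txt   <- format(stamp, "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
txt   # "2025-06-27T20:10:02Z"

# Parse the string back into a date-time and compare.
parsed <- as.POSIXct(txt, format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
parsed == stamp   # TRUE
```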
Transformations
All functions return an rbiom object.
Function | Transformation |
<rbiom>$clone() | Safely duplicate an rbiom object. |
<rbiom>[ | Subset to a specific set of sample names. |
subset() | Subset samples according to metadata properties. |
slice() | Subset to a specific number of samples. |
mutate() | Create, modify, and delete metadata fields. |
rarefy() | Sub-sample OTU counts to an even sampling depth. |
Examples
library(rbiom)
# Duplicate the HMP50 example dataset.
biom <- hmp50$clone()
# Display an overall summary of the rbiom object.
biom
# Markdown syntax for comments is recommended.
biom$comment %>% cli::cli_text()
# Demonstrate a few accessors.
biom$n_samples
biom$fields
biom$metadata
# Edit the metadata table.
biom$metadata$rand <- sample(1:50)
biom %<>% mutate(Obese = BMI >= 30, Sex = NULL)
biom %<>% rename('Years Old' = "Age")
biom$metadata
# Subset the rbiom object
biom %<>% subset(`Body Site` == "Saliva" & !Obese)
biom$metadata
# Rarefy to an even sampling depth
sample_sums(biom)
biom %<>% rarefy()
sample_sums(biom)
Parse counts, metadata, taxonomy, and phylogeny from a BIOM file.
Description
Parse counts, metadata, taxonomy, and phylogeny from a BIOM file.
Usage
read_biom(src, ...)
Arguments
src |
Input data as either a file path, URL, or JSON string.
BIOM files can be formatted according to
version 1.0 (JSON) or 2.1 (HDF5)
specifications, or as
classical tabular format. URLs must begin with |
... |
Properties to set in the new rbiom object, for example,
|
Value
An rbiom object.
See Also
as_rbiom()
Examples
library(rbiom)
infile <- system.file("extdata", "hmp50.bz2", package = "rbiom")
biom <- read_biom(infile)
print(biom)
# Taxa Abundances
biom$counts[1:4,1:10] %>% as.matrix()
biom$taxonomy %>% head()
# Metadata
biom$metadata %>% head()
table(biom$metadata$Sex, biom$metadata$`Body Site`)
sprintf("Mean age: %.1f", mean(biom$metadata$Age))
# Phylogenetic tree
biom$tree %>%
tree_subset(1:10) %>%
plot()
Parse a fasta file into a named character vector.
Description
Parse a fasta file into a named character vector.
Usage
read_fasta(file, ids = NULL)
Arguments
file |
A file/URL with fasta-formatted sequences. Can optionally be compressed with gzip, bzip2, xz, or lzma. |
ids |
Character vector of IDs to retrieve. The default, |
Value
A named character vector in which names are the fasta headers and values are the sequences.
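To show the shape of the return value, here is a minimal base R sketch that turns fasta-formatted lines into a named character vector. The helper `parse_fasta_lines()` is hypothetical and is not how read_fasta() is implemented; it ignores compression, URLs, and the ids argument.

```r
# Hypothetical helper: parse fasta-formatted lines into a named character vector.
parse_fasta_lines <- function(lines) {
  lines   <- trimws(lines)
  lines   <- lines[nzchar(lines)]             # drop blank lines
  is_head <- startsWith(lines, ">")           # header lines start with ">"
  record  <- cumsum(is_head)                  # record index for every line
  headers <- sub("^>", "", lines[is_head])
  # Concatenate the sequence lines belonging to each record.
  seqs <- tapply(lines[!is_head], record[!is_head], paste, collapse = "")
  stats::setNames(as.vector(seqs), headers[as.integer(names(seqs))])
}

fasta <- c(">seq1", "ACGT", "TTGA", ">seq2", "GGCC")
res   <- parse_fasta_lines(fasta)
res
```

Here `res` has the names "seq1" and "seq2", with multi-line sequences joined into single strings.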
Read a newick formatted phylogenetic tree.
Description
A phylogenetic tree is required for computing UniFrac distance matrices. You can load a tree from a file or by providing the tree string directly. This tree must be in Newick format, also known as parenthetic format and New Hampshire format.
Usage
read_tree(src, underscores = FALSE)
Arguments
src |
Input data as either a file path, URL, or Newick string. Compressed (gzip or bzip2) files are also supported. |
underscores |
When parsing the tree, should underscores be kept as
is? By default they will be converted to spaces (unless the entire ID
is quoted). Default |
Value
A phylo class object representing the tree.
See Also
Other phylogeny:
tree_subset()
Examples
library(rbiom)
infile <- system.file("extdata", "newick.tre", package = "rbiom")
tree <- read_tree(infile)
print(tree)
tree <- read_tree("
(A:0.99,((B:0.87,C:0.89):0.51,(((D:0.16,(E:0.83,F:0.96)
:0.94):0.69,(G:0.92,(H:0.62,I:0.85):0.54):0.23):0.74,J:0.1
2):0.43):0.67);")
plot(tree)
Objects exported from other packages
Description
These objects are imported from other packages. Follow the links below to see their documentation.
- dplyr
left_join, mutate, pull, relocate, rename, select, slice, slice_head, slice_max, slice_min, slice_sample, slice_tail
- magrittr
- parallelly
- pillar
- plyr
- stats
Summarize the taxa observations in each sample.
Description
Summarize the taxa observations in each sample.
Usage
sample_sums(biom, rank = -1, sort = NULL, unc = "singly")
sample_apply(biom, FUN, rank = -1, sort = NULL, unc = "singly", ...)
Arguments
biom |
An rbiom object, such as from |
rank |
What rank(s) of taxa to display. E.g. |
sort |
Sort the result. Options: |
unc |
How to handle unclassified, uncultured, and similarly ambiguous taxa names. Options are:
Abbreviations are allowed. Default: |
FUN |
The function to apply to each column of |
... |
Optional arguments to |
Value
For sample_sums, a named numeric vector of the number of
observations in each sample. For sample_apply, a named vector or
list with the results of FUN. The names are the taxa IDs.
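Conceptually, these summaries are column-wise operations on a counts matrix with OTUs in rows and samples in columns. The base R sketch below uses a small hypothetical matrix rather than an rbiom object, and does not cover the rank or unc arguments.

```r
# Hypothetical counts matrix: OTUs in rows, samples in columns.
counts <- matrix(
  c(5, 0, 2,
    0, 3, 1,
    4, 4, 0),
  nrow = 3,
  dimnames = list(c("OTU1", "OTU2", "OTU3"), c("A", "B", "C")))

totals <- colSums(counts)                          # observations per sample
totals
nnz <- apply(counts, 2, function(x) sum(x > 0))    # non-zero OTUs per sample
nnz
sort(totals, decreasing = TRUE)                    # comparable to sort = 'desc'
```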
See Also
Other samples:
pull.rbiom()
Other rarefaction:
rare_corrplot(), rare_multiplot(), rare_stacked(), rarefy(), rarefy_cols()
Other taxa_abundance:
taxa_boxplot(), taxa_clusters(), taxa_corrplot(), taxa_heatmap(), taxa_stacked(), taxa_stats(), taxa_sums(), taxa_table()
Examples
library(rbiom)
library(ggplot2)
sample_sums(hmp50, sort = 'asc') %>% head()
# Unique OTUs and "cultured" classes per sample
nnz <- function (x) sum(x > 0) # number of non-zeroes
sample_apply(hmp50, nnz, 'otu') %>% head()
sample_apply(hmp50, nnz, 'class', unc = 'drop') %>% head()
# Number of reads in each sample's most abundant family
sample_apply(hmp50, base::max, 'f', sort = 'desc') %>% head()
ggplot() + geom_histogram(aes(x=sample_sums(hmp50)), bins = 20)
Subset to a specific number of samples.
Description
Subset to a specific number of samples.
Usage
## S3 method for class 'rbiom'
slice(.data, ..., .by = NULL, .preserve = FALSE, clone = TRUE)
## S3 method for class 'rbiom'
slice_head(.data, n, prop, by = NULL, clone = TRUE, ...)
## S3 method for class 'rbiom'
slice_tail(.data, n, prop, by = NULL, clone = TRUE, ...)
## S3 method for class 'rbiom'
slice_min(
.data,
order_by,
n,
prop,
by = NULL,
with_ties = TRUE,
na_rm = FALSE,
clone = TRUE,
...
)
## S3 method for class 'rbiom'
slice_max(
.data,
order_by,
n,
prop,
by = NULL,
with_ties = TRUE,
na_rm = FALSE,
clone = TRUE,
...
)
## S3 method for class 'rbiom'
slice_sample(
.data,
n,
prop,
by = NULL,
weight_by = NULL,
replace = FALSE,
clone = TRUE,
...
)
Arguments
.data |
An rbiom object, such as from |
... |
For |
.by , by |
< |
.preserve |
Relevant when the |
clone |
Create a copy of |
n , prop |
Provide either A negative value of |
order_by |
< |
with_ties |
Should ties be kept together? The default, |
na_rm |
Should missing values in |
weight_by |
< |
replace |
Should sampling be performed with ( |
Value
An rbiom object.
See Also
Other transformations:
modify_metadata, rarefy(), rarefy_cols(), subset(), with()
Examples
library(rbiom)
# The last 3 samples in the metadata table.
biom <- slice_tail(hmp50, n = 3)
biom$metadata
# The 3 oldest subjects sampled.
biom <- slice_max(hmp50, Age, n = 3)
biom$metadata
# Pick 3 samples at random.
biom <- slice_sample(hmp50, n = 3)
biom$metadata
Speed Ups.
Description
When working with very large datasets, you can make use of these tips and tricks to speed up operations on rbiom objects.
Skip Cloning
Functions that modify rbiom objects, like subset() and rarefy(), will
automatically clone the object before modifying it. This is to make these
functions behave as most R users would expect, but at a performance
trade-off.
Rather than:
biom <- subset(biom, ...)
biom <- rarefy(biom)
Modify biom in place like this:
subset(biom, clone = FALSE, ...)
rarefy(biom, clone = FALSE)
# Or:
biom$metadata %<>% subset(...)
biom$counts %<>% rarefy_cols()
Drop Components
Sequences
Reference sequences for OTUs will be imported along with the rest of your
dataset and stored in $sequences. However, rbiom doesn't currently use
these sequences for anything (except writing them back out with
write_biom() or write_fasta()).
You can delete them from your rbiom object with:
biom$sequences <- NULL
Tree
The phylogenetic reference tree for OTUs is only used for calculating UniFrac distances. If you aren't using UniFrac, the tree can be dropped from the rbiom object with:
biom$tree <- NULL
Alternatively, you can store the tree separately from the rbiom object and provide it to just the functions that use it. For example:
tree <- biom$tree
biom$tree <- NULL
dm <- bdiv_distmat(biom, 'unifrac', tree = tree)
Increase Caching
Caching is enabled by default - up to 20 MB per R session.
For large datasets, increasing the cache size can help. The size is specified in bytes by an R option or environment variable.
options(rbiom.cache_size=200 * 1024 ^ 2) # 200 MB
Sys.setenv(RBIOM_CACHE_SIZE=1024 ^ 3) # 1 GB
You can also specify a cache directory where results can be preserved from one R session to the next.
options(rbiom.cache_dir=tools::R_user_dir("rbiom", "cache"))
Sys.setenv(RBIOM_CACHE_DIR="~/rbiom_cache")
Other quick notes about caching:
- Setting the cache directory to "FALSE" will disable caching.
- R options will override environment variables.
- The key hash algorithm can be set with options(rbiom.cache_hash=rlang::hash).
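Because the cache size is given in bytes, it can help to compute it from a megabyte count and read it back. The sketch below uses only base R's options() and getOption() with the rbiom.cache_size option named above; it does not load rbiom or verify how rbiom consumes the value.

```r
# Cache size is given in bytes; 1 MB is taken as 1024^2 bytes here.
mb  <- 1024^2
old <- options(rbiom.cache_size = 200 * mb)   # remember the previous setting

size <- getOption("rbiom.cache_size")
size        # 209715200
size / mb   # 200

# Restore the previous setting.
options(old)
```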
Summary Layers
The figure-generating functions allow you to display every data point.
However, when you have thousands of data points, rendering every single one
can be slow. Instead, set the layers parameter to use other options.
adiv_boxplot(biom, layers = "bl") # bar, linerange
adiv_corrplot(biom, layers = "tc") # trend, confidence
bdiv_ord_plot(biom, layers = "e") # ellipse
Visualize categorical metadata effects on numeric values.
Description
Visualize categorical metadata effects on numeric values.
Usage
stats_boxplot(
df,
x = NULL,
y = attr(df, "response"),
layers = "x",
stat.by = x,
facet.by = NULL,
colors = TRUE,
shapes = TRUE,
patterns = FALSE,
test = "auto",
flip = FALSE,
stripe = NULL,
ci = "ci",
level = 0.95,
p.adj = "fdr",
p.top = Inf,
outliers = NULL,
xlab.angle = "auto",
p.label = 0.05,
caption = TRUE,
...
)
Arguments
df |
The dataset (data.frame or tibble object). "Dataset fields"
mentioned below should match column names in |
x |
A categorical metadata column name to use for the x-axis. Or
|
y |
A numeric metadata column name to use for the y-axis.
Default: |
layers |
One or more of
|
stat.by |
Dataset field with the statistical groups. Must be
categorical. Default: |
facet.by |
Dataset field(s) to use for faceting. Must be categorical.
Default: |
colors |
How to color the groups. Options are:
See "Aesthetics" section below for additional information.
Default: |
shapes |
Shapes for each group.
Options are similar to |
patterns |
Patterns for each group.
Options are similar to |
test |
Method for computing p-values: |
flip |
Transpose the axes, so that taxa are present as rows instead
of columns. Default: |
stripe |
Shade every other x position. Default: same as flip |
ci |
How to calculate min/max of the crossbar,
errorbar, linerange, and pointrange layers.
Options are: |
level |
The confidence level for calculating a confidence interval.
Default: |
p.adj |
Method to use for multiple comparisons adjustment of
p-values. Run |
p.top |
Only display taxa with the most significant differences in
abundance. If |
outliers |
Show boxplot outliers? |
xlab.angle |
Angle of the labels at the bottom of the plot.
Options are |
p.label |
Minimum adjusted p-value to display on the plot with a bracket.
If a numeric vector with more than one value is
provided, they will be used as breaks for asterisk notation.
Default: |
caption |
Add methodology caption beneath the plot.
Default: |
... |
Additional parameters to pass along to ggplot2 functions.
Prefix a parameter name with a layer name to pass it to only that
layer. For instance, |
Value
A ggplot2 plot. The computed data points, ggplot2 command, stats table, and stats table commands are available as $data, $code, $stats, and $stats$code, respectively.
Aesthetics
All built-in color palettes are colorblind-friendly. The available categorical palette names are: "okabe", "carto", "r4", "polychrome", "tol", "bright", "light", "muted", "vibrant", "tableau", "classic", "alphabet", "tableau20", "kelly", and "fishy".
Patterns are added using the fillpattern R package. Options are "brick", "chevron", "fish", "grid", "herringbone", "hexagon", "octagon", "rain", "saw", "shingle", "rshingle", "stripe", and "wave", optionally abbreviated and/or suffixed with modifiers. For example, "hex10_sm" for the hexagon pattern rotated 10 degrees and shrunk by 2x.
See fillpattern::fill_pattern() for complete documentation of options.
Shapes can be given as per base R: numbers 0 through 17 for various shapes, or the decimal value of an ASCII character (e.g. A-Z = 65:90; a-z = 97:122) to use letters instead of shapes on the plot. Character strings may be used as well.
See Also
Other visualization:
adiv_boxplot(), adiv_corrplot(), bdiv_boxplot(), bdiv_corrplot(), bdiv_heatmap(), bdiv_ord_plot(), plot_heatmap(), rare_corrplot(), rare_multiplot(), rare_stacked(), stats_corrplot(), taxa_boxplot(), taxa_corrplot(), taxa_heatmap(), taxa_stacked()
Examples
library(rbiom)
df <- adiv_table(rarefy(hmp50))
stats_boxplot(df, x = "Body Site")
stats_boxplot(df, x = "Sex", stat.by = "Body Site", layers = "be")
Visualize regression with scatterplots and trendlines.
Description
Visualize regression with scatterplots and trendlines.
Usage
stats_corrplot(
df,
x,
y = attr(df, "response"),
layers = "tc",
stat.by = NULL,
facet.by = NULL,
colors = TRUE,
shapes = TRUE,
test = "emmeans",
fit = "gam",
at = NULL,
level = 0.95,
p.adj = "fdr",
p.top = Inf,
alt = "!=",
mu = 0,
caption = TRUE,
check = FALSE,
...
)
Arguments
df |
The dataset (data.frame or tibble object). "Dataset fields"
mentioned below should match column names in |
x |
Dataset field with the x-axis values. Equivalent to the |
y |
A numeric metadata column name to use for the y-axis.
Default: |
layers |
One or more of
|
stat.by |
Dataset field with the statistical groups. Must be
categorical. Default: |
facet.by |
Dataset field(s) to use for faceting. Must be categorical.
Default: |
colors |
How to color the groups. Options are:
See "Aesthetics" section below for additional information.
Default: |
shapes |
Shapes for each group.
Options are similar to |
test |
Method for computing p-values: |
fit |
How to fit the trendline. |
at |
Position(s) along the x-axis where the means or slopes should be
evaluated. Default: |
level |
The confidence level for calculating a confidence interval.
Default: |
p.adj |
Method to use for multiple comparisons adjustment of
p-values. Run |
p.top |
Only display taxa with the most significant differences in
abundance. If |
alt |
Alternative hypothesis direction. Options are |
mu |
Reference value to test against. Default: |
caption |
Add methodology caption beneath the plot.
Default: |
check |
Generate additional plots to aid in assessing data normality.
Default: |
... |
Additional parameters to pass along to ggplot2 functions.
Prefix a parameter name with a layer name to pass it to only that
layer. For instance, |
Value
A ggplot2 plot. The computed data points, ggplot2 command, stats table, and stats table commands are available as $data, $code, $stats, and $stats$code, respectively.
Aesthetics
All built-in color palettes are colorblind-friendly. The available categorical palette names are: "okabe", "carto", "r4", "polychrome", "tol", "bright", "light", "muted", "vibrant", "tableau", "classic", "alphabet", "tableau20", "kelly", and "fishy".
Shapes can be given as per base R: numbers 0 through 17 for various shapes, or the decimal value of an ASCII character (e.g. A-Z = 65:90; a-z = 97:122) to use letters instead of shapes on the plot. Character strings may be used as well.
See Also
Other visualization:
adiv_boxplot(), adiv_corrplot(), bdiv_boxplot(), bdiv_corrplot(), bdiv_heatmap(), bdiv_ord_plot(), plot_heatmap(), rare_corrplot(), rare_multiplot(), rare_stacked(), stats_boxplot(), taxa_boxplot(), taxa_corrplot(), taxa_heatmap(), taxa_stacked()
Examples
library(rbiom)
biom <- subset(hmp50, `Body Site` %in% c('Saliva', 'Stool'))
df <- adiv_table(rarefy(biom))
stats_corrplot(df, "age", stat.by = "body")
stats_corrplot(
  df       = df,
  x        = "Age",
  stat.by  = "Body Site",
  facet.by = "Sex",
  layers   = "trend" )
Run non-parametric statistics on a data.frame.
Description
A simple interface to lower-level statistics functions, including stats::wilcox.test(), stats::kruskal.test(), emmeans::emmeans(), and emmeans::emtrends().
Usage
stats_table(
df,
regr = NULL,
resp = attr(df, "response"),
stat.by = NULL,
split.by = NULL,
test = "emmeans",
fit = "gam",
at = NULL,
level = 0.95,
alt = "!=",
mu = 0,
p.adj = "fdr"
)
Arguments
df |
The dataset (data.frame or tibble object). "Dataset fields"
mentioned below should match column names in |
regr |
Dataset field with the x-axis (independent; predictive)
values. Must be numeric. Default: |
resp |
Dataset field with the y-axis (dependent; response) values,
such as taxa abundance or alpha diversity.
Default: |
stat.by |
Dataset field with the statistical groups. Must be
categorical. Default: |
split.by |
Dataset field(s) that the data should be split by prior to
any calculations. Must be categorical. Default: |
test |
Method for computing p-values: |
fit |
How to fit the trendline. |
at |
Position(s) along the x-axis where the means or slopes should be
evaluated. Default: |
level |
The confidence level for calculating a confidence interval.
Default: |
alt |
Alternative hypothesis direction. Options are |
mu |
Reference value to test against. Default: |
p.adj |
Method to use for multiple comparisons adjustment of
p-values. Run |
Value
A tibble data.frame with fields from the table below. This tibble object provides the $code operator to print the R code used to generate the statistics.
Field | Description |
.mean | Estimated marginal mean. See emmeans::emmeans() . |
.mean.diff | Difference in means. |
.slope | Trendline slope. See emmeans::emtrends() . |
.slope.diff | Difference in slopes. |
.h1 | Alternative hypothesis. |
.p.val | Probability of observing a result at least this extreme if the null hypothesis is true. |
.adj.p | .p.val after adjusting for multiple comparisons. |
.effect.size | Effect size. See emmeans::eff_size() . |
.lower | Confidence interval lower bound. |
.upper | Confidence interval upper bound. |
.se | Standard error. |
.n | Number of samples. |
.df | Degrees of freedom. |
.stat | Wilcoxon or Kruskal-Wallis rank sum statistic. |
.t.ratio | .mean / .se |
.r.sqr | Percent of variation explained by the model. |
.adj.r | .r.sqr , taking degrees of freedom into account. |
.aic | Akaike Information Criterion (predictive models). |
.bic | Bayesian Information Criterion (descriptive models). |
.loglik | Log-likelihood goodness-of-fit score. |
.fit.p | P-value for observing this fit by chance. |
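To illustrate how .adj.p relates to .p.val, the sketch below applies the false discovery rate correction with base R's stats::p.adjust(). It assumes the p.adj argument names a method accepted by that function ("fdr" is one of them); the p-values are made up for illustration.

```r
# Hypothetical raw p-values from four comparisons.
p_val <- c(0.001, 0.01, 0.04, 0.5)

# "fdr" is an alias for the Benjamini-Hochberg method in stats::p.adjust().
adj_p <- p.adjust(p_val, method = "fdr")

# Adjusted values are never smaller than the raw ones.
data.frame(.p.val = p_val, .adj.p = adj_p)
```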
See Also
Other stats_tables: adiv_stats(), bdiv_stats(), distmat_stats(), taxa_stats()
Examples
library(rbiom)
biom <- rarefy(hmp50)
df <- taxa_table(biom, rank = "Family")
stats_table(df, stat.by = "Body Site")[,1:6]
df <- adiv_table(biom)
stats_table(df, stat.by = "Sex", split.by = "Body Site")[,1:7]
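For readers who want to see the lower-level tests named in the Description, here is a minimal base R sketch on a toy data frame. The column names and values are hypothetical, and this is not how stats_table() is implemented internally; it only shows the two rank-based tests on their own.

```r
# Toy response values for three groups (no ties, so exact tests apply).
df <- data.frame(
  resp  = c(1.2, 2.3, 3.1, 4.8, 5.5,
            6.4, 7.9, 8.2, 9.6, 10.1,
            3.3, 4.1, 6.0, 7.2, 8.8),
  group = rep(c("A", "B", "C"), each = 5)
)

# Two groups: Wilcoxon rank sum test.
two <- df[df$group %in% c("A", "B"), ]
wt  <- wilcox.test(resp ~ group, data = two)

# Three or more groups: Kruskal-Wallis rank sum test.
kt <- kruskal.test(resp ~ group, data = df)

c(wilcox = wt$p.value, kruskal = kt$p.value)
```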
Subset an rbiom object by sample names, OTU names, metadata, or taxonomy.
Description
Dropping samples or OTUs will lead to observations being removed from the OTU matrix (biom$counts). OTUs and samples with zero observations are automatically removed from the rbiom object.
Usage
## S3 method for class 'rbiom'
subset(x, subset, clone = TRUE, ...)
## S3 method for class 'rbiom'
x[i, j, ..., clone = TRUE, drop = FALSE]
## S3 method for class 'rbiom'
na.omit(object, fields = ".all", clone = TRUE, ...)
subset_taxa(x, subset, clone = TRUE, ...)
Arguments
x |
An rbiom object, such as from |
subset |
Logical expression for rows to keep. See |
clone |
Create a copy of |
... |
Not used. |
i , j |
The sample or OTU names to keep. Or a logical/integer vector
indicating which sample names from |
drop |
Not used |
object |
An rbiom object, such as from |
fields |
Which metadata field(s) to check for |
Value
An rbiom object.
See Also
Other transformations: modify_metadata, rarefy(), rarefy_cols(), slice_metadata, with()
Examples
library(rbiom)
library(dplyr)
# Subset to specific samples
biom <- hmp50[c('HMP20', 'HMP42', 'HMP12')]
biom$metadata
# Subset to specific OTUs
biom <- hmp50[c('LtbAci52', 'UncO2012'),] # <- Trailing ,
biom$taxonomy
# Subset to specific samples and OTUs
biom <- hmp50[c('LtbAci52', 'UncO2012'), c('HMP20', 'HMP42', 'HMP12')]
as.matrix(biom)
# Subset samples according to metadata
biom <- subset(hmp50, `Body Site` %in% c('Saliva') & Age < 25)
biom$metadata
# Subset OTUs according to taxonomy
biom <- subset_taxa(hmp50, Phylum == 'Cyanobacteria')
biom$taxonomy
# Remove samples with NA metadata values
biom <- mutate(hmp50, BS2 = na_if(`Body Site`, 'Saliva'))
biom$metadata
biom <- na.omit(biom)
biom$metadata
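The Description notes that OTUs and samples left with zero observations are dropped after subsetting. The effect can be sketched on a plain count matrix with base R; the matrix and its names are hypothetical, and this is an illustration of the idea rather than rbiom's own code.

```r
# Toy OTU-by-sample count matrix (names are hypothetical).
counts <- matrix(
  c(5, 0, 0,
    0, 3, 0,
    2, 2, 7),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("OTU1", "OTU2", "OTU3"), c("S1", "S2", "S3"))
)

# Keep two samples, then drop OTUs and samples left with zero observations.
kept <- counts[, c("S1", "S3"), drop = FALSE]
kept <- kept[rowSums(kept) > 0, colSums(kept) > 0, drop = FALSE]
kept
```

Here OTU2 only occurred in the removed sample S2, so it disappears along with it.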
Visualize BIOM data with boxplots.
Description
Visualize BIOM data with boxplots.
Usage
taxa_boxplot(
biom,
x = NULL,
rank = -1,
layers = "x",
taxa = 6,
unc = "singly",
other = FALSE,
p.top = Inf,
stat.by = x,
facet.by = NULL,
colors = TRUE,
shapes = TRUE,
patterns = FALSE,
flip = FALSE,
stripe = NULL,
ci = "ci",
level = 0.95,
p.adj = "fdr",
outliers = NULL,
xlab.angle = "auto",
p.label = 0.05,
transform = "none",
y.transform = "sqrt",
caption = TRUE,
...
)
Arguments
biom |
An rbiom object, such as from |
x |
A categorical metadata column name to use for the x-axis. Or
|
rank |
What rank(s) of taxa to display. E.g. |
layers |
One or more of
|
taxa |
Which taxa to display. An integer value will show the top n
most abundant taxa. A value 0 <= n < 1 will show any taxa with that
mean abundance or greater (e.g. |
unc |
How to handle unclassified, uncultured, and similarly ambiguous taxa names. Options are:
Abbreviations are allowed. Default: |
other |
Sum all non-itemized taxa into an "Other" taxon. When
|
p.top |
Only display taxa with the most significant differences in
abundance. If |
stat.by |
Dataset field with the statistical groups. Must be
categorical. Default: |
facet.by |
Dataset field(s) to use for faceting. Must be categorical.
Default: |
colors |
How to color the groups. Options are:
See "Aesthetics" section below for additional information.
Default: |
shapes |
Shapes for each group.
Options are similar to |
patterns |
Patterns for each group.
Options are similar to |
flip |
Transpose the axes, so that taxa are present as rows instead
of columns. Default: |
stripe |
Shade every other x position. Default: same as flip |
ci |
How to calculate min/max of the crossbar,
errorbar, linerange, and pointrange layers.
Options are: |
level |
The confidence level for calculating a confidence interval.
Default: |
p.adj |
Method to use for multiple comparisons adjustment of
p-values. Run |
outliers |
Show boxplot outliers? |
xlab.angle |
Angle of the labels at the bottom of the plot.
Options are |
p.label |
Minimum adjusted p-value to display on the plot with a bracket.
If a numeric vector with more than one value is
provided, they will be used as breaks for asterisk notation.
Default: |
transform |
Transformation to apply. Options are:
|
y.transform |
The transformation to apply to the y-axis. Visualizing differences of both high- and low-abundance taxa is best done with a non-linear axis. Options are:
These methods allow visualization of both high- and low-abundance
taxa simultaneously, without complaint about 'zero' count
observations. Default: |
caption |
Add methodology caption beneath the plot.
Default: |
... |
Additional parameters to pass along to ggplot2 functions.
Prefix a parameter name with a layer name to pass it to only that
layer. For instance, |
Value
A ggplot2 plot. The computed data points, ggplot2 command, stats table, and stats table commands are available as $data, $code, $stats, and $stats$code, respectively.
Aesthetics
All built-in color palettes are colorblind-friendly. The available categorical palette names are: "okabe", "carto", "r4", "polychrome", "tol", "bright", "light", "muted", "vibrant", "tableau", "classic", "alphabet", "tableau20", "kelly", and "fishy".
Patterns are added using the fillpattern R package. Options are "brick", "chevron", "fish", "grid", "herringbone", "hexagon", "octagon", "rain", "saw", "shingle", "rshingle", "stripe", and "wave", optionally abbreviated and/or suffixed with modifiers. For example, "hex10_sm" for the hexagon pattern rotated 10 degrees and shrunk by 2x. See fillpattern::fill_pattern() for complete documentation of options.
Shapes can be given as in base R: numbers 0 through 17 for various shapes, or the decimal value of an ASCII character (e.g. A-Z = 65:90; a-z = 97:122) to use letters instead of shapes on the plot. Character strings may be used as well.
See Also
Other taxa_abundance: sample_sums(), taxa_clusters(), taxa_corrplot(), taxa_heatmap(), taxa_stacked(), taxa_stats(), taxa_sums(), taxa_table()
Other visualization: adiv_boxplot(), adiv_corrplot(), bdiv_boxplot(), bdiv_corrplot(), bdiv_heatmap(), bdiv_ord_plot(), plot_heatmap(), rare_corrplot(), rare_multiplot(), rare_stacked(), stats_boxplot(), stats_corrplot(), taxa_corrplot(), taxa_heatmap(), taxa_stacked()
Examples
library(rbiom)
biom <- rarefy(hmp50)
taxa_boxplot(biom, stat.by = "Body Site", stripe = TRUE)
taxa_boxplot(biom, layers = "bed", rank = c("Phylum", "Genus"), flip = TRUE)
taxa_boxplot(
  biom    = subset(biom, `Body Site` %in% c('Saliva', 'Stool')),
  taxa    = 3,
  layers  = "ps",
  stat.by = "Body Site",
  colors  = c('Saliva' = "blue", 'Stool' = "red") )
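The y.transform argument defaults to "sqrt", and its description mentions handling 'zero' count observations. The base R sketch below shows why a square-root scale tolerates zeros where a plain logarithm does not. The abundance values are hypothetical, and log10 is used only as a contrast, not as a statement about which options the argument accepts.

```r
# Hypothetical abundances spanning several orders of magnitude, including a zero.
abundance <- c(0, 1, 10, 100, 10000)

sqrt_scaled  <- sqrt(abundance)    # zero stays at zero
log10_scaled <- log10(abundance)   # zero becomes -Inf

# Which transformed values can be placed on a plot axis?
data.frame(
  abundance  = abundance,
  sqrt_ok    = is.finite(sqrt_scaled),
  log10_ok   = is.finite(log10_scaled)
)
```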
Cluster samples by taxa abundances using k-means.
Description
Cluster samples by taxa abundances using k-means.
Usage
taxa_clusters(biom, rank = ".otu", k = 5, ...)
Arguments
biom |
An rbiom object, such as from |
rank |
Which taxa rank to use. E.g. |
k |
Number of clusters. Default: |
... |
Passed on to |
Value
A numeric factor assigning samples to clusters.
See Also
Other taxa_abundance: sample_sums(), taxa_boxplot(), taxa_corrplot(), taxa_heatmap(), taxa_stacked(), taxa_stats(), taxa_sums(), taxa_table()
Other clustering: bdiv_clusters()
Examples
library(rbiom)
biom <- rarefy(hmp50)
biom$metadata$otu_cluster <- taxa_clusters(biom)
pull(biom, 'otu_cluster')[1:10]
bdiv_ord_plot(biom, layers = "p", stat.by = "otu_cluster")
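The general idea of k-means clustering on abundances can be sketched with base R's stats::kmeans(). This assumes, without confirmation from the text above, that a comparable call sits underneath taxa_clusters(); the toy matrix, with samples in rows, is hypothetical.

```r
# Toy sample-by-taxon abundance matrix: samples in rows (hypothetical data).
abund <- rbind(
  S1 = c(10, 0, 1),  S2 = c(9, 1, 0),  S3 = c(11, 0, 2),
  S4 = c(0, 10, 8),  S5 = c(1, 9, 9),  S6 = c(0, 11, 10)
)
colnames(abund) <- c("TaxonA", "TaxonB", "TaxonC")

set.seed(0)  # k-means starts from randomly chosen centers
km <- stats::kmeans(abund, centers = 2, nstart = 10)

# One cluster label per sample, stored as a factor.
clusters <- factor(km$cluster)
clusters
```

Cluster numbers are arbitrary labels; only the grouping of samples is meaningful.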
Visualize taxa abundance with scatterplots and trendlines.
Description
Visualize taxa abundance with scatterplots and trendlines.
Usage
taxa_corrplot(
biom,
x,
rank = -1,
layers = "tc",
taxa = 6,
lineage = FALSE,
unc = "singly",
other = FALSE,
stat.by = NULL,
facet.by = NULL,
colors = TRUE,
shapes = TRUE,
test = "emmeans",
fit = "gam",
at = NULL,
level = 0.95,
p.adj = "fdr",
transform = "none",
ties = "random",
seed = 0,
alt = "!=",
mu = 0,
caption = TRUE,
check = FALSE,
...
)
Arguments
biom |
An rbiom object, such as from |
x |
Dataset field with the x-axis values. Equivalent to the |
rank |
What rank(s) of taxa to display. E.g. |
layers |
One or more of
|
taxa |
Which taxa to display. An integer value will show the top n
most abundant taxa. A value 0 <= n < 1 will show any taxa with that
mean abundance or greater (e.g. |
lineage |
Include all ranks in the name of the taxa. For instance,
setting to |
unc |
How to handle unclassified, uncultured, and similarly ambiguous taxa names. Options are:
Abbreviations are allowed. Default: |
other |
Sum all non-itemized taxa into an "Other" taxon. When
|
stat.by |
Dataset field with the statistical groups. Must be
categorical. Default: |
facet.by |
Dataset field(s) to use for faceting. Must be categorical.
Default: |
colors |
How to color the groups. Options are:
See "Aesthetics" section below for additional information.
Default: |
shapes |
Shapes for each group.
Options are similar to |
test |
Method for computing p-values: |
fit |
How to fit the trendline. |
at |
Position(s) along the x-axis where the means or slopes should be
evaluated. Default: |
level |
The confidence level for calculating a confidence interval.
Default: |
p.adj |
Method to use for multiple comparisons adjustment of
p-values. Run |
transform |
Transformation to apply. Options are:
|
ties |
When |
seed |
Random seed for permutations. Must be a non-negative integer.
Default: |
alt |
Alternative hypothesis direction. Options are |
mu |
Reference value to test against. Default: |
caption |
Add methodology caption beneath the plot.
Default: |
check |
Generate additional plots to aid in assessing data normality.
Default: |
... |
Additional parameters to pass along to ggplot2 functions.
Prefix a parameter name with a layer name to pass it to only that
layer. For instance, |
Value
A ggplot2 plot. The computed data points, ggplot2 command, stats table, and stats table commands are available as $data, $code, $stats, and $stats$code, respectively.
Aesthetics
All built-in color palettes are colorblind-friendly. The available categorical palette names are: "okabe", "carto", "r4", "polychrome", "tol", "bright", "light", "muted", "vibrant", "tableau", "classic", "alphabet", "tableau20", "kelly", and "fishy".
Shapes can be given as in base R: numbers 0 through 17 for various shapes, or the decimal value of an ASCII character (e.g. A-Z = 65:90; a-z = 97:122) to use letters instead of shapes on the plot. Character strings may be used as well.
See Also
Other taxa_abundance: sample_sums(), taxa_boxplot(), taxa_clusters(), taxa_heatmap(), taxa_stacked(), taxa_stats(), taxa_sums(), taxa_table()
Other visualization: adiv_boxplot(), adiv_corrplot(), bdiv_boxplot(), bdiv_corrplot(), bdiv_heatmap(), bdiv_ord_plot(), plot_heatmap(), rare_corrplot(), rare_multiplot(), rare_stacked(), stats_boxplot(), stats_corrplot(), taxa_boxplot(), taxa_heatmap(), taxa_stacked()
Examples
library(rbiom)
biom <- rarefy(subset(hmp50, `Body Site` %in% c('Buccal mucosa', 'Saliva')))
taxa_corrplot(biom, x = "BMI", stat.by = "Body Site", taxa = 'Streptococcus')
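The level argument sets the confidence level for intervals around the fitted trend. As a rough illustration, the sketch below estimates a slope and its 95% confidence interval with base R. It uses a plain linear model as a simpler stand-in for the default "gam" fit, and the data are hypothetical.

```r
# Hypothetical predictor and response with a roughly linear relationship.
x <- 1:10
y <- 2 * x + 1 + c(0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.4, -0.3, 0.1)

fit <- lm(y ~ x)                          # linear stand-in for the trendline

slope <- coef(fit)[["x"]]                 # estimated slope
ci    <- confint(fit, "x", level = 0.95)  # confidence interval for the slope

c(slope = slope, lower = ci[1, 1], upper = ci[1, 2])
```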
Display taxa abundances as a heatmap.
Description
Display taxa abundances as a heatmap.
Usage
taxa_heatmap(
biom,
rank = -1,
taxa = 6,
tracks = NULL,
grid = "bilbao",
other = FALSE,
unc = "singly",
lineage = FALSE,
label = TRUE,
label_size = NULL,
rescale = "none",
trees = TRUE,
clust = "complete",
dist = "euclidean",
asp = 1,
tree_height = 10,
track_height = 10,
legend = "right",
title = TRUE,
xlab.angle = "auto",
...
)
Arguments
biom |
An rbiom object, such as from |
rank |
What rank(s) of taxa to display. E.g. |
taxa |
Which taxa to display. An integer value will show the top n
most abundant taxa. A value 0 <= n < 1 will show any taxa with that
mean abundance or greater (e.g. |
tracks |
A character vector of metadata fields to display as tracks
at the top of the plot. Or, a list as expected by the |
grid |
Color palette name, or a list as expected |
other |
Sum all non-itemized taxa into an "Other" taxon. When
|
unc |
How to handle unclassified, uncultured, and similarly ambiguous taxa names. Options are:
Abbreviations are allowed. Default: |
lineage |
Include all ranks in the name of the taxa. For instance,
setting to |
label |
Label the matrix rows and columns. You can supply a list
or logical vector of length two to control row labels and column
labels separately, for example
|
label_size |
The font size to use for the row and column labels. You
can supply a numeric vector of length two to control row label sizes
and column label sizes separately, for example
|
rescale |
Rescale rows or columns to all have a common min/max.
Options: |
trees |
Draw a dendrogram for rows (left) and columns (top). You can
supply a list or logical vector of length two to control the row tree
and column tree separately, for example
|
clust |
Clustering algorithm for reordering the rows and columns by
similarity. You can supply a list or character vector of length two to
control the row and column clustering separately, for example
Default: |
dist |
Distance algorithm to use when reordering the rows and columns
by similarity. You can supply a list or character vector of length
two to control the row and column clustering separately, for example
Default: |
asp |
Aspect ratio (height/width) for entire grid.
Default: |
tree_height , track_height |
The height of the dendrogram or annotation
tracks as a percentage of the overall grid size. Use a numeric vector
of length two to assign |
legend |
Where to place the legend. Options are: |
title |
Plot title. Set to |
xlab.angle |
Angle of the labels at the bottom of the plot.
Options are |
... |
Additional arguments to pass on to ggplot2::theme(). |
Value
A ggplot2 plot. The computed data points and ggplot command are available as $data and $code, respectively.
Annotation Tracks
Metadata can be displayed as colored tracks above the heatmap. Common use cases are provided below, with more thorough documentation available at https://cmmr.github.io/rbiom .
## Categorical ----------------------------
tracks = "Body Site"
tracks = list('Body Site' = "bright")
tracks = list('Body Site' = c('Stool' = "blue", 'Saliva' = "green"))

## Numeric --------------------------------
tracks = "Age"
tracks = list('Age' = "reds")

## Multiple Tracks ------------------------
tracks = c("Body Site", "Age")
tracks = list('Body Site' = "bright", 'Age' = "reds")
tracks = list(
  'Body Site' = c('Stool' = "blue", 'Saliva' = "green"),
  'Age'       = list('colors' = "reds") )
The following entries in the track definitions are understood:
colors - A pre-defined palette name or custom set of colors to map to.
range - The c(min,max) to use for scale values.
label - Label for this track. Defaults to the name of this list element.
side - Options are "top" (default) or "left".
na.color - The color to use for NA values.
bins - Bin a gradient into this many bins/steps.
guide - A list of arguments for guide_colorbar() or guide_legend().
All built-in color palettes are colorblind-friendly.
Categorical palette names: "okabe", "carto", "r4", "polychrome", "tol", "bright", "light", "muted", "vibrant", "tableau", "classic", "alphabet", "tableau20", "kelly", and "fishy".
Numeric palette names: "reds", "oranges", "greens", "purples", "grays", "acton", "bamako", "batlow", "bilbao", "buda", "davos", "devon", "grayC", "hawaii", "imola", "lajolla", "lapaz", "nuuk", "oslo", "tokyo", "turku", "bam", "berlin", "broc", "cork", "lisbon", "roma", "tofino", "vanimo", and "vik".
See Also
Other taxa_abundance: sample_sums(), taxa_boxplot(), taxa_clusters(), taxa_corrplot(), taxa_stacked(), taxa_stats(), taxa_sums(), taxa_table()
Other visualization: adiv_boxplot(), adiv_corrplot(), bdiv_boxplot(), bdiv_corrplot(), bdiv_heatmap(), bdiv_ord_plot(), plot_heatmap(), rare_corrplot(), rare_multiplot(), rare_stacked(), stats_boxplot(), stats_corrplot(), taxa_boxplot(), taxa_corrplot(), taxa_stacked()
Examples
library(rbiom)
# Keep and rarefy the 10 most deeply sequenced samples.
hmp10 <- rarefy(hmp50, n = 10)
taxa_heatmap(hmp10, rank = "Phylum", tracks = "Body Site")
taxa_heatmap(hmp10, rank = "Genus", tracks = c("sex", "bo"))
taxa_heatmap(hmp10, rank = "Phylum", tracks = list(
  'Sex'       = list(colors = c(m = "#0000FF", f = "violetred")),
  'Body Site' = list(colors = "muted", label = "Source") ))
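The clust and dist arguments control how rows and columns are reordered by similarity. The base R sketch below shows that reordering on a toy matrix, assuming the default values "complete" and "euclidean" carry the same meaning as in stats::hclust() and stats::dist(). The matrix is hypothetical.

```r
# Toy taxa-by-sample matrix (hypothetical values).
m <- matrix(
  c( 1,  2,  1,
     9, 10,  9,
     2,  1,  2,
    10,  9, 10),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("Taxon", 1:4), paste0("S", 1:3))
)

# Cluster rows with Euclidean distance and complete linkage.
hc <- hclust(dist(m, method = "euclidean"), method = "complete")

# Reordering by hc$order places similar rows next to each other.
m[hc$order, ]
```

The same call on t(m) would give the column ordering.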
Map OTU names to taxa names at a given rank.
Description
Map OTU names to taxa names at a given rank.
Usage
taxa_map(
biom,
rank = NULL,
taxa = Inf,
unc = "singly",
lineage = FALSE,
other = FALSE
)
Arguments
biom |
An rbiom object, such as from |
rank |
When |
taxa |
Which taxa to display. An integer value will show the top n
most abundant taxa. A value 0 <= n < 1 will show any taxa with that
mean abundance or greater (e.g. |
unc |
How to handle unclassified, uncultured, and similarly ambiguous taxa names. Options are:
Abbreviations are allowed. Default: |
lineage |
Include all ranks in the name of the taxa. For instance,
setting to |
other |
Sum all non-itemized taxa into an "Other" taxon. When
|
Value
A tibble data.frame when rank=NULL, or a character vector named with the OTU names.
See Also
pull.rbiom()
Examples
library(rbiom)
library(dplyr, warn.conflicts = FALSE)
# In $taxonomy, .otu is the first column (like a row identifier) -----
hmp50$taxonomy %>% head(4)
# In taxa_map, .otu is the last column (most precise rank) -----------
taxa_map(hmp50) %>% head(4)
# Generate an OTU to Genus mapping -----------------------------------
taxa_map(hmp50, "Genus") %>% head(4)
# Sometimes taxonomic names are incomplete ----------------------------
otus <- c('GemAsacc', 'GcbBacte', 'Unc58411')
taxa_map(hmp50, unc = "asis") %>% filter(.otu %in% otus) %>% select(Phylum:.otu)
# rbiom can replace them with unique placeholders ---------------------
taxa_map(hmp50, unc = "singly") %>% filter(.otu %in% otus) %>% select(Class:.otu)
# Or collapse them into groups ----------------------------------------
taxa_map(hmp50, unc = "grouped") %>% filter(.otu %in% otus) %>% select(Class:Genus)
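A character vector named by OTU, like the one returned for a single rank, is convenient for collapsing OTU counts to that rank. The sketch below does this with base R's rowsum(); the map and the count matrix are hypothetical stand-ins, not output from rbiom.

```r
# Hypothetical OTU-to-genus map: a character vector named by OTU.
otu_to_genus <- c(OTU1 = "Streptococcus", OTU2 = "Streptococcus", OTU3 = "Bacteroides")

# Toy OTU-by-sample counts.
counts <- matrix(
  c(4, 1,
    6, 0,
    3, 8),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("OTU1", "OTU2", "OTU3"), c("S1", "S2"))
)

# Sum the OTU rows that share a genus.
genus_counts <- rowsum(counts, group = otu_to_genus[rownames(counts)])
genus_counts
```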
Display taxa abundances as a stacked bar graph.
Description
Display taxa abundances as a stacked bar graph.
Usage
taxa_stacked(
biom,
rank = -1,
taxa = 6,
colors = TRUE,
patterns = FALSE,
label.by = NULL,
order.by = NULL,
facet.by = NULL,
dist = "euclidean",
clust = "complete",
other = TRUE,
unc = "singly",
lineage = FALSE,
xlab.angle = 90,
...
)
Arguments
biom |
An rbiom object, such as from |
rank |
What rank(s) of taxa to display. E.g. |
taxa |
Which taxa to display. An integer value will show the top n
most abundant taxa. A value 0 <= n < 1 will show any taxa with that
mean abundance or greater (e.g. |
colors , patterns |
A character vector of colors or patterns to use in
the graph. A named character vector can be used to map taxon names to
specific colors or patterns. Set to |
label.by , order.by |
What metadata column to use for labeling and/or
sorting the samples across the x-axis. Set |
facet.by |
Dataset field(s) to use for faceting. Must be categorical.
Default: |
dist , clust |
Distance ( |
other |
Sum all non-itemized taxa into an "Other" taxon. When
|
unc |
How to handle unclassified, uncultured, and similarly ambiguous taxa names. Options are:
Abbreviations are allowed. Default: |
lineage |
Include all ranks in the name of the taxa. For instance,
setting to |
xlab.angle |
Angle of the labels at the bottom of the plot.
Options are |
... |
Parameters for underlying functions. Prefixing parameter names with a layer name ensures that a particular parameter is passed to, and only to, that layer. |
Value
A ggplot2 plot. The computed data points and ggplot command are available as $data and $code, respectively.
See Also
Other taxa_abundance: sample_sums(), taxa_boxplot(), taxa_clusters(), taxa_corrplot(), taxa_heatmap(), taxa_stats(), taxa_sums(), taxa_table()
Other visualization: adiv_boxplot(), adiv_corrplot(), bdiv_boxplot(), bdiv_corrplot(), bdiv_heatmap(), bdiv_ord_plot(), plot_heatmap(), rare_corrplot(), rare_multiplot(), rare_stacked(), stats_boxplot(), stats_corrplot(), taxa_boxplot(), taxa_corrplot(), taxa_heatmap()
Examples
library(rbiom)
biom <- rarefy(hmp50)
taxa_stacked(biom, rank="Phylum")
taxa_stacked(biom, rank = "genus", facet.by = "body site")
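The taxa and other arguments together decide which taxa get their own bar segment and what happens to the rest. A minimal base R sketch of that idea follows: keep the top n taxa and sum the remainder into an "Other" row. The matrix is hypothetical, and the conversion to proportions is an assumption about how a stacked bar might display the result.

```r
# Toy taxa-by-sample counts (hypothetical values).
counts <- matrix(
  c(50, 40,
    30, 35,
    15, 20,
     5,  5),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("Taxon", 1:4), c("S1", "S2"))
)

n_top <- 2
top   <- names(sort(rowSums(counts), decreasing = TRUE))[seq_len(n_top)]

# Keep the top taxa and sum everything else into a single "Other" row.
collapsed <- rbind(
  counts[top, , drop = FALSE],
  Other = colSums(counts[!rownames(counts) %in% top, , drop = FALSE])
)

# Convert to per-sample proportions.
proportions <- sweep(collapsed, 2, colSums(collapsed), "/")
proportions
```

Because the remainder is summed rather than discarded, each sample's column still adds up to its original total.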
Test taxa abundances for associations with metadata.
Description
A convenience wrapper for taxa_table() + stats_table().
Usage
taxa_stats(
biom,
regr = NULL,
stat.by = NULL,
rank = -1,
taxa = 6,
lineage = FALSE,
unc = "singly",
other = FALSE,
split.by = NULL,
transform = "none",
test = "emmeans",
fit = "gam",
at = NULL,
level = 0.95,
alt = "!=",
mu = 0,
p.adj = "fdr"
)
Arguments
biom |
An rbiom object, such as from |
regr |
Dataset field with the x-axis (independent; predictive)
values. Must be numeric. Default: |
stat.by |
Dataset field with the statistical groups. Must be
categorical. Default: |
rank |
What rank(s) of taxa to display. E.g. |
taxa |
Which taxa to display. An integer value will show the top n
most abundant taxa. A value 0 <= n < 1 will show any taxa with that
mean abundance or greater (e.g. |
lineage |
Include all ranks in the name of the taxa. For instance,
setting to |
unc |
How to handle unclassified, uncultured, and similarly ambiguous taxa names. Options are:
Abbreviations are allowed. Default: |
other |
Sum all non-itemized taxa into an "Other" taxon. When
|
split.by |
Dataset field(s) that the data should be split by prior to
any calculations. Must be categorical. Default: |
transform |
Transformation to apply. Options are:
|
test |
Method for computing p-values: |
fit |
How to fit the trendline. |
at |
Position(s) along the x-axis where the means or slopes should be
evaluated. Default: |
level |
The confidence level for calculating a confidence interval.
Default: |
alt |
Alternative hypothesis direction. Options are |
mu |
Reference value to test against. Default: |
p.adj |
Method to use for multiple comparisons adjustment of
p-values. Run |
Value
A tibble data.frame with fields from the table below. This tibble object provides the $code operator to print the R code used to generate the statistics.
Field | Description |
.mean | Estimated marginal mean. See emmeans::emmeans() . |
.mean.diff | Difference in means. |
.slope | Trendline slope. See emmeans::emtrends() . |
.slope.diff | Difference in slopes. |
.h1 | Alternative hypothesis. |
.p.val | Probability of observing a result at least this extreme if the null hypothesis is true. |
.adj.p | .p.val after adjusting for multiple comparisons. |
.effect.size | Effect size. See emmeans::eff_size() . |
.lower | Confidence interval lower bound. |
.upper | Confidence interval upper bound. |
.se | Standard error. |
.n | Number of samples. |
.df | Degrees of freedom. |
.stat | Wilcoxon or Kruskal-Wallis rank sum statistic. |
.t.ratio | .mean / .se |
.r.sqr | Percent of variation explained by the model. |
.adj.r | .r.sqr , taking degrees of freedom into account. |
.aic | Akaike Information Criterion (predictive models). |
.bic | Bayesian Information Criterion (descriptive models). |
.loglik | Log-likelihood goodness-of-fit score. |
.fit.p | P-value for observing this fit by chance. |
See Also
Other taxa_abundance: sample_sums(), taxa_boxplot(), taxa_clusters(), taxa_corrplot(), taxa_heatmap(), taxa_stacked(), taxa_sums(), taxa_table()
Other stats_tables: adiv_stats(), bdiv_stats(), distmat_stats(), stats_table()
Examples
library(rbiom)
biom <- rarefy(hmp50)
taxa_stats(biom, stat.by = "Body Site", rank = "Family")[,1:6]
Get summary taxa abundances.
Description
Get summary taxa abundances.
Usage
taxa_sums(
biom,
rank = -1,
sort = NULL,
lineage = FALSE,
unc = "singly",
transform = "none"
)
taxa_means(
biom,
rank = -1,
sort = NULL,
lineage = FALSE,
unc = "singly",
transform = "none"
)
taxa_apply(
biom,
FUN,
rank = -1,
sort = NULL,
lineage = FALSE,
unc = "singly",
transform = "none",
...
)
Arguments
biom |
An rbiom object, such as from |
rank |
What rank(s) of taxa to display. E.g. |
sort |
Sort the result. Options: |
lineage |
Include all ranks in the name of the taxa. For instance,
setting to |
unc |
How to handle unclassified, uncultured, and similarly ambiguous taxa names. Options are:
Abbreviations are allowed. Default: |
transform |
Transformation to apply. Options are:
|
FUN |
The function to apply to each row of the |
... |
Optional arguments to |
Value
For taxa_sums and taxa_means, a named numeric vector. For taxa_apply, a named vector or list with the results of FUN. The names are the taxa IDs.
See Also
Other taxa_abundance: sample_sums(), taxa_boxplot(), taxa_clusters(), taxa_corrplot(), taxa_heatmap(), taxa_stacked(), taxa_stats(), taxa_table()
Examples
library(rbiom)
taxa_sums(hmp50) %>% head(4)
taxa_means(hmp50, 'Family') %>% head(5)
taxa_apply(hmp50, max) %>% head(5)
taxa_apply(hmp50, fivenum) %>% head(5)
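The per-taxon summaries above behave much like applying a function across the rows of a taxa-by-sample matrix. The sketch below shows this with base R's apply() on a hypothetical matrix; it illustrates the idea only and does not reproduce rbiom's return types exactly.

```r
# Toy taxa-by-sample matrix (hypothetical values).
m <- matrix(
  c(0, 4, 8,
    2, 2, 2,
    9, 1, 5),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("TaxonA", "TaxonB", "TaxonC"), c("S1", "S2", "S3"))
)

row_sums  <- apply(m, 1, sum)    # one total per taxon
row_means <- apply(m, 1, mean)   # one mean per taxon
row_max   <- apply(m, 1, max)    # one maximum per taxon

# A function that returns several values yields one column per taxon.
row_five <- apply(m, 1, fivenum)
row_five[, "TaxonA"]
```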
Taxa abundances per sample.
Description
taxa_matrix() - Accepts a single rank and returns a matrix.
taxa_table() - Can accept more than one rank and returns a tibble data.frame.
Usage
taxa_table(
biom,
rank = -1,
taxa = 6,
lineage = FALSE,
md = ".all",
unc = "singly",
other = FALSE,
transform = "none",
ties = "random",
seed = 0
)
taxa_matrix(
biom,
rank = -1,
taxa = NULL,
lineage = FALSE,
sparse = FALSE,
unc = "singly",
other = FALSE,
transform = "none",
ties = "random",
seed = 0
)
Arguments
biom |
An rbiom object, such as from |
rank |
What rank(s) of taxa to display. E.g. |
taxa |
Which taxa to display. An integer value will show the top n
most abundant taxa. A value 0 <= n < 1 will show any taxa with that
mean abundance or greater (e.g. |
lineage |
Include all ranks in the name of the taxa. For instance,
setting to |
md |
Dataset field(s) to include in the output data frame, or |
unc |
How to handle unclassified, uncultured, and similarly ambiguous taxa names. Options are:
Abbreviations are allowed. Default: |
other |
Sum all non-itemized taxa into an "Other" taxon. When
|
transform |
Transformation to apply. Options are:
|
ties |
When |
seed |
Random seed for permutations. Must be a non-negative integer.
Default: |
sparse |
If |
Value
taxa_matrix() - A numeric matrix with taxa as rows, and samples as columns.
taxa_table() - A tibble data frame with column names .sample, .taxa, .abundance, and any requested by md.
See Also
Other taxa_abundance: sample_sums(), taxa_boxplot(), taxa_clusters(), taxa_corrplot(), taxa_heatmap(), taxa_stacked(), taxa_stats(), taxa_sums()
Examples
library(rbiom)
hmp50$ranks
taxa_matrix(hmp50, 'Phylum')[1:4,1:6]
taxa_table(hmp50, 'Phylum')
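The two return shapes hold the same numbers in wide and long layouts. The base R sketch below reshapes a small taxa-by-sample matrix into a long table with .sample, .taxa, and .abundance columns. The matrix is hypothetical, and the result is a plain data.frame rather than a tibble.

```r
# Toy taxa-by-sample matrix (hypothetical values).
m <- matrix(
  c(10, 0,
     5, 7),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("Firmicutes", "Bacteroidetes"), c("S1", "S2"))
)

# Reshape to one row per taxon/sample pair.
long <- as.data.frame(as.table(m), responseName = ".abundance", stringsAsFactors = FALSE)
names(long)[1:2] <- c(".taxa", ".sample")

# Reorder columns to match the layout described above.
long <- long[, c(".sample", ".taxa", ".abundance")]
long
```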
Create a subtree by specifying tips to keep.
Description
Create a subtree by specifying tips to keep.
Usage
tree_subset(tree, tips, underscores = FALSE)
Arguments
tree |
A phylo object, as returned from |
tips |
A character, numeric, or logical vector of tips to keep. |
underscores |
When parsing the tree, should underscores be kept as
is? By default they will be converted to spaces (unless the entire ID
is quoted). Default |
Value
A phylo object for the subtree.
See Also
Other phylogeny: read_tree()
Examples
library(rbiom)
infile <- system.file("extdata", "newick.tre", package = "rbiom")
tree <- read_tree(infile)
tree
subtree <- tree_subset(tree, tips = head(tree$tip.label))
subtree
Evaluate expressions on metadata.
Description
with() will return the result of your expression. within() will return an rbiom object.
Usage
## S3 method for class 'rbiom'
with(data, expr, ...)
## S3 method for class 'rbiom'
within(data, expr, clone = TRUE, ...)
Arguments
data |
An rbiom object, such as from |
expr |
Passed on to |
... |
Not used. |
clone |
Create a copy of |
Value
See description.
See Also
Other transformations: modify_metadata, rarefy(), rarefy_cols(), slice_metadata, subset()
Examples
library(rbiom)
with(hmp50, table(`Body Site`, Sex))
biom <- within(hmp50, {
  age_bin = cut(Age, 5)
  bmi_bin = cut(BMI, 5)
})
biom$metadata
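The same with()/within() distinction exists for ordinary data frames in base R, which may help clarify what each returns. The sketch below uses a hypothetical metadata table in place of an rbiom object.

```r
# Toy metadata table (hypothetical values).
md <- data.frame(
  Sex = c("Male", "Female", "Female", "Male", "Female", "Male"),
  Age = c(22, 35, 41, 29, 38, 26),
  BMI = c(21, 27, 31, 24, 19, 29)
)

# with(): evaluate an expression and return its result.
tab <- with(md, table(Sex))

# within(): evaluate assignments and return the modified table.
md2 <- within(md, {
  age_bin <- cut(Age, 5)
  bmi_bin <- cut(BMI, 5)
})

names(md2)
```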
Save an rbiom object to a file.
Description
Automatically creates directories and adds compression based on file name.
write_biom() - According to BIOM format specification.
write_xlsx() - Raw data and summary tables in Excel file format. See details.
write_fasta() - Sequences only in fasta format. biom may also be a named character vector.
write_tree() - Phylogenetic tree only in newick format. biom may also be a phylo object.
write_counts(), write_metadata(), write_taxonomy() - Tab-separated values.
Usage
write_biom(biom, file, format = "json")
write_metadata(biom, file, quote = FALSE, sep = "\t", ...)
write_counts(biom, file, quote = FALSE, sep = "\t", ...)
write_taxonomy(biom, file, quote = FALSE, sep = "\t", ...)
write_fasta(biom, file = NULL)
write_tree(biom, file = NULL)
write_xlsx(biom, file, depth = 0.1, n = NULL, seed = 0, unc = "singly")
Arguments
biom |
An rbiom object, such as from |
file |
Path to the output file. File names ending in |
format |
Options are |
quote , sep , ... |
Parameters passed on to |
depth , n |
Passed on to |
seed |
Random seed to use in rarefying. See |
unc |
How to handle unclassified, uncultured, and similarly ambiguous taxa names. Options are:
Abbreviations are allowed. Default: |
Details
For write_xlsx(), attributes(biom) are saved as additional worksheets if the attribute is a data frame, matrix, or dist-class object. An attribute named 'Reads Per Step' is treated specially and merged with the usual 'Reads Per Sample' tab.
Value
The normalized filepath that was written to (invisibly), unless file=NULL (see file argument above).
Examples
library(rbiom)
write_tree(hmp50) %>% substr(1, 50)
if (FALSE) {
  hmp10 <- hmp50$clone()
  hmp10$counts <- hmp10$counts[,1:10] %>% rarefy_cols()
  attr(hmp10, "Weighted UniFrac") <- bdiv_distmat(hmp10, 'unifrac')
  attr(hmp10, "Unweighted Jaccard") <- bdiv_distmat(hmp10, 'jaccard', weighted=FALSE)
  outfile <- write_xlsx(hmp10, tempfile(fileext = ".xlsx"))
}
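The behaviour described for the tab-separated writers (creating directories, compressing based on the file name, returning the path invisibly) can be approximated in base R. The helper below, write_tsv_sketch(), is hypothetical and is not part of rbiom; treating a ".gz" suffix as the trigger for compression is an assumption made for this sketch.

```r
# Hypothetical helper: write a data frame as tab-separated text, creating
# missing directories and gzip-compressing when the name ends in ".gz".
write_tsv_sketch <- function(df, file) {
  dir.create(dirname(file), recursive = TRUE, showWarnings = FALSE)
  con <- if (grepl("\\.gz$", file)) gzfile(file, "w") else file(file, "w")
  on.exit(close(con))
  write.table(df, con, quote = FALSE, sep = "\t", row.names = FALSE)
  invisible(normalizePath(file))
}

md  <- data.frame(sample = c("S1", "S2"), reads = c(1200, 950))
out <- file.path(tempdir(), "sketch_out", "metadata.tsv.gz")
write_tsv_sketch(md, out)

# Read the file back through a gzip connection to confirm the round trip.
back <- read.delim(gzfile(out))
back
```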