Maintainer: | Karel Hron, Javier Palarea-Albaladejo, Matthias Templ, Alessandra Menafoglio |
Contact: | karel.hron at upol.cz |
Version: | 2025-03-25 |
URL: | https://CRAN.R-project.org/view=CompositionalData |
Source: | https://github.com/cran-task-views/CompositionalData/ |
Contributions: | Suggestions and improvements for this task view are very welcome and can be made through issues or pull requests on GitHub or via e-mail to the maintainer address. For further details see the Contributing guide. |
Citation: | Karel Hron, Javier Palarea-Albaladejo, Matthias Templ, Alessandra Menafoglio (2025). CRAN Task View: Compositional Data Analysis. Version 2025-03-25. URL https://CRAN.R-project.org/view=CompositionalData. |
Installation: | The packages from this task view can be installed automatically using the ctv package. For example, ctv::install.views("CompositionalData", coreOnly = TRUE) installs all the core packages or ctv::update.views("CompositionalData") installs all packages that are not yet installed and up-to-date. See the CRAN Task View Initiative for more details. |
In general, compositional data refers to multivariate, positive and scale-invariant data that convey relative information. Although not necessarily, they are often closed or normalized to be expressed in proportions adding up to 1, percentages adding to 100, or the like; but the scale-invariance property implies that the normalization constant used is actually irrelevant. That is, compositional methods are applicable whenever the researcher recognizes that the relevant information in the data is relative and, thus, there is an intrinsic interdependence between the parts that make up the composition. These particularities are not considered by ordinary statistical methods, which are generally designed for unconstrained real-valued data.
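The scale-invariance property described above can be checked numerically. The following is a minimal base R sketch, not taken from any package in this task view; the helper name `pairwise_logratios` and the example amounts are hypothetical and used only for illustration.

```r
# Two representations of the same 3-part composition:
# raw amounts and the same data closed to percentages.
raw <- c(a = 2, b = 6, c = 12)   # illustrative amounts, e.g. in mg
pct <- 100 * raw / sum(raw)      # closed to 100

# Hypothetical helper: matrix of all pairwise log-ratios log(x_i / x_j)
pairwise_logratios <- function(x) outer(log(x), log(x), "-")

lr_raw <- pairwise_logratios(raw)
lr_pct <- pairwise_logratios(pct)

# The relative information is identical under both representations
all.equal(lr_raw, lr_pct)
```

Because every log-ratio is unchanged by the closure, any analysis built on log-ratios gives the same result whichever normalization constant is used.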
This task view provides a curated collection of R packages to support compositional data analysis within the log-ratio coordinate framework. The main goal is serving as a guide to practitioners interested in applying such methods. The packages can be broadly categorized into the following topics, although many provide functionalities spanning multiple categories, as detailed below.
This section refers to packages that provide a general platform for compositional data analysis in R, implementing functions to conduct basic operations and calculations, log-ratio representations, data visualization and some common statistical analyses. They typically accompany a published monograph and provide an environment for analysis that is compatible with the basic properties of compositional data for those approaching the methodology from diverse domains.
acomp class, or multivariate positive data, aplus class), the package operates through functions for their consistent analysis and modeling, including descriptive statistics, visualization, statistical testing, and multivariate analysis (e.g. principal component analysis, clustering, MANOVA and regression). It also implements some geostatistical tools, such as variograms for compositions and compositional ordinary kriging. This package is linked to the monograph Van den Boogaart and Tolosana-Delgado (2013) and supports the analyses and examples therein.
Compositional data, like ordinary datasets, often face challenges that complicate statistical analysis. A key issue in the log-ratio approach is handling zeros without distorting data properties. Three types are distinguished: rounded zeros (small values rounded or below detection limits), count zeros (from discrete counting processes), and essential zeros (true absences in the composition). Rounded zeros, akin to left-censored data, have received the most attention in the literature.
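As an illustration of what handling rounded zeros "without distorting data properties" can mean, the sketch below implements a simplified multiplicative replacement in base R. The function name `mult_replace`, the detection limit, and the fraction 0.65 are assumptions made for this example; the specialized packages offer more refined, model-based alternatives.

```r
# Simplified multiplicative replacement of rounded zeros (base R sketch).
# x:     one composition, a vector of non-negative parts
# delta: small value imputed for each zero
# Hypothetical helper, not the API of any package.
mult_replace <- function(x, delta) {
  total <- sum(x)
  zero <- x == 0
  # Shrink all non-zero parts by a common factor so that the total is
  # preserved and the ratios among non-zero parts stay unchanged.
  x_new <- x * (1 - sum(zero) * delta / total)
  x_new[zero] <- delta
  x_new
}

x  <- c(0.60, 0.25, 0.15, 0)   # illustrative composition with one zero
dl <- 0.01                     # assumed detection limit
x_imp <- mult_replace(x, delta = 0.65 * dl)  # fraction of dl is an assumption

x_imp
sum(x_imp)   # the original total is kept
```

The design choice here is that the adjustment is multiplicative rather than additive, so the log-ratios between the observed parts are not altered by the imputation.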
Moreover, analogously to non-compositional statistical methods, the presence of either missing values or outliers poses practical challenges. Again, coherent handling is required for consistent data analysis within the compositional framework.
The following are specialized packages focused on addressing these issues while respecting the compositional nature of the data.
Visualization is a critical component of compositional data analysis, allowing researchers to explore patterns, relationships, and distributions within the constrained simplicial geometry.
On top of the functionality provided with the general purpose packages cited above, this section compiles specialized tools for producing ternary plots, compositional biplots, or pairwise log-ratio plots, among others.
ggplot2; supporting both standard and additional geometries with a high level of customization.
Compositional tables (i.e., ordinary contingency tables in their discrete version) represent frequencies or proportions structured across multiple categories. The compositional nature of these tables, often constrained by row or column sums, requires specialized methods to analyze relationships, dependencies, and patterns while respecting their relative nature. This section refers to tools for their analysis, including log-ratio representation and selected multivariate methods.
Probability density functions are essentially scale-invariant data objects, usually subject to a unit integral constraint. Therefore, they can be considered as infinite-dimensional compositional data and embedded in a Hilbert space, the so-called Bayes space (see Van den Boogaart, Egozcue and Pawlowsky-Glahn (2014) for details).
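To make the analogy with finite compositions concrete, the following base R sketch applies a discretized centered log-ratio (clr) transformation to a density on a bounded interval. It assumes the usual functional clr form, log f(t) minus the integral mean of log f, and approximates that mean crudely by an arithmetic mean over an equally spaced grid; the Beta density and the grid are arbitrary choices for illustration.

```r
# Discretized clr transformation of a density on a bounded interval.
grid <- seq(0.01, 0.99, length.out = 199)
f <- dbeta(grid, shape1 = 2, shape2 = 3)   # illustrative density

# clr(f)(t) = log f(t) - mean of log f over the interval,
# with the integral mean approximated by the mean over the grid.
clr_f <- log(f) - mean(log(f))

# Rescaling the density by any positive constant leaves the clr unchanged,
# which mirrors the scale invariance of ordinary compositions.
clr_scaled <- log(5 * f) - mean(log(5 * f))
all.equal(clr_f, clr_scaled)

# The discretized clr function has zero mean by construction
mean(clr_f)
```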
The packages listed below implement methods for density data analysis from this perspective. Unlike the methods in the FunctionalData task view, here it is assumed that the sample space for density functions is the Bayes space.
Regression modeling with compositional data allows researchers to explore associations between compositions and other variables, with the composition acting either as predictor/covariate or as response, and also between compositions on both sides of the regression model. Packages specifically devoted to compositional regression analysis are listed below. It should be noted that complmrob and robregcc do not offer anything essential beyond, for example, robCompositions.
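A minimal sketch of regression with a compositional predictor, using only base R and simulated data, is given below. It uses additive log-ratio (alr) coordinates as one simple option among several; the helper `alr`, the simulated data, and the coefficient values are all assumptions for illustration and do not reproduce the interface of any listed package.

```r
set.seed(1)
n <- 100

# Simulated 3-part compositions (positive data, not closed)
X <- matrix(rlnorm(n * 3), n, 3)

# Hypothetical helper: alr coordinates with the last part as denominator
alr <- function(X) log(X[, -ncol(X), drop = FALSE] / X[, ncol(X)])

Z <- alr(X)
# Simulated real-valued response depending on the log-ratios
y <- 1 + 0.5 * Z[, 1] - 0.3 * Z[, 2] + rnorm(n, sd = 0.1)

# Ordinary least squares on the log-ratio coordinates
fit <- lm(y ~ Z)
coef(fit)

# Closing the compositions to proportions does not change the fit
X_closed <- X / rowSums(X)
fit_closed <- lm(y ~ alr(X_closed))
all.equal(unname(coef(fit)), unname(coef(fit_closed)))
```

Once the composition is expressed in log-ratio coordinates, standard regression tools apply directly, although the interpretation of the coefficients depends on the coordinate system chosen.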
Compositional data analysis has gained importance in omics sciences and bioinformatics, where microbiome compositions, gene expression, and metabolomic profiles are inherently compositional. These applications require methods to handle high dimensionality, zero inflation, overdispersion, and phylogenetic integration.
This section highlights packages that provide compositional tools designed for omics data, but certainly most of them could also be considered for the statistical processing of high-dimensional compositional data in general.
Compositional data analysis is essential in geostatistics and geochemistry, where it applies to element, mineral, or isotope proportions with spatial dependencies. These applications require methods that respect relative data structures while accounting for spatial relationships (see SpatioTemporal for a task view focused on spatiotemporal methods).
Thus, this section refers to packages for geostatistical modeling, spatial interpolation, variogram analysis, and compositional kriging; as well as techniques for analyzing spatial geochemical compositions. Note that some methods would be equally applicable to any data set with analogous structures in any other application area.
This collection is meant to include other useful small packages, typically having a fairly specific purpose. In accordance with the log-ratio framework considered here, the condition for inclusion is that the scale invariance property of compositional data is, at least partially, respected.
The following explains which approach to compositional data analysis this task view follows.
Awareness of the problems with compositional data dates back to the end of the 19th century, when the renowned statistician Karl Pearson recognized the problem of spurious correlations between variables scaled with respect to a common denominator. When closed to add up to a constant value, compositional data are formally projected onto a simplex sample space, and this is often a convenient representation in a practical setting. The simplex is a constrained space with its own internal operations and geometry. However, any coherent approach to analyzing compositional data should not depend on the chosen representation, nor require any preliminary normalization.
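The spurious correlation effect mentioned above is easy to reproduce by simulation. The base R sketch below uses three independent uniform variables; the distributions, sample size and seed are arbitrary assumptions for illustration.

```r
set.seed(42)
n <- 10000

# Three mutually independent positive variables
x <- runif(n, 1, 2)
y <- runif(n, 1, 2)
z <- runif(n, 1, 2)

# x and y are independent, so their correlation is close to zero
cor_raw <- cor(x, y)

# Dividing both by the common denominator z induces a clear
# positive correlation that reflects only the shared scaling
cor_ratio <- cor(x / z, y / z)

c(raw = cor_raw, ratio = cor_ratio)
```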
The mainstream approach to compositional data analysis, as originally formulated by Aitchison (1982), involves the use of log-ratio transformations (or log-ratio coordinates, to use more modern terminology), which project the data into real space. Nowadays, the compositional literature offers a wide range of methods within this methodological framework, many of which are implemented in R packages.
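As a small illustration of a log-ratio transformation, the base R sketch below computes the centered log-ratio (clr) representation of two compositions and checks that the Euclidean distance between their clr vectors agrees with the Aitchison distance computed from all pairwise log-ratios. The helper names `clr` and `lr` and the example compositions are hypothetical; the packages in this task view provide their own, more complete implementations.

```r
# Hypothetical helpers for a single composition x
clr <- function(x) log(x) - mean(log(x))          # centered log-ratio
lr  <- function(x) outer(log(x), log(x), "-")     # all pairwise log-ratios

x <- c(0.2, 0.3, 0.5)
y <- c(0.1, 0.6, 0.3)
D <- length(x)

# clr coefficients are real-valued and sum to zero (up to rounding)
sum(clr(x))

# Euclidean distance between the clr representations
d_clr <- sqrt(sum((clr(x) - clr(y))^2))

# Aitchison distance from the pairwise log-ratios
d_pair <- sqrt(sum((lr(x) - lr(y))^2) / (2 * D))

all.equal(d_clr, d_pair)
```

The agreement of the two distances shows why standard Euclidean methods can be applied after the transformation without losing the relative information in the data.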
Compositional data are common in diverse scientific fields, including the chemical, biological, and environmental sciences, where they typically represent portions of a total sample weight or volume and are expressed in units such as percent, parts per million, mg/l, mmol/mol, or similar. Some examples include chemical compositions of soil, water, or air, food compositions, behavioral or time-use profiles, and relative abundances of species. They are also common in the socio-economic sciences, for example when dealing with market shares, investment portfolios, or household budgets.
In recent years, the popularity of compositional methods has grown significantly. Simultaneously, new methodological challenges have arisen requiring novel ways to transfer and formulate compositional knowledge to meet the needs of different scientific fields.
Core: | compositions, easyCODA, robCompositions, zCompositions. |
Regular: | aIc, ArArRedux, coda.base, coda4microbiome, codacore, codaredistlm, complmrob, DirichletReg, FLORAL, ggtern, gmGeostats, isopleuros, lnmCluster, MicrobiomeStat, multilevelcoda, mvoutlier, provenance, QFASA, robregcc, SARP.compo, Ternary, ToolsForCoDa. |