Type: | Package |
Title: | Clustering with Compositional Data |
Version: | 1.0 |
Date: | 2025-09-15 |
Author: | Michail Tsagris [aut, cre], Nikolaos Kontemeniotis [aut] |
Maintainer: | Michail Tsagris <mtsagris@uoc.gr> |
Depends: | R (≥ 4.0) |
Imports: | Compositional, doParallel, foreach, graphics, lowmemtkmeans, mixture, Rfast, Rfast2, stats |
Description: | Cluster analysis with compositional data using the alpha–transformation. Relevant papers include: Tsagris M. and Kontemeniotis N. (2025), <doi:10.48550/arXiv.2509.05945>. Tsagris M.T., Preston S. and Wood A.T.A. (2011), <doi:10.48550/arXiv.1106.1451>. Garcia-Escudero Luis A., Gordaliza Alfonso, Matran Carlos, Mayo-Iscar Agustin. (2008), <doi:10.1214/07-AOS515>. |
License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] |
NeedsCompilation: | no |
Packaged: | 2025-09-14 20:47:14 UTC; mtsag |
Repository: | CRAN |
Date/Publication: | 2025-09-18 09:20:02 UTC |
Clustering with Compositional Data
Description
Cluster analysis with compositional data using the α-transformation.
Details
Package: | CompositionalClust |
Type: | Package |
Version: | 1.0 |
Date: | 2025-09-15 |
License: | GPL-2 |
Maintainers
Michail Tsagris <mtsagris@uoc.gr>
Author(s)
Michail Tsagris mtsagris@uoc.gr and Nikolaos Kontemeniotis kontemeniotisn@gmail.com.
References
Tsagris M. and Kontemeniotis N. (2025). Simplicial clustering using the α-transformation. https://arxiv.org/pdf/2509.05945.
Alenazi A. (2023). A review of compositional data analysis and recent advances. Communications in Statistics–Theory and Methods, 52(16): 5535–5567.
Tsagris M.T., Preston S. and Wood A.T.A. (2011). A data-based power transformation for compositional data. Fourth International Workshop on Compositional Data Analysis.
Gaussian mixture models for compositional data using the α-transformation
Description
Gaussian mixture models for compositional data using the α-transformation.
Usage
alfa.mix.norm(x, g, a, model, veo = FALSE)
Arguments
x |
A matrix with the compositional data. |
g |
How many clusters to create. |
a |
The value of the power transformation; it has to be between -1 and 1. If zero values are present, it has to be greater than 0. If a = 0, the isometric log-ratio transformation is applied. |
model |
The type of model to be used, given as a character string, e.g. "EII" or "VII" as in the examples below. |
veo |
Stands for "Variables exceed observations". If TRUE, the model is still fitted even when the number of variables in the model exceeds the number of observations. |
Details
The α-transformation is applied to the compositional data and then a Gaussian mixture model is fitted to the transformed data.
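As a rough illustration of the transformation step, the following base R sketch applies the α-transformation to a compositional matrix. It is not the package's internal code: the function name alpha_transform is hypothetical, the formula follows Tsagris, Preston and Wood (2011), and the Helmert-type basis built from contr.helmert() is an assumption (any orthonormal basis orthogonal to the vector of ones plays the same role, possibly up to sign). For a = 0 the sketch falls back to the isometric log-ratio transformation.

```r
# Sketch of the alpha-transformation (hypothetical helper, not package code).
alpha_transform <- function(x, a) {
  x <- as.matrix(x)
  D <- ncol(x)
  # D x (D - 1) basis with orthonormal columns, each orthogonal to the vector of ones.
  h <- contr.helmert(D)
  h <- sweep(h, 2, sqrt(colSums(h^2)), "/")
  if (abs(a) < 1e-9) {
    # Limit a -> 0: centred log-ratio, later rotated to the isometric log-ratio.
    lx <- log(x)
    z <- lx - rowMeans(lx)
  } else {
    # Power-transform, re-close to the simplex, then centre and scale.
    u <- x^a
    u <- u / rowSums(u)
    z <- (D * u - 1) / a
  }
  z %*% h
}

x <- as.matrix(iris[, 1:4])
x <- x / rowSums(x)
z <- alpha_transform(x, 0.4)
dim(z)  # 150 rows, 3 columns
```

The transformed matrix z has one column fewer than x and could then be passed to any Gaussian mixture fitting routine.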
Value
A list including:
mu |
A matrix where each row corresponds to the mean vector of each cluster. |
su |
An array containing the covariance matrix of each cluster. |
prob |
The estimated mixing probabilities. |
est |
The estimated cluster membership values. |
Author(s)
Michail Tsagris.
R implementation and documentation: Michail Tsagris mtsagris@uoc.gr.
References
Tsagris M. and Kontemeniotis N. (2025). Simplicial clustering using the α-transformation. https://arxiv.org/pdf/2509.05945.
Ryan P. Browne, Aisha ElSherbiny and Paul D. McNicholas (2015). R package mixture: Mixture Models for Clustering and Classification.
Tsagris M.T., Preston S. and Wood A.T.A. (2011). A data-based power transformation for compositional data. In Proceedings of the 4th Compositional Data Analysis Workshop, Girona, Spain. https://arxiv.org/pdf/1106.1451.pdf
See Also
Examples
x <- as.matrix(iris[, 1:4])
x <- x / rowSums(x)
mod1 <- alfa.mix.norm(x, 3, 0.4, model = "EII")
mod2 <- alfa.mix.norm(x, 4, 0.7, model = "VII")
Mixture model selection with the α-transformation using BIC
Description
Mixture model selection with the α-transformation using BIC.
Usage
bic.alfamixnorm(x, G, a = seq(-1, 1, by = 0.1), veo = FALSE, graph = TRUE)
Arguments
x |
A matrix with compositional data. |
G |
A numeric vector with the number of components, clusters, to be considered, e.g. 1:3. |
a |
A vector with a grid of values of the power transformation; each value has to be between -1 and 1. If zero values are present, each value has to be greater than 0. If a = 0, the isometric log-ratio transformation is applied. |
veo |
Stands for "Variables exceed observations". If TRUE, the model is still fitted even when the number of variables in the model exceeds the number of observations. |
graph |
A boolean variable, TRUE or FALSE specifying whether a graph should be drawn or not. |
Details
The α-transformation is applied to the compositional data first and then mixtures of multivariate Gaussian distributions are fitted. BIC is used to decide on the optimal model and number of components.
Value
A list including:
abic |
A list that contains the matrices of all BIC values for all values of α. |
optalpha |
The value of α that leads to the highest BIC. |
optG |
The number of components with the highest BIC. |
optmodel |
The type of model corresponding to the highest BIC. |
If graph is set equal to TRUE, a plot of the BIC of the best model for each number of components versus the number of components is drawn, together with a list with the results of the Gaussian mixture model for each value of α.
Author(s)
Michail Tsagris.
R implementation and documentation: Michail Tsagris mtsagris@uoc.gr.
References
Tsagris M. and Kontemeniotis N. (2025). Simplicial clustering using the α-transformation. https://arxiv.org/pdf/2509.05945.
Ryan P. Browne, Aisha ElSherbiny and Paul D. McNicholas (2018). mixture: Mixture Models for Clustering and Classification. R package version 1.5.
Ryan P. Browne and Paul D. McNicholas (2014). Estimating Common Principal Components in High Dimensions. Advances in Data Analysis and Classification, 8(2), 217-226.
Tsagris M.T., Preston S. and Wood A.T.A. (2011). A data-based power transformation for compositional data. In Proceedings of the 4th Compositional Data Analysis Workshop, Girona, Spain. https://arxiv.org/pdf/1106.1451.pdf
See Also
Examples
x <- as.matrix(iris[, 1:4])
x <- x / rowSums(x)
bic.alfamixnorm(x, 1:3, a = c(0.4, 0.5, 0.6), graph = FALSE)
Cluster indices for the K-means algorithm for compositional data using the α-transformation
Description
Cluster indices for the K-means algorithm for compositional data using the α-transformation.
Usage
alfa.cikmeans(x, ncl = 10, trim = 0, a = seq(-1, 1, by = 0.1), max.iters = 50,
nstart = 10)
Arguments
x |
A matrix with the compositional data. |
ncl |
The maximum number of clusters to try. The minimum number of clusters is 2. |
trim |
A number in [0, 1). If trim = 0, then the classical K-means algorithm is performed. |
a |
A vector with a grid of values of the power transformation, it has to be between -1 and 1. If zero values are present it has to be greater than 0. If a=0, the isometric log-ratio transformation is applied. |
max.iters |
The maximum number of iterations allowed during the K-means algorithm. |
nstart |
How many random starts to perform? |
Details
The α-transformation is applied to the compositional data, and then the K-means algorithm is performed and a series of cluster validity indices are computed.
Value
A list including:
min_crit |
A matrix with 9 columns and at least one row, where each column contains the value of a cluster validity index whose minimal value is preferred. Each row corresponds to a specific number of clusters, starting from 2 up to ncl. |
best_min |
The number of clusters selected based upon the minimal valued cluster validity indices. |
max_crit |
A matrix with 24 columns and at least one row, where each column contains the value of a cluster validity index whose maximal value is preferred. Each row corresponds to a specific number of clusters, starting from 2 up to ncl. |
best_max |
The number of clusters selected based upon the maximal valued cluster validity indices. |
cluster |
If the argument "all" is TRUE, then the clustering indices of each observation for each number of clusters will be returned in a matrix, where each column corresponds to the clustering of each number of clusters. |
Author(s)
Michail Tsagris.
R implementation and documentation: Michail Tsagris mtsagris@uoc.gr.
References
Tsagris M. and Kontemeniotis N. (2025). Simplicial clustering using the α-transformation. https://arxiv.org/pdf/2509.05945.
Tsagris M.T., Preston S. and Wood A.T.A. (2011). A data-based power transformation for compositional data. In Proceedings of the 4th Compositional Data Analysis Workshop, Girona, Spain. https://arxiv.org/pdf/1106.1451.pdf
Garcia-Escudero Luis A., Gordaliza Alfonso, Matran Carlos, Mayo-Iscar Agustin. (2008). A general trimming approach to robust cluster analysis. Annals of Statistics 36(3): 1324–1345.
See Also
Examples
y <- as.matrix(iris[, 1:4])
y <- y / rowSums(y)
mod <- alfa.cikmeans( y, ncl = 5, a = c(0, 0.5, 1) )
The K-means algorithm with cluster indices computed
Description
The K-means algorithm with cluster indices computed.
Usage
cikmeans(y, ncl = 10, trim = 0, max.iters = 50, nstart = 10, all = FALSE)
Arguments
y |
A matrix with numerical data. |
ncl |
The maximum number of clusters to try. The minimum number of clusters is 2. |
trim |
A number in [0, 1). If trim = 0, then the classical K-means algorithm is performed. |
max.iters |
The maximum number of iterations allowed during the K-means algorithm. |
nstart |
How many random starts to perform? |
all |
If this is TRUE, then the clustering indices of each observation for each number of clusters will be returned. |
Details
The K-means algorithm is performed and a series of cluster validity indices are computed.
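The loop over candidate numbers of clusters can be sketched in base R as follows. This is only an illustration under stated assumptions: it uses stats::kmeans() rather than the package's own routine, it computes a single index (the Calinski-Harabasz index, chosen here purely as an example of a maximal valued index; the source does not say which indices the package computes), and the names ch and best_k are hypothetical.

```r
y <- as.matrix(iris[, 1:4])
y <- y / rowSums(y)

set.seed(1)          # K-means relies on random starts
ncl <- 5             # maximum number of clusters to try
n <- nrow(y)
ch <- numeric(ncl - 1)
names(ch) <- 2:ncl

for (k in 2:ncl) {
  km <- kmeans(y, centers = k, iter.max = 50, nstart = 10)
  # Calinski-Harabasz: between-cluster over within-cluster dispersion,
  # each scaled by its degrees of freedom; larger is better.
  ch[k - 1] <- (km$betweenss / (k - 1)) / (km$tot.withinss / (n - k))
}

ch
best_k <- as.integer(names(ch)[which.max(ch)])
best_k
```

With several indices, the same loop would fill one row of a matrix per value of k, which mirrors the layout described for min_crit and max_crit.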
Value
A list including:
min_crit |
A matrix with 9 columns and at least one row, where each column contains the value of a cluster validity index whose minimal value is preferred. Each row corresponds to a specific number of clusters, starting from 2 up to ncl. |
best_min |
The number of clusters selected based upon the minimal valued cluster validity indices. |
max_crit |
A matrix with 24 columns and at least one row, where each column contains the value of a cluster validity index whose maximal value is preferred. Each row corresponds to a specific number of clusters, starting from 2 up to ncl. |
best_max |
The number of clusters selected based upon the maximal valued cluster validity indices. |
cluster |
If the argument "all" is TRUE, then the clustering indices of each observation for each number of clusters will be returned in a matrix, where each column corresponds to the clustering of each number of clusters. |
Author(s)
Michail Tsagris and Nikolaos Kontemeniotis.
R implementation and documentation: Michail Tsagris mtsagris@uoc.gr and Nikolaos Kontemeniotis kontemeniotisn@gmail.com.
References
Garcia-Escudero Luis A., Gordaliza Alfonso, Matran Carlos, Mayo-Iscar Agustin. (2008). A general trimming approach to robust cluster analysis. Annals of Statistics 36(3): 1324–1345.
See Also
index_min, index_max, alfa.cikmeans
Examples
y <- as.matrix(iris[, 1:4])
y <- y / rowSums(y)
mod <- cikmeans(y, ncl = 5)
Cluster indices (maximal valued) for the K-means algorithm
Description
Cluster indices (maximal valued) for the K-means algorithm.
Usage
index_max(y, mod)
Arguments
y |
A matrix with numerical data. |
mod |
An object with the result of the K-means algorithm, e.g. as returned by kmeans(). |
Details
A series of cluster validity indices (maximal valued) are computed.
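To make the idea of a maximal valued index concrete, here is a base R sketch of the average silhouette width computed from a kmeans() fit. It is an assumption that this index resembles any of the 24 returned by index_max; the source does not list them. The variable names sil, a_i and b_i are illustrative only.

```r
y <- as.matrix(iris[, 1:4])
set.seed(1)
mod <- kmeans(y, 3, nstart = 10)

d <- as.matrix(dist(y))   # pairwise Euclidean distances
cl <- mod$cluster
sil <- numeric(nrow(y))

for (i in seq_len(nrow(y))) {
  own <- cl == cl[i]
  own[i] <- FALSE
  if (!any(own)) {
    sil[i] <- 0           # usual convention for singleton clusters
    next
  }
  other <- cl != cl[i]
  a_i <- mean(d[i, own])                             # cohesion
  b_i <- min(tapply(d[i, other], cl[other], mean))   # separation
  sil[i] <- (b_i - a_i) / max(a_i, b_i)
}

mean(sil)   # values closer to 1 indicate better-separated clusters
```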
Value
A vector with 24 cluster validity indices.
Author(s)
Michail Tsagris.
R implementation and documentation: Michail Tsagris mtsagris@uoc.gr.
See Also
index_min, cikmeans, alfa.cikmeans
Examples
y <- as.matrix(iris[, 1:4])
mod <- kmeans(y, 3)
mod <- index_max(y, mod)
Cluster indices (minimal valued) for the K-means algorithm
Description
Cluster indices (minimal valued) for the K-means algorithm.
Usage
index_min(y, mod)
Arguments
y |
A matrix with numerical data. |
mod |
An object with the result of the K-means algorithm, e.g. as returned by kmeans(). |
Details
A series of cluster validity indices (minimal valued) are computed.
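As an example of a minimal valued index, the following base R sketch computes the Davies-Bouldin index from a kmeans() fit. Whether this index is among the 9 returned by index_min is an assumption, since the source does not list them; the names S, M, R and db are illustrative only.

```r
y <- as.matrix(iris[, 1:4])
set.seed(1)
mod <- kmeans(y, 3, nstart = 10)
k <- nrow(mod$centers)

# S[j]: mean Euclidean distance of the points in cluster j to its centroid.
S <- sapply(seq_len(k), function(j) {
  yj <- y[mod$cluster == j, , drop = FALSE]
  mean(sqrt(rowSums(sweep(yj, 2, mod$centers[j, ])^2)))
})

M <- as.matrix(dist(mod$centers))   # centroid-to-centroid distances
R <- outer(S, S, "+") / M           # pairwise similarity of clusters
diag(R) <- -Inf                     # exclude each cluster paired with itself

db <- mean(apply(R, 1, max))        # smaller values indicate better clustering
db
```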
Value
A vector with 9 cluster validity indices.
Author(s)
Michail Tsagris.
R implementation and documentation: Michail Tsagris mtsagris@uoc.gr.
See Also
index_max, cikmeans, alfa.cikmeans
Examples
y <- as.matrix(iris[, 1:4])
mod <- kmeans(y, 3)
mod <- index_min(y, mod)