Type: | Package |
Title: | R Toolbox for Unsupervised Spectral Clustering |
Version: | 1.0 |
Date: | 2021-08-20 |
Author: | Emilie Poisson-Caillault [aut, cre, cph], Alain Lefebvre [ctb], Erwan Vincent [aut], Pierre-Alexandre Hebert [ctb] |
Description: | Toolbox containing a variety of spectral clustering tools and functions. The available tools include the hierarchical spectral clustering algorithm, the Shi and Malik clustering algorithm, the Perona and Freeman algorithm, non-normalized clustering, the Von Luxburg algorithm, the Partition Around Medoids clustering algorithm, a multi-level clustering algorithm, recursive clustering, and a fast method applicable to all of the clustering algorithms, along with other tools needed to run these algorithms or useful for unsupervised spectral clustering. This toolbox aims to gather the main tools for unsupervised spectral classification. See http://mawenzi.univ-littoral.fr/ for more information and documentation. |
Depends: | R (≥ 3.0.0) |
Imports: | cluster, stats, grDevices, class |
License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] |
RoxygenNote: | 6.0.1 |
NeedsCompilation: | no |
Packaged: | 2021-08-20 12:50:28 UTC; erwan |
Maintainer: | Emilie Poisson-Caillault <emilie.caillault@univ-littoral.fr> |
Repository: | CRAN |
Date/Publication: | 2021-08-23 18:50:02 UTC |
Hierarchical Clustering
Description
Hierarchical Clustering
Usage
HierarchicalClust(
W,
K = 5,
method = "ward.D2",
flagDiagZero = FALSE,
verbose = FALSE,
...
)
Arguments
W |
Gram Similarity Matrix. |
K |
Number of clusters to obtain. |
method |
Agglomeration method used for the hierarchical clustering. |
flagDiagZero |
If TRUE, set the diagonal of the similarity matrix W to zero. |
verbose |
If TRUE, print progress messages to the terminal. |
... |
Additional parameters passed to the hclust function. |
Value
Returns a list containing the following element:
cluster: a vector containing the cluster label of each point
Author(s)
Emilie Poisson Caillault and Erwan Vincent
Examples
### Example 1: 2 disks of the same size
n<-100 ; r1<-1
x<-(runif(n)-0.5)*2;
y<-(runif(n)-0.5)*2
keep1<-which((x^2+y^2)<(r1^2))
disk1<-data.frame(x+3*r1,y)[keep1,]
disk2 <-data.frame(x-3*r1,y)[keep1,]
sameTwoDisks <- rbind(disk1,disk2)
W <- compute.similarity.ZP(scale(sameTwoDisks))
res <- HierarchicalClust(W,K=2,method="ward.D2",flagDiagZero=TRUE,verbose=TRUE)
plot(sameTwoDisks, col = res$cluster)
### Example 2: Iris data set
W <- compute.similarity.ZP(scale(iris[,-5]))
res <- HierarchicalClust(W,K=2,method="ward.D2",flagDiagZero=TRUE,verbose=TRUE)
plot(iris, col = res$cluster)
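The idea of running a hierarchical clustering on a similarity matrix can be sketched with base R alone. This is a minimal illustration, not the package's implementation: the Gaussian similarity with sigma = 1 and the conversion of similarities into dissimilarities as 1 - W are assumptions made for the example.

```r
# Sketch: hierarchical clustering driven by a similarity matrix (base R only).
set.seed(1)
# Two well-separated groups of 20 points each
pts <- rbind(matrix(rnorm(40, mean = -3, sd = 0.3), ncol = 2),
             matrix(rnorm(40, mean =  3, sd = 0.3), ncol = 2))
d <- as.matrix(dist(pts))
W <- exp(-d^2 / 2)      # Gaussian similarity with sigma = 1 (assumed)
diag(W) <- 0            # what flagDiagZero = TRUE is described as doing
# Assumption: similarities are turned into dissimilarities as 1 - W
tree <- hclust(as.dist(1 - W), method = "ward.D2")
cluster <- cutree(tree, k = 2)   # cut the tree into K = 2 clusters
table(cluster)
```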
Hierarchical Spectral Clustering
Description
Hierarchical Spectral Clustering
Usage
HierarchicalSC(
W,
K = 5,
method = "ward.D2",
flagDiagZero = FALSE,
verbose = FALSE
)
Arguments
W |
Gram Similarity Matrix. |
K |
Number of clusters to obtain. |
method |
Agglomeration method used for the hierarchical clustering. |
flagDiagZero |
If TRUE, set the diagonal of the similarity matrix W to zero. |
verbose |
If TRUE, print progress messages to the terminal. |
Value
Returns a list containing the following elements:
cluster: a vector containing the cluster label of each point
eigenVect: a matrix containing the eigenvectors
eigenVal: a vector containing the eigenvalues
Author(s)
Emilie Poisson Caillault and Erwan Vincent
References
Sanchez-Garcia, R., Fennelly, M. et al. (2014). Hierarchical Spectral Clustering of Power Grids. In IEEE Transactions on Power Systems 29.5, pages 2229-2237. ISSN: 0885-8950.
Examples
### Example 1: 2 disks of the same size
n<-100 ; r1<-1
x<-(runif(n)-0.5)*2;
y<-(runif(n)-0.5)*2
keep1<-which((x^2+y^2)<(r1^2))
disk1<-data.frame(x+3*r1,y)[keep1,]
disk2 <-data.frame(x-3*r1,y)[keep1,]
sameTwoDisks <- rbind(disk1,disk2)
W <- compute.similarity.ZP(scale(sameTwoDisks))
res <- HierarchicalSC(W,K=2,method = "ward.D2",flagDiagZero=TRUE,verbose=TRUE)
plot(sameTwoDisks, col = res$cluster)
plot(res$eigenVect[,1:2], col = res$cluster, main="spectral space",
xlim=c(-1,1),ylim=c(-1,1)); points(0,0,pch='+');
plot(res$eigenVal, main="Laplacian eigenvalues",pch='+');
### Example 2: Iris data set
W <- compute.similarity.ZP(scale(iris[,-5]))
res <- HierarchicalSC(W,K=2,method="ward.D2",flagDiagZero=TRUE,verbose=TRUE)
plot(iris, col = res$cluster)
plot(res$eigenVect[,1:2], col = res$cluster, main="spectral space",
xlim=c(-1,1),ylim=c(-1,1)); points(0,0,pch='+');
plot(res$eigenVal, main="Laplacian eigenvalues",pch='+');
Multi-Level Spectral Clustering
Description
For a given data frame, the function splits the data over several levels using NJW clustering.
Usage
MSC(
X,
levelMax,
silMin = 0.7,
vois = 7,
flagDiagZero = FALSE,
method = "default",
Kmax = 20,
tolerence = 0.99,
threshold = 0.7,
minPoint = 7,
verbose = FALSE
)
Arguments
X |
The data frame. |
levelMax |
The maximum depth level. |
silMin |
The minimum silhouette allowed. Below this value, the cluster will be cut again. |
vois |
Number of neighbouring points selected for the similarity computation. |
flagDiagZero |
If TRUE, set the diagonal of the similarity matrix W to zero. |
method |
The method that will be used: "default" to let the function choose the most suitable method, "PEV" for the Principal EigenValue method, or "GAP" for the GAP method. |
Kmax |
The maximum number of clusters allowed. |
tolerence |
The tolerance allowed for the Principal EigenValue method. |
threshold |
The threshold used to select the dominant eigenvalue for the GAP method. |
minPoint |
The minimum number of points required to compute a cluster. |
verbose |
If TRUE, print progress messages to the terminal. |
Value
returns a list containing the following elements:
cluster: a vector containing the cluster
eigenVect: a vector containing the eigenvectors
eigenVal: a vector containing the eigenvalues
Author(s)
Emilie Poisson Caillault and Erwan Vincent
References
Grassi, K. (2020). Définition multivariée et multi-échelle d'états environnementaux par Machine Learning : Caractérisation de la dynamique phytoplanctonique.
Examples
### Example 1: 2 disks of the same size
n<-100 ; r1<-1
x<-(runif(n)-0.5)*2;
y<-(runif(n)-0.5)*2
keep1<-which((x^2+y^2)<(r1^2))
disk1<-data.frame(x+3*r1,y)[keep1,]
disk2 <-data.frame(x-3*r1,y)[keep1,]
sameTwoDisks <- rbind(disk1,disk2)
res <- MSC(scale(sameTwoDisks),levelMax=5, silMin=0.7, vois=7,
flagDiagZero=TRUE, method = "default", Kmax = 20,
tolerence = 0.99,threshold = 0.7, minPoint = 7, verbose = TRUE)
plot(sameTwoDisks, col = as.factor(res[,ncol(res)]))
### Example 2: Iris data set
res <- MSC(scale(iris[,-5]),levelMax=5, silMin=0.7, vois=7,
flagDiagZero=TRUE, method = "default", Kmax = 20,
tolerence = 0.99,threshold = 0.9, minPoint = 7, verbose = TRUE)
plot(iris, col = as.factor(res[,ncol(res)]))
table(res[,ncol(res)],iris$Species)
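The silMin argument is described as a silhouette threshold below which a cluster is cut again. A possible way to evaluate such a criterion is sketched here with silhouette() from the cluster package. Using the mean silhouette width per cluster is an assumption for illustration; the names silByCluster and toSplit are hypothetical and do not come from the package.

```r
# Sketch: flag clusters whose mean silhouette width falls below silMin.
library(cluster)
set.seed(9)
X <- rbind(matrix(rnorm(60, mean = -3, sd = 0.3), ncol = 2),
           matrix(rnorm(60, mean =  3, sd = 0.3), ncol = 2))
labels <- rep(1:2, each = 30)          # a candidate partition of the 60 points
silMin <- 0.7
sil <- silhouette(labels, dist(X))     # one silhouette width per point
# Assumption: each cluster is judged by its mean silhouette width
silByCluster <- tapply(sil[, "sil_width"], sil[, "cluster"], mean)
toSplit <- names(silByCluster)[silByCluster < silMin]  # clusters to cut again
silByCluster
```

With two compact, well-separated groups, both clusters stay above the threshold, so nothing would be split further.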
Bi-parted Spectral Clustering. Perona and Freeman.
Description
Bi-parted spectral clustering based on the Perona and Freeman algorithm, which separates the data into two distinct clusters.
Usage
PeronaFreemanSC(W, flagDiagZero = FALSE, verbose = FALSE)
Arguments
W |
Gram Similarity Matrix. |
flagDiagZero |
If TRUE, set the diagonal of the similarity matrix W to zero. |
verbose |
If TRUE, print progress messages to the terminal. |
Value
Returns a list containing the following elements:
cluster: a vector containing the cluster label of each point
eigenVect: a matrix containing the eigenvectors
eigenVal: a vector containing the eigenvalues
Author(s)
Emilie Poisson Caillault and Erwan Vincent
References
Perona, P. and Freeman, W. (1998). A factorization approach to grouping. In European Conference on Computer Vision, pages 655-670
Examples
### Example 1: 2 disks of the same size
n<-100 ; r1<-1
x<-(runif(n)-0.5)*2;
y<-(runif(n)-0.5)*2
keep1<-which((x^2+y^2)<(r1^2))
disk1<-data.frame(x+3*r1,y)[keep1,]
disk2 <-data.frame(x-3*r1,y)[keep1,]
sameTwoDisks <- rbind(disk1,disk2)
W <- compute.similarity.ZP(scale(sameTwoDisks))
res <- PeronaFreemanSC(W,flagDiagZero=TRUE,verbose=TRUE)
plot(sameTwoDisks, col = res$cluster)
plot(res$eigenVect[,1:2], col = res$cluster, main="spectral space",
xlim=c(-1,1),ylim=c(-1,1)); points(0,0,pch='+');
plot(res$eigenVal, main="Laplacian eigenvalues",pch='+');
### Example 2: Iris data set
W <- compute.similarity.ZP(scale(iris[,-5]))
res <- PeronaFreemanSC(W,flagDiagZero=TRUE,verbose=TRUE)
plot(iris, col = res$cluster)
plot(res$eigenVect[,1:2], col = res$cluster, main="spectral space",
xlim=c(-1,1),ylim=c(-1,1)); points(0,0,pch='+');
plot(res$eigenVal, main="Laplacian eigenvalues",pch='+');
Bi-parted Spectral Clustering. Shi and Malik.
Description
Bi-parted spectral clustering based on the Shi and Malik algorithm, which separates the data into two distinct clusters.
Usage
ShiMalikSC(W, flagDiagZero = FALSE, verbose = FALSE)
Arguments
W |
Gram Similarity Matrix. |
flagDiagZero |
If TRUE, set the diagonal of the similarity matrix W to zero. |
verbose |
If TRUE, print progress messages to the terminal. |
Value
Returns a list containing the following elements:
cluster: a vector containing the cluster label of each point
eigenVect: a matrix containing the eigenvectors
eigenVal: a vector containing the eigenvalues
Author(s)
Emilie Poisson Caillault and Erwan Vincent
References
Shi, J. and Malik, J. (2000). Normalized cuts and image segmentation. In IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8), pages 888-905
Examples
### Example 1: 2 disks of the same size
n<-100 ; r1<-1
x<-(runif(n)-0.5)*2;
y<-(runif(n)-0.5)*2
keep1<-which((x^2+y^2)<(r1^2))
disk1<-data.frame(x+3*r1,y)[keep1,]
disk2 <-data.frame(x-3*r1,y)[keep1,]
sameTwoDisks <- rbind(disk1,disk2)
W <- compute.similarity.ZP(scale(sameTwoDisks))
res <- ShiMalikSC(W,flagDiagZero=TRUE,verbose=FALSE)
plot(sameTwoDisks, col = res$cluster)
plot(res$eigenVect[,1:2], col = res$cluster, main="spectral space",
xlim=c(-1,1),ylim=c(-1,1)); points(0,0,pch='+');
plot(res$eigenVal, main="Laplacian eigenvalues",pch='+');
### Example 2: Iris data set
W <- compute.similarity.ZP(scale(iris[,-5]))
res <- ShiMalikSC(W,flagDiagZero=TRUE,verbose=TRUE)
plot(iris, col = res$cluster)
plot(res$eigenVect[,1:2], col = res$cluster, main="spectral space",
xlim=c(-1,1),ylim=c(-1,1)); points(0,0,pch='+');
plot(res$eigenVal, main="Laplacian eigenvalues",pch='+');
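For readers who want to see the two-way split of Shi and Malik (2000) outside the package, the following base R sketch solves the generalized problem (D - W) y = lambda D y through the equivalent symmetric matrix I - D^{-1/2} W D^{-1/2} and splits on the sign of the second-smallest eigenvector. The Gaussian similarity and the threshold at zero are assumptions for the example; the package may choose the splitting point differently.

```r
# Sketch: normalized-cut bipartition in base R.
set.seed(7)
pts <- rbind(matrix(rnorm(60, mean = -1.5, sd = 0.4), ncol = 2),
             matrix(rnorm(60, mean =  1.5, sd = 0.4), ncol = 2))
W <- exp(-as.matrix(dist(pts))^2 / 2)   # Gaussian similarity, sigma = 1 (assumed)
diag(W) <- 0
deg <- rowSums(W)                       # degrees, i.e. the diagonal of D
n <- nrow(W)
Lsym <- diag(n) - W / sqrt(outer(deg, deg))   # I - D^{-1/2} W D^{-1/2}
eig <- eigen(Lsym, symmetric = TRUE)          # eigenvalues in decreasing order
# Second-smallest eigenvector, mapped back to the generalized problem
y <- eig$vectors[, n - 1] / sqrt(deg)
cluster <- ifelse(y > 0, 1L, 2L)        # assumption: split at zero
table(cluster)
```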
Unnormalized Spectral Clustering Ng.
Description
For a given similarity matrix, the function separates the data in a spectral space. Unlike the other algorithms, it does not normalize the Laplacian matrix.
Usage
UnormalizedSC(W, K = 5, flagDiagZero = FALSE, verbose = FALSE)
Arguments
W |
Gram Similarity Matrix. |
K |
Number of clusters to obtain. |
flagDiagZero |
If TRUE, set the diagonal of the similarity matrix W to zero. |
verbose |
If TRUE, print progress messages to the terminal. |
Value
Returns a list containing the following elements:
cluster: a vector containing the cluster label of each point
eigenVect: a matrix containing the eigenvectors
eigenVal: a vector containing the eigenvalues
Author(s)
Emilie Poisson Caillault and Erwan Vincent
Examples
### Example 1: 2 disks of the same size
n<-100 ; r1<-1
x<-(runif(n)-0.5)*2;
y<-(runif(n)-0.5)*2
keep1<-which((x^2+y^2)<(r1^2))
disk1<-data.frame(x+3*r1,y)[keep1,]
disk2 <-data.frame(x-3*r1,y)[keep1,]
sameTwoDisks <- rbind(disk1,disk2)
W <- compute.similarity.ZP(scale(sameTwoDisks))
res <- UnormalizedSC(W,K=2,flagDiagZero=TRUE,verbose=TRUE)
plot(sameTwoDisks, col = res$cluster)
plot(res$eigenVect[,1:2], col = res$cluster, main="spectral space",
xlim=c(-1,1),ylim=c(-1,1)); points(0,0,pch='+');
plot(res$eigenVal, main="Laplacian eigenvalues",pch='+');
### Example 2: Iris data set
W <- compute.similarity.ZP(scale(iris[,-5]))
res <- UnormalizedSC(W,K=2,flagDiagZero=TRUE,verbose=TRUE)
plot(iris, col = res$cluster)
plot(res$eigenVect[,1:2], col = res$cluster, main="spectral space",
xlim=c(-1,1),ylim=c(-1,1)); points(0,0,pch='+');
plot(res$eigenVal, main="Laplacian eigenvalues",pch='+');
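As a rough illustration of what "not normalizing the Laplacian" means, the sketch below builds the unnormalized graph Laplacian L = D - W in base R, keeps the eigenvectors of its K smallest eigenvalues and clusters their rows with k-means. The similarity function and the use of kmeans() are assumptions for the example, not a description of the package internals.

```r
# Sketch: unnormalized spectral clustering in base R.
set.seed(42)
pts <- rbind(matrix(rnorm(60, mean = -2, sd = 0.4), ncol = 2),
             matrix(rnorm(60, mean =  2, sd = 0.4), ncol = 2))
W <- exp(-as.matrix(dist(pts))^2 / 2)   # Gaussian similarity, sigma = 1 (assumed)
diag(W) <- 0
L <- diag(rowSums(W)) - W               # unnormalized Laplacian L = D - W
eig <- eigen(L, symmetric = TRUE)       # eigenvalues in decreasing order
K <- 2
n <- nrow(W)
# Eigenvectors associated with the K smallest eigenvalues
U <- eig$vectors[, (n - K + 1):n, drop = FALSE]
cluster <- kmeans(U, centers = K, nstart = 10)$cluster
table(cluster)
```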
Spectral Clustering based on the Von Luxburg algorithm
Description
For a given similarity matrix, the function separates the data in a spectral space, using the Von Luxburg algorithm.
Usage
VonLuxburgSC(W, K = 5, flagDiagZero = FALSE, verbose = FALSE)
Arguments
W |
Gram Similarity Matrix. |
K |
Number of clusters to obtain. |
flagDiagZero |
If TRUE, set the diagonal of the similarity matrix W to zero. |
verbose |
If TRUE, print progress messages to the terminal. |
Value
Returns a list containing the following elements:
cluster: a vector containing the cluster label of each point
eigenVect: a matrix containing the eigenvectors
eigenVal: a vector containing the eigenvalues
Author(s)
Emilie Poisson Caillault and Erwan Vincent
References
Von Luxburg, U. (2007). A Tutorial on Spectral Clustering. Statistics and Computing, Volume 17(4), pages 395-416
Examples
### Example 1: 2 disks of the same size
n<-100 ; r1<-1
x<-(runif(n)-0.5)*2;
y<-(runif(n)-0.5)*2
keep1<-which((x^2+y^2)<(r1^2))
disk1<-data.frame(x+3*r1,y)[keep1,]
disk2 <-data.frame(x-3*r1,y)[keep1,]
sameTwoDisks <- rbind(disk1,disk2)
W <- compute.similarity.ZP(scale(sameTwoDisks))
res <- VonLuxburgSC(W,K=2,flagDiagZero=TRUE,verbose=TRUE)
plot(sameTwoDisks, col = res$cluster)
plot(res$eigenVect[,1:2], col = res$cluster, main="spectral space",
xlim=c(-1,1),ylim=c(-1,1)); points(0,0,pch='+');
plot(res$eigenVal, main="Laplacian eigenvalues",pch='+');
### Example 2: Iris data set
W <- compute.similarity.ZP(scale(iris[,-5]))
res <- VonLuxburgSC(W,K=2,flagDiagZero=TRUE,verbose=TRUE)
plot(iris, col = res$cluster)
plot(res$eigenVect[,1:2], col = res$cluster, main="spectral space",
xlim=c(-1,1),ylim=c(-1,1)); points(0,0,pch='+');
plot(res$eigenVal, main="Laplacian eigenvalues",pch='+');
Gram similarity matrix checker
Description
Function that checks whether a similarity matrix is a Gram matrix.
Usage
checking.gram.similarityMatrix(W, flagDiagZero = FALSE, verbose = FALSE)
Arguments
W |
Similarity matrix to check (it may or may not be a Gram matrix). |
flagDiagZero |
If TRUE, set the diagonal of the similarity matrix W to zero. |
verbose |
If TRUE, print progress messages to the terminal. |
Value
a Gram similarity matrix
Author(s)
Emilie Poisson Caillault and Erwan Vincent
Examples
### Example 1: 2 disks of the same size
n<-100 ; r1<-1
x<-(runif(n)-0.5)*2;
y<-(runif(n)-0.5)*2
keep1<-which((x^2+y^2)<(r1^2))
disk1<-data.frame(x+3*r1,y)[keep1,]
disk2 <-data.frame(x-3*r1,y)[keep1,]
sameTwoDisks <- rbind(disk1,disk2)
W <- compute.similarity.ZP(scale(sameTwoDisks))
W <- checking.gram.similarityMatrix(W)
### Example 2: Speed and Stopping Distances of Cars
W <- compute.similarity.ZP(scale(cars))
W <- checking.gram.similarityMatrix(W)
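A Gram matrix is symmetric and positive semi-definite, so a basic check can be written in a few lines of base R. The helper is_gram and its tolerance are hypothetical and only illustrate the property being tested; the checks performed by checking.gram.similarityMatrix may differ.

```r
# Sketch: test whether a matrix is symmetric and positive semi-definite.
is_gram <- function(W, tol = 1e-8) {
  if (!isSymmetric(unname(W))) return(FALSE)
  ev <- eigen(W, symmetric = TRUE, only.values = TRUE)$values
  all(ev >= -tol)          # allow tiny negative values caused by rounding
}

good <- is_gram(matrix(c(2, 1, 1, 2), nrow = 2))  # eigenvalues 3 and 1
bad  <- is_gram(matrix(c(1, 2, 2, 1), nrow = 2))  # eigenvalues 3 and -1
c(good = good, bad = bad)
```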
K clust compute selection
Description
Function which selects the number of clusters to compute using the chosen method.
Usage
compute.kclust(
eigenValues,
method = "default",
Kmax = 20,
tolerence = 1,
threshold = 0.9,
verbose = FALSE
)
Arguments
eigenValues |
The eigenvalues of the Laplacian matrix. |
method |
The method that will be used: "default" to let the function choose the most suitable method, "PEV" for the Principal EigenValue method, or "GAP" for the GAP method. |
Kmax |
The maximum number of clusters allowed. |
tolerence |
The tolerance allowed for the Principal EigenValue method. |
threshold |
The threshold used to select the dominant eigenvalue for the GAP method. |
verbose |
If TRUE, print progress messages to the terminal. |
Value
A vector which contains the number of clusters to compute.
Author(s)
Emilie Poisson Caillault and Erwan Vincent
Examples
### Example 1: 2 disks of the same size
n<-100 ; r1<-1
x<-(runif(n)-0.5)*2;
y<-(runif(n)-0.5)*2
keep1<-which((x^2+y^2)<(r1^2))
disk1<-data.frame(x+3*r1,y)[keep1,]
disk2 <-data.frame(x-3*r1,y)[keep1,]
sameTwoDisks <- rbind(disk1,disk2)
W <- compute.similarity.ZP(scale(sameTwoDisks))
W <- checking.gram.similarityMatrix(W)
eigVal <- compute.laplacian.NJW(W,verbose = TRUE)$eigen$values
K <- compute.kclust(eigVal, method="default", Kmax=20, tolerence=0.99, threshold=0.9, verbose=TRUE)
### Example 2: Speed and Stopping Distances of Cars
W <- compute.similarity.ZP(scale(cars))
W <- checking.gram.similarityMatrix(W)
eigVal <- compute.laplacian.NJW(W,verbose = TRUE)$eigen$values
K <- compute.kclust(eigVal, method="default", Kmax=20, tolerence=0.99, threshold=0.9, verbose=TRUE)
K clust compute selection V2
Description
Function which selects the number of clusters to compute using the chosen method.
Usage
compute.kclust2(
eigenValues,
method = "default",
Kmax = 20,
tolerence = 1,
threshold = 0.9,
verbose = FALSE
)
Arguments
eigenValues |
The eigenvalues of the Laplacian matrix. |
method |
The method that will be used: "default" to let the function choose the most suitable method, "PEV" for the Principal EigenValue method, or "GAP" for the GAP method. |
Kmax |
The maximum number of clusters allowed. |
tolerence |
The tolerance allowed for the Principal EigenValue method. |
threshold |
The threshold used to select the dominant eigenvalue for the GAP method. |
verbose |
If TRUE, print progress messages to the terminal. |
Value
A vector which contains the number of clusters to compute.
Author(s)
Emilie Poisson Caillault and Erwan Vincent
NJW Laplacian matrix computation
Description
Function which computes the NJW Laplacian matrix of a similarity matrix, together with its eigenvectors and eigenvalues.
Usage
compute.laplacian.NJW(W, verbose = FALSE)
Arguments
W |
Gram Similarity Matrix. |
verbose |
If TRUE, print progress messages to the terminal. |
Value
Returns a list containing the following elements:
Lsym: the NJW Laplacian matrix
eigen: a list that contains the eigenvectors and eigenvalues
diag: the diagonal matrix used to build the Laplacian matrix
Author(s)
Emilie Poisson Caillault and Erwan Vincent
Examples
### Example 1: 2 disks of the same size
n<-100 ; r1<-1
x<-(runif(n)-0.5)*2;
y<-(runif(n)-0.5)*2
keep1<-which((x^2+y^2)<(r1^2))
disk1<-data.frame(x+3*r1,y)[keep1,]
disk2 <-data.frame(x-3*r1,y)[keep1,]
sameTwoDisks <- rbind(disk1,disk2)
W <- compute.similarity.ZP(scale(sameTwoDisks))
W <- checking.gram.similarityMatrix(W)
res <- compute.laplacian.NJW(W,verbose = TRUE)
### Example 2: Speed and Stopping Distances of Cars
W <- compute.similarity.ZP(scale(cars))
W <- checking.gram.similarityMatrix(W)
res <- compute.laplacian.NJW(W,verbose = TRUE)
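In the Ng, Jordan and Weiss formulation, the normalized matrix is D^{-1/2} W D^{-1/2}, where D is the diagonal matrix of row sums of W. The base R sketch below builds it on a toy 3 x 3 similarity matrix. The variable names are hypothetical, and whether the package stores exactly these quantities in Lsym, eigen and diag is an assumption.

```r
# Sketch: NJW-style normalized matrix on a toy similarity matrix.
W <- matrix(c(0.0, 0.9, 0.1,
              0.9, 0.0, 0.2,
              0.1, 0.2, 0.0), nrow = 3, byrow = TRUE)
deg <- rowSums(W)                      # degrees of the 3 points
Dinvsqrt <- diag(1 / sqrt(deg))        # D^{-1/2}
Lsym <- Dinvsqrt %*% W %*% Dinvsqrt    # D^{-1/2} W D^{-1/2}
eig <- eigen(Lsym, symmetric = TRUE)   # eigenvalues in decreasing order
eig$values                             # the largest one equals 1 up to rounding
```

For a connected graph with non-negative weights, the largest eigenvalue of this matrix is 1, which is why a reference line at 1 is a natural visual guide on an eigenvalue plot.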
Find the number of clusters according to the gap criterion
Description
Find the number of clusters according to the gap criterion
Usage
compute.nbCluster.gap(val, seuil = 0, fig = FALSE)
Arguments
val |
eigenvalues of a similarity matrix |
seuil |
threshold |
fig |
Boolean |
Value
Kli
Author(s)
Emilie Poisson Caillault v13/10/2015
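The gap criterion is commonly understood as choosing the number of clusters at the largest drop between consecutive sorted eigenvalues. The sketch below shows that heuristic on made-up eigenvalues; it is a generic illustration and an assumption about the method, and it does not model how the seuil or fig arguments are used by the package.

```r
# Sketch: eigengap heuristic on toy eigenvalues sorted in decreasing order.
val <- c(1.00, 0.98, 0.97, 0.35, 0.30, 0.22)   # made-up values
gaps <- -diff(val)          # drop between each eigenvalue and the next one
K <- which.max(gaps)        # number of eigenvalues before the largest drop
K                           # 3: the largest drop is between the 3rd and 4th values
```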
Compute the Gaussian similarity matrix according to Zelnik-Manor and Perona
Description
Uses a local sigma. Caution: the resulting matrix may not be positive semi-definite.
Usage
compute.similarity.ZP(points, vois = 7)
Arguments
points |
points x attributes matrix |
vois |
number of neighbours that will be selected |
Value
mat
Author(s)
Emilie Poisson Caillault v13/10/2015
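The local scaling of Zelnik-Manor and Perona replaces the single sigma by one scale per point, taken as the distance to its vois-th nearest neighbour, giving W[i, j] = exp(-d(i, j)^2 / (sigma_i * sigma_j)). The function zp_similarity below is a hypothetical base R sketch of that formula, assuming Euclidean distances and more than vois points; it is not the package code.

```r
# Sketch: locally scaled Gaussian similarity (self-tuning).
zp_similarity <- function(points, vois = 7) {
  d <- as.matrix(dist(points))          # Euclidean distances (assumed)
  # Local scale: distance to the vois-th nearest neighbour. Position 1 of the
  # sorted row is the point itself (distance 0), hence vois + 1.
  sigma <- apply(d, 1, function(v) sort(v)[vois + 1])
  exp(-d^2 / outer(sigma, sigma))
}

set.seed(3)
W <- zp_similarity(matrix(rnorm(40), ncol = 2), vois = 7)   # 20 points
dim(W)
```

The result is symmetric with ones on the diagonal, but nothing guarantees positive semi-definiteness, which matches the caution given in the description.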
Compute the Gaussian similarity matrix
Description
Compute the Gaussian similarity matrix
Usage
compute.similarity.gaussien(points, sigma)
Arguments
points |
points x attributes matrix |
sigma |
sigma |
Value
mat
Author(s)
Emilie Poisson Caillault v13/10/2015
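A Gaussian similarity with a single global sigma can be sketched as follows. The function gaussian_similarity is hypothetical, and the convention exp(-d^2 / (2 * sigma^2)) is an assumption: some implementations omit the factor 2.

```r
# Sketch: Gaussian similarity matrix with one global sigma.
gaussian_similarity <- function(points, sigma) {
  d <- as.matrix(dist(points))     # Euclidean distances between rows
  exp(-d^2 / (2 * sigma^2))        # assumed convention, see the note above
}

# Two points at distance 5, so the similarity is exp(-25 / 50) = exp(-0.5)
W <- gaussian_similarity(rbind(c(0, 0), c(3, 4)), sigma = 5)
W[1, 2]
```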
Fast Spectral Clustering
Description
This function samples the data, runs a clustering function on the samples, and then applies K nearest neighbours.
Usage
fastClustering(
dataFrame,
smplPoint,
stopCriteria = 0.99,
neighbours = 7,
similarity = TRUE,
clustFunction,
...
)
Arguments
dataFrame |
The data frame. |
smplPoint |
Maximum number of samples for the reduction. |
stopCriteria |
Criterion for minimizing the intra-group distance and selecting the final smplPoint. |
neighbours |
Number of neighbouring points selected for the similarity computation. |
similarity |
If TRUE, use the similarity matrix for the clustering function. |
clustFunction |
The clustering function to apply to the data. |
... |
Additional arguments for the clustering function. |
Value
returns a list containing the following elements:
results: clustering results
sample: dataframe containing the sample used
quantLabels: quantization labels
clustLabels: results labels
kmeans: kmeans quantization results
Author(s)
Emilie Poisson Caillault and Erwan Vincent
Examples
### Example 1: 2 disks of the same size
n<-100 ; r1<-1
x<-(runif(n)-0.5)*2;
y<-(runif(n)-0.5)*2
keep1<-which((x^2+y^2)<(r1^2))
disk1<-data.frame(x+3*r1,y)[keep1,]
disk2 <-data.frame(x-3*r1,y)[keep1,]
sameTwoDisks <- rbind(disk1,disk2)
res <- fastClustering(scale(sameTwoDisks),smplPoint = 500,
stopCriteria = 0.99, neighbours = 7, similarity = TRUE,
clustFunction = UnormalizedSC, K = 2)
plot(sameTwoDisks, col = as.factor(res$clustLabels))
### Example 2: Iris data set
res <- fastClustering(scale(iris[,-5]),smplPoint = 500,
stopCriteria = 0.99, neighbours = 7, similarity = TRUE,
clustFunction = spectralPAM, K = 3)
plot(iris, col = as.factor(res$clustLabels))
table(res$clustLabels,iris$Species)
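The sample, cluster, then nearest-neighbour pipeline described for this function can be imitated with standard tools. In this sketch, kmeans() produces the prototypes, a plain hclust() stands in for the clustering function, and knn() from the class package spreads the prototype labels to every point. All of these choices are assumptions for illustration and are not the package's internals.

```r
# Sketch: quantize, cluster the prototypes, then label all points with 1-NN.
library(class)
set.seed(5)
X <- rbind(matrix(rnorm(400, mean = -3, sd = 0.5), ncol = 2),
           matrix(rnorm(400, mean =  3, sd = 0.5), ncol = 2))   # 400 points
km <- kmeans(X, centers = 20, iter.max = 50)     # 20 prototypes (quantization)
# Stand-in clustering step applied to the prototypes only
protoLabels <- cutree(hclust(dist(km$centers)), k = 2)
# Each original point takes the label of its nearest prototype
labels <- knn(train = km$centers, test = X, cl = factor(protoLabels), k = 1)
table(labels)
```

The expensive clustering step only sees 20 prototypes instead of 400 points, which is the reason for sampling first.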
Fast Multi-Level Spectral Clustering
Description
For a given data frame, the function splits the data over several levels using Fast NJW clustering.
Usage
fastMSC(
X,
levelMax,
silMin = 0.7,
vois = 7,
flagDiagZero = FALSE,
method = "default",
Kmax = 20,
tolerence = 0.99,
threshold = 0.7,
minPoint = 7,
verbose = FALSE
)
Arguments
X |
The data frame. |
levelMax |
The maximum depth level. |
silMin |
The minimum silhouette allowed. Below this value, the cluster will be cut again. |
vois |
Number of neighbouring points selected for the similarity computation. |
flagDiagZero |
If TRUE, set the diagonal of the similarity matrix W to zero. |
method |
The method that will be used: "default" to let the function choose the most suitable method, "PEV" for the Principal EigenValue method, or "GAP" for the GAP method. |
Kmax |
The maximum number of clusters allowed. |
tolerence |
The tolerance allowed for the Principal EigenValue method. |
threshold |
The threshold used to select the dominant eigenvalue for the GAP method. |
minPoint |
The minimum number of points required to compute a cluster. |
verbose |
If TRUE, print progress messages to the terminal. |
Value
A data frame containing the resulting labels for each level.
Author(s)
Emilie Poisson Caillault and Erwan Vincent
Examples
### Example 1: 2 disks of the same size
n<-100 ; r1<-1
x<-(runif(n)-0.5)*2;
y<-(runif(n)-0.5)*2
keep1<-which((x^2+y^2)<(r1^2))
disk1<-data.frame(x+3*r1,y)[keep1,]
disk2 <-data.frame(x-3*r1,y)[keep1,]
sameTwoDisks <- rbind(disk1,disk2)
res <- fastMSC(scale(sameTwoDisks),levelMax=5, silMin=0.7, vois=7,
flagDiagZero=TRUE, method = "PEV", Kmax = 20,
tolerence = 0.99,threshold = 0.7, minPoint = 7, verbose = TRUE)
plot(sameTwoDisks, col = as.factor(res[,ncol(res)]))
### Example 2: Iris data set
res <- fastMSC(scale(iris[,-5]),levelMax=5, silMin=0.7, vois=7,
flagDiagZero=TRUE, method = "PEV", Kmax = 20,
tolerence = 0.99,threshold = 0.9, minPoint = 7, verbose = TRUE)
plot(iris, col = as.factor(res[,ncol(res)]))
table(res[,ncol(res)],iris$Species)
Data quantization
Description
The function uses the k-means algorithm to perform data quantization.
Usage
kmeansQuantization(dataFrame, maxData, stopCriteria = 0.99)
Arguments
dataFrame |
The data frame. |
maxData |
Maximum number of samples for the reduction. |
stopCriteria |
Criterion for minimizing the intra-group distance and selecting the final smplPoint. |
Value
kmeans result
Author(s)
Emilie Poisson Caillault and Erwan Vincent
Perform a multi level clustering
Description
For a given data frame, the function splits the data over several levels using the input clustering method.
Usage
recursClust(
dataFrame,
levelMax = 2,
clustFunction,
similarity = TRUE,
vois = 7,
flagDiagZero = FALSE,
biparted = FALSE,
method = "default",
tolerence = 0.99,
threshold = 0.9,
minPoint = 7,
verbose = FALSE,
...
)
Arguments
dataFrame |
The data frame. |
levelMax |
The maximum depth level. |
clustFunction |
The clustering function to apply to the data. |
similarity |
If TRUE, use the similarity matrix for the clustering function. |
vois |
Number of neighbouring points selected for the similarity computation. |
flagDiagZero |
If TRUE, set the diagonal of the similarity matrix W to zero. |
biparted |
If TRUE, the function will not automatically choose the number of clusters to compute. |
method |
The method that will be used: "default" to let the function choose the most suitable method, "PEV" for the Principal EigenValue method, or "GAP" for the GAP method. |
tolerence |
The tolerance allowed for the Principal EigenValue method. |
threshold |
The threshold used to select the dominant eigenvalue for the GAP method. |
minPoint |
The minimum number of points required to compute a cluster. |
verbose |
If TRUE, print progress messages to the terminal. |
... |
Additional arguments for the clustering function. |
Value
Returns a list containing the following elements:
cluster: a vector that contains the result of the last level
allLevels: a data frame containing the clustering results of each level
nbLevels: the number of computed levels
Author(s)
Emilie Poisson Caillault and Erwan Vincent
Examples
### Example 1: 2 disks of the same size
n<-100 ; r1<-1
x<-(runif(n)-0.5)*2;
y<-(runif(n)-0.5)*2
keep1<-which((x^2+y^2)<(r1^2))
disk1<-data.frame(x+3*r1,y)[keep1,]
disk2 <-data.frame(x-3*r1,y)[keep1,]
sameTwoDisks <- rbind(disk1,disk2)
res <- recursClust(scale(sameTwoDisks),levelMax=3, clustFunction =ShiMalikSC,
similarity = TRUE, vois = 7, flagDiagZero = FALSE,
biparted = TRUE, verbose = TRUE)
plot(sameTwoDisks, col = as.factor(res$cluster))
### Example 2: Iris data set
res <- recursClust(scale(iris[,-5]),levelMax=4, clustFunction = spectralPAM,
similarity = TRUE, vois = 7, flagDiagZero = FALSE,
biparted = FALSE, method = "PEV", tolerence = 0.99,
threshold = 0.9, verbose = TRUE)
plot(iris, col = as.factor(res$cluster))
Search for the id of the nearest neighbour
Description
Search for the id of the nearest neighbour
Usage
search.neighboor(vdist, vois)
Arguments
vdist |
vector of distances between the point and the other points |
vois |
number of neighbours to select |
Value
id
Author(s)
Emilie Poisson Caillault v13/10/2015
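Finding the index of the vois-th nearest neighbour from a vector of distances amounts to ordering that vector. The sketch below assumes that the vector includes the point itself at distance 0, so one position is skipped; whether search.neighboor follows the same convention is not stated here.

```r
# Sketch: index of the vois-th nearest neighbour from a distance vector.
vdist <- c(0.0, 2.5, 0.7, 1.9, 0.3)   # toy distances; entry 1 is the point itself
vois <- 2
# Assumption: skip position 1 of the ordering, which is the point itself
id <- order(vdist)[vois + 1]
id                                    # 3: the distances 0.3 and 0.7 are the two smallest
```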
Spectral-PAM clustering
Description
For a given similarity matrix, the function separates the data in a spectral space. It is based on the Jordan and Weiss algorithm. This version uses K-medoids to split the clusters.
Usage
spectralPAM(W, K, flagDiagZero = FALSE, verbose = FALSE)
Arguments
W |
Gram Similarity Matrix. |
K |
Number of clusters to obtain. |
flagDiagZero |
If TRUE, set the diagonal of the similarity matrix W to zero. |
verbose |
If TRUE, print progress messages to the terminal. |
Value
Returns a list containing the following elements:
cluster: a vector containing the cluster label of each point
eigenVect: a matrix containing the eigenvectors
eigenVal: a vector containing the eigenvalues
Author(s)
Emilie Poisson Caillault and Erwan Vincent
Examples
### Example 1: 2 disks of the same size
n<-100 ; r1<-1
x<-(runif(n)-0.5)*2;
y<-(runif(n)-0.5)*2
keep1<-which((x^2+y^2)<(r1^2))
disk1<-data.frame(x+3*r1,y)[keep1,]
disk2 <-data.frame(x-3*r1,y)[keep1,]
sameTwoDisks <- rbind(disk1,disk2)
W <- compute.similarity.ZP(scale(sameTwoDisks))
res <- spectralPAM(W,K=2,flagDiagZero=TRUE,verbose=TRUE)
plot(sameTwoDisks, col = res$cluster)
plot(res$eigenVect[,1:2], col = res$cluster, main="spectral space",
xlim=c(-1,1),ylim=c(-1,1)); points(0,0,pch='+');
plot(res$eigenVal, main="Laplacian eigenvalues",pch='+');
abline(h=1,lty="dashed",col="red")
### Example 2: Iris data set
W <- compute.similarity.ZP(scale(iris[-5]))
res <- spectralPAM(W,K=2,flagDiagZero=TRUE,verbose=TRUE)
plot(iris, col = res$cluster)
plot(res$eigenVect[,1:2], col = res$cluster, main="spectral space",
xlim=c(-1,1),ylim=c(-1,1)); points(0,0,pch='+');
plot(res$eigenVal, main="Laplacian eigenvalues",pch='+');
abline(h=1,lty="dashed",col="red")
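The combination of a normalized spectral embedding with K-medoids can be sketched with pam() from the cluster package. The steps below (normalized matrix D^{-1/2} W D^{-1/2}, K leading eigenvectors, rows rescaled to unit length) follow the usual Ng, Jordan and Weiss recipe and are an assumption about what spectralPAM does internally.

```r
# Sketch: NJW-style embedding followed by K-medoids (PAM).
library(cluster)
set.seed(21)
pts <- rbind(matrix(rnorm(60, mean = -1.5, sd = 0.4), ncol = 2),
             matrix(rnorm(60, mean =  1.5, sd = 0.4), ncol = 2))
W <- exp(-as.matrix(dist(pts))^2 / 2)   # Gaussian similarity, sigma = 1 (assumed)
diag(W) <- 0
deg <- rowSums(W)
Lsym <- W / sqrt(outer(deg, deg))       # D^{-1/2} W D^{-1/2}
K <- 2
# eigen() sorts eigenvalues in decreasing order: keep the K leading vectors
U <- eigen(Lsym, symmetric = TRUE)$vectors[, 1:K]
U <- U / sqrt(rowSums(U^2))             # rescale each row to unit length
cluster <- pam(U, k = K)$clustering     # K-medoids in the spectral space
table(cluster)
```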