| Type: | Package |
| Title: | Metagenomic Clustering |
| Version: | 0.1.1 |
| Author: | Dipro Sinha [aut, cre], Sayanti Guha Majumdar [aut], Anu Sharma [aut], Dwijesh Chandra Mishra [aut], Md Yeasin [aut] |
| Maintainer: | Dipro Sinha <diprosinha@gmail.com> |
| Description: | Clustering in metagenomics is the process of grouping microbial contigs into species-specific bins. This package contains functions that extract genomic features from metagenomic data, estimate the number of clusters for the given data, and identify the best clustering algorithm for binning. |
| License: | GPL-3 |
| Encoding: | UTF-8 |
| LazyData: | true |
| RoxygenNote: | 7.1.2 |
| Imports: | factoextra, cluster, dbscan, dplyr, seqinr, Biostrings |
| Depends: | R (≥ 3.6) |
| NeedsCompilation: | no |
| Packaged: | 2024-06-20 05:07:17 UTC; YEASIN |
| Repository: | CRAN |
| Date/Publication: | 2024-06-20 05:20:02 UTC |
Calculation of GC content
Description
This function calculates the GC content of each sequence or contig in a FASTA file.
Usage
GC.content(fasta_file)
Arguments
fasta_file |
Name of the FASTA or multi-FASTA file |
Value
Value of the GC content of each sequence or contig.
Author(s)
Dipro Sinha <diprosinha@gmail.com>, Sayanti Guha Majumdar, Anu Sharma, Dwijesh Chandra Mishra
Examples
library(metaCluster)
library(seqinr)
sample_data <- read.fasta(file = system.file("extdata/sample1.fasta", package = "metaCluster"),
seqtype = "DNA")
gc <- GC.content(sample_data)
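To make the quantity concrete, here is a minimal base-R sketch of a per-contig GC calculation. It is not the package's implementation: the helper `gc_fraction` and the toy contigs are hypothetical, and it assumes GC content is reported as a fraction of the unambiguous bases (A, C, G, T), which may differ from the scale `GC.content` returns.

```r
# Hypothetical helper (not part of metaCluster): GC fraction of one sequence.
# Assumption: ambiguous bases such as N are ignored.
gc_fraction <- function(dna) {
  bases <- strsplit(toupper(dna), "")[[1]]
  bases <- bases[bases %in% c("A", "C", "G", "T")]
  if (length(bases) == 0) return(NA_real_)
  mean(bases %in% c("G", "C"))
}

# Toy contigs standing in for the records of a multi-FASTA file.
contigs <- c(contig1 = "ATGCGC", contig2 = "AATT", contig3 = "GGNNCC")
gc_values <- vapply(contigs, gc_fraction, numeric(1))
print(gc_values)
```

The result is a named numeric vector with one value per contig, which can serve as a single column of a feature matrix.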
Determination of Suitable Clustering Algorithm for Metagenomics Data
Description
This function identifies the best clustering algorithm for a given metagenomic dataset, based on the silhouette index, among kmeans clustering, kmedoids clustering, fuzzy kmeans clustering, DBSCAN clustering, and hierarchical clustering.
Usage
clust.suite(data, k, eps, minpts)
Arguments
data |
Feature matrix consisting of different genomic features. Each row represents the features of a particular individual or contig, and each column represents a different genomic feature. |
k |
Optimum number of clusters |
eps |
Neighborhood radius (eps) for DBSCAN clustering |
minpts |
Minimum number of points required to form a dense region in DBSCAN clustering |
Value
kmeans |
Output of kmeans clustering |
kmedoids |
Output of kmedoids clustering |
fkmeans |
Output of fuzzy kmeans clustering |
dbscan |
Output of dbscan clustering |
hierarchical |
Output of hierarchical clustering |
silhouette.kmeans |
Silhouette plot of kmeans clustering |
silhouette.kmedoids |
Silhouette plot of kmedoids clustering |
silhouette.fkmeans |
Silhouette plot of fuzzy kmeans clustering |
silhouette.dbscan |
Silhouette plot of dbscan clustering |
silhouette.hierarchical |
Silhouette plot of hierarchical clustering |
best.clustering.method |
Best clustering algorithm based on silhouette index |
silhouette.summary |
Average silhouette width of each clustering algorithm |
Author(s)
Dipro Sinha <diprosinha@gmail.com>, Sayanti Guha Majumdar, Anu Sharma, Dwijesh Chandra Mishra
Examples
library(metaCluster)
data(metafeatures)
result <- clust.suite(metafeatures[1:200, ], k = 8, eps = 0.5, minpts = 10)
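The selection criterion described above can be illustrated with a small base-R sketch that scores two clusterings by average silhouette width and picks the higher one. This is an assumption-laden illustration rather than the code behind `clust.suite`: the helper `avg_silhouette`, the synthetic data, and the restriction to kmeans and hierarchical clustering are all choices made here for brevity.

```r
# Hypothetical helper: average silhouette width from Euclidean distances.
avg_silhouette <- function(x, labels) {
  d <- as.matrix(dist(x))
  ids <- unique(labels)
  s <- vapply(seq_len(nrow(d)), function(i) {
    own <- labels == labels[i]
    own[i] <- FALSE
    if (!any(own)) return(0)          # singleton cluster: width taken as 0
    a <- mean(d[i, own])              # mean distance within own cluster
    b <- min(vapply(setdiff(ids, labels[i]),
                    function(cl) mean(d[i, labels == cl]),
                    numeric(1)))      # mean distance to nearest other cluster
    (b - a) / max(a, b)
  }, numeric(1))
  mean(s)
}

# Synthetic two-group feature matrix (an assumption, not package data).
set.seed(42)
x <- rbind(matrix(rnorm(60, mean = 0, sd = 0.3), ncol = 2),
           matrix(rnorm(60, mean = 3, sd = 0.3), ncol = 2))

labels_km <- kmeans(x, centers = 2, nstart = 10)$cluster
labels_hc <- cutree(hclust(dist(x), method = "average"), k = 2)

scores <- c(kmeans = avg_silhouette(x, labels_km),
            hierarchical = avg_silhouette(x, labels_hc))
best <- names(which.max(scores))      # ties resolve to the first method
print(scores)
print(best)
```

Values close to 1 indicate compact, well-separated clusters. The sketch does not treat DBSCAN noise points specially, which a comparison including DBSCAN would need to address.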
Metagenomic data
Description
Feature matrix consisting of different genomic features. Each row represents the features of a particular individual or contig, and each column represents a different genomic feature.
Usage
data("metafeatures")
Format
A data frame with 1196 observations on the following 8 variables.
class |
a factor |
Dim.1 |
a numeric vector |
Dim.2 |
a numeric vector |
Dim.3 |
a numeric vector |
Dim.4 |
a numeric vector |
Dim.5 |
a numeric vector |
Dim.6 |
a numeric vector |
gc |
a numeric vector |
Oligonucleotide Frequency
Description
This function calculates the oligonucleotide frequency of each sequence or contig in a FASTA file.
Usage
oligo.freq(fasta_file, f)
Arguments
fasta_file |
Name of the FASTA or multi-FASTA file |
f |
Length of the oligonucleotide |
Value
Frequency of each oligonucleotide of the user-specified length.
Author(s)
Dipro Sinha <diprosinha@gmail.com>, Sayanti Guha Majumdar, Anu Sharma, Dwijesh Chandra Mishra
Examples
library(metaCluster)
freq <- oligo.freq(fasta_file = system.file("extdata/sample1.fasta", package = "metaCluster"), f = 4)
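As a rough illustration of what an oligonucleotide frequency is, the base-R sketch below counts overlapping oligonucleotides of length `f` in a single sequence. The helper `kmer_counts` is hypothetical and returns raw counts; whether `oligo.freq` reports counts or normalized frequencies is not stated here, so treat the output scale as an assumption.

```r
# Hypothetical helper (not part of metaCluster): count overlapping
# oligonucleotides of length f in one DNA string.
kmer_counts <- function(dna, f) {
  dna <- toupper(dna)
  n <- nchar(dna)
  if (n < f) return(table(character(0)))
  starts <- seq_len(n - f + 1)
  table(substring(dna, starts, starts + f - 1))
}

counts <- kmer_counts("ATGCATGC", f = 2)
print(counts)
```

A sequence of length n yields n - f + 1 overlapping windows, so the counts for the 8-base toy sequence with `f = 2` sum to 7.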
Finding Optimum Number of Cluster for Metagenomics Data
Description
This function suggests the optimum number of clusters based on a within-cluster sum of squares (WSS) plot.
Usage
opt.clust.num(data, nc, seed = 1234)
Arguments
data |
Feature matrix consisting of different genomic features. Each row represents the features of a particular individual or contig, and each column represents a different genomic feature. |
nc |
Probable number of clusters |
seed |
Seed value used to make the iterations reproducible |
Value
WSS plot
Author(s)
Dipro Sinha <diprosinha@gmail.com>, Sayanti Guha Majumdar, Anu Sharma, Dwijesh Chandra Mishra
Examples
library(metaCluster)
data(metafeatures)
wss_plot <- opt.clust.num(metafeatures[1:200,], nc=10, seed = 1234)
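The idea behind a WSS plot can be sketched in base R by running kmeans for a range of cluster counts and recording the total within-cluster sum of squares for each. The synthetic three-group data and the range `1:6` are assumptions for illustration; this is not necessarily how `opt.clust.num` computes its plot.

```r
# Synthetic data with three well-separated groups (an assumption).
set.seed(1234)
x <- rbind(matrix(rnorm(80, mean = 0, sd = 0.4), ncol = 2),
           matrix(rnorm(80, mean = 4, sd = 0.4), ncol = 2),
           cbind(rnorm(40, mean = 0, sd = 0.4), rnorm(40, mean = 4, sd = 0.4)))

# Total within-cluster sum of squares for each candidate number of clusters.
ks <- 1:6
wss <- vapply(ks, function(k) {
  kmeans(x, centers = k, nstart = 10)$tot.withinss
}, numeric(1))

print(data.frame(k = ks, wss = round(wss, 2)))
```

Calling `plot(ks, wss, type = "b")` draws the curve. The "elbow", where adding clusters stops reducing WSS substantially, is read as the suggested number of clusters.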