Introduction-to-scStability

The hardware and bandwidth for this mirror is donated by METANET, the Webhosting and Full Service-Cloud Provider.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]metanet.ch.

First, we need to import a scRNA-seq dataset that can be used to demonstrate the functions. We will use the Zeisel brain dataset from the scRNAseq package.

if (have_bioc){
  sce <- scRNAseq::ZeiselBrainData(location = FALSE)
  counts_matrix <- SummarizedExperiment::assay(sce, "counts")
}

Then, we need to create a Seurat object to run are functions and create a PCA. To save time, we will reduce our sample to only 200 cells

library(Seurat)
#> Loading required package: SeuratObject
#> Loading required package: sp
#> 
#> Attaching package: 'sp'
#> The following object is masked from 'package:IRanges':
#> 
#>     %over%
#> 'SeuratObject' was built under R 4.4.0 but the current version is
#> 4.4.3; it is recomended that you reinstall 'SeuratObject' as the ABI
#> for R may have changed
#> 'SeuratObject' was built with package 'Matrix' 1.7.0 but the current
#> version is 1.7.3; it is recomended that you reinstall 'SeuratObject' as
#> the ABI for 'Matrix' may have changed
#> 
#> Attaching package: 'SeuratObject'
#> The following object is masked from 'package:SummarizedExperiment':
#> 
#>     Assays
#> The following object is masked from 'package:GenomicRanges':
#> 
#>     intersect
#> The following object is masked from 'package:GenomeInfoDb':
#> 
#>     intersect
#> The following object is masked from 'package:IRanges':
#> 
#>     intersect
#> The following object is masked from 'package:S4Vectors':
#> 
#>     intersect
#> The following object is masked from 'package:BiocGenerics':
#> 
#>     intersect
#> The following objects are masked from 'package:base':
#> 
#>     intersect, t
#> 
#> Attaching package: 'Seurat'
#> The following object is masked from 'package:SummarizedExperiment':
#> 
#>     Assays
seurat_obj <- Seurat::CreateSeuratObject(counts = counts_matrix)
#> Warning: Feature names cannot have underscores ('_'), replacing with dashes
#> ('-')
cell_names <- colnames(seurat_obj@assays$RNA)[1:200]
seurat_obj <- subset(seurat_obj, cells = cell_names)

seurat_obj <- Seurat::NormalizeData(seurat_obj)
#> Normalizing layer: counts
seurat_obj <- Seurat::FindVariableFeatures(seurat_obj)
#> Finding variable features for layer counts
seurat_obj <- Seurat::ScaleData(seurat_obj, verbose = FALSE)

# We need a PCA as the input to the functions and so we will create one using Seurat::RunPCA(). Recommended to use as few principal components as is logical to speed up computation. You can check the number of PCs needed using a scree plot.
seurat_obj <- Seurat::RunPCA(seurat_obj, npcs = 10)
#> PC_ 1 
#> Positive:  Zwint, Ngfrap1, Fth1, Gm6402, Rps15a-ps4, Rab3b, Ndufab1, OTTMUSG00000016609-loc3, 1500004A13Rik, Tmsb4x 
#>     Gm14305, OTTMUSG00000016609-loc4, Fabp5, Trappc2, Ndfip1, Zfp931, Glrx, Gm14436, Plp1, Gpx4 
#>     Sst, Csrp1, Lgals1, Schip1, Cd81, Cnp, Car2, Sepp1, Ptn, Cd9 
#> Negative:  Dner, Olfm1, Atp2a2, Rgs7bp, Ogfrl1, Elmod1, Camk2b, Nfib, Nfix, Eef1a2 
#>     Slc6a1, Plcxd2, Atp2b2, Bcl11b, Lrp12, D3Bwg0562e, Rab6a, Gabrb2, Rph3a, Nr2f1 
#>     Chd3, Celf2, Ptprm, Tspan5, Slc2a13, Tspan7, Kit, Nt5dc3, Mef2c, Gnao1 
#> PC_ 2 
#> Positive:  Cnr1, Trp53i11, Cplx2, Cadps2, Npas1, Gng2, Nrxn3, Cck, Sncg, Adarb2 
#>     Eps8, Kctd12, Fxyd6, Necab1, Rgs12, Igf1, Syt6, Gng4, Celf6, Col25a1 
#>     Pde5a, Necab2, Wnt5a, Fstl5, Lix1, Npas3, Stk32c, Cntnap2, Dlx6os1, Ddah1 
#> Negative:  Adcy1, Mpped1, Bcl11a, Akap5, Nxph1, Cacna1a, Prkacb, Lamp5, Gda, Ncald 
#>     Rgs17, Gabrd, Mafb, Tox3, Grm5, Rbfox1, Sema5a, Hapln1, Cyp46a1, Sst 
#>     Cap2, Grin3a, Tspan7, Gria3, Celf2, Srgap3, Car4, Fam46a, Bcl11b, Rapgef4 
#> PC_ 3 
#> Positive:  Grm1, Shisa6, Kctd8, Satb1, Trpc6, Elfn1, Kcnc2, Cdh13, Gria1, Lphn2 
#>     Grin3a, Gria3, Ephx4, Lmo4, Runx1t1, Vsnl1, Coch, Sst, Car10, Nppc 
#>     Cbln4, Grik3, Cdh8, Slc24a2, Cacna2d2, Ntm, Lhx6, Rab3b, Dgkg, Eya1 
#> Negative:  Cryab, Trf, Mal, Plp1, Car14, Mog, Apod, Ugt8a, Mobp, Hapln2 
#>     Cmtm5, Ermn, Elovl1, Tmem63a, Tmem125, Gsn, Cldn11, Gatm, Gng11, Mbp 
#>     Aspa, Cnp, Mag, Pdlim2, Id2, Elovl7, Phgdh, Enpp6, Gjc3, Csrp1 
#> PC_ 4 
#> Positive:  Elfn1, Camk2n1, Cadm2, Satb1, Asic1, Lgi3, Chrna4, Htra1, Aldoc, Timp2 
#>     Vgf, Marcks, Grpr, Glra2, Penk, Cacna1h, Cacna2d3, Igsf3, Crh, Tmem229b 
#>     2610017I09Rik, Adra1b, Ppargc1a, Asic4, Ptpn5, Ano4, Gpd1, Lynx1, Pthlh, Ndrg2 
#> Negative:  Tsc22d1, Id2, Cpne8, Lamp5, Tgfb2, Ldha, Ndst3, Vwa5a, Itpr1, Sema5a 
#>     Crispld1, Gabra5, Rgs10, Gm14305, Wif1, Mpped1, Cpne6, Gpr158, Plch2, Enc1 
#>     Ddn, 9930013L23Rik, Pls3, Cacna2d1, Ociad2, Dkk3, Wls, Scube1, Slc35d3, Sv2c 
#> PC_ 5 
#> Positive:  Sparcl1, Rgs8, Mgll, Rgs16, Ankrd29, Rgs10, Cox6a2, Kcnc1, Gabrd, Ptpre 
#>     Npy, Ppargc1a, Ncald, Pthlh, Fryl, Slc24a2, mt-Ty, Dlx1os, Abat, Pacsin2 
#>     Slc2a13, Pygm, Ptprd, Gpr176, Maf, Rab6a, Wnt5b, Kcng4, Pid1, Osbpl3 
#> Negative:  Baiap2, Neurod2, Fam19a1, Atp2b1, Itpka, Pde1a, Ppp3ca, Rgs14, Prkcb, Neurod6 
#>     Lpl, Rprml, Crym, Nrgn, Sv2b, 2010300C02Rik, Ildr2, Crim1, Cpne6, Camk2a 
#>     6330403A02Rik, Fam131a, Miat, Slc17a7, Hpca, Klk8, Ptk2b, Nrn1, Lingo1, Ddn

# Extract the PCA also for use in embedding functions
input_pca <- Seurat::Embeddings(seurat_obj, reduction = "pca")

If you would like to use the wrapper function, then it is a single function call. Make sure you know which dimension reduction and clustering algorithm you want to use. Also check how many CPU cores you have to see if you can make it run faster. For speed and demonstration only 15 runs are used here, however it is recommended to try ~100-300 runs.

# Run the wrapper function with 100 different embeddings and cluster assignments being created and compared
stability_results <- scStability::scStability(seurat_obj, n_runs = 15, dr_method = 'umap', clust_method = 'louvain', n_cores = 8)
#> -----------------------Embedding Statistics----------------------------- 
#> Mean of Mean Correlation per Embedding: 0.8281
#> 95 Percent CI of Mean Correlation per Embedding: [0.7466, 0.8650]
#> Mean Correlation Range: Min = 0.7284 Max = 0.8674



# As the function runs, output statistics will be printed. A density plot of the mean Kendall's tau per embedding and a violin plot of the cluster assignment NMI will also be printed.


# To see what the mean embedding with the mean cluster assignment is, we can print the plot created
print(stability_results$plot)


# For statistics on each stage, we can extract the following:
embedding_statistics <- stability_results$emb_stats
cluster_statistics <- stability_results$clust_stats

# Look at the mean Kendall's tau (correlation between embeddings) for each embedding
print(embedding_statistics$mean_per_embedding)
#> NULL

# Also look at the overall mean of means Kendall's tau
print(embedding_statistics$mean)
#> NULL

# Look at the mean NMI (similar to correlation) for each cluster assignment
print(cluster_statistics$per_index_means)
#> NULL

# Can also see the mean NMI over all the cluster assignments
print(mean(cluster_statistics$per_index_means))
#> Warning in mean.default(cluster_statistics$per_index_means): argument is not
#> numeric or logical: returning NA
#> [1] NA

A more manual approach can be taken by running each function seperately. This allows you to see the results of each step before moving onto the next.

# Create a list of embeddings using the input pca
emb_list <- scStability::createEmb(input_pca, n_runs = 15, method = 'umap', n_cores = 8)

# Compare the generated embedding list and look at the output statistics. A density plot will be generated which shows the density of the mean Kendall's tau for each embedding. 
embedding_stats <- scStability::compareEmb(emb_list, n_cores = 8)
#> Mean of Mean Correlation per Embedding: 0.7716
#> 95 Percent CI of Mean Correlation per Embedding: [0.7326, 0.8199]
#> Mean Correlation Range: Min = 0.7306 Max = 0.8200


# Create and compare cluster assignments. A violin plot showing the distribution of mean NMI values
cluster_stats <- scStability::clustStable(n_runs = 15, seurat_obj = seurat_obj, method = 'louvain', n_cores = 8)
#> Mean of Mean NMI per Index: 0.9849
#> 95 Percent CI of Mean NMI per Index: [0.9812, 0.9875]

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.