library(Canek)
# Functions
## Function to plot the pca coordinates
plotPCA <- function(pcaData = NULL, label = NULL, legPosition = "topleft"){
  # label must be a factor; its integer codes select the plotting colours
  col <- as.integer(label)
  plot(x = pcaData[,"PC1"], y = pcaData[,"PC2"],
       col = col, cex = 0.75, pch = 19,
       xlab = "PC1", ylab = "PC2")
  # Colours follow the level order so each legend entry matches its points
  legend(legPosition, pch = 19,
         legend = levels(label),
         col = seq_along(levels(label)))
}
In this toy example we use the two simulated batches included in the SimBatches data from the Canek package. SimBatches is a list containing:

batches: simulated scRNA-seq datasets with genes (rows) and cells (columns). Simulations were performed using Splatter.

cell_types: a factor containing the cell-type labels of the batches.

lsData <- list(B1 = SimBatches$batches[[1]], B2 = SimBatches$batches[[2]])
batch <- factor(c(rep("Batch-1", ncol(lsData[[1]])),
rep("Batch-2", ncol(lsData[[2]]))))
celltype <- SimBatches$cell_types
table(batch)
#> batch
#> Batch-1 Batch-2
#> 631 948
table(celltype)
#> celltype
#> Cell Type 1 Cell Type 2 Cell Type 3 Cell Type 4
#> 1451 53 38 37
We perform Principal Component Analysis (PCA) on the joined datasets and draw a scatter plot of the first two PCs. The batch effect causes cells to group by batch.
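The PCA step described above can be sketched as follows. This is a minimal sketch using base R's prcomp, not necessarily the code used to produce the original figure. The helper name jointPCA and the random stand-in matrices are hypothetical. It assumes dense matrices that share the same genes (rows) in the same order.

```r
# Hypothetical helper: join batches column-wise and return the first PCs.
jointPCA <- function(lsData, nPCs = 2) {
  # Genes are rows and cells are columns, so cbind stacks the cells of all batches
  data <- do.call(cbind, lsData)
  # Drop genes with zero variance, which cannot be scaled to unit variance
  keep <- apply(data, 1, var) > 0
  # prcomp expects observations (cells) in rows, hence the transpose
  pca <- prcomp(t(data[keep, , drop = FALSE]),
                center = TRUE, scale. = TRUE, rank. = nPCs)
  pca$x  # one row per cell, columns named PC1, PC2, ...
}

# Stand-in data only: two random "batches" with 50 genes and a mean shift
set.seed(1)
toy <- list(B1 = matrix(rnorm(50 * 20), nrow = 50),
            B2 = matrix(rnorm(50 * 30, mean = 1), nrow = 50))
pcaToy <- jointPCA(toy)
toyBatch <- factor(rep(c("Batch-1", "Batch-2"), times = c(20, 30)))
```

With the objects from this example, the equivalent calls would be pcaData <- jointPCA(lsData) followed by plotPCA(pcaData = pcaData, label = batch).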
We correct the toy batches using the function RunCanek. This function accepts several input types, including a list of matrices. In this example we use the list of matrices created before.
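The correction step can be sketched as below. The call RunCanek(lsData) with default arguments follows the description in the text. The variable name corrected is hypothetical, and the structure of the returned object is not stated here, so the sketch inspects it rather than assuming a shape. The guard only skips the example when Canek is not installed.

```r
# Run the correction only if the Canek package is available
if (requireNamespace("Canek", quietly = TRUE)) {
  library(Canek)
  # Rebuild the list of matrices used throughout this example
  lsData <- list(B1 = SimBatches$batches[[1]], B2 = SimBatches$batches[[2]])
  # Default arguments; see ?RunCanek for the available options
  corrected <- RunCanek(lsData)
  # Inspect the result before passing it on to PCA
  str(corrected, max.level = 1)
}
```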
We perform PCA of the corrected datasets and plot the first two PCs. After correction, the cells group by their corresponding cell type.
sessionInfo()
#> R version 4.1.3 (2022-03-10)
#> Platform: x86_64-apple-darwin13.4.0 (64-bit)
#> Running under: macOS Big Sur/Monterey 10.16
#>
#> Matrix products: default
#> BLAS/LAPACK: /Users/martin/miniconda3/envs/R_4.1.3/lib/libopenblasp-r0.3.18.dylib
#>
#> locale:
#> [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] Canek_0.2.5
#>
#> loaded via a namespace (and not attached):
#> [1] Rcpp_1.0.10 highr_0.10 DEoptimR_1.0-11
#> [4] bslib_0.4.2 compiler_4.1.3 bluster_1.4.0
#> [7] jquerylib_0.1.4 class_7.3-21 prabclus_2.3-2
#> [10] BiocNeighbors_1.12.0 numbers_0.8-5 tools_4.1.3
#> [13] digest_0.6.31 mclust_6.0.0 jsonlite_1.8.4
#> [16] evaluate_0.20 lattice_0.20-45 pkgconfig_2.0.3
#> [19] rlang_1.0.6 Matrix_1.5-3 igraph_1.3.5
#> [22] cli_3.6.0 rstudioapi_0.14 yaml_2.3.7
#> [25] parallel_4.1.3 xfun_0.37 fastmap_1.1.0
#> [28] knitr_1.42 cluster_2.1.4 sass_0.4.5
#> [31] S4Vectors_0.32.4 fpc_2.2-10 diptest_0.76-0
#> [34] nnet_7.3-18 stats4_4.1.3 grid_4.1.3
#> [37] robustbase_0.95-0 R6_2.5.1 flexmix_2.3-18
#> [40] BiocParallel_1.28.3 rmarkdown_2.20 irlba_2.3.5.1
#> [43] kernlab_0.9-32 magrittr_2.0.3 matrixStats_0.63.0
#> [46] modeltools_0.2-23 htmltools_0.5.4 BiocGenerics_0.40.0
#> [49] MASS_7.3-58.3 cachem_1.0.6 FNN_1.1.3.1