The hardware and bandwidth for this mirror is donated by METANET, the Webhosting and Full Service-Cloud Provider.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]metanet.ch.
This vignette introduces the CAESAR.Suite workflow for the analysis of Xenium BC spatial transcriptomics dataset. In this vignette, the workflow of CAESAR.Suite consists of five steps
We demonstrate the application of CAESAR.Suite to Xenium data. In
this vignette, the input data includes: BC_scRNAList — two
scRNA-seq reference datasets, each with 3,000 cells;
BC_XeniumList — two Xenium target datasets, each with 3,000
cells; and BC_feature_imgList — two feature matrices
containing histology image information from the Xenium target datasets.
For more details with respect to histology image feature extraction, see
vignette.
The genes shared between the scRNA-seq reference and Xenium target
datasets were used for analysis. The package can be loaded with the
command:
set.seed(1) # set a random seed for reproducibility.
library(CAESAR.Suite) # load the package of CAESAR.Suite method
library(Seurat)
library(ggplot2)
library(msigdbr)
library(dplyr)##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
Those data can be downloaded and load to the current working path by the following command:
githubURL <- "https://github.com/XiaoZhangryy/CAESAR.Suite/blob/master/vignettes_data/BC_scRNAList.rda?raw=true"
BC_scRNAList_file <- file.path(tempdir(), "BC_scRNAList.rda")
download.file(githubURL, BC_scRNAList_file, mode='wb')
load(BC_scRNAList_file)
print(BC_scRNAList)
githubURL <- "https://github.com/XiaoZhangryy/CAESAR.Suite/blob/master/vignettes_data/BC_XeniumList.rda?raw=true"
BC_XeniumList_file <- file.path(tempdir(), "BC_XeniumList.rda")
download.file(githubURL, BC_XeniumList_file, mode='wb')
load(BC_XeniumList_file)
print(BC_XeniumList)
githubURL <- "https://github.com/XiaoZhangryy/CAESAR.Suite/blob/master/vignettes_data/BC_feature_imgList.rda?raw=true"
BC_feature_imgList_file <- file.path(tempdir(), "BC_feature_imgList.rda")
download.file(githubURL, BC_feature_imgList_file, mode='wb')
load(BC_feature_imgList_file)
print(sapply(BC_feature_imgList, dim))Users can perform appropriate quality control on the reference dataset and target datasets. Genes expressed in fewer than one cell were removed. Other quality control steps can be set by the user according to the data quality. Here, since the scRNA-seq reference and Xenium target datasets were already matched by their shared genes, no additional quality control was performed.
# BC_scRNAList <- lapply(BC_scRNAList, function(seu) {
# CreateSeuratObject(
# counts = seu@assays$RNA@counts,
# meta.data = seu@meta.data,
# min.features = 5,
# min.cells = 1
# )
# })
#
# print(BC_scRNAList)
#
#
# BC_XeniumList <- lapply(BC_XeniumList, function(seu) {
# CreateSeuratObject(
# counts = seu@assays$RNA@counts,
# meta.data = seu@meta.data,
# min.features = 5,
# min.cells = 1
# )
# })
#
# print(BC_XeniumList)
#
# BC_feature_imgList <- lapply(1:2, function(i) {
# BC_feature_imgList[[i]][colnames(BC_XeniumList[[i]]), ]
# })First, we normalize the data and select the variable genes. We matched the genes and variable genes between the reference and target datasets.
# match genes
common_genes <- Reduce(intersect, c(
lapply(BC_scRNAList, rownames),
lapply(BC_XeniumList, rownames)
))
print(length(common_genes))
# all common genes are used as variable genes, as only around 300 genes here
BC_scRNAList <- lapply(BC_scRNAList, function(seu) {
seu <- seu[common_genes, ]
seu <- NormalizeData(seu)
VariableFeatures(seu) <- common_genes
seu
})
BC_XeniumList <- lapply(BC_XeniumList, function(seu) {
seu <- seu[common_genes, ]
seu <- NormalizeData(seu)
VariableFeatures(seu) <- common_genes
seu
})
print(BC_scRNAList)
print(BC_XeniumList)We introduce how to use CAESAR to detect signature genes form scRNA-seq reference data. First, we calculate the co-embeddings.
Then, we detect signature genes.
# calculate cell-gene distance and identify signature genes
sg_sc_List <- lapply(BC_scRNAList, function(seu) {
print(table(seu$CellType))
Idents(seu) <- seu$CellType
find.sig.genes(seu, reduction.name = "caesar")
})
str(sg_sc_List)Finally, select marker genes for each cell type from the signature gene list.
Similarly, we first calculate co-embeddings for Xenium BC dataset. The difference is that spatial transcriptome data has spatial coordinates and image feature information, so we can obtain image-based spatial aware co-embeddings.
BC_XeniumList <- lapply(1:2, function(i) {
seu <- BC_XeniumList[[i]]
# the spatial coordinates
pos <- seu@meta.data[, c("x_centroid", "y_centroid")]
print(head(pos))
# the image feature
feature_img <- BC_feature_imgList[[i]]
seu <- CAESAR.coembedding.image(
seu, feature_img, pos, reduction.name = "caesar", q = 50)
seu
})
names(BC_XeniumList) <- paste0("BC", 1:2)
print(BC_XeniumList)Subsequently, the CAESAR co-embeddings and marker gene lists from scRNA-seq reference datasets are used to annotate the Xenium BC data.
# convert marker list to marker frequency matrix
marker.freq <- markerList2mat(markerList)
# perform annotation using CAESAR and save results to Seurat object
BC_XeniumList <- lapply(
BC_XeniumList, CAESAR.annotation, marker.freq = marker.freq,
reduction.name = "caesar", add.to.meta = TRUE
)
print(colnames(BC_XeniumList[[1]]@meta.data))In the following, we visualize the CAESAR annotation results.
# set up colors
cols <- setNames(
c(
"#fdc086", "#386cb0", "#b30000", "#FBEA2E", "#731A73",
"#FF8C00", "#F898CB", "#4DAF4A", "#a6cee3", "#737373"
),
c(
"B-cells", "CAFs", "Cancer Epithelial", "Endothelial", "Myeloid",
"Normal Epithelial", "Plasmablasts", "PVL", "T-cells", "unassigned"
)
)
celltypes <- c(
"B-cells", "CAFs", "Cancer Epithelial", "Endothelial", "Myeloid",
"Normal Epithelial", "Plasmablasts", "PVL", "T-cells", "unassigned"
)
BC_XeniumList <- lapply(BC_XeniumList, function(seu) {
Idents(seu) <- factor(seu$CAESARunasg, levels = celltypes)
pos <- seu@meta.data[, c("x_centroid", "y_centroid")]
colnames(pos) <- paste0("pos", 1:2)
seu@reductions[["pos"]] <- CreateDimReducObject(
embeddings = as.matrix(pos),
key = paste0("pos", "_"), assay = "RNA"
)
seu
})First, we visualize the CAESAR annotation account for ‘unassigned’.
plots <- lapply(BC_XeniumList, function(seu) {
DimPlot(seu, reduction = "pos", cols = cols, pt.size = 1)
})
cowplot::plot_grid(plotlist = plots, ncol = 1)The confidence level of the CAESAR annotation can be visualized by
plots <- lapply(BC_XeniumList, function(seu) {
FeaturePlot(
seu,
reduction = "pos", features = "CAESARconf", pt.size = 1,
cols = c("blue", "lightgrey"), min.cutoff = 0.0, max.cutoff = 1.0
)
})
cowplot::plot_grid(plotlist = plots, ncol = 1)Next, we detect and visualize the signature genes for each cell type.
We remove unwanted variation by
dist_names <- paste0("dist_", gsub("-|/| ", ".", setdiff(celltypes, "unassigned")))
distList <- lapply(BC_XeniumList, function(seu) {
as.matrix(seu@meta.data[, dist_names])
})
seuInt <- CAESAR.RUV(BC_XeniumList, distList, verbose = FALSE, species = "human")
metaInt <- Reduce(rbind, lapply(BC_XeniumList, function(seu) {
as.matrix(seu@meta.data[, "CAESARunasg", drop = FALSE])
})) %>% as.data.frame()
colnames(metaInt) <- "CAESARunasg"
row.names(metaInt) <- colnames(seuInt)
seuInt <- AddMetaData(seuInt, metaInt, col.name = colnames(metaInt))
Idents(seuInt) <- factor(seuInt$CAESARunasg, levels = names(cols))
print(seuInt)Then, we can visualize the top three signature genes by a dot plot.
# obtain the top three signature genes
celltypes_plot <- setdiff(celltypes, "unassigned")
top3sgs <- Intsg(sg_List, 3)[celltypes_plot]
print(top3sgs)
sg_features <- unname(unlist(top3sgs))
DotPlot(
seuInt,
idents = celltypes_plot, col.min = -1, col.max = 2, dot.scale = 7,
features = sg_features, scale.min = 0, scale.max = 30
) + theme(axis.text.x = element_text(face = "italic", angle = 45, vjust = 1, hjust = 1))Next, we calculate the UMAP projections of co-embeddings of cells and the selected signature genes.
# calculate coumap
BC_XeniumList <- lapply(
BC_XeniumList, CoUMAP, reduction = "caesar",
reduction.name = "caesarUMAP", gene.set = sg_features
)
df_gene_label <- data.frame(
gene = unlist(top3sgs),
label = rep(names(top3sgs), each = 3)
)
plots <- lapply(BC_XeniumList, function(seu) {
CoUMAP.plot(
seu, reduction = "caesarUMAP", gene_txtdata = df_gene_label,
cols = c("gene" = "#000000", cols)
)
})
cowplot::plot_grid(plotlist = plots, ncol = 1)Next, we show how to use CAESAR for enrichment analysis. Here we choose GOBP pathways as an example. Let’s first get some pathways.
# pathway_list <- msigdbr(species = "Homo sapiens", category = "C5", subcategory = "GO:BP") %>%
# group_by(gs_name) %>%
# summarise(genes = list(intersect(gene_symbol, common_genes))) %>%
# tibble::deframe()
# n.pathway_list <- sapply(pathway_list, length)
# pathway_list <- pathway_list[n.pathway_list >= 5]
# --------------------------------------------
# To avoid potential issues caused by differences in operating systems,
# R package versions, and other uncontrollable factors across environments,
# we pre-generated the 'pathway_list' object using the code above
# --------------------------------------------
githubURL <- "https://github.com/XiaoZhangryy/CAESAR.Suite/blob/master/vignettes_data/pathway4BC.rda?raw=true"
pathway4BC_file <- file.path(tempdir(), "pathway4BC.rda")
download.file(githubURL, pathway4BC_file, mode='wb')
load(pathway4BC_file)
print(head(pathway_list))Then, we can test whether those pathways are enriched in BC1 section.
seuBC1 <- BC_XeniumList[[1]]
df_enrich <- CAESAR.enrich.pathway(
seuBC1, pathway.list = pathway_list, reduction = "caesar"
)
# obtain significant enriched pathways
pathways <- pathway_list[df_enrich$asy.wei.pval.adj < 0.05]Next, we can calculate the spot level enrichment scores and detect differentially enriched pathways.
enrich.score.BC1 <- CAESAR.enrich.score(seuBC1, pathways)
dep.pvals <- CAESAR.CTDEP(seuBC1, enrich.score.BC1)
head(dep.pvals)We can visualize the spatial heatmap of enrichment score.
seuBC1 <- AddMetaData(seuBC1, as.data.frame(enrich.score.BC1))
pathway <- "GOBP_VASCULATURE_DEVELOPMENT"
FeaturePlot(seuBC1, features = pathway, reduction = "pos") +
scale_color_gradientn(
colors = c("#fff7f3", "#fcc5c0", "#f768a1", "#ae017e", "#49006a"),
values = scales::rescale(seq(0, 1, 0.25)),
limits = c(0, 1)
) +
theme(
legend.position = "right",
legend.justification = "center",
legend.box = "vertical"
)## R Under development (unstable) (2025-02-28 r87848 ucrt)
## Platform: x86_64-w64-mingw32/x64
## Running under: Windows 11 x64 (build 26100)
##
## Matrix products: default
## LAPACK version 3.12.0
##
## locale:
## [1] LC_COLLATE=C
## [2] LC_CTYPE=Chinese (Simplified)_China.utf8
## [3] LC_MONETARY=Chinese (Simplified)_China.utf8
## [4] LC_NUMERIC=C
## [5] LC_TIME=Chinese (Simplified)_China.utf8
##
## time zone: Asia/Shanghai
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] dplyr_1.1.4 msigdbr_7.5.1 ggplot2_3.5.1 Seurat_5.2.1
## [5] SeuratObject_5.0.2 sp_2.2-0 CAESAR.Suite_0.2.3
##
## loaded via a namespace (and not attached):
## [1] matrixStats_1.5.0 spatstat.sparse_3.1-0
## [3] httr_1.4.7 RColorBrewer_1.1-3
## [5] tools_4.5.0 sctransform_0.4.1
## [7] backports_1.5.0 R6_2.6.1
## [9] lazyeval_0.2.2 uwot_0.2.3
## [11] withr_3.0.2 prettyunits_1.2.0
## [13] gridExtra_2.3 progressr_0.15.1
## [15] cli_3.6.4 Biobase_2.67.0
## [17] spatstat.explore_3.3-4 fastDummies_1.7.5
## [19] sass_0.4.9 mvtnorm_1.3-3
## [21] spatstat.data_3.1-4 proxy_0.4-27
## [23] ggridges_0.5.6 pbapply_1.7-2
## [25] harmony_1.2.3 scater_1.35.1
## [27] parallelly_1.42.0 readxl_1.4.3
## [29] rstudioapi_0.17.1 RSQLite_2.3.9
## [31] generics_0.1.3 gtools_3.9.5
## [33] ica_1.0-3 spatstat.random_3.3-2
## [35] car_3.1-3 Matrix_1.7-2
## [37] ggbeeswarm_0.7.2 DescTools_0.99.59
## [39] S4Vectors_0.45.4 abind_1.4-8
## [41] lifecycle_1.0.4 yaml_2.3.10
## [43] CompQuadForm_1.4.3 carData_3.0-5
## [45] SummarizedExperiment_1.37.0 SparseArray_1.7.6
## [47] BiocFileCache_2.15.1 Rtsne_0.17
## [49] grid_4.5.0 blob_1.2.4
## [51] promises_1.3.2 crayon_1.5.3
## [53] GiRaF_1.0.1 miniUI_0.1.1.1
## [55] lattice_0.22-6 haven_2.5.4
## [57] beachmat_2.23.6 cowplot_1.1.3
## [59] KEGGREST_1.47.0 pillar_1.10.1
## [61] knitr_1.49 ProFAST_1.4
## [63] GenomicRanges_1.59.1 boot_1.3-31
## [65] gld_2.6.7 future.apply_1.11.3
## [67] codetools_0.2-20 glue_1.8.0
## [69] spatstat.univar_3.1-1 data.table_1.17.0
## [71] vctrs_0.6.5 png_0.1-8
## [73] spam_2.11-1 org.Mm.eg.db_3.20.0
## [75] cellranger_1.1.0 gtable_0.3.6
## [77] cachem_1.1.0 xfun_0.51
## [79] S4Arrays_1.7.3 mime_0.12
## [81] survival_3.8-3 SingleCellExperiment_1.29.1
## [83] fitdistrplus_1.2-2 ROCR_1.0-11
## [85] nlme_3.1-167 bit64_4.6.0-1
## [87] progress_1.2.3 filelock_1.0.3
## [89] RcppAnnoy_0.0.22 GenomeInfoDb_1.43.4
## [91] bslib_0.9.0 irlba_2.3.5.1
## [93] vipor_0.4.7 KernSmooth_2.23-26
## [95] colorspace_2.1-1 BiocGenerics_0.53.6
## [97] DBI_1.2.3 ade4_1.7-23
## [99] Exact_3.3 tidyselect_1.2.1
## [101] DR.SC_3.4 bit_4.5.0.1
## [103] compiler_4.5.0 curl_6.2.1
## [105] httr2_1.1.0 BiocNeighbors_2.1.2
## [107] expm_1.0-0 xml2_1.3.6
## [109] DelayedArray_0.33.6 plotly_4.10.4
## [111] scales_1.3.0 lmtest_0.9-40
## [113] rappdirs_0.3.3 stringr_1.5.1
## [115] digest_0.6.37 goftest_1.2-3
## [117] spatstat.utils_3.1-2 rmarkdown_2.29
## [119] XVector_0.47.2 htmltools_0.5.8.1
## [121] pkgconfig_2.0.3 MatrixGenerics_1.19.1
## [123] dbplyr_2.5.0 fastmap_1.2.0
## [125] rlang_1.1.5 htmlwidgets_1.6.4
## [127] ggthemes_5.1.0 UCSC.utils_1.3.1
## [129] shiny_1.10.0 farver_2.1.2
## [131] jquerylib_0.1.4 zoo_1.8-13
## [133] jsonlite_1.8.9 BiocParallel_1.41.2
## [135] mclust_6.1.1 BiocSingular_1.23.0
## [137] magrittr_2.0.3 Formula_1.2-5
## [139] scuttle_1.17.0 GenomeInfoDbData_1.2.13
## [141] dotCall64_1.2 patchwork_1.3.0
## [143] munsell_0.5.1 Rcpp_1.0.14
## [145] babelgene_22.9 viridis_0.6.5
## [147] reticulate_1.41.0 furrr_0.3.1
## [149] stringi_1.8.4 rootSolve_1.8.2.4
## [151] MASS_7.3-65 org.Hs.eg.db_3.20.0
## [153] plyr_1.8.9 parallel_4.5.0
## [155] PRECAST_1.6.5 listenv_0.9.1
## [157] ggrepel_0.9.6 forcats_1.0.0
## [159] lmom_3.2 deldir_2.0-4
## [161] Biostrings_2.75.4 splines_4.5.0
## [163] tensor_1.5 hms_1.1.3
## [165] igraph_2.1.4 ggpubr_0.6.0
## [167] spatstat.geom_3.3-5 ggsignif_0.6.4
## [169] RcppHNSW_0.6.0 reshape2_1.4.4
## [171] biomaRt_2.63.1 stats4_4.5.0
## [173] ScaledMatrix_1.15.0 evaluate_1.0.3
## [175] httpuv_1.6.15 RANN_2.6.2
## [177] tidyr_1.3.1 purrr_1.0.4
## [179] polyclip_1.10-7 future_1.34.0
## [181] scattermore_1.2 rsvd_1.0.5
## [183] broom_1.0.7 xtable_1.8-4
## [185] e1071_1.7-16 RSpectra_0.16-2
## [187] rstatix_0.7.2 later_1.4.1
## [189] viridisLite_0.4.2 class_7.3-23
## [191] tibble_3.2.1 memoise_2.0.1
## [193] beeswarm_0.4.0 AnnotationDbi_1.69.0
## [195] IRanges_2.41.3 cluster_2.1.8
## [197] globals_0.16.3
These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.