The hardware and bandwidth for this mirror is donated by METANET, the Webhosting and Full Service-Cloud Provider.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]metanet.ch.

Benchmarking SignacFast with single cell flow-sorted data

Mathew Chamberlain

2021-11-18

This vignette shows how to use SignacFast to annotate flow-sorted synovial cells by integrating SignacX with Seurat. We start with raw counts from this publication.

Load data

Read the CEL-seq2 data.

ReadCelseq <- function(counts.file, meta.file) {
    E = suppressWarnings(readr::read_tsv(counts.file))
    gns <- E$gene
    E = E[, -1]
    E = Matrix::Matrix(as.matrix(E), sparse = TRUE)
    rownames(E) <- gns
    E
}

counts.file = "./fls/celseq_matrix_ru10_molecules.tsv.gz"
meta.file = "./fls/celseq_meta.immport.723957.tsv"

E = ReadCelseq(counts.file = counts.file, meta.file = meta.file)
M = suppressWarnings(readr::read_tsv(meta.file))

# filter data based on depth and number of genes detected
kmu = Matrix::colSums(E != 0)
kmu2 = Matrix::colSums(E)
E = E[, kmu > 200 & kmu2 > 500]

# filter by mitochondrial percentage
logik = grepl("^MT-", rownames(E))
MitoFrac = Matrix::colSums(E[logik, ])/Matrix::colSums(E) * 100
E = E[, MitoFrac < 20]

Seurat

Start with the standard pre-processing steps for a Seurat object.

library(Seurat)

Create a Seurat object, and then perform SCTransform normalization. Note:

# load data
synovium <- CreateSeuratObject(counts = E, project = "FACs")

# run sctransform
synovium <- SCTransform(synovium)

Perform dimensionality reduction by PCA and UMAP embedding. Note:

# These are now standard steps in the Seurat workflow for visualization and clustering
synovium <- RunPCA(synovium, verbose = FALSE)
synovium <- RunUMAP(synovium, dims = 1:30, verbose = FALSE)
synovium <- FindNeighbors(synovium, dims = 1:30, verbose = FALSE)

SignacX

library(SignacX)

Generate Signac labels for the Seurat object. Note:

labels <- Signac(synovium, num.cores = 4)
celltypes = GenerateLabels(labels, E = synovium)

Sometimes, training the neural networks takes a lot of time. The above classification took 27 minutes. To make a faster method, we implemented SignacFast which uses pre-trained models. Note:

# Run SignacFast
labels_fast <- SignacFast(synovium, num.cores = 4)
celltypes_fast = GenerateLabels(labels_fast, E = synovium)

Compare results:

Celltypes:
B MPh NonImmune Plasma.cells TNK Unclassified
B 681 0 0 0 0 0
MPh 0 835 0 0 0 68
NonImmune 0 0 2487 0 0 0
Plasma.cells 0 0 0 263 0 6
TNK 0 0 0 0 1768 0
Unclassified 0 13 7 0 0 174
Cellstates:
B.memory B.naive Fibroblasts Macrophages Mon.Classical NonImmune Plasma.cells T.CD4.memory T.CD4.naive T.CD8.em T.CD8.naive T.regs Unclassified
B.memory 489 4 0 0 0 0 0 0 0 0 0 0 1
B.naive 4 184 0 0 0 0 0 0 0 0 0 0 0
DC 0 0 0 4 3 0 0 0 0 0 0 0 5
Fibroblasts 0 0 2110 0 0 136 0 0 0 0 0 0 0
Macrophages 0 0 0 662 33 1 0 0 0 0 0 0 73
Mon.Classical 0 0 0 23 93 0 0 0 0 0 0 0 1
NonImmune 0 0 74 0 0 166 0 0 0 0 0 0 2
Plasma.cells 0 0 0 0 0 1 259 0 0 0 0 0 8
T.CD4.memory 0 0 0 0 0 0 0 504 112 17 29 15 0
T.CD4.naive 0 0 0 0 0 0 0 0 309 4 18 0 1
T.CD8.em 0 0 0 0 1 0 0 7 4 574 1 0 2
T.CD8.naive 0 0 0 0 0 0 0 2 1 0 26 2 0
T.regs 0 0 0 0 0 0 0 0 27 1 2 106 0
Unclassified 0 0 1 12 3 5 0 0 0 1 0 0 179

Save results

saveRDS(synovium, file = "fls/seurat_obj_amp_synovium.rds")
saveRDS(celltypes, file = "fls/celltypes_amp_synovium.rds")
saveRDS(celltypes_fast, file = "fls/celltypes_fast_amp_synovium_celltypes.rds")
Session Info
## R version 4.0.3 (2020-10-10)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.5 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] compiler_4.0.3    magrittr_2.0.1    formatR_1.7       htmltools_0.5.1.1
##  [5] tools_4.0.3       yaml_2.2.1        stringi_1.5.3     rmarkdown_2.6    
##  [9] highr_0.8         knitr_1.30        stringr_1.4.0     digest_0.6.27    
## [13] xfun_0.20         rlang_0.4.10      evaluate_0.14

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.