Signac and SPRING: Learning CD56 NK cells from multi-modal analysis of CITE-seq PBMCs from 10X Genomics

The hardware and bandwidth for this mirror is donated by METANET, the Webhosting and Full Service-Cloud Provider.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]metanet.ch.

Mathew Chamberlain

2021-11-18

This vignette shows how to use SignacX with Seurat and SPRING to learn a new cell type category from single cell data.

Load data

We start with CITE-seq data that were already classified with SignacX using the SPRING pipeline.

library(Seurat)
library(SignacX)

Load CITE-seq data from 10X Genomics processed with SPRING and classified with SignacX already.

# load CITE-seq data
data.dir = './CITESEQ_EXPLORATORY_CITESEQ_5K_PBMCS/FullDataset_v1_protein'
E = CID.LoadData(data.dir = data.dir)

# Load labels
json_data = rjson::fromJSON(file=paste0(data.dir,'/categorical_coloring_data.json'))

Create a Seurat object for the protein expression data; we will use this as a reference.

# separate protein and gene expression data
logik = grepl("Total", rownames(E))
P = E[logik,]
E = E[!logik,]

# CLR normalization in Seurat
colnames(P) <- 1:ncol(P)
colnames(E) <- 1:ncol(E)
reference <- CreateSeuratObject(E)
reference[["ADT"]] <- CreateAssayObject(counts = P)
reference <- NormalizeData(reference, assay = "ADT", normalization.method = "CLR")

Identify CD56 bright NK cells based on protein expression data.

# generate labels 
lbls = json_data$CellStates$label_list
lbls[lbls != "NK"] = "Unclassified"
CD16 = reference@assays$ADT@counts[rownames(reference@assays$ADT@counts) == "CD16-TotalSeqB-CD16",]
CD56 = reference@assays$ADT@counts[rownames(reference@assays$ADT@counts) == "CD56-TotalSeqB-CD56",]
logik = log2(CD56) > 10 & log2(CD16) < 7.5 & lbls == "NK"; sum(logik)
lbls[logik] = "NK.CD56bright"

SignacX

Generate a training data set from the reference data and save it for later use. Note:

SignacBoot performs feature selection, bootstrapping, imputation and normalization to derive a training data set from single cell data.

# generate bootstrapped single cell data
R_learned = SignacBoot(E = E, spring.dir = data.dir, L = c("NK", "NK.CD56bright"), labels = lbls, logfc.threshold = 1)

# save the training data
save(R_learned, file = "training_NKBright_v207.rda")

Classify a new data set with the model

Load expression data for a different data set (this was also previously processed through SPRING and SignacX)

# Classify another data set with new model
# load new data
new.data.dir = "./PBMCs_5k_10X/FullDataset_v1"
E = CID.LoadData(data.dir = new.data.dir)
# load cell types identified with Signac
json_data = rjson::fromJSON(file=paste0(new.data.dir,'/categorical_coloring_data.json'))

Generate new labels. Note:

Signac trains an ensemble of 100 neural network classifiers using the new training data set built above (R_learned), and then classifies unseen data (E).

# generate new labels
cr_learned = Signac(E = E, R = R_learned, spring.dir = new.data.dir)

Now we amend the existing labels (classified previously with SignacX); we add the new labels and generate a new SPRING layout.Note:

We usually copy the existing SPRING files from “FullDataset_v1” to “FullDataset_v1_Learned” to generate a new layout while preserving the existing layout.

# modify the existing labels
cr = lapply(json_data, function(x) x$label_list)
logik = cr$CellStates == 'NK'
cr$CellStates[logik] = cr_learned[logik]
logik = cr$CellStates_novel == 'NK'
cr$CellStates_novel[logik] = cr_learned[logik]
new.data.dir = paste0(new.data.dir, "_Learned")

Save results

# save
dat = CID.writeJSON(cr, spring.dir = new.data.dir, new_colors = c('red'), new_populations = c( 'NK.CD56bright'))

Session Info

## R version 4.0.3 (2020-10-10)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.5 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] compiler_4.0.3    magrittr_2.0.1    formatR_1.7       htmltools_0.5.1.1
##  [5] tools_4.0.3       yaml_2.2.1        stringi_1.5.3     rmarkdown_2.6    
##  [9] highr_0.8         knitr_1.30        stringr_1.4.0     digest_0.6.27    
## [13] xfun_0.20         rlang_0.4.10      evaluate_0.14

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.