The recently developed approach to exploring copy number signatures used here is described in "The repertoire of copy number alteration signatures in human cancer".
For a more general introduction, please read "Extract, Analyze and Visualize Mutational Signatures with Sigminer".
library(sigminer)
#> Registered S3 method overwritten by 'sigminer':
#> method from
#> print.bytes Rcpp
#> sigminer version 2.3.1
#> - Star me at https://github.com/ShixiangWang/sigminer
#> - Run hello() to see usage and citation.
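For this analysis, a segment table with six columns is required: chromosome, start, end, segVal, sample and minor_cn. The toy data set lacks minor_cn, so random values are generated below for demonstration only.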
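As a minimal sketch of what such an input can look like, the following base R snippet builds a tiny segment table by hand. The column names follow those used in this vignette; the object name `seg_toy` and all values are invented for illustration, and the checks are only a suggestion, not something sigminer requires you to run.

```r
# Toy segment table; all values are invented for illustration.
seg_toy <- data.frame(
  chromosome = c("chr1", "chr1", "chr2"),
  start      = c(1, 5000001, 1),
  end        = c(5000000, 9000000, 8000000),
  segVal     = c(2L, 4L, 1L),   # total copy number of the segment
  sample     = c("S1", "S1", "S1"),
  minor_cn   = c(1L, 1L, 0L),   # copy number of the minor allele
  stringsAsFactors = FALSE
)

# Optional sanity checks before passing such a table to read_copynumber().
required <- c("chromosome", "start", "end", "segVal", "sample", "minor_cn")
stopifnot(all(required %in% colnames(seg_toy)))
stopifnot(all(seg_toy$start <= seg_toy$end))
stopifnot(all(seg_toy$minor_cn <= seg_toy$segVal))
```

A table shaped like this could then be read in the same way as the packaged toy data.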
load(system.file("extdata", "toy_segTab.RData",
package = "sigminer", mustWork = TRUE
))
set.seed(1234)
segTabs$minor_cn <- sample(c(0, 1), size = nrow(segTabs), replace = TRUE)
cn <- read_copynumber(segTabs,
seg_cols = c("chromosome", "start", "end", "segVal"),
genome_measure = "wg", complement = TRUE, add_loh = TRUE
)
#> ℹ [2024-05-11 12:07:22.722145]: Started.
#> ℹ [2024-05-11 12:07:22.739754]: Genome build : hg19.
#> ℹ [2024-05-11 12:07:22.741917]: Genome measure: wg.
#> ℹ [2024-05-11 12:07:22.743996]: When add_loh is TRUE, use_all is forced to TRUE.
#> Please drop columns you don't want to keep before reading.
#> ✔ [2024-05-11 12:07:22.76519]: Chromosome size database for build obtained.
#> ℹ [2024-05-11 12:07:22.767823]: Reading input.
#> ✔ [2024-05-11 12:07:22.770067]: A data frame as input detected.
#> ✔ [2024-05-11 12:07:22.772699]: Column names checked.
#> ✔ [2024-05-11 12:07:22.777028]: Column order set.
#> ✔ [2024-05-11 12:07:22.796121]: Chromosomes unified.
#> ✔ [2024-05-11 12:07:22.824277]: Value 2 (normal copy) filled to uncalled chromosomes.
#> ✔ [2024-05-11 12:07:22.833025]: Data imported.
#> ℹ [2024-05-11 12:07:22.835415]: Segments info:
#> ℹ [2024-05-11 12:07:22.8376]: Keep - 477
#> ℹ [2024-05-11 12:07:22.839727]: Drop - 0
#> ✔ [2024-05-11 12:07:22.842663]: Segments sorted.
#> ℹ [2024-05-11 12:07:22.844732]: Adding LOH labels...
#> ℹ [2024-05-11 12:07:22.848268]: Joining adjacent segments with same copy number value. Be patient...
#> ✔ [2024-05-11 12:07:23.026732]: 410 segments left after joining.
#> ✔ [2024-05-11 12:07:23.029618]: Segmental table cleaned.
#> ℹ [2024-05-11 12:07:23.031965]: Annotating.
#> ✔ [2024-05-11 12:07:23.056107]: Annotation done.
#> ℹ [2024-05-11 12:07:23.058526]: Summarizing per sample.
#> ✔ [2024-05-11 12:07:23.094252]: Summarized.
#> ℹ [2024-05-11 12:07:23.09669]: Generating CopyNumber object.
#> ✔ [2024-05-11 12:07:23.099685]: Generated.
#> ℹ [2024-05-11 12:07:23.101863]: Validating object.
#> ✔ [2024-05-11 12:07:23.104097]: Done.
#> ℹ [2024-05-11 12:07:23.106758]: 0.385 secs elapsed.
cn
#> An object of class CopyNumber
#> =============================
#> sample n_of_seg n_of_cnv n_of_amp n_of_del n_of_vchr
#> <char> <int> <int> <int> <int> <int>
#> 1: TCGA-DF-A2KN-01A-11D-A17U-01 34 6 5 1 4
#> 2: TCGA-19-2621-01B-01D-0911-01 34 8 5 3 5
#> 3: TCGA-B6-A0X5-01A-21D-A107-01 29 8 4 4 2
#> 4: TCGA-A8-A07S-01A-11D-A036-01 39 11 2 9 4
#> 5: TCGA-26-6174-01A-21D-1842-01 44 13 8 5 8
#> 6: TCGA-CV-7432-01A-11D-2128-01 41 16 7 9 9
#> 7: TCGA-06-0644-01A-02D-0310-01 47 19 5 14 8
#> 8: TCGA-A5-A0G2-01A-11D-A042-01 40 21 5 16 10
#> 9: TCGA-99-7458-01A-11D-2035-01 49 26 10 16 13
#> 10: TCGA-05-4417-01A-22D-1854-01 53 37 33 4 17
#> n_loh cna_burden
#> <int> <num>
#> 1: 15 0.000
#> 2: 20 0.095
#> 3: 18 0.083
#> 4: 21 0.106
#> 5: 24 0.113
#> 6: 24 0.188
#> 7: 33 0.158
#> 8: 23 0.375
#> 9: 33 0.304
#> 10: 29 0.617
cn@data
#> chromosome start end segVal sample
#> <char> <num> <num> <int> <char>
#> 1: chr1 3218923 116319008 2 TCGA-05-4417-01A-22D-1854-01
#> 2: chr1 116324707 120523902 1 TCGA-05-4417-01A-22D-1854-01
#> 3: chr1 149879545 247812431 4 TCGA-05-4417-01A-22D-1854-01
#> 4: chr10 423671 135224372 3 TCGA-05-4417-01A-22D-1854-01
#> 5: chr11 458784 19461653 3 TCGA-05-4417-01A-22D-1854-01
#> ---
#> 406: chr6 1016984 170898549 2 TCGA-DF-A2KN-01A-11D-A17U-01
#> 407: chr7 746917 158385118 2 TCGA-DF-A2KN-01A-11D-A17U-01
#> 408: chr8 617885 145225107 2 TCGA-DF-A2KN-01A-11D-A17U-01
#> 409: chr9 790234 140938075 2 TCGA-DF-A2KN-01A-11D-A17U-01
#> 410: chrX 1 155270560 2 TCGA-DF-A2KN-01A-11D-A17U-01
#> minor_cn loh .loh_frac
#> <num> <lgcl> <num>
#> 1: 1.0000000 FALSE NA
#> 2: 0.0000000 TRUE NA
#> 3: 0.5000000 TRUE 0.1175943
#> 4: 1.0000000 FALSE NA
#> 5: 1.0000000 FALSE NA
#> ---
#> 406: 0.3333333 TRUE 0.9979494
#> 407: 1.0000000 FALSE NA
#> 408: 1.0000000 FALSE NA
#> 409: 0.5000000 TRUE 0.8328715
#> 410: NA FALSE NA
To try another type of copy number signature, change the method argument of sig_tally().
tally_s <- sig_tally(cn, method = "S")
#> ℹ [2024-05-11 12:07:23.165562]: Started.
#> ℹ [2024-05-11 12:07:23.171528]: When you use method 'S', please make sure you have set 'join_adj_seg' to FALSE and 'add_loh' to TRUE in 'read_copynumber() in the previous step!
#> ✔ [2024-05-11 12:07:23.197549]: Matrix generated.
#> ℹ [2024-05-11 12:07:23.200068]: 0.034 secs elapsed.
str(tally_s$all_matrices, max.level = 1)
#> List of 2
#> $ CN_40: int [1:10, 1:40] 0 0 0 0 0 0 0 0 0 0 ...
#> ..- attr(*, "dimnames")=List of 2
#> $ CN_48: int [1:10, 1:48] 0 0 0 0 0 0 0 0 0 0 ...
#> ..- attr(*, "dimnames")=List of 2
sig_denovo <- sig_auto_extract(tally_s$all_matrices$CN_48)
#> Select Run 3, which K = 2 as best solution.
head(sig_denovo$Signature)
#> Sig1 Sig2
#> 0:homdel:0-100Kb 0.000000 0.000000e+00
#> 0:homdel:100Kb-1Mb 0.000000 0.000000e+00
#> 0:homdel:>1Mb 0.000000 0.000000e+00
#> 1:LOH:0-100Kb 3.609460 3.819129e-242
#> 1:LOH:100Kb-1Mb 6.316554 2.814800e-127
#> 1:LOH:1Mb-10Mb 13.535473 2.784288e-190
Alternatively, fit the sample catalog to the 19 reference signatures in the CNS_TCGA database to estimate their contributions directly.
act_refit <- sig_fit(t(tally_s$all_matrices$CN_48), sig_index = "ALL", sig_db = "CNS_TCGA")
#> ℹ [2024-05-11 12:07:24.377693]: Started.
#> ✔ [2024-05-11 12:07:24.379994]: Signature index detected.
#> ℹ [2024-05-11 12:07:24.382046]: Checking signature database in package.
#> ℹ [2024-05-11 12:07:24.386141]: Checking signature index.
#> ℹ [2024-05-11 12:07:24.388193]: Valid index for db 'CNS_TCGA':
#> CN1 CN2 CN3 CN4 CN5 CN6 CN7 CN8 CN9 CN10 CN11 CN12 CN13 CN14 CN15 CN16 CN17 CN18 CN19
#> ✔ [2024-05-11 12:07:24.390339]: Database and index checked.
#> ✔ [2024-05-11 12:07:24.392602]: Signature normalized.
#> ℹ [2024-05-11 12:07:24.394607]: Checking row number for catalog matrix and signature matrix.
#> ✔ [2024-05-11 12:07:24.396599]: Checked.
#> ℹ [2024-05-11 12:07:24.398572]: Checking rownames for catalog matrix and signature matrix.
#> ✔ [2024-05-11 12:07:24.400536]: Checked.
#> ✔ [2024-05-11 12:07:24.402494]: Method 'QP' detected.
#> ✔ [2024-05-11 12:07:24.414918]: Corresponding function generated.
#> ℹ [2024-05-11 12:07:24.417316]: Calling function.
#> ℹ [2024-05-11 12:07:24.419917]: Fitting sample: TCGA-05-4417-01A-22D-1854-01
#> ℹ [2024-05-11 12:07:24.42325]: Fitting sample: TCGA-06-0644-01A-02D-0310-01
#> ℹ [2024-05-11 12:07:24.425641]: Fitting sample: TCGA-19-2621-01B-01D-0911-01
#> ℹ [2024-05-11 12:07:24.427867]: Fitting sample: TCGA-26-6174-01A-21D-1842-01
#> ℹ [2024-05-11 12:07:24.430052]: Fitting sample: TCGA-99-7458-01A-11D-2035-01
#> ℹ [2024-05-11 12:07:24.432224]: Fitting sample: TCGA-A5-A0G2-01A-11D-A042-01
#> ℹ [2024-05-11 12:07:24.434394]: Fitting sample: TCGA-A8-A07S-01A-11D-A036-01
#> ℹ [2024-05-11 12:07:24.436542]: Fitting sample: TCGA-B6-A0X5-01A-21D-A107-01
#> ℹ [2024-05-11 12:07:24.438711]: Fitting sample: TCGA-CV-7432-01A-11D-2128-01
#> ℹ [2024-05-11 12:07:24.440867]: Fitting sample: TCGA-DF-A2KN-01A-11D-A17U-01
#> ✔ [2024-05-11 12:07:24.443045]: Done.
#> ℹ [2024-05-11 12:07:24.445076]: Generating output signature exposures.
#> ✔ [2024-05-11 12:07:24.44793]: Done.
#> ℹ [2024-05-11 12:07:24.450022]: 0.072 secs elapsed.
We can apply a threshold to keep only the signatures that really contribute.
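One way to do this filtering is sketched below in base R. The helper `keep_contributing()` and the cutoff of 0.1 are assumptions made for illustration, not part of sigminer; the sketch also assumes an exposure matrix with signatures in rows and samples in columns, as the later use of rownames(act_refit2) suggests.

```r
# Hypothetical helper: keep the signatures (rows) whose exposure exceeds
# `cutoff` in at least one sample (column).
keep_contributing <- function(expo, cutoff) {
  expo[apply(expo, 1, function(x) any(x > cutoff)), , drop = FALSE]
}

# Toy exposure matrix with invented values: 3 signatures x 2 samples.
toy_expo <- matrix(
  c(5, 0, 0.02,   # sample s1
    3, 0, 0.01),  # sample s2
  nrow = 3,
  dimnames = list(c("CN1", "CN2", "CN3"), c("s1", "s2"))
)

kept <- keep_contributing(toy_expo, cutoff = 0.1)
rownames(kept)

# Applied to the refit result from above, the call might look like this
# (the cutoff value is an assumption; choose one suited to your data):
# act_refit2 <- keep_contributing(act_refit, cutoff = 0.1)
```

Using drop = FALSE keeps the result a matrix even when only one signature passes, so row names remain available for selecting reference profiles later.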
For de novo signatures:
Show the activity/exposure.
For reference signatures, you can just select what you want:
show_sig_profile(
get_sig_db("CNS_TCGA")$db[, rownames(act_refit2)],
style = "cosmic",
mode = "copynumber", method = "S", check_sig_names = FALSE)
Similarly for showing activity.
NOTE: in this case the two approaches give relatively different results, so choose between them based on the size and quality of your data, and double-check the results. In general, the refitting approach is recommended for small data sets.
To match the de novo signatures to reference signatures, we use cosine similarity.
get_sig_similarity(sig_denovo, sig_db = "CNS_TCGA")
#> -Comparing against COSMIC signatures
#> ------------------------------------
#> --Found Sig1 most similar to CN1
#> Aetiology: See https://cancer.sanger.ac.uk/signatures/cn/ [similarity: 0.706]
#> --Found Sig2 most similar to CN2
#> Aetiology: See https://cancer.sanger.ac.uk/signatures/cn/ [similarity: 0.771]
#> ------------------------------------
#> Return result invisiblely.
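For readers unfamiliar with the measure, cosine similarity between two signature profiles is the dot product of the two vectors divided by the product of their Euclidean norms. The base R sketch below shows the calculation on invented profiles; `cosine_sim()` is a hypothetical helper written for this example and is not claimed to be how get_sig_similarity() computes its values internally.

```r
# Hypothetical helper: cosine similarity between two numeric vectors
# of equal length (e.g. two signature profiles over the same components).
cosine_sim <- function(x, y) {
  stopifnot(length(x) == length(y))
  sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
}

# Invented profiles over four components.
sig_a <- c(0.7, 0.2, 0.1, 0.0)
sig_b <- c(0.6, 0.3, 0.1, 0.0)
sig_c <- c(0.0, 0.0, 0.0, 1.0)

cosine_sim(sig_a, sig_a)  # identical profiles: 1, up to floating point error
cosine_sim(sig_a, sig_c)  # no shared components: 0
cosine_sim(sig_a, sig_b)  # similar profiles: a value close to 1
```

Because the measure ignores overall scale, it compares the shape of two profiles rather than their magnitude, which is why a de novo signature with unnormalized values can still be matched to a normalized reference signature.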