This tutorial outlines the factors that affect memory usage when running the sme() function and how to adjust parameters to optimize memory consumption based on your available resources. To estimate memory requirements, use the approximate_memory_requirements() function, which provides a rough estimate of the C++ objects stored in memory during the sme() function call, though it may not reflect exact memory usage.
The sample size is the primary factor influencing memory requirements. Both phenotype and genotype data need to be loaded into memory for computation. For large datasets, like Biobank-scale data (350k samples and 500k SNPs), loading the entire dataset into memory requires about 1.4TB (assuming double precision for the data matrix), which exceeds most machines’ capacities.
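As a quick sanity check, this figure can be reproduced directly (a minimal sketch assuming a dense, double-precision data matrix; the variable names are purely illustrative):
n_samples <- 350000
n_snps <- 500000
bytes_per_double <- 8
# Raw size of the dense data matrix in terabytes
matrix_tb <- n_samples * n_snps * bytes_per_double / 1e12
matrix_tb
#> [1] 1.4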
To manage large datasets more efficiently, sme() reads the genotype data in smaller blocks. The parameter n_blocks controls the number of blocks. For instance, with 500k SNPs, setting n_blocks = 100 will load 5000 SNPs into memory at a time, reducing the memory load and allowing computations to proceed block by block.
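For intuition, here is a rough sketch of the per-block genotype memory under the numbers above (again assuming a dense, double-precision block; the internal storage used by sme() may be more compact):
n_samples <- 350000
n_snps <- 500000
n_blocks <- 100
# SNPs held in memory at a time
snps_per_block <- n_snps / n_blocks
snps_per_block
#> [1] 5000
# Approximate size of one genotype block in gigabytes
block_gb <- n_samples * snps_per_block * 8 / 1e9
block_gb
#> [1] 14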
The sme() function uses a stochastic trace estimator to approximate the trace of matrix products efficiently. The number of random vectors affects the accuracy of the trace estimates as well as memory usage and computational efficiency.
During blockwise computation, the algorithm stores intermediate matrices of size sample_size x n_randvecs. Increasing the number of random vectors improves accuracy but also increases memory usage and computation time. Typically, around 10 random vectors provide reasonably accurate results.
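A rough sketch of the size of one such intermediate matrix under the example settings (double precision assumed; illustrative only):
n_samples <- 350000
n_randvecs <- 10
# Approximate size of one sample_size x n_randvecs matrix in megabytes
randvec_matrix_mb <- n_samples * n_randvecs * 8 / 1e6
randvec_matrix_mb
#> [1] 28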
The chunk_size parameter controls how many SNPs share the same set of random vectors, enhancing the efficiency of genome-wide data processing. Sharing random vectors reduces redundant calculations involving the genetic relatedness covariance matrix and minimizes the time spent reading genotype data into memory.
For each set of SNPs analyzed together (in a "chunk"), intermediate results must be stored. Consequently, the memory requirement grows with the chunk size, calculated as chunk_size x (sample_size x n_randvecs).
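Continuing the example above, a rough sketch of the intermediate storage needed for one chunk (double precision assumed; illustrative only):
n_samples <- 350000
n_randvecs <- 10
chunk_size <- 100
# chunk_size x (sample_size x n_randvecs) double-precision entries, in gigabytes
chunk_gb <- chunk_size * n_samples * n_randvecs * 8 / 1e9
chunk_gb
#> [1] 2.8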
Figure 1. Schematic overview illustrating the computational speedup resulting from sharing random vectors. (a) In the randomized trace estimates we can identify reusable matrix-by-vector products. Computing the exact trace of a product of two covariance matrices is prohibitively computationally expensive. Instead, the sparse marginal epistasis (SME) test approximates the traces using random vectors \(z\). For the full MQS computation of the point estimates of the variance components, we see that matrix-by-vector products of the form \(Az\) with \(A \in \{K, G\}\) appear repeatedly. (b) The genetic relatedness matrix \(K\) is the same for all focal SNPs. Using unique random vectors in this computation for every focal SNP, we compute the same quantity repeatedly. Computing the matrix-by-vector products \(Kz\) constitutes almost half of the computation time of the point estimates. (c) By sharing random vectors \(z\) between focal SNPs, computing \(Kz\) can be done once for all focal SNPs that share random vectors. With this, the computation time of \(Kz\) becomes negligible.
Masking genotypes that do not contribute to epistasis can help reduce memory usage and computation time. When masked, these genotypes do not need to be stored in memory, significantly decreasing memory requirements. Note that the approximate_memory_requirements() function does not account for this reduction.
To estimate memory needs based on your chosen parameters, use the approximate_memory_requirements() function. This function helps you determine whether your planned settings will fit within available memory and identify which parameters can be adjusted to meet your resource constraints. The parameters n_blocks, n_randvecs, and chunk_size are particularly flexible and have a significant impact on memory usage. Note, however, that the function does not account for masking and therefore likely overestimates the required memory.
library(smer)   # approximate_memory_requirements()
library(tidyr)  # crossing()
library(dplyr)  # mutate(), %>%
library(knitr)  # kable()

n_samples <- 350000
n_snps <- 500000
n_blocks <- c(1, 100, 1000)
n_randvecs <- c(10, 100)
chunk_size <- c(10, 100)

# All combinations of the parameter values above
parameters <- crossing(
  n_samples = n_samples,
  n_snps = n_snps,
  n_blocks = n_blocks,
  n_randvecs = n_randvecs,
  chunk_size = chunk_size
)

# Estimated memory (in GB) for each parameter combination
estimated_memory <- parameters %>%
  mutate(memory_gb = round(
    approximate_memory_requirements(n_samples,
                                    n_snps,
                                    n_blocks,
                                    n_randvecs,
                                    chunk_size),
    2
  ))

kable(estimated_memory)
| n_samples | n_snps | n_blocks | n_randvecs | chunk_size | memory_gb |
|---|---|---|---|---|---|
| 350000 | 5e+05 | 1 | 10 | 10 | 797.64 |
| 350000 | 5e+05 | 1 | 10 | 100 | 7322.88 |
| 350000 | 5e+05 | 1 | 100 | 10 | 801.16 |
| 350000 | 5e+05 | 1 | 100 | 100 | 7347.53 |
| 350000 | 5e+05 | 100 | 10 | 10 | 8.73 |
| 350000 | 5e+05 | 100 | 10 | 100 | 79.24 |
| 350000 | 5e+05 | 100 | 100 | 10 | 12.25 |
| 350000 | 5e+05 | 100 | 100 | 100 | 103.89 |
| 350000 | 5e+05 | 1000 | 10 | 10 | 1.56 |
| 350000 | 5e+05 | 1000 | 10 | 100 | 13.39 |
| 350000 | 5e+05 | 1000 | 100 | 10 | 5.08 |
| 350000 | 5e+05 | 1000 | 100 | 100 | 38.04 |
Despite the computational efficiency of SME, genome-wide testing requires considerable resources. We recommend analyzing the data in batches and launching multiple processes simultaneously on a high-performance computing (HPC) cluster. In this study, we analyzed 544k SNPs genotyped in 350k individuals. We launched 544 Slurm jobs, each requesting 43GB of memory and 6 CPUs, to analyze batches of 1000 SNPs with chunk sizes of 250 SNPs. Genome-wide testing of a single trait on an HPC cluster with 960 CPUs and 6840GB of memory available took about 3.5 days.
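As a hypothetical illustration of how such a batching scheme might be organized, the SNP indices can be split into batches of 1000 and the resulting boundaries passed to individual HPC jobs (the variable names and job layout here are illustrative, not the exact scripts used in the study):
n_snps <- 544000
batch_size <- 1000
# One row per HPC job: the first and last SNP index to analyze in that job
batches <- data.frame(
  job = seq_len(ceiling(n_snps / batch_size)),
  first_snp = seq(1, n_snps, by = batch_size)
)
batches$last_snp <- pmin(batches$first_snp + batch_size - 1, n_snps)
nrow(batches)
#> [1] 544
head(batches, 3)
#>   job first_snp last_snp
#> 1   1         1     1000
#> 2   2      1001     2000
#> 3   3      2001     3000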
Figure 2. SME has improved power to detect marginal epistasis and runs 10x to 90x faster than state-of-the-art methods. The CPU time was measured on 350,000 individuals.
sessionInfo()
#> R version 4.4.2 (2024-10-31)
#> Platform: aarch64-apple-darwin20
#> Running under: macOS Sequoia 15.2
#>
#> Matrix products: default
#> BLAS: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
#>
#> locale:
#> [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#>
#> time zone: Europe/Berlin
#> tzcode source: internal
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] knitr_1.49 tidyr_1.3.1 GenomicRanges_1.56.2
#> [4] GenomeInfoDb_1.40.1 IRanges_2.38.1 S4Vectors_0.42.1
#> [7] BiocGenerics_0.50.0 ggplot2_3.5.1 dplyr_1.1.4
#> [10] smer_0.0.1
#>
#> loaded via a namespace (and not attached):
#> [1] sass_0.4.9 generics_0.1.3 digest_0.6.37
#> [4] magrittr_2.0.3 evaluate_1.0.3 grid_4.4.2
#> [7] iterators_1.0.14 CompQuadForm_1.4.3 mvtnorm_1.3-3
#> [10] fastmap_1.2.0 foreach_1.5.2 genio_1.1.2
#> [13] jsonlite_1.8.9 backports_1.5.0 httr_1.4.7
#> [16] purrr_1.0.2 UCSC.utils_1.0.0 scales_1.3.0
#> [19] codetools_0.2-20 jquerylib_0.1.4 cli_3.6.3
#> [22] rlang_1.1.4 XVector_0.44.0 munsell_0.5.1
#> [25] withr_3.0.2 cachem_1.1.0 yaml_2.3.10
#> [28] FMStable_0.1-4 tools_4.4.2 parallel_4.4.2
#> [31] checkmate_2.3.2 colorspace_2.1-1 GenomeInfoDbData_1.2.12
#> [34] vctrs_0.6.5 R6_2.5.1 lifecycle_1.0.4
#> [37] zlibbioc_1.50.0 mvMAPIT_2.0.3 pkgconfig_2.0.3
#> [40] pillar_1.10.1 bslib_0.8.0 gtable_0.3.6
#> [43] glue_1.8.0 Rcpp_1.0.14 harmonicmeanp_3.0.1
#> [46] xfun_0.50 tibble_3.2.1 tidyselect_1.2.1
#> [49] farver_2.1.2 htmltools_0.5.8.1 rmarkdown_2.29
#> [52] labeling_0.4.3 compiler_4.4.2