GFM: installation and simulated example

Wei Liu

2023-02-13

Install the GFM

This vignette introduces the R package GFM, whose function gfm implements the generalized factor model (GFM) for ultra-high-dimensional variables of mixed types. The estimated factors and loading matrices can be used in a variety of downstream analyses, such as cell type clustering in single-cell RNA sequencing data and identification of important SNPs in GWAS data.

The package can be installed with the command:

library(remotes)

remotes::install_github("feiyoung/GFM")

or

install.packages("GFM")

The package can be loaded with the command:

library("GFM")
set.seed(1) # set a random seed for reproducibility.

Fit GFM model using simulated data

GFM can handle data with homogeneous normal variables

First, we generate the data with homogeneous normal variables.
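As a minimal base-R sketch of what "homogeneous normal variables" means here, the following simulates a data matrix from a Gaussian factor model in which every variable shares the same error variance. It does not use the package's own data generator; the names n, p, q, H, B, mu and sigma, and their values, are illustrative assumptions.

```r
# Sketch only: simulate X = 1 mu' + H B' + E with a common error variance.
# All names and sizes below are illustrative assumptions.
set.seed(1)
n <- 100; p <- 50; q <- 2
H  <- matrix(rnorm(n * q), n, q)              # latent factors, n x q
B  <- matrix(rnorm(p * q), p, q)              # loading matrix, p x q
mu <- rnorm(p)                                # one intercept per variable
sigma <- 1                                    # error sd shared by all variables
E  <- matrix(rnorm(n * p, sd = sigma), n, p)  # homogeneous Gaussian noise
X  <- sweep(H %*% t(B), 2, mu, "+") + E       # observed data, n x p
dim(X)
```

Keeping the true H and B around is what later allows estimated factors and loadings to be compared against the truth.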

Second, we set the algorithm parameters.

Third, we fit the GFM model with a user-specified number of factors.

The number of factors can also be determined in a data-driven manner.
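One common data-driven rule is the eigenvalue ratio: order the eigenvalues of the sample covariance matrix and pick the position where the ratio of consecutive eigenvalues is largest. The sketch below applies this rule to simulated Gaussian data in base R. It is a generic illustration, not the package's implementation, and q_true, q_max and the simulation settings are assumptions.

```r
# Sketch only: eigenvalue-ratio rule for the number of factors.
set.seed(1)
n <- 200; p <- 50; q_true <- 3
H <- matrix(rnorm(n * q_true), n, q_true)
B <- matrix(rnorm(p * q_true), p, q_true)
X <- H %*% t(B) + matrix(rnorm(n * p), n, p)

# Eigenvalues of the sample covariance matrix, in decreasing order
ev <- eigen(cov(X), symmetric = TRUE, only.values = TRUE)$values

q_max  <- 10                                  # largest candidate number of factors
ratios <- ev[1:q_max] / ev[2:(q_max + 1)]     # ratio of k-th to (k+1)-th eigenvalue
hq     <- which.max(ratios)                   # estimated number of factors
hq
```

The rule needs only one eigendecomposition, which is why ratio-type methods tend to be cheap compared with criteria that refit the model for every candidate value.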

GFM outperforms LFM in analyzing data with heterogeneous normal variables

First, we generate the data with heterogeneous normal variables and set the algorithm parameters.

Second, we fit the GFM model with a user-specified number of factors and compare the results with those of the linear factor model (LFM).
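To make the linear factor model side of this comparison concrete, the sketch below simulates heterogeneous normal data (each variable has its own error standard deviation), estimates the factors by principal components, and measures how well they recover the true factors with canonical correlations. This is a base-R stand-in for the LFM baseline only; the PCA estimator, the names sd_j, hH_lfm and cc, and all settings are assumptions, and the GFM fit itself is not reproduced here.

```r
# Sketch only: PCA-based linear factor model on heteroscedastic Gaussian data.
set.seed(1)
n <- 200; p <- 50; q <- 2
H <- matrix(rnorm(n * q), n, q)                        # true factors
B <- matrix(rnorm(p * q), p, q)                        # true loadings
sd_j <- runif(p, min = 0.5, max = 3)                   # per-variable error sd
E <- sweep(matrix(rnorm(n * p), n, p), 2, sd_j, "*")   # heterogeneous noise
X <- H %*% t(B) + E

# LFM estimate: leading q left singular vectors of the centered data
Xc <- scale(X, center = TRUE, scale = FALSE)
hH_lfm <- sqrt(n) * svd(Xc, nu = q, nv = 0)$u          # estimated factors, n x q

# Canonical correlations between estimated and true factor spaces;
# values near 1 indicate that the factor space is well recovered
cc <- cancor(hH_lfm, H)$cor
min(cc)
```

Canonical correlations are invariant to rotation of the factors, which matters because factors are identified only up to an invertible transformation. The same measure can be applied to the factors from any other estimator to put the methods on a common scale.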

The number of factors can also be determined in a data-driven manner.

GFM outperforms LFM in analyzing data with count (Poisson) variables

First, we generate the data with count (Poisson) variables and set the algorithm parameters.
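A minimal sketch of count data with a latent factor structure is given below: each entry is Poisson with a log link, so its mean is the exponential of a linear combination of the factors. The generator, the loading scale sd = 0.5 and the intercept value are illustrative assumptions rather than the package's settings.

```r
# Sketch only: Poisson counts whose log-means follow a factor structure.
set.seed(1)
n <- 100; p <- 50; q <- 2
H  <- matrix(rnorm(n * q), n, q)               # latent factors
B  <- matrix(rnorm(p * q, sd = 0.5), p, q)     # small loadings keep counts moderate
mu <- rep(0.5, p)                              # intercepts on the log scale
eta    <- sweep(H %*% t(B), 2, mu, "+")        # linear predictor, n x p
lambda <- exp(eta)                             # Poisson means via the log link
X <- matrix(rpois(n * p, lambda = lambda), n, p)

# For counts, the variance of a variable grows with its mean, which a
# linear factor model with additive Gaussian noise does not account for
cor(colMeans(X), apply(X, 2, var))
```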

Second, we fit the GFM model given the true number of factors.

Additionally, we demonstrate two methods for choosing the number of factors: the eigenvalue ratio test (ratio_test) and the information criterion (IC). Both methods accurately choose the number of factors, but ratio_test is much more efficient than the IC method, even though the latter is implemented in parallel. We therefore strongly recommend ratio_test, especially for high-dimensional, large-scale data.
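To illustrate how an information criterion selects the number of factors, the sketch below evaluates a Bai and Ng style criterion, log V(k) + k * (n + p) / (n * p) * log(n * p / (n + p)), for a linear factor model on simulated Gaussian data. This is a swapped-in, simplified criterion and not the IC used by the package; q_set, q_true and the simulation settings are assumptions.

```r
# Sketch only: information criterion for a linear factor model
# (Bai-Ng style stand-in, not the package's criterion).
set.seed(1)
n <- 200; p <- 50; q_true <- 3
H <- matrix(rnorm(n * q_true), n, q_true)
B <- matrix(rnorm(p * q_true), p, q_true)
X <- H %*% t(B) + matrix(rnorm(n * p), n, p)
Xc <- scale(X, center = TRUE, scale = FALSE)

d2    <- svd(Xc, nu = 0, nv = 0)$d^2           # squared singular values
q_set <- 1:8                                   # candidate numbers of factors

# V(k): mean squared residual after removing the leading k components
V <- sapply(q_set, function(k) sum(d2[-seq_len(k)]) / (n * p))
penalty <- q_set * (n + p) / (n * p) * log(n * p / (n + p))
ic <- log(V) + penalty                         # fit term plus complexity penalty
hq_ic <- q_set[which.min(ic)]                  # candidate with the smallest criterion
hq_ic
```

In this linear sketch a single decomposition serves every candidate. For a generalized factor model, each candidate typically requires its own iterative fit, which is consistent with the IC method being the slower of the two.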

Third, we compare the results with those of linear factor models.

GFM outperforms LFM in analyzing data with mixed-type count and categorical variables

First, we generate the data with mixed-type count and categorical variables and set the algorithm parameters. Second, we fit the GFM model with a user-specified number of factors.
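The sketch below shows one way such mixed-type data can arise: a block of Poisson counts and a block of binary variables driven by the same latent factors, with a log link for the counts and a logit link for the binary block. Treating the categorical variables as binary, and every name and setting below, are assumptions for illustration; how the package expects the blocks and their types to be passed is not assumed here.

```r
# Sketch only: count and binary variables sharing the same latent factors.
set.seed(1)
n <- 100; p_count <- 30; p_bin <- 20; q <- 2
H <- matrix(rnorm(n * q), n, q)                          # shared latent factors
B_count <- matrix(rnorm(p_count * q, sd = 0.5), p_count, q)
B_bin   <- matrix(rnorm(p_bin * q), p_bin, q)

eta_count <- H %*% t(B_count)                            # log-scale predictor
eta_bin   <- H %*% t(B_bin)                              # logit-scale predictor

X_count <- matrix(rpois(n * p_count, lambda = exp(eta_count)), n, p_count)
X_bin   <- matrix(rbinom(n * p_bin, size = 1, prob = plogis(eta_bin)), n, p_bin)

X_mixed  <- cbind(X_count, X_bin)                        # n x (p_count + p_bin)
var_type <- rep(c("count", "binary"), times = c(p_count, p_bin))
table(var_type)
```

Keeping a per-variable type label alongside the data matrix makes it straightforward to split the columns back into type-specific blocks.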

Third, we compare the results with those of linear factor models.

Compare with linear factor models

Session information

sessionInfo()
#> R version 4.1.2 (2021-11-01)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 22621)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=C                              
#> [2] LC_CTYPE=Chinese (Simplified)_China.936   
#> [3] LC_MONETARY=Chinese (Simplified)_China.936
#> [4] LC_NUMERIC=C                              
#> [5] LC_TIME=Chinese (Simplified)_China.936    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> loaded via a namespace (and not attached):
#>  [1] digest_0.6.29   R6_2.5.1        jsonlite_1.8.0  magrittr_2.0.3 
#>  [5] evaluate_0.15   stringi_1.7.6   rlang_1.0.2     cli_3.2.0      
#>  [9] rstudioapi_0.13 jquerylib_0.1.4 bslib_0.3.1     rmarkdown_2.11 
#> [13] tools_4.1.2     stringr_1.4.0   xfun_0.29       yaml_2.3.6     
#> [17] fastmap_1.1.0   compiler_4.1.2  htmltools_0.5.2 knitr_1.37     
#> [21] sass_0.4.1