Genomic Selection
Description
Genomic selection is a specialized form of marker-assisted selection. The package contains functions to select important genetic markers and predict phenotypes on the basis of fitted training data using an integrated model framework (Guha Majumdar et al. (2019) <doi:10.1089/cmb.2019.0223>) developed by combining one additive (sparse additive models by Ravikumar et al. (2009) <doi:10.1111/j.1467-9868.2009.00718.x>) and one non-additive (HSIC LASSO by Yamada et al. (2014) <doi:10.1162/NECO_a_00537>) model.
Details
The DESCRIPTION file:
Package: GSelection
Type: Package
Title: Genomic Selection
Version: 0.1.0
Author: Sayanti Guha Majumdar, Anil Rai, Dwijesh Chandra Mishra
Maintainer: Sayanti Guha Majumdar <sayanti23gm@gmail.com>
Description: Genomic selection is a specialized form of marker-assisted selection. The package contains functions to select important genetic markers and predict phenotypes on the basis of fitted training data using an integrated model framework (Guha Majumdar et al. (2019) <doi:10.1089/cmb.2019.0223>) developed by combining one additive (sparse additive models by Ravikumar et al. (2009) <doi:10.1111/j.1467-9868.2009.00718.x>) and one non-additive (HSIC LASSO by Yamada et al. (2014) <doi:10.1162/NECO_a_00537>) model.
License: GPL-3
Encoding: UTF-8
LazyData: true
Imports: SAM, penalized, gdata, stats, utils
RoxygenNote: 6.1.1
Depends: R (>= 3.5)
NeedsCompilation: no
Packaged: 2019-10-26 10:25:25 UTC; user6
Index of help topics:
GS                                  Genotypic and phenotypic simulated dataset
GSelection-package                  Genomic Selection
RED                                 Redundancy Rate
feature.selection                   Genomic Feature Selection
genomic.prediction                  Genomic Prediction
hsic.var.ensemble                   Error Variance Estimation in Genomic Prediction
hsic.var.rcv                        Error Variance Estimation in Genomic Prediction
spam.var.ensemble                   Error Variance Estimation in Genomic Prediction
spam.var.rcv                        Error Variance Estimation in Genomic Prediction
Author(s)
Sayanti Guha Majumdar, Anil Rai, Dwijesh Chandra Mishra
Maintainer: Sayanti Guha Majumdar <sayanti23gm@gmail.com>
References
Guha Majumdar, S., Rai, A. and Mishra, D. C. (2019). Integrated framework for selection of additive and non-additive genetic markers for genomic selection. Journal of Computational Biology. doi:10.1089/cmb.2019.0223
Ravikumar, P., Lafferty, J., Liu, H. and Wasserman, L. (2009). Sparse additive models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 71(5), 1009-1030. doi:10.1111/j.1467-9868.2009.00718.x
Yamada, M., Jitkrittum, W., Sigal, L., Xing, E. P. and Sugiyama, M. (2014). High-Dimensional Feature Selection by Feature-Wise Kernelized Lasso. Neural Computation, 26(1):185-207. doi:10.1162/NECO_a_00537
Genotypic and phenotypic simulated dataset
Description
This dataset was simulated with the R package "qtlbim". Ten markers are true features associated with the trait under study and the remaining 100 are random markers. Ten chromosomes are considered, each containing 10 markers; each chromosome has one QTL, which is the true feature.
Usage
data("GS")
Format
A data frame with 60 rows (genotypes) and 111 columns containing genotyped marker information and a phenotypic trait.
Details
It has 60 rows, each representing an individual genotype, and 111 columns: the first 110 columns contain genotyped marker information and the last column gives the value of the phenotypic trait associated with the genotype under study.
Source
Yandell, B. S., Mehta, T., Banerjee, S., Shriner, D., Venkataraman, R. et al. (2007). R/qtlbim: QTL with Bayesian Interval Mapping in experimental crosses. Bioinformatics, 23, 641-643.
Yandell, B. S., Nengjun, Y., Mehta, T., Banerjee, S., Shriner, D. et al. (2012). qtlbim: QTL Bayesian Interval Mapping. R package version 2.0.5. http://CRAN.R-project.org/package=qtlbim
Examples
library(GSelection)
data(GS)
X<-GS[,1:110] ## Extracting Genotype
Y<-GS[,111] ## Extracting Phenotype
Redundancy Rate
Description
Calculates the redundancy rate of the selected features (markers). The value will be high if many redundant features are selected.
Usage
RED(x,spam_selected_feature_index,hsic_selected_feature_index,
integrated_selected_feature_index)
Arguments
x: a matrix of markers or explanatory variables; each column contains one marker and each row represents an individual.
spam_selected_feature_index: index of selected markers from x using the Sparse Additive Model.
hsic_selected_feature_index: index of selected markers from x using HSIC LASSO.
integrated_selected_feature_index: index of selected markers from x using the integrated model framework.
Details
The RED score (Zhao et al., 2010) is the average correlation between each pair of selected markers. A large RED score signifies that the selected features are strongly correlated with one another, meaning that many redundant features have been selected. Thus, a small redundancy rate is preferable for feature selection.
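As a rough illustration of this definition (not the package internals), the sketch below computes a redundancy score as the average absolute pairwise Pearson correlation among the selected marker columns; whether absolute values and Pearson correlation are used internally follows Zhao et al. (2010) and may differ.
## Illustrative sketch only: average absolute pairwise correlation
## among the selected marker columns as a redundancy score.
red_score <- function(x, selected_index) {
  sel  <- as.matrix(x[, selected_index, drop = FALSE])
  cors <- cor(sel)                    # pairwise Pearson correlations
  mean(abs(cors[upper.tri(cors)]))    # average over distinct marker pairs
}
## e.g. red_score(x_trn, fit$spam_selected_feature_index)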
Value
Returns a list containing:
RED_spam: redundancy rate of the features selected using the Sparse Additive Model.
RED_hsic: redundancy rate of the features selected using HSIC LASSO.
RED_I: redundancy rate of the features selected using the integrated model framework.
Author(s)
Sayanti Guha Majumdar <sayanti23gm@gmail.com>, Anil Rai, Dwijesh Chandra Mishra
References
Guha Majumdar, S., Rai, A. and Mishra, D. C. (2019). Integrated framework for selection of additive and non-additive genetic markers for genomic selection. Journal of Computational Biology. doi:10.1089/cmb.2019.0223
Ravikumar, P., Lafferty, J., Liu, H. and Wasserman, L. (2009). Sparse additive models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 71(5), 1009-1030. doi:10.1111/j.1467-9868.2009.00718.x
Yamada, M., Jitkrittum, W., Sigal, L., Xing, E. P. and Sugiyama, M. (2014). High-Dimensional Feature Selection by Feature-Wise Kernelized Lasso. Neural Computation, 26(1):185-207. doi:10.1162/NECO_a_00537
Zhao, Z., Wang, L. and Li, H. (2010). Efficient spectral feature selection with minimum redundancy. In AAAI Conference on Artificial Intelligence (AAAI), pp 673-678.
Examples
library(GSelection)
data(GS)
x_trn <- GS[1:40,1:110]
y_trn <- GS[1:40,111]
x_tst <- GS[41:60,1:110]
y_tst <- GS[41:60,111]
fit <- feature.selection(x_trn,y_trn,d=10)
red <- RED(x_trn,fit$spam_selected_feature_index,fit$hsic_selected_feature_index,
fit$integrated_selected_feature_index)
Genomic Feature Selection
Description
Feature (marker) selection for genomic prediction with the integrated model framework, using both an additive (Sparse Additive Model) and a non-additive (HSIC LASSO) statistical model.
Usage
feature.selection(x,y,d)
Arguments
x: a matrix of markers or explanatory variables; each column contains one marker and each row represents an individual.
y: a column vector of the response variable.
d: number of variables to be selected from x.
Details
The integrated model framework was developed by combining one additive model (Sparse Additive Model) and one non-additive model (HSIC LASSO) for the selection of important markers from whole-genome marker data.
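The exact integration rule is given in Guha Majumdar et al. (2019). Purely as a hedged illustration of working with the two selectors' outputs, the sketch below inspects their overlap and naively pools them; this union is not necessarily the rule the package applies. It assumes a fitted object fit as returned in the Examples section.
## Illustration only, not the package's integration rule:
overlap <- intersect(fit$spam_selected_feature_index,
                     fit$hsic_selected_feature_index)    # markers chosen by both models
pooled  <- sort(union(fit$spam_selected_feature_index,
                      fit$hsic_selected_feature_index))  # naive combined marker set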
Value
Returns a list containing:
spam_selected_feature_index: index of selected markers from x using the Sparse Additive Model.
coefficient.spam: coefficient values of the selected markers from the Sparse Additive Model.
hsic_selected_feature_index: index of selected markers from x using HSIC LASSO.
coefficient.hsic: coefficient values of the selected markers from HSIC LASSO.
integrated_selected_feature_index: index of selected markers from x using the integrated model framework.
Author(s)
Sayanti Guha Majumdar <sayanti23gm@gmail.com>, Anil Rai, Dwijesh Chandra Mishra
References
Guha Majumdar, S., Rai, A. and Mishra, D. C. (2019). Integrated framework for selection of additive and non-additive genetic markers for genomic selection. Journal of Computational Biology. doi:10.1089/cmb.2019.0223
Ravikumar, P., Lafferty, J., Liu, H. and Wasserman, L. (2009). Sparse additive models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 71(5), 1009-1030. doi:10.1111/j.1467-9868.2009.00718.x
Yamada, M., Jitkrittum, W., Sigal, L., Xing, E. P. and Sugiyama, M. (2014). High-Dimensional Feature Selection by Feature-Wise Kernelized Lasso. Neural Computation, 26(1):185-207. doi:10.1162/NECO_a_00537
Examples
library(GSelection)
data(GS)
x_trn <- GS[1:40,1:110]
y_trn <- GS[1:40,111]
x_tst <- GS[41:60,1:110]
y_tst <- GS[41:60,111]
fit <- feature.selection(x_trn,y_trn,d=10)
Genomic Prediction
Description
Prediction of phenotypic values based on selected markers with the integrated model framework, using both an additive (Sparse Additive Model) and a non-additive (HSIC LASSO) statistical model.
Usage
genomic.prediction(x,spam_error_var,hsic_error_var,
spam_selected_feature_index,hsic_selected_feature_index,
coefficient.spam,coefficient.hsic)
Arguments
x: a matrix of markers or explanatory variables for which the phenotype will be predicted; each column contains one marker and each row represents an individual.
spam_error_var: estimated error variance of genomic prediction by the Sparse Additive Model.
hsic_error_var: estimated error variance of genomic prediction by HSIC LASSO.
spam_selected_feature_index: index of selected markers from x using the Sparse Additive Model.
hsic_selected_feature_index: index of selected markers from x using HSIC LASSO.
coefficient.spam: coefficient values of the selected markers from the Sparse Additive Model.
coefficient.hsic: coefficient values of the selected markers from HSIC LASSO.
Details
Phenotypic values are predicted for the given marker genotypes using the previously fitted model objects. The integrated model framework, developed by combining the features selected by SpAM and HSIC LASSO, is used for this purpose.
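Since the function takes both error variances as inputs, one plausible reading is that the two per-model predictions are combined with weights proportional to each model's precision. The sketch below shows such an inverse-error-variance combination of two linear scores; the helper names, the linear-score simplification (SpAM is an additive, not strictly linear, model) and the weighting rule are illustrative assumptions, not the package's exact internals.
## Hedged sketch: combine two per-model predictions by inverse error variance.
predict_linear <- function(x, idx, beta) {
  as.matrix(x[, idx, drop = FALSE]) %*% as.matrix(beta)     # linear score on selected markers
}
integrate_predictions <- function(y_spam, y_hsic, var_spam, var_hsic) {
  w_spam <- (1 / var_spam) / (1 / var_spam + 1 / var_hsic)  # precision weight for SpAM
  w_spam * y_spam + (1 - w_spam) * y_hsic                   # weighted combination
}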
Value
Integrated_y: predicted phenotype values from the integrated model framework.
Author(s)
Sayanti Guha Majumdar <sayanti23gm@gmail.com>, Anil Rai, Dwijesh Chandra Mishra
References
Guha Majumdar, S., Rai, A. and Mishra, D. C. (2019). Integrated framework for selection of additive and non-additive genetic markers for genomic selection. Journal of Computational Biology. doi:10.1089/cmb.2019.0223
Ravikumar, P., Lafferty, J., Liu, H. and Wasserman, L. (2009). Sparse additive models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 71(5), 1009-1030. doi:10.1111/j.1467-9868.2009.00718.x
Yamada, M., Jitkrittum, W., Sigal, L., Xing, E. P. and Sugiyama, M. (2014). High-Dimensional Feature Selection by Feature-Wise Kernelized Lasso. Neural Computation, 26(1):185-207. doi:10.1162/NECO_a_00537
Examples
library(GSelection)
data(GS)
x_trn <- GS[1:40,1:110]
y_trn <- GS[1:40,111]
x_tst <- GS[41:60,1:110]
y_tst <- GS[41:60,111]
## estimate spam_var from function spam.var.ensemble or spam.var.rcv
spam_var <- 2.681972
## estimate hsic_var from function hsic.var.ensemble or hsic.var.rcv
hsic_var <- 10.36974
fit <- feature.selection(x_trn,y_trn,d=10)
pred_y <- genomic.prediction(x_tst,spam_var,hsic_var,
fit$spam_selected_feature_index,fit$hsic_selected_feature_index,
fit$coefficient.spam,fit$coefficient.hsic)
Error Variance Estimation in Genomic Prediction
Description
Estimation of error variance using an ensemble method that combines bootstrapping and simple random sampling without replacement (SRSWOR) in HSIC LASSO.
Usage
hsic.var.ensemble(x,y,b,d)
Arguments
x: a matrix of markers or explanatory variables; each column contains one marker and each row represents an individual.
y: a column vector of the response variable.
b: number of bootstrap samples.
d: number of variables to be selected from x.
Details
In this method, bootstrapping and simple random sampling without replacement are combined to estimate the error variance. Variables are selected with HSIC LASSO from the original dataset, and all possible samples of a particular size are drawn from the selected variable set by simple random sampling without replacement. For each such sample, the error variance is estimated from bootstrap samples of the original dataset by least-squares regression. Finally, the average of all the estimated variances is taken as the final estimate of the error variance.
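A minimal sketch of this ensemble idea follows, assuming the HSIC LASSO selection step has already produced selected_idx (e.g. via feature.selection); it draws one random subset per bootstrap sample rather than enumerating all possible subsets, so it only approximates the procedure described above.
## Hedged sketch of the ensemble estimator (not the package internals).
ensemble_error_var <- function(x, y, selected_idx, b = 2, subset_size = 5) {
  vars <- numeric(b)
  for (i in seq_len(b)) {
    boot <- sample(nrow(x), replace = TRUE)        # bootstrap sample of individuals
    sub  <- sample(selected_idx, subset_size)      # SRSWOR from the selected marker set
    dat  <- data.frame(y = y[boot], x[boot, sub, drop = FALSE])
    fit  <- lm(y ~ ., data = dat)                  # least-squares refit
    vars[i] <- sum(residuals(fit)^2) / fit$df.residual
  }
  mean(vars)                                       # average of the variance estimates
}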
Value
The estimated error variance.
Author(s)
Sayanti Guha Majumdar <sayanti23gm@gmail.com>, Anil Rai, Dwijesh Chandra Mishra
References
Yamada, M., Jitkrittum, W., Sigal, L., Xing, E. P. and Sugiyama, M. (2014). High-Dimensional Feature Selection by Feature-Wise Kernelized Lasso. Neural Computation, 26(1):185-207. doi:10.1162/NECO_a_00537
Examples
library(GSelection)
data(GS)
x_trn <- GS[1:40,1:110]
y_trn <- GS[1:40,111]
x_tst <- GS[41:60,1:110]
y_tst <- GS[41:60,111]
hsic_var <- hsic.var.ensemble(x_trn,y_trn,2,10)
Error Variance Estimation in Genomic Prediction
Description
Estimation of error variance using refitted cross-validation (RCV) in HSIC LASSO.
Usage
hsic.var.rcv(x,y,d)
Arguments
x: a matrix of markers or explanatory variables; each column contains one marker and each row represents an individual.
y: a column vector of the response variable.
d: number of variables to be selected from x.
Details
The refitted cross-validation (RCV) method, a two-step procedure, is used to estimate the error variance. In the first step, the dataset is divided into two sub-datasets and the most significant markers (variables) are selected from each sub-dataset with HSIC LASSO, yielding two small sets of selected variables. Then, using the set selected from the first sub-dataset, the error variance is estimated on the second sub-dataset by ordinary least squares, and using the set selected from the second sub-dataset, the error variance is estimated on the first sub-dataset by ordinary least squares. Finally, the average of these two error variances is taken as the final RCV estimate of the error variance.
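The sketch below outlines the two-step RCV computation under stated assumptions: select_fn is a placeholder for the HSIC LASSO selection step (the package performs this internally), and the data are split at random into two halves.
## Hedged sketch of refitted cross-validation (not the package internals).
rcv_error_var <- function(x, y, d, select_fn) {
  n     <- nrow(x)
  half1 <- sample(n, floor(n / 2))                 # first random half
  half2 <- setdiff(seq_len(n), half1)              # second half
  ols_var <- function(rows, idx) {                 # OLS residual variance on 'rows'
    dat <- data.frame(y = y[rows], x[rows, idx, drop = FALSE])
    fit <- lm(y ~ ., data = dat)
    sum(residuals(fit)^2) / fit$df.residual
  }
  idx1 <- select_fn(x[half1, ], y[half1], d)       # select on half 1
  idx2 <- select_fn(x[half2, ], y[half2], d)       # select on half 2
  mean(c(ols_var(half2, idx1), ols_var(half1, idx2)))  # average the two estimates
}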
Value
The estimated error variance.
Author(s)
Sayanti Guha Majumdar <sayanti23gm@gmail.com>, Anil Rai, Dwijesh Chandra Mishra
References
Fan, J., Guo, S. and Hao, N. (2012). Variance estimation using refitted cross-validation in ultrahigh dimensional regression. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 74(1), 37-65.
Yamada, M., Jitkrittum, W., Sigal, L., Xing, E. P. and Sugiyama, M. (2014). High-Dimensional Feature Selection by Feature-Wise Kernelized Lasso. Neural Computation, 26(1):185-207. doi:10.1162/NECO_a_00537
Examples
library(GSelection)
data(GS)
x_trn <- GS[1:40,1:110]
y_trn <- GS[1:40,111]
x_tst <- GS[41:60,1:110]
y_tst <- GS[41:60,111]
hsic_var <- hsic.var.rcv(x_trn,y_trn,10)
Error Variance Estimation in Genomic Prediction
Description
Estimation of error variance using an ensemble method that combines bootstrapping and simple random sampling without replacement (SRSWOR) in Sparse Additive Models.
Usage
spam.var.ensemble(x,y,b,d)
Arguments
x: a matrix of markers or explanatory variables; each column contains one marker and each row represents an individual.
y: a column vector of the response variable.
b: number of bootstrap samples.
d: number of variables to be selected from x.
Details
In this method, bootstrapping and simple random sampling without replacement are combined to estimate the error variance. Variables are selected with Sparse Additive Models (SpAM) from the original dataset, and all possible samples of a particular size are drawn from the selected variable set by simple random sampling without replacement. For each such sample, the error variance is estimated from bootstrap samples of the original dataset by least-squares regression. Finally, the average of all the estimated variances is taken as the final estimate of the error variance.
Value
The estimated error variance.
Author(s)
Sayanti Guha Majumdar <sayanti23gm@gmail.com>, Anil Rai, Dwijesh Chandra Mishra
References
Ravikumar, P., Lafferty, J., Liu, H. and Wasserman, L. (2009). Sparse additive models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 71(5), 1009-1030. doi:10.1111/j.1467-9868.2009.00718.x
Examples
library(GSelection)
data(GS)
x_trn <- GS[1:40,1:110]
y_trn <- GS[1:40,111]
x_tst <- GS[41:60,1:110]
y_tst <- GS[41:60,111]
spam_var <- spam.var.ensemble(x_trn,y_trn,2,10)
Error Variance Estimation in Genomic Prediction
Description
Estimation of error variance using refitted cross-validation (RCV) in Sparse Additive Models.
Usage
spam.var.rcv(x,y,d)
Arguments
x: a matrix of markers or explanatory variables; each column contains one marker and each row represents an individual.
y: a column vector of the response variable.
d: number of variables to be selected from x.
Details
The refitted cross-validation (RCV) method, a two-step procedure, is used to estimate the error variance. In the first step, the dataset is divided into two sub-datasets and the most significant markers (variables) are selected from each sub-dataset with Sparse Additive Models (SpAM), yielding two small sets of selected variables. Then, using the set selected from the first sub-dataset, the error variance is estimated on the second sub-dataset by ordinary least squares, and using the set selected from the second sub-dataset, the error variance is estimated on the first sub-dataset by ordinary least squares. Finally, the average of these two error variances is taken as the final RCV estimate of the error variance.
Value
The estimated error variance.
Author(s)
Sayanti Guha Majumdar <sayanti23gm@gmail.com>, Anil Rai, Dwijesh Chandra Mishra
References
Fan, J., Guo, S. and Hao, N. (2012). Variance estimation using refitted cross-validation in ultrahigh dimensional regression. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 74(1), 37-65.
Ravikumar, P., Lafferty, J., Liu, H. and Wasserman, L. (2009). Sparse additive models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 71(5), 1009-1030. doi:10.1111/j.1467-9868.2009.00718.x
Examples
library(GSelection)
data(GS)
x_trn <- GS[1:40,1:110]
y_trn <- GS[1:40,111]
x_tst <- GS[41:60,1:110]
y_tst <- GS[41:60,111]
spam_var <- spam.var.rcv(x_trn,y_trn,10)