Type:

Package

Title:

Genome-Wide Association Study with SNP-Set Methods

Version:

0.1.38

Maintainer:

Kosuke Hamazaki <hamazaki@ut-biomet.org>

Description:

By using 'RAINBOWR' (Reliable Association INference By Optimizing Weights with R), users can test multiple SNPs (Single Nucleotide Polymorphisms) simultaneously by kernel-based (SNP-set) methods. This package can also be applied to haplotype-based GWAS (Genome-Wide Association Study). Users can test not only additive effects but also dominance and epistatic effects. In detail, please check our paper on PLOS Computational Biology: Kosuke Hamazaki and Hiroyoshi Iwata (2020) <doi:10.1371/journal.pcbi.1007663>.

License:

MIT + file LICENSE

Encoding:

UTF-8

LazyData:

true

Depends:

R (≥ 3.5.0)

Imports:

Rcpp, Matrix, cluster, MASS, pbmcapply, optimx, methods, ape, stringr, pegas, rrBLUP, expm, here, htmlwidgets, Rfast, gaston, MM4LMM, R.utils

LinkingTo:

Rcpp, RcppEigen

RoxygenNote:

7.3.2

Suggests:

knitr, rmarkdown, plotly, haplotypes, adegenet, ggplot2, ggtree, scatterpie, phylobase, ggimage, furrr, future, progressr, foreach, doParallel, data.table

VignetteBuilder:

knitr

NeedsCompilation:

yes

Packaged:

2025-05-21 12:35:35 UTC; hamazaki

Author:

Kosuke Hamazaki [aut, cre], Hiroyoshi Iwata [aut, ctb]

Repository:

CRAN

Date/Publication:

2025-05-21 13:30:07 UTC

RAINBOWR: Perform Genome-Wide Association Study (GWAS) By Kernel-Based Methods

Description

By using 'RAINBOWR' (Reliable Association INference By Optimizing Weights with R), users can test multiple SNPs (Single Nucleotide Polymorphisms) simultaneously by kernel-based (SNP-set) methods. Users can test not only additive effects but also dominance and epistatic effects. In detail, please check our preprint on bioRxiv: Kosuke Hamazaki and Hiroyoshi Iwata (2019) <doi:10.1101/612028>.

Author(s)

Maintainer: Kosuke Hamazaki hamazaki@ut-biomet.org

Authors:

Hiroyoshi Iwata aiwata@mail.ecc.u-tokyo.ac.jp [contributor]

Function to calculate threshold for GWAS

Description

Calculate thresholds for the given GWAS (genome-wide association studies) result by the Benjamini-Hochberg method or Bonferroni method.

Usage

CalcThreshold(input, sig.level = 0.05, method = "BH")

Arguments

input

Data frame of GWAS results where the first column contains the marker names, the second and third columns contain the chromosome and map position, and the fourth column contains -log10(p) for each marker.

sig.level

Significance level for the threshold. The default is 0.05. You can also assign a vector of significance levels.

method

Three methods are offered:

"BH": Benjamini-Hochberg method. To control FDR, use this method.

"Bonf": Bonferroni method. To perform simple correction of multiple testing, use this method.

"Sidak": Sidak method.

You can also assign two of them by 'method = c("BH", "Bonf")'

Value

The value of the threshold. If there is no threshold, it returns NA.
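
As a rough illustration of what the three corrections mean on the -log10(p) scale, the following base R sketch computes one threshold per method. It is not the package's implementation: the toy p-values and all variable names are made up for this example, and the BH threshold is taken here as the -log10(p) of the last marker that passes the step-up rule, which is one common convention and an assumption about how the value is reported.

```r
# Toy -log10(p) values for 10 markers (made-up numbers, illustration only)
neg_log10_p <- c(6.2, 5.1, 3.0, 2.2, 1.5, 1.1, 0.8, 0.5, 0.3, 0.1)
p <- 10 ^ (-neg_log10_p)
sig_level <- 0.05
m <- length(p)

# Bonferroni: a marker is significant when p <= sig_level / m
thr_bonf <- -log10(sig_level / m)

# Sidak: a marker is significant when p <= 1 - (1 - sig_level)^(1 / m)
thr_sidak <- -log10(1 - (1 - sig_level) ^ (1 / m))

# Benjamini-Hochberg: largest k such that p_(k) <= (k / m) * sig_level
p_sorted <- sort(p)
passed <- which(p_sorted <= seq_len(m) / m * sig_level)
# If no marker passes, there is no threshold and NA is reported
thr_bh <- if (length(passed) > 0) -log10(p_sorted[max(passed)]) else NA

c(BH = thr_bh, Bonf = thr_bonf, Sidak = thr_sidak)
```

In this toy case the BH threshold is the least strict of the three, which is the usual reason to prefer it when the goal is to control the FDR rather than the family-wise error rate.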

References

Benjamini, Y. and Hochberg, Y. (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc. 57(1): 289-300.

Storey, J.D. and Tibshirani, R. (2003) Statistical significance for genomewide studies. Proc Natl Acad Sci. 100(16): 9440-9445.

Equation of mixed model for multi-kernel considering covariance structure between kernels

Description

This function solves the following multi-kernel linear mixed effects model with covariance structure.

y = X \beta + \sum _{l=1} ^ {L} Z _ {l} u _ {l} + \epsilon

where Var[y] = \sum _{i=1} ^ {L} Z _ {i} K _ {i} Z _ {i}' \sigma _ {i} ^ 2 + \sum _{i=1} ^ {L-1} \sum _{j=i+1} ^ {L} (Z _ {i} K _ {i j} Z _ {j}' + Z _ {j} K _ {j i} Z _ {i}') \sigma _ {i} \sigma _ {j} \rho _{i j} + I \sigma _ {e} ^ {2}.

Here, K _ {i j} and K _ {j i} are m_i \times m_j and m_j \times m_i matrices representing covariance structure between two random effects. \rho _{i j} is a correlation parameter to be estimated in addition to \sigma^2_i and \sigma_{e}^2.
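
To make the variance structure concrete, here is a minimal base R sketch that assembles Var[y] for L = 2 random effects, reading the cross term as a single contribution from the pair (1, 2). Everything in it is a toy assumption: the matrices W1 and W2, the parameter values, and the choice K_{21} = K_{12}' are made up for illustration and are not taken from the package.

```r
set.seed(1)
n <- 6; m <- 6; nMarker <- 20

# Toy design matrices (one observation per genotype) and marker-like matrices
Z1 <- diag(n)
Z2 <- diag(n)
W1 <- matrix(rnorm(m * nMarker), m, nMarker)
W2 <- matrix(rnorm(m * nMarker), m, nMarker)

# Kernels for each random effect and the cross-kernel between them
K1 <- tcrossprod(W1) / nMarker
K2 <- tcrossprod(W2) / nMarker
K12 <- tcrossprod(W1, W2) / nMarker
K21 <- t(K12)

# Assumed parameter values: standard deviations and correlation
sigma1 <- 1.2; sigma2 <- 0.7; sigmaE <- 0.5; rho12 <- 0.3

# Var[y] = sum of kernel terms + cross term for the pair (1, 2) + residual
V <- sigma1 ^ 2 * Z1 %*% K1 %*% t(Z1) +
  sigma2 ^ 2 * Z2 %*% K2 %*% t(Z2) +
  sigma1 * sigma2 * rho12 * (Z1 %*% K12 %*% t(Z2) + Z2 %*% K21 %*% t(Z1)) +
  sigmaE ^ 2 * diag(n)

isSymmetric(V)
```

Adding Z_i K_{ij} Z_j' together with its transpose is what keeps Var[y] symmetric even though K_{ij} itself is generally not symmetric.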

Usage

EM3.cov(
  y,
  X0 = NULL,
  ZETA,
  covList,
  eigen.G = NULL,
  eigen.SGS = NULL,
  tol = NULL,
  n.core = NA,
  optimizer = "optim",
  traceInside = 0,
  nIterOptimization = NULL,
  n.thres = 450,
  REML = TRUE,
  pred = TRUE,
  return.u.always = TRUE,
  return.u.each = TRUE,
  return.Hinv = TRUE
)

Arguments

y

An n \times 1 vector. A vector of phenotypic values should be used. NA is allowed.

X0

An n \times p matrix. You should assign the mean vector (rep(1, n)) and covariates. NA is not allowed.

ZETA

A list of variance (kernel) matrices and their design matrices for the random effects. You can use more than one kernel matrix. For example, ZETA = list(A = list(Z = Z.A, K = K.A), D = list(Z = Z.D, K = K.D)) (A for additive, D for dominance). The elements of each inner list must be named "Z" and "K".
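
The shape of this argument can be sketched in base R as follows. The examples in this manual build Z with the package's design.Z function; the hand-built incidence matrix below is only a stand-in, and the genotype names and kernel values are made up for illustration.

```r
geno_names <- c("line1", "line2", "line3")
pheno_labels <- c("line1", "line1", "line2", "line3")  # line1 is phenotyped twice

# Incidence (design) matrix Z: n observations x m genotypes
Z <- matrix(0, nrow = length(pheno_labels), ncol = length(geno_names),
            dimnames = list(pheno_labels, geno_names))
Z[cbind(seq_along(pheno_labels), match(pheno_labels, geno_names))] <- 1

# Toy additive and dominance kernels (m x m); values are placeholders
K.A <- matrix(c(1.0, 0.2, 0.1,
                0.2, 1.0, 0.3,
                0.1, 0.3, 1.0),
              nrow = 3, ncol = 3,
              dimnames = list(geno_names, geno_names))
K.D <- diag(3)
dimnames(K.D) <- list(geno_names, geno_names)

# One inner list per random effect, each with elements named "Z" and "K"
ZETA <- list(A = list(Z = Z, K = K.A),
             D = list(Z = Z, K = K.D))
str(ZETA, max.level = 2)
```

Each Z has as many columns as its K has rows, so that Z K Z' is an n x n matrix.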

covList

A list of matrices representing the covariance structure between paired random effects. If there are L random effects in the model, the list should contain L lists, each consisting of L elements. The \{i, j\} element of the list holds a matrix K_{ij} representing the covariance structure between the i-th and j-th random effects. See examples for details.
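
For more than two random effects, the nested list can be filled in a loop. The sketch below uses L = 3 with made-up marker-like matrices; the names W, Kij, and nMarker are hypothetical, and taking K_{ji} as the transpose of K_{ij} is an assumption that matches how K12 and K21 are built in the example of this help page.

```r
set.seed(10)
L <- 3
m <- 5
nMarker <- 12

# Hypothetical marker-like matrices, one per random effect (all with m rows)
W <- lapply(seq_len(L), function(l) matrix(rnorm(m * nMarker), m, nMarker))

# Start from an L x L nested list filled with NULL
covList <- rep(list(rep(list(NULL), L)), L)

# Fill only the off-diagonal pairs; diagonal entries stay NULL
for (i in 1:(L - 1)) {
  for (j in (i + 1):L) {
    Kij <- tcrossprod(W[[i]], W[[j]]) / nMarker
    covList[[i]][[j]] <- Kij
    covList[[j]][[i]] <- t(Kij)
  }
}

lengths(covList)
```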

eigen.G

A list with

$values: Eigen values
$vectors: Eigen vectors

The result of the eigen decomposition of G = ZKZ'. You can use the "spectralG.cpp" function in RAINBOWR. If this argument is NULL, the eigen decomposition will be performed in this function. We recommend assigning the result of the eigen decomposition beforehand to save time.

eigen.SGS

A list with

$values: Eigen values
$vectors: Eigen vectors

The result of the eigen decomposition of SGS, where S = I - X(X'X)^{-1}X', G = ZKZ'. You can use the "spectralG.cpp" function in RAINBOWR. If this argument is NULL, the eigen decomposition will be performed in this function. We recommend assigning the result of the eigen decomposition beforehand to save time.

tol

The tolerance for detecting linear dependencies in the columns of G = ZKZ'. Eigen vectors whose eigen values are less than "tol" argument will be omitted from results. If tol is NULL, top 'n' eigen values will be effective.
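
The idea behind the eigen.G list and the tol cut-off can be sketched with base R's eigen(). This is an illustration under assumptions, not the package's code: the package offers "spectralG.cpp" for this step, the toy matrices are made up, and whether a list built this way is accepted as-is by the function is not confirmed here.

```r
set.seed(20)
n <- 6
W <- matrix(rnorm(n * 3), n, 3)   # only 3 toy markers, so G has rank 3
K <- tcrossprod(W) / 3
Z <- diag(n)
G <- Z %*% K %*% t(Z)

# Eigen decomposition stored as a list with $values and $vectors
eig <- eigen(G, symmetric = TRUE)
eigen.G <- list(values = eig$values, vectors = eig$vectors)

# Drop eigenvectors whose eigenvalues fall below the tolerance
tol <- 1e-8
keep <- eigen.G$values > tol
eigen.G.kept <- list(values = eigen.G$values[keep],
                     vectors = eigen.G$vectors[, keep, drop = FALSE])

# The retained pairs still reproduce G up to numerical error
G.rebuilt <- eigen.G.kept$vectors %*%
  (eigen.G.kept$values * t(eigen.G.kept$vectors))
all.equal(G, G.rebuilt)
```

Because G is rank deficient here, the omitted eigenvalues are zero up to rounding, so removing them loses no information.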

n.core

Setting n.core > 1 will enable parallel execution on a machine with multiple cores.

optimizer

The function used in the optimization process. We offer "optim", "optimx", and "nlminb" functions.

traceInside

Perform trace for the optimization if traceInside >= 1, and this argument shows the frequency of reports.

nIterOptimization

Maximum number of iterations allowed. Defaults are different depending on 'optimizer'.

n.thres

If n >= n.thres, perform EMM1.cpp. Else perform EMM2.cpp.

REML

You can choose which method you will use, "REML" or "ML". If REML = TRUE, you will perform "REML", and if REML = FALSE, you will perform "ML".

pred

If TRUE, the fitted values of y are returned.

return.u.always

If TRUE, BLUP ('u'; u) will be returned.

return.u.each

If TRUE, the function also computes each BLUP corresponding to different kernels (when solving multi-kernel mixed-effects model). It takes additional time compared to the one with 'return.u.each = FALSE'.

return.Hinv

If TRUE, H ^ {-1} = (Var[y] / \sum _{l=1} ^ {L} \sigma _ {l} ^ 2) ^ {-1} will be computed. It also returns V ^ {-1} = (Var[y]) ^ {-1}.

Value

$y.pred: The fitted values of y, \hat{y} = X\beta + Zu
$Vu: Estimator for \sigma^2_u, all of the genetic variance
$Ve: Estimator for \sigma^2_e
$beta: BLUE(\beta)
$u: BLUP(Sum of Zu)
$u.each: BLUP(Each u)
$weights: The proportion of each genetic variance (corresponding to each kernel of ZETA) to Vu
$rhosMat: The estimator for a matrix of correlation parameters \rho. Diagonal elements are always 0.
$LL: Maximized log-likelihood (full or restricted, depending on method)
$Vinv: The inverse of V = Vu \times ZKZ' + Ve \times I
$Hinv: The inverse of H = ZKZ' + \lambda I
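
The relation between $Vinv and $Hinv can be checked on a single-kernel toy case. The sketch assumes \lambda = Ve / Vu, which is what makes V = Vu \times H hold for the two definitions above; the matrices and variance values are made up for illustration.

```r
set.seed(2)
n <- 5
W <- matrix(rnorm(n * 30), n, 30)
K <- tcrossprod(W) / 30
Z <- diag(n)

Vu <- 2.0            # assumed genetic variance
Ve <- 0.5            # assumed residual variance
lambda <- Ve / Vu    # assumption: lambda is the residual-to-genetic ratio

G <- Z %*% K %*% t(Z)
H <- G + lambda * diag(n)
V <- Vu * G + Ve * diag(n)

Hinv <- solve(H)
Vinv <- solve(V)

# Since V = Vu * H, the two inverses differ only by the factor 1 / Vu
all.equal(Vinv, Hinv / Vu)
```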

References

Kang, H.M. et al. (2008) Efficient Control of Population Structure in Model Organism Association Mapping. Genetics. 178(3): 1709-1723.

Zhou, X. and Stephens, M. (2012) Genome-wide efficient mixed-model analysis for association studies. Nat Genet. 44(7): 821-824.

Examples





  ### Import RAINBOWR
  require(RAINBOWR)

  ### Load example datasets
  data("Rice_Zhao_etal")
  Rice_geno_score <- Rice_Zhao_etal$genoScore
  Rice_geno_map <- Rice_Zhao_etal$genoMap
  Rice_pheno <- Rice_Zhao_etal$pheno

  ### View each dataset
  See(Rice_geno_score)
  See(Rice_geno_map)
  See(Rice_pheno)

  ### Select one trait for example
  trait.name <- "Flowering.time.at.Arkansas"
  y <- as.matrix(Rice_pheno[, trait.name, drop = FALSE])

  ### Remove SNPs whose MAF <= 0.05
  x.0 <- t(Rice_geno_score)
  MAF.cut.res <- MAF.cut(x.0 = x.0, map.0 = Rice_geno_map)
  x <- MAF.cut.res$x
  map <- MAF.cut.res$map


  ### Assume adjacent individuals are regarded as "neighbors"
  xAdj <- array(data = NA, dim = dim(x), dimnames = dimnames(x))
  for (i in 1:nrow(x)) {
    adjs <- (i - 1):(i + 1)
    adjs <- adjs[adjs %in% 1:nrow(x)]
    adjs <- adjs[adjs != i]
    nAdjs <- length(adjs)

    xAdj[i, ] <- x[i, , drop = FALSE] *
      apply(X = x[adjs, , drop = FALSE],
            MARGIN = 2, FUN = mean)
  }


  ### Estimate additive genomic relationship matrix (GRM)
  K.A <- tcrossprod(x) / ncol(x)
  K.Adj <- tcrossprod(xAdj) / ncol(xAdj)  # for neighbor kernel


  ### Modify data
  Z <- design.Z(pheno.labels = rownames(y),
                geno.names = rownames(K.A))  ### design matrix for random effects
  pheno.mat <- y[rownames(Z), , drop = FALSE]
  ZETA <- list(A = list(Z = Z, K = K.A),
               Adj = list(Z = Z, K = K.Adj))


  ### Prepare for covariance structure between two random effects
  K12 <- tcrossprod(x, xAdj) / sqrt(ncol(x) * ncol(xAdj))
  K21 <- tcrossprod(xAdj, x) / sqrt(ncol(x) * ncol(xAdj))

  covList <- rep(list(rep(list(NULL), 2)), 2)
  covList[[1]][[2]] <- K12
  covList[[2]][[1]] <- K21


  ### Solve multi-kernel linear mixed effects model (2 random effects)
  ### considering covariance structure
  EM3cov.res <- EM3.cov(y = pheno.mat,
                        X0 = NULL,
                        ZETA = ZETA,
                        covList = covList)
  (Vu <- EM3cov.res$Vu)   ### estimated genetic variance
  (Ve <- EM3cov.res$Ve)   ### estimated residual variance
  (weights <- EM3cov.res$weights)   ### estimated proportion of two genetic variances
  (herit <- Vu * weights / (Vu + Ve))   ### genomic heritability (additive, neighbor)
  (rho <- EM3cov.res$rhosMat[2, 1])   ### correlation parameter

  (beta <- EM3cov.res$beta)   ### Here, this is an intercept.
  u.each <- EM3cov.res$u.each   ### estimated genotypic values (additive, neighbor)
  See(u.each)


  ### Perform genomic prediction with 10-fold cross validation (multi-kernel)
  noNA <- !is.na(c(pheno.mat))   ### NA (missing) in the phenotype data

  phenoNoNA <- pheno.mat[noNA, , drop = FALSE]   ### remove NA
  ZETANoNA <- ZETA
  ZETANoNA <- lapply(X = ZETANoNA, FUN = function (List) {
    List$Z <- List$Z[noNA, ]

    return(List)
  })   ### remove NA


  nFold <- 10    ### # of folds
  nLine <- nrow(phenoNoNA)
  idCV <- sample(1:nLine %% nFold)   ### assign random ids for cross-validation
  idCV[idCV == 0] <- nFold

  yPred <- yPredCov <- rep(NA, nLine)


  for (noCV in 1:nFold) {
    print(paste0("Fold: ", noCV))
    yTrain <- phenoNoNA
    yTrain[idCV == noCV, ] <- NA   ### prepare test data

    EM3.resCV <- EM3.cpp(y = yTrain, X0 = NULL,
                         ZETA = ZETANoNA)   ### prediction
    EM3cov.resCV <- EM3.cov(y = yTrain, X0 = NULL,
                            ZETA = ZETANoNA,
                            covList = covList)   ### prediction
    yTest <- EM3.resCV$y.pred     ### predicted values
    yTestCov <- EM3cov.resCV$y.pred

    yPred[idCV == noCV] <- yTest[idCV == noCV]
    yPredCov[idCV == noCV] <- yTestCov[idCV == noCV]
  }

  ### Plot the results
  plotRange <- range(phenoNoNA, yPred)
  plot(x = phenoNoNA, y = yPred,
       xlim = plotRange, ylim = plotRange,
       xlab = "Observed values", ylab = "Predicted values (EM3.cpp)",
       main = "Results of Genomic Prediction (multi-kernel)",
       cex.lab = 1.5, cex.main = 1.5, cex.axis = 1.3)
  abline(a = 0, b = 1, col = 2, lwd = 2, lty = 2)
  R2 <- cor(x = phenoNoNA[, 1], y = yPred) ^ 2
  text(x = plotRange[2] - 10,
       y = plotRange[1] + 10,
       paste0("R2 = ", round(R2, 3)),
       cex = 1.5)


  plotRange <- range(phenoNoNA, yPredCov)
  plot(x = phenoNoNA, y = yPredCov,
       xlim = plotRange, ylim = plotRange,
       xlab = "Observed values", ylab = "Predicted values (EM3.cov)",
       main = "Results of Genomic Prediction (multi-kernel with covariance)",
       cex.lab = 1.5, cex.main = 1.5, cex.axis = 1.3)
  abline(a = 0, b = 1, col = 2, lwd = 2, lty = 2)
  R2 <- cor(x = phenoNoNA[, 1], y = yPredCov) ^ 2
  text(x = plotRange[2] - 10,
       y = plotRange[1] + 10,
       paste0("R2 = ", round(R2, 3)),
       cex = 1.5)


  plotRange <- range(yPred, yPredCov)
  plot(x = yPred, y = yPredCov,
       xlim = plotRange, ylim = plotRange,
       xlab = "Predicted values (EM3.cpp)", ylab = "Predicted values (EM3.cov)",
       main = "Comparison of Multi-Kernel Genomic Prediction",
       cex.lab = 1.5, cex.main = 1.5, cex.axis = 1.3)
  abline(a = 0, b = 1, col = 2, lwd = 2, lty = 2)
  R2 <- cor(x = yPred, y = yPredCov) ^ 2
  text(x = plotRange[2] - 10,
       y = plotRange[1] + 10,
       paste0("R2 = ", round(R2, 3)),
       cex = 1.5)

Equation of mixed model for multi-kernel (slow, general version)

Description

This function solves the following multi-kernel linear mixed effects model.

y = X \beta + \sum _{l=1} ^ {L} Z _ {l} u _ {l} + \epsilon

where Var[y] = \sum _{l=1} ^ {L} Z _ {l} K _ {l} Z _ {l}' \sigma _ {l} ^ 2 + I \sigma _ {e} ^ {2}.

Usage

EM3.cpp(
  y,
  X0 = NULL,
  ZETA,
  eigen.G = NULL,
  eigen.SGS = NULL,
  tol = NULL,
  n.core = NA,
  optimizer = "nlminb",
  traceInside = 0,
  n.thres = 450,
  REML = TRUE,
  pred = TRUE,
  return.u.always = TRUE,
  return.u.each = TRUE,
  return.Hinv = TRUE
)

Arguments

y

An n \times 1 vector. A vector of phenotypic values should be used. NA is allowed.

X0

An n \times p matrix. You should assign the mean vector (rep(1, n)) and covariates. NA is not allowed.

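ZETA

A list of variance (kernel) matrices and their design matrices for the random effects. You can use more than one kernel matrix. For example, ZETA = list(A = list(Z = Z.A, K = K.A), D = list(Z = Z.D, K = K.D)) (A for additive, D for dominance). The elements of each inner list must be named "Z" and "K".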

eigen.G

A list with

$values: Eigen values
$vectors: Eigen vectors

eigen.SGS

A list with

$values: Eigen values
$vectors: Eigen vectors

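tol

The tolerance for detecting linear dependencies in the columns of G = ZKZ'. Eigen vectors whose eigen values are less than "tol" argument will be omitted from results. If tol is NULL, top 'n' eigen values will be effective.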

n.core

Setting n.core > 1 will enable parallel execution on a machine with multiple cores.

optimizer

The function used in the optimization process. We offer "optim", "optimx", and "nlminb" functions.

traceInside

Perform trace for the optimization if traceInside >= 1, and this argument shows the frequency of reports.

n.thres

If n >= n.thres, perform EMM1.cpp. Else perform EMM2.cpp.

REML

You can choose which method you will use, "REML" or "ML". If REML = TRUE, you will perform "REML", and if REML = FALSE, you will perform "ML".

pred

If TRUE, the fitted values of y are returned.

return.u.always

If TRUE, BLUP ('u'; u) will be returned.

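return.u.each

If TRUE, the function also computes each BLUP corresponding to different kernels (when solving multi-kernel mixed-effects model). It takes additional time compared to the one with 'return.u.each = FALSE'.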

return.Hinv

If TRUE, H ^ {-1} = (Var[y] / \sum _{l=1} ^ {L} \sigma _ {l} ^ 2) ^ {-1} will be computed. It also returns V ^ {-1} = (Var[y]) ^ {-1}.

Value

$y.pred: The fitted values of y, \hat{y} = X\beta + Zu
$Vu: Estimator for \sigma^2_u, all of the genetic variance
$Ve: Estimator for \sigma^2_e
$beta: BLUE(\beta)
$u: BLUP(Sum of Zu)
$u.each: BLUP(Each u)
$weights: The proportion of each genetic variance (corresponding to each kernel of ZETA) to Vu
$LL: Maximized log-likelihood (full or restricted, depending on method)
$Vinv: The inverse of V = Vu \times ZKZ' + Ve \times I
$Hinv: The inverse of H = ZKZ' + \lambda I
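
How $Vu, $Ve, and $weights combine can be shown with a few made-up numbers. The sketch below follows the heritability formula used in the example of this help page, Vu * weights / (Vu + Ve); the values themselves are invented and do not come from any dataset.

```r
# Assumed estimates for a model with an additive (A) and an epistatic (AA) kernel
Vu <- 40
Ve <- 10
weights <- c(A = 0.75, AA = 0.25)

# Variance component attributed to each kernel
sigma2_each <- Vu * weights

# Genomic heritability attributed to each kernel
herit_each <- sigma2_each / (Vu + Ve)

sigma2_each
herit_each
sum(herit_each)   # equals Vu / (Vu + Ve) because the weights sum to 1
```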

References

Kang, H.M. et al. (2008) Efficient Control of Population Structure in Model Organism Association Mapping. Genetics. 178(3): 1709-1723.

Zhou, X. and Stephens, M. (2012) Genome-wide efficient mixed-model analysis for association studies. Nat Genet. 44(7): 821-824.

Examples





  ### Import RAINBOWR
  require(RAINBOWR)
  
  ### Load example datasets
  data("Rice_Zhao_etal")
  Rice_geno_score <- Rice_Zhao_etal$genoScore
  Rice_geno_map <- Rice_Zhao_etal$genoMap
  Rice_pheno <- Rice_Zhao_etal$pheno
  
  ### View each dataset
  See(Rice_geno_score)
  See(Rice_geno_map)
  See(Rice_pheno)
  
  ### Select one trait for example
  trait.name <- "Flowering.time.at.Arkansas"
  y <- as.matrix(Rice_pheno[, trait.name, drop = FALSE])
  
  ### Remove SNPs whose MAF <= 0.05
  x.0 <- t(Rice_geno_score)
  MAF.cut.res <- MAF.cut(x.0 = x.0, map.0 = Rice_geno_map)
  x <- MAF.cut.res$x
  map <- MAF.cut.res$map
  
  
  ### Estimate additive genomic relationship matrix (GRM) & epistatic relationship matrix
  K.A <- calcGRM(genoMat = x) 
  K.AA <- K.A * K.A   ### additive x additive epistatic effects
  
  
  ### Modify data
  Z <- design.Z(pheno.labels = rownames(y),
                geno.names = rownames(K.A))  ### design matrix for random effects
  pheno.mat <- y[rownames(Z), , drop = FALSE]
  ZETA <- list(A = list(Z = Z, K = K.A),
               AA = list(Z = Z, K = K.AA))
  
  
  ### Solve multi-kernel linear mixed effects model (2 random effects)
  EM3.res <- EM3.cpp(y = pheno.mat, X0 = NULL, ZETA = ZETA)
  (Vu <- EM3.res$Vu)   ### estimated genetic variance
  (Ve <- EM3.res$Ve)   ### estimated residual variance
  (weights <- EM3.res$weights)   ### estimated proportion of two genetic variances
  (herit <- Vu * weights / (Vu + Ve))   ### genomic heritability (additive, additive x additive)
  
  (beta <- EM3.res$beta)   ### Here, this is an intercept.
  u.each <- EM3.res$u.each   ### estimated genotypic values (additive, additive x additive)
  See(u.each)
  
  
  ### Perform genomic prediction with 10-fold cross validation (multi-kernel)
  noNA <- !is.na(c(pheno.mat))   ### NA (missing) in the phenotype data
  
  phenoNoNA <- pheno.mat[noNA, , drop = FALSE]   ### remove NA
  ZETANoNA <- ZETA
  ZETANoNA <- lapply(X = ZETANoNA, FUN = function (List) {
    List$Z <- List$Z[noNA, ]
    
    return(List)
  })   ### remove NA
  
  
  nFold <- 10    ### # of folds
  nLine <- nrow(phenoNoNA)
  idCV <- sample(1:nLine %% nFold)   ### assign random ids for cross-validation
  idCV[idCV == 0] <- nFold
  
  yPred <- rep(NA, nLine)
  
  for (noCV in 1:nFold) {
    print(paste0("Fold: ", noCV))
    yTrain <- phenoNoNA
    yTrain[idCV == noCV, ] <- NA   ### prepare test data
    
    EM3.resCV <- EM3.cpp(y = yTrain, X0 = NULL, ZETA = ZETANoNA)   ### prediction
    yTest <-  EM3.resCV$y.pred     ### predicted values
    
    yPred[idCV == noCV] <- yTest[idCV == noCV]
  }
  
  ### Plot the results
  plotRange <- range(phenoNoNA, yPred)
  plot(x = phenoNoNA, y = yPred,xlim = plotRange, ylim = plotRange,
       xlab = "Observed values", ylab = "Predicted values",
       main = "Results of Genomic Prediction (multi-kernel)",
       cex.lab = 1.5, cex.main = 1.5, cex.axis = 1.3)
  abline(a = 0, b = 1, col = 2, lwd = 2, lty = 2)
  R2 <- cor(x = phenoNoNA[, 1], y = yPred) ^ 2
  text(x = plotRange[2] - 10,
       y = plotRange[1] + 10,
       paste0("R2 = ", round(R2, 3)), 
       cex = 1.5)

Equation of mixed model for multi-kernel, optionally using other packages (much faster than EM3.cpp when other packages are used)

Description

This function solves the following multi-kernel linear mixed effects model using MMEst function in 'MM4LMM' package, lmm.aireml or lmm.diago functions in 'gaston' package, or EM3.cpp function in 'RAINBOWR' package.

y = X \beta + \sum _{l=1} ^ {L} Z _ {l} u _ {l} + \epsilon

where Var[y] = \sum _{l=1} ^ {L} Z _ {l} K _ {l} Z _ {l}' \sigma _ {l} ^ 2 + I \sigma _ {e} ^ {2}.

Usage

EM3.general(
  y,
  X0 = NULL,
  ZETA,
  eigen.G = NULL,
  package = "gaston",
  tol = NULL,
  n.core = 1,
  optimizer = "nlminb",
  REML = TRUE,
  pred = TRUE,
  return.u.always = TRUE,
  return.u.each = TRUE,
  return.Hinv = TRUE,
  recheck.RAINBOWR = TRUE,
  var.ratio.range = c(1e-09, 1e+07)
)

Arguments

y

An n \times 1 vector. A vector of phenotypic values should be used. NA is allowed.

X0

An n \times p matrix. You should assign the mean vector (rep(1, n)) and covariates. NA is not allowed.

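ZETA

A list of variance (kernel) matrices and their design matrices for the random effects. You can use more than one kernel matrix. For example, ZETA = list(A = list(Z = Z.A, K = K.A), D = list(Z = Z.D, K = K.D)) (A for additive, D for dominance). The elements of each inner list must be named "Z" and "K".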

eigen.G

A list with

$values: Eigen values
$vectors: Eigen vectors

package

Package name to be used in this function. We only offer the following three packages: "RAINBOWR", "MM4LMM" and "gaston". Default package is 'gaston'.

tol

n.core

Setting n.core > 1 will enable parallel execution on a machine with multiple cores. ('n.core' will be replaced by 1 when package = "gaston".)

optimizer

The function used in the optimization process. We offer "optim", "optimx", and "nlminb" functions. This argument is only valid when package = "RAINBOWR".

REML

You can choose which method you will use, "REML" or "ML". If REML = TRUE, you will perform "REML", and if REML = FALSE, you will perform "ML".

pred

If TRUE, the fitted values of y are returned.

return.u.always

When using the "gaston" package with missing values or using the "MM4LMM" package (with/without missings), computing BLUP will take some time in addition to solving the mixed-effects model. You can choose whether BLUP ('u'; u) will be returned or not.

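return.u.each

If TRUE, the function also computes each BLUP corresponding to different kernels (when solving multi-kernel mixed-effects model). It takes additional time compared to the one with 'return.u.each = FALSE'.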

return.Hinv

If TRUE, H ^ {-1} = (Var[y] / \sum _{l=1} ^ {L} \sigma _ {l} ^ 2) ^ {-1} will be computed. It also returns V ^ {-1} = (Var[y]) ^ {-1}. It will take some time in addition to solving the mixed-effects model when using packages other than 'RAINBOWR'.

recheck.RAINBOWR

If 'recheck.RAINBOWR = TRUE' and a package other than 'RAINBOWR' is used, the function solves the mixed-effects model again with the 'RAINBOWR' package whenever the ratio of variance components falls outside 'var.ratio.range'.

var.ratio.range

The acceptable range for the ratio of variance components, used to check whether the results from a package other than 'RAINBOWR' are correct when 'recheck.RAINBOWR = TRUE'.
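
The kind of check these two arguments describe might look like the following base R sketch. The helper needs_recheck is hypothetical and not part of the package, and which ratio the function actually inspects is not stated here, so the use of Vu / Ve is an assumption.

```r
# Hypothetical helper: TRUE when the variance ratio looks implausible
needs_recheck <- function(Vu, Ve, var.ratio.range = c(1e-09, 1e+07)) {
  ratio <- Vu / Ve   # assumption: genetic-to-residual variance ratio
  !is.finite(ratio) ||
    ratio < var.ratio.range[1] ||
    ratio > var.ratio.range[2]
}

needs_recheck(Vu = 40, Ve = 10)      # ratio 4 lies inside the default range
needs_recheck(Vu = 1e-12, Ve = 10)   # ratio 1e-13 lies below the lower bound
needs_recheck(Vu = 5, Ve = 0)        # infinite ratio is flagged as well
```

A degenerate ratio usually means one variance component collapsed to a boundary, which is the situation where refitting with a different solver is worth the extra time.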

Value

$y.pred: The fitted values of y, \hat{y} = X\beta + Zu
$Vu: Estimator for \sigma^2_u, all of the genetic variance
$Ve: Estimator for \sigma^2_e
$beta: BLUE(\beta)
$u: BLUP(Sum of Zu)
$u.each: BLUP(Each u)
$weights: The proportion of each genetic variance (corresponding to each kernel of ZETA) to Vu
$LL: Maximized log-likelihood (full or restricted, depending on method)
$Vinv: The inverse of V = Vu \times ZKZ' + Ve \times I
$Hinv: The inverse of H = ZKZ' + \lambda I

References

Kang, H.M. et al. (2008) Efficient Control of Population Structure in Model Organism Association Mapping. Genetics. 178(3): 1709-1723.

Zhou, X. and Stephens, M. (2012) Genome-wide efficient mixed-model analysis for association studies. Nat Genet. 44(7): 821-824.

Johnson, D. L., & Thompson, R. (1995). Restricted maximum likelihood estimation of variance components for univariate animal models using sparse matrix techniques and average information. Journal of dairy science, 78(2), 449-456.

Hunter, D. R., & Lange, K. (2004). A tutorial on MM algorithms. The American Statistician, 58(1), 30-37.

Zhou, H., Hu, L., Zhou, J., & Lange, K. (2015). MM algorithms for variance components models. arXiv preprint arXiv:1509.07426.

Gilmour, A. R., Thompson, R., & Cullis, B. R. (1995), Average information REML: an efficient algorithm for variance parameter estimation in linear mixed models, Biometrics, 1440-1450.

Examples





  ### Import RAINBOWR
  require(RAINBOWR)
  
  ### Load example datasets
  data("Rice_Zhao_etal")
  Rice_geno_score <- Rice_Zhao_etal$genoScore
  Rice_geno_map <- Rice_Zhao_etal$genoMap
  Rice_pheno <- Rice_Zhao_etal$pheno
  
  ### View each dataset
  See(Rice_geno_score)
  See(Rice_geno_map)
  See(Rice_pheno)
  
  ### Select one trait for example
  trait.name <- "Flowering.time.at.Arkansas"
  y <- as.matrix(Rice_pheno[, trait.name, drop = FALSE])
  
  ### Remove SNPs whose MAF <= 0.05
  x.0 <- t(Rice_geno_score)
  MAF.cut.res <- MAF.cut(x.0 = x.0, map.0 = Rice_geno_map)
  x <- MAF.cut.res$x
  map <- MAF.cut.res$map
  
  
  ### Estimate additive genomic relationship matrix (GRM) & epistatic relationship matrix
  K.A <- calcGRM(genoMat = x) 
  K.AA <- K.A * K.A   ### additive x additive epistatic effects
  
  
  ### Modify data
  Z <- design.Z(pheno.labels = rownames(y),
                geno.names = rownames(K.A))  ### design matrix for random effects
  pheno.mat <- y[rownames(Z), , drop = FALSE]
  ZETA <- list(A = list(Z = Z, K = K.A),
               AA = list(Z = Z, K = K.AA))
  
  
  ### Solve multi-kernel linear mixed effects model using gaston package (2 random effects)
  EM3.gaston.res <- EM3.general(y = pheno.mat, X0 = NULL, ZETA = ZETA,
                                package = "gaston", return.u.always = TRUE,
                                pred = TRUE, return.u.each = TRUE,
                                return.Hinv = TRUE)
  (Vu <- EM3.gaston.res$Vu)   ### estimated genetic variance
  (Ve <- EM3.gaston.res$Ve)   ### estimated residual variance
  (weights <- EM3.gaston.res$weights)   ### estimated proportion of two genetic variances
  (herit <- Vu * weights / (Vu + Ve))   ### genomic heritability (additive, additive x additive)
  
  (beta <- EM3.gaston.res$beta)   ### Here, this is an intercept.
  u.each <- EM3.gaston.res$u.each   ### estimated genotypic values (additive, additive x additive)
  See(u.each)
  
  
  ### Perform genomic prediction with 10-fold cross validation using gaston package (multi-kernel)
  noNA <- !is.na(c(pheno.mat))   ### NA (missing) in the phenotype data
  
  phenoNoNA <- pheno.mat[noNA, , drop = FALSE]   ### remove NA
  ZETANoNA <- ZETA
  ZETANoNA <- lapply(X = ZETANoNA, FUN = function (List) {
    List$Z <- List$Z[noNA, ]
    
    return(List)
  })   ### remove NA
  
  
  nFold <- 10    ### # of folds
  nLine <- nrow(phenoNoNA)
  idCV <- sample(1:nLine %% nFold)   ### assign random ids for cross-validation
  idCV[idCV == 0] <- nFold
  
  yPred <- rep(NA, nLine)
  
  for (noCV in 1:nFold) {
    print(paste0("Fold: ", noCV))
    yTrain <- phenoNoNA
    yTrain[idCV == noCV, ] <- NA   ### prepare test data
    
    EM3.gaston.resCV <- EM3.general(y = yTrain, X0 = NULL, ZETA = ZETANoNA,
                                    package = "gaston", return.u.always = TRUE,
                                    pred = TRUE, return.u.each = TRUE,
                                    return.Hinv = TRUE)   ### prediction
    yTest <-  EM3.gaston.resCV$y.pred     ### predicted values
    
    yPred[idCV == noCV] <- yTest[idCV == noCV]
  }
  
  ### Plot the results
  plotRange <- range(phenoNoNA, yPred)
  plot(x = phenoNoNA, y = yPred,xlim = plotRange, ylim = plotRange,
       xlab = "Observed values", ylab = "Predicted values",
       main = "Results of Genomic Prediction (multi-kernel)",
       cex.lab = 1.5, cex.main = 1.5, cex.axis = 1.3)
  abline(a = 0, b = 1, col = 2, lwd = 2, lty = 2)
  R2 <- cor(x = phenoNoNA[, 1], y = yPred) ^ 2
  text(x = plotRange[2] - 10,
       y = plotRange[1] + 10,
       paste0("R2 = ", round(R2, 3)), 
       cex = 1.5)

Equation of mixed model for multi-kernel (fast, for limited cases)

Description

This function solves multi-kernel mixed model using fastlmm.snpset approach (Lippert et al., 2014). This function can be used only when the kernels other than genomic relationship matrix are linear kernels.

Usage

EM3.linker.cpp(
  y0,
  X0 = NULL,
  ZETA = NULL,
  Zs0 = NULL,
  Ws0,
  Gammas0 = lapply(Ws0, function(x) diag(ncol(x))),
  gammas.diag = TRUE,
  X.fix = TRUE,
  eigen.SGS = NULL,
  eigen.G = NULL,
  n.core = 1,
  tol = NULL,
  bounds = c(1e-06, 1e+06),
  optimizer = "nlminb",
  traceInside = 0,
  n.thres = 450,
  spectral.method = NULL,
  REML = TRUE,
  pred = TRUE,
  return.u.always = TRUE,
  return.u.each = TRUE,
  return.Hinv = TRUE
)

Arguments

y0

An n \times 1 vector. A vector of phenotypic values should be used. NA is allowed.

X0

An n \times p matrix. You should assign the mean vector (rep(1, n)) and covariates. NA is not allowed.

ZETA

A list of variance (relationship) matrix (K; m \times m) and its design matrix (Z; n \times m) of random effects. You can use only one kernel matrix. For example, ZETA = list(A = list(Z = Z, K = K)) Please set names of list "Z" and "K"!

Zs0

A list of design matrices (Z; n \times m matrix) for Ws. For example, Zs0 = list(A.part = Z.A.part, D.part = Z.D.part)

Ws0

A list of low rank matrices (W; m \times k matrix). This forms linear kernel K = W \Gamma W'. For example, Ws0 = list(A.part = W.A, D.part = W.D)

Gammas0

A list of matrices for weighting SNPs (Gamma; k \times k matrix). This forms linear kernel K = W \Gamma W'. For example, if there is no weighting, Gammas0 = lapply(Ws0, function(x) diag(ncol(x)))

gammas.diag

If each Gamma is a diagonal matrix, please set this argument to TRUE. This saves calculation time.
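
The linear kernel K = W \Gamma W' and the saving available for a diagonal Gamma can be sketched in base R. The matrix W, the weights in gamma_diag, and the variable names are made up for illustration; the shortcut shown is a generic identity, not necessarily the computation the package performs.

```r
set.seed(3)
m <- 8   # number of genotypes
k <- 4   # number of SNPs in the set

# Toy marker scores coded as -1, 0, 1
W <- matrix(sample(c(-1, 0, 1), m * k, replace = TRUE), nrow = m, ncol = k)

# Diagonal weighting matrix Gamma (one weight per SNP)
gamma_diag <- c(1, 2, 0.5, 1)
Gamma <- diag(gamma_diag)

# General form: K = W Gamma W'
K_full <- W %*% Gamma %*% t(W)

# Diagonal case: scale each column of W by sqrt(weight), then take W W'
W_scaled <- W * rep(sqrt(gamma_diag), each = m)
K_fast <- tcrossprod(W_scaled)

all.equal(K_full, K_fast)
```

With a diagonal Gamma the k x k matrix product disappears entirely, which is presumably the kind of saving the gammas.diag flag allows.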

X.fix

If you call this function repeatedly and X0 stays fixed across the calls, please set this argument to TRUE.

eigen.SGS

A list with

$values: Eigen values
$vectors: Eigen vectors

eigen.G

A list with

$values: Eigen values
$vectors: Eigen vectors

n.core

Setting n.core > 1 will enable parallel execution on a machine with multiple cores.

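tol

The tolerance for detecting linear dependencies in the columns of G = ZKZ'. Eigen vectors whose eigen values are less than "tol" argument will be omitted from results. If tol is NULL, top 'n' eigen values will be effective.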

bounds

Lower and upper bounds for weights.

optimizer

The function used in the optimization process. We offer "optim", "optimx", and "nlminb" functions.

traceInside

Perform trace for the optimization if traceInside >= 1, and this argument shows the frequency of reports.

n.thres

If n >= n.thres, perform EMM1.cpp. Else perform EMM2.cpp.

spectral.method

The method of spectral decomposition. Two methods are offered: "eigen" (eigen decomposition) and "cholesky" (Cholesky and singular value decomposition). If this argument is NULL, one of the two methods will be chosen according to the dimensions of Z and X.

REML

You can choose which method you will use, "REML" or "ML". If REML = TRUE, you will perform "REML", and if REML = FALSE, you will perform "ML".

pred

If TRUE, the fitted values of y are returned.

return.u.always

If TRUE, BLUP ('u'; u) will be returned.

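return.u.each

If TRUE, the function also computes each BLUP corresponding to different kernels (when solving multi-kernel mixed-effects model). It takes additional time compared to the one with 'return.u.each = FALSE'.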

return.Hinv

If TRUE, H ^ {-1} = (Var[y] / \sum _{l=1} ^ {L} \sigma _ {l} ^ 2) ^ {-1} will be computed. It also returns V ^ {-1} = (Var[y]) ^ {-1}.

Value

$y.pred: The fitted values of y, \hat{y} = X\beta + Zu
$Vu: Estimator for \sigma^2_u, all of the genetic variance
$Ve: Estimator for \sigma^2_e
$beta: BLUE(\beta)
$u: BLUP(Sum of Zu)
$u.each: BLUP(Each u)
$weights: The proportion of each genetic variance (corresponding to each kernel of ZETA) to Vu
$LL: Maximized log-likelihood (full or restricted, depending on method)
$Vinv: The inverse of V = Vu \times ZKZ' + Ve \times I
$Hinv: The inverse of H = ZKZ' + \lambda I

References

Kang, H.M. et al. (2008) Efficient Control of Population Structure in Model Organism Association Mapping. Genetics. 178(3): 1709-1723.

Zhou, X. and Stephens, M. (2012) Genome-wide efficient mixed-model analysis for association studies. Nat Genet. 44(7): 821-824.

Lippert, C. et al. (2014) Greater power and computational efficiency for kernel-based association testing of sets of genetic variants. Bioinformatics. 30(22): 3206-3214.

Examples





  ### Import RAINBOWR
  require(RAINBOWR)
  
  ### Load example datasets
  data("Rice_Zhao_etal")
  Rice_geno_score <- Rice_Zhao_etal$genoScore
  Rice_geno_map <- Rice_Zhao_etal$genoMap
  Rice_pheno <- Rice_Zhao_etal$pheno
  
  ### View each dataset
  See(Rice_geno_score)
  See(Rice_geno_map)
  See(Rice_pheno)
  
  ### Select one trait for example
  trait.name <- "Flowering.time.at.Arkansas"
  y <- as.matrix(Rice_pheno[, trait.name, drop = FALSE])
  
  ### Remove SNPs whose MAF <= 0.05
  x.0 <- t(Rice_geno_score)
  MAF.cut.res <- MAF.cut(x.0 = x.0, map.0 = Rice_geno_map)
  x <- MAF.cut.res$x
  map <- MAF.cut.res$map
  
  
  ### Estimate additive genomic relationship matrix (GRM)
  K.A <- calcGRM(genoMat = x)
  
  
  ### Modify data
  Z <- design.Z(pheno.labels = rownames(y),
                geno.names = rownames(K.A))  ### design matrix for random effects
  pheno.mat <- y[rownames(Z), , drop = FALSE]
  ZETA <- list(A = list(Z = Z, K = K.A))
  
  
  ### Including the additional linear kernel for chromosome 12
  chrNo <- 12
  W.A <- x[, map$chr == chrNo]    ### marker genotype data of chromosome 12
  
  Zs0 <- list(A.part = Z)
  Ws0 <- list(A.part = W.A)       ### This will be regarded as linear kernel
  ### for the variance-covariance matrix of another random effects.
  
  
  ### Solve multi-kernel linear mixed effects model (2 random effects)
  EM3.linker.res <- EM3.linker.cpp(y0 = pheno.mat, X0 = NULL, ZETA = ZETA,
                                   Zs0 = Zs0, Ws0 = Ws0)
  (Vu <- EM3.linker.res$Vu)   ### estimated genetic variance
  (Ve <- EM3.linker.res$Ve)   ### estimated residual variance
  (weights <- EM3.linker.res$weights)   ### estimated proportion of two genetic variances
  (herit <- Vu * weights / (Vu + Ve))   ### genomic heritability (all chromosomes, chromosome 12)
  
  (beta <- EM3.linker.res$beta)   ### Here, this is an intercept.
  u.each <- EM3.linker.res$u.each   ### estimated genotypic values (all chromosomes, chromosome 12)
  See(u.each)

Equation of mixed model for multi-kernel using other packages (much faster than EM3.cpp)

Description

y = X \beta + \sum _{l=1} ^ {L} Z _ {l} u _ {l} + \epsilon

where Var[y] = \sum _{l=1} ^ {L} Z _ {l} K _ {l} Z _ {l}' \sigma _ {l} ^ 2 + I \sigma _ {e} ^ {2}.

Usage

EM3.op(
  y,
  X0 = NULL,
  ZETA,
  eigen.G = NULL,
  package = "gaston",
  tol = NULL,
  n.core = 1,
  REML = TRUE,
  pred = TRUE,
  return.u.always = TRUE,
  return.u.each = TRUE,
  return.Hinv = TRUE
)

Arguments

y

An n \times 1 vector. A vector of phenotypic values should be used. NA is allowed.

X0

An n \times p matrix. You should assign mean vector (rep(1, n)) and covariates. NA is not allowed.

ZETA

eigen.G

A list with

$values: Eigen values
$vectors: Eigen vectors

package

Package name to be used in this function. We only offer the following three packages: "RAINBOWR", "MM4LMM" and "gaston". Default package is 'gaston'.

tol

n.core

Setting n.core > 1 will enable parallel execution on a machine with multiple cores (only for 'MM4LMM').

REML

You can choose which method you will use, "REML" or "ML". If REML = TRUE, you will perform "REML", and if REML = FALSE, you will perform "ML".

pred

If TRUE, the fitted values of y are returned.

return.u.always

return.u.each

return.Hinv

Value

$y.pred: The fitted values of y, y = X\beta + Zu
$Vu: Estimator for \sigma^2_u, all of the genetic variance
$Ve: Estimator for \sigma^2_e
$beta: BLUE(\beta)
$u: BLUP(Sum of Zu)
$u.each: BLUP(Each u)
$weights: The proportion of each genetic variance (corresponding to each kernel of ZETA) to Vu
$LL: Maximized log-likelihood (full or restricted, depending on method)
$Vinv: The inverse of V = Vu \times ZKZ' + Ve \times I
$Hinv: The inverse of H = ZKZ' + \lambda I
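
The relationship between $Vu, $weights, $Ve, and V described above can be illustrated with a small base R sketch. The kernels and the variance estimates below are made-up toy values (not output of EM3.op), and reading ZKZ' as the weighted sum of the Z_l K_l Z_l' terms is an interpretation of the model equation, not a statement about the internal implementation.

```r
## Toy illustration only: the numbers below are made up, not EM3.op output.
set.seed(1)
n <- 6
M1 <- matrix(rnorm(n * 20), n, 20)
M2 <- matrix(rnorm(n * 8), n, 8)
K1 <- tcrossprod(M1) / ncol(M1)   # hypothetical kernel 1
K2 <- tcrossprod(M2) / ncol(M2)   # hypothetical kernel 2
Z <- diag(n)                      # identity design matrix for both kernels

Vu <- 2.0                         # hypothetical total genetic variance
Ve <- 1.0                         # hypothetical residual variance
weights <- c(0.7, 0.3)            # proportion of Vu assigned to each kernel

sigma2.each <- Vu * weights       # per-kernel variance components
ZKZt <- weights[1] * Z %*% K1 %*% t(Z) + weights[2] * Z %*% K2 %*% t(Z)
V <- Vu * ZKZt + Ve * diag(n)     # V = Vu * ZKZ' + Ve * I
Vinv <- solve(V)

## Heritability attributed to each kernel, as in the EM3.linker.cpp example
herit.each <- sigma2.each / (Vu + Ve)
```

Because the weights sum to one, the per-kernel components add up to Vu, and the per-kernel heritabilities add up to Vu / (Vu + Ve).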

References

Kang, H.M. et al. (2008) Efficient Control of Population Structure in Model Organism Association Mapping. Genetics. 178(3): 1709-1723.

Zhou, X. and Stephens, M. (2012) Genome-wide efficient mixed-model analysis for association studies. Nat Genet. 44(7): 821-824.

Hunter, D. R., & Lange, K. (2004). A tutorial on MM algorithms. The American Statistician, 58(1), 30-37.

Zhou, H., Hu, L., Zhou, J., & Lange, K. (2015). MM algorithms for variance components models. arXiv preprint arXiv:1509.07426.

Gilmour, A. R., Thompson, R., & Cullis, B. R. (1995), Average information REML: an efficient algorithm for variance parameter estimation in linear mixed models, Biometrics, 1440-1450.

Equation of mixed model for one kernel, a wrapper of two methods

Description

This function estimates maximum-likelihood (ML) or restricted maximum-likelihood (REML) solutions for the following mixed model.

y = X \beta + Z u + \epsilon

where \beta is a vector of fixed effects and u is a vector of random effects with Var[u] = K \sigma^2_u. The residual variance is Var[\epsilon] = I \sigma^2_e.

Usage

EMM.cpp(
  y,
  X = NULL,
  ZETA,
  eigen.G = NULL,
  eigen.SGS = NULL,
  n.thres = 450,
  reestimation = FALSE,
  n.core = NA,
  lam.len = 4,
  init.range = c(1e-06, 100),
  init.one = 0.5,
  conv.param = 1e-06,
  count.max = 20,
  bounds = c(1e-06, 1e+06),
  tol = NULL,
  optimizer = "nlminb",
  traceInside = 0,
  REML = TRUE,
  silent = TRUE,
  plot.l = FALSE,
  SE = FALSE,
  return.Hinv = TRUE
)

Arguments

y

An n \times 1 vector. A vector of phenotypic values should be used. NA is allowed.

X

An n \times p matrix. You should assign mean vector (rep(1, n)) and covariates. NA is not allowed.

ZETA

eigen.G

A list with

$values: Eigen values
$vectors: Eigen vectors

eigen.SGS

A list with

$values: Eigen values
$vectors: Eigen vectors

n.thres

If n >= n.thres, perform EMM1.cpp. Else perform EMM2.cpp.

reestimation

If TRUE, EMM2.cpp is performed when the estimation by EMM1.cpp may not be accurate.

n.core

Setting n.core > 1 will enable parallel execution on a machine with multiple cores.

lam.len

The number of initial values you set. If this number is large, the estimation will be more accurate, but computational cost will be large. We recommend setting this value 3 <= lam.len <= 6.

init.range

The range of the initial parameters. For example, if lam.len = 5 and init.range = c(1e-06, 1e02), corresponding initial heritabilities will be calculated as seq(1e-06, 1 - 1e-02, length = 5), and then initial lambdas will be set.
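
As a rough sketch of the grid described above, the initial heritabilities can be generated with seq() and converted to initial lambdas. Two points are assumptions rather than documented behaviour: that the upper heritability is obtained as 1 - 1 / init.range[2], and that the conversion uses lambda = (1 - h2) / h2, which follows from h2 = Vu / (Vu + Ve) and lambda = Ve / Vu.

```r
lam.len <- 5
init.range <- c(1e-06, 1e02)

## Initial heritabilities, matching seq(1e-06, 1 - 1e-02, length = 5)
h2.init <- seq(init.range[1], 1 - 1 / init.range[2], length = lam.len)

## Assumed conversion: with h2 = Vu / (Vu + Ve) and lambda = Ve / Vu,
## lambda = (1 - h2) / h2
lambda.init <- (1 - h2.init) / h2.init

data.frame(h2 = h2.init, lambda = lambda.init)
```

Small initial heritabilities correspond to very large lambdas, so the grid covers several orders of magnitude of lambda.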

init.one

The initial parameter if lam.len = 1.

conv.param

The convergence parameter. If the difference in log-likelihood after updating the parameter "lambda" is smaller than conv.param, the iteration stops.

count.max

Sometimes the algorithm does not converge for some initial parameters. If the number of iteration steps reaches this argument, the calculation is stopped even if the algorithm has not converged.

bounds

Lower and Upper bounds of the parameter lambda. If the updated parameter goes out of this range, the parameter is reset to the value in this range.
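
How conv.param, count.max, and bounds interact can be sketched with a toy iteration in base R. Everything here is hypothetical: toy.LL stands in for the log-likelihood, the damped update rule is illustrative only, and clamping to the nearest bound is an assumption about what "reset to the value in this range" means.

```r
## Toy objective standing in for the log-likelihood (maximum at lambda = exp(1))
toy.LL <- function(lambda) -(log(lambda) - 1)^2

iterate.lambda <- function(lambda.init, conv.param = 1e-06, count.max = 20,
                           bounds = c(1e-06, 1e+06)) {
  lambda <- lambda.init
  LL.old <- toy.LL(lambda)
  count <- 0
  converged <- FALSE
  while (count < count.max) {
    count <- count + 1
    ## damped update on the log scale (illustrative update rule only)
    lambda.new <- exp(log(lambda) + 0.5 * (1 - log(lambda)))
    ## assumed behaviour: clamp the parameter to the nearest bound
    lambda <- min(max(lambda.new, bounds[1]), bounds[2])
    LL.new <- toy.LL(lambda)
    ## stop when the log-likelihood change falls below conv.param
    if (abs(LL.new - LL.old) < conv.param) {
      converged <- TRUE
      break
    }
    LL.old <- LL.new
  }
  list(lambda = lambda, count = count, converged = converged)
}

res <- iterate.lambda(lambda.init = 100)                       # converges
res.short <- iterate.lambda(lambda.init = 100, count.max = 3)  # stopped early
```

With the default settings the toy iteration converges before count.max is reached, while count.max = 3 stops it early with converged = FALSE.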

tol

optimizer

The function used in the optimization process. We offer "optim", "optimx", and "nlminb" functions.

traceInside

Perform trace for the optimization if traceInside >= 1, and this argument shows the frequency of reports.

REML

You can choose which method you will use, "REML" or "ML". If REML = TRUE, you will perform "REML", and if REML = FALSE, you will perform "ML".

silent

If this argument is TRUE, warning messages will be shown when estimation is not accurate.

plot.l

If you want to plot log-likelihood, please set plot.l = TRUE. We don't recommend plot.l = TRUE when lam.len >= 2.

SE

If TRUE, standard errors are calculated.

return.Hinv

If TRUE, the function returns the inverse of H = ZKZ' + \lambda I where \lambda = \sigma^2_e / \sigma^2_u. This is useful for GWAS.

Value

$Vu: Estimator for \sigma^2_u
$Ve: Estimator for \sigma^2_e
$beta: BLUE(\beta)
$u: BLUP(u)
$LL: Maximized log-likelihood (full or restricted, depending on method)
$beta.SE: Standard error for \beta (If SE = TRUE)
$u.SE: Standard error for u^*-u (If SE = TRUE)
$Hinv: The inverse of H = ZKZ' + \lambda I (If return.Hinv = TRUE)
$Hinv2: The inverse of H2 = ZKZ'/\lambda + I (If return.Hinv = TRUE)
$lambda: Estimators for \lambda = \sigma^2_e / \sigma^2_u (If n >= n.thres)
$lambdas: Lambdas for each initial values (If n >= n.thres)
$reest: If parameter estimation may not be accurate, reest = 1, else reest = 0 (If n >= n.thres)
$counts: The number of iterations until convergence for each initial values (If n >= n.thres)
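
The matrices named in this list are linked by simple identities, which the following base R sketch checks on toy data. Vu, Ve, and K are made-up values (not output of EMM.cpp), and Z is taken to be the identity matrix.

```r
## Toy illustration only: Vu and Ve are made-up values, not EMM.cpp output.
set.seed(2)
n <- 5
M <- matrix(rnorm(n * 30), n, 30)
K <- tcrossprod(M) / ncol(M)      # hypothetical relationship matrix
Z <- diag(n)

Vu <- 1.5
Ve <- 0.5
lambda <- Ve / Vu                 # lambda = sigma^2_e / sigma^2_u

ZKZt <- Z %*% K %*% t(Z)
H  <- ZKZt + lambda * diag(n)     # H  = ZKZ' + lambda I
H2 <- ZKZt / lambda + diag(n)     # H2 = ZKZ' / lambda + I
Hinv  <- solve(H)
Hinv2 <- solve(H2)

V <- Vu * ZKZt + Ve * diag(n)     # Var[y] = Vu * ZKZ' + Ve * I
Vinv <- Hinv / Vu                 # because V = Vu * H
```

Since H2 = H / lambda, Hinv2 equals lambda * Hinv, so either inverse can be recovered from the other once lambda is known.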

References

Kang, H.M. et al. (2008) Efficient Control of Population Structure in Model Organism Association Mapping. Genetics. 178(3): 1709-1723.

Zhou, X. and Stephens, M. (2012) Genome-wide efficient mixed-model analysis for association studies. Nat Genet. 44(7): 821-824.

Examples



### Perform genomic prediction with 10-fold cross validation

  ### Import RAINBOWR
  require(RAINBOWR)

  ### Load example datasets
  data("Rice_Zhao_etal")
  Rice_geno_score <- Rice_Zhao_etal$genoScore
  Rice_geno_map <- Rice_Zhao_etal$genoMap
  Rice_pheno <- Rice_Zhao_etal$pheno

  ### View each dataset
  See(Rice_geno_score)
  See(Rice_geno_map)
  See(Rice_pheno)

  ### Select one trait for example
  trait.name <- "Flowering.time.at.Arkansas"
  y <- as.matrix(Rice_pheno[, trait.name, drop = FALSE])

  ### Remove SNPs whose MAF <= 0.05
  x.0 <- t(Rice_geno_score)
  MAF.cut.res <- MAF.cut(x.0 = x.0, map.0 = Rice_geno_map)
  x <- MAF.cut.res$x
  map <- MAF.cut.res$map

  ### Estimate genomic relationship matrix (GRM)
  K.A <- calcGRM(genoMat = x)

  ### Modify data
  modify.res <- modify.data(pheno.mat = y, geno.mat = x, return.ZETA = TRUE)
  pheno.mat <- modify.res$pheno.modi
  ZETA <- modify.res$ZETA


  ### Solve linear mixed effects model
  EMM.res <- EMM.cpp(y = pheno.mat, X = NULL, ZETA = ZETA)
  (Vu <- EMM.res$Vu)   ### estimated genetic variance
  (Ve <- EMM.res$Ve)   ### estimated residual variance
  (herit <- Vu / (Vu + Ve))   ### genomic heritability

  (beta <- EMM.res$beta)   ### Here, this is an intercept.
  u <- EMM.res$u   ### estimated genotypic values
  See(u)

  ### Estimate marker effects from estimated genotypic values
  x.modi <- modify.res$geno.modi
  WMat <- calcGRM(genoMat = x.modi, methodGRM = "addNOIA",
                  returnWMat = TRUE)
  K.A <- ZETA$A$K
  if (min(eigen(K.A)$values) < 1e-08) {
    diag(K.A) <- diag(K.A) + 1e-06
  }

  mrkEffectsForW <- crossprod(x = WMat,
                              y = solve(K.A)) %*% as.matrix(u)
  mrkEffects <- mrkEffectsForW / mean(scale(x.modi %*% mrkEffectsForW, scale = FALSE) / u)




  #### Cross-validation for genomic prediction
  noNA <- !is.na(c(pheno.mat))   ### NA (missing) in the phenotype data

  phenoNoNA <- pheno.mat[noNA, , drop = FALSE]   ### remove NA
  ZETANoNA <- ZETA
  ZETANoNA$A$Z <- ZETA$A$Z[noNA, ]   ### remove NA


  nFold <- 10    ### # of folds
  nLine <- nrow(phenoNoNA)
  idCV <- sample(1:nLine %% nFold)   ### assign random ids for cross-validation
  idCV[idCV == 0] <- nFold

  yPred <- rep(NA, nLine)

  for (noCV in 1:nFold) {
    yTrain <- phenoNoNA
    yTrain[idCV == noCV, ] <- NA   ### prepare test data

    EMM.resCV <- EMM.cpp(y = yTrain, X = NULL, ZETA = ZETANoNA)   ### prediction
    yTest <-  EMM.resCV$beta + EMM.resCV$u   ### predicted values

    yPred[idCV == noCV] <- (yTest[noNA])[idCV == noCV]
  }

  ### Plot the results
  plotRange <- range(phenoNoNA, yPred)
  plot(x = phenoNoNA, y = yPred, xlim = plotRange, ylim = plotRange,
       xlab = "Observed values", ylab = "Predicted values",
       main = "Results of Genomic Prediction",
       cex.lab = 1.5, cex.main = 1.5, cex.axis = 1.3)
  abline(a = 0, b = 1, col = 2, lwd = 2, lty = 2)
  R2 <- cor(x = phenoNoNA[, 1], y = yPred) ^ 2
  text(x = plotRange[2] - 10,
       y = plotRange[1] + 10,
       paste0("R2 = ", round(R2, 3)),
       cex = 1.5)

Equation of mixed model for one kernel, GEMMA-based method (implemented by Rcpp)

Description

This function solves the single-kernel linear mixed effects model by the GEMMA (genome-wide efficient mixed model association; Zhou and Stephens, 2012) approach.

Usage

EMM1.cpp(
  y,
  X = NULL,
  ZETA,
  eigen.G = NULL,
  n.core = NA,
  lam.len = 4,
  init.range = c(1e-04, 100),
  init.one = 0.5,
  conv.param = 1e-06,
  count.max = 15,
  bounds = c(1e-06, 1e+06),
  tol = NULL,
  REML = TRUE,
  silent = TRUE,
  plot.l = FALSE,
  SE = FALSE,
  return.Hinv = TRUE
)

Arguments

y

An n \times 1 vector. A vector of phenotypic values should be used. NA is allowed.

X

An n \times p matrix. You should assign mean vector (rep(1, n)) and covariates. NA is not allowed.

ZETA

eigen.G

A list with

$values: Eigen values
$vectors: Eigen vectors

n.core

Setting n.core > 1 will enable parallel execution on a machine with multiple cores.

lam.len

The number of initial values you set. If this number is large, the estimation will be more accurate, but computational cost will be large. We recommend setting this value 3 <= lam.len <= 6.

init.range

init.one

The initial parameter if lam.len = 1.

conv.param

The convergence parameter. If the difference in log-likelihood after updating the parameter "lambda" is smaller than conv.param, the iteration stops.

count.max

Sometimes the algorithm does not converge for some initial parameters. If the number of iteration steps reaches this argument, the calculation is stopped even if the algorithm has not converged.

bounds

Lower and Upper bounds of the parameter 1 / lambda. If the updated parameter goes out of this range, the parameter is reset to the value in this range.

tol

REML

You can choose which method you will use, "REML" or "ML". If REML = TRUE, you will perform "REML", and if REML = FALSE, you will perform "ML".

silent

If this argument is TRUE, warning messages will be shown when estimation is not accurate.

plot.l

If you want to plot log-likelihood, please set plot.l = TRUE. We don't recommend plot.l = TRUE when lam.len >= 2.

SE

If TRUE, standard errors are calculated.

return.Hinv

If TRUE, the function returns the inverse of H = ZKZ' + \lambda I where \lambda = \sigma^2_e / \sigma^2_u. This is useful for GWAS.

Value

$Vu: Estimator for \sigma^2_u
$Ve: Estimator for \sigma^2_e
$beta: BLUE(\beta)
$u: BLUP(u)
$LL: Maximized log-likelihood (full or restricted, depending on method)
$beta.SE: Standard error for \beta (If SE = TRUE)
$u.SE: Standard error for u^*-u (If SE = TRUE)
$Hinv: The inverse of H = ZKZ' + \lambda I (If return.Hinv = TRUE)
$Hinv2: The inverse of H2 = ZKZ'/\lambda + I (If return.Hinv = TRUE)
$lambda: Estimators for \lambda = \sigma^2_e / \sigma^2_u
$lambdas: Lambdas for each initial values
$reest: If parameter estimation may not be accurate, reest = 1, else reest = 0
$counts: The number of iterations until convergence for each initial values

References

Kang, H.M. et al. (2008) Efficient Control of Population Structure in Model Organism Association Mapping. Genetics. 178(3): 1709-1723.

Zhou, X. and Stephens, M. (2012) Genome-wide efficient mixed-model analysis for association studies. Nat Genet. 44(7): 821-824.

Equation of mixed model for one kernel, EMMA-based method (implemented by Rcpp)

Description

This function solves the single-kernel linear mixed model by the EMMA (efficient mixed model association; Kang et al., 2008) approach.

Usage

EMM2.cpp(
  y,
  X = NULL,
  ZETA,
  eigen.G = NULL,
  eigen.SGS = NULL,
  tol = NULL,
  optimizer = "nlminb",
  traceInside = 0,
  REML = TRUE,
  bounds = c(1e-09, 1e+09),
  SE = FALSE,
  return.Hinv = FALSE
)

Arguments

y

An n \times 1 vector. A vector of phenotypic values should be used. NA is allowed.

X

An n \times p matrix. You should assign mean vector (rep(1, n)) and covariates. NA is not allowed.

ZETA

eigen.G

A list with

$values: Eigen values
$vectors: Eigen vectors

eigen.SGS

A list with

$values: Eigen values
$vectors: Eigen vectors

tol

optimizer

The function used in the optimization process. We offer "optim", "optimx", and "nlminb" functions.

traceInside

Perform trace for the optimization if traceInside >= 1, and this argument shows the frequency of reports.

REML

You can choose which method you will use, "REML" or "ML". If REML = TRUE, you will perform "REML", and if REML = FALSE, you will perform "ML".

bounds

Lower and Upper bounds of the parameter lambda. If the updated parameter goes out of this range, the parameter is reset to the value in this range.

SE

If TRUE, standard errors are calculated.

return.Hinv

If TRUE, the function returns the inverse of H = ZKZ' + \lambda I where \lambda = \sigma^2_e / \sigma^2_u. This is useful for GWAS.

Value

$Vu: Estimator for \sigma^2_u
$Ve: Estimator for \sigma^2_e
$beta: BLUE(\beta)
$u: BLUP(u)
$LL: Maximized log-likelihood (full or restricted, depending on method)
$beta.SE: Standard error for \beta (If SE = TRUE)
$u.SE: Standard error for u^*-u (If SE = TRUE)
$Hinv: The inverse of H = ZKZ' + \lambda I (If return.Hinv = TRUE)

References

Kang, H.M. et al. (2008) Efficient Control of Population Structure in Model Organism Association Mapping. Genetics. 178(3): 1709-1723.

Function to remove the minor alleles

Description

Function to remove the minor alleles

Usage

MAF.cut(
  x.0,
  map.0 = NULL,
  min.MAF = 0.05,
  max.HE = 0.999,
  max.MS = 0.05,
  return.MAF = FALSE
)

Arguments

x.0

An n \times m original marker genotype matrix.

map.0

Data frame with the marker names in the first column. The second and third columns contain the chromosome and map position.

min.MAF

Specifies the minimum minor allele frequency (MAF). If a marker has a MAF less than min.MAF, it is removed from the original marker genotype data.

max.HE

Specifies the maximum heterozygous rate (HE). If a marker has a HE more than max.HE, it is removed from the original marker genotype data.

max.MS

Specifies the maximum missing rate (MS). If a marker has a MS more than max.MS, it is removed from the original marker genotype data.

return.MAF

If TRUE, MAF will be returned.

Value

$x: The modified marker genotype data whose SNPs with MAF <= min.MAF were removed.
$map: The modified map information whose SNPs with MAF <= min.MAF were removed.
$before: Minor allele frequencies of the original marker genotype.
$after: Minor allele frequencies of the modified marker genotype.
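
The three filters (min.MAF, max.HE, max.MS) can be mimicked in base R to see what each criterion measures. This is only an illustrative sketch, not the code of MAF.cut: it assumes the -1 / 0 / 1 = aa / Aa / AA coding with NA for missing scores, and the toy matrix is made up.

```r
## Toy genotype matrix (n lines x m markers), coded -1 / 0 / 1; NA = missing
x.toy <- cbind(
  mrk1 = c(-1, -1, 1, 1, -1, 1, 1, -1),     # MAF = 0.5
  mrk2 = c(1, 1, 1, 1, 1, 1, 1, 1),         # monomorphic, MAF = 0
  mrk3 = c(-1, 1, NA, NA, 1, 1, -1, 1),     # missing rate = 0.25
  mrk4 = c(0, 0, 0, 0, 0, 0, 0, 0)          # all heterozygous
)

freq <- (colMeans(x.toy, na.rm = TRUE) + 1) / 2   # frequency of allele "A"
MAF <- pmin(freq, 1 - freq)                       # minor allele frequency
HE <- colMeans(x.toy == 0, na.rm = TRUE)          # heterozygous rate
MS <- colMeans(is.na(x.toy))                      # missing rate

## Keep markers that pass all three thresholds (defaults from the Usage above)
keep <- MAF >= 0.05 & HE <= 0.999 & MS <= 0.05
x.filtered <- x.toy[, keep, drop = FALSE]
```

In this toy example only mrk1 survives: mrk2 fails the MAF filter, mrk3 the missing-rate filter, and mrk4 the heterozygosity filter.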

Check epistatic effects by kernel-based GWAS (genome-wide association studies)

Description

Check epistatic effects by kernel-based GWAS (genome-wide association studies)

Usage

RGWAS.epistasis(
  pheno,
  geno,
  ZETA = NULL,
  package.MM = "gaston",
  covariate = NULL,
  covariate.factor = NULL,
  structure.matrix = NULL,
  n.PC = 0,
  min.MAF = 0.02,
  n.core = 1,
  parallel.method = "mclapply",
  test.method = "LR",
  dominance.eff = TRUE,
  skip.self.int = FALSE,
  haplotype = TRUE,
  num.hap = NULL,
  window.size.half = 5,
  window.slide = 1,
  chi0.mixture = 0.5,
  optimizer = "nlminb",
  gene.set = NULL,
  map.gene.set = NULL,
  plot.epi.3d = TRUE,
  plot.epi.2d = TRUE,
  main.epi.3d = NULL,
  main.epi.2d = NULL,
  saveName = NULL,
  skip.check = FALSE,
  verbose = TRUE,
  verbose2 = FALSE,
  count = TRUE,
  time = TRUE
)

Arguments

pheno

Data frame where the first column is the line name (gid). The remaining columns should be a phenotype to test.

geno

Data frame with the marker names in the first column. The second and third columns contain the chromosome and map position. Columns 4 and higher contain the marker scores for each line, coded as [-1, 0, 1] = [aa, Aa, AA].

ZETA

A list of covariance (relationship) matrix (K: m \times m) and its design matrix (Z: n \times m) of random effects. Please set names of list "Z" and "K"! You can use more than one kernel matrix. For example,

ZETA = list(A = list(Z = Z.A, K = K.A), D = list(Z = Z.D, K = K.D))

Z.A, Z.D: Design matrix (n \times m) for the random effects. So, in many cases, you can use the identity matrix.
K.A, K.D: Different kernels which express some relationships between lines.

For example, K.A is additive relationship matrix for the covariance between lines, and K.D is dominance relationship matrix.
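
A minimal sketch of the expected list structure, built with base R only. The kernels here are toy stand-ins computed with tcrossprod (in practice they would come from a function such as calcGRM), and the identity design matrices assume one phenotypic record per line.

```r
set.seed(3)
n <- 4
line.names <- paste0("line", 1:n)
geno.toy <- matrix(sample(c(-1, 0, 1), n * 10, replace = TRUE), n, 10,
                   dimnames = list(line.names, paste0("mrk", 1:10)))

K.A <- tcrossprod(geno.toy) / ncol(geno.toy)            # toy additive kernel
K.D <- tcrossprod(1 - abs(geno.toy)) / ncol(geno.toy)   # toy dominance-like kernel

Z.A <- diag(n)                                          # one record per line
dimnames(Z.A) <- list(line.names, line.names)
Z.D <- Z.A

## Each element must be a list whose components are named "Z" and "K"
ZETA <- list(A = list(Z = Z.A, K = K.A),
             D = list(Z = Z.D, K = K.D))
str(ZETA, max.level = 2)
```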

package.MM

The package name to be used when solving mixed-effects model. We only offer the following three packages: "RAINBOWR", "MM4LMM" and "gaston". Default package is 'gaston'. See more details at EM3.general.

covariate

An n \times 1 vector or an n \times p _ 1 matrix. You can insert continuous values, such as other traits or genotype scores for special markers. This argument is regarded as one of the fixed effects.

covariate.factor

An n \times p _ 2 data frame. You should assign a factor vector for each column. Then RGWAS changes this argument into a model matrix, and this model matrix will be included in the model as fixed effects.
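
What "changing a factor data frame into a model matrix" means can be seen with base R's model.matrix. The factor names and levels below are hypothetical, and dropping the intercept column is an assumption; the exact expansion used inside RGWAS is not documented here.

```r
covariate.factor <- data.frame(
  subpop = factor(c("indica", "japonica", "indica", "aus", "japonica")),
  year   = factor(c("2019", "2019", "2020", "2020", "2020"))
)

## Expand the factors into dummy-coded columns; drop the intercept column,
## assuming the intercept is already supplied elsewhere in the model.
X.factor <- model.matrix(~ ., data = covariate.factor)[, -1, drop = FALSE]
X.factor
```

Each factor with k levels contributes k - 1 indicator columns, so the two factors above yield three fixed-effect columns.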

structure.matrix

You can use a structure matrix calculated by structure analysis when there is population structure. You should not use this argument with n.PC > 0.

n.PC

Number of principal components to include as fixed effects. Default is 0 (equals K model).
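
One common way to obtain such principal components is an eigen decomposition of a relationship matrix. The sketch below assumes that approach; how RGWAS derives the components internally is not stated in this section, and the toy marker matrix is made up.

```r
set.seed(4)
n <- 8
M <- matrix(sample(c(-1, 0, 1), n * 50, replace = TRUE), n, 50)
K <- tcrossprod(scale(M, scale = FALSE)) / ncol(M)   # toy relationship matrix

n.PC <- 2
eig <- eigen(K, symmetric = TRUE)
PC <- eig$vectors[, seq_len(n.PC), drop = FALSE]     # leading eigenvectors
X <- cbind(intercept = 1, PC)                        # fixed-effect design matrix

## Share of the total variation captured by the retained components
var.explained <- eig$values[seq_len(n.PC)] / sum(eig$values)
```

With n.PC = 0 no such columns are added, which corresponds to the K model mentioned above.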

min.MAF

Specifies the minimum minor allele frequency (MAF). If a marker has a MAF less than min.MAF, it is assigned a zero score.

n.core

Setting n.core > 1 will enable parallel execution on a machine with multiple cores. This argument is not valid when 'parallel.method = "furrr"'.

parallel.method

Method for parallel computation. We offer three methods, "mclapply", "furrr", and "foreach".

When 'parallel.method = "mclapply"', we utilize pbmclapply function in the 'pbmcapply' package with 'count = TRUE' and mclapply function in the 'parallel' package with 'count = FALSE'.

When 'parallel.method = "furrr"', we utilize future_map function in the 'furrr' package. With 'count = TRUE', we also utilize progressor function in the 'progressr' package to show the progress bar, so please install the 'progressr' package from github (https://github.com/futureverse/progressr). For 'parallel.method = "furrr"', you can perform multi-thread parallelization by sharing memories, which results in saving your memory, but quite slower compared to 'parallel.method = "mclapply"'.

When 'parallel.method = "foreach"', we utilize foreach function in the 'foreach' package with the utilization of makeCluster function in 'parallel' package, and registerDoParallel function in 'doParallel' package. With 'count = TRUE', we also utilize setTxtProgressBar and txtProgressBar functions in the 'utils' package to show the progress bar.

We recommend that you use the option 'parallel.method = "mclapply"', but this parallelization method is not supported on Windows. So, if you are a Windows user, we recommend that you use the option 'parallel.method = "foreach"'.
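
The recommendation above can be turned into a small platform check. This is a convenience sketch in base R; the variable names simply mirror the arguments of this function, and n.core = 2 is an arbitrary choice.

```r
## Fork-based mclapply is not available on Windows, so fall back to foreach
parallel.method <- if (.Platform$OS.type == "windows") "foreach" else "mclapply"
n.core <- 2   # hypothetical number of cores
parallel.method
```

The resulting values can then be passed as the parallel.method and n.core arguments.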

test.method

RGWAS supports two methods to test effects of each SNP-set.

"LR": Likelihood-ratio test, relatively slow, but accurate (default).
"score": Score test, much faster than LR, but sometimes overestimates -log10(p).

dominance.eff

If this argument is TRUE, dominance effect is included in the model, and additive x dominance and dominance x dominance are also tested as epistatic effects. When you use inbred lines, please set this argument FALSE.

skip.self.int

As default, the function also tests the self-interactions among the same SNP-sets. If you want to avoid this, please set 'skip.self.int = TRUE'.

haplotype

If the number of lines of your data is large (maybe > 100), you should set haplotype = TRUE. When haplotype = TRUE, haplotype-based kernel will be used for calculating -log10(p). (So the dimension of this gram matrix will be smaller.) The result won't be changed, but the time for the calculation will be shorter.

num.hap

When haplotype = TRUE, you can set the number of haplotypes which you expect. Similar arrays are then treated as the same haplotype, and the kernel (K.SNP), whose dimension is num.hap x num.hap, is constructed. When num.hap = NULL (default), num.hap will be set as the maximum number which reflects the difference between lines.

window.size.half

This argument decides how many SNPs (around the SNP you want to test) are used to calculate K.SNP. More precisely, the number of SNPs will be 2 * window.size.half + 1.

window.slide

This argument determines how often you test markers. If window.slide = 1, every marker will be tested. If you want to perform SNP set by bins, please set window.slide = 2 * window.size.half + 1.
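
The interplay of window.size.half and window.slide can be sketched by listing the marker indices in each window. The choice of centers (starting at the first marker) and the clipping at chromosome ends are assumptions made for illustration, not a description of the internal indexing.

```r
n.marker <- 25                             # hypothetical number of markers
window.size.half <- 5
window.slide <- 2 * window.size.half + 1   # bin-style SNP-sets

## Tested positions: every window.slide-th marker
centers <- seq(1, n.marker, by = window.slide)

windows <- lapply(centers, function(i) {
  ## clip the window at the chromosome ends
  max(1, i - window.size.half):min(n.marker, i + window.size.half)
})
names(windows) <- paste0("center", centers)
lengths(windows)
```

With window.slide = 2 * window.size.half + 1 the windows tile the markers without overlap, whereas window.slide = 1 would give one heavily overlapping window per marker.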

chi0.mixture

RAINBOWR assumes that the deviance follows the mixture distribution a x chisq(df = 0) + (1 - a) x chisq(df = r), where r is the degree of freedom. The argument chi0.mixture is a (0 <= a < 1), and the default is 0.5.
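
Under this mixture, a p-value for an observed deviance can be sketched in base R as follows. The function name mixture.pvalue is hypothetical, and the code only illustrates the stated distributional assumption; it is not taken from the package source.

```r
## p-value under the mixture a * chisq(df = 0) + (1 - a) * chisq(df = r)
mixture.pvalue <- function(deviance, r = 1, chi0.mixture = 0.5) {
  ## chisq(df = 0) is a point mass at zero, so it contributes nothing
  ## to the upper tail when deviance > 0
  p <- (1 - chi0.mixture) * pchisq(deviance, df = r, lower.tail = FALSE)
  p[deviance <= 0] <- 1
  p
}

deviance <- c(0, 3.84, 10.83)
p <- mixture.pvalue(deviance, r = 1)
-log10(p)
```

With chi0.mixture = 0.5 and deviance > 0, the p-value is half of the ordinary chi-square tail probability, so a larger chi0.mixture yields larger -log10(p) for the same deviance.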

optimizer

The function used in the optimization process. We offer "optim", "optimx", and "nlminb" functions.

gene.set

If you have information of gene (or haplotype block), you can use it to perform kernel-based GWAS. You should assign your gene information to gene.set in the form of a "data.frame" (whose dimension is (the number of gene) x 2). In the first column, you should assign the gene name. And in the second column, you should assign the names of each marker, which correspond to the marker names of "geno" argument.

map.gene.set

Genotype map for 'gene.set' (list of haplotype blocks). This is a data.frame with the haplotype block (SNP-set, or gene-set) names in the first column. The second and third columns contain the chromosome and map position for each block. The fourth column contains the cumulative map position for each block, which can be computed by the cumsumPos function. If this argument is NULL, the map will be constructed by the genesetmap function after the SNP-set GWAS. This takes some time, so you can reduce the computational time by assigning this argument beforehand.

plot.epi.3d

If TRUE, draw 3d plot

plot.epi.2d

If TRUE, draw 2d plot

main.epi.3d

The title of 3d plot. If this argument is NULL, trait name is set as the title.

main.epi.2d

The title of 2d plot. If this argument is NULL, trait name is set as the title.

saveName

When drawing any plot, you can save plots in png format. In saveName, you should substitute the name you want to save. When saveName = NULL, the plot is not saved.

skip.check

As default, RAINBOWR checks the type of input data and modifies it into the correct format. However, it will take some time, so if you prepare the correct format of input data, you can skip this procedure by setting 'skip.check = TRUE'.

verbose

If this argument is TRUE, messages for the current steps will be shown.

verbose2

If this argument is TRUE, welcome message will be shown.

count

When count is TRUE, the progress of RGWAS is displayed as a percentage.

time

When time is TRUE, you can know how much time it took to perform RGWAS.

Value

$map

Map information for SNPs which are tested for epistatic effects.

$scores

$scores: This is the matrix which contains -log10(p) calculated by the test about epistasis effects.
$x, $y: The information of the positions of SNPs detected by regular GWAS. These vectors are used when drawing plots. Each output corresponds to the replication of row and column of scores.
$z: This is a vector of $scores. This vector is also used when drawing plots.

References

Storey, J.D. and Tibshirani, R. (2003) Statistical significance for genomewide studies. Proc Natl Acad Sci. 100(16): 9440-9445.

Yu, J. et al. (2006) A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nat Genet. 38(2): 203-208.

Kang, H.M. et al. (2008) Efficient Control of Population Structure in Model Organism Association Mapping. Genetics. 178(3): 1709-1723.

Endelman, J.B. (2011) Ridge Regression and Other Kernels for Genomic Selection with R Package rrBLUP. Plant Genome J. 4(3): 250.

Endelman, J.B. and Jannink, J.L. (2012) Shrinkage Estimation of the Realized Relationship Matrix. G3 Genes, Genomes, Genet. 2(11): 1405-1413.

Su, G. et al. (2012) Estimating Additive and Non-Additive Genetic Variances and Predicting Genetic Merits Using Genome-Wide Dense Single Nucleotide Polymorphism Markers. PLoS One. 7(9): 1-7.

Zhou, X. and Stephens, M. (2012) Genome-wide efficient mixed-model analysis for association studies. Nat Genet. 44(7): 821-824.

Listgarten, J. et al. (2013) A powerful and efficient set test for genetic markers that handles confounders. Bioinformatics. 29(12): 1526-1533.

Lippert, C. et al. (2014) Greater power and computational efficiency for kernel-based association testing of sets of genetic variants. Bioinformatics. 30(22): 3206-3214.

Jiang, Y. and Reif, J.C. (2015) Modeling epistasis in genomic selection. Genetics. 201(2): 759-768.

Examples





  ### Import RAINBOWR
  require(RAINBOWR)

  ### Load example datasets
  data("Rice_Zhao_etal")
  Rice_geno_score <- Rice_Zhao_etal$genoScore
  Rice_geno_map <- Rice_Zhao_etal$genoMap
  Rice_pheno <- Rice_Zhao_etal$pheno
  Rice_haplo_block <- Rice_Zhao_etal$haploBlock

  ### View each dataset
  See(Rice_geno_score)
  See(Rice_geno_map)
  See(Rice_pheno)
  See(Rice_haplo_block)

  ### Select one trait for example
  trait.name <- "Flowering.time.at.Arkansas"
  y <- as.matrix(Rice_pheno[, trait.name, drop = FALSE])

  ### Remove SNPs whose MAF <= 0.05
  x.0 <- t(Rice_geno_score)
  MAF.cut.res <- MAF.cut(x.0 = x.0, map.0 = Rice_geno_map)
  x <- MAF.cut.res$x
  map <- MAF.cut.res$map


  ### Estimate genomic relationship matrix (GRM)
  K.A <- calcGRM(genoMat = x)


  ### Modify data
  modify.data.res <- modify.data(pheno.mat = y, geno.mat = x, map = map,
                                 return.ZETA = TRUE, return.GWAS.format = TRUE)
  pheno.GWAS <- modify.data.res$pheno.GWAS
  geno.GWAS <- modify.data.res$geno.GWAS
  ZETA <- modify.data.res$ZETA


  ### View each data for RAINBOWR
  See(pheno.GWAS)
  See(geno.GWAS)
  str(ZETA)


  ### Check epistatic effects (by regarding 11 SNPs as one SNP-set)
  epistasis.res <- RGWAS.epistasis(pheno = pheno.GWAS, geno = geno.GWAS, ZETA = ZETA,
                                   n.PC = 4, test.method = "LR", gene.set = NULL,
                                   window.size.half = 5, window.slide = 11,
                                   package.MM = "gaston", parallel.method = "mclapply",
                                   skip.check = TRUE, n.core = 2)

  See(epistasis.res$scores$scores)


  ### Check epistatic effects (by using the list of haplotype blocks estimated by PLINK)
  ### It will take almost 2 minutes...
  epistasis_haplo_block.res <- RGWAS.epistasis(pheno = pheno.GWAS, geno = geno.GWAS,
                                               ZETA = ZETA, n.PC = 4,
                                               test.method = "LR", gene.set = Rice_haplo_block,
                                               package.MM = "gaston", parallel.method = "mclapply",
                                               skip.check = TRUE, n.core = 2)

  See(epistasis_haplo_block.res$scores$scores)

Description

Print the R code which you should perform for RAINBOWR (Reliable Association INference By Optimizing Weights with R).

Usage

RGWAS.menu()

Value

The R code which you should perform for RAINBOWR GWAS

Testing multiple SNPs simultaneously for GWAS

Description

This function performs SNP-set GWAS (genome-wide association studies), which tests multiple SNPs (single nucleotide polymorphisms) simultaneously. The model of SNP-set GWAS is

y = X \beta + Q v + Z _ {c} u _ {c} + Z _ {r} u _ {r} + \epsilon,

where y is the vector of phenotypic values, X \beta and Q v are the terms of fixed effects, Z _ {c} u _ {c} and Z _ {r} u _ {r} are the terms of random effects, and \epsilon is the vector of residuals. X \beta indicates all of the fixed effects other than population structure, and often this term also plays a role as an intercept. Q v is the term to correct the effect of population structure. Z _ {c} u _ {c} is the term of polygenic effects, and u _ {c} is assumed to follow the multivariate normal distribution whose variance-covariance matrix is the genetic covariance matrix, i.e., u _ {c} \sim MVN (0, K _ {c} \sigma_{c}^{2}). Z _ {r} u _ {r} is the term of effects for the SNP-set of interest, and u _ {r} is assumed to follow the multivariate normal distribution whose variance-covariance matrix is the Gram matrix (linear, exponential, or Gaussian kernel) calculated from the marker genotypes that belong to that SNP-set, i.e., u _ {r} \sim MVN (0, K _ {r} \sigma_{r}^{2}). Finally, the residual term is assumed to be independently and identically normally distributed: \epsilon \sim MVN (0, I \sigma_{e}^{2}).
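
To make the roles of the two random terms concrete, the following base R sketch simulates phenotypes from this model. All settings are made up for illustration: the toy genotypes, the variance components, the intercept, and the choice of markers 96 to 106 as the SNP-set. The population-structure term Q v is omitted for brevity, and a linear kernel is used for both K_c and K_r.

```r
set.seed(5)
n <- 50                                             # number of lines
M <- matrix(sample(c(-1, 1), n * 200, replace = TRUE), n, 200)  # toy genotypes
set.idx <- 96:106                                   # hypothetical SNP-set

K.c <- tcrossprod(M) / ncol(M)                      # polygenic kernel
K.r <- tcrossprod(M[, set.idx]) / length(set.idx)   # linear kernel of the SNP-set

## Draw u ~ MVN(0, K * sigma2) through an eigen decomposition of K
rmvn <- function(K, sigma2) {
  e <- eigen(K, symmetric = TRUE)
  e$vectors %*% (sqrt(pmax(e$values, 0) * sigma2) * rnorm(nrow(K)))
}

X <- matrix(1, n, 1)                                # intercept only
beta <- 10
Z.c <- diag(n)
Z.r <- diag(n)
u.c <- rmvn(K.c, sigma2 = 1.0)                      # polygenic effects
u.r <- rmvn(K.r, sigma2 = 0.5)                      # SNP-set effects
eps <- rnorm(n, sd = sqrt(0.5))                     # residuals

y <- X %*% beta + Z.c %*% u.c + Z.r %*% u.r + eps   # Q v omitted
```

Testing a SNP-set then amounts to asking whether the variance component attached to K.r is greater than zero.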

Usage

RGWAS.multisnp(
  pheno,
  geno,
  ZETA = NULL,
  package.MM = "gaston",
  covariate = NULL,
  covariate.factor = NULL,
  structure.matrix = NULL,
  n.PC = 0,
  min.MAF = 0.02,
  test.method = "LR",
  n.core = 1,
  parallel.method = "mclapply",
  kernel.method = "linear",
  kernel.h = "tuned",
  haplotype = TRUE,
  num.hap = NULL,
  test.effect = "additive",
  window.size.half = 5,
  window.slide = 1,
  chi0.mixture = 0.5,
  gene.set = NULL,
  map.gene.set = NULL,
  weighting.center = TRUE,
  weighting.other = NULL,
  sig.level = 0.05,
  method.thres = "BH",
  plot.qq = TRUE,
  plot.Manhattan = TRUE,
  plot.method = 1,
  plot.col1 = c("dark blue", "cornflowerblue"),
  plot.col2 = 1,
  plot.type = "p",
  plot.pch = 16,
  saveName = NULL,
  main.qq = NULL,
  main.man = NULL,
  plot.add.last = FALSE,
  return.EMM.res = FALSE,
  optimizer = "nlminb",
  thres = TRUE,
  skip.check = FALSE,
  verbose = TRUE,
  verbose2 = FALSE,
  count = TRUE,
  time = TRUE
)

Arguments

pheno

Data frame where the first column is the line name (gid). Each remaining column should be a phenotype to test.

geno

ZETA

ZETA = list(A = list(Z = Z.A, K = K.A), D = list(Z = Z.D, K = K.D))

Z.A, Z.D: Design matrices (n \times m) for the random effects. In many cases, you can use the identity matrix.
K.A, K.D: Different kernels which express relationships between lines.

For example, K.A is the additive relationship matrix for the covariance between lines, and K.D is the dominance relationship matrix.
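The list structure shown above can be assembled by hand. The sketch below uses only base R and toy data. The marker matrix, the line names, and the two kernel formulas are assumptions for illustration (a plain linear kernel and a heterozygosity-based kernel), not necessarily the relationship matrices RAINBOWR would compute.

```r
set.seed(2)
n <- 6    # number of lines (hypothetical)
m <- 40   # number of markers (hypothetical)
line.names <- paste0("line", seq_len(n))
M <- matrix(sample(c(-1, 0, 1), n * m, replace = TRUE), nrow = n,
            dimnames = list(line.names, NULL))

# Additive kernel: a simple linear kernel (one of several possible definitions)
K.A <- tcrossprod(M) / m

# Dominance-type kernel from centered heterozygosity indicators (illustrative only)
H <- (M == 0) * 1
H <- scale(H, center = TRUE, scale = FALSE)
K.D <- tcrossprod(H) / m

# One phenotype record per line, so both design matrices are identity matrices
Z.A <- diag(n)
Z.D <- diag(n)
dimnames(Z.A) <- list(line.names, line.names)
dimnames(Z.D) <- list(line.names, line.names)

ZETA <- list(A = list(Z = Z.A, K = K.A), D = list(Z = Z.D, K = K.D))
```

When lines have repeated records, Z would have one row per record instead of being an identity matrix.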

package.MM

covariate

covariate.factor

structure.matrix

You can use a structure matrix calculated by structure analysis when there is population structure. You should not use this argument with n.PC > 0.

n.PC

Number of principal components to include as fixed effects. Default is 0 (equals K model).

min.MAF

Specifies the minimum minor allele frequency (MAF). If a marker has a MAF less than min.MAF, it is assigned a zero score.
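A minimal sketch of the MAF filter, assuming marker scores coded -1/0/1 with lines in rows and markers in columns (the coding, the toy matrix, and the placeholder scores are assumptions of this example):

```r
# Toy marker matrix: rows are lines, columns are markers, coded -1/0/1
M <- cbind(
  snp1 = c(-1, -1, -1, -1, -1, -1, -1, -1, -1, 1),  # rare allele
  snp2 = c(-1, -1, 0, 1, 1, -1, 0, 1, 1, -1),
  snp3 = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1)            # monomorphic
)

# Frequency of the allele counted by the +1 genotype, folded to the minor allele
freq <- colMeans(M + 1) / 2
maf <- pmin(freq, 1 - freq)

min.MAF <- 0.02
keep <- maf >= min.MAF

# Markers below the cutoff receive a zero score instead of being tested
score <- c(snp1 = 2.1, snp2 = 0.7, snp3 = 1.5)  # placeholder -log10(p) values
score[!keep] <- 0
```

Here only the monomorphic marker snp3 falls below the cutoff, so its score is set to zero.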

test.method

RGWAS supports two methods to test the effects of each SNP-set.

"LR": Likelihood-ratio test, relatively slow, but accurate (default).
"score": Score test, much faster than LR, but sometimes overestimates -log10(p).

n.core

Setting n.core > 1 will enable parallel execution on a machine with multiple cores. This argument is not valid when 'parallel.method = "furrr"'.

parallel.method

Method for parallel computation. We offer three methods, "mclapply", "furrr", and "foreach".

When 'parallel.method = "mclapply"', we utilize pbmclapply function in the 'pbmcapply' package with 'count = TRUE' and mclapply function in the 'parallel' package with 'count = FALSE'.

kernel.method

It determines how to calculate the kernel. There are three methods.

"gaussian": The Gaussian kernel is calculated from the distance matrix.
"exponential": The exponential kernel is calculated from the distance matrix.
"linear": The linear kernel is calculated by the NOIA method for the additive GRM, so the local genomic relationship matrix is regarded as the kernel. This is the default shown in Usage.

kernel.h

The hyperparameter for the Gaussian or exponential kernel. If kernel.h = "tuned", this hyperparameter is calculated as the median of the off-diagonal elements of the distance matrix of the genotype data.
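The three kernel types and the "tuned" hyperparameter can be sketched in base R as follows. The toy SNP-set, the use of Euclidean distance, and especially the way h enters each formula are assumptions of this example. The scaling used inside RAINBOWR may differ, and the linear kernel here is a plain cross-product, not the NOIA-based GRM.

```r
set.seed(3)
n <- 8
# Toy SNP-set: 11 markers (2 * window.size.half + 1 with window.size.half = 5)
M.set <- matrix(sample(c(-1, 0, 1), n * 11, replace = TRUE), nrow = n)

# Euclidean distance between lines, computed from the SNP-set markers
D <- as.matrix(dist(M.set))

# "tuned" hyperparameter: median of the off-diagonal distances
h <- median(D[upper.tri(D)])

# Assumed kernel forms for this sketch
K.gaussian <- exp(-(D / h)^2)
K.exponential <- exp(-D / h)
K.linear <- tcrossprod(M.set) / ncol(M.set)
```

Both distance-based kernels equal 1 on the diagonal and decay toward 0 as lines become more dissimilar within the SNP-set.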

haplotype

num.hap

test.effect

Effect of each marker to test. You can choose test.effect from "additive", "dominance" and "additive+dominance". You can also choose more than one effect, for example, test.effect = c("additive", "additive+dominance").

window.size.half

This argument decides how many SNPs (around the SNP you want to test) are used to calculate K.SNP. More precisely, the number of SNPs will be 2 * window.size.half + 1.

window.slide

This argument determines how often you test markers. If window.slide = 1, every marker will be tested. If you want to perform SNP-set tests by bins, please set window.slide = 2 * window.size.half + 1.
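The interplay of window.size.half and window.slide can be sketched with a small helper. The function window_indices is hypothetical, and two details are assumptions of this example: windows are truncated at the ends of the marker vector, and all markers lie on a single chromosome.

```r
# Hypothetical helper: marker indices used for each tested SNP-set
window_indices <- function(n.marker, window.size.half = 5, window.slide = 1) {
  centers <- seq(1, n.marker, by = window.slide)
  lapply(centers, function(i) {
    # Windows are truncated at the ends of the marker vector
    max(1, i - window.size.half):min(n.marker, i + window.size.half)
  })
}

# Sliding windows: every marker is a window center
sliding <- window_indices(30, window.size.half = 5, window.slide = 1)
sliding[[10]]  # markers 5 to 15, i.e. 2 * 5 + 1 = 11 SNPs

# Bins: window.slide = 2 * window.size.half + 1 gives non-overlapping SNP-sets
bins <- window_indices(30, window.size.half = 5, window.slide = 11)
```

With window.slide = 1 the number of tests equals the number of markers, while the bin setting reduces it by roughly a factor of 2 * window.size.half + 1.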

chi0.mixture

RAINBOWR assumes that the deviance follows the mixture a x chisq(df = 0) + (1 - a) x chisq(df = r), where r is the degrees of freedom. The argument chi0.mixture is a (0 <= a < 1), and the default is 0.5.
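Under this mixture, a p-value can be obtained from the deviance as sketched below. The helper mixture_pvalue and the two log-likelihood values are hypothetical, and df = 1 is chosen only for illustration. The deviance is computed as twice the difference between the log-likelihoods of the alternative and null models.

```r
# Hypothetical helper: upper-tail probability under a * chisq(0) + (1 - a) * chisq(r)
mixture_pvalue <- function(deviance, df = 1, chi0.mixture = 0.5) {
  # chisq(df = 0) is a point mass at zero, so for a positive deviance
  # only the chisq(df = r) component contributes to the upper tail
  p <- (1 - chi0.mixture) * pchisq(deviance, df = df, lower.tail = FALSE)
  p[deviance <= 0] <- 1
  p
}

LL.null <- -120.4  # hypothetical restricted log-likelihood, null model
LL.alt <- -116.9   # hypothetical restricted log-likelihood, alternative model
D <- 2 * (LL.alt - LL.null)

p <- mixture_pvalue(D, df = 1, chi0.mixture = 0.5)
minus.log10.p <- -log10(p)
```

Setting chi0.mixture = 0 recovers the ordinary chi-square test, which is more conservative when a variance component is tested at the boundary of its parameter space.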

gene.set

map.gene.set

weighting.center

In kernel-based GWAS, weights following a Gaussian distribution centered on the tested SNP are taken into account when calculating the kernel if weighting.center = TRUE. If weighting.center = FALSE, these weights are not taken into account.

weighting.other

You can set other weights in addition to weighting.center. The length of this argument should be equal to the number of SNPs. For example, you can assign SNP effects from the information of gene annotation.

sig.level

Significance level for the threshold. The default is 0.05.

method.thres

Method for determining the significance threshold. "BH" and "Bonferroni" are offered.
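The two threshold types can be compared on a toy vector of p-values. This is a generic base-R sketch of the Bonferroni and Benjamini-Hochberg rules on the -log10(p) scale, with made-up p-values. The exact computation RAINBOWR performs internally is not shown here and may differ in detail.

```r
p <- c(0.0001, 0.0004, 0.0019, 0.0095, 0.0201, 0.0278, 0.0298, 0.0344,
       0.0459, 0.3240, 0.4262, 0.5719, 0.6528, 0.7590, 1.0000)
sig.level <- 0.05
m <- length(p)

# Bonferroni: a fixed cutoff of sig.level / m
thres.bonf <- -log10(sig.level / m)

# Benjamini-Hochberg: largest sorted p-value p_(k) with p_(k) <= k / m * sig.level
p.sorted <- sort(p)
passed <- which(p.sorted <= seq_len(m) / m * sig.level)
thres.bh <- if (length(passed) > 0) -log10(p.sorted[max(passed)]) else NA_real_

# The same decisions through adjusted p-values
significant.bh <- p.adjust(p, method = "BH") <= sig.level
significant.bonf <- -log10(p) >= thres.bonf
```

The BH threshold is never stricter than the Bonferroni one, so it declares at least as many SNP-sets significant.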

plot.qq

If TRUE, draw qq plot.

plot.Manhattan

If TRUE, draw manhattan plot.

plot.method

If this argument = 1, the default manhattan plot will be drawn. If this argument = 2, a manhattan plot with the axis based on Position (bp) will be drawn, and the color changes for each chromosome.

plot.col1

This argument determines the colors of the manhattan plot. Supply a color vector of length 2: plot.col1[1] is used for odd chromosomes and plot.col1[2] for even chromosomes.

plot.col2

Color of the manhattan plot. The color changes with each chromosome and starts from plot.col2 + 1 (so plot.col2 = 1 means the colors start from red).

plot.type

This argument determines the type of the manhattan plot. See the help page of "plot".

plot.pch

This argument determines the shape of the dot of the manhattan plot. See the help page of "plot".

saveName

When drawing any plot, you can save it in png format. Set saveName to the file name under which you want to save the plot. When saveName = NULL, the plot is not saved.

main.qq

The title of qq plot. If this argument is NULL, trait name is set as the title.

main.man

The title of manhattan plot. If this argument is NULL, trait name is set as the title.

plot.add.last

If saveName is not NULL and this argument is TRUE, then you can add lines or dots to manhattan plots. However, you should also write "dev.off()" after adding something.

return.EMM.res

When return.EMM.res = TRUE, the results of equation of mixed models are included in the result of RGWAS.

optimizer

The function used in the optimization process. We offer "optim", "optimx", and "nlminb" functions.

thres

If thres = TRUE, the threshold of the manhattan plot is included in the result of RGWAS. When return.EMM.res or thres is TRUE, the results will be "list" class.

skip.check

verbose

If this argument is TRUE, messages for the current steps will be shown.

verbose2

If this argument is TRUE, welcome message will be shown.

count

When count is TRUE, the progress of RGWAS is displayed as a percentage.

time

When time is TRUE, the time taken to perform RGWAS is reported.

Details

P-value for each SNP-set is calculated by performing the LR test or the score test (Lippert et al., 2014).

In the LR test, the function first solves the multi-kernel mixed model and calculates the maximum restricted log likelihood. Then it performs the LR test by using the fact that the deviance

D = 2 \times (LL_{alt} - LL_{null})

follows the chi-square distribution.

In the score test, the maximization of the likelihood is only performed for the null model. In other words, the function calculates the score statistic without solving the multi-kernel mixed model for each SNP-set. Then it performs the score test by using the fact that the score statistic follows the chi-square distribution.

Value

$D: Data frame which contains the information of the map you input and the results of RGWAS (-log10(p)) which correspond to the map. If more than one test.effect is specified, a separate list is returned for each test.effect.
$thres: A vector which contains the information of the threshold determined by FDR = 0.05.
$EMM.res: This output is a list which contains the information about the results of "EMM" performed first in regular GWAS. If you want to know the details, see the description of the function "EMM1" or "EMM2".