The hardware and bandwidth for this mirror is donated by METANET, the Webhosting and Full Service-Cloud Provider.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]metanet.ch.

Advanced Non-Normal Data Example

Aimee Lee Houde

2024-01-27

In this vignette, the fullfact package is explored using advanced (functions designated by the number 2) for the standard model, i.e. containing random effects for dam, sire, and dam by sire, for non-normal error structures (e.g. binary, proportion, and/or count data types), with the options of including additional random effects for one position (e.g. tank) and/or one block effect (e.g. several blocks of 2 \(\times\) 2 factorial matings).

Simple (functions designated by no number) for the standard model only is explored in the vignette Simple Non-Normal Data Example.

Expert (functions designated by the number 3) for the standard model with the ability of the user to include additional fixed and/or random effects, such as a model including environment treatments and their interactions is explored in the vignette Expert Non-Normal Data Example.

Normal error structure or data type is explored in another three vignettes: (1) Simple Normal Data Example, (2) Advanced Normal Data Example, and (3) Expert Normal Data Example.

Load the package and example data

The example data set is an 11 \(\times\) 11 full factorial mating: 11 dams and 11 sires with all combinations resulting in 121 families. There are two replicates per family.

library("fullfact")  
data(chinook_survival)
head(chinook_survival)
#>   family repli dam sire tray cell alive dead egg_size
#> 1     f1    r1  d1   s1   t7   1A   136   14     7.27
#> 2     f1    r2  d1   s1   t8   1A   146    4     7.27
#> 3     f2    r1  d1   s2   t7   1B   128   22     7.27
#> 4     f2    r2  d1   s2   t8   1B   132   18     7.27
#> 5     f3    r1  d1   s3   t7   1C   142    8     7.27
#> 6     f3    r2  d1   s3   t8   1C   144    6     7.27

Displayed are columns for family identities (ID), replicate ID, dam ID, sire ID, incubation tray ID, incubation cell ID (within tray), Chinook salmon number of offspring alive, number of offspring dead, and dam egg size (mm). The total number of offspring per family is 300 with 150 per replicate.

Convert to a binary data frame

For data that were recorded at the replicate-level, such as the number of offspring dead or alive for survival in the example data set, these data should be converted to the individual-level to not underestimate phenotypic variance and influence variance component estimates (see Puurtinen et al. 2009).

Puurtinen M, Ketola T, Kotiaho JS. 2009. The good-genes and compatible-genes benefits of mate choice. The American Naturalist 174(5): 741-752. DOI: 10.1086/606024

The buildBinary function can assign a binary number (i.e. ‘0’ or ‘1’) to two columns containing the number of offspring and copy information by the number of times equal to the number of offspring. The final data set will have a number of rows matching the total number of offspring.

one is the column name of counts to assign a ‘1’ value, e.g. alive. zero is the column name of counts to assign a ‘0’ value, e.g. dead.

copy is a vector of column numbers (to copy the contents). Does not need to contain the one and zero column names.

The buildMulti function is similar and can assign multiple numbers to multiple columns. multi is a list containing the numbers to assign and matching column names, e.g. list(c(2,1,0),c(“black”,“gray”,“white”)).

chinook_survival2<- buildBinary(dat=chinook_survival,copy=c(1:6,9),one="alive",zero="dead")
rm(chinook_survival) #remove original
head(chinook_survival2)
#>     status family repli dam sire tray cell egg_size
#> 1        1     f1    r1  d1   s1   t7   1A     7.27
#> 1.1      1     f1    r1  d1   s1   t7   1A     7.27
#> 1.2      1     f1    r1  d1   s1   t7   1A     7.27
#> 1.3      1     f1    r1  d1   s1   t7   1A     7.27
#> 1.4      1     f1    r1  d1   s1   t7   1A     7.27
#> 1.5      1     f1    r1  d1   s1   t7   1A     7.27
#Multinomial example
#>chinook_survival$total<- chinook_survival$alive + chinook_survival$dead
#>chinook_survival3<- buildMulti(dat=chinook_survival,copy=c(1:6,9),multi=list(c(2,1,0),
#>c("total","alive","dead")))
#>head(chinook_survival3)

A new column is produced named “status” containing the 1 and 0 values for the offspring. The “alive” and “dead” columns are not included because their column numbers (7 and 8) were not in copy.

Observed variance components

Model random effects are dam, sire, and dam by sire. Options to include one random position and/or one random block effect(s). Extracts the dam, sire, dam, and dam by sire variance components. Calculates the residual and total variance component. Calculates the additive genetic, non-additive genetic, and maternal variance components. Extracts optional position and block variance components.

The residual variance component for the binomial and Poisson error structures with four links are described by Nakagawa and Schielzeth (2010, 2013). Specifically, the residual variance component for binomial errors with the logit link is \(\pi\)2/3; binomial errors with the probit link is 1; Poisson errors with the log link is ln(1/exp(\(\beta\)0) + 1), where \(\beta\)0 is the intercept value from the model without any fixed effects and containing only the random effects; and Poisson errors with the square-root link is 0.25.

Assuming the effects of epistasis are of negligible importance, the additive genetic variance (VA) component is calculated as four times the sire (VS), the non-additive genetic variance (VN) component as four times the dam by sire interaction (VD\(\times\)S), and the maternal variance component (VM) as the dam (VD) – sire (VS) (Lynch and Walsh 1998, p. 603). When there is epistasis, those variance components will be overestimated and this may explain why the percentage of phenotypic variance explained by the components can add up to more than 100% in certain cases.

fam_link is the family and link in family(link) format. Supported options are binomial(link=“logit”), binomial(link=“probit”), poisson(link=“log”), and poisson(link=“sqrt”). Binary or proportion data are typically analyzed with binomial. Count data are typically analyzed with Poisson.

Default in quasi = F. Option for overdispersion or quasi-error structure is quasi = T, such that an observation-level random effect is added to the model (Atkins et al. 2013).

position is the column name containing position factor information.

block is the column name containing block factor information.

Significance values for the random effects are determined using likelihood ratio tests (Bolker et al. 2009).

Atkins DC, Baldwin SA, Zheng C, Gallop RJ, Neighbors C. 2013. A tutorial on count regression and zero-altered count models for longitudinal substance use data. Psychology of Addictive Behaviors 27(1): 166-177. DOI: 10.1037/a0029508

Nakagawa S, Schielzeth H. 2010. Repeatability for Gaussian and non-Gaussian data: a practical guide for biologists. Biological Reviews 85(4): 935-956. DOI: 10.1111/j.1469-185X.2010.00141.x

Nakagawa S, Schielzeth H. 2013. A general and simple method for obtaining R2 from generalized linear mixed-effects models. Methods in Ecology and Evolution 4(2): 133-142. DOI: 10.1111/j.2041-210x.2012.00261.x

Lynch M, Walsh B. 1998. Genetics and Analysis of Quantitative Traits. Sinauer Associates, Massachusetts.

Bolker BM, Brooks ME, Clark CJ, Geange SW, Poulsen JR, Stevens MHH, White J-SS. 2009. Generalized linear mixed models: a practical guide for ecology and evolution. Trends in Ecology and Evolution 24(3): 127-135. DOI: 10.1016/j.tree.2008.10.008

For this example, we explore position (tray) effects for Chinook salmon survival.

survival_mod2<- observGlmer2(observ=chinook_survival2,dam="dam",sire="sire",response="status",
fam_link=binomial(link="logit"),position="tray")
#> [1] "2024-01-27 12:56:12 PST"
#> Time difference of 1.338003 mins
survival_mod2
#> $random
#>     effect    variance     percent      d.AIC      d.BIC     Chi.sq       p.value
#> 1 dam:sire 0.167061285  3.78378550 595.885940 587.386367 597.885940 4.826421e-132
#> 2     tray 0.003668016  0.08307721   1.291355  -7.208218   3.291355  6.964552e-02
#> 3     sire 0.166654347  3.77456873  44.385729  35.886156  46.385729  9.712031e-12
#> 4      dam 0.787937313 17.84606041 126.128476 117.628903 128.128476  1.052074e-29
#> 
#> $other
#>   component variance   percent
#> 1  Residual 3.289868  74.51251
#> 2     Total 4.415189 100.00000
#> 
#> $calculation
#>   component  variance  percent
#> 1  additive 0.6666174 15.09827
#> 2    nonadd 0.6682451 15.13514
#> 3  maternal 0.6212830 14.07149

Produces a list object containing three data frames. Each data frame contains the raw variance components and the variance components as a percentage of the total variance component. The first data frame also contains the difference in AIC and BIC, and likelihood ratio test Chi-square and p-value for all random effects.

Note

The Laplace approximation is used because there were fewer disadvantages relative to penalized quasi-likelihood and Gauss-Hermite quadrature parameter estimation (Bolker et al. 2009). That is, penalized quasi-likelihood is not recommended for count responses with means less than 5 and binary responses with less than 5 successes per group. Gauss-Hermite quadrature is not recommended for more than two or three random effects because of the rapidly declining analytical speed with the increasing number of random effects.

Statistical Power analysis

Power values are calculated by stochastically simulating data for a number of iterations and then calculating the proportion of P-values less than \(\alpha\) (e.g. 0.05) for each component (Bolker 2008). Simulated data are specified by inputs for variance component values and the sample sizes.

Bolker BM. 2008. Ecological Models and Data in R. Princeton University Press, Princeton.

Defaults are alpha = 0.05 for 5% and nsim = 100 for 100 simulations.

varcomp is a vector of dam, sire, dam by sire, position and/or block variance components, i.e. c(dam,sire,dam \(\times\) sire,position/block). If there is a position and a block, c(dam,sire,dam \(\times\) sire,position,block).

nval is a vector of dam, sire, offspring per family, and offspring per position or number of block sample sizes, i.e. c(dam,sire,offspring,position/block). If there is a position and a block, c(dam,sire,offspring,position,block).

position is optional number of positions.

block is optional vector of dams and sires per block, e.g. c(2,2).

poisLog is the residual variance component value if using fam_link = poisson(link="log").

For this example, the variance components of observGlmer2 above are used (i.e. dam= 0.7880, sire= 0.1667, dam \(\times\) sire= 0.1671, tray= 0.0037) and the sample size of the Chinook salmon data set (i.e. dam= 11, sire= 11, offspring= 300, offspring per position= 3300). Position was represented by 11 trays. The actual design was composed of 16 trays with 1,650–2,400 offspring each. However, powerGlmer2 uses an equal number of offspring per position, so the number of trays was decreased from 16 to 15.

Full analysis is 100 simulations. Example has 2 simulations.

#full
#>powerGlmer2(varcomp=c(0.7880,0.1667,0.1671,0.0037),nval=c(11,11,300,2420),
#>fam_link=binomial(link="logit"),position=15) 
#2 simulations
powerGlmer2(varcomp=c(0.7880,0.1667,0.1671,0.0037),nval=c(11,11,300,2420),
fam_link=binomial(link="logit"),position=15,nsim=2) 
#> [1] "2024-01-27 12:57:33 PST"
#> [1] "Starting simulation: 1"
#> [1] "Starting simulation: 2"
#> Time difference of 3.108018 mins
#>       term   n   var_in     var_out power
#> 1      dam  11 0.788000 1.072130283     1
#> 2     sire  11 0.166700 0.180743178     1
#> 3 dam.sire 121 0.167100 0.207502496     1
#> 4 position  15 0.003700 0.003502962     1
#> 5 residual  NA 3.289868 3.289868134    NA

#Block examples using 8 dams, 8 sires (as four 2x2 blocks), and 20 offspring per family
#>powerGlmer2(varcomp=c(0.7880,0.1667,0.1671,0.0037),nval=c(8,8,20,4),
#>fam_link=binomial(link="logit"),block=c(2,2)) 
#>powerGlmer2(varcomp=c(0.7880,0.1667,0.1671,0.0037,0.0037),nval=c(8,8,20,40,4),
#>fam_link=binomial(link="logit"),position=8,block=c(2,2)) #with position

There is sufficient power (\(\ge\) 0.8) for dam, sire, and dam by sire variance components. There was also sufficient power for position (tray) variance components. In the cases of insufficient power (< 0.8), the sample size of dam, sire, and/or offspring can be increased until there is sufficient power.

Taking the reverse approach (can the sample size of dam, sire, or offspring be reduced while maintaining sufficient power?) using the same variance components and offspring sample size, dam and sire sample sizes could be reduced from 11 to 7. The position sample size was reduced accordingly, i.e. 7 dams \(\times\) 7 sires \(\times\) 300 offspring = 14,700, divided by 15 trays for 980 offspring each.

#full
#>powerGlmer2(varcomp=c(0.7880,0.1667,0.1671,0.0037),nval=c(7,7,300,980),
#>fam_link=binomial(link="logit"),position=15)
#2 simulations
powerGlmer2(varcomp=c(0.7880,0.1667,0.1671,0.0037),nval=c(7,7,300,980),
fam_link=binomial(link="logit"),position=15,nsim=2)
#> [1] "2024-01-27 13:00:39 PST"
#> [1] "Starting simulation: 1"
#> [1] "Starting simulation: 2"
#> Time difference of 1.062281 mins
#>       term  n   var_in     var_out power
#> 1      dam  7 0.788000 0.276665384   1.0
#> 2     sire  7 0.166700 0.176126942   1.0
#> 3 dam.sire 49 0.167100 0.120607315   1.0
#> 4 position 15 0.003700 0.005058561   0.5
#> 5 residual NA 3.289868 3.289868134    NA

Bootstrap confidence intervals

Confidence intervals for the additive genetic, non-additive genetic, and maternal variance components can be produced using the bootstrap-t resampling method described by Efron and Tibshirani (1993, p. 160‒162). Observations are resampled with replacement until the original sample size is reproduced. The resampled data are then used in the model and the additive genetic, non-additive genetic, and maternal variance components are extracted. The process is repeated for a number of iterations, typically 1,000 times, to produce a distribution for each component. The confidence interval lower and upper limits and median are extracted from the distribution.

Efron B, Tibshirani R. 1993. An Introduction to the Bootstrap. Chapman and Hall, New York.

Resample observations

The resampRepli function is used to bootstrap resample observations grouped by replicate ID within family ID for a specified number of iterations to create the resampled data set. A similar resampFamily function is able to resample observations grouped by family ID only.

copy is a vector of column numbers (to copy the contents). Does not need to contain the family and/or replicate columns.

Full analysis is 1000 iterations. Example has 5 iterations.

#>resampRepli(dat=chinook_survival2,copy=c(1,4:8),family="family",replicate="repli",iter=1000) #full
#>resampFamily(dat=chinook_survival2,copy=c(1,4:8),family="family",iter=1000) #family only
resampRepli(dat=chinook_survival2,copy=c(1,4:8),family="family",replicate="repli",iter=5) #5 iterations

Because of the large file sizes that can be produced, the resampling of each replicate Y per family X is saved separately as a common separated (X_Y_resampR.csv) file in the working directory. These files are merged to create the final resampled data set (resamp_datR.csv).

If using resampFamily, the file names are X_resampF.csv per family and resamp_datF.csv for the final resampled data set.

Iteration variance components

The equivalent to observGlmer2 is available for the final bootstrap resampled data set, i.e. resampGlmer2.

Default is no overdispersion as quasi = F. The starting model number start = and ending model number end = need to be specified.

Full analysis is 1000 iterations. Example has 2 iterations.

#>survival_datR<- read.csv("resamp_datR.csv") #1000 iterations
#>survival_rcomp2<- resampGlmer2(resamp=survival_datR,dam="dam",sire="sire",response="status",
#>fam_link=binomial(logit),position="tray",start=1,end=1000) #full
data(chinook_resampS) #5 iterations
head(chinook_resampS)
#>   status1 dam1 sire1 tray1 cell1 egg_size1 status2 dam2 sire2 tray2 cell2 egg_size2 status3 dam3 sire3 tray3 cell3
#> 1       1   d1    s1    t7    1A      7.27       1   d1    s1    t7    1A      7.27       1   d1    s1    t7    1A
#> 2       1   d1    s1    t7    1A      7.27       0   d1    s1    t7    1A      7.27       1   d1    s1    t7    1A
#> 3       1   d1    s1    t7    1A      7.27       0   d1    s1    t7    1A      7.27       1   d1    s1    t7    1A
#> 4       1   d1    s1    t7    1A      7.27       1   d1    s1    t7    1A      7.27       1   d1    s1    t7    1A
#> 5       1   d1    s1    t7    1A      7.27       1   d1    s1    t7    1A      7.27       1   d1    s1    t7    1A
#> 6       0   d1    s1    t7    1A      7.27       1   d1    s1    t7    1A      7.27       1   d1    s1    t7    1A
#>   egg_size3 status4 dam4 sire4 tray4 cell4 egg_size4 status5 dam5 sire5 tray5 cell5 egg_size5
#> 1      7.27       1   d1    s1    t7    1A      7.27       1   d1    s1    t7    1A      7.27
#> 2      7.27       1   d1    s1    t7    1A      7.27       1   d1    s1    t7    1A      7.27
#> 3      7.27       1   d1    s1    t7    1A      7.27       1   d1    s1    t7    1A      7.27
#> 4      7.27       1   d1    s1    t7    1A      7.27       1   d1    s1    t7    1A      7.27
#> 5      7.27       1   d1    s1    t7    1A      7.27       1   d1    s1    t7    1A      7.27
#> 6      7.27       1   d1    s1    t7    1A      7.27       1   d1    s1    t7    1A      7.27
survival_rcomp2<- resampGlmer2(resamp=chinook_resampS,dam="dam",sire="sire",response="status",
fam_link=binomial(link="logit"),position="tray",start=1,end=2)
#> [1] "2024-01-27 13:01:43 PST"
#> [1] "Working on model: 1"
#> [1] "Working on model: 2"
#> Time difference of 54.80829 secs
survival_rcomp2[1:2,]
#>    dam:sire        tray      sire       dam Residual    Total  additive    nonadd  maternal
#> 1 0.1913733 0.005188036 0.1721041 0.7757233 3.289868 4.434257 0.6884163 0.7654933 0.6036192
#> 2 0.1776859 0.004506256 0.1661613 0.7820526 3.289868 4.420274 0.6646453 0.7107437 0.6158912

The function provides a data frame with columns containing the raw variance components for dam, sire, dam by sire, residual, total, additive genetic, non-additive genetic, and maternal. Also columns containing the raw variance components for the options of position and/or block. The number of rows in the data frame matches the number of iterations in the resampled data set and each row represents a model number.

Extract confidence intervals

Extract the bootstrap-t confidence intervals (CI) and median for the additive genetic, non-additive genetic, and maternal values from the data frame of models produced using resampGlmer2. Also extracts intervals for optional position and block variance components.

Default confidence interval is 95% as level = 95.

For this example, the “Total” column was used instead of the “tray” column as position because the chinook_bootS example data for the bootstrap-t CI were produced using another model that did not include tray (see the vignette for Simple Non-Normal Data Example).

#>ciMANA(comp=survival_rcomp2,position="tray") #full
data(chinook_bootS) #similar to survival_rcomp2 1000 models, but has no tray
ciMANA2(comp=chinook_bootS,position="Total")
#> $raw
#>   component lower median upper
#> 1  additive 0.575  0.672 0.764
#> 2    nonadd 0.660  0.766 0.872
#> 3  maternal 0.576  0.635 0.694
#> 4     Total 4.381  4.451 4.524
#> 
#> $percentage
#>   component lower median upper
#> 1  additive  13.0   15.1  17.1
#> 2    nonadd  14.9   17.2  19.6
#> 3  maternal  13.1   14.3  15.4
#> 4     Total 100.0  100.0 100.0

The raw values are presented and are converted to a percentage of the total variance for each model. Defaults are the number of decimal places to round CI raw values as rnd_r = 3 and to round the CI percent values as rnd_p = 1.

The bootstrap-t method may produce medians that are largely different from the observed values. In the present example, the raw 95% CI are a good fit to the observed values. Nonetheless, options are provided below for 95% CI that are a poor fit.

Bias and accelerated corrected confidence intervals

The BCa method (bias and acceleration) described by Efron and Tibshirani (1993, p.184‒188) can be used for the correction of bootstrap-t confidence intervals.

bias is a vector of additive, non-additive, maternal, position and/or block variance components, i.e. c(additive,non-additive,maternal,position/block), from the raw observed variance components of observGlmer2. If there is a position and a block, c(additive,non-additive,maternal,position,block).

The raw variance components of the chinook_bootS model were additive= 0.6655, non-additive= 0.6692, maternal= 0.6266, and total= 4.4166. Typically the variance components would be from observGlmer2 above, including position, for the analysis pipeline.

#bias only
ciMANA2(comp=chinook_bootS,bias=c(0.6655,0.6692,0.6266,4.4166),position="Total")
#> $raw
#>   component lower median upper
#> 1  additive 0.563  0.661 0.752
#> 2    nonadd 0.616  0.619 0.686
#> 3  maternal 0.565  0.620 0.677
#> 4     Total 4.345  4.384 4.454
#> 
#> $percentage
#>   component lower median upper
#> 1  additive  12.8   14.8  16.8
#> 2    nonadd  14.1   14.1  15.5
#> 3  maternal  12.8   14.0  15.1
#> 4     Total 100.0  100.0 100.0
#full, observGlmer2 components
#>ciMANA2(comp=survival_rcomp2,bias=c(0.6655,0.6692,0.6266,0.0037),position="tray") 

accel for acceleration correction uses the delete-one observation jackknife data set. In this example delete-30 observations is used because of the large data set; see the next section.

data(chinook_jackS) #delete-30
#bias and acceleration
ciMANA2(comp=chinook_bootS,bias=c(0.6655,0.6692,0.6266,4.4166),position="Total",
accel=chinook_jackS) 
#> $raw
#>   component lower median upper
#> 1  additive 0.563  0.661 0.752
#> 2    nonadd 0.616  0.619 0.686
#> 3  maternal 0.565  0.620 0.677
#> 4     Total 4.345  4.384 4.454
#> 
#> $percentage
#>   component lower median upper
#> 1  additive  12.8   14.8  16.8
#> 2    nonadd  14.1   14.1  15.5
#> 3  maternal  12.8   14.0  15.1
#> 4     Total 100.0  100.0 100.0
#full, observGlmer2 components
#>ciMANA2(comp=survival_rcomp1,bias=c(0.6655,0.6692,0.6266,0.0037),position="tray",
#>accel=survival_jack1)

Jackknife confidence intervals

Jackknife resampling is another method for producing confidence intervals.

The equivalent to observGlmer2 is available for jackknife resampling, i.e. JackGlmer2, using the observed data frame.

Default is delete-one jackknife resampling as size = 1 and no overdispersion as quasi = F.

Because the delete-one observation jackknife resampling may be computationally intensive for large data sets, such as the Chinook salmon survival data set, the JackGlmer2 function has the option of delete-d observation jackknife resampling, for which d > 1. The rows of the observed data frame are shuffled and a block of observations of size d is deleted sequentially. For the example, delete-30 observation jackknife resampling is specified as size = 30, which deletes a block of 30 observations. Thirty observation was selected such that M is close to 1,000 (see below).

Full analysis uses all observations. Example has the first 2 observations.

#full
#>survival_jack2<- JackGlmer2(observ=chinook_survival2,dam="dam",sire="sire",response="status",
#>fam_link=binomial(link="logit"),position="tray",size=30)
#first 2
survival_jack2<- JackGlmer2(observ=chinook_survival2,dam="dam",sire="sire",response="status",
fam_link=binomial(link="logit"),position="tray",size=30,first=2)
#> [1] "2024-01-27 13:02:38 PST"
#> [1] "Removing block: 1 of 1210"
#> [1] "Removing block: 2 of 1210"
#> Time difference of 1.111731 mins
survival_jack2[1:2,]
#>    dam:sire        tray      sire       dam Residual    Total  additive    nonadd  maternal
#> 1 0.1669653 0.003631804 0.1666722 0.7870674 3.289868 4.414205 0.6666890 0.6678611 0.6203952
#> 2 0.1670151 0.003576008 0.1665160 0.7883108 3.289868 4.415286 0.6660641 0.6680604 0.6217948

Extract the jackknife confidence intervals (CI) and median for the additive genetic, non-additive genetic, and maternal values from the data frame of models produced using JackGlmer2.

The mean and the standard error of pseudo-values for each variance component are calculated (Efron and Tibshirani 1993, p.184‒188). The standard error is then used with the Student’s t distribution to provide the lower and upper limits for the confidence interval. For delete-d jackknife resampling, M degrees of freedom were used for producing the confidence interval (Martin et al. 2004): M = N / d, where N is the total number of observations and d is the number of deleted observations. Large values of M, such as 1,000, can translate to the delete-d jackknife resampling method approaching bootstrap resampling expectations (Efron and Tibshirani 1993, p. 149).

Martin, H., Westad, F. & Martens, H. (2004). Improved Jackknife Variance Estimates of Bilinear Model Parameters. COMPSTAT 2004 – Proceedings in Computational Statistics 16th Symposium Held in Prague, Czech Republic, 2004 (ed J. Antoch), pp. 261-275. Physica-Verlag HD, Heidelberg.

Default confidence interval is 95% as level = 95.

full is a vector of additive, non-additive, maternal, total, position and/or block variance components, i.e. c(additive,non-additive,maternal,total,position/block), from the raw observed variance components of observGlmer2. If there is a position and a block, c(additive,non-additive,maternal,total,position,block).

The chinook_jackS example data for the jackknife CI were produced using another model with no position effect (see the vignette for Simple Non-Normal Data Example). The raw variance components of this model additive= 0.6655, non-additive= 0.6692, maternal= 0.6266, and total= 4.4166. Typically the variance components would be from observGlmer2 above for the analysis pipeline.

data(chinook_jackS) #similar to survival_jack2, all observations, but has no tray
ciJack2(comp=chinook_jackS,position="Residual",full=c(0.6655,0.6692,0.6266,4.4166,3.2899))
#> $raw
#>   component lower  mean upper
#> 1  additive 0.570 0.662 0.755
#> 2    nonadd 0.554 0.657 0.761
#> 3  maternal 0.529 0.584 0.638
#> 4  Residual 3.328 3.328 3.328
#> 
#> $percentage
#>   component lower mean upper
#> 1  additive  12.8 14.8  16.8
#> 2    nonadd  12.5 14.7  16.9
#> 3  maternal  12.0 13.1  14.1
#> 4  Residual  73.2 74.5  75.7
#full, all observations, observGlmer2 components
#>ciJack2(comp=survival_jack2,position="tray",full=c(0.6655,0.6692,0.6266,4.4166,0.0037)) 

The raw values are presented and are converted to a percentage of the total variance for each model. Defaults are the number of decimal places to round CI raw values as rnd_r = 3 and to round the CI percent values as rnd_p = 1.

Plotting confidence intervals

The barMANA and boxMANA functions are simple plotting functions for the confidence intervals or all values, respectively, from the bootstrap and jackknife approaches. Default is to display the percentage values as type = perc. Raw values can be displayed as type = raw.

Within the functions, there are simple plot modifications available. For the y-axis, min and max values can be species as ymin and ymax, as well as the increment as yunit. Also, magnification of the axis unit as cex_yaxis and label as cex_ylab. The position of the legend can be specified as leg. Default is “topright”.

Bar plot

The barMANA function produces bar graphs with the bootstrap-t median (ciMANA2) or jackknife pseudo-value mean (ciJack2) as the top of the shaded bar and error bars covering the range of the confidence interval for each of the additive genetic, non-additive genetic, and maternal values of a phenotypic trait.

The length of the error bar can be specified in inches as bar_len.

survival_ci<- ciJack2(comp=chinook_jackS,position="Residual",
full=c(0.6655,0.6692,0.6266,4.4166,3.2899))
oldpar<- par(mfrow=c(2,1))
barMANA(ci_dat=survival_ci) #basic, top
barMANA(ci_dat=survival_ci,bar_len=0.3,yunit=4,ymax=20,cex_ylab=1.3) #modified, bottom
plot of chunk barplot
plot of chunk barplot

Different traits can also be combined on the same bar plot using trait specified in ciMANA or ciJack. The information is combined into a list object. For the example, the jackknife CI is duplicated to simulate ‘different traits’.

survival_ci1<- ciJack2(comp=chinook_jackS,position="Residual",
full=c(0.6655,0.6692,0.6266,4.4166,3.2899),trait="survival_1")
survival_ci2<- ciJack2(comp=chinook_jackS,position="Residual",
full=c(0.6655,0.6692,0.6266,4.4166,3.2899),trait="survival_2")
comb_bar<- list(raw=rbind(survival_ci1$raw,survival_ci2$raw),
percentage=rbind(survival_ci1$percentage,survival_ci2$percentage)) 
barMANA(ci_dat=comb_bar,bar_len=0.3,yunit=4,ymax=20,cex_ylab=1.3)
plot of chunk barplot-comb
plot of chunk barplot-comb

The legend is slightly off in the presented html version but is fine with the R plotting device.

Box plot

The boxMANA function produces box plots using all values for the bootstrap-t resampling data set (resampGlmer2) or jackknife resampling data set (JackGlmer2).

oldpar<- par(mfrow=c(2,1))
boxMANA(comp=chinook_bootS) #from resampGlmer2, basic, top 
boxMANA(comp=chinook_bootS,yunit=2,ymin=10,ymax=22,cex_ylab=1.3,leg="topleft") #modified, bottom
plot of chunk boxplot
plot of chunk boxplot

Different traits can also be combined on the same box plot by adding a “trait” column to the resampling data set. For the example, the bootstrap-t data frame is duplicated to simulate ‘different traits’.

chinook_bootS1<- chinook_bootS; chinook_bootS2<- chinook_bootS #from resampGlmer2
chinook_bootS1$trait<- "survival_1"; chinook_bootS2$trait<- "survival_2"
comb_boot<- rbind(chinook_bootS1,chinook_bootS2)
comb_boot$trait<- as.factor(comb_boot$trait)
boxMANA(comb_boot,yunit=2,ymin=10,ymax=22,cex_ylab=1.3,leg="topleft")
plot of chunk boxplot-comb
plot of chunk boxplot-comb

The recommended follow-up vignette is the Expert Non-Normal Data Example, covering the standard model with the ability of the user to include additional fixed and/or random effects, such as a model including environment treatments and their interactions.

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.