In this vignette, we use the final model reached in the workflow vignette as our example.
modelFinal <- DImulti(y = c("Y1", "Y2", "Y3"), eco_func = c("NA", "UN"), time = c("time", "CS"),
                      unit_IDs = 1, prop = 2:5, data = simMVRM, DImodel = "AV", method = "REML",
                      estimate_theta = TRUE)
print(modelFinal)
#> Note:
#> Method Used = REML
#> Correlation Structure Used =
#> UN (`?nlme::corSymm()`) @ CS (`?nlme::corCompSymm()`)
#>
#> Average Term Model
#> Theta estimate(s) = Y1:0.9704, Y2:0.7538, Y3:1.0089
#>
#> Generalized least squares fit by REML
#> Model: value ~ 0 + func:time:((p1_ID + p2_ID + p3_ID + p4_ID + AV))
#> AIC BIC logLik
#> 7933.636 8107.046 -3935.818
#>
#> Multivariate Correlation Structure: General
#> Formula: ~0 | plot
#> Parameter estimate(s):
#> Correlation:
#>   1      2
#> 2  0.609
#> 3 -0.310 -0.363
#>
#> Repeated Measure Correlation Structure: Compound symmetry
#> Formula: ~0 | plot
#> Parameter estimate(s):
#> Rho
#> 0.3126024
#>
#>
#> Table: Fixed Effect Coefficients
#>
#> Beta Std. Error t-value p-value Signif
#> ------------------- -------- ----------- -------- ----------- -------
#> funcY1:time1:p1_ID -1.364 0.397 -3.431 0.0006143 ***
#> funcY2:time1:p1_ID +0.594 0.384 1.549 0.1216
#> funcY3:time1:p1_ID +0.915 0.401 2.283 0.02253 *
#> funcY1:time2:p1_ID +0.202 0.397 0.509 0.6111
#> funcY2:time2:p1_ID +2.666 0.384 6.947 5.054e-12 ***
#> funcY3:time2:p1_ID -0.542 0.401 -1.352 0.1767
#> funcY1:time1:p2_ID +4.810 0.368 13.062 1.822e-37 ***
#> funcY2:time1:p2_ID +4.523 0.355 12.737 8.871e-36 ***
#> funcY3:time1:p2_ID +6.675 0.371 17.977 4.537e-67 ***
#> funcY1:time2:p2_ID +5.052 0.368 13.720 5.335e-41 ***
#> funcY2:time2:p2_ID +2.767 0.355 7.792 1.056e-14 ***
#> funcY3:time2:p2_ID +6.811 0.371 18.343 1.514e-69 ***
#> funcY1:time1:p3_ID +2.711 0.399 6.790 1.48e-11 ***
#> funcY2:time1:p3_ID -0.498 0.384 -1.298 0.1945
#> funcY3:time1:p3_ID +3.288 0.403 8.160 5.882e-16 ***
#> funcY1:time2:p3_ID +4.467 0.399 11.187 3.234e-28 ***
#> funcY2:time2:p3_ID -3.299 0.384 -8.597 1.633e-17 ***
#> funcY3:time2:p3_ID +3.017 0.403 7.489 1.04e-13 ***
#> funcY1:time1:p4_ID -0.853 0.449 -1.897 0.05792 +
#> funcY2:time1:p4_ID +0.543 0.437 1.242 0.2143
#> funcY3:time1:p4_ID +4.890 0.452 10.811 1.649e-26 ***
#> funcY1:time2:p4_ID -2.469 0.449 -5.493 4.453e-08 ***
#> funcY2:time2:p4_ID +2.023 0.437 4.628 3.926e-06 ***
#> funcY3:time2:p4_ID +3.562 0.452 7.875 5.58e-15 ***
#> funcY1:time1:AV +2.560 0.905 2.828 0.004729 **
#> funcY2:time1:AV +4.283 0.529 8.091 1.021e-15 ***
#> funcY3:time1:AV +6.424 0.998 6.436 1.531e-10 ***
#> funcY1:time2:AV +31.946 0.905 35.291 3.302e-212 ***
#> funcY2:time2:AV +2.885 0.529 5.450 5.669e-08 ***
#> funcY3:time2:AV +19.229 0.998 19.264 5.916e-76 ***
#>
#> Signif codes: 0-0.001 '***', 0.001-0.01 '**', 0.01-0.05 '*', 0.05-0.1 '+', 0.1-1.0 ' '
#>
#> Degrees of freedom: 2016 total; 1986 residual
#> Residual standard error: 1.948135
#>
#> $Multivariate
#> Marginal variance covariance matrix
#> [,1] [,2] [,3]
#> [1,] 4.9095 2.3475 -1.3752
#> [2,] 2.3475 3.0305 -1.2650
#> [3,] -1.3752 -1.2650 4.0087
#> Standard Deviations: 2.2157 1.7408 2.0022
#>
#> $`Repeated Measure`
#> Marginal variance covariance matrix
#> [,1] [,2]
#> [1,] 4.6330 1.4483
#> [2,] 1.4483 4.6330
#> Standard Deviations: 2.1524 2.1524
#>
#> $Combined
#> Marginal variance covariance matrix
#> Y1:1 Y1:2 Y2:1 Y2:2 Y3:1 Y3:2
#> Y1:1 3.79520 1.18640 2.30980 0.72204 -1.17640 -0.36776
#> Y1:2 1.18640 3.79520 0.72204 2.30980 -0.36776 -1.17640
#> Y2:1 2.30980 0.72204 3.79520 1.18640 -1.37740 -0.43059
#> Y2:2 0.72204 2.30980 1.18640 3.79520 -0.43059 -1.37740
#> Y3:1 -1.17640 -0.36776 -1.37740 -0.43059 3.79520 1.18640
#> Y3:2 -0.36776 -1.17640 -0.43059 -1.37740 1.18640 3.79520
#> Standard Deviations: 1.9481 1.9481 1.9481 1.9481 1.9481 1.9481
To predict from this model, which has the custom class DImulti, we use the predict() function, called as predict(object, newdata = NULL, stacked = TRUE). Here, object is the DImulti model object. newdata is a dataframe or tibble containing the community designs that you wish to predict from; if it is left NULL, the data used to train the model is predicted from instead. stacked is a boolean that determines whether the output is returned in a stacked/long format (TRUE) or a wide format (FALSE).
The simplest option is to pass only the model object, predict(modelFinal), which predicts from the dataframe used to train the model (simMVRM). By default, the predictions are returned in a stacked format, as this is more commonly used for plotting than a wide format. The first six rows are shown below.
#> plot Yvalue Ytype
#> 1 1 -1.3637130 Y1:1
#> 2 1 0.2021854 Y1:2
#> 3 1 0.5944749 Y2:1
#> 4 1 2.6663312 Y2:2
#> 5 1 0.9148420 Y3:1
#> 6 1 -0.5415428 Y3:2
If we would rather have a wide output, which can be easier to read without plotting, we can set stacked = FALSE.
#> plot Y1:1 Y1:2 Y2:1 Y2:2 Y3:1 Y3:2
#> 1 1 -1.363713 0.2021854 0.5944749 2.666331 0.914842 -0.5415428
#> 2 2 -1.363713 0.2021854 0.5944749 2.666331 0.914842 -0.5415428
#> 3 3 -1.363713 0.2021854 0.5944749 2.666331 0.914842 -0.5415428
#> 4 4 4.809671 5.0522785 4.5225251 2.766658 6.675103 6.8106584
#> 5 5 4.809671 5.0522785 4.5225251 2.766658 6.675103 6.8106584
#> 6 6 4.809671 5.0522785 4.5225251 2.766658 6.675103 6.8106584
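The stacked and wide layouts hold the same predictions. As a minimal sketch in base R (it does not call the package), the six stacked rows printed for plot 1 can be reshaped into the corresponding wide row with reshape(). The data frame here is typed in by hand from the output above rather than produced by predict().

```r
# Stacked predictions for plot 1, copied by hand from the printed output
stacked <- data.frame(
  plot   = rep(1, 6),
  Yvalue = c(-1.3637130, 0.2021854, 0.5944749, 2.6663312, 0.9148420, -0.5415428),
  Ytype  = c("Y1:1", "Y1:2", "Y2:1", "Y2:2", "Y3:1", "Y3:2")
)

# One row per plot, one column per function:time combination
wide <- reshape(stacked, idvar = "plot", timevar = "Ytype",
                v.names = "Yvalue", direction = "wide")

# reshape() prefixes the new columns with "Yvalue."; drop that prefix
names(wide) <- sub("^Yvalue\\.", "", names(wide))
wide
```

The resulting columns (plot, Y1:1, Y1:2, Y2:1, Y2:2, Y3:1, Y3:2) match the layout returned when stacked = FALSE.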
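We can also pass a subset of the original dataset through newdata rather than using all of it. In the output below, only rows from the first time point were supplied, so only that time point is predicted.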
#> plot Yvalue Ytype
#> 1 1 -1.3637130 Y1:1
#> 2 1 0.5944749 Y2:1
#> 3 1 0.9148420 Y3:1
#> 4 4 4.8096710 Y1:1
#> 5 4 4.5225251 Y2:1
#> 6 4 6.6751033 Y3:1
#> 7 7 2.7111060 Y1:1
#> 8 7 -0.4979964 Y2:1
#> 9 7 3.2877219 Y3:1
#> 10 10 -0.8527184 Y1:1
#> 11 10 0.5428615 Y2:1
#> 12 10 4.8902225 Y3:1
#> 13 21 2.9630332 Y1:1
#> 14 21 3.0631656 Y2:1
#> 15 21 5.9112410 Y3:1
Alternatively, we can use entirely new data that follows the same format as simMVRM. If newdata does not specify which ecosystem functions or time points to predict for, all of them are included automatically.
newSim <- data.frame(plot = c(1, 2),
                     p1 = c(0.25, 0.6),
                     p2 = c(0.25, 0.2),
                     p3 = c(0.25, 0.1),
                     p4 = c(0.25, 0.1))
predict(modelFinal, newdata = newSim)
#> plot Yvalue Ytype
#> 1 1 2.368193 Y1:1
#> 2 1 14.817156 Y1:2
#> 3 1 3.033833 Y2:1
#> 4 1 2.213418 Y2:2
#> 5 1 6.557144 Y3:1
#> 6 1 11.039639 Y3:2
#> 7 2 1.134829 Y1:1
#> 8 2 11.380369 Y1:2
#> 9 2 2.612857 Y2:1
#> 10 2 2.932915 Y2:2
#> 11 2 4.722584 Y3:1
#> 12 2 7.743832 Y3:2
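These values can be checked by hand from the printed coefficients. The sketch below assumes that the AV term is the sum over all species pairs of (p_i * p_j)^theta, an assumption that is consistent with the numbers shown in this vignette. The helper names av_term() and predict_by_hand() are illustrative and are not part of the package. The coefficients and theta estimate are the rounded Y1, time 1 values from the model summary.

```r
# Assumed form of the average interaction term:
# sum over all species pairs i < j of (p_i * p_j)^theta
av_term <- function(p, theta) {
  pairs <- combn(length(p), 2)  # each column holds one pair (i, j)
  sum((p[pairs[1, ]] * p[pairs[2, ]])^theta)
}

# Rounded Y1, time 1 estimates copied from the printed model summary
beta_id  <- c(p1 = -1.364, p2 = 4.810, p3 = 2.711, p4 = -0.853)
beta_av  <- 2.560
theta_y1 <- 0.9704

# Identity effects weighted by the proportions, plus the AV contribution
predict_by_hand <- function(p) {
  sum(beta_id * p) + beta_av * av_term(p, theta_y1)
}

even   <- c(0.25, 0.25, 0.25, 0.25)  # plot 1 in newSim
uneven <- c(0.6, 0.2, 0.1, 0.1)      # plot 2 in newSim
hand <- c(predict_by_hand(even), predict_by_hand(uneven))
round(hand, 2)
```

The results agree with the Y1:1 predictions above (2.368193 and 1.134829) up to the rounding of the copied coefficients.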
If newdata does specify ecosystem functions or time points, only those are predicted. As our dataset is in a wide format, we request an ecosystem function by adding a column with its name, filled with an arbitrary value (here, Y1 = 0).
newSim <- data.frame(plot = c(1, 2),
                     p1 = c(0.25, 0.6),
                     p2 = c(0.25, 0.2),
                     p3 = c(0.25, 0.1),
                     p4 = c(0.25, 0.1),
                     Y1 = 0)
predict(modelFinal, newdata = newSim)
#> plot Yvalue Ytype
#> 1 1 2.368193 Y1:1
#> 2 1 14.817156 Y1:2
#> 3 2 1.134829 Y1:1
#> 4 2 11.380369 Y1:2
If some required information is missing from newdata, the function tries to set a value for the missing column and informs the user through a warning printed to the console.
newSim <- data.frame(p1 = c(0.25, 0.6),
                     p2 = c(0.25, 0.2),
                     p3 = c(0.25, 0.1),
                     p4 = c(0.25, 0.1))
predict(modelFinal, newdata = newSim)
#> Warning in predict.DImulti(modelFinal, newdata = newSim): The column containing
#> unit_IDs has not been supplied through newdata. This column is required as a
#> grouping factor for the covarying responses, although its value does not matter
#> as there is no between subject effect included. Defaulting to row numbers.
#> plot Yvalue Ytype
#> 1 1 2.368193 Y1:1
#> 2 1 14.817156 Y1:2
#> 3 1 3.033833 Y2:1
#> 4 1 2.213418 Y2:2
#> 5 1 6.557144 Y3:1
#> 6 1 11.039639 Y3:2
#> 7 2 1.134829 Y1:1
#> 8 2 11.380369 Y1:2
#> 9 2 2.612857 Y2:1
#> 10 2 2.932915 Y2:2
#> 11 2 4.722584 Y3:1
#> 12 2 7.743832 Y3:2
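To avoid the warning, the identifier column can be added explicitly before predicting. A minimal sketch in base R, assuming the unit_IDs column is named plot as in simMVRM; it simply mirrors the row-number default described in the warning:

```r
newSim <- data.frame(p1 = c(0.25, 0.6),
                     p2 = c(0.25, 0.2),
                     p3 = c(0.25, 0.1),
                     p4 = c(0.25, 0.1))

# One unique identifier per row, equivalent to the row-number default
newSim$plot <- seq_len(nrow(newSim))
newSim
```

Passing this version of newSim to predict() supplies the grouping column directly, as in the earlier examples that included plot.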
You may wish to merge your predictions with your newdata dataframe for plotting, printing, or further analysis. Because DImulti(), and consequently predict.DImulti(), sorts the data it is given to ensure proper labelling, you may not be able to use cbind() directly to append the predictions to your dataset. Instead, ensure that the unit_IDs column contains a unique identifier for each row of your data, set stacked to match your data layout, and then use merge().
newSim <- data.frame(plot = c(1, 2),
                     p1 = c(0.25, 0.6),
                     p2 = c(0.25, 0.2),
                     p3 = c(0.25, 0.1),
                     p4 = c(0.25, 0.1))
preds <- predict(modelFinal, newdata = newSim, stacked = FALSE)
merge(newSim, preds, by = "plot")
#> plot p1 p2 p3 p4 Y1:1 Y1:2 Y2:1 Y2:2 Y3:1
#> 1 1 0.25 0.25 0.25 0.25 2.368193 14.81716 3.033833 2.213418 6.557144
#> 2 2 0.60 0.20 0.10 0.10 1.134829 11.38037 2.612857 2.932915 4.722584
#> Y3:2
#> 1 11.039639
#> 2 7.743832
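The risk with cbind() can be illustrated without the package. In the base R sketch below, preds is a hand-typed stand-in for wide predictions that are assumed to come back sorted by plot, while newSim lists the plots in a different order. The values are copied from the output above; nothing here calls predict().

```r
# New data whose rows are not in sorted plot order
newSim <- data.frame(plot = c(2, 1),
                     p1   = c(0.6, 0.25))

# Stand-in for the wide predictions, assumed to be sorted by plot
preds <- data.frame(plot   = c(1, 2),
                    `Y1:1` = c(2.368193, 1.134829),
                    check.names = FALSE)

bound  <- cbind(newSim, "Y1:1" = preds[["Y1:1"]])  # pairs rows by position
merged <- merge(newSim, preds, by = "plot")         # pairs rows by plot

bound[bound$plot == 2, "Y1:1"]    # value belonging to plot 1: mismatched
merged[merged$plot == 2, "Y1:1"]  # value belonging to plot 2: matched
```

Because merge() matches on the identifier rather than on row position, the result is correct regardless of the order in which either dataframe is sorted.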
If your newdata contains non-unique unit_IDs values and stacked = FALSE, any rows that share a unit_IDs value are aggregated using the mean() function.
newSim <- data.frame(plot = c(1, 1),
                     p1 = c(0.25, 0.6),
                     p2 = c(0.25, 0.2),
                     p3 = c(0.25, 0.1),
                     p4 = c(0.25, 0.1))
predict(modelFinal, newdata = newSim, stacked = FALSE)
#> plot Y1:1 Y1:2 Y2:1 Y2:2 Y3:1 Y3:2
#> 1 1 1.751511 13.09876 2.823345 2.573166 5.639864 9.391736
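This aggregation can be reproduced from the earlier stacked predictions for the two communities. The base R sketch below types those twelve values in by hand, gives both communities the same plot identifier, and averages within each function:time combination using tapply(); it is an illustration of the arithmetic and not the package's internal code.

```r
# Stacked predictions for the two communities, copied from the earlier output,
# with both rows assigned to plot 1
stacked <- data.frame(
  plot   = 1,
  Ytype  = rep(c("Y1:1", "Y1:2", "Y2:1", "Y2:2", "Y3:1", "Y3:2"), times = 2),
  Yvalue = c(2.368193, 14.817156, 3.033833, 2.213418, 6.557144, 11.039639,
             1.134829, 11.380369, 2.612857, 2.932915, 4.722584, 7.743832)
)

# Mean prediction within each function:time combination
means <- tapply(stacked$Yvalue, stacked$Ytype, mean)
round(means, 4)
```

For example, the Y1:1 entry is (2.368193 + 1.134829) / 2 = 1.751511, matching the wide output above.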