The present document shows how to conduct sensitivity analyses and calibration exercises on the simulation models included in package medfate. The document is written assuming that the user is familiar with the basic water balance model (i.e. function spwb()). The aims of the exercises presented here are to determine which spwb() model parameters are most influential in determining stand transpiration and plant drought stress, and to illustrate how model parameters can be calibrated against observed data.
As an example we will use the same data sets provided to illustrate simulation functions in medfate. The forest data set consists of two tree species (Pinus halepensis/T1_54 and Quercus ilex/T2_68) and one shrub species (Quercus coccifera/S1_65, or kermes oak).
We begin by loading the package and the example forest data:
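# load the package and print the example forest object used throughout this document
library(medfate)
data(exampleforestMED)
exampleforestMED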
## Loading required package: sp
## $ID
## [1] "1"
##
## $patchsize
## [1] 10000
##
## $treeData
## Species N DBH Height Z50 Z95
## 1 54 168 37.55 800 750 3000
## 2 68 384 14.60 660 750 3000
##
## $shrubData
## Species Cover Height Z50 Z95
## 1 65 3.75 80 300 1500
##
## $herbCover
## [1] 10
##
## $herbHeight
## [1] 20
##
## attr(,"class")
## [1] "forest" "list"
We also load the example weather data set and the default species parametrization:
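data(examplemeteo)
data(SpParamsMED)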
We then initialize a soil with four layers (default values of texture, bulk density and rock content) and the species input parameters for simulation function spwb():
examplesoil1 = soil(defaultSoilParams(4))
x1 = forest2spwbInput(exampleforestMED,examplesoil1, SpParamsMED, control = defaultControl())
Although it is not necessary, we make an initial call to the model (spwb()) with the default parameter settings:
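# initial run with default parameters (same latitude/elevation as used in the calls below)
S1 = spwb(x1, examplesoil1, examplemeteo, latitude = 41.82592, elevation = 100)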
## Initial soil water content (mm): 291.257
## Initial snowpack content (mm): 0
## Performing daily simulations .....................................done.
## Final soil water content (mm): 271.212
## Final snowpack content (mm): 0
## Change in soil water content (mm): -20.0449
## Soil water balance result (mm): -20.0449
## Change in snowpack water content (mm): 0
## Snowpack water balance result (mm): 0
## Water balance components:
## Precipitation (mm) 513
## Rain (mm) 462 Snow (mm) 51
## Interception (mm) 86 Net rainfall (mm) 376
## Infiltration (mm) 418 Runoff (mm) 10 Deep drainage (mm) 100
## Soil evaporation (mm) 45 Transpiration (mm) 293
Function spwb() will be implicitly called multiple times in the sensitivity and calibration analyses that we illustrate below.
Model sensitivity analyses are used to investigate how variation in the output of a numerical model can be attributed to variations of its input factors. Input factors are elements that can be changed before model execution and may affect its output. They can be model parameters, initial values of state variables, boundary conditions or the input forcing data (Pianosi et al. 2016).
According to Saltelli et al. (2016), there are three main purposes of sensitivity analyses: ranking (i.e. factor prioritization), screening (i.e. factor fixing) and mapping. Here we will be mostly interested in ranking parameters according to different objectives. We will take as input factors three plant traits (leaf area index, fine root distribution and the water potential corresponding to a reduction in plant conductance) in the three plant cohorts (species), so that nine model parameters will be studied. The following shows the initial values of those parameters, accessed here through the corresponding spwbInput slots (slot names may differ between medfate versions):
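# leaf area index, fine root distribution and extraction water potential of the three cohorts
x1$above$LAI_live
x1$below$Z50
x1$paramsTranspiration$Psi_Extract  # slot name assumed; may differ in older medfate versions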
## [1] 0.8167012 0.7977952 0.0653033
## [1] 750 750 300
## [1] -2 -3 -4
In the following code we define a vector of parameter names (using the naming rules of function modifyInputParams()) as well as the input variability space, defined by the minimum and maximum parameter values:
#Parameter names of interest
parNames = c("T1_54/LAI_live", "T2_68/LAI_live", "S1_65/LAI_live",
"T1_54/Z50", "T2_68/Z50", "S1_65/Z50",
"T1_54/Psi_Extract", "T2_68/Psi_Extract", "S1_65/Psi_Extract")
#Parameter minimum and maximum values
parMin = c(0.1,0.1,0.1,
100,100,100,
-7,-7,-7)
parMax = c(2,2,2,
1000,1000,1000,
-1,-1,-1)
In sensitivity analyses, model output is summarized into a single variable whose variation is to be analyzed. Pianosi et al. (2016) distinguish two types of model output functions: those that summarize model predictions and those that measure model performance by comparing predictions against observations. Here we will use examples of both kinds. First, we define a function that, given a simulation result, calculates total transpiration (mm) over the simulated period (one year); a minimal definition is sketched below:
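# total stand transpiration (mm), taken from the daily water balance output of spwb()
sf_transp<-function(x) {
  sum(x$WaterBalance$Transpiration, na.rm = TRUE)
}
sf_transp(S1)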
## [1] 292.5647
Another prediction function can focus on plant drought stress. We define a function that, given a simulation result, calculates the average drought stress of plants (measured using the water stress index) over the simulated period:
sf_stress<-function(x) {
lai <- x$spwbInput$above$LAI_live
lai_p <- lai/sum(lai)
stress <- spwb_stress(x, index="WSI", draw = F)
mean(sweep(stress,2, lai_p, "*"), na.rm=T)
}
sf_stress(S1)
## [1] 18.32941
Sensitivity analysis requires model output functions whose parameters are the input factors to be studied. \[\begin{equation}
y = g(\mathbf{x}) = g(x_1, x_2, \dots, x_n)
\end{equation}\] where \(y\) is the output, \(g\) is the output function and \(\mathbf{x} = \{x_1, x_2, \dots, x_n\}\) is the vector of input factors. Functions sf_transp() and sf_stress() take simulation results as input, not values of input factors. Instead, we need to define functions that take trait values as input, run the soil plant water balance model and return the desired prediction or performance statistic. Such functions can be generated using the function factory optimization_function(). The following code defines one such function, focusing on total transpiration:
of_transp<-optimization_function(parNames = parNames,
x = x1, soil = examplesoil1,
meteo = examplemeteo,
latitude = 41.82592, elevation = 100,
summary_function = sf_transp)
Note that we provided all the data needed for simulations as input to optimization_function(), as well as the names of the parameters to study and the summary function sf_transp. The resulting object of_transp is itself a function, which we can call with parameter values (or sets of parameter values) as input, for example at the parameter boundaries defined above:
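# calls at the lower and upper parameter boundaries (assumed here)
of_transp(parMin)
of_transp(parMax)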
## [1] 55.93924
## [1] 359.395
It is important to understand the steps that take place when we call of_transp():
1. of_transp() calls spwb() using all the parameters specified in its construction (i.e. in the call to the function factory), except for the input factors indicated in parNames, which are supplied at the time of calling of_transp().
2. The simulation result is then passed to sf_transp(), and the output of this last function is returned as the output of of_transp().
We can build a similar model output function, in this case focusing on plant stress. Note that the only difference in the call to the factory is the specification of sf_stress as summary function, instead of sf_transp.
of_stress<-optimization_function(parNames = parNames,
x = x1, soil = examplesoil1,
meteo = examplemeteo,
latitude = 41.82592, elevation = 100,
summary_function = sf_stress)
of_stress(parMin)
## [1] 0.7265393
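of_stress(parMax)  # upper-boundary call (assumed)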
## [1] 108.5529
As mentioned above, another kind of output function is the evaluation of model performance. Here we will assume that performance in terms of the predictability of soil water content is desired, and use a data set of ‘observed’ values (actually simulated values with Gaussian error) as reference:
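# 'observed' data set bundled with medfate (simulated values with added error)
data(exampleobs)
head(exampleobs)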
## SWC ETR E_T1_54 E_T2_68 FMC_T1_54 FMC_T2_68
## 2001-01-01 0.3000484 2.213630 0.1405273 0.17762272 114.5560 80.86158
## 2001-01-02 0.3021858 2.557506 0.3274197 0.31564033 114.5929 81.10532
## 2001-01-03 0.3018332 1.028869 0.2427104 0.17259014 114.8712 80.78809
## 2001-01-04 0.3008440 1.865832 0.1386069 0.09992405 114.4544 81.17683
## 2001-01-05 0.3014843 1.922079 0.4472215 0.22262168 114.8581 81.11235
## 2001-01-06 0.3016900 2.426368 0.1628993 0.26955241 114.5257 80.63737
where the soil water content dynamics are in column SWC. The model fit to observed data can be measured using the Nash-Sutcliffe efficiency (NSE), which we calculate for the initial run using function evaluation_metric():
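# NSE of simulated vs. 'observed' soil water content for the initial run
evaluation_metric(S1, exampleobs, type = "SWC", metric = "NSE")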
## [1] 0.9324636
A call to evaluation_metric() provides the coefficient given a model simulation result, but it is not a model output function as we defined above. Analogously to the measures of total transpiration and average plant stress, we can use a function factory to define a model output function that takes input factors as inputs, runs the model and performs the evaluation:
of_eval<-optimization_evaluation_function(parNames = parNames,
x = x1, soil = examplesoil1,
meteo = examplemeteo, latitude = 41.82592, elevation = 100,
measuredData = exampleobs, type = "SWC",
metric = "NSE")
Function of_eval() stores internally both the data needed for conducting simulations and the data needed for evaluating simulation results, so that we only need to provide values for the input factors:
## [1] 0.9128893
## [1] 0.8316012
Sensitivity analysis is referred to as local or global, depending on whether the variation of input factors is studied with respect to some initial parameter set (local) or the whole space of input factors is taken into account (global). Here we will conduct global sensitivity analyses using package sensitivity (Iooss et al. 2020):
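library(sensitivity)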
## Registered S3 method overwritten by 'sensitivity':
## method from
## print.src dplyr
This package provides a suite of approaches to global sensitivity analysis. Among them, we will follow the Elementary Effects Test implemented in function morris(). We call this function to analyze the sensitivity of total transpiration simulated by spwb() to the input factors (500 model runs are performed, so be patient):
sa_transp <- morris(of_transp, parNames, r = 50,
design = list(type = "oat", levels = 10, grid.jump = 3),
binf = parMin, bsup = parMax, scale=TRUE, verbose=FALSE)
Apart from indicating the design used to sample the input factor space, the call to morris() includes the model output function (in our case of_transp), the parameter names and the parameter value boundaries (i.e. parMin and parMax).
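Printing the result displays, for each input factor, the Morris sensitivity statistics mu, mu.star and sigma:
print(sa_transp)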
##
## Call:
## morris(model = of_transp, factors = parNames, r = 50, design = list(type = "oat", levels = 10, grid.jump = 3), binf = parMin, bsup = parMax, scale = TRUE, verbose = FALSE)
##
## Model runs: 500
## mu mu.star sigma
## T1_54/LAI_live 88.560833 97.432304 96.641969
## T2_68/LAI_live 115.428602 115.738104 95.599569
## S1_65/LAI_live 119.938300 119.938300 90.217958
## T1_54/Z50 -24.852219 33.057904 35.644799
## T2_68/Z50 -26.408773 33.898221 32.100491
## S1_65/Z50 -15.704972 17.950179 18.110924
## T1_54/Psi_Extract -7.245209 7.245209 7.503915
## T2_68/Psi_Extract -10.524463 10.524463 11.323048
## S1_65/Psi_Extract -5.432094 5.432094 5.443987
mu.star values inform about the mean of the absolute elementary effects of the \(i\)-th factor and can be used to rank all the input factors, whereas sigma informs about the degree of interaction of the \(i\)-th factor with others. According to the results of this sensitivity analysis, the leaf area index (LAI_live) parameters are the most relevant in determining total transpiration, much more so than fine root distribution (Z50) and the water potentials corresponding to whole-plant conductance reduction (i.e. Psi_Extract).
We can run the same sensitivity analysis but focusing on the input factors relevant for predicted plant drought stress (i.e. using of_stress as the model output function):
sa_stress <- morris(of_stress, parNames, r = 50,
design = list(type = "oat", levels = 10, grid.jump = 3),
binf = parMin, bsup = parMax, scale=TRUE, verbose=FALSE)
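print(sa_stress)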
##
## Call:
## morris(model = of_stress, factors = parNames, r = 50, design = list(type = "oat", levels = 10, grid.jump = 3), binf = parMin, bsup = parMax, scale = TRUE, verbose = FALSE)
##
## Model runs: 500
## mu mu.star sigma
## T1_54/LAI_live 187.85982 188.19217 109.29336
## T2_68/LAI_live 184.08751 184.08751 119.81780
## S1_65/LAI_live 157.01856 157.01856 97.04261
## T1_54/Z50 42.12187 43.49016 35.80496
## T2_68/Z50 59.57615 61.71122 61.79803
## S1_65/Z50 42.03816 44.59840 53.03594
## T1_54/Psi_Extract -114.42951 114.42951 84.05694
## T2_68/Psi_Extract -82.73608 82.73608 65.15790
## S1_65/Psi_Extract -61.00125 61.00125 58.56132
Again, the LAI parameters are the most relevant, but they are closely followed by the water potentials corresponding to whole-plant conductance reduction (i.e. Psi_Extract), which appear more relevant than the fine root distribution parameters (Z50).
Finally, we can study the contribution of input factors to model performance in terms of soil water content dynamics (i.e. using of_eval as the model output function):
sa_eval <- morris(of_eval, parNames, r = 50,
design = list(type = "oat", levels = 10, grid.jump = 3),
binf = parMin, bsup = parMax, scale=TRUE, verbose=FALSE)
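print(sa_eval)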
##
## Call:
## morris(model = of_eval, factors = parNames, r = 50, design = list(type = "oat", levels = 10, grid.jump = 3), binf = parMin, bsup = parMax, scale = TRUE, verbose = FALSE)
##
## Model runs: 500
## mu mu.star sigma
## T1_54/LAI_live -56.971813 57.002351 69.319800
## T2_68/LAI_live -36.656677 37.390494 52.514755
## S1_65/LAI_live -24.121369 24.129905 28.433808
## T1_54/Z50 70.768482 70.798509 81.853942
## T2_68/Z50 60.500445 60.500445 71.047382
## S1_65/Z50 35.728383 35.739419 53.786266
## T1_54/Psi_Extract 4.315353 4.315353 9.302024
## T2_68/Psi_Extract 3.744051 3.744051 7.349683
## S1_65/Psi_Extract 1.425251 1.425251 2.988838
Contrary to the previous cases, the contribution of LAI parameters is similar to that of the fine root distribution parameters (Z50), and both appear much more relevant than the water potentials corresponding to whole-plant conductance reduction (i.e. Psi_Extract).
By model calibration we mean here the process of finding suitable parameter values (or suitable parameter distributions) given a set of observations. Hence, the idea is to optimize the correspondence between model predictions and observations by changing model parameter values.
To simplify our analysis and avoid problems of parameter identifiability, we focus here on the calibration of the fine root distribution parameter Z50. Below we redefine the vectors parNames, parMin and parMax, and we specify a vector of initial values:
#Parameter names of interest
parNames = c("T1_54/Z50", "T2_68/Z50", "S1_65/Z50")
#Parameter minimum and maximum values
parMin = c(100,100,100)
parMax = c(1000,1000,1000)
parIni = x1$below$Z50
In order to run calibration analyses we need to define an objective function. Many evaluation metrics could be used, but it is common practice to use likelihood functions. We can use the function factory optimization_evaluation_function() and the ‘observed’ data to this aim, but in this case we specify a log-likelihood with Gaussian error as the evaluation metric for of_eval(). A sketch of this call is shown below (the metric name is an assumption; see evaluation_metric() for the available options):
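of_eval<-optimization_evaluation_function(parNames = parNames,
              x = x1, soil = examplesoil1,
              meteo = examplemeteo, latitude = 41.82592, elevation = 100,
              measuredData = exampleobs, type = "SWC",
              metric = "loglikelihood") # assumed metric name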
Model calibration can be performed using a broad range of approaches. Many of them (simulated annealing, genetic algorithms, gradient methods, etc.) focus on the maximization or minimization of the objective function. To illustrate this common approach, we will use function optim from package stats, which provides several optimization methods. In particular, we will use “L-BFGS-B”, the “BFGS” quasi-Newton method published by Broyden, Fletcher, Goldfarb and Shanno, modified to accept minimum and maximum boundaries. By default, function optim performs a minimization of the objective function (here of_eval), but we can specify a negative value for the control parameter fnscale to turn the process into a maximization of the log-likelihood:
opt_cal = optim(parIni, of_eval, method = "L-BFGS-B",
                lower = parMin, upper = parMax,
                control = list(fnscale = -1), verbose = FALSE)
The calibration result is the following:
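opt_cal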
## $par
## [1] 749.3816 749.3827 299.8915
##
## $value
## [1] 1321.386
##
## $counts
## function gradient
## 4 4
##
## $convergence
## [1] 0
##
## $message
## [1] "CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH"
Note that the optimized parameters are close to the Z50 values in the original x1:
## Z50 opt_cal$par
## T1_54 750 749.3816
## T2_68 750 749.3827
## S1_65 300 299.8915
This occurs because these default values were used to generate the ‘observed’ data in exampleobs, which contains a small amount of non-systematic error.
As an example of a more sophisticated model calibration, we will conduct a Bayesian calibration analysis using package BayesianTools (Hartig et al. 2019):
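library(BayesianTools)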
In a Bayesian analysis one evaluates how the uncertainty in model parameters changes (hopefully decreases) after observing some data, because observed values do not have the same likelihood under all regions of the parameter space. For a Bayesian analysis we need to specify a (log-)likelihood function and the prior distribution (i.e. the initial uncertainty) of the input factors. The central object in the BayesianTools package is the BayesianSetup. This class, created by calls to createBayesianSetup(), contains the information about the model to be fit (the likelihood) and the priors for the model parameters. In the absence of previous data, we specify a uniform distribution between the minimum and maximum values, which in the BayesianTools package can be done using function createUniformPrior():
prior <- createUniformPrior(parMin, parMax, parIni)
mcmc_setup <- createBayesianSetup(likelihood = of_eval,
prior = prior,
names = parNames)
Function createBayesianSetup() automatically creates the posterior and various convenience functions for the Markov Chain Monte Carlo (MCMC) samplers. The runMCMC() function is the main wrapper for all other implemented MCMC functions. Here we call it with three chains of 3000 iterations each:
mcmc_out <- runMCMC(
bayesianSetup = mcmc_setup,
sampler = "DEzs",
settings = list(iterations = 3000, nrChains = 3))
Although parallel computation can be used, the calibration process is rather slow.
A summary function is provided to inspect convergence results and correlation between parameters:
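summary(mcmc_out)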
## # # # # # # # # # # # # # # # # # # # # # # # # #
## ## MCMC chain summary ##
## # # # # # # # # # # # # # # # # # # # # # # # # #
##
## # MCMC sampler: DEzs
## # Nr. Chains: 9
## # Iterations per chain: 1000
## # Rejection rate: 0.815
## # Effective sample size: 381
## # Runtime: 1621.505 sec.
##
## # Parameters
## psf MAP 2.5% median 97.5%
## T1_54/Z50 1.043 627.839 580.046 741.939 989.060
## T2_68/Z50 1.035 889.719 569.645 734.803 981.448
## S1_65/Z50 1.030 480.428 121.669 555.615 965.331
##
## ## DIC: -2633.085
## ## Convergence
## Gelman Rubin multivariate psrf: 1.049
##
## ## Correlations
## T1_54/Z50 T2_68/Z50 S1_65/Z50
## T1_54/Z50 1.000 -0.884 -0.097
## T2_68/Z50 -0.884 1.000 -0.066
## S1_65/Z50 -0.097 -0.066 1.000
According to the Gelman-Rubin diagnostic, convergence can be accepted because the multivariate potential scale reduction factor (psrf) is below 1.1. We can plot the Markov chains, and the posterior density distributions of the parameters they generate, using:
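# trace plots of the chains and marginal posterior densities
plot(mcmc_out)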
We can also plot the marginal prior and posterior density distributions for each parameter (see the call sketched after this paragraph). In this case, we see a similar Z50 posterior distribution for the two trees, which is more informative than the prior distribution. In contrast, the posterior distribution of Z50 for the kermes oak remains as uncertain as the prior one. This happens because the LAI value of kermes oak is low, so that it has little influence on soil water dynamics regardless of its root distribution.
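# marginal densities with the prior overlaid; marginalPlot() from BayesianTools is assumed here
marginalPlot(mcmc_out, prior = TRUE)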
Plots can also be produced to display the correlation between parameter values.
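# pairwise parameter correlations (BayesianTools helper assumed)
correlationPlot(mcmc_out)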
Here, the strong negative correlation between the Z50 values of the two tree cohorts can be observed. Since their LAI values are similar, a similar effect on soil water depletion can be obtained, to some extent, by exchanging their fine root distributions.
Posterior model prediction distributions can be obtained if we take samples from the Markov chains and use them to perform simulations (here we use a sample size of 99, but a larger value would be preferable):
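# draw parameter samples from the posterior; column names come from the Bayesian setup
s = getSample(mcmc_out, numSamples = 99)
head(s)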
## T1_54/Z50 T2_68/Z50 S1_65/Z50
## [1,] 957.5108 342.2688 475.9529
## [2,] 803.5055 250.0038 610.7945
## [3,] 979.6384 379.2092 951.8966
## [4,] 634.6976 910.1615 698.5915
## [5,] 161.8533 641.4018 286.4261
## [6,] 641.1469 758.0742 376.1948
To this aim, medfate includes function multiple_runs(), which allows running a simulation model with a matrix of parameter values. For example, the following code runs spwb() with all the combinations of fine root distribution specified in s:
MS = multiple_runs(s, x = x1, soil = examplesoil1, meteo = examplemeteo,
latitude = 41.82592, elevation = 100, verbose = FALSE)
Function multiple_runs() determines the model to be called by inspecting the class of x (here x1 is a spwbInput object). Once we have conducted the simulations, we can inspect the posterior distribution of several prediction variables, for example total transpiration:
plot(density(unlist(lapply(MS, sf_transp))), main = "Posterior transpiration",
xlab = "Total transpiration (mm)")
or average plant drought stress:
plot(density(unlist(lapply(MS, sf_stress))),
xlab = "Average plant stress", main="Posterior stress")
Finally, we can use the prior object to generate another sample, this time under the prior parameter distribution, and perform the corresponding simulations:
s_prior = prior$sampler(99)
colnames(s_prior)<- parNames
MS_prior = multiple_runs(s_prior, x = x1, soil = examplesoil1, meteo = examplemeteo,
latitude = 41.82592, elevation = 100, verbose = FALSE)
and compare the prior prediction uncertainty with the posterior prediction uncertainty for the same output variables:
plot(density(unlist(lapply(MS_prior, sf_transp))), main = "Transpiration",
xlab = "Total transpiration (mm)",
xlim = c(280,295), ylim = c(0,1.2))
lines(density(unlist(lapply(MS, sf_transp))), col = "red")
legend("topleft", legend = c("Prior", "Posterior"),
col = c("black", "red"), lty=1, bty="n")
plot(density(unlist(lapply(MS_prior, sf_stress))), main = "Plant stress",
xlab = "Average plant stress",
xlim = c(0,30), ylim = c(0,0.3))
lines(density(unlist(lapply(MS, sf_stress))), col = "red")
legend("topleft", legend = c("Prior", "Posterior"), col = c("black", "red"), lty=1, bty="n")