EcoDiet explored on a realistic example

The hardware and bandwidth for this mirror is donated by METANET, the Webhosting and Full Service-Cloud Provider.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]metanet.ch.

Heloise Thero, Pierre-Yves Hernvann

2024-03-25

The introduction employed a simplistic expemple of food web to familiarize the user with the basic commands and options of the EcoDiet package. Here we will use a more realistic example (although still artificial!) to run the different EcoDiet configurations, compare their results and hence higlight the complementarity in the different data used.

The data corresponds to 10 trophic groups with stomach content data, and very distinct isotopic measures.

Yes, we are aware that isotopic data is usually messier, but isn’t it a beautiful plot?

X	Cod	Pout	Sardine	Shrimps	Crabs	Bivalves	Worms	Zooplankton
Cod	0	0	0	0	0	0	0	0
Pout	1	0	0	0	0	0	0	0
Sardine	9	0	0	0	0	0	0	0
Shrimps	4	4	29	0	24	0	0	0
Crabs	1	24	0	0	0	0	0	0
Bivalves	0	3	0	0	11	0	0	0
Worms	16	30	0	1	24	0	0	0
Zooplankton	0	27	6	3	0	0	0	0
Phytoplankton	0	0	14	10	0	16	0	20
Detritus	0	0	0	12	19	18	18	0
full	21	30	29	19	29	27	18	20

	group	d13C	d15N
1	Cod	-12.94144	19.18913
2	Cod	-14.96070	20.23939
3	Cod	-13.77822	19.48809
31	Pout	-13.47127	18.57353
32	Pout	-13.16888	17.58714
33	Pout	-14.23085	17.38938
61	Sardine	-14.56111	16.95231
62	Sardine	-15.04729	17.15197
63	Sardine	-14.63688	16.90906

The configuration without literature data

We define the configuration we are in, and preprocess the data:

literature_configuration <- FALSE

data <- preprocess_data(biotracer_data = realistic_biotracer_data,
                        trophic_discrimination_factor = c(0.8, 3.4),
                        literature_configuration = literature_configuration,
                        stomach_data = realistic_stomach_data)
#> The model will investigate the following trophic links:
#>               Bivalves Cod Crabs Detritus Phytoplankton Pout Sardine Shrimps
#> Bivalves             0   0     1        0             0    1       0       0
#> Cod                  0   0     0        0             0    0       0       0
#> Crabs                0   1     0        0             0    1       0       0
#> Detritus             1   0     1        0             0    0       0       1
#> Phytoplankton        1   0     0        0             0    0       1       1
#> Pout                 0   1     0        0             0    0       0       0
#> Sardine              0   1     0        0             0    0       0       0
#> Shrimps              0   1     1        0             0    1       1       0
#> Worms                0   1     1        0             0    1       0       1
#> Zooplankton          0   0     0        0             0    1       1       1
#>               Worms Zooplankton
#> Bivalves          0           0
#> Cod               0           0
#> Crabs             0           0
#> Detritus          1           0
#> Phytoplankton     0           1
#> Pout              0           0
#> Sardine           0           0
#> Shrimps           0           0
#> Worms             0           0
#> Zooplankton       0           0

In this configuration, priors are set for each trophic link identified as plausible by the user but the priors are not informed by literature data, and are thus uninformative:

plot_prior(data, literature_configuration)

The marginal prior distributions have different shape depending on the variables:

it is flat or uniform for \(\eta\), the probabilities that a trophic link exists (all the probabilities of existence are thus equiprobable),
the marginal distributions for each diet proportion \(\Pi\) are peaking at zero, although the joint distribution for \(\Pi\)s is a flat Dirichlet prior, because all the diet proportions must sum to one.

plot_prior(data, literature_configuration, pred = "Pout")

We define the model, and test if it compiles well with a few iterations and adaptation steps:

filename <- "mymodel.txt"
write_model(file.name = filename, literature_configuration = literature_configuration, print.model = F)
mcmc_output <- run_model(filename, data, run_param="test")
#> 
#> Processing function input....... 
#> 
#> Done. 
#>  
#> Compiling model graph
#>    Resolving undeclared variables
#>    Allocating nodes
#> Graph information:
#>    Observed stochastic nodes: 316
#>    Unobserved stochastic nodes: 125
#>    Total graph size: 1104
#> 
#> Initializing model
#> 
#> Adaptive phase, 500 iterations x 3 chains 
#> If no progress bar appears JAGS has decided not to adapt 
#>  
#> 
#>  Burn-in phase, 500 iterations x 3 chains 
#>  
#> 
#> Sampling from joint posterior, 500 iterations x 3 chains 
#>  
#> 
#> Calculating statistics.......
#> Warning in doTryCatch(return(expr), name, parentenv, handler): At least one Rhat
#> value could not be calculated.
#> 
#> Done.
#> 
#>   /!\ Convergence warning:
#> Out of the 51 variables, 14 variables have a Gelman-Rubin statistic > 1.1.
#> You may consider modifying the model run settings.
#> The variables with the poorest convergence are: PI[5,1], PI[4,1], PI[8,2], PI[9,8], PI[7,2], PI[6,2], PI[1,6], PI[10,7], PI[10,8], PI[5,7].
#> JAGS output for model 'mymodel.txt', generated by jagsUI.
#> Estimates based on 3 chains of 1000 iterations,
#> adaptation = 500 iterations (sufficient),
#> burn-in = 500 iterations and thin rate = 1,
#> yielding 1500 total samples from the joint posterior. 
#> MCMC ran for 0.304 minutes at time 2024-03-25 09:50:32.
#> 
#>              mean     sd    2.5%     50%   97.5% overlap0 f  Rhat n.eff
#> eta[4,1]    0.659  0.089   0.478   0.664   0.820    FALSE 1 1.009   228
#> eta[5,1]    0.583  0.090   0.406   0.584   0.754    FALSE 1 1.026    83
#> eta[3,2]    0.089  0.058   0.011   0.078   0.236    FALSE 1 1.006   512
#> eta[6,2]    0.087  0.059   0.011   0.077   0.232    FALSE 1 1.002   752
#> eta[7,2]    0.444  0.101   0.251   0.442   0.644    FALSE 1 1.008   240
#> eta[8,2]    0.212  0.085   0.074   0.202   0.401    FALSE 1 1.002   876
#> eta[9,2]    0.737  0.091   0.536   0.744   0.889    FALSE 1 1.016   128
#> eta[1,3]    0.388  0.085   0.231   0.385   0.558    FALSE 1 1.004   443
#> eta[4,3]    0.650  0.082   0.476   0.652   0.800    FALSE 1 1.001   894
#> eta[8,3]    0.811  0.067   0.666   0.816   0.923    FALSE 1 1.003   625
#> eta[9,3]    0.811  0.068   0.656   0.817   0.921    FALSE 1 1.003  1006
#> eta[1,6]    0.123  0.057   0.037   0.115   0.255    FALSE 1 1.000  1500
#> eta[3,6]    0.784  0.072   0.633   0.789   0.905    FALSE 1 1.001   954
#> eta[8,6]    0.155  0.064   0.055   0.148   0.301    FALSE 1 1.000  1500
#> eta[9,6]    0.970  0.029   0.896   0.978   0.999    FALSE 1 1.006  1111
#> eta[10,6]   0.879  0.055   0.756   0.885   0.962    FALSE 1 1.003  1486
#> eta[5,7]    0.495  0.089   0.325   0.495   0.673    FALSE 1 1.005   359
#> eta[8,7]    0.970  0.030   0.891   0.978   0.999    FALSE 1 1.000  1500
#> eta[10,7]   0.232  0.073   0.103   0.230   0.391    FALSE 1 1.017   122
#> eta[4,8]    0.620  0.105   0.400   0.623   0.815    FALSE 1 0.999  1500
#> eta[5,8]    0.518  0.108   0.307   0.516   0.726    FALSE 1 1.005   402
#> eta[9,8]    0.100  0.064   0.011   0.089   0.256    FALSE 1 1.030    74
#> eta[10,8]   0.198  0.085   0.063   0.190   0.376    FALSE 1 1.007   274
#> eta[4,9]    0.950  0.047   0.825   0.964   0.998    FALSE 1 1.000  1500
#> eta[5,10]   0.954  0.044   0.845   0.967   0.999    FALSE 1 1.001  1500
#> PI[4,1]     0.572  0.374   0.000   0.557   1.000    FALSE 1 2.687     4
#> PI[5,1]     0.428  0.374   0.000   0.443   1.000    FALSE 1 2.687     4
#> PI[3,2]     0.178  0.219   0.000   0.047   0.640    FALSE 1 1.181    15
#> PI[6,2]     0.104  0.166   0.000   0.020   0.610    FALSE 1 1.497     9
#> PI[7,2]     0.459  0.301   0.006   0.445   0.951    FALSE 1 1.548     7
#> PI[8,2]     0.099  0.162   0.000   0.000   0.549    FALSE 1 2.006     5
#> PI[9,2]     0.161  0.138   0.000   0.132   0.487    FALSE 1 1.091    27
#> PI[1,3]     0.175  0.214   0.000   0.062   0.696    FALSE 1 1.115    22
#> PI[4,3]     0.202  0.148   0.000   0.186   0.515    FALSE 1 1.012   189
#> PI[8,3]     0.288  0.196   0.007   0.264   0.712    FALSE 1 1.039    59
#> PI[9,3]     0.334  0.258   0.000   0.291   0.879    FALSE 1 1.008   609
#> PI[1,6]     0.012  0.039   0.000   0.000   0.154    FALSE 1 1.459    13
#> PI[3,6]     0.316  0.193   0.001   0.302   0.712    FALSE 1 1.015   154
#> PI[8,6]     0.061  0.127   0.000   0.000   0.464    FALSE 1 1.197    18
#> PI[9,6]     0.320  0.207   0.016   0.289   0.761    FALSE 1 1.009   230
#> PI[10,6]    0.292  0.203   0.012   0.254   0.748    FALSE 1 1.010   213
#> PI[5,7]     0.281  0.172   0.000   0.292   0.592    FALSE 1 1.253    12
#> PI[8,7]     0.533  0.212   0.064   0.574   0.893    FALSE 1 1.158    18
#> PI[10,7]    0.187  0.279   0.000   0.004   0.864    FALSE 1 1.412     9
#> PI[4,8]     0.242  0.202   0.000   0.210   0.708    FALSE 1 1.013   605
#> PI[5,8]     0.199  0.223   0.000   0.120   0.733    FALSE 1 1.046    51
#> PI[9,8]     0.178  0.253   0.000   0.037   0.816    FALSE 1 1.904     6
#> PI[10,8]    0.382  0.268   0.000   0.399   0.894    FALSE 1 1.365     9
#> PI[4,9]     1.000  0.000   1.000   1.000   1.000    FALSE 1    NA     1
#> PI[5,10]    1.000  0.000   1.000   1.000   1.000    FALSE 1    NA     1
#> deviance  866.740 11.260 845.144 866.363 890.824    FALSE 1 1.022    96
#> 
#> **WARNING** Rhat values indicate convergence failure. 
#> Rhat is the potential scale reduction factor (at convergence, Rhat=1). 
#> For each parameter, n.eff is a crude measure of effective sample size. 
#> 
#> overlap0 checks if 0 falls in the parameter's 95% credible interval.
#> f is the proportion of the posterior with the same sign as the mean;
#> i.e., our confidence that the parameter is positive or negative.
#> 
#> DIC info: (pD = var(deviance)/2) 
#> pD = 62.1 and DIC = 928.88 
#> DIC is an estimate of expected predictive error (lower is better).

You should now try to run the model until it converges (it should take around half an hour to run, so we won’t do it in this vignette):

mcmc_output <- run_model(filename, data, run_param="normal", parallelize = T)

Here are the figures corresponding to the results that have converged:

plot_results(mcmc_output, data)

plot_results(mcmc_output, data, pred = "Pout")

You can also plot the results for specific prey if you want a clearer figure:

plot_results(mcmc_output, data, pred = "Pout", 
             variable = "PI", prey = c("Bivalves", "Worms"))

The configuration with literature data

We now change the configuration to add literature data to the model:

literature_configuration <- TRUE

realistic_literature_diets_path <- system.file("extdata", "realistic_literature_diets.csv",
                                               package = "EcoDiet")
realistic_literature_diets <- read.csv(realistic_literature_diets_path)
knitr::kable(realistic_literature_diets)

X	Cod	Pout	Sardine	Shrimps	Crabs	Bivalves	Worms	Zooplankton	Phytoplankton	Detritus
Cod	0.0000000	0.0000000	0.0000000	0.0000000	0.0000000	0.0000000	0.0	0.0	0	0
Pout	0.4275065	0.0000000	0.0000000	0.0000000	0.0000000	0.0000000	0.0	0.0	0	0
Sardine	0.3603675	0.0000000	0.0000000	0.0000000	0.0000000	0.0000000	0.0	0.0	0	0
Shrimps	0.0300737	0.5295859	0.0002143	0.0000000	0.0082107	0.0000000	0.0	0.0	0	0
Crabs	0.1410430	0.3332779	0.0000000	0.0000000	0.0000000	0.0000000	0.0	0.0	0	0
Bivalves	0.0000000	0.0006130	0.0000000	0.0000000	0.3441081	0.0000000	0.0	0.0	0	0
Worms	0.0410093	0.1023676	0.0000000	0.0171336	0.4435377	0.0000000	0.0	0.0	0	0
Zooplankton	0.0000000	0.0341557	0.7381375	0.9121505	0.0000000	0.0000000	0.0	0.0	0	0
Phytoplankton	0.0000000	0.0000000	0.2616482	0.0000610	0.0000000	0.9966847	0.0	1.0	0	0
Detritus	0.0000000	0.0000000	0.0000000	0.0706550	0.2041434	0.0033153	1.0	0.0	0	0
pedigree	0.8000000	0.1000000	0.5000000	0.3000000	0.7000000	0.1000000	0.2	0.2	1	1

data <- preprocess_data(biotracer_data = realistic_biotracer_data,
                        trophic_discrimination_factor = c(0.8, 3.4),
                        literature_configuration = literature_configuration,
                        stomach_data = realistic_stomach_data,
                        literature_diets = realistic_literature_diets,
                        nb_literature = 12,
                        literature_slope = 0.5)
#> The model will investigate the following trophic links:
#>               Bivalves Cod Crabs Detritus Phytoplankton Pout Sardine Shrimps
#> Bivalves             0   0     1        0             0    1       0       0
#> Cod                  0   0     0        0             0    0       0       0
#> Crabs                0   1     0        0             0    1       0       0
#> Detritus             1   0     1        0             0    0       0       1
#> Phytoplankton        1   0     0        0             0    0       1       1
#> Pout                 0   1     0        0             0    0       0       0
#> Sardine              0   1     0        0             0    0       0       0
#> Shrimps              0   1     1        0             0    1       1       0
#> Worms                0   1     1        0             0    1       0       1
#> Zooplankton          0   0     0        0             0    1       1       1
#>               Worms Zooplankton
#> Bivalves          0           0
#> Cod               0           0
#> Crabs             0           0
#> Detritus          1           0
#> Phytoplankton     0           1
#> Pout              0           0
#> Sardine           0           0
#> Shrimps           0           0
#> Worms             0           0
#> Zooplankton       0           0

Now we see that the prior distributions are informed by the literature data:

when the literature diet input is > 0, the trophic link probabilities \(\eta\) are shifted toward one. Here this is the case for all prey but we could imagine that the user identify a species as a plausible prey whereas it has not been observed being consumed by the predator in the literature. In that case, the literature diet of 0 would drive \(\eta\) toward 0.
the average prior for the diet proportions \(\Pi\) is directly the literature diet input.

plot_prior(data, literature_configuration)

plot_prior(data, literature_configuration, pred = "Pout")

Again, we verify that the model compiles well:

filename <- "mymodel_literature.txt"
write_model(file.name = filename, literature_configuration = literature_configuration, print.model = F)
mcmc_output <- run_model(filename, data, run_param="test")
#> 
#> Processing function input....... 
#> 
#> Done. 
#>  
#> Compiling model graph
#>    Resolving undeclared variables
#>    Allocating nodes
#> Graph information:
#>    Observed stochastic nodes: 316
#>    Unobserved stochastic nodes: 125
#>    Total graph size: 1594
#> 
#> Initializing model
#> 
#> Adaptive phase, 500 iterations x 3 chains 
#> If no progress bar appears JAGS has decided not to adapt 
#>  
#> 
#>  Burn-in phase, 500 iterations x 3 chains 
#>  
#> 
#> Sampling from joint posterior, 500 iterations x 3 chains 
#>  
#> 
#> Calculating statistics.......
#> Warning in doTryCatch(return(expr), name, parentenv, handler): At least one Rhat
#> value could not be calculated.
#> 
#> Done.
#> 
#>   /!\ Convergence warning:
#> Out of the 51 variables, 16 variables have a Gelman-Rubin statistic > 1.1.
#> You may consider modifying the model run settings.
#> The variables with the poorest convergence are: PI[10,8], PI[9,8], PI[10,7], PI[5,8], PI[8,7], PI[8,6], PI[3,6], PI[3,2], PI[10,6], PI[5,7].
#> JAGS output for model 'mymodel_literature.txt', generated by jagsUI.
#> Estimates based on 3 chains of 1000 iterations,
#> adaptation = 500 iterations (sufficient),
#> burn-in = 500 iterations and thin rate = 1,
#> yielding 1500 total samples from the joint posterior. 
#> MCMC ran for 0.251 minutes at time 2024-03-25 09:50:53.
#> 
#>              mean     sd    2.5%     50%   97.5% overlap0 f  Rhat n.eff
#> eta[4,1]    0.670  0.084   0.494   0.676   0.815    FALSE 1 1.000  1500
#> eta[5,1]    0.611  0.088   0.436   0.613   0.774    FALSE 1 0.999  1500
#> eta[3,2]    0.354  0.079   0.207   0.353   0.516    FALSE 1 1.013   160
#> eta[6,2]    0.358  0.085   0.204   0.354   0.532    FALSE 1 1.001  1500
#> eta[7,2]    0.608  0.081   0.452   0.608   0.764    FALSE 1 1.000  1500
#> eta[8,2]    0.444  0.085   0.283   0.443   0.618    FALSE 1 1.004   578
#> eta[9,2]    0.818  0.067   0.680   0.824   0.927    FALSE 1 1.002   884
#> eta[1,3]    0.521  0.079   0.371   0.519   0.678    FALSE 1 1.001  1443
#> eta[4,3]    0.715  0.071   0.567   0.718   0.843    FALSE 1 1.000  1500
#> eta[8,3]    0.848  0.057   0.720   0.854   0.942    FALSE 1 1.005   406
#> eta[9,3]    0.850  0.055   0.724   0.857   0.942    FALSE 1 1.003   690
#> eta[1,6]    0.157  0.061   0.055   0.152   0.292    FALSE 1 1.000  1500
#> eta[3,6]    0.792  0.068   0.641   0.799   0.904    FALSE 1 1.001  1500
#> eta[8,6]    0.192  0.067   0.075   0.187   0.332    FALSE 1 1.011   183
#> eta[9,6]    0.971  0.028   0.897   0.979   0.999    FALSE 1 0.999  1500
#> eta[10,6]   0.881  0.055   0.763   0.887   0.966    FALSE 1 1.000  1500
#> eta[5,7]    0.563  0.080   0.405   0.564   0.716    FALSE 1 1.006   349
#> eta[8,7]    0.971  0.028   0.893   0.980   0.999    FALSE 1 1.012   844
#> eta[10,7]   0.360  0.077   0.219   0.358   0.517    FALSE 1 1.009   233
#> eta[4,8]    0.675  0.092   0.477   0.677   0.836    FALSE 1 1.000  1500
#> eta[5,8]    0.594  0.099   0.392   0.596   0.787    FALSE 1 1.006   338
#> eta[9,8]    0.228  0.083   0.089   0.221   0.407    FALSE 1 1.004   483
#> eta[10,8]   0.310  0.090   0.152   0.306   0.505    FALSE 1 1.004   410
#> eta[4,9]    0.956  0.043   0.840   0.970   0.999    FALSE 1 1.000  1500
#> eta[5,10]   0.958  0.042   0.850   0.970   0.999    FALSE 1 1.004   801
#> PI[4,1]     0.149  0.297   0.000   0.000   1.000    FALSE 1 1.130    27
#> PI[5,1]     0.851  0.297   0.000   1.000   1.000    FALSE 1 1.130    27
#> PI[3,2]     0.087  0.160   0.000   0.003   0.564    FALSE 1 1.253    14
#> PI[6,2]     0.258  0.261   0.000   0.199   0.801    FALSE 1 1.031    74
#> PI[7,2]     0.527  0.283   0.004   0.521   0.978    FALSE 1 1.002   662
#> PI[8,2]     0.025  0.098   0.000   0.000   0.226    FALSE 1 1.193    37
#> PI[9,2]     0.102  0.119   0.000   0.054   0.401    FALSE 1 1.156    20
#> PI[1,3]     0.324  0.304   0.000   0.269   0.999    FALSE 1 1.091    40
#> PI[4,3]     0.095  0.118   0.000   0.046   0.398    FALSE 1 1.012   607
#> PI[8,3]     0.007  0.025   0.000   0.000   0.077    FALSE 1 1.150    67
#> PI[9,3]     0.574  0.298   0.000   0.609   0.997    FALSE 1 1.067    53
#> PI[1,6]     0.044  0.127   0.000   0.000   0.527    FALSE 1 1.159    32
#> PI[3,6]     0.343  0.253   0.003   0.298   0.871    FALSE 1 1.253    12
#> PI[8,6]     0.242  0.301   0.000   0.042   0.864    FALSE 1 1.393     9
#> PI[9,6]     0.262  0.220   0.000   0.222   0.751    FALSE 1 1.066    35
#> PI[10,6]    0.109  0.191   0.000   0.007   0.647    FALSE 1 1.245    15
#> PI[5,7]     0.122  0.152   0.000   0.062   0.535    FALSE 1 1.203    21
#> PI[8,7]     0.082  0.174   0.000   0.001   0.629    FALSE 1 1.422    11
#> PI[10,7]    0.796  0.263   0.008   0.902   1.000    FALSE 1 1.456    10
#> PI[4,8]     0.138  0.190   0.000   0.035   0.680    FALSE 1 1.024   334
#> PI[5,8]     0.120  0.176   0.000   0.010   0.589    FALSE 1 1.451     9
#> PI[9,8]     0.282  0.294   0.000   0.172   0.870    FALSE 1 1.986     5
#> PI[10,8]    0.461  0.380   0.000   0.496   1.000    FALSE 1 2.076     5
#> PI[4,9]     1.000  0.000   1.000   1.000   1.000    FALSE 1    NA     1
#> PI[5,10]    1.000  0.000   1.000   1.000   1.000    FALSE 1    NA     1
#> deviance  866.597 11.080 846.845 865.760 889.928    FALSE 1 1.002  1500
#> 
#> **WARNING** Rhat values indicate convergence failure. 
#> Rhat is the potential scale reduction factor (at convergence, Rhat=1). 
#> For each parameter, n.eff is a crude measure of effective sample size. 
#> 
#> overlap0 checks if 0 falls in the parameter's 95% credible interval.
#> f is the proportion of the posterior with the same sign as the mean;
#> i.e., our confidence that the parameter is positive or negative.
#> 
#> DIC info: (pD = var(deviance)/2) 
#> pD = 61.4 and DIC = 927.997 
#> DIC is an estimate of expected predictive error (lower is better).

You should now try to run the model until it converges (it should take around half an hour to run, so we won’t do it in this vignette):

mcmc_output <- run_model(filename, data, run_param=list(nb_iter=100000, nb_burnin=50000, nb_thin=50, nb_adapt=50000), parallelize = T)

Here are the figures corresponding to the results that have converged:

plot_results(mcmc_output, data)

plot_results(mcmc_output, data, pred = "Pout")

You can save the figures as PNG using:

plot_results(mcmc_output, data, pred = "Pout", save = TRUE, save_path = ".")

Last, if you want to explore further in detail the a posteriori distribution of your parameters \(\Pi\) and \(\eta\), you can run the following code line, which will store the values for all iterations into a data frame.

reshape_mcmc(mcmc_output, data)

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.