ern
Jump to estimated parameter values
In order to calculate Rt from clinical reports or wastewater concentration, we need to assume a few quantities, including a generation interval distribution. The generation interval is the time between consecutive infections. Once we have a generation interval distribution, we can take a daily incidence time series and use it to estimate a daily \(\mathcal{R}_t\) time series by deconvolving incidence with the generation interval distribution using the R package EpiEstim
.
In order to estimate incidence, we need a few more pieces of information depending on the primary data source. For clinical case reports, we additionally need an incubation time distribution, which gives the time between infection and symptom onset, and a reporting delay distribution, which gives the time between symptom onset and a case report. For wastewater concentration, we need a fecal shedding distribution, which describes how individuals shed viral particles into wastewater over time during an infection.
This vignette focuses on quantifying families of distributions for the generation interval, incubation period, and fecal shedding.
Here, we give an example on estimating the different distributions for COVID-19 in Canada. The same approach may be used for other pathogens in different jurisdictions.
For COVID-19 in Canada, the most relevant distribution estimates come from analyzing Omicron infection data as the majority of lineages circulating in the country since January 2021 have been descendants of Omicron. (Manica et al. 2022) assume both the incubation period and generation interval are gamma distributed, and then they use Omicron infection data to estimate distribution parameters for each quantity, along with measures of uncertainty for each parameter. (Note that Manica et al. estimate distributions for two different generation intervals but we will use the intrinsic generation interval)
Figure 1: Table 2 from (Manica et al. 2022). Original caption: “Estimates for the incubation period, diagnostic delay, intrinsic and realized generation time, and household serial intervals of the SARS-CoV-2 Omicron variant. Reported parameters of shape and scale for the incubation period and intrinsic generation time refer to a gamma distribution. Estimates of the incubation period are dervied from the analysis of 80 participants to a single superspreading event in Norway. Data taken from Brandal et al.”
Manica et al. give estimates for the mean, shape, and scale parameters of each family of (gamma) distributions (Figure 1). However, the uncertainty in each estimated parameter is reported in terms of a credible interval, i.e. in a non-parametric way. We need each family of distributions to be specified in a parameteric way, because of how the package ern
computes \(\mathcal{R}_t\) with various sources of uncertainty, in particular the distribution uncertainty (i.e., we are not fully certain of the distribution to use). Hence, ern
generates an ensemble of simulations for the \(\mathcal{R}_t\) time series, where for each individual simulation, we draw from the specified families of distributions (among other quantities) to select one distribution of each type to use for that specific \(\mathcal{R}_t\) calculation.
The following section describes how we generated parametric specifications for the families of generation interval and incubation period distributions from the non-parameteric uncertainties given by Manica et al.
We will specify each family of gamma distributions with a mean and shape parameter (the “distribution parameters”), along with standard deviations about each of these two parameters.
We assume that each distribution parameter is itself gamma distributed. The mean of this distribution is assumed to be the corresponding mean value listed in Figure 1. In order to estimate a standard deviation for this distribution, we assume that the credible interval bounds give quantiles and use them, along with a mean (equal to the mean estimate of the focal parameter) to fit shape parameter for a gamma distribution. Once we have the fitted shape parameter, we can calculate the corresponding standard deviation.
For each family of gamma distributions, we need to fit distributions as described above over two parameters to specify it. We will perform two independent one-dimensional optimizations of the shape parameter for the distributions of the mean and the shape parameter of each gamma family.
The following code defines a function (fit_sd()
) to perform the aforementioned one-dimensional optimization:
# Fit a standard deviation for a gamma distribution
# given a mean and 95% quantiles
fit_sd <- function(mean, ci){
# Score function for optimization
# s = shape
# m = assumed mean
# ci_target = the 95% credible interval bounds, assumed to be 2.5% and 97.5% quantiles
score_fun <- function(s, m, ci_target) {
shape <- s
scale <- m/s
ci <- qgamma(c(0.025, 0.975), shape = shape, scale = scale)
return(sum((ci-ci_target)^2))
}
# Optimize over shape
opt_out <- optim(
par = 1,
fn = score_fun,
m = mean,
ci_target = ci,
lower = 0.0001, upper = 1000,
method = "Brent"
)
# Get best shape parameter found and return associated standard deviation
shape_best <- opt_out$par
sd_best <- shape_to_sd(shape_best, mean)
return(list(
sd = sd_best,
score = opt_out$value
))
}
# Helper function to compute a gamma's standard deviation given the shape and mean
shape_to_sd <- function(shape, mean) mean/sqrt(shape)
We use the method above to generate fitted parameter values for each family of distributions, which are encoded in ern::def_dist_generation_interval(pathogen = "sarscov2")
and ern::def_dist_incubation_period(pathogen = "sarscov2")
. The parameters are summarised here in Tables 1 (generation interval) and 2 (incubation period).
parameter | value type | origin | value |
---|---|---|---|
distribution mean | mean | assumed | 6.8400 |
standard deviation | fitted | 0.7486 | |
distribution shape | mean | assumed | 2.3900 |
standard deviation | fitted | 0.3573 |
parameter | value type | origin | value |
---|---|---|---|
distribution mean | mean | assumed | 3.4900 |
standard deviation | fitted | 0.1477 | |
distribution shape | mean | assumed | 8.5000 |
standard deviation | fitted | 1.8945 |
The mean fitted distributions (that is, the distributions generated by taking the mean distribution mean and mean shape parameter) are illustrated in Figures 2 (generation interval) and 3 (incubation period).
Figure 2: Generation interval distribution for the mean value of the mean and shape parameter
Figure 3: Incubation period distribution for the mean value of the mean and shape parameter
Viral concentration in wastewater has become a popular metric in tracking community transmission of respiratory illnesses. Infected individuals shed viral particles in their stools, which end up in wastewater. Wastewater samples are then tested for viral concentration. Daily changes of viral concentration allow modellers to examine trends and estimate the effective reproductive number (\(R_t\)) over time.
The process we use to estimate \(R_t\) from viral concentration in wastewater in the package ern
is as follows:
Pre-process the time series of viral concentration in wastewater to prepare for computing \(R_t\). This step involves smoothing the rather noisy viral concentration data, as well as mapping from viral concentration to numbers of infections using a scaling factor.
Specify a fecal shedding distribution which describes how infected individuals shed viral particles into wastewater over time. This article focuses on how a family of fecal shedding distributions were estimated for ern
from literature.
Perform a deconvolution of the pre-processed viral concentration time series using the fecal shedding distribution in order to obtain a daily incidence time series (see (Huisman et al)).
Specify a generation interval distribution, which describes the time interval between an individual’s infection and their transmission to another person. The way in which parameters for the family of generation intervals used in ern
are estimated is described in the distributions
vignette.
Perform a deconvolution of the daily incidence time series with the generation interval distribution to obtain the \(R_t\) time series using the R package EpiEstim
.
Individuals infected with respiratory illnesses (such as SARS-CoV-2, influenza, and RSV) shed viral particles into wastewater through their stool. In order to estimate the fecal shedding distributions needed for the \(R_t\) calculation described above, we must assume a model for the fecal shedding kinetics, which refers to the change of the individual’s shedding amount in stools as a function of time since infection.
Research has been done to understand fecal shedding kinetics. In (Nourbakhsh et al. 2021), a SEIR compartmental model was created to simulate and forecast SARS-CoV-2 viral concentration in wastewater. In this model, SARS-CoV-2 fecal shedding in wastewater comes from individuals of different clinical profiles: infectious symptomatic (denoted as \(I\)), infectious asymptomatic (denoted as \(A\)), infectious symptomatic that requires hospitalization (denoted as \(J\)), and noninfectious (denoted as \(Z\)). A diagram illustrating how individuals move through the compartmental model and contribute to wastewater fecal shedding is shown below.
In this diagram, individuals begin in the susceptible compartment (\(S\)). Once exposed to SARS-CoV-2, they move into the exposed compartment (\(E\)) where they can then move into 3 unique compartments: infectious symptomatic (\(I\)), infectious asymptomatic (\(A\)), and infectious symptomatic that requires hospitalization (\(J\)). Individuals in the \(I\) and \(A\) compartments contribute to fecal shedding in their respective compartments along with when they enter the noninfectious compartment (\(Z\)). Individuals in the \(J\) compartment do not enter into the \(Z\) compartment as it is inferred that they do not contribute to fecal shedding while hospitalized (\(H\)). See (Nourbakhsh et al. 2021) for more details on the compartment model.
Within each of the 4 compartments that contribute to fecal shedding lies a number of subcompartments of \(n_c\) (c: compartment name). According to (Nourbakhsh et al. 2021), each subcompartment sheds a different amount of viral SARS-CoV-2 into wastewater. The following figures illustrate the log fecal shedding kinetics of individuals in \(I\), \(A\), \(J\), and \(Z\) compartments.
Using these fecal shedding kinetics values for the 4 compartments, a general fecal shedding distribution for SARS-CoV-2 can be estimated for ern
. This is first done by creating dataframes representing the various fecal shedding profiles of individuals exposed to SARS-CoV-2 (\(I\) + \(Z\), \(A\) + \(Z\), \(J\)).
shed.symp = data.frame(
x = c(seq(from = 1, to = 12, by = 2),seq(from = 13, to = 36, by = 4)),
y = c(5, 7.30103, 7.083333, 6.677121, 6.30, 5.9, 5.40103,4.60103,3.90103,3.10103,2.10103,1.2)
)
shed.asymp = data.frame(
x = c(seq(from = 1, to = 10, by = 2),seq(from = 11, to = 34, by = 4)),
y = c(5, 7.30103, 7.083333, 6.677121, 6.30, 5.40103,4.60103,3.90103,3.10103,2.10103,1.2)
)
shed.severe = data.frame(
x = seq(from = 1, to = 8, by = 2),
y = c(5,7.30103,7.083333,6.677121)
)
Using these fecal shedding profiles, a mean fecal shedding distribution can be created by combining the 3 dataframes, and taking the daily mean viral shedding between the 3 shedding profiles.
fec.l = list(
shed.symp,
shed.asymp,
shed.severe
)
df = dplyr::bind_rows(fec.l) |>
dplyr::group_by(x) |>
dplyr::summarise(y = mean(y, na.rm=TRUE))
Using the stats4
package, a maximum likelihood estimation can be used to estimate a gamma distribution that fits the fecal shedding distribution along with standard error parameters. This gamma distribution can then be used in ern
to obtain incidence estimates needed to generate \(R_t\). The code below illustrates how the fitted fecal shedding gamma distribution for SARS-CoV-2 was obtained for ern
.
get_gamma_fec <- function(df){
log_likelihood = function(mean,shape){
a = shape
s = mean/shape
return(-sum(df$y*(dgamma(df$x, shape = a, scale = s, log = TRUE))))
}
mle.res = stats4::mle(minuslogl = log_likelihood,
start = list(mean = 14, shape = 2),
nobs = nrow(df))
res = stats4::summary(mle.res)
ci = stats4::confint(object = mle.res, level = 0.68)
f = list(
dist = "gamma",
mean = res@coef[[1]],
shape = res@coef[[2]],
# we assume this parameter will be sampled with
# a normal distribution (hence 68%CI ~ 2 std dev):
mean_sd = (ci[1,2] - ci[1,1])/2,
shape_sd = (ci[2,2] - ci[2,1])/2,
max = max(df$x)
)
return(f)
}
fec = get_gamma_fec(df)
#> Profiling...
fec
#> $dist
#> [1] "gamma"
#>
#> $mean
#> [1] 12.90215
#>
#> $shape
#> [1] 1.759937
#>
#> $mean_sd
#> [1] 1.136829
#>
#> $shape_sd
#> [1] 0.2665988
#>
#> $max
#> [1] 33
Below illustrates how the gamma distribution fits to the observed fecal shedding distribution for SARS-CoV-2.
parameter | value type | origin | value |
---|---|---|---|
distribution mean | mean | fitted | 12.9021 |
standard deviation | fitted | 1.1368 | |
distribution shape | mean | fitted | 1.7599 |
standard deviation | fitted | 0.2666 |
Table 4 simply summarizes all families of (gamma) distributions derived in this article.
distribution | parameter | value type | value |
---|---|---|---|
generation interval | distribution mean | mean | 6.8400 |
standard deviation | 0.7486 | ||
distribution shape | mean | 2.3900 | |
standard deviation | 0.3573 | ||
incubation period | distribution mean | mean | 3.4900 |
standard deviation | 0.1477 | ||
distribution shape | mean | 8.5000 | |
standard deviation | 1.8945 | ||
fecal shedding | distribution mean | mean | 12.9021 |
standard deviation | 1.1368 | ||
distribution shape | mean | 1.7599 | |
standard deviation | 0.2666 |
In future releases of ern
, parameter estimates for RSV and influenza will be estimated using data obtained from literature. SARS-CoV-2 parameter estimates will be routinely updated as more literature data is obtained.
Manica et al. 2022: https://www.sciencedirect.com/science/article/pii/S2666776222001405
Huisman et al 2022: https://ehp.niehs.nih.gov/doi/10.1289/EHP10050
Nourbakhsh et al. 2021: https://www.medrxiv.org/content/10.1101/2021.07.19.21260773v1