dlmwwbe: Dynamic Linear Model for Wastewater-based epidemiology with missing values

The hardware and bandwidth for this mirror is donated by METANET, the Webhosting and Full Service-Cloud Provider.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]metanet.ch.

This package dlmwwbe (Dynamic Linear Model for Wastewater-based Epidemilogy with Missing Data) contains two main function pdlm() (Predictive Dynamic Linear Model) and dllm() (Dynamic Local Level Model). The first one is to fit a dynamic linear model for forecasting the clinical positive cases (or other similar data) using lagged clinical and wastewater data. The second one is to fit a local level model for smoothing the noisy wastewater data. For more details, see papers here.

Dynamic Local Level Model

First, we implement dllm() on the wastewater data collected between 2022 - 2024 in Twin Cities metro area in Minnesota, United States. For the detail of the data, see papers. There are two possible structures: 1. all wastewater data share a single latent state (S = ‘univariate’). 2. Each wastewater data has its own latent sate (S = ‘kvariate’). For a better model fitting, we encourage the use of the log transformation of the original wastewater data by setting the argument log10 = TRUE. This is because the data better approximates the normality assumption in practice. Other transformation might be necessary depending on the nature of the data. The summary() provides some information of the fitted model.

Consider both wastewater data have their individual latent state. The average of the smoother is provided.

data_TC <- wastewater[wastewater$Code == "TC",]
data_TC$SampleDate <- as.Date(data_TC$SampleDate)
fit <- dllm(
  equal.state.var=FALSE,
  equal.obs.var=FALSE,
  log10=TRUE,
  data = data_TC,
  date = "SampleDate",
  obs_cols = c("ORFlab", "Nlab"),
  S = c('kvariate')
)

summary(fit)

#> Dynamic Linear Model Results
#> ============================
#> Model Structure:
#> - Observation columns: ORFlab, Nlab 
#> - Latent state dimension: 2 
#> - Observation dimension: 2 
#> Fit Statistics:
#> Log-likelihood: 239.71
#> AIC: -471.41
#> BIC: -456.16
#> Convergence: Achieved
#> 
#> Estimated Parameters:
#> sigmaW_state1 sigmaW_state2 sigmaV_ORFlab   sigmaV_Nlab 
#>   0.001783840   0.002291088   0.012992110   0.013768954

plot(fit, type='smoother', conf.int = TRUE)

Predictive Dynamic Linear Model

Next, we implement pdlm() on the clinical and wastewater data. Different number of lags are demonstrated. For a better model fitting, we encourage the use of the log transformation of the original wastewater data by setting the argument log10 = TRUE (and add \(1\) for the positive count cases for a valid transformation). The summary() provides some information of the fitted model.

Here, We consider \(0\) and \(2\) lags and plot them along with the observed data on its original scale.

data_TC <- wastewaterhealthworker[wastewaterhealthworker$Code == "TC",]
data_TC$SampleDate <- as.Date(data_TC$SampleDate)
fit <- pdlm(
  data=data_TC,
  formula=HealthWorkerCaseCount ~ WW.tuesday + WW.thursday,
  lags=0,
  log10=TRUE,
  date = NULL,
  equal.state.var = TRUE,
  equal.obs.var = FALSE,
  auto_init = TRUE,
  control = list(maxit = 100))
summary(fit)

#> Dynamic Linear Model Results
#> ============================
#> Model Structure:
#> - Formula: HealthWorkerCaseCount ~ WW.tuesday + WW.thursday 
#> - Lags: 0 
#> - Latent states dimension: 4 
#> - Observation dimension: 3 
#> Fit Statistics:
#> Log-likelihood: 156.52
#> AIC: -297.05
#> BIC: -281.91
#> Convergence: Failed
#> 
#> Estimated Parameters:
#> $Ft
#> [1]  0.004656649 -0.672803533  0.369672144
#> 
#> $Wt
#> [1] 0.005005797 0.015570229
#> 
#> $Vt
#> [1] 0.04743422 0.00998462 1.00000000

plot(fit, conf.int=TRUE)

data_TC <- wastewaterhealthworker[wastewaterhealthworker$Code == "TC",]
data_TC$SampleDate <- as.Date(data_TC$SampleDate)
fit <- pdlm(
  data=data_TC,
  formula=HealthWorkerCaseCount ~ WW.tuesday + WW.thursday,
  lags=2,
  log10=TRUE,
  date = NULL,
  equal.state.var = FALSE,
  equal.obs.var = TRUE,
  auto_init = TRUE,
  control = list(maxit = 100))
summary(fit)

#> Dynamic Linear Model Results
#> ============================
#> Model Structure:
#> - Formula: HealthWorkerCaseCount ~ WW.tuesday + WW.thursday 
#> - Lags: 2 
#> - Latent states dimension: 10 
#> - Observation dimension: 3 
#> Fit Statistics:
#> Log-likelihood: 154.26
#> AIC: -284.52
#> BIC: -261.82
#> Convergence: Failed
#> 
#> Estimated Parameters:
#> $Ft
#> [1] -0.01790474  0.23394470 -0.42737618  0.11332519  0.27906550  0.14315139
#> [7] -0.17890679
#> 
#> $Wt
#> [1] 0.43087811 0.03331713 0.03668333
#> 
#> $Vt
#> [1] 0.0027912527 0.0005323801

plot(fit, conf.int=TRUE)

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.