From sample collection to sequence upload, there is a delay of typically 1–4 weeks. This means that when you look at the latest data, the most recent weeks are always incomplete — not because fewer people were infected, but because results have not arrived yet.
If you ignore this and plot raw counts, you see a false decline in the most recent weeks. This is called right-truncation bias.
survinger fits a parametric delay distribution accounting for the fact that we can only observe delays shorter than the time elapsed since collection (right-truncation correction).
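To make the right-truncation correction concrete, the following base-R sketch simulates delays, keeps only those that could have been observed so far, and compares the naive mean with a maximum-likelihood fit that conditions on the delay being no longer than the elapsed time. It illustrates the general technique under assumed values (`true_mu`, `true_size`, a 30-day observation window); it is not survinger's internal implementation.

```r
set.seed(1)

# ASSUMED simulation settings (not taken from the package)
n         <- 5000
true_mu   <- 10
true_size <- 3.5

delay   <- rnbinom(n, mu = true_mu, size = true_size)  # true delay in days
elapsed <- sample(0:30, n, replace = TRUE)             # days since collection

# A sequence is visible only if its delay does not exceed the elapsed time
seen  <- delay <= elapsed
d     <- delay[seen]
t_max <- elapsed[seen]

# Negative log-likelihood of a negative binomial truncated at t_max:
# each density term is divided by P(delay <= t_max)
nll_trunc <- function(par) {
  mu   <- exp(par[1])
  size <- exp(par[2])
  -sum(dnbinom(d, mu = mu, size = size, log = TRUE) -
         pnbinom(t_max, mu = mu, size = size, log.p = TRUE))
}

fit <- optim(c(log(mean(d) + 0.5), 0), nll_trunc)

naive_mu     <- mean(d)          # ignores truncation, biased downwards
corrected_mu <- exp(fit$par[1])  # truncation-corrected estimate

cat("naive mean delay:    ", round(naive_mu, 2), "\n")
cat("corrected mean delay:", round(corrected_mu, 2), "\n")
```

The naive mean is pulled towards short delays because long delays from recent collections have not been observed yet; the truncated likelihood accounts for that.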
library(survinger)
data(sarscov2_surveillance)
design <- surv_design(
  data = sarscov2_surveillance$sequences,
  strata = ~ region,
  sequencing_rate = sarscov2_surveillance$population[c("region", "seq_rate")],
  population = sarscov2_surveillance$population
)
delay_fit <- surv_estimate_delay(design, distribution = "negbin")
print(delay_fit)
#> ── Reporting Delay Distribution ────────────────────────────────────────────────
#> Distribution: "negbin"
#> Strata: none (pooled)
#> Observations: 1349
#> Mean delay: 9.9 days
#>
#> # A tibble: 1 × 5
#>   stratum distribution    mu  size converged
#>   <chr>   <chr>        <dbl> <dbl> <lgl>
#> 1 all     negbin        9.95  3.52 TRUE
plot(delay_fit)
Given the fitted delay, we can ask: what fraction of sequences collected d days ago have been reported by now?
days <- c(7, 14, 21, 28)
probs <- surv_reporting_probability(delay_fit, delta = days)
data.frame(days_ago = days, prob_reported = round(probs, 3))
#>   days_ago prob_reported
#> 1        7         0.403
#> 2       14         0.797
#> 3       21         0.949
#> 4       28         0.989
Sequences collected 7 days ago may only be partially reported, while those from 28 days ago are nearly complete.
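As a rough cross-check, the same quantity can be approximated in base R. The sketch assumes that the reporting probability is the cumulative distribution function of the fitted negative binomial evaluated at d, and it uses the rounded `mu` and `size` printed by `print(delay_fit)`. Both points are assumptions, so the values may differ slightly from `surv_reporting_probability()`.

```r
# Rounded parameters copied from the printed fit
mu   <- 9.95
size <- 3.52
days <- c(7, 14, 21, 28)

# ASSUMPTION: reporting probability = P(delay <= d) under the fitted distribution
prob_reported <- pnbinom(days, mu = mu, size = size)

data.frame(days_ago = days, prob_reported = round(prob_reported, 3))
```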
Nowcasting inflates observed counts by dividing by the reporting probability, giving a better estimate of the true number.
In the resulting plot, the grey bars show what has been observed; the orange line shows the delay-corrected estimate. The gap is largest in the most recent weeks.
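The inflation step itself is simple arithmetic. The sketch below uses the reporting probabilities from the table above together with hypothetical observed counts (`n_observed` is made up for illustration) to show how the correction grows as the collection date gets more recent.

```r
days_ago <- c(7, 14, 21, 28)

# Reporting probabilities from the table above
prob_reported <- c(0.403, 0.797, 0.949, 0.989)

# HYPOTHETICAL observed counts for the same four collection dates
n_observed <- c(20, 41, 47, 50)

# Delay-corrected estimate: observed count divided by reporting probability
n_nowcast <- n_observed / prob_reported

data.frame(days_ago, n_observed, prob_reported,
           n_nowcast = round(n_nowcast, 1))
```

Dividing by a small probability also amplifies noise, so the most recent estimates are the least certain.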
The main inference function applies both corrections simultaneously:
adjusted <- surv_adjusted_prevalence(design, delay_fit, "BA.2.86")
print(adjusted)
#> ── Design-Weighted Delay-Adjusted Prevalence ───────────────────────────────────
#> Correction: "design:hajek+delay:direct"
#>
#> # A tibble: 26 × 9
#>    time     lineage n_obs_raw n_obs_adjusted prevalence     se ci_lower ci_upper
#>    <chr>    <chr>       <int>          <dbl>      <dbl>  <dbl>    <dbl>    <dbl>
#>  1 2024-W01 BA.2.86        53             53    0       0             0   0
#>  2 2024-W02 BA.2.86        68             68    0.00597 0.0178        0   0.0408
#>  3 2024-W03 BA.2.86        40             40    0.143   0.126         0   0.389
#>  4 2024-W04 BA.2.86        41             41    0       0             0   0
#>  5 2024-W05 BA.2.86        48             48    0       0             0   0
#>  6 2024-W06 BA.2.86        52             52    0       0             0   0
#>  7 2024-W07 BA.2.86        62             62    0.00740 0.0204        0   0.0473
#>  8 2024-W08 BA.2.86        55             55    0.0195  0.0332        0   0.0847
#>  9 2024-W09 BA.2.86        43             43    0.0261  0.0480        0   0.120
#> 10 2024-W10 BA.2.86        46             46    0.0697  0.0621        0   0.191
#> # ℹ 16 more rows
#> # ℹ 1 more variable: mean_report_prob <dbl>
The mean_report_prob column shows how complete each week’s data is. Low values indicate that the delay correction is doing heavy lifting.
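The correction label "design:hajek+delay:direct" suggests a Hájek-type weighted ratio. As a hedged illustration of that idea, and not a description of what `surv_adjusted_prevalence()` does internally, the sketch below weights each reported sequence by the inverse of its sequencing rate times its reporting probability. The toy data frame and the way the two weights are combined are assumptions.

```r
# HYPOTHETICAL toy data: one row per reported sequence in a single week
seqs <- data.frame(
  region      = c("A", "A", "A", "B", "B"),
  is_target   = c(1, 0, 0, 1, 1),            # 1 if the sequence is the lineage of interest
  seq_rate    = c(0.5, 0.5, 0.5, 0.1, 0.1),  # fraction of cases sequenced in the region
  report_prob = c(0.9, 0.9, 0.5, 0.8, 0.8)   # probability the sequence is reported by now
)

# ASSUMPTION: combined weight = 1 / (sequencing rate * reporting probability)
w <- 1 / (seqs$seq_rate * seqs$report_prob)

prev_naive <- mean(seqs$is_target)                 # unweighted proportion
prev_hajek <- sum(w * seqs$is_target) / sum(w)     # Hajek (weighted ratio) estimator

c(naive = prev_naive, hajek = prev_hajek)
```

In this toy example region B is sequenced far less often, so its sequences carry more weight and the weighted prevalence moves towards region B's proportion.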
negbin (default): Handles overdispersion well. Recommended for most settings.
poisson: Use when delays are very regular (rare).
lognormal: Use when delays have a heavy right tail.
nonparametric: No distributional assumption. Use when you have enough data and suspect the parametric forms do not fit.
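One rough way to judge whether delays are regular enough for poisson or overdispersed enough to need negbin is the variance-to-mean ratio, which is about 1 for Poisson data and larger under overdispersion. The sketch below uses simulated delays with assumed parameters; with real data, the observed delays would take their place (ideally after accounting for right truncation, which this simple check ignores).

```r
set.seed(42)

# SIMULATED delays with assumed parameters, for illustration only
delays_regular       <- rpois(2000, lambda = 10)
delays_overdispersed <- rnbinom(2000, mu = 10, size = 3.5)

# Variance-to-mean ratio: close to 1 for Poisson, above 1 when overdispersed
dispersion <- function(x) var(x) / mean(x)

d_regular <- dispersion(delays_regular)
d_over    <- dispersion(delays_overdispersed)

round(c(regular = d_regular, overdispersed = d_over), 2)
```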