ViralEntropR: A Computational Pipeline for Entropy-Informed Detection of Emerging Viral Variants

Implements an entropy-informed pipeline for detecting emerging variants in viral amino acid sequence data, extending prior clustering-based approaches including hemagglutinin clustering methods (Li et al., 2015) <doi:10.1142/9789814667944_0018>. Provides a fully vectorized FASTA preprocessing toolkit covering header parsing, two-pass date and country extraction, ambiguous-residue filtering, and integer encoding under a 25-symbol amino acid alphabet. Computes per-site Shannon entropy across user-defined cumulative, sliding, or disjoint temporal partitions and clusters per-site entropy values using Gaussian mixture models via 'mclust' (Scrucca et al., 2016) <doi:10.32614/RJ-2016-021>. Quantifies temporal distributional shifts between partitions using the Hellinger distance (van der Vaart, 1998) <doi:10.1017/CBO9780511802256>, and detects temporal change points non-parametrically using energy statistics (Matteson and James, 2014) <doi:10.1080/01621459.2013.849605> via 'ecp' or wild binary segmentation (Fryzlewicz, 2014) <doi:10.1214/14-AOS1245> via 'HDcpDetect'. Per-site amino-acid frequency tables and entropy trajectory plots characterize sequence composition and evolutionary dynamics across time. A configurable multi-variant simulation engine generates synthetic sequence time series with known ground truth for benchmarking detection pipelines. A curated dataset of SARS-CoV-2 Variants of Concern and Variants of Interest with associated lineage and surveillance metadata is included, along with a bundled National Center for Biotechnology Information (NCBI) Spike protein sample and vignettes demonstrating the full workflow.
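The two core quantities named in the description, per-site Shannon entropy and the Hellinger distance between temporal partitions, can be illustrated with a minimal, language-neutral sketch. This is not the package's API: every function name below is hypothetical, the sketch is written in Python rather than R, the entropy is assumed to be measured in bits (log base 2), and the Hellinger distance is assumed to use the normalisation that bounds it in [0, 1]. The package itself may differ in all of these respects, and it works on integer-encoded matrices rather than raw strings.

```python
import math
from collections import Counter


def site_entropy(column):
    """Shannon entropy (in bits) of the residues observed at one site."""
    counts = Counter(column)
    total = sum(counts.values())
    # sum of p * log2(1 / p); equals 0.0 when a single residue is observed
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def per_site_entropy(sequences):
    """Entropy of every site across equal-length amino acid sequences."""
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("sequences must be non-empty and of equal length")
    # zip(*sequences) iterates over alignment columns (sites)
    return [site_entropy(column) for column in zip(*sequences)]


def site_frequencies(sequences, site):
    """Relative residue frequencies at one zero-based site."""
    counts = Counter(s[site] for s in sequences)
    total = sum(counts.values())
    return {residue: n / total for residue, n in counts.items()}


def hellinger(p, q):
    """Hellinger distance between two discrete distributions (dicts)."""
    support = set(p) | set(q)
    # Bhattacharyya coefficient: sum over the support of sqrt(p_k * q_k)
    bc = sum(math.sqrt(p.get(k, 0.0) * q.get(k, 0.0)) for k in support)
    # max() guards against tiny negative values from floating-point rounding
    return math.sqrt(max(0.0, 1.0 - bc))


# Two toy temporal partitions (illustrative data, not from the package)
early = ["MFVK", "MFVK", "MFVK", "MFVR"]
late = ["MFLK", "MFLR", "MFVR", "MFLR"]

entropy_early = per_site_entropy(early)
entropy_late = per_site_entropy(late)
shift_site_2 = hellinger(site_frequencies(early, 2), site_frequencies(late, 2))
```

In this toy data, site 2 is fully conserved in the early partition and mostly replaced in the late one, so its entropy rises and the Hellinger distance between the two partitions at that site is large. The pipeline described above builds on signals of this kind, clustering per-site entropies and tracking distributional shifts over time.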

Version: 0.6.2
Depends: R (≥ 3.5.0)
Imports: ggplot2 (≥ 3.4.0), grDevices, HDcpDetect, ecp, kableExtra, lubridate, magrittr, mclust, rlang, stats, stringr, utils, zoo
Suggests: Biostrings, DT, dplyr, here, knitr, readxl, rmarkdown, R.rsp, testthat (≥ 3.0.0)
Published: 2026-05-30
DOI: 10.32614/CRAN.package.ViralEntropR
Author: Vadim Tyuryaev [aut, cre], Jane Heffernan [aut], Hanna Jankowski [aut]
Maintainer: Vadim Tyuryaev <vadim.tyuryaev at gmail.com>
BugReports: https://github.com/vadimtyuryaev/ViralEntropR/issues
License: MIT + file LICENSE
URL: https://github.com/vadimtyuryaev/ViralEntropR, https://doi.org/10.5281/zenodo.19040165, https://vadimtyuryaev.github.io/ViralEntropR/
NeedsCompilation: no
Language: en-GB
Materials: README, NEWS
CRAN checks: ViralEntropR results

Documentation:

Reference manual: ViralEntropR.html, ViralEntropR.pdf
Vignettes: Unsupervised Recovery of SARS-CoV-2 Variant Structure via Entropy-Driven Site Selection and PAM Clustering: Precision, Recall, and F1 Evaluation Across Wild-Type and Delta-Dominated Surveillance Periods (source)
Entropy Clustering, Hellinger Distance, and Change Point Analysis for Emerging Viral Variant Detection: A Simulation Study (source)
NCBI SARS-CoV-2 Spike Protein Sequence Preprocessing: From Raw FASTA to an Analysis-Ready Integer-Encoded Matrix (source)

Downloads:

Package source: ViralEntropR_0.6.2.tar.gz
Windows binaries: r-devel: ViralEntropR_0.6.2.zip, r-release: not available, r-oldrel: ViralEntropR_0.6.2.zip
macOS binaries: r-release (arm64): ViralEntropR_0.6.2.tgz, r-oldrel (arm64): ViralEntropR_0.6.2.tgz, r-release (x86_64): ViralEntropR_0.6.2.tgz, r-oldrel (x86_64): ViralEntropR_0.6.2.tgz

Linking:

Please use the canonical form https://CRAN.R-project.org/package=ViralEntropR to link to this page.
