The “covid19.analytics” R package allows users to obtain live* worldwide data on the novel CoronaVirus Disease originally reported in 2019 (CoViD-19), as published by the JHU CSSE repository [1], and provides basic analysis tools and functions to investigate these datasets.
The goal of this package is to make the latest data promptly available to researchers and the scientific community.
The covid19.data() function allows users to obtain real-time data about the CoViD-19 reported cases from the JHU CSSE repository, in the following modalities:
* “aggregated” data for the latest day, with fine geographical granularity (i.e. cities, provinces, states, countries)
* “time series” data for larger accumulated geographical regions (provinces/countries)
The datasets also include information about the different categories (status) “confirmed”/“deaths”/“recovered” of the cases reported daily per country/region/city.
This data-acquisition function first attempts to retrieve the data directly from the JHU repository, with the latest updates. If for whatever reason this fails (e.g. connection problems), the package loads a preserved “image” of the data, which is not the latest one but still allows the user to explore this older dataset. In this way, the package offers a more robust and resilient approach to the rapidly changing situation regarding data availability and integrity.
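The fallback logic described above can be sketched in base R with tryCatch(). This is a minimal illustration under stated assumptions, not the package's actual implementation: the helper read.with.fallback() is hypothetical, and the preserved “image” is assumed to be an .rds file saved earlier with saveRDS().

```r
# Hypothetical helper (not part of covid19.analytics): try a live source first,
# and fall back to a previously saved snapshot if an error occurs.
#   fetch.live  - function with no arguments that downloads and returns the data
#   backup.file - path to an .rds snapshot written earlier with saveRDS()
read.with.fallback <- function(fetch.live, backup.file) {
  tryCatch({
    result <- fetch.live()
    attr(result, "source") <- "live"
    result
  }, error = function(e) {
    message("Live retrieval failed (", conditionMessage(e),
            "); loading preserved copy: ", backup.file)
    result <- readRDS(backup.file)
    attr(result, "source") <- "backup"
    result
  })
}

# Usage: save a small made-up snapshot to act as the preserved "image"
snapshot <- data.frame(Country.Region = c("Italy", "Spain"),
                       Confirmed = c(10, 20))
backup <- tempfile(fileext = ".rds")
saveRDS(snapshot, backup)

# Simulate a connection failure: the backup copy is returned instead
failing.fetch <- function() stop("cannot open URL")
res <- read.with.fallback(failing.fetch, backup)
attr(res, "source")
```

Tagging the result with its origin lets downstream code warn the user that they are looking at an older dataset.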
argument | description |
---|---|
aggregated | latest number of cases aggregated by country |
Time Series data | |
ts-confirmed | time series data of confirmed cases |
ts-deaths | time series data of fatal cases |
ts-recovered | time series data of recovered cases |
ts-ALL | all time series data combined |
Deprecated data formats | |
ts-dep-confirmed | time series data of confirmed cases as originally reported (deprecated) |
ts-dep-deaths | time series data of deaths as originally reported (deprecated) |
ts-dep-recovered | time series data of recovered cases as originally reported (deprecated) |
Combined | |
ALL | all of the above |
The covid19.genomic.data() function allows users to obtain the virus's genomic sequencing data from NCBI [2].
In addition to accessing and retrieving the data, the package includes some basic functions to estimate totals per region/country/city, growth rates and daily changes in the reported number of cases.
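As an illustration of how such totals can be obtained, the base-R sketch below sums the rows that belong to a location. It assumes the wide layout of the JHU time-series CSV files (Province.State, Country.Region, Lat, Long, then one column per date, with names as produced by read.csv()); the numbers are made up, and the helper select.location() is hypothetical, not a function exported by the package. Matching on either column mirrors how geo.loc accepts both a province ("Ontario") and a country ("Canada") in the tots.per.location() examples.

```r
# Made-up counts in the wide layout of the JHU time-series CSV files
# (Province.State, Country.Region, Lat, Long, then one column per date,
#  with column names as produced by read.csv(), e.g. "X3.1.20")
ts.data <- data.frame(
  Province.State = c("Ontario", "Quebec", "", "Hubei"),
  Country.Region = c("Canada", "Canada", "Italy", "China"),
  Lat  = c(51.3, 52.9, 41.9, 31.0),
  Long = c(-85.3, -73.5, 12.6, 112.3),
  X3.1.20 = c(10, 1, 100, 500),
  X3.2.20 = c(14, 1, 150, 520),
  X3.3.20 = c(20, 2, 220, 530)
)
date.cols <- 5:ncol(ts.data)  # everything after the four metadata columns

# Hypothetical helper: match a location against province/state OR country
select.location <- function(df, loc) {
  df[df$Province.State == loc | df$Country.Region == loc, , drop = FALSE]
}

# Daily totals for one location (sums over all matching rows)
canada.daily <- colSums(select.location(ts.data, "Canada")[, date.cols])

# Daily totals for every country at once
country.totals <- aggregate(ts.data[, date.cols],
                            by = list(Country = ts.data$Country.Region),
                            FUN = sum)
print(canada.daily)
print(country.totals)
```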
We are working on the development of modelling capabilities. A preliminary prototype has been included and can be accessed using the generate.SIR.model function, which implements a simple SIR (Susceptible-Infected-Recovered) ODE model using the reported data of the virus.
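For reference, the classic SIR model evolves S, I and R in a population of size N as dS/dt = -βSI/N, dI/dt = βSI/N - γI and dR/dt = γI. The sketch below integrates these equations in base R with a fixed-step fourth-order Runge-Kutta scheme. The values β = 0.5, γ = 0.2 and the initial conditions are arbitrary illustrative choices; estimating β and γ from reported cases is not shown, and the package is not claimed to use this solver.

```r
# Right-hand side of the SIR equations (N = total population)
sir.rhs <- function(state, beta, gamma, N) {
  S <- state[["S"]]; I <- state[["I"]]
  new.infections <- beta * S * I / N
  c(S = -new.infections,
    I = new.infections - gamma * I,
    R = gamma * I)
}

# Fixed-step fourth-order Runge-Kutta integration of the SIR model
simulate.sir <- function(S.init, I.init, R.init, beta, gamma, days, dt = 0.1) {
  N <- S.init + I.init + R.init
  n.steps <- round(days / dt)
  out <- matrix(NA_real_, nrow = n.steps + 1, ncol = 4,
                dimnames = list(NULL, c("time", "S", "I", "R")))
  state <- c(S = S.init, I = I.init, R = R.init)
  out[1, ] <- c(0, state)
  for (k in seq_len(n.steps)) {
    k1 <- sir.rhs(state, beta, gamma, N)
    k2 <- sir.rhs(state + dt / 2 * k1, beta, gamma, N)
    k3 <- sir.rhs(state + dt / 2 * k2, beta, gamma, N)
    k4 <- sir.rhs(state + dt * k3, beta, gamma, N)
    state <- state + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    out[k + 1, ] <- c(k * dt, state)
  }
  as.data.frame(out)
}

# Illustrative run: basic reproduction number beta / gamma = 2.5
sim <- simulate.sir(S.init = 999990, I.init = 10, R.init = 0,
                    beta = 0.5, gamma = 0.2, days = 160)
sim[which.max(sim$I), ]   # time and size of the infection peak
```

Because the three right-hand sides sum to zero, S + I + R stays equal to N throughout, which is a quick sanity check for any implementation.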
We will continue working on adding and developing new features to the package, in particular modelling and predictive capabilities.
Function | Description | Main Type of Output |
---|---|---|
Data Acquisition | | |
covid19.data | obtain live* worldwide data for the covid19 virus from the JHU’s CSSE repository [1] | returns dataframes/lists with the collected data |
covid19.genomic.data | obtain covid19’s genomic sequencing data from NCBI [2] | list, with the RNA seq data in the “$NC_045512.2” entry |
Analysis | | |
report.summary | summarize the current situation: downloads the latest data and summarizes different quantities | on-screen tables and static plots (pie and bar plots) with the reported information; can also output the tables into a text file |
tots.per.location | compute totals per region and plot the time series for that specific region/country | static plots: data + models (exp/linear, Poisson, Gamma), mosaic and histograms when more than one location is selected |
growth.rate | compute changes and growth rates per region and plot the time series for that specific region/country | static plots: data + models (linear, Poisson, Exp), mosaic and histograms when more than one location is selected |
Graphics and Visualization | | |
totals.plt | plots the total number of cases per day | static and interactive plot |
live.map | generates an interactive map displaying cases around the world | static and interactive plot |
Modelling | | |
generate.SIR.model | generates a SIR (Susceptible-Infected-Recovered) model | list containing the fits for the SIR model |
plot.SIR.model | plot the results from the SIR model | static and interactive plots |
To use the “covid19.analytics” package, you first need to install it.
The stable version can be downloaded from the CRAN repository:
install.packages("covid19.analytics")
The development version can be obtained from the GitHub repository, i.e.
# devtools is needed for installing from the GitHub repo
install.packages("devtools")
# install covid19.analytics from GitHub
devtools::install_github("mponce0/covid19.analytics")
To use the package, either the stable or the development version, load it with the library function:
library(covid19.analytics)
# obtain all the records combined for "confirmed", "deaths" and "recovered" cases -- *aggregated* data
covid19.data.ALLcases <- covid19.data()
# obtain time series data for "confirmed" cases
covid19.confirmed.cases <- covid19.data("ts-confirmed")
# reads all possible datasets, returning a list
covid19.all.datasets <- covid19.data("ALL")
# reads the latest aggregated data
covid19.ALL.agg.cases <- covid19.data("aggregated")
# reads time series data for casualties
covid19.TS.deaths <- covid19.data("ts-deaths")
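Read covid19’s genomic data:
# obtain covid19's genomic sequencing data from NCBI
covid19.gen.seq <- covid19.genomic.data()
# display the RNA sequence
covid19.gen.seq$NC_045512.2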
# a quick function to overview top cases per region for time series and aggregated records
report.summary()
# save the tables into a text file named 'covid19-SummaryReport_CURRENTDATE.txt' where CURRENTDATE is the actual date
report.summary(saveReport=TRUE)
# totals for confirmed cases for "Ontario"
tots.per.location(covid19.confirmed.cases,geo.loc="Ontario")
# total for confirmed cases for "Canada"
tots.per.location(covid19.confirmed.cases,geo.loc="Canada")
# total number of deaths for "China"
tots.per.location(covid19.TS.deaths,geo.loc="China")
# total number of confirmed cases in Hubei, including a confidence band based on a moving average
tots.per.location(covid19.confirmed.cases,geo.loc="Hubei", confBnd=TRUE)
The figures show the total number of cases for different cities (provinces/regions) and countries: the upper panel in log scale, with a linear fit to an exponential law, and the bottom panel in linear scale. Details about the models are included in the plots, in particular the growth rate, which in several cases appears to be around 1.2+, as predicted by some models. Notice that in the case of Hubei the value is closer to 1, as the spread of the virus has reached its logistic asymptote, while in other cases (e.g. Germany and Italy, for the presented dates) it is still well above 1, indicating exponential growth.
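Assuming the growth rate quoted in the plots is read as the day-over-day multiplicative factor of the cumulative counts (an interpretation for illustration, not a description of the package's internals), the contrast between exponential and logistic growth follows directly from the two laws, with K the plateau of the logistic curve:

```latex
N(t) = N_0\, e^{r t}
\quad\Longrightarrow\quad
\frac{N(t+1)}{N(t)} = e^{r},
\qquad
e^{r} = 1.2 \;\Rightarrow\; r = \ln 1.2 \approx 0.182\ \text{day}^{-1},
\qquad
t_{2} = \frac{\ln 2}{\ln 1.2} \approx 3.8\ \text{days}

N(t) = \frac{K}{1 + e^{-r (t - t_0)}}
\quad\Longrightarrow\quad
\frac{N(t+1)}{N(t)} = \frac{1 + e^{-r (t - t_0)}}{1 + e^{-r (t + 1 - t_0)}}
\;\longrightarrow\; 1 \quad (t \to \infty)
```

Under this reading, a sustained factor of 1.2 means the counts double roughly every 3.8 days, whereas a factor near 1 means the curve is flattening toward its plateau K.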
IMPORTANT Please notice that the “linear exponential” modelling function implements a simple (naive) and straightforward linear regression model, which is not optimal for exponential fits. The reason is that, when applying the exponential function to go back to the original model, errors for large values of the dependent variable weigh much more than those for small values. Nevertheless, for the sake of a quick interpretation it is acceptable, but one should bear in mind the implications of this simplification.
We also provide two additional models, as shown in the figures above, using the Generalized Linear Model glm() function with Poisson and Gamma family functions. In particular, the tots.per.location function determines when it is possible to automatically generate each model, and displays the information in the plot as well as details of the models in the console.
# read the time series data for all the cases
all.data <- covid19.data('ts-ALL')
# run on all the cases
tots.per.location(all.data,"Japan")
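The trade-offs between these models can be seen on synthetic data. The sketch below is hypothetical (it is not the code inside tots.per.location): it fits the same exponentially growing counts with lm() on log(cases), as in the naive “linear exponential” approach, and with glm() using Poisson and Gamma families with a log link. The data are simulated, not real case counts.

```r
# Synthetic counts growing exponentially at 0.2 per day, with Poisson noise
set.seed(2020)
day <- 0:29
true.rate <- 0.2
cases <- rpois(length(day), lambda = 20 * exp(true.rate * day))

# (1) naive "linear exponential" fit: ordinary least squares on log(cases);
#     every point gets equal weight on the log scale
fit.lin <- lm(log(cases) ~ day)

# (2) Poisson GLM with log link: log E[cases] = a + b * day, variance = mean
fit.pois <- glm(cases ~ day, family = poisson(link = "log"))

# (3) Gamma GLM with log link: same mean structure, variance proportional to mean^2
fit.gam <- glm(cases ~ day, family = Gamma(link = "log"))

# Estimated exponential rate b from each model, and the implied daily factor exp(b)
rates <- c(linear.exp = coef(fit.lin)[["day"]],
           poisson    = coef(fit.pois)[["day"]],
           gamma      = coef(fit.gam)[["day"]])
print(round(rates, 4))
print(round(exp(rates), 4))
```

With the log link, both GLMs model log E[y] directly, so their residuals are measured on the original scale with a variance that grows with the mean, instead of weighting all points equally on the log scale as the naive fit does.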
It is also possible to run the tots.per.location (and growth.rate) functions on the whole dataset, in which case a quite large but complete mosaic figure will be generated, e.g.
# read time series data for confirmed cases
TS.data <- covid19.data("ts-confirmed")
# compute changes and growth rates per location for all the countries
growth.rate(TS.data)
# compute changes and growth rates per location for 'Italy'
growth.rate(TS.data,geo.loc="Italy")
# compute changes and growth rates per location for 'Italy' and 'Germany'
growth.rate(TS.data,geo.loc=c("Italy","Germany"))
The previous figures show on the upper panel the number of changes on a daily basis in linear scale (thin line, left y-axis) and log scale (thicker line, right y-axis), while the bottom panel displays the growth rate for the given country/region/city.
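The quantities in these panels can be reproduced from a single cumulative series. In this minimal sketch, daily changes are the first differences of the cumulative counts and the growth rate is the ratio of consecutive daily changes; this definition is an assumption for illustration and may differ in detail from the one implemented in growth.rate(). The numbers are made up.

```r
# Made-up cumulative confirmed cases for one location, one value per day
cumulative <- c(10, 12, 15, 19, 25, 33, 43, 56, 73)

# Daily changes: new cases reported each day (first differences)
changes <- diff(cumulative)

# Growth rate (assumed definition): new cases today / new cases the day before;
# set to NA when the previous day had no new cases
prev <- head(changes, -1)
curr <- tail(changes, -1)
growth <- ifelse(prev > 0, curr / prev, NA)

print(changes)          # daily changes, linear scale
print(log10(changes))   # the same daily changes on a log scale
print(round(growth, 3)) # growth rate
```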
Combining multiple geographical locations:
# obtain Time Series data
TSconfirmed <- covid19.data("ts-confirmed")
# explore different combinations of regions/cities/countries
# when combining different locations, heatmaps will also be generated comparing the trends among these locations
growth.rate(TSconfirmed,geo.loc=c("Italy","Canada","Ontario","Quebec","Uruguay"))
growth.rate(TSconfirmed,geo.loc=c("Hubei","Italy","Spain","US","Canada","Ontario","Quebec","Uruguay"))
# retrieve time series data
TS.data <- covid19.data("ts-ALL")
# static and interactive plot
totals.plt(TS.data)
# retrieve aggregated data
data <- covid19.data("aggregated")
# interactive map of aggregated cases -- with more spatial resolution
live.map(data)
# or
live.map()
# interactive map of the time series data of the confirmed cases with less spatial resolution, i.e. aggregated by country
live.map(covid19.data("ts-confirmed"))
Interactive examples can be seen at https://mponce0.github.io/covid19.analytics/
# read time series data for confirmed cases
data <- covid19.data("ts-confirmed")
# run a SIR model for a given geographical location
generate.SIR.model(data,"Hubei", t0=1,t1=15)
generate.SIR.model(data,"Germany",tot.population=83149300)
generate.SIR.model(data,"Uruguay", tot.population=3500000)
generate.SIR.model(data,"Ontario",tot.population=14570000)
# the function will aggregate data for a geographical location, like a country with multiple entries
generate.SIR.model(data,"Canada",tot.population=37590000)
# modelling the spread for the whole world, storing the model and generating an interactive visualization
world.SIR.model <- generate.SIR.model(data,"ALL", t0=1,t1=15, tot.population=7.8e9, staticPlt=FALSE)
# plotting and visualizing the model
plot.SIR.model(world.SIR.model,"World",interactiveFig=TRUE,fileName="world.SIR.model")
(*) Data can be delayed by up to 24 hours with respect to the latest updates.
[1] 2019 Novel CoronaVirus CoViD-19 (2019-nCoV) Data Repository by Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE) https://github.com/CSSEGISandData/COVID-19
[2] Severe acute respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1, complete genome NCBI Reference Sequence: NC_045512.2 https://www.ncbi.nlm.nih.gov/nuccore/NC_045512.2
Source-Credit: CDC/ Alissa Eckert, MS; Dan Higgins, MAMS