Introduction to the R package covid19br

Introduction

This vignette shows how to use the R package covid19br to download and explore data on the COVID-19 pandemic in Brazil and around the world. The package downloads datasets from the following repositories:

The latter repository provides data on the COVID-19 pandemic at the global level (daily counts of confirmed cases, deaths, and recovered patients by country and territory) and has been widely used around the world as a reliable source of information on the pandemic. The former repository, on the other hand, provides data on the Brazilian territory at the city, state, region, and national levels.

We hope this package helps other researchers and scientists understand and fight this terrible pandemic that has been plaguing the world.

Getting started with the R package covid19br

We start by showing how to use the package to download the COVID-19 data from the official Brazilian repository https://covid.saude.gov.br and load it into R:

library(covid19br)
library(tidyverse)

# downloading the data (at national level):
brazil <- downloadCovid19("brazil")

# looking at the downloaded data:
glimpse(brazil)
#> Rows: 1,222
#> Columns: 9
#> $ date         <date> 2020-02-25, 2020-02-26, 2020-02-27, 2020-02-28, 2020-02-…
#> $ epi_week     <int> 9, 9, 9, 9, 9, 10, 10, 10, 10, 10, 10, 10, 11, 11, 11, 11…
#> $ newCases     <int> 0, 1, 0, 0, 1, 0, 0, 0, 1, 4, 6, 6, 6, 0, 9, 18, 25, 21, …
#> $ accumCases   <int> 0, 1, 1, 1, 2, 2, 2, 2, 3, 7, 13, 19, 25, 25, 34, 52, 77,…
#> $ newDeaths    <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
#> $ accumDeaths  <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
#> $ newRecovered <int> 0, 1, 1, 0, 1, 1, 0, 0, 1, 4, 6, 7, 6, 1, 6, 16, 23, 24, …
#> $ newFollowup  <int> 0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 7, 12, 19, 24, 28, 36, 54, …
#> $ pop          <dbl> 210147125, 210147125, 210147125, 210147125, 210147125, 21…

# plotting the cumulative number of deaths:
ggplot(brazil, aes(x = date, y = accumDeaths)) +
  geom_point() +
  geom_path()
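In the glimpse() output above, the accum* columns look like running totals of the corresponding new* columns. The short check below is a sketch that uses only the first values printed above (not the full downloaded series); the vector names new_cases and accum_cases are ours, chosen for illustration. It confirms the relation with base R's cumsum().

```r
# First 17 values of newCases and accumCases, copied from the
# glimpse(brazil) output above (not the full series).
new_cases   <- c(0, 1, 0, 0, 1, 0, 0, 0, 1, 4, 6, 6, 6, 0, 9, 18, 25)
accum_cases <- c(0, 1, 1, 1, 2, 2, 2, 2, 3, 7, 13, 19, 25, 25, 34, 52, 77)

# The cumulative sum of the daily counts reproduces the running total:
all(cumsum(new_cases) == accum_cases)
#> [1] TRUE
```

The same comparison can be run on brazil$newCases and brazil$accumCases once the data have been downloaded.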

Next, we will show how to plot the daily count of new deaths along with its 7-day moving average. Here, we use the function pracma::movavg() to compute the moving average.

library(pracma)

# computing the 7-day simple (type = "s") moving average of daily deaths:
brazil <- brazil %>%
  mutate(
    ma_newDeaths = movavg(newDeaths, n = 7, type = "s")
  )

# looking at the transformed data:
glimpse(brazil)
#> Rows: 1,222
#> Columns: 10
#> $ date         <date> 2020-02-25, 2020-02-26, 2020-02-27, 2020-02-28, 2020-02-…
#> $ epi_week     <int> 9, 9, 9, 9, 9, 10, 10, 10, 10, 10, 10, 10, 11, 11, 11, 11…
#> $ newCases     <int> 0, 1, 0, 0, 1, 0, 0, 0, 1, 4, 6, 6, 6, 0, 9, 18, 25, 21, …
#> $ accumCases   <int> 0, 1, 1, 1, 2, 2, 2, 2, 3, 7, 13, 19, 25, 25, 34, 52, 77,…
#> $ newDeaths    <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
#> $ accumDeaths  <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
#> $ newRecovered <int> 0, 1, 1, 0, 1, 1, 0, 0, 1, 4, 6, 7, 6, 1, 6, 16, 23, 24, …
#> $ newFollowup  <int> 0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 7, 12, 19, 24, 28, 36, 54, …
#> $ pop          <dbl> 210147125, 210147125, 210147125, 210147125, 210147125, 21…
#> $ ma_newDeaths <dbl> 0.0000000, 0.0000000, 0.0000000, 0.0000000, 0.0000000, 0.…
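The moving average can also be computed without pracma. The sketch below defines a hypothetical helper, trailing_mean(), that averages each day with the n - 1 preceding days; for the first n - 1 days, where a full window is not yet available, it averages over the days available so far. We believe this mirrors how pracma::movavg(type = "s") treats the start of the series, but that is an assumption, so compare the two on your data before swapping one for the other. The input values are made up.

```r
# Hypothetical helper: trailing simple moving average over n days.
# For the first n - 1 days it averages over the days available so far
# (assumed, not verified, to match pracma::movavg(type = "s")).
trailing_mean <- function(x, n = 7) {
  stopifnot(is.numeric(x), n >= 1)
  vapply(seq_along(x), function(k) {
    mean(x[max(1, k - n + 1):k])  # window ends at day k
  }, numeric(1))
}

# Toy daily death counts (made-up values):
deaths_toy <- c(0, 0, 1, 3, 2, 5, 4, 6, 8, 7)
trailing_mean(deaths_toy, n = 7)
```

For complete windows, base R's stats::filter(x, rep(1/7, 7), sides = 1) gives the same trailing average, but it returns NA for the first six days instead of a partial average.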

After computing the moving average, it is convenient to reorganize the data into the so-called tidy (long) format. This can easily be done with the function tidyr::pivot_longer():

deaths <- brazil %>%
  select(date, newDeaths, ma_newDeaths) %>%
  pivot_longer(
    cols = c("newDeaths", "ma_newDeaths"),
    values_to = "deaths", names_to = "type"
  ) %>%
  mutate(
    type = recode(type,
      ma_newDeaths = "moving average",
      newDeaths = "count"
    )
  )

# looking at the (tidy) data:
glimpse(deaths)
#> Rows: 2,444
#> Columns: 3
#> $ date   <date> 2020-02-25, 2020-02-25, 2020-02-26, 2020-02-26, 2020-02-27, 20…
#> $ type   <chr> "count", "moving average", "count", "moving average", "count", …
#> $ deaths <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
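To see what pivot_longer() does here, consider a toy table with two days of made-up values (not taken from the data). Each row of the wide table becomes one row per pivoted column, so pivoting two columns doubles the number of rows, which is why deaths has 2,444 = 2 × 1,222 rows, alternating between the two types for each date.

```r
library(tibble)
library(tidyr)
library(dplyr)  # for the %>% pipe

# Toy wide table: two days, made-up values.
wide <- tibble(
  date = as.Date("2020-03-20") + 0:1,
  newDeaths = c(3, 7),
  ma_newDeaths = c(1.5, 2.5)
)

# One row per (date, type) pair after pivoting:
long <- wide %>%
  pivot_longer(
    cols = c("newDeaths", "ma_newDeaths"),
    names_to = "type", values_to = "deaths"
  )
long
```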

# drawing the desired plot:
ggplot(deaths, aes(x = date, y = deaths, color = type)) +
  geom_point() +
  geom_path() +
  theme(legend.position = "bottom")

When dealing with epidemiological data, we are often interested in quantities such as the incidence, mortality, and lethality rates. The function covid19br::add_epi_rates() adds these rates to the downloaded data, as shown below:


# downloading the data (region level):
regions <- downloadCovid19("regions") 

# adding the rates to the downloaded data:
regions <- regions %>%
  add_epi_rates()

# looking at the data:
glimpse(regions)
#> Rows: 6,110
#> Columns: 13
#> $ region       <chr> "Midwest", "Midwest", "Midwest", "Midwest", "Midwest", "M…
#> $ date         <date> 2020-02-25, 2020-02-26, 2020-02-27, 2020-02-28, 2020-02-…
#> $ epi_week     <int> 9, 9, 9, 9, 9, 10, 10, 10, 10, 10, 10, 10, 11, 11, 11, 11…
#> $ newCases     <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 3, 4, …
#> $ accumCases   <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 5, 9, …
#> $ newDeaths    <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
#> $ accumDeaths  <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
#> $ newRecovered <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ newFollowup  <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ pop          <dbl> 16297074, 16297074, 16297074, 16297074, 16297074, 1629707…
#> $ incidence    <dbl> 0.000000000, 0.000000000, 0.000000000, 0.000000000, 0.000…
#> $ lethality    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
#> $ mortality    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
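The formulas behind add_epi_rates() are not spelled out here. The values in the summary table at the end of this vignette are consistent with incidence and mortality per 100,000 inhabitants (accumulated cases and deaths divided by pop) and with lethality as the percentage of accumulated cases that resulted in death, rounded to two decimals; for São Paulo, for instance, 100 × 45091 / 1180681 ≈ 3.82. The hypothetical helper epi_rates_sketch() below encodes those assumed definitions; it is a sketch, not the package's code. Because the output above shows lethality equal to 0, rather than NaN, on days without confirmed cases, the sketch does the same.

```r
# Hypothetical helper (not part of covid19br): rates under the
# assumed definitions described above.
epi_rates_sketch <- function(accumCases, accumDeaths, pop) {
  # lethality: % of confirmed cases that died, rounded to 2 decimals;
  # set to 0 when there are no cases yet (avoids 0/0 = NaN).
  lethality <- ifelse(accumCases > 0,
                      round(100 * accumDeaths / accumCases, 2),
                      0)
  data.frame(
    incidence = 1e5 * accumCases / pop,   # cases per 100,000 inhabitants
    mortality = 1e5 * accumDeaths / pop,  # deaths per 100,000 inhabitants
    lethality = lethality
  )
}

# Toy input (made-up): no cases yet vs. 500 cases and 10 deaths,
# both in a population of 100,000.
epi_rates_sketch(accumCases = c(0, 500), accumDeaths = c(0, 10),
                 pop = c(1e5, 1e5))
```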

The function plotly::ggplotly() can be used to draw an interactive plot as follows:

library(plotly)

p <- ggplot(regions, aes(x = date, y = mortality, color = region)) +
  geom_point() +
  geom_path()

ggplotly(p)

In our last example, we obtain a table summarizing the COVID-19 pandemic in the 27 Brazilian capitals on the most recent date available in the downloaded data (2023-06-30 at the time of writing).

library(kableExtra)

# downloading the data (city level):
cities <- downloadCovid19("cities")

# keeping the state capitals on the most recent date and adding the rates:
capitals <- cities %>%
  filter(capital == TRUE, date == max(date)) %>%
  add_epi_rates() %>%
  select(region, state, city, newCases, newDeaths, accumCases, accumDeaths,
         incidence, mortality, lethality) %>%
  arrange(desc(lethality), desc(mortality), desc(incidence))

# printing the table (full_width belongs to kable_styling(), not kable()):
capitals %>%
  kable(
    caption = "Summary of the COVID-19 pandemic in the 27 capitals of Brazilian states."
  ) %>%
  kable_styling(full_width = FALSE)
Summary of the COVID-19 pandemic in the 27 capitals of Brazilian states.

| region | state | city | newCases | newDeaths | accumCases | accumDeaths | incidence | mortality | lethality |
|---|---|---|---:|---:|---:|---:|---:|---:|---:|
| Southeast | SP | São Paulo | 0 | 0 | 1180681 | 45091 | 9636.621 | 368.0290 | 3.82 |
| Northeast | MA | São Luís | 0 | 0 | 77734 | 2762 | 7054.645 | 250.6616 | 3.55 |
| North | PA | Belém | 0 | 0 | 158849 | 5466 | 10641.402 | 366.1710 | 3.44 |
| North | AM | Manaus | 0 | 0 | 318276 | 9942 | 14581.336 | 455.4778 | 3.12 |
| South | PR | Curitiba | 0 | 0 | 308533 | 8847 | 15960.488 | 457.6575 | 2.87 |
| Northeast | CE | Fortaleza | 0 | 0 | 410654 | 11801 | 15384.091 | 442.0940 | 2.87 |
| Southeast | RJ | Rio de Janeiro | 0 | 0 | 1337413 | 38263 | 19905.229 | 569.4828 | 2.86 |
| Northeast | BA | Salvador | 0 | 0 | 339216 | 9145 | 11809.715 | 318.3808 | 2.70 |
| Midwest | MT | Cuiabá | 0 | 0 | 154186 | 3746 | 25171.293 | 611.5449 | 2.43 |
| Northeast | AL | Maceió | 0 | 0 | 133169 | 3228 | 13069.264 | 316.7973 | 2.42 |
| Northeast | PE | Recife | 0 | 0 | 307156 | 6778 | 18663.849 | 411.8545 | 2.21 |
| Midwest | MS | Campo Grande | 0 | 0 | 216333 | 4707 | 24144.793 | 525.3454 | 2.18 |
| North | RO | Porto Velho | 0 | 0 | 129962 | 2752 | 24542.248 | 519.6924 | 2.12 |
| Northeast | PI | Teresina | 0 | 0 | 144071 | 3027 | 16658.592 | 350.0049 | 2.10 |
| Northeast | RN | Natal | 0 | 0 | 156791 | 3142 | 17734.091 | 355.3808 | 2.00 |
| South | RS | Porto Alegre | 0 | 0 | 336536 | 6686 | 22681.128 | 450.6086 | 1.99 |
| Northeast | PB | João Pessoa | 0 | 0 | 179498 | 3299 | 22187.228 | 407.7798 | 1.84 |
| Southeast | MG | Belo Horizonte | 0 | 0 | 472778 | 8451 | 18820.256 | 336.4158 | 1.79 |
| Midwest | GO | Goiânia | 0 | 0 | 468810 | 8073 | 30921.838 | 532.4801 | 1.72 |
| North | AP | Macapá | 0 | 0 | 99695 | 1616 | 19807.203 | 321.0636 | 1.62 |
| Northeast | SE | Aracaju | 0 | 0 | 171822 | 2623 | 26151.994 | 399.2311 | 1.53 |
| North | AC | Rio Branco | 0 | 0 | 87966 | 1226 | 21596.341 | 300.9926 | 1.39 |
| Midwest | DF | Brasília | 0 | 0 | 909776 | 11862 | 30172.310 | 393.3979 | 1.30 |
| North | RR | Boa Vista | 0 | 0 | 140963 | 1654 | 35310.223 | 414.3152 | 1.17 |
| Southeast | ES | Vitória | 0 | 0 | 151053 | 1461 | 41716.170 | 403.4830 | 0.97 |
| North | TO | Palmas | 0 | 0 | 90862 | 734 | 30375.727 | 245.3807 | 0.81 |
| South | SC | Florianópolis | 0 | 0 | 172657 | 1355 | 34464.332 | 270.4737 | 0.78 |
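The arrange() call sorts the capitals by lethality and uses mortality, then incidence, only to break ties. In the table above, Curitiba and Fortaleza share a lethality of 2.87, and Curitiba comes first because of its higher mortality. This also suggests that lethality is stored already rounded: unrounded, Fortaleza's value (100 × 11801 / 410654 ≈ 2.874) would exceed Curitiba's (100 × 8847 / 308533 ≈ 2.867). The sketch below reproduces the ordering with three rows copied from the table; the data frame name toy is ours.

```r
library(dplyr)

# Three rows copied from the table above.
toy <- tibble::tibble(
  city      = c("Fortaleza", "Rio de Janeiro", "Curitiba"),
  mortality = c(442.0940, 569.4828, 457.6575),
  lethality = c(2.87, 2.86, 2.87)
)

# Sort by lethality; mortality breaks the Curitiba/Fortaleza tie.
toy %>% arrange(desc(lethality), desc(mortality))
```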