dtplyr

Overview

dtplyr provides a data.table backend for dplyr. The goal of dtplyr is to allow you to write dplyr code that is automatically translated to the equivalent, but usually much faster, data.table code.

See vignette("translation") for details of the current translations, and table.express and rqdatatable for related work.

Installation

You can install from CRAN with:

install.packages("dtplyr")

Or try the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("tidyverse/dtplyr")

Usage

To use dtplyr, you must at least load dtplyr and dplyr. You may also want to load data.table so you can access the other goodies that it provides:

library(data.table)
library(dtplyr)
library(dplyr, warn.conflicts = FALSE)

Then use lazy_dt() to create a “lazy” data table that tracks the operations performed on it.

mtcars2 <- lazy_dt(mtcars)

You can preview the transformation (including the generated data.table code) by printing the result:

mtcars2 %>% 
  filter(wt < 5) %>% 
  mutate(l100k = 235.21 / mpg) %>% # liters / 100 km
  group_by(cyl) %>% 
  summarise(l100k = mean(l100k))
#> Source: local data table [3 x 2]
#> Call:   `_DT1`[wt < 5][, `:=`(l100k = 235.21/mpg)][, .(l100k = mean(l100k)), 
#>     keyby = .(cyl)]
#> 
#>     cyl l100k
#>   <dbl> <dbl>
#> 1     4  9.05
#> 2     6 12.0 
#> 3     8 14.9 
#> 
#> # Use as.data.table()/as.data.frame()/as_tibble() to access results
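If you only want to inspect the generated data.table code, without the preview rows, dplyr's show_query() can be called on the lazy pipeline. The following is a minimal sketch, assuming dtplyr, dplyr and data.table are installed; the variable name `query` and the choice of summary are illustrative only:

```r
library(data.table)
library(dtplyr)
library(dplyr, warn.conflicts = FALSE)

mtcars2 <- lazy_dt(mtcars)

# Build the pipeline without collecting it; `query` is still a lazy step
query <- mtcars2 %>%
  filter(wt < 5) %>%
  group_by(cyl) %>%
  summarise(mpg = mean(mpg))

# Show only the translated data.table call
show_query(query)
```

Like printing, this is a preview of the translation rather than a way to get at the results.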

Generally, though, you should reserve printing for debugging, and use as.data.table(), as.data.frame(), or as_tibble() to indicate that you’re done with the transformation and want to access the results:

mtcars2 %>% 
  filter(wt < 5) %>% 
  mutate(l100k = 235.21 / mpg) %>% # liters / 100 km
  group_by(cyl) %>% 
  summarise(l100k = mean(l100k)) %>% 
  as_tibble()
#> # A tibble: 3 × 2
#>     cyl l100k
#>   <dbl> <dbl>
#> 1     4  9.05
#> 2     6 12.0 
#> 3     8 14.9
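If you would rather keep working in data.table after the dplyr steps, as.data.table() is the natural way to finish the pipeline. A small sketch under the same assumptions as above (the name `result` and the final ordering step are illustrative, not part of dtplyr):

```r
library(data.table)
library(dtplyr)
library(dplyr, warn.conflicts = FALSE)

result <- lazy_dt(mtcars) %>%
  filter(wt < 5) %>%
  mutate(l100k = 235.21 / mpg) %>% # liters / 100 km
  group_by(cyl) %>%
  summarise(l100k = mean(l100k)) %>%
  as.data.table()

# `result` is now a regular data.table, so native data.table syntax applies
result[order(-l100k)]
```

This is one reason to load data.table alongside dtplyr: the collected result can be handed straight to code that uses data.table's own syntax.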

Why is dtplyr slower than data.table?

There are two primary reasons that dtplyr will always be somewhat slower than data.table:

Code of Conduct

Please note that the dtplyr project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.
