Time-based filtering

Davis Vaughan

2017-10-06

Introducing time_filter()

time_filter() aims to make filtering a data frame by date much easier than with dplyr::filter(). It offers a flexible shorthand notation that lets you specify entire date ranges with very little typing. The time_formula used to filter rows has the general form from ~ to, where the left-hand side (LHS) is the start date and the right-hand side (RHS) is the end date. Both endpoints are included. At its most detailed, each side of the time_formula can be written as YYYY-MM-DD + HH:MM:SS.

Datasets required

library(tibbletime)
# dplyr provides filter(), group_by() and %>% used below
library(dplyr)

# Facebook stock prices.
data(FB)

# Convert FB to tbl_time
FB <- as_tbl_time(FB, index = date)

# FANG stock prices
data(FANG)

# Convert FANG to tbl_time and group
FANG <- as_tbl_time(FANG, index = date) %>%
  group_by(symbol)

Year filtering example

In dplyr, if you wanted to get the rows for 2013 in the FB dataset, you might do something like this:

filter(FB, date >= as.Date("2013-01-01"), date <= as.Date("2013-12-31"))
## # A time tibble: 252 x 8
## # Index: date
##    symbol       date  open  high   low close    volume adjusted
##  *  <chr>     <date> <dbl> <dbl> <dbl> <dbl>     <dbl>    <dbl>
##  1     FB 2013-01-02 27.44 28.18 27.42 28.00  69846400    28.00
##  2     FB 2013-01-03 27.88 28.47 27.59 27.77  63140600    27.77
##  3     FB 2013-01-04 28.01 28.93 27.83 28.76  72715400    28.76
##  4     FB 2013-01-07 28.69 29.79 28.65 29.42  83781800    29.42
##  5     FB 2013-01-08 29.51 29.60 28.86 29.06  45871300    29.06
##  6     FB 2013-01-09 29.67 30.60 29.49 30.59 104787700    30.59
##  7     FB 2013-01-10 30.60 31.45 30.28 31.30  95316400    31.30
##  8     FB 2013-01-11 31.28 31.96 31.10 31.72  89598000    31.72
##  9     FB 2013-01-14 32.08 32.21 30.62 30.95  98892800    30.95
## 10     FB 2013-01-15 30.64 31.71 29.88 30.10 173242600    30.10
## # ... with 242 more rows

That’s a lot of typing for one filter step. With tibbletime, because the index was specified at creation, we can do this:

time_filter(FB, time_formula = 2013-01-01 ~ 2013-12-31)
## # A time tibble: 252 x 8
## # Index: date
##    symbol       date  open  high   low close    volume adjusted
##  *  <chr>     <date> <dbl> <dbl> <dbl> <dbl>     <dbl>    <dbl>
##  1     FB 2013-01-02 27.44 28.18 27.42 28.00  69846400    28.00
##  2     FB 2013-01-03 27.88 28.47 27.59 27.77  63140600    27.77
##  3     FB 2013-01-04 28.01 28.93 27.83 28.76  72715400    28.76
##  4     FB 2013-01-07 28.69 29.79 28.65 29.42  83781800    29.42
##  5     FB 2013-01-08 29.51 29.60 28.86 29.06  45871300    29.06
##  6     FB 2013-01-09 29.67 30.60 29.49 30.59 104787700    30.59
##  7     FB 2013-01-10 30.60 31.45 30.28 31.30  95316400    31.30
##  8     FB 2013-01-11 31.28 31.96 31.10 31.72  89598000    31.72
##  9     FB 2013-01-14 32.08 32.21 30.62 30.95  98892800    30.95
## 10     FB 2013-01-15 30.64 31.71 29.88 30.10 173242600    30.10
## # ... with 242 more rows

At first glance, this might not look like less code, but this is before any shorthand is applied. Note how the filtering condition is specified as a formula separated by a ~.
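The formula matters because R would otherwise evaluate 2013-01-01 as ordinary subtraction. The base R sketch below is only an illustration of how formulas behave in general, not a description of tibbletime's internals. It shows that a formula stores both sides unevaluated, which is what makes it possible for a package to reinterpret them as dates.

```r
# Base R only. This illustrates formula capture in general; it is not
# taken from tibbletime's source.
f <- 2013-01-01 ~ 2013-12-31

# A two-sided formula is a call with three parts: `~`, the LHS, and the RHS.
lhs <- f[[2]]
rhs <- f[[3]]

# Both sides are stored as unevaluated calls.
is.call(lhs)  # TRUE
is.call(rhs)  # TRUE

# Evaluated directly, the LHS is plain arithmetic: 2013 - 1 - 1
eval(lhs)     # 2011

# A one-sided formula such as ~2013 has only two parts: `~` and the RHS.
one_sided <- ~2013
length(one_sided)  # 2
```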

Using the time_filter() shorthand, the same filter can be written:

time_filter(FB, 2013 ~ 2013)
## # A time tibble: 252 x 8
## # Index: date
##    symbol       date  open  high   low close    volume adjusted
##  *  <chr>     <date> <dbl> <dbl> <dbl> <dbl>     <dbl>    <dbl>
##  1     FB 2013-01-02 27.44 28.18 27.42 28.00  69846400    28.00
##  2     FB 2013-01-03 27.88 28.47 27.59 27.77  63140600    27.77
##  3     FB 2013-01-04 28.01 28.93 27.83 28.76  72715400    28.76
##  4     FB 2013-01-07 28.69 29.79 28.65 29.42  83781800    29.42
##  5     FB 2013-01-08 29.51 29.60 28.86 29.06  45871300    29.06
##  6     FB 2013-01-09 29.67 30.60 29.49 30.59 104787700    30.59
##  7     FB 2013-01-10 30.60 31.45 30.28 31.30  95316400    31.30
##  8     FB 2013-01-11 31.28 31.96 31.10 31.72  89598000    31.72
##  9     FB 2013-01-14 32.08 32.21 30.62 30.95  98892800    30.95
## 10     FB 2013-01-15 30.64 31.71 29.88 30.10 173242600    30.10
## # ... with 242 more rows

Or even more succinctly as:

time_filter(FB, ~2013)
## # A time tibble: 252 x 8
## # Index: date
##    symbol       date  open  high   low close    volume adjusted
##  *  <chr>     <date> <dbl> <dbl> <dbl> <dbl>     <dbl>    <dbl>
##  1     FB 2013-01-02 27.44 28.18 27.42 28.00  69846400    28.00
##  2     FB 2013-01-03 27.88 28.47 27.59 27.77  63140600    27.77
##  3     FB 2013-01-04 28.01 28.93 27.83 28.76  72715400    28.76
##  4     FB 2013-01-07 28.69 29.79 28.65 29.42  83781800    29.42
##  5     FB 2013-01-08 29.51 29.60 28.86 29.06  45871300    29.06
##  6     FB 2013-01-09 29.67 30.60 29.49 30.59 104787700    30.59
##  7     FB 2013-01-10 30.60 31.45 30.28 31.30  95316400    31.30
##  8     FB 2013-01-11 31.28 31.96 31.10 31.72  89598000    31.72
##  9     FB 2013-01-14 32.08 32.21 30.62 30.95  98892800    30.95
## 10     FB 2013-01-15 30.64 31.71 29.88 30.10 173242600    30.10
## # ... with 242 more rows

The shorthand notation works as follows. In the first example, 2013 ~ 2013 is expanded to 2013-01-01 + 00:00:00 ~ 2013-12-31 + 23:59:59. It works by identifying the periodicity of the provided input (yearly) and expanding it to the beginning and end of that period. The one-sided formula ~2013 works similarly, and it is useful when you want to select every date inside a single period.
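To make the expansion rule concrete, here is a minimal base R sketch of the same idea. The helper expand_period() is hypothetical and written only for this illustration. It is not part of tibbletime, it takes a string rather than a formula, it handles only yearly and monthly input, and it assumes UTC.

```r
# Hypothetical helper for illustration; not part of tibbletime.
# Accepts "YYYY" or "YYYY-MM" and returns the first and last second
# of that period (UTC assumed).
expand_period <- function(x) {
  parts <- strsplit(x, "-", fixed = TRUE)[[1]]
  if (length(parts) == 1) {
    first_day <- as.Date(paste0(x, "-01-01"))
    step <- "year"
  } else if (length(parts) == 2) {
    first_day <- as.Date(paste0(x, "-01"))
    step <- "month"
  } else {
    stop("only 'YYYY' and 'YYYY-MM' are handled in this sketch")
  }
  # First day of the following period; one second before it is the
  # inclusive end of the current period.
  next_day <- seq(first_day, by = step, length.out = 2)[2]
  start <- as.POSIXct(format(first_day), tz = "UTC")
  end <- as.POSIXct(format(next_day), tz = "UTC") - 1
  list(start = start, end = end)
}

year_2013 <- expand_period("2013")
format(year_2013$start, "%Y-%m-%d %H:%M:%S", tz = "UTC")  # "2013-01-01 00:00:00"
format(year_2013$end, "%Y-%m-%d %H:%M:%S", tz = "UTC")    # "2013-12-31 23:59:59"

march_2015 <- expand_period("2015-03")
format(march_2015$end, "%Y-%m-%d %H:%M:%S", tz = "UTC")   # "2015-03-31 23:59:59"
```

Computing the end as "one second before the next period starts" avoids having to know how many days the month has.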

Month filtering example

As another example of this shorthand, here is how to select every date in March 2015:

time_filter(FB, ~2015-03)
## # A time tibble: 22 x 8
## # Index: date
##    symbol       date  open  high   low close   volume adjusted
##  *  <chr>     <date> <dbl> <dbl> <dbl> <dbl>    <dbl>    <dbl>
##  1     FB 2015-03-02 79.00 79.86 78.52 79.75 21662500    79.75
##  2     FB 2015-03-03 79.61 79.70 78.52 79.60 18635000    79.60
##  3     FB 2015-03-04 79.30 81.15 78.85 80.90 28126700    80.90
##  4     FB 2015-03-05 81.23 81.99 81.05 81.21 27825700    81.21
##  5     FB 2015-03-06 80.90 81.33 79.83 80.01 24488600    80.01
##  6     FB 2015-03-09 79.68 79.91 78.63 79.44 18925100    79.44
##  7     FB 2015-03-10 78.50 79.26 77.55 77.55 23067100    77.55
##  8     FB 2015-03-11 77.80 78.43 77.26 77.57 20215700    77.57
##  9     FB 2015-03-12 78.10 79.05 77.91 78.93 16093300    78.93
## 10     FB 2015-03-13 78.60 79.38 77.68 78.05 18557300    78.05
## # ... with 12 more rows
# In dplyr it looks like this
# (and you have to think, does March have 30 or 31 days?)
# filter(FB, date >= as.Date("2015-03-01"), date <= as.Date("2015-03-31"))

Grouped example

Grouped tbl_time objects work just as you might expect.

FANG %>%
  time_filter(2013-01-01 ~ 2013-01-04)
## # A time tibble: 12 x 8
## # Index:  date
## # Groups: symbol [4]
##    symbol       date     open     high      low    close   volume
##  *  <chr>     <date>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>
##  1     FB 2013-01-02  27.4400  28.1800  27.4200  28.0000 69846400
##  2     FB 2013-01-03  27.8800  28.4700  27.5900  27.7700 63140600
##  3     FB 2013-01-04  28.0100  28.9300  27.8300  28.7600 72715400
##  4   AMZN 2013-01-02 256.0800 258.1000 253.2600 257.3100  3271000
##  5   AMZN 2013-01-03 257.2700 260.8800 256.3700 258.4800  2750900
##  6   AMZN 2013-01-04 257.5800 259.8000 256.6500 259.1500  1874200
##  7   NFLX 2013-01-02  95.2100  95.8100  90.6900  92.0100 19431300
##  8   NFLX 2013-01-03  91.9700  97.9200  91.5300  96.5900 27912500
##  9   NFLX 2013-01-04  96.5400  97.7100  95.5400  95.9800 17761100
## 10   GOOG 2013-01-02 719.4212 727.0013 716.5512 723.2512  5101500
## 11   GOOG 2013-01-03 724.9313 731.9312 720.7212 723.6713  4653700
## 12   GOOG 2013-01-04 729.3412 741.4713 727.6812 737.9713  5547600
## # ... with 1 more variables: adjusted <dbl>

Finer periods

Filtering can also be done by hour, minute, or second. Note that the form used here, YYYY-MM-DD + HH:MM:SS, differs slightly from the usual way of writing a date-time: a + is required to separate the date from the time.

# Dummy example. Every second in a day
example <- create_series(~2013-01-01, period = 1~s)

# From midnight through the end of minute 00:02
# (the first 3 minutes of the day, hence 180 rows)
example %>%
  time_filter(2013-01-01 ~ 2013-01-01 + 00:02)
## # A time tibble: 180 x 1
## # Index: date
##                   date
##  *              <dttm>
##  1 2013-01-01 00:00:00
##  2 2013-01-01 00:00:01
##  3 2013-01-01 00:00:02
##  4 2013-01-01 00:00:03
##  5 2013-01-01 00:00:04
##  6 2013-01-01 00:00:05
##  7 2013-01-01 00:00:06
##  8 2013-01-01 00:00:07
##  9 2013-01-01 00:00:08
## 10 2013-01-01 00:00:09
## # ... with 170 more rows
# Hours 03 through 06 of the day (4 full hours)
# Equivalent to:
# 2013-01-01 + 03:00:00 ~ 2013-01-01 + 06:59:59
example %>%
  time_filter(2013-01-01 + 3 ~ 2013-01-01 + 6)
## # A time tibble: 14,400 x 1
## # Index: date
##                   date
##  *              <dttm>
##  1 2013-01-01 03:00:00
##  2 2013-01-01 03:00:01
##  3 2013-01-01 03:00:02
##  4 2013-01-01 03:00:03
##  5 2013-01-01 03:00:04
##  6 2013-01-01 03:00:05
##  7 2013-01-01 03:00:06
##  8 2013-01-01 03:00:07
##  9 2013-01-01 03:00:08
## 10 2013-01-01 03:00:09
## # ... with 14,390 more rows
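The two row counts above follow from the endpoints being inclusive and from each side being expanded to the full minute or hour. The base R sketch below reproduces the counts without tibbletime. The helper between_inclusive() is hypothetical, and UTC is assumed.

```r
# Base R check of the two row counts; does not use tibbletime.
# One timestamp per second for 2013-01-01 (UTC assumed).
day_start <- as.POSIXct("2013-01-01 00:00:00", tz = "UTC")
secs <- seq(day_start, by = "1 sec", length.out = 24 * 60 * 60)

# Hypothetical helper: keep values between two timestamps, both included.
between_inclusive <- function(x, from, to) {
  from <- as.POSIXct(from, tz = "UTC")
  to <- as.POSIXct(to, tz = "UTC")
  x[x >= from & x <= to]
}

# 2013-01-01 ~ 2013-01-01 + 00:02 covers 00:00:00 through 00:02:59
first_minutes <- between_inclusive(
  secs, "2013-01-01 00:00:00", "2013-01-01 00:02:59"
)
length(first_minutes)  # 180 = 3 minutes * 60 seconds

# 2013-01-01 + 3 ~ 2013-01-01 + 6 covers 03:00:00 through 06:59:59
some_hours <- between_inclusive(
  secs, "2013-01-01 03:00:00", "2013-01-01 06:59:59"
)
length(some_hours)     # 14400 = 4 hours * 3600 seconds
```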

[ syntax

For interactive use, you can get an even quicker look at a dataset by using the traditional extraction operator [ with the formula syntax.

FB[~2013]
## # A time tibble: 252 x 8
## # Index: date
##    symbol       date  open  high   low close    volume adjusted
##  *  <chr>     <date> <dbl> <dbl> <dbl> <dbl>     <dbl>    <dbl>
##  1     FB 2013-01-02 27.44 28.18 27.42 28.00  69846400    28.00
##  2     FB 2013-01-03 27.88 28.47 27.59 27.77  63140600    27.77
##  3     FB 2013-01-04 28.01 28.93 27.83 28.76  72715400    28.76
##  4     FB 2013-01-07 28.69 29.79 28.65 29.42  83781800    29.42
##  5     FB 2013-01-08 29.51 29.60 28.86 29.06  45871300    29.06
##  6     FB 2013-01-09 29.67 30.60 29.49 30.59 104787700    30.59
##  7     FB 2013-01-10 30.60 31.45 30.28 31.30  95316400    31.30
##  8     FB 2013-01-11 31.28 31.96 31.10 31.72  89598000    31.72
##  9     FB 2013-01-14 32.08 32.21 30.62 30.95  98892800    30.95
## 10     FB 2013-01-15 30.64 31.71 29.88 30.10 173242600    30.10
## # ... with 242 more rows
FB[2013~2014-02, c(1,2,3)]
## # A time tibble: 292 x 3
## # Index: date
##    symbol       date  open
##  *  <chr>     <date> <dbl>
##  1     FB 2013-01-02 27.44
##  2     FB 2013-01-03 27.88
##  3     FB 2013-01-04 28.01
##  4     FB 2013-01-07 28.69
##  5     FB 2013-01-08 29.51
##  6     FB 2013-01-09 29.67
##  7     FB 2013-01-10 30.60
##  8     FB 2013-01-11 31.28
##  9     FB 2013-01-14 32.08
## 10     FB 2013-01-15 30.64
## # ... with 282 more rows