time_filter() attempts to make filtering data frames by date much easier than dplyr::filter(). It includes a flexible shorthand notation that allows you to specify entire date ranges with very little typing. The general form of the time_formula that you will use to filter rows is from ~ to, where the left-hand side (LHS) is the start date and the right-hand side (RHS) is the end date. Both endpoints are included. Each side of the time_formula can be maximally specified as YYYY-MM-DD + HH:MM:SS.
library(tibbletime)
library(dplyr)  # provides filter(), group_by() and %>% used below
# Facebook stock prices.
data(FB)
# Convert FB to tbl_time
FB <- as_tbl_time(FB, index = date)
# FANG stock prices
data(FANG)
# Convert FANG to tbl_time and group
FANG <- as_tbl_time(FANG, index = date) %>%
  group_by(symbol)
In dplyr, if you wanted to get the rows for 2013 in the FB dataset, you might do something like this:
filter(FB, date >= as.Date("2013-01-01"), date <= as.Date("2013-12-31"))
## # A time tibble: 252 x 8
## # Index: date
## symbol date open high low close volume adjusted
## * <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 FB 2013-01-02 27.44 28.18 27.42 28.00 69846400 28.00
## 2 FB 2013-01-03 27.88 28.47 27.59 27.77 63140600 27.77
## 3 FB 2013-01-04 28.01 28.93 27.83 28.76 72715400 28.76
## 4 FB 2013-01-07 28.69 29.79 28.65 29.42 83781800 29.42
## 5 FB 2013-01-08 29.51 29.60 28.86 29.06 45871300 29.06
## 6 FB 2013-01-09 29.67 30.60 29.49 30.59 104787700 30.59
## 7 FB 2013-01-10 30.60 31.45 30.28 31.30 95316400 31.30
## 8 FB 2013-01-11 31.28 31.96 31.10 31.72 89598000 31.72
## 9 FB 2013-01-14 32.08 32.21 30.62 30.95 98892800 30.95
## 10 FB 2013-01-15 30.64 31.71 29.88 30.10 173242600 30.10
## # ... with 242 more rows
That’s a lot of typing for one filter step. With tibbletime, because the index was specified at creation, we can do this:
time_filter(FB, time_formula = 2013-01-01 ~ 2013-12-31)
## # A time tibble: 252 x 8
## # Index: date
## symbol date open high low close volume adjusted
## * <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 FB 2013-01-02 27.44 28.18 27.42 28.00 69846400 28.00
## 2 FB 2013-01-03 27.88 28.47 27.59 27.77 63140600 27.77
## 3 FB 2013-01-04 28.01 28.93 27.83 28.76 72715400 28.76
## 4 FB 2013-01-07 28.69 29.79 28.65 29.42 83781800 29.42
## 5 FB 2013-01-08 29.51 29.60 28.86 29.06 45871300 29.06
## 6 FB 2013-01-09 29.67 30.60 29.49 30.59 104787700 30.59
## 7 FB 2013-01-10 30.60 31.45 30.28 31.30 95316400 31.30
## 8 FB 2013-01-11 31.28 31.96 31.10 31.72 89598000 31.72
## 9 FB 2013-01-14 32.08 32.21 30.62 30.95 98892800 30.95
## 10 FB 2013-01-15 30.64 31.71 29.88 30.10 173242600 30.10
## # ... with 242 more rows
At first glance, this might not look like less code, but this is before any shorthand is applied. Note how the filtering condition is specified as a formula separated by a ~.
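A formula is a natural fit here because R keeps both sides of a ~ unevaluated. The base-R sketch below only illustrates that language feature; it makes no claim about how tibbletime parses the time_formula internally.

```r
# A two-sided formula keeps its left- and right-hand sides as unevaluated calls
f <- 2013-01-01 ~ 2013-12-31

class(f)   # "formula"
length(f)  # 3 elements: `~`, the LHS, and the RHS

lhs <- f[[2]]
rhs <- f[[3]]

# Evaluated as ordinary R code, the LHS is just subtraction: 2013 - 1 - 1
eval(lhs)  # 2011

# A one-sided formula such as ~2013 has no LHS, so it has only 2 elements
one_sided <- ~2013
length(one_sided)  # 2
```

Evaluating either side would give a meaningless number, so the date has to be read from the expression itself. That is what makes it possible to type 2013-01-01 without quotes or as.Date().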
Using time_filter shorthand, this can be written:
time_filter(FB, 2013 ~ 2013)
## # A time tibble: 252 x 8
## # Index: date
## symbol date open high low close volume adjusted
## * <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 FB 2013-01-02 27.44 28.18 27.42 28.00 69846400 28.00
## 2 FB 2013-01-03 27.88 28.47 27.59 27.77 63140600 27.77
## 3 FB 2013-01-04 28.01 28.93 27.83 28.76 72715400 28.76
## 4 FB 2013-01-07 28.69 29.79 28.65 29.42 83781800 29.42
## 5 FB 2013-01-08 29.51 29.60 28.86 29.06 45871300 29.06
## 6 FB 2013-01-09 29.67 30.60 29.49 30.59 104787700 30.59
## 7 FB 2013-01-10 30.60 31.45 30.28 31.30 95316400 31.30
## 8 FB 2013-01-11 31.28 31.96 31.10 31.72 89598000 31.72
## 9 FB 2013-01-14 32.08 32.21 30.62 30.95 98892800 30.95
## 10 FB 2013-01-15 30.64 31.71 29.88 30.10 173242600 30.10
## # ... with 242 more rows
Or even more succinctly as:
time_filter(FB, ~2013)
## # A time tibble: 252 x 8
## # Index: date
## symbol date open high low close volume adjusted
## * <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 FB 2013-01-02 27.44 28.18 27.42 28.00 69846400 28.00
## 2 FB 2013-01-03 27.88 28.47 27.59 27.77 63140600 27.77
## 3 FB 2013-01-04 28.01 28.93 27.83 28.76 72715400 28.76
## 4 FB 2013-01-07 28.69 29.79 28.65 29.42 83781800 29.42
## 5 FB 2013-01-08 29.51 29.60 28.86 29.06 45871300 29.06
## 6 FB 2013-01-09 29.67 30.60 29.49 30.59 104787700 30.59
## 7 FB 2013-01-10 30.60 31.45 30.28 31.30 95316400 31.30
## 8 FB 2013-01-11 31.28 31.96 31.10 31.72 89598000 31.72
## 9 FB 2013-01-14 32.08 32.21 30.62 30.95 98892800 30.95
## 10 FB 2013-01-15 30.64 31.71 29.88 30.10 173242600 30.10
## # ... with 242 more rows
The shorthand notation works as follows. In the first example, 2013 ~ 2013 is expanded to 2013-01-01 + 00:00:00 ~ 2013-12-31 + 23:59:59. It works by identifying the periodicity of the provided input (yearly) and expanding it to the beginning and end of that period. The one-sided formula ~2013 works similarly, and is useful when you want to select every date inside a period.
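To make the expansion rule concrete, here is a minimal base-R sketch for date-only input. The helper expand_period() is hypothetical and is not part of tibbletime; it takes a string rather than a formula and ignores the time-of-day part, so treat it as an illustration of the idea rather than the package's implementation.

```r
# Hypothetical helper (not a tibbletime function): expand a partial date
# string ("YYYY", "YYYY-MM" or "YYYY-MM-DD") to the first and last day
# of the period it names.
expand_period <- function(x) {
  parts <- as.integer(strsplit(x, "-", fixed = TRUE)[[1]])

  if (length(parts) == 1) {
    # Yearly: January 1 through December 31
    start <- as.Date(sprintf("%04d-01-01", parts[1]))
    end   <- as.Date(sprintf("%04d-12-31", parts[1]))
  } else if (length(parts) == 2) {
    # Monthly: first day of the month through the day before the next month
    start <- as.Date(sprintf("%04d-%02d-01", parts[1], parts[2]))
    end   <- seq(start, by = "month", length.out = 2)[2] - 1
  } else {
    # Daily: the period is the single day itself
    start <- as.Date(sprintf("%04d-%02d-%02d", parts[1], parts[2], parts[3]))
    end   <- start
  }

  list(start = start, end = end)
}

# Inclusive filter over made-up daily dates, analogous to ~2015-03
dates    <- seq(as.Date("2015-02-25"), as.Date("2015-04-05"), by = "day")
rng      <- expand_period("2015-03")
in_march <- dates[dates >= rng$start & dates <= rng$end]

range(in_march)   # "2015-03-01" "2015-03-31"
length(in_march)  # 31
```

Computing the end of a month as "first day of the next month, minus one" avoids hard-coding month lengths, so leap years need no special handling.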
As another example of this shorthand, here is how you would select every date in March 2015:
time_filter(FB, ~2015-03)
## # A time tibble: 22 x 8
## # Index: date
## symbol date open high low close volume adjusted
## * <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 FB 2015-03-02 79.00 79.86 78.52 79.75 21662500 79.75
## 2 FB 2015-03-03 79.61 79.70 78.52 79.60 18635000 79.60
## 3 FB 2015-03-04 79.30 81.15 78.85 80.90 28126700 80.90
## 4 FB 2015-03-05 81.23 81.99 81.05 81.21 27825700 81.21
## 5 FB 2015-03-06 80.90 81.33 79.83 80.01 24488600 80.01
## 6 FB 2015-03-09 79.68 79.91 78.63 79.44 18925100 79.44
## 7 FB 2015-03-10 78.50 79.26 77.55 77.55 23067100 77.55
## 8 FB 2015-03-11 77.80 78.43 77.26 77.57 20215700 77.57
## 9 FB 2015-03-12 78.10 79.05 77.91 78.93 16093300 78.93
## 10 FB 2015-03-13 78.60 79.38 77.68 78.05 18557300 78.05
## # ... with 12 more rows
# In dplyr it looks like this
# (and you have to think, does March have 30 or 31 days?)
# filter(FB, date >= as.Date("2015-03-01"), date <= as.Date("2015-03-31"))
Working with grouped tbl_time objects is just as you might expect.
FANG %>%
time_filter(2013-01-01 ~ 2013-01-04)
## # A time tibble: 12 x 8
## # Index: date
## # Groups: symbol [4]
## symbol date open high low close volume
## * <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 FB 2013-01-02 27.4400 28.1800 27.4200 28.0000 69846400
## 2 FB 2013-01-03 27.8800 28.4700 27.5900 27.7700 63140600
## 3 FB 2013-01-04 28.0100 28.9300 27.8300 28.7600 72715400
## 4 AMZN 2013-01-02 256.0800 258.1000 253.2600 257.3100 3271000
## 5 AMZN 2013-01-03 257.2700 260.8800 256.3700 258.4800 2750900
## 6 AMZN 2013-01-04 257.5800 259.8000 256.6500 259.1500 1874200
## 7 NFLX 2013-01-02 95.2100 95.8100 90.6900 92.0100 19431300
## 8 NFLX 2013-01-03 91.9700 97.9200 91.5300 96.5900 27912500
## 9 NFLX 2013-01-04 96.5400 97.7100 95.5400 95.9800 17761100
## 10 GOOG 2013-01-02 719.4212 727.0013 716.5512 723.2512 5101500
## 11 GOOG 2013-01-03 724.9313 731.9312 720.7212 723.6713 4653700
## 12 GOOG 2013-01-04 729.3412 741.4713 727.6812 737.9713 5547600
## # ... with 1 more variables: adjusted <dbl>
Filtering can also be done by hour / minute / second. Note that the form of this is slightly different from the standard datetime format, YYYY-MM-DD HH:MM:SS. The big difference here is that a + is required to divide the date from the time.
# Dummy example. Every second in a day
example <- create_series(~2013-01-01, period = 1~s)
# From the start of the day through the end of minute 00:02
# (00:00:00 to 00:02:59, i.e. 180 seconds)
example %>%
time_filter(2013-01-01 ~ 2013-01-01 + 00:02)
## # A time tibble: 180 x 1
## # Index: date
## date
## * <dttm>
## 1 2013-01-01 00:00:00
## 2 2013-01-01 00:00:01
## 3 2013-01-01 00:00:02
## 4 2013-01-01 00:00:03
## 5 2013-01-01 00:00:04
## 6 2013-01-01 00:00:05
## 7 2013-01-01 00:00:06
## 8 2013-01-01 00:00:07
## 9 2013-01-01 00:00:08
## 10 2013-01-01 00:00:09
## # ... with 170 more rows
# Hours 03 through 06 of the day (four full hours)
# Equivalent to:
# 2013-01-01 + 03:00:00 ~ 2013-01-01 + 06:59:59
example %>%
time_filter(2013-01-01 + 3 ~ 2013-01-01 + 6)
## # A time tibble: 14,400 x 1
## # Index: date
## date
## * <dttm>
## 1 2013-01-01 03:00:00
## 2 2013-01-01 03:00:01
## 3 2013-01-01 03:00:02
## 4 2013-01-01 03:00:03
## 5 2013-01-01 03:00:04
## 6 2013-01-01 03:00:05
## 7 2013-01-01 03:00:06
## 8 2013-01-01 03:00:07
## 9 2013-01-01 03:00:08
## 10 2013-01-01 03:00:09
## # ... with 14,390 more rows
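The row counts above follow directly from the inclusive, expanded endpoints. As a sanity check, the same counts can be reproduced in base R on a vector of every second in the day. This sketch does not use tibbletime, and the UTC time zone is an assumption made only to keep the example free of daylight-saving effects.

```r
# Every second of 2013-01-01 (UTC assumed for simplicity)
secs <- seq(as.POSIXct("2013-01-01 00:00:00", tz = "UTC"),
            as.POSIXct("2013-01-01 23:59:59", tz = "UTC"),
            by = 1)

# 2013-01-01 ~ 2013-01-01 + 00:02 expands to 00:00:00 through 00:02:59
first_minutes <- secs[secs >= as.POSIXct("2013-01-01 00:00:00", tz = "UTC") &
                      secs <= as.POSIXct("2013-01-01 00:02:59", tz = "UTC")]
length(first_minutes)  # 180

# 2013-01-01 + 3 ~ 2013-01-01 + 6 expands to 03:00:00 through 06:59:59
hours_3_to_6 <- secs[secs >= as.POSIXct("2013-01-01 03:00:00", tz = "UTC") &
                     secs <= as.POSIXct("2013-01-01 06:59:59", tz = "UTC")]
length(hours_3_to_6)   # 14400
```

Because the RHS is expanded to the end of its period, an endpoint of hour 6 includes everything up to 06:59:59, not just 06:00:00.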
[ syntax
For interactive use, to get an even quicker look at a dataset, you can use the traditional extraction operator [ with the formula syntax.
FB[~2013]
## # A time tibble: 252 x 8
## # Index: date
## symbol date open high low close volume adjusted
## * <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 FB 2013-01-02 27.44 28.18 27.42 28.00 69846400 28.00
## 2 FB 2013-01-03 27.88 28.47 27.59 27.77 63140600 27.77
## 3 FB 2013-01-04 28.01 28.93 27.83 28.76 72715400 28.76
## 4 FB 2013-01-07 28.69 29.79 28.65 29.42 83781800 29.42
## 5 FB 2013-01-08 29.51 29.60 28.86 29.06 45871300 29.06
## 6 FB 2013-01-09 29.67 30.60 29.49 30.59 104787700 30.59
## 7 FB 2013-01-10 30.60 31.45 30.28 31.30 95316400 31.30
## 8 FB 2013-01-11 31.28 31.96 31.10 31.72 89598000 31.72
## 9 FB 2013-01-14 32.08 32.21 30.62 30.95 98892800 30.95
## 10 FB 2013-01-15 30.64 31.71 29.88 30.10 173242600 30.10
## # ... with 242 more rows
FB[2013~2014-02, c(1,2,3)]
## # A time tibble: 292 x 3
## # Index: date
## symbol date open
## * <chr> <date> <dbl>
## 1 FB 2013-01-02 27.44
## 2 FB 2013-01-03 27.88
## 3 FB 2013-01-04 28.01
## 4 FB 2013-01-07 28.69
## 5 FB 2013-01-08 29.51
## 6 FB 2013-01-09 29.67
## 7 FB 2013-01-10 30.60
## 8 FB 2013-01-11 31.28
## 9 FB 2013-01-14 32.08
## 10 FB 2013-01-15 30.64
## # ... with 282 more rows
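For comparison, here is roughly what FB[2013~2014-02, c(1,2,3)] spells out to in base R, assuming the RHS expands to the last day of February 2014 as the shorthand rules describe. The toy data frame below is made up for illustration and is not the FB dataset.

```r
# Made-up daily data standing in for a price table (not the FB dataset)
toy <- data.frame(
  symbol = "FB",
  date   = seq(as.Date("2012-12-30"), as.Date("2014-03-02"), by = "day")
)
toy$open   <- seq_len(nrow(toy))
toy$volume <- 0

# Rows from 2013-01-01 through 2014-02-28 (inclusive), first three columns
subset_rows <- toy$date >= as.Date("2013-01-01") &
               toy$date <= as.Date("2014-02-28")
result <- toy[subset_rows, c(1, 2, 3)]

dim(result)         # 424 rows, 3 columns
range(result$date)  # "2013-01-01" "2014-02-28"
```

As with ordinary data frames, the first argument to [ selects rows and the second selects columns; the formula simply replaces the logical row condition.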