Changing periodicity

Davis Vaughan

2017-10-06

Introducing as_period()

Often with time series you want to aggregate your dataset to a less granular (coarser) period. An example of this might be moving from a daily series to a monthly series to look at broader trends in your data. as_period() allows you to do exactly this.

as_period() accepts two types of period specification:

  1. A formula of the form multiple ~ period, such as 1~month or 2~d.

  2. A character shorthand, such as "monthly" or "m".

Datasets required

library(tibbletime)

# Facebook stock prices.
data(FB)

# Convert FB to tbl_time
FB <- as_tbl_time(FB, index = date)

# FANG stock prices
data(FANG)

# Convert FANG to tbl_time and group
FANG <- as_tbl_time(FANG, index = date) %>%
  group_by(symbol)

Daily to monthly

To see this in action, transform the daily FB data set to monthly data.

as_period(FB, 1~month)
## # A time tibble: 48 x 8
## # Index: date
##    symbol       date  open  high   low close    volume adjusted
##  *  <chr>     <date> <dbl> <dbl> <dbl> <dbl>     <dbl>    <dbl>
##  1     FB 2013-01-02 27.44 28.18 27.42 28.00  69846400    28.00
##  2     FB 2013-02-01 31.01 31.02 29.63 29.73  85856700    29.73
##  3     FB 2013-03-01 27.05 28.12 26.81 27.78  54064800    27.78
##  4     FB 2013-04-01 25.63 25.89 25.28 25.53  22249300    25.53
##  5     FB 2013-05-01 27.85 27.92 27.31 27.43  64567600    27.43
##  6     FB 2013-06-03 24.27 24.32 23.71 23.85  35733800    23.85
##  7     FB 2013-07-01 24.97 25.06 24.62 24.81  20582200    24.81
##  8     FB 2013-08-01 37.30 38.29 36.92 37.49 106066500    37.49
##  9     FB 2013-09-03 41.84 42.16 41.51 41.87  48774900    41.87
## 10     FB 2013-10-01 49.97 51.03 49.45 50.42  98114000    50.42
## # ... with 38 more rows
# Additionally, the following are equivalent
# as_period(FB, 1~m)
# as_period(FB, "monthly")
# as_period(FB, "m")

Generic periods

You aren’t restricted to 1-month periods. Maybe you want every 2 months?

as_period(FB, 2~m)
## # A time tibble: 24 x 8
## # Index: date
##    symbol       date  open  high   low close   volume adjusted
##  *  <chr>     <date> <dbl> <dbl> <dbl> <dbl>    <dbl>    <dbl>
##  1     FB 2013-01-02 27.44 28.18 27.42 28.00 69846400    28.00
##  2     FB 2013-03-01 27.05 28.12 26.81 27.78 54064800    27.78
##  3     FB 2013-05-01 27.85 27.92 27.31 27.43 64567600    27.43
##  4     FB 2013-07-01 24.97 25.06 24.62 24.81 20582200    24.81
##  5     FB 2013-09-03 41.84 42.16 41.51 41.87 48774900    41.87
##  6     FB 2013-11-01 50.85 52.09 49.72 49.75 95033000    49.75
##  7     FB 2014-01-02 54.83 55.22 54.19 54.71 43195500    54.71
##  8     FB 2014-03-03 66.96 68.05 66.51 67.41 56824100    67.41
##  9     FB 2014-05-01 60.43 62.28 60.21 61.15 82429000    61.15
## 10     FB 2014-07-01 67.58 68.44 67.39 68.06 33243000    68.06
## # ... with 14 more rows

Or maybe every 25 days? Note that the dates do not line up exactly 25 days apart. This is because the data set is not completely regular (there are gaps for weekends and holidays). as_period() returns the first date it finds within each specified period.

as_period(FB, 25~d)
## # A time tibble: 59 x 8
## # Index: date
##    symbol       date  open  high   low close   volume adjusted
##  *  <chr>     <date> <dbl> <dbl> <dbl> <dbl>    <dbl>    <dbl>
##  1     FB 2013-01-02 27.44 28.18 27.42 28.00 69846400    28.00
##  2     FB 2013-01-28 31.88 32.51 31.81 32.47 59682500    32.47
##  3     FB 2013-02-21 28.28 28.55 27.15 27.28 49642300    27.28
##  4     FB 2013-03-18 26.37 26.79 25.78 26.49 26653700    26.49
##  5     FB 2013-04-12 28.00 28.00 27.24 27.40 28697400    27.40
##  6     FB 2013-05-07 27.55 27.85 26.85 26.89 41259100    26.89
##  7     FB 2013-06-03 24.27 24.32 23.71 23.85 35733800    23.85
##  8     FB 2013-06-26 24.51 24.65 23.99 24.16 29890300    24.16
##  9     FB 2013-07-22 25.99 26.13 25.72 26.05 27526300    26.05
## 10     FB 2013-08-15 36.36 37.07 36.02 36.56 56521100    36.56
## # ... with 49 more rows
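Conceptually, this selection can be sketched with dplyr alone: index each row by its 25-day window from the start date, then keep the first available row in each window. This is only an illustration of the behavior, not tibbletime’s internal implementation.

```r
library(dplyr)

# Sketch of the 25-day window logic: assign each row a window index,
# then keep the first trading day found in each window
start <- min(FB$date)

FB %>%
  mutate(window = as.integer(date - start) %/% 25) %>%
  group_by(window) %>%
  slice(1) %>%
  ungroup() %>%
  select(-window)
```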

Details and the start_date argument

By default, the date that starts the first group is calculated as:

  1. Find the minimum date in your dataset.

  2. Floor that date to the period that you specified.

In the 1 month example above, 2013-01-02 is the first date in the series, and because “monthly” was chosen, the first group is defined as (2013-01-01 to 2013-01-31).
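The flooring in step 2 is ordinary date flooring. For example, with lubridate (shown here purely as an illustration of the behavior):

```r
library(lubridate)

# A monthly period floors the first date back to the start of its month
floor_date(as.Date("2013-01-02"), unit = "month")
#> [1] "2013-01-01"

# A daily period leaves the date unchanged, which is why the 2-day
# groups discussed below start at 2013-01-02 rather than 2013-01-01
floor_date(as.Date("2013-01-02"), unit = "day")
#> [1] "2013-01-02"
```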

Occasionally this is not what you want. Consider what would happen if you changed the period to “every 2 days”. The first date is 2013-01-02, and because “daily” was chosen it is not floored back to 2013-01-01, so the groups are (2013-01-02, 2013-01-03), (2013-01-04, 2013-01-05), and so on. If you wanted the first group to be (2013-01-01, 2013-01-02), you can use the start_date argument.

# Without start_date
as_period(FB, 2~d)
## # A time tibble: 607 x 8
## # Index: date
##    symbol       date  open  high   low close   volume adjusted
##  *  <chr>     <date> <dbl> <dbl> <dbl> <dbl>    <dbl>    <dbl>
##  1     FB 2013-01-02 27.44 28.18 27.42 28.00 69846400    28.00
##  2     FB 2013-01-04 28.01 28.93 27.83 28.76 72715400    28.76
##  3     FB 2013-01-07 28.69 29.79 28.65 29.42 83781800    29.42
##  4     FB 2013-01-08 29.51 29.60 28.86 29.06 45871300    29.06
##  5     FB 2013-01-10 30.60 31.45 30.28 31.30 95316400    31.30
##  6     FB 2013-01-14 32.08 32.21 30.62 30.95 98892800    30.95
##  7     FB 2013-01-16 30.21 30.35 29.53 29.85 75332700    29.85
##  8     FB 2013-01-18 30.31 30.44 29.27 29.66 49631500    29.66
##  9     FB 2013-01-22 29.75 30.89 29.74 30.73 55243300    30.73
## 10     FB 2013-01-24 31.27 31.49 30.81 31.08 43845100    31.08
## # ... with 597 more rows
# With start_date
as_period(FB, 2~d, start_date = "2013-01-01")
## # A time tibble: 619 x 8
## # Index: date
##    symbol       date  open  high   low close    volume adjusted
##  *  <chr>     <date> <dbl> <dbl> <dbl> <dbl>     <dbl>    <dbl>
##  1     FB 2013-01-02 27.44 28.18 27.42 28.00  69846400    28.00
##  2     FB 2013-01-03 27.88 28.47 27.59 27.77  63140600    27.77
##  3     FB 2013-01-07 28.69 29.79 28.65 29.42  83781800    29.42
##  4     FB 2013-01-09 29.67 30.60 29.49 30.59 104787700    30.59
##  5     FB 2013-01-11 31.28 31.96 31.10 31.72  89598000    31.72
##  6     FB 2013-01-14 32.08 32.21 30.62 30.95  98892800    30.95
##  7     FB 2013-01-15 30.64 31.71 29.88 30.10 173242600    30.10
##  8     FB 2013-01-17 30.08 30.42 30.03 30.14  40256700    30.14
##  9     FB 2013-01-22 29.75 30.89 29.74 30.73  55243300    30.73
## 10     FB 2013-01-23 31.10 31.50 30.80 30.82  48899800    30.82
## # ... with 609 more rows

The side argument

By default, the first date per period is returned. If you want the end of each period instead, specify the side = "end" argument.

as_period(FB, 1~y, side = "end")
## # A time tibble: 4 x 8
## # Index: date
##   symbol       date   open   high    low  close   volume adjusted
## *  <chr>     <date>  <dbl>  <dbl>  <dbl>  <dbl>    <dbl>    <dbl>
## 1     FB 2013-12-31  54.12  54.86  53.91  54.65 43076200    54.65
## 2     FB 2014-12-31  79.54  79.80  77.86  78.02 19935400    78.02
## 3     FB 2015-12-31 106.00 106.17 104.62 104.66 18298700   104.66
## 4     FB 2016-12-30 116.60 116.83 114.77 115.05 18600100   115.05
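side = "end" can be thought of as keeping the last row found within each period instead of the first. A rough dplyr equivalent of the yearly example above (an illustration only, not the actual implementation):

```r
library(dplyr)
library(lubridate)

# Keep the last trading day within each calendar year
FB %>%
  mutate(window = floor_date(date, unit = "year")) %>%
  group_by(window) %>%
  slice(n()) %>%
  ungroup() %>%
  select(-window)
```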

Grouped datasets

One of the neat things about working in the tidyverse is that these functions can also work with grouped datasets. Here we transform the daily series of the 4 FANG stocks to a periodicity of every 2 years.

FANG %>%
  as_period(2~y)
## # A time tibble: 8 x 8
## # Index:  date
## # Groups: symbol [4]
##   symbol       date     open     high      low    close   volume  adjusted
## *  <chr>     <date>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>     <dbl>
## 1     FB 2013-01-02  27.4400  28.1800  27.4200  28.0000 69846400  28.00000
## 2     FB 2015-01-02  78.5800  78.9300  77.7000  78.4500 18177500  78.45000
## 3   AMZN 2013-01-02 256.0800 258.1000 253.2600 257.3100  3271000 257.31000
## 4   AMZN 2015-01-02 312.5800 314.7500 306.9600 308.5200  2783200 308.51999
## 5   NFLX 2013-01-02  95.2100  95.8100  90.6900  92.0100 19431300  13.14429
## 6   NFLX 2015-01-02 344.0600 352.3200 341.1200 348.9400 13475000  49.84857
## 7   GOOG 2013-01-02 719.4212 727.0013 716.5512 723.2512  5101500 361.26435
## 8   GOOG 2015-01-02 529.0124 531.2724 524.1024 524.8124  1447500 524.81240