Often with time series you want to aggregate your dataset to a less granular period. An example of this might be moving from a daily series to a monthly series to look at broader trends in your data. as_period() allows you to do exactly this.
as_period() accepts two types of input:
- A formula specified as multiple ~ period that defines the period that you want to aggregate to. An example would be 1 ~ year for aggregating to yearly data.
- A character for a few common transforms. Some of these are "yearly", "monthly", etc. Shorthand is available, such that "y" is accepted as "yearly". See the documentation for more detail, ?as_period.
library(tibbletime)
# dplyr supplies group_by() and the %>% pipe used below
library(dplyr)
# Facebook stock prices.
data(FB)
# Convert FB to tbl_time
FB <- as_tbl_time(FB, index = date)
# FANG stock prices
data(FANG)
# Convert FANG to tbl_time and group
FANG <- as_tbl_time(FANG, index = date) %>%
  group_by(symbol)
To see this in action, transform the daily FB data set to monthly data.
as_period(FB, 1~month)
## # A time tibble: 48 x 8
## # Index: date
## symbol date open high low close volume adjusted
## * <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 FB 2013-01-02 27.44 28.18 27.42 28.00 69846400 28.00
## 2 FB 2013-02-01 31.01 31.02 29.63 29.73 85856700 29.73
## 3 FB 2013-03-01 27.05 28.12 26.81 27.78 54064800 27.78
## 4 FB 2013-04-01 25.63 25.89 25.28 25.53 22249300 25.53
## 5 FB 2013-05-01 27.85 27.92 27.31 27.43 64567600 27.43
## 6 FB 2013-06-03 24.27 24.32 23.71 23.85 35733800 23.85
## 7 FB 2013-07-01 24.97 25.06 24.62 24.81 20582200 24.81
## 8 FB 2013-08-01 37.30 38.29 36.92 37.49 106066500 37.49
## 9 FB 2013-09-03 41.84 42.16 41.51 41.87 48774900 41.87
## 10 FB 2013-10-01 49.97 51.03 49.45 50.42 98114000 50.42
## # ... with 38 more rows
# Additionally, the following are equivalent
# as_period(FB, 1~m)
# as_period(FB, "monthly")
# as_period(FB, "m")
You aren’t restricted to only 1 month periods. Maybe you wanted every 2 months?
as_period(FB, 2~m)
## # A time tibble: 24 x 8
## # Index: date
## symbol date open high low close volume adjusted
## * <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 FB 2013-01-02 27.44 28.18 27.42 28.00 69846400 28.00
## 2 FB 2013-03-01 27.05 28.12 26.81 27.78 54064800 27.78
## 3 FB 2013-05-01 27.85 27.92 27.31 27.43 64567600 27.43
## 4 FB 2013-07-01 24.97 25.06 24.62 24.81 20582200 24.81
## 5 FB 2013-09-03 41.84 42.16 41.51 41.87 48774900 41.87
## 6 FB 2013-11-01 50.85 52.09 49.72 49.75 95033000 49.75
## 7 FB 2014-01-02 54.83 55.22 54.19 54.71 43195500 54.71
## 8 FB 2014-03-03 66.96 68.05 66.51 67.41 56824100 67.41
## 9 FB 2014-05-01 60.43 62.28 60.21 61.15 82429000 61.15
## 10 FB 2014-07-01 67.58 68.44 67.39 68.06 33243000 68.06
## # ... with 14 more rows
Or maybe every 25 days? Note that consecutive dates in the result are not exactly 25 days apart. This is because the data set is not completely regular (there are gaps due to weekends and holidays). as_period() chooses the first date it can find in each period.
as_period(FB, 25~d)
## # A time tibble: 59 x 8
## # Index: date
## symbol date open high low close volume adjusted
## * <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 FB 2013-01-02 27.44 28.18 27.42 28.00 69846400 28.00
## 2 FB 2013-01-28 31.88 32.51 31.81 32.47 59682500 32.47
## 3 FB 2013-02-21 28.28 28.55 27.15 27.28 49642300 27.28
## 4 FB 2013-03-18 26.37 26.79 25.78 26.49 26653700 26.49
## 5 FB 2013-04-12 28.00 28.00 27.24 27.40 28697400 27.40
## 6 FB 2013-05-07 27.55 27.85 26.85 26.89 41259100 26.89
## 7 FB 2013-06-03 24.27 24.32 23.71 23.85 35733800 23.85
## 8 FB 2013-06-26 24.51 24.65 23.99 24.16 29890300 24.16
## 9 FB 2013-07-22 25.99 26.13 25.72 26.05 27526300 26.05
## 10 FB 2013-08-15 36.36 37.07 36.02 36.56 56521100 36.56
## # ... with 49 more rows
start_date argument
By default, the date that starts the first group is calculated as:
1. Find the minimum date in your dataset.
2. Floor that date to the period that you specified.
In the 1 month example above, 2013-01-02 is the first date in the series, and because “monthly” was chosen, the first group is defined as (2013-01-01 to 2013-01-31).
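The two steps can be sketched in base R for the monthly case. This is only an illustration of the idea, not tibbletime's internal code; floor_to_month() is a hypothetical helper introduced here, and the dates are the first three FB trading days.

```r
# Hypothetical helper (not part of tibbletime): floor a date to the first
# day of its month, mirroring the "floor to the period" step for monthly data.
floor_to_month <- function(x) {
  as.Date(format(x, "%Y-%m-01"))
}

# The first few FB trading dates.
dates <- as.Date(c("2013-01-02", "2013-01-03", "2013-01-04"))

# Step 1: find the minimum date. Step 2: floor it to the period.
first_group_start <- floor_to_month(min(dates))
first_group_start
# -> "2013-01-01"
```

The first monthly group therefore begins on 2013-01-01 even though the first observation falls on 2013-01-02.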
Occasionally this is not what you want. Consider what would happen if you changed the period to “every 2 days”. The first date is 2013-01-02, but because “daily” is chosen, flooring leaves it unchanged rather than moving it to 2013-01-01, so the groups are (2013-01-02, 2013-01-03), (2013-01-04, 2013-01-05) and so on. If you wanted the first group to be (2013-01-01, 2013-01-02), you can use the start_date argument.
# Without start_date
as_period(FB, 2~d)
## # A time tibble: 607 x 8
## # Index: date
## symbol date open high low close volume adjusted
## * <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 FB 2013-01-02 27.44 28.18 27.42 28.00 69846400 28.00
## 2 FB 2013-01-04 28.01 28.93 27.83 28.76 72715400 28.76
## 3 FB 2013-01-07 28.69 29.79 28.65 29.42 83781800 29.42
## 4 FB 2013-01-08 29.51 29.60 28.86 29.06 45871300 29.06
## 5 FB 2013-01-10 30.60 31.45 30.28 31.30 95316400 31.30
## 6 FB 2013-01-14 32.08 32.21 30.62 30.95 98892800 30.95
## 7 FB 2013-01-16 30.21 30.35 29.53 29.85 75332700 29.85
## 8 FB 2013-01-18 30.31 30.44 29.27 29.66 49631500 29.66
## 9 FB 2013-01-22 29.75 30.89 29.74 30.73 55243300 30.73
## 10 FB 2013-01-24 31.27 31.49 30.81 31.08 43845100 31.08
## # ... with 597 more rows
# With start_date
as_period(FB, 2~d, start_date = "2013-01-01")
## # A time tibble: 619 x 8
## # Index: date
## symbol date open high low close volume adjusted
## * <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 FB 2013-01-02 27.44 28.18 27.42 28.00 69846400 28.00
## 2 FB 2013-01-03 27.88 28.47 27.59 27.77 63140600 27.77
## 3 FB 2013-01-07 28.69 29.79 28.65 29.42 83781800 29.42
## 4 FB 2013-01-09 29.67 30.60 29.49 30.59 104787700 30.59
## 5 FB 2013-01-11 31.28 31.96 31.10 31.72 89598000 31.72
## 6 FB 2013-01-14 32.08 32.21 30.62 30.95 98892800 30.95
## 7 FB 2013-01-15 30.64 31.71 29.88 30.10 173242600 30.10
## 8 FB 2013-01-17 30.08 30.42 30.03 30.14 40256700 30.14
## 9 FB 2013-01-22 29.75 30.89 29.74 30.73 55243300 30.73
## 10 FB 2013-01-23 31.10 31.50 30.80 30.82 48899800 30.82
## # ... with 609 more rows
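The difference between the two results above comes down to where the 2-day windows begin. A minimal base R sketch of that grouping follows. It assumes the FB index contains the usual January 2013 trading days, and first_per_window() is a hypothetical helper, not a tibbletime function.

```r
# January 2013 trading days (weekends, New Year's Day and the 21st are absent).
days <- c(2:4, 7:11, 14:18, 22:25, 28:31)
dates <- as.Date(sprintf("2013-01-%02d", days))

# Hypothetical helper (not part of tibbletime): keep the first available date
# in each window of `n` days, counting windows from `start`.
first_per_window <- function(dates, n, start = min(dates)) {
  window_id <- as.integer(dates - as.Date(start)) %/% n
  dates[!duplicated(window_id)]
}

# Windows start on the first observation: (01-02, 01-03), (01-04, 01-05), ...
head(first_per_window(dates, 2), 5)
# -> "2013-01-02" "2013-01-04" "2013-01-07" "2013-01-08" "2013-01-10"

# Windows start on 2013-01-01: (01-01, 01-02), (01-03, 01-04), ...
head(first_per_window(dates, 2, start = "2013-01-01"), 5)
# -> "2013-01-02" "2013-01-03" "2013-01-07" "2013-01-09" "2013-01-11"
```

Windows that contain no observations, such as a weekend, simply produce no row, which is why the selected dates are not evenly spaced.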
side argument
By default, the first date per period is returned. If you want the end of each period instead, specify the side = "end" argument.
as_period(FB, 1~y, side = "end")
## # A time tibble: 4 x 8
## # Index: date
## symbol date open high low close volume adjusted
## * <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 FB 2013-12-31 54.12 54.86 53.91 54.65 43076200 54.65
## 2 FB 2014-12-31 79.54 79.80 77.86 78.02 19935400 78.02
## 3 FB 2015-12-31 106.00 106.17 104.62 104.66 18298700 104.66
## 4 FB 2016-12-30 116.60 116.83 114.77 115.05 18600100 115.05
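The choice between the start and the end of a period can be illustrated with a small base R sketch for yearly periods. date_per_year() is a hypothetical helper written for this example, and the dates are a handful of trading days around the 2015/2016 year boundary, not the full FB index.

```r
# Hypothetical helper (not part of tibbletime): return the first or last
# observed date within each calendar year.
date_per_year <- function(dates, side = c("start", "end")) {
  side <- match.arg(side)
  year <- format(dates, "%Y")
  keep <- if (side == "start") {
    !duplicated(year)
  } else {
    !duplicated(year, fromLast = TRUE)
  }
  dates[keep]
}

dates <- as.Date(c("2015-12-30", "2015-12-31", "2016-01-04",
                   "2016-12-29", "2016-12-30"))

date_per_year(dates, side = "start")
# -> "2015-12-30" "2016-01-04"

date_per_year(dates, side = "end")
# -> "2015-12-31" "2016-12-30"
```

As in the output above, the end of 2016 is 2016-12-30 because that is the last date actually present in the data, not the calendar year end.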
One of the neat things about working in the tidyverse is that these functions can also work with grouped datasets. Here we transform the daily series of the 4 FANG stocks to a periodicity of every 2 years.
FANG %>%
  as_period(2~y)
## # A time tibble: 8 x 8
## # Index: date
## # Groups: symbol [4]
## symbol date open high low close volume adjusted
## * <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 FB 2013-01-02 27.4400 28.1800 27.4200 28.0000 69846400 28.00000
## 2 FB 2015-01-02 78.5800 78.9300 77.7000 78.4500 18177500 78.45000
## 3 AMZN 2013-01-02 256.0800 258.1000 253.2600 257.3100 3271000 257.31000
## 4 AMZN 2015-01-02 312.5800 314.7500 306.9600 308.5200 2783200 308.51999
## 5 NFLX 2013-01-02 95.2100 95.8100 90.6900 92.0100 19431300 13.14429
## 6 NFLX 2015-01-02 344.0600 352.3200 341.1200 348.9400 13475000 49.84857
## 7 GOOG 2013-01-02 719.4212 727.0013 716.5512 723.2512 5101500 361.26435
## 8 GOOG 2015-01-02 529.0124 531.2724 524.1024 524.8124 1447500 524.81240