This vignette briefly explains how to use the function adam() and the related auto.adam() in the smooth package. It does not aim to cover all aspects of the function, but focuses on the main ones.
ADAM stands for Augmented Dynamic Adaptive Model. It is a model that underlies ETS, ARIMA and regression, connecting them in a unified framework. The underlying model for ADAM is a Single Source of Error (SSOE) state space model, which is explained in detail separately in an online textbook.
The main philosophy of the adam() function is to be agnostic of the provided data. This means that it will work with ts, msts, zoo, xts, data.frame, numeric and other classes of data. The seasonality in the model is specified via a separate parameter lags, so you are not obliged to transform the existing data into something specific and can use it as is. If you provide a matrix, a data.frame, a data.table or any other multivariate structure, then the function will use the first column for the response variable and the others for the explanatory ones. One thing that is currently assumed in the function is that the data is measured at a regular frequency. If this is not the case, you will need to introduce missing values manually.
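Introducing missing values manually can be sketched in base R as follows. This is only an illustration: the data frame observed, its dates and its values are hypothetical, and other approaches (e.g. via zoo or xts) would work as well.

```r
# Hypothetical irregular daily series: some dates have no observations
observed <- data.frame(
  date = as.Date(c("2024-01-01", "2024-01-02", "2024-01-04", "2024-01-07")),
  y = c(10, 12, 11, 15)
)

# Build the complete regular grid of dates covering the sample
fullGrid <- data.frame(date = seq(min(observed$date), max(observed$date), by = "day"))

# Left join onto the grid: dates without observations get NA in y
regular <- merge(fullGrid, observed, by = "date", all.x = TRUE)
regular$y
```

The vector regular$y now has one value per day, with NA in place of the skipped dates, and can be passed to adam() (the handling of missing values is mentioned later in this vignette).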
In order to run the experiments in this vignette, we need to load the following packages:
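The chunk that loads the packages is not shown in this text. Judging by the functions and datasets used below (adam(), xregExpander(), M3), something along these lines is presumably needed. This is a sketch that assumes greybox, smooth and Mcomp are installed; the forecast package is only needed for the taylor data, which is accessed via forecast::taylor.

```r
# Packages assumed by the examples in this vignette
library(greybox)  # xregExpander(), stepwise()
library(smooth)   # adam(), auto.adam(), es()
library(Mcomp)    # M3 competition data
```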
First and foremost, ADAM implements the ETS model, although in a more flexible way than (Hyndman et al. 2008): it supports different distributions for the error term, which are regulated via the distribution parameter. By default, the additive error model relies on the Normal distribution, while the multiplicative error one assumes Inverse Gaussian. If you want to reproduce the classical ETS, you would need to specify distribution="dnorm". Here is an example of ADAM ETS(MMM) with the Normal distribution on the N2568 series from the M3 competition (if you provide an Mcomp object, adam() will automatically set the train and test sets, the forecast horizon and even the needed lags):
testModel <- adam(M3[[2568]], "MMM", lags=c(1,12), distribution="dnorm")
summary(testModel)
#>
#> Model estimated using adam() function: ETS(MMM)
#> Response variable: M3..2568..
#> Distribution used in the estimation: Normal
#> Loss function type: likelihood; Loss function value: 869.8367
#> Coefficients:
#> Estimate Std. Error Lower 2.5% Upper 97.5%
#> alpha 0.1092 0.0349 0.0400 0.1782
#> beta 0.0288 0.0198 0.0000 0.0680
#> gamma 0.0022 0.0546 0.0000 0.1102
#> level 4587.3904 175.1692 4239.8168 4933.8682
#> trend 1.0038 0.0019 1.0001 1.0075
#> seasonal_1 1.1785 0.0204 1.1526 1.2301
#> seasonal_2 0.8163 0.0143 0.7904 0.8679
#> seasonal_3 0.8234 0.0144 0.7975 0.8750
#> seasonal_4 1.5721 0.0261 1.5461 1.6237
#> seasonal_5 0.7448 0.0131 0.7189 0.7964
#> seasonal_6 1.2687 0.0219 1.2428 1.3203
#> seasonal_7 0.8923 0.0153 0.8664 0.9439
#> seasonal_8 0.9121 0.0160 0.8862 0.9637
#> seasonal_9 1.2291 0.0225 1.2032 1.2807
#> seasonal_10 0.8835 0.0163 0.8575 0.9351
#> seasonal_11 0.8383 0.0155 0.8124 0.8899
#>
#> Sample size: 116
#> Number of estimated parameters: 17
#> Number of degrees of freedom: 99
#> Information criteria:
#> AIC AICc BIC BICc
#> 1773.674 1779.918 1820.485 1835.327
plot(forecast(testModel,h=18,interval="parametric"))
You might notice that the summary contains more than what is reported by other smooth functions. This one also produces standard errors for the estimated parameters based on the Fisher Information calculation. Note that this is computationally expensive, so if you have a model with more than 30 variables, the calculation of standard errors might take a long time. As for the default print() method, it will produce a shorter summary of the model, without the standard errors (similar to what es() does):
testModel
#> Time elapsed: 0.18 seconds
#> Model estimated using adam() function: ETS(MMM)
#> Distribution assumed in the model: Normal
#> Loss function type: likelihood; Loss function value: 869.8367
#> Persistence vector g:
#> alpha beta gamma
#> 0.1092 0.0288 0.0022
#>
#> Sample size: 116
#> Number of estimated parameters: 17
#> Number of degrees of freedom: 99
#> Information criteria:
#> AIC AICc BIC BICc
#> 1773.674 1779.918 1820.485 1835.327
#>
#> Forecast errors:
#> ME: 586.333; MAE: 797.299; RMSE: 995.959
#> sCE: 144.981%; sMAE: 10.953%; sMSE: 1.872%
#> MASE: 0.325; RMSSE: 0.314; rMAE: 0.352; rRMSE: 0.328
Also, note that the prediction intervals for multiplicative error models are approximate. It is advisable to use simulations instead (which is slower, but more accurate):
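The corresponding chunk is not shown in this text. A minimal sketch of a simulation-based interval, assuming that the interval="simulated" option of the forecast() method is the one meant here, could look like this (the model is refitted so that the example is self-contained, and the seed is set only for reproducibility):

```r
library(smooth)
library(Mcomp)

# Refit the ETS(MMM) model from the example above
testModel <- adam(M3[[2568]], "MMM", lags=c(1,12), distribution="dnorm")

# Simulated intervals rely on random draws, so fix the seed
set.seed(41)
testForecast <- forecast(testModel, h=18, interval="simulated")
plot(testForecast)
```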
If you want to do residual diagnostics, then it is recommended to use the plot() function, something like this (you can select which of the plots to produce):
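The chunk with the diagnostic plots is missing from this text. The sketch below assumes that the plot() method for adam objects accepts a which argument with plot numbers, as other smooth functions do; the specific numbers used here are an assumption, see the documentation of the method for the full list.

```r
library(smooth)
library(Mcomp)

testModel <- adam(M3[[2568]], "MMM", lags=c(1,12), distribution="dnorm")

# Several diagnostic plots on one canvas (the plot numbers are illustrative)
oldPar <- par(mfcol=c(3,4))
plot(testModel, which=c(1:11))
par(oldPar)

# A separate plot, produced on its own canvas
plot(testModel, which=12)
```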
By default, ADAM will estimate models by maximising the likelihood function. But there is also a parameter loss, which allows selecting from a list of already implemented loss functions (again, see the documentation for adam() for the full list) or using a function written by a user. Here is how to do the latter on the example of another M3 series:
lossFunction <- function(actual, fitted, B){
return(sum(abs(actual-fitted)^3))
}
testModel <- adam(M3[[1234]], "AAN", silent=FALSE, loss=lossFunction)
testModel
#> Time elapsed: 0.02 seconds
#> Model estimated using adam() function: ETS(AAN)
#> Distribution assumed in the model: Normal
#> Loss function type: custom; Loss function value: 23993012
#> Persistence vector g:
#> alpha beta
#> 0.6316 0.2494
#>
#> Sample size: 45
#> Number of estimated parameters: 4
#> Number of degrees of freedom: 41
#> Information criteria are unavailable for the chosen loss & distribution.
#>
#> Forecast errors:
#> ME: -346.9; MAE: 346.9; RMSE: 395.39
#> sCE: -34.086%; sMAE: 4.261%; sMSE: 0.236%
#> MASE: 4.8; RMSSE: 4.416; rMAE: 3.942; rRMSE: 3.567
Note that you need to have parameters actual, fitted and B in the function, which correspond to the vector of actual values, vector of fitted values on each iteration and a vector of the optimised parameters.
The loss and distribution parameters are independent, so in the example above, we have assumed that the error term follows the Normal distribution, but we have estimated its parameters using a non-conventional loss because we can. Some distributions assume that there is an additional parameter, which can either be estimated or provided by the user. These include Asymmetric Laplace (distribution="dalaplace") with alpha, Generalised Normal and Log Generalised Normal (distribution=c("dgnorm","dlgnorm")) with shape and Student’s T (distribution="dt") with nu:
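The example for this paragraph is not preserved in this text. A minimal sketch, assuming that the additional parameter is passed to adam() under the name given above (shape for the Generalised Normal distribution), and with an arbitrarily chosen value:

```r
library(smooth)
library(Mcomp)

# Generalised Normal distribution with the shape parameter fixed by the user
# (the value 3 is arbitrary and used for illustration only)
testModel <- adam(M3[[1234]], "MMN", silent=FALSE, distribution="dgnorm", shape=3)
testModel
```

If the additional parameter is not provided, it is estimated together with the other parameters, as stated above.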
The model selection in ADAM ETS relies on information criteria and works correctly only for loss="likelihood". There are several options for selecting the model, see them in the description of the function: ?adam(). The default one uses a branch-and-bound algorithm, similar to the one used in es(), but only considers additive trend models (the multiplicative trend ones are less stable and need more attention from a forecaster):
testModel <- adam(M3[[2568]], "ZXZ", lags=c(1,12), silent=FALSE)
#> Forming the pool of models based on... ANN , ANA , MNM , MAM , Estimation progress: 71 %86 %100 %... Done!
testModel
#> Time elapsed: 0.62 seconds
#> Model estimated using adam() function: ETS(MAM)
#> Distribution assumed in the model: Inverse Gaussian
#> Loss function type: likelihood; Loss function value: 866.1911
#> Persistence vector g:
#> alpha beta gamma
#> 0.089 0.010 0.000
#>
#> Sample size: 116
#> Number of estimated parameters: 17
#> Number of degrees of freedom: 99
#> Information criteria:
#> AIC AICc BIC BICc
#> 1766.382 1772.627 1813.193 1828.036
#>
#> Forecast errors:
#> ME: 721.028; MAE: 854.919; RMSE: 1093.439
#> sCE: 178.287%; sMAE: 11.744%; sMSE: 2.256%
#> MASE: 0.348; RMSSE: 0.345; rMAE: 0.377; rRMSE: 0.36
Note that the function produces point forecasts if h>0, but it won’t generate a prediction interval. This is why you need to use the forecast() method (as shown in the first example in this vignette).
Similarly to es(), the function supports combination of models, but it saves all the tested models in the output for potential reuse. Here is how it works:
testModel <- adam(M3[[2568]], "CXC", lags=c(1,12))
testForecast <- forecast(testModel,h=18,interval="semiparametric", level=c(0.9,0.95))
testForecast
#> Point forecast Lower bound (5%) Lower bound (2.5%) Upper bound (95%)
#> Sep 1992 10876.513 9293.316 9023.745 12612.33
#> Oct 1992 7801.605 2560.635 1724.737 13924.27
#> Nov 1992 7405.515 2339.327 1523.019 13276.78
#> Dec 1992 10138.461 4273.559 3350.614 17043.62
#> Jan 1993 10471.932 4587.469 3658.074 17376.46
#> Feb 1993 7230.942 2497.844 1715.470 12597.47
#> Mar 1993 7381.974 2739.276 1967.226 12617.15
#> Apr 1993 13905.674 7742.474 6756.119 21037.19
#> May 1993 6594.619 2662.537 1988.602 10916.41
#> Jun 1993 11355.079 6620.779 5830.979 16657.17
#> Jul 1993 7960.777 4525.808 3931.766 11700.13
#> Aug 1993 8154.546 5422.507 4944.738 11098.76
#> Sep 1993 11037.561 9376.111 9094.875 12867.38
#> Oct 1993 7920.785 2424.799 1539.398 14293.50
#> Nov 1993 7517.406 2183.861 1317.381 13661.26
#> Dec 1993 10303.304 4162.165 3187.962 17493.08
#> Jan 1994 10630.673 4469.101 3488.579 17822.59
#> Feb 1994 7330.624 2331.426 1500.047 12974.02
#> Upper bound (97.5%)
#> Sep 1992 12983.56
#> Oct 1992 15348.51
#> Nov 1992 14630.24
#> Dec 1992 18660.50
#> Jan 1993 18986.22
#> Feb 1993 13802.22
#> Mar 1993 13784.19
#> Apr 1993 22668.94
#> May 1993 11850.04
#> Jun 1993 17825.28
#> Jul 1993 12496.95
#> Aug 1993 11717.85
#> Sep 1993 13260.75
#> Oct 1993 15763.91
#> Nov 1993 15068.37
#> Dec 1993 19166.81
#> Jan 1994 19490.33
#> Feb 1994 14234.96
plot(testForecast)
Yes, now we support vectors for the levels in case you want to produce several. In fact, we also support the side parameter for the prediction interval, so you can extract specific quantiles without a hassle:
forecast(testModel,h=18,interval="semiparametric", level=c(0.9,0.95,0.99), side="upper")
#> Point forecast Upper bound (90%) Upper bound (95%) Upper bound (99%)
#> Sep 1992 10876.410 12196.480 12612.07 13427.66
#> Oct 1992 7801.416 12368.054 13923.65 17095.63
#> Nov 1992 7406.684 11796.520 13278.77 16289.92
#> Dec 1992 10160.973 15310.387 17076.98 20687.20
#> Jan 1993 10474.721 15621.995 17380.31 20966.46
#> Feb 1993 7239.713 11279.442 12609.30 15280.20
#> Mar 1993 7383.991 11329.466 12619.62 15202.64
#> Apr 1993 13912.287 19253.957 21046.81 24671.87
#> May 1993 6595.207 9875.918 10917.06 12973.49
#> Jun 1993 11372.630 15382.480 16680.07 19262.39
#> Jul 1993 7970.064 10818.636 11711.37 13463.39
#> Aug 1993 8154.298 10402.313 11098.10 12455.49
#> Sep 1993 11045.751 12436.877 12877.04 13742.73
#> Oct 1993 7926.201 12691.210 14301.54 17574.45
#> Nov 1993 7530.220 12135.203 13680.35 16811.29
#> Dec 1993 10319.104 15685.178 17515.30 21246.69
#> Jan 1994 10636.776 16005.958 17830.26 21543.18
#> Feb 1994 7338.960 11592.766 12986.65 15780.97
A brand new thing in the function is the possibility to use several frequencies (double / triple / quadruple / … seasonal models). Here is an example of what we can have in case of half-hourly data:
testModel <- adam(forecast::taylor, "MMdM", lags=c(1,48,336), silent=FALSE, h=336, holdout=TRUE)
testModel
#> Time elapsed: 41.37 seconds
#> Model estimated using adam() function: ETS(MMdM)[48, 336]
#> Distribution assumed in the model: Inverse Gaussian
#> Loss function type: likelihood; Loss function value: 25233.23
#> Persistence vector g:
#> alpha beta gamma1 gamma2
#> 0.9958 0.9910 0.0040 0.0002
#> Damping parameter: 0.6824
#> Sample size: 3696
#> Number of estimated parameters: 390
#> Number of degrees of freedom: 3306
#> Information criteria:
#> AIC AICc BIC BICc
#> 51246.45 51338.73 53670.31 54049.34
#>
#> Forecast errors:
#> ME: -22.336; MAE: 763.228; RMSE: 1054.954
#> sCE: -25.363%; sMAE: 2.579%; sMSE: 0.127%
#> MASE: 1.174; RMSSE: 1.118; rMAE: 0.114; rRMSE: 0.129
Note that the more lags you have, the more initial seasonal components the function will need to estimate, which is a difficult task. The optimiser might not get close to the optimal value, so we can help it. First, we can give more time for the calculation, increasing the number of iterations via maxeval (the default value is 20 iterations for each optimised parameter, so in case of the previous model it is 389*20=7780):
testModel <- adam(forecast::taylor, "MMdM", lags=c(1,48,336), silent=FALSE, h=336, holdout=TRUE,
maxeval=10000)
testModel
#> Time elapsed: 27.29 seconds
#> Model estimated using adam() function: ETS(MMdM)[48, 336]
#> Distribution assumed in the model: Inverse Gaussian
#> Loss function type: likelihood; Loss function value: 25730.06
#> Persistence vector g:
#> alpha beta gamma1 gamma2
#> 0.9841 0.9219 0.0031 0.0085
#> Damping parameter: 0.9384
#> Sample size: 3696
#> Number of estimated parameters: 390
#> Number of degrees of freedom: 3306
#> Information criteria:
#> AIC AICc BIC BICc
#> 52240.12 52332.40 54663.98 55043.01
#>
#> Forecast errors:
#> ME: -1003.906; MAE: 1202.239; RMSE: 1483.427
#> sCE: -1139.978%; sMAE: 4.063%; sMSE: 0.251%
#> MASE: 1.849; RMSSE: 1.572; rMAE: 0.18; rRMSE: 0.181
This will take more time, but will typically lead to more refined parameters. You can control other parameters of the optimiser as well, such as algorithm, xtol_rel, print_level and others, which are explained in the documentation for the nloptr function from the nloptr package (run nloptr.print.options() for details). Second, we can give a different set of initial parameters for the optimiser. The function saves the vector of optimised parameters in testModel$B, which can be used as a starting point (e.g. with a different algorithm):
testModel <- adam(forecast::taylor, "MMdM", lags=c(1,48,336), silent=FALSE, h=336, holdout=TRUE,
B=testModel$B)
testModel
#> Time elapsed: 40.64 seconds
#> Model estimated using adam() function: ETS(MMdM)[48, 336]
#> Distribution assumed in the model: Inverse Gaussian
#> Loss function type: likelihood; Loss function value: 25416.87
#> Persistence vector g:
#> alpha beta gamma1 gamma2
#> 0.9919 0.9794 0.0081 0.0001
#> Damping parameter: 0.9329
#> Sample size: 3696
#> Number of estimated parameters: 390
#> Number of degrees of freedom: 3306
#> Information criteria:
#> AIC AICc BIC BICc
#> 51613.74 51706.01 54037.59 54416.62
#>
#> Forecast errors:
#> ME: -1435.565; MAE: 1534.428; RMSE: 1809.874
#> sCE: -1630.147%; sMAE: 5.186%; sMSE: 0.374%
#> MASE: 2.36; RMSSE: 1.918; rMAE: 0.229; rRMSE: 0.221
Finally, we can speed up the process by using a different initialisation of the state vector, such as backcasting:
testModel <- adam(forecast::taylor, "MMdM", lags=c(1,48,336), silent=FALSE, h=336, holdout=TRUE,
initial="b")
The result might be less accurate than in case of the optimisation, but it should be faster.
In addition, you can specify some parts of the initial state vector or some parts of the persistence vector. Here is an example:
testModel <- adam(forecast::taylor, "MMdM", lags=c(1,48,336), silent=TRUE, h=336, holdout=TRUE,
initial=list(level=30000, trend=1), persistence=list(beta=0.1))
testModel
#> Time elapsed: 40.44 seconds
#> Model estimated using adam() function: ETS(MMdM)[48, 336]
#> Distribution assumed in the model: Inverse Gaussian
#> Loss function type: likelihood; Loss function value: 25840.13
#> Persistence vector g:
#> alpha beta gamma1 gamma2
#> 0.9941 0.1000 0.0059 0.0001
#> Damping parameter: 0.9454
#> Sample size: 3696
#> Number of estimated parameters: 387
#> Number of degrees of freedom: 3309
#> Number of provided parameters: 3
#> Information criteria:
#> AIC AICc BIC BICc
#> 52454.25 52545.04 54859.46 55232.35
#>
#> Forecast errors:
#> ME: 281.783; MAE: 805.936; RMSE: 1064.79
#> sCE: 319.976%; sMAE: 2.724%; sMSE: 0.129%
#> MASE: 1.24; RMSSE: 1.128; rMAE: 0.12; rRMSE: 0.13
The function also handles intermittent data (the data with zeroes) and the data with missing values. This is partially covered in the vignette on the oes() function. Here is a simple example:
testModel <- adam(rpois(120,0.5), "MNN", silent=FALSE, h=12, holdout=TRUE,
occurrence="odds-ratio")
testModel
#> Time elapsed: 0.04 seconds
#> Model estimated using adam() function: iETS(MNN)
#> Occurrence model type: Odds ratio
#> Distribution assumed in the model: Mixture of Bernoulli and Inverse Gaussian
#> Loss function type: likelihood; Loss function value: -73.6064
#> Persistence vector g:
#> alpha
#> 9e-04
#>
#> Sample size: 108
#> Number of estimated parameters: 5
#> Number of degrees of freedom: 103
#> Information criteria:
#> AIC AICc BIC BICc
#> 0.2743 0.5050 13.6849 4.8609
#>
#> Forecast errors:
#> Bias: 2.38%; sMSE: 33.357%; rRMSE: 0.806; sPIS: -514.857%; sCE: 133.242%
Finally, adam() is faster than the es() function, because its code is more efficient and it uses a different optimisation algorithm with more finely tuned parameters by default. Let’s compare:
adamModel <- adam(M3[[2568]], "CCC")
esModel <- es(M3[[2568]], "CCC")
"adam:"
#> [1] "adam:"
adamModel
#> Time elapsed: 2.16 seconds
#> Model estimated: ETS(CCC)
#> Loss function type: likelihood
#>
#> Number of models combined: 30
#> Sample size: 116
#> Average number of estimated parameters: 27.616
#> Average number of degrees of freedom: 88.384
#>
#> Forecast errors:
#> ME: 680.185; MAE: 828.538; RMSE: 1058.764
#> sCE: 168.188%; sMAE: 11.382%; sMSE: 2.115%
#> MASE: 0.337; RMSSE: 0.334; rMAE: 0.366; rRMSE: 0.349
"es():"
#> [1] "es():"
esModel
#> Time elapsed: 3.99 seconds
#> Model estimated: ETS(CCC)
#> Initial values were optimised.
#>
#> Loss function type: likelihood
#> Error standard deviation: 422.9088
#> Sample size: 116
#> Information criteria:
#> (combined values)
#> AIC AICc BIC BICc
#> 98.7499 99.0990 101.3425 102.1491
#>
#> Forecast errors:
#> MPE: 4.1%; sCE: 120.9%; Bias: 60.3%; MAPE: 6.9%
#> MASE: 0.299; sMAE: 10.1%; sMSE: 1.6%; rMAE: 0.324; rRMSE: 0.301
As mentioned above, ADAM contains not only ETS but also the ARIMA model, which is regulated via the orders parameter. If you want to have a pure ARIMA, you need to switch off ETS, which is done via model="NNN":
testModel <- adam(M3[[1234]], "NNN", silent=FALSE, orders=c(0,2,2))
testModel
#> Time elapsed: 0.06 seconds
#> Model estimated using adam() function: ARIMA(0,2,2)
#> Distribution assumed in the model: Normal
#> Loss function type: likelihood; Loss function value: 255.293
#> ARMA parameters of the model:
#> MA:
#> theta1[1] theta2[1]
#> -1.0911 0.3212
#>
#> Sample size: 45
#> Number of estimated parameters: 5
#> Number of degrees of freedom: 40
#> Information criteria:
#> AIC AICc BIC BICc
#> 520.5861 522.1245 529.6194 532.5476
#>
#> Forecast errors:
#> ME: -348.339; MAE: 348.339; RMSE: 396.565
#> sCE: -34.227%; sMAE: 4.278%; sMSE: 0.237%
#> MASE: 4.82; RMSSE: 4.429; rMAE: 3.958; rRMSE: 3.578
Given that both models are implemented in the same framework, they can be compared using information criteria.
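Such a comparison can be sketched as follows, assuming that the standard AIC() function from stats works on adam objects estimated via likelihood. Both models are refitted here with the default loss, because information criteria are not available for the custom loss used earlier.

```r
library(smooth)
library(Mcomp)

# ETS(AAN) and ARIMA(0,2,2) estimated on the same series via likelihood
etsModel <- adam(M3[[1234]], "AAN")
arimaModel <- adam(M3[[1234]], "NNN", orders=c(0,2,2))

# The model with the lower value is preferred according to the criterion
c(ETS=AIC(etsModel), ARIMA=AIC(arimaModel))
```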
The functionality of ADAM ARIMA is similar to that of the msarima function in the smooth package, although there are several differences.
First, changing the distribution parameter will allow switching between additive / multiplicative models. For example, distribution="dlnorm" will create an ARIMA, equivalent to the one on logarithms of the data:
testModel <- adam(M3[[2568]], "NNN", silent=FALSE, lags=c(1,12),
orders=list(ar=c(1,1),i=c(1,1),ma=c(2,2)), distribution="dlnorm")
testModel
#> Time elapsed: 1.56 seconds
#> Model estimated using adam() function: SARIMA(1,1,2)[1](1,1,2)[12]
#> Distribution assumed in the model: Log Normal
#> Loss function type: likelihood; Loss function value: 862.2645
#> ARMA parameters of the model:
#> AR:
#> phi1[1] phi1[12]
#> -0.5402 0.0428
#> MA:
#> theta1[1] theta2[1] theta1[12] theta2[12]
#> -0.3328 -0.5113 -0.4904 -0.5096
#>
#> Sample size: 116
#> Number of estimated parameters: 33
#> Number of degrees of freedom: 83
#> Information criteria:
#> AIC AICc BIC BICc
#> 1790.529 1817.895 1881.398 1946.441
#>
#> Forecast errors:
#> ME: 65.793; MAE: 537.901; RMSE: 640.823
#> sCE: 16.268%; sMAE: 7.389%; sMSE: 0.775%
#> MASE: 0.219; RMSSE: 0.202; rMAE: 0.238; rRMSE: 0.211
Second, it does not have an intercept. If you want to have one, you can do this by reintroducing the ETS component and imposing some restrictions:
testModel <- adam(M3[[2568]], "ANN", silent=FALSE, lags=c(1,12), persistence=0,
orders=list(ar=c(1,1),i=c(1,1),ma=c(2,2)), distribution="dnorm")
testModel
#> Time elapsed: 0.48 seconds
#> Model estimated using adam() function: ETS(ANN)+SARIMA(1,1,2)[1](1,1,2)[12]
#> Distribution assumed in the model: Normal
#> Loss function type: likelihood; Loss function value: 894.3529
#> Persistence vector g:
#> alpha
#> 0
#>
#> ARMA parameters of the model:
#> AR:
#> phi1[1] phi1[12]
#> -0.2172 -0.2942
#> MA:
#> theta1[1] theta2[1] theta1[12] theta2[12]
#> -0.8249 -0.0300 0.1000 0.0178
#>
#> Sample size: 116
#> Number of estimated parameters: 34
#> Number of degrees of freedom: 82
#> Number of provided parameters: 1
#> Information criteria:
#> AIC AICc BIC BICc
#> 1856.706 1886.089 1950.328 2020.165
#>
#> Forecast errors:
#> ME: 464.761; MAE: 697.35; RMSE: 821.751
#> sCE: 114.92%; sMAE: 9.58%; sMSE: 1.274%
#> MASE: 0.284; RMSSE: 0.259; rMAE: 0.308; rRMSE: 0.271
This way we get the global level, which acts as an intercept. The drift is not supported in the model either.
Third, you can specify the parameters of ARIMA via the arma parameter in the following manner:
testModel <- adam(M3[[2568]], "NNN", silent=FALSE, lags=c(1,12),
orders=list(ar=c(1,1),i=c(1,1),ma=c(2,2)), distribution="dnorm",
arma=list(ar=c(0.1,0.1), ma=c(-0.96, 0.03, -0.12, 0.03)))
testModel
#> Time elapsed: 0.48 seconds
#> Model estimated using adam() function: SARIMA(1,1,2)[1](1,1,2)[12]
#> Distribution assumed in the model: Normal
#> Loss function type: likelihood; Loss function value: 898.4653
#> ARMA parameters of the model:
#> AR:
#> phi1[1] phi1[12]
#> 0.1 0.1
#> MA:
#> theta1[1] theta2[1] theta1[12] theta2[12]
#> -0.96 0.03 -0.12 0.03
#>
#> Sample size: 116
#> Number of estimated parameters: 27
#> Number of degrees of freedom: 89
#> Number of provided parameters: 6
#> Information criteria:
#> AIC AICc BIC BICc
#> 1850.931 1868.112 1925.277 1966.115
#>
#> Forecast errors:
#> ME: 435.5; MAE: 661.144; RMSE: 779.272
#> sCE: 107.685%; sMAE: 9.082%; sMSE: 1.146%
#> MASE: 0.269; RMSSE: 0.246; rMAE: 0.292; rRMSE: 0.257
Finally, the initials for the states can also be provided, although getting the correct ones might be a challenging task (you also need to know how many of them to provide; checking testModel$initial might help):
testModel <- adam(M3[[2568]], "NNN", silent=FALSE, lags=c(1,12),
orders=list(ar=c(1,1),i=c(1,1),ma=c(2,0)), distribution="dnorm",
initial=list(arima=M3[[2568]]$x[1:24]))
testModel
#> Time elapsed: 1.17 seconds
#> Model estimated using adam() function: SARIMA(1,1,2)[1](1,1,0)[12]
#> Distribution assumed in the model: Normal
#> Loss function type: likelihood; Loss function value: 897.9052
#> ARMA parameters of the model:
#> AR:
#> phi1[1] phi1[12]
#> -0.5363 0.0422
#> MA:
#> theta1[1] theta2[1]
#> -0.5540 -0.4069
#>
#> Sample size: 116
#> Number of estimated parameters: 31
#> Number of degrees of freedom: 85
#> Information criteria:
#> AIC AICc BIC BICc
#> 1857.810 1881.429 1943.172 1999.309
#>
#> Forecast errors:
#> ME: 307.675; MAE: 590.659; RMSE: 695.015
#> sCE: 76.078%; sMAE: 8.114%; sMSE: 0.912%
#> MASE: 0.24; RMSSE: 0.219; rMAE: 0.261; rRMSE: 0.229
If you work with the ADAM ARIMA model, then there is no such thing as “usual” bounds for the parameters, so the function will use bounds="admissible", checking the AR / MA polynomials in order to make sure that the model is stationary and invertible (aka stable).
Similarly to ETS, you can use different distributions and losses for the estimation. Note that the order selection for ARIMA is done in the auto.adam() function, not in adam()!
Finally, ARIMA is typically slower than ETS, mainly because maxeval is set by default to be at least 1000. But this is inevitable due to the increased complexity of the model - otherwise it won’t be estimated properly. If you want to speed things up, use initial="backcasting" and reduce the number of iterations.
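A sketch of such a speed-up, reusing the SARIMA specification from the examples above. The value of maxeval is arbitrary and only illustrates the idea; a value that is too low might leave the optimiser far from the optimum.

```r
library(smooth)
library(Mcomp)

# Backcasting avoids estimating the initial states,
# and maxeval caps the number of iterations of the optimiser
testModel <- adam(M3[[2568]], "NNN", silent=TRUE, lags=c(1,12),
                  orders=list(ar=c(1,1),i=c(1,1),ma=c(2,2)), distribution="dnorm",
                  initial="backcasting", maxeval=500)
testModel
```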
Another important feature of ADAM is the introduction of explanatory variables. Unlike es(), adam() expects a matrix for data and can work with a formula. If the latter is not provided, then it will use all explanatory variables. Here is a brief example:
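The brief example itself is not preserved in this text. A minimal sketch, assuming the BJsales data from the datasets package (used later in this vignette): the response goes in the first column and the explanatory variable in the second one.

```r
library(smooth)

# The first column is the response, the second one is the explanatory variable
BJData <- cbind(BJsales, BJsales.lead)

# No formula is provided, so all explanatory variables are used
testModel <- adam(BJData, "AAN", h=18, silent=FALSE)
testModel
```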
If you work with data.frame or similar structures, then you can use them directly: ADAM will extract the response variable either assuming that it is in the first column or from the provided formula (if you specify one via the formula parameter). Here is an example, where we create a matrix with lags and leads of an explanatory variable:
BJData <- cbind(as.data.frame(BJsales),as.data.frame(xregExpander(BJsales.lead,c(-7:7))))
colnames(BJData)[1] <- "y"
testModel <- adam(BJData, "ANN", h=18, silent=FALSE, holdout=TRUE, formula=y~xLag1+xLag2+xLag3)
testModel
#> Time elapsed: 0.07 seconds
#> Model estimated using adam() function: ETSX(ANN)
#> Distribution assumed in the model: Normal
#> Loss function type: likelihood; Loss function value: 221.9657
#> Persistence vector g (excluding xreg):
#> alpha
#> 1
#>
#> Sample size: 132
#> Number of estimated parameters: 6
#> Number of degrees of freedom: 126
#> Information criteria:
#> AIC AICc BIC BICc
#> 455.9315 456.6035 473.2283 474.8689
#>
#> Forecast errors:
#> ME: 0.43; MAE: 1.289; RMSE: 1.704
#> sCE: 3.426%; sMAE: 0.571%; sMSE: 0.006%
#> MASE: 1.057; RMSSE: 1.091; rMAE: 0.576; rRMSE: 0.679
Similarly to es(), there is support for variable selection, but via the regressors parameter instead of xregDo, which will then use the stepwise() function from the greybox package on the residuals of the model:
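The chunk for this step is missing from this text. A sketch of the variable selection, assuming that the relevant value of the parameter is regressors="select" (the data is constructed in the same way as in the example above, so that the block is self-contained):

```r
library(greybox)
library(smooth)

# Response variable plus lags and leads of the explanatory one
BJData <- cbind(as.data.frame(BJsales), as.data.frame(xregExpander(BJsales.lead, c(-7:7))))
colnames(BJData)[1] <- "y"

# Let the function select the explanatory variables
testModel <- adam(BJData, "ANN", h=18, silent=FALSE, holdout=TRUE, regressors="select")
testModel
```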
The same functionality is supported with ARIMA, so you can have, for example, ARIMAX(0,1,1), which is equivalent to ETSX(A,N,N):
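The ARIMAX example is not preserved in this text either. A sketch under the same assumptions as above (regressors="select" for the selection, the same BJData construction), with ETS switched off and the ARIMA orders set to (0,1,1):

```r
library(greybox)
library(smooth)

BJData <- cbind(as.data.frame(BJsales), as.data.frame(xregExpander(BJsales.lead, c(-7:7))))
colnames(BJData)[1] <- "y"

# ARIMAX(0,1,1) with selection of explanatory variables
testModel <- adam(BJData, "NNN", h=18, silent=FALSE, holdout=TRUE,
                  regressors="select", orders=c(0,1,1))
testModel
```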
The two models might differ because they have different initialisation in the optimiser and different bounds for parameters (ARIMA relies on the invertibility condition, while ETS uses the traditional (0,1) bounds by default). It is possible to make them identical if the number of iterations is increased and the initial parameters are the same. Here is an example of what happens when the two models have exactly the same parameters:
BJData <- BJData[,c("y",names(testModel$initial$xreg))];
testModel <- adam(BJData, "NNN", h=18, silent=TRUE, holdout=TRUE, orders=c(0,1,1),
initial=testModel$initial, arma=testModel$arma)
testModel
#> Time elapsed: 0 seconds
#> Model estimated using adam() function: ARIMAX(0,1,1)
#> Distribution assumed in the model: Normal
#> Loss function type: likelihood; Loss function value: 64.9884
#> ARMA parameters of the model:
#> MA:
#> theta1[1]
#> 0.2188
#>
#> Sample size: 132
#> Number of estimated parameters: 1
#> Number of degrees of freedom: 131
#> Number of provided parameters: 7
#> Information criteria:
#> AIC AICc BIC BICc
#> 131.9767 132.0075 134.8595 134.9346
#>
#> Forecast errors:
#> ME: 0.57; MAE: 0.584; RMSE: 0.778
#> sCE: 4.543%; sMAE: 0.258%; sMSE: 0.001%
#> MASE: 0.479; RMSSE: 0.498; rMAE: 0.261; rRMSE: 0.31
names(testModel$initial)[1] <- names(testModel$initial)[[1]] <- "level"
testModel2 <- adam(BJData, "ANN", h=18, silent=TRUE, holdout=TRUE,
initial=testModel$initial, persistence=testModel$arma$ma+1)
testModel2
#> Time elapsed: 0 seconds
#> Model estimated using adam() function: ETSX(ANN)
#> Distribution assumed in the model: Normal
#> Loss function type: likelihood; Loss function value: 1e+300
#> Persistence vector g (excluding xreg):
#> alpha
#> 1.2188
#>
#> Sample size: 132
#> Number of estimated parameters: 1
#> Number of degrees of freedom: 131
#> Number of provided parameters: 7
#> Information criteria:
#> AIC AICc BIC BICc
#> 131.9767 132.0075 134.8595 134.9346
#>
#> Forecast errors:
#> ME: 0.57; MAE: 0.584; RMSE: 0.778
#> sCE: 4.543%; sMAE: 0.258%; sMSE: 0.001%
#> MASE: 0.479; RMSSE: 0.498; rMAE: 0.261; rRMSE: 0.31
Another feature of ADAM is the time-varying parameters in the SSOE framework, which can be switched on via regressors="adapt":
testModel <- adam(BJData, "ANN", h=18, silent=FALSE, holdout=TRUE, regressors="adapt")
testModel$persistence
#> alpha delta1 delta2 delta3 delta4 delta5
#> 0.725000000 0.280911209 0.000402308 0.008901798 0.164429727 0.004079590
Note that the default number of iterations might not be sufficient to get close to the optimum of the function, so setting maxeval to something bigger might help. If you want to explore why the optimisation stopped, you can provide the print_level=41 parameter to the function, and it will print out the report from the optimiser. In the end, the default parameters are tuned to give a reasonable solution, but given the complexity of the model, they are not guaranteed to give the best one all the time.
Finally, you can produce a mixture of ETS, ARIMA and regression, by using the respective parameters, like this:
testModel <- adam(BJData, "AAN", h=18, silent=FALSE, holdout=TRUE, orders=c(1,0,1))
summary(testModel)
#>
#> Model estimated using adam() function: ETSX(AAN)+ARIMA(1,0,1)
#> Response variable: y
#> Distribution used in the estimation: Normal
#> Loss function type: likelihood; Loss function value: 78.2047
#> Coefficients:
#> Estimate Std. Error Lower 2.5% Upper 97.5%
#> alpha 0.9779 0.1020 0.7759 1.0000
#> beta 0.0024 0.0259 0.0000 0.0537
#> phi1[1] 0.6266 0.1271 0.3750 0.8779
#> theta1[1] -0.3837 0.2357 -0.6566 0.0823
#> level 36.3414 7.0439 22.3937 50.2634
#> trend 0.0486 0.0227 0.0037 0.0935
#> ARIMAState1 3.2626 1.5514 0.1906 6.3290
#> xLag3 5.1150 0.1471 4.8237 5.4058
#> xLag7 1.1700 0.1632 0.8469 1.4926
#> xLag4 4.2067 0.1995 3.8117 4.6010
#> xLag6 2.4823 0.2364 2.0143 2.9495
#> xLag5 3.0783 0.2167 2.6492 3.5066
#>
#> Sample size: 132
#> Number of estimated parameters: 13
#> Number of degrees of freedom: 119
#> Information criteria:
#> AIC AICc BIC BICc
#> 182.4094 185.4941 219.8858 227.4169
This might be handy when you explore high-frequency data and want to add calendar events, apply ETS and add AR/MA errors to it.
While the original adam() function allows selecting ETS components and explanatory variables, it does not allow selecting the most suitable distribution and / or ARIMA components. This is what the auto.adam() function is for.
In order to do the selection of the most appropriate distribution, you need to provide a vector of those that you want to check:
testModel <- auto.adam(M3[[1234]], "XXX", silent=FALSE,
distribution=c("dnorm","dlaplace","ds"))
#> Evaluating models with different distributions... dnorm , dlaplace , ds , Done!
testModel
#> Time elapsed: 0.21 seconds
#> Model estimated using adam() function: ETS(AAN)
#> Distribution assumed in the model: Normal
#> Loss function type: likelihood; Loss function value: 255.2972
#> Persistence vector g:
#> alpha beta
#> 0.6828 0.2275
#>
#> Sample size: 45
#> Number of estimated parameters: 5
#> Number of degrees of freedom: 40
#> Information criteria:
#> AIC AICc BIC BICc
#> 520.5943 522.1328 529.6276 532.5558
#>
#> Forecast errors:
#> ME: -348.202; MAE: 348.202; RMSE: 396.376
#> sCE: -34.214%; sMAE: 4.277%; sMSE: 0.237%
#> MASE: 4.818; RMSSE: 4.427; rMAE: 3.957; rRMSE: 3.576
This process can also be done in parallel on either the automatically selected number of cores (e.g. parallel=TRUE) or on the number specified by the user (e.g. parallel=4):
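The chunk for the parallel run is not shown in this text. A minimal sketch, reusing the distributions from the previous example; note that the parallel mode may require additional packages for the parallel backend to be installed (see ?auto.adam), which is an assumption about the setup here.

```r
library(smooth)
library(Mcomp)

# The same selection as above, but on the automatically selected number of cores
testModel <- auto.adam(M3[[1234]], "XXX", silent=FALSE, parallel=TRUE,
                       distribution=c("dnorm","dlaplace","ds"))
testModel
```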
If you want to add ARIMA or regression components, you can do it in exactly the same way as for the adam() function. Here is an example of ETS+ARIMA:
testModel <- auto.adam(M3[[1234]], "AAN", orders=list(ar=2,i=2,ma=2), silent=TRUE,
distribution=c("dnorm","dlaplace","ds","dgnorm"))
testModel
#> Time elapsed: 0.38 seconds
#> Model estimated using adam() function: ETS(AAN)+ARIMA(2,2,2)
#> Distribution assumed in the model: Normal
#> Loss function type: likelihood; Loss function value: 254.918
#> Persistence vector g:
#> alpha beta
#> 0.0426 0.0426
#>
#> ARMA parameters of the model:
#> AR:
#> phi1[1] phi2[1]
#> -0.3875 -0.1535
#> MA:
#> theta1[1] theta2[1]
#> -0.7218 -0.0702
#>
#> Sample size: 45
#> Number of estimated parameters: 13
#> Number of degrees of freedom: 32
#> Information criteria:
#> AIC AICc BIC BICc
#> 535.8360 547.5779 559.3226 581.6714
#>
#> Forecast errors:
#> ME: -335.535; MAE: 335.535; RMSE: 382.516
#> sCE: -32.969%; sMAE: 4.121%; sMSE: 0.221%
#> MASE: 4.643; RMSSE: 4.272; rMAE: 3.813; rRMSE: 3.451
However, this way the function will just use ARIMA(2,2,2) and fit it together with ETS. If you want it to select the most appropriate ARIMA orders from the provided ones (e.g. up to AR(2), I(2) and MA(2)), you need to add the parameter select=TRUE to the list in orders:
testModel <- auto.adam(M3[[1234]], "XXN", orders=list(ar=2,i=2,ma=2,select=TRUE),
distribution="default", silent=FALSE)
#> Evaluating models with different distributions... default , Selecting ARIMA orders... 5 %10 %38 %71 % 100 %. The best ARIMA is selected.
#> Done!
testModel
#> Time elapsed: 0.15 seconds
#> Model estimated using adam() function: ETS(AAN)
#> Distribution assumed in the model: Normal
#> Loss function type: likelihood; Loss function value: 255.2972
#> Persistence vector g:
#> alpha beta
#> 0.6828 0.2275
#>
#> Sample size: 45
#> Number of estimated parameters: 5
#> Number of degrees of freedom: 40
#> Information criteria:
#> AIC AICc BIC BICc
#> 520.5943 522.1328 529.6276 532.5558
#>
#> Forecast errors:
#> ME: -348.202; MAE: 348.202; RMSE: 396.376
#> sCE: -34.214%; sMAE: 4.277%; sMSE: 0.237%
#> MASE: 4.818; RMSSE: 4.427; rMAE: 3.957; rRMSE: 3.576
Knowing how to work with adam(), you can use similar principles when dealing with auto.adam(). Just keep in mind that the provided persistence, phi, initial, arma and B won’t work, because this contradicts the idea of the model selection.
Finally, there is also a mechanism for automatic outlier detection, which extracts residuals from the best model, flags observations that lie outside the in-sample prediction interval of the width level and then refits auto.adam() with dummy variables for the outliers. Here is how it works:
testModel <- auto.adam(Mcomp::M3[[2568]], "PPP", silent=FALSE, outliers="use",
distribution="default")
#> Evaluating models with different distributions... default ,
#> Dealing with outliers...
testModel
#> Time elapsed: 1 seconds
#> Model estimated using adam() function: ETSX(MMdM)
#> Distribution assumed in the model: Inverse Gaussian
#> Loss function type: likelihood; Loss function value: 853.0803
#> Persistence vector g (excluding xreg):
#> alpha beta gamma
#> 0.0228 0.0217 0.0000
#> Damping parameter: 0.954
#> Sample size: 116
#> Number of estimated parameters: 22
#> Number of degrees of freedom: 94
#> Information criteria:
#> AIC AICc BIC BICc
#> 1750.161 1761.042 1810.739 1836.603
#>
#> Forecast errors:
#> ME: 748.317; MAE: 856.59; RMSE: 1102.071
#> sCE: 185.035%; sMAE: 11.767%; sMSE: 2.292%
#> MASE: 0.349; RMSSE: 0.348; rMAE: 0.378; rRMSE: 0.363
If you specify outliers="select", the function will create leads and lags 1 of the outliers and then select the most appropriate ones via the regressors parameter of adam().
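For completeness, this option can be sketched with the same call as in the example above, changing only the outliers parameter (the output is not shown here, because it depends on which outliers are detected and selected):

```r
library(smooth)
library(Mcomp)

# Detect outliers, expand them with leads and lags, and select the relevant ones
testModel <- auto.adam(Mcomp::M3[[2568]], "PPP", silent=FALSE, outliers="select",
                       distribution="default")
testModel
```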
If you want to know more about ADAM, you are welcome to visit the online textbook (this is a work in progress at the moment).
Hyndman, Rob J, Anne B Koehler, J Keith Ord, and Ralph D Snyder. 2008. Forecasting with Exponential Smoothing. Springer Berlin Heidelberg.