aus_pigs <- aus_livestock %>%
  filter(str_detect(Animal, "Pigs") & str_detect(State, "Victoria")) %>%
  select(Month, Count)

autoplot(aus_pigs, .vars = Count) +
  scale_y_continuous(labels = scales::label_comma(1, scale = 1/1e3, suffix = "K"),
                     breaks = seq(0, 200e3, 20000)) +
  labs(y = "Count of slaughtered pigs", x = "",
       title = "Number of pigs slaughtered in Victoria")
1 Introduction
As I’ve said in one of my previous posts, Rob J. Hyndman has an excellent book on forecasting that is full of exercises at the end of each chapter. One of those chapters - to be more precise, Chapter 8 - Exponential Smoothing - has tasks that require implementing ETS, which really helped me understand it. But before I get to that part, let’s see what the theory says about ETS.
2 Theory
Hyndman says the following on ETS:
Exponential smoothing was proposed in the late 1950s (Brown, 1959; Holt, 1957; Winters, 1960), and has motivated some of the most successful forecasting methods. Forecasts produced using exponential smoothing methods are weighted averages of past observations, with the weights decaying exponentially as the observations get older. In other words, the more recent the observation the higher the associated weight. This framework generates reliable forecasts quickly and for a wide range of time series, which is a great advantage and of major importance to applications in industry.
He also goes on to say that ETS (to be more precise, SES or simple exponential smoothing) is most suitable for forecasting data with no clear trend or seasonal pattern. The math behind ETS is presented in two ways: the weighted average form and the component form. From a programming standpoint, I like the component form better, but let’s present both approaches.
2.1 Weighted form
\[ \hat{y}_{T+1|T} = \alpha{}y_{T} + (1 - \alpha{})\hat{y}_{T|T-1} \tag{1}\]
The meaning of these symbols is as follows:
- \(\hat{y}_{T+1|T}\): first (forecasted) future value (\(T+1\)), given the last historical value (\(T\)).
- \(y_{T}\): last historical value.
- \(\hat{y}_{T|T-1}\): last forecasted value at index \(T\). Spoiler alert: this is where magic happens.
- \(\alpha{}\): smoothing parameter.
\(\hat{y}_{T|T-1}\) is where the magic happens. Fitted values in ETS are one-step forecasts of the training data, so by substituting the same equation into itself repeatedly we can conclude that:
\[ \hat{y}_{T+1|T} = \sum_{j=0}^{T-1} \alpha{}(1 - \alpha{})^{j}y_{T-j} + (1 - \alpha{})^{T}\mathcal{l}_{0} \tag{2}\]
Hold your horses, you probably have questions about \(\mathcal{l}_{0}\). We will get to that term in a minute, but when you see it, read it as the starting value (the initial level).
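To see that Equation 1 and Equation 2 really are the same thing, here is a small sketch with made-up numbers (alpha, ell_zero and y below are arbitrary values picked purely for illustration, not anything from the pigs data):
alpha <- 0.3                  # made-up smoothing parameter
ell_zero <- 10                # made-up starting value l[0]
y <- c(12, 9, 11, 13, 10)     # made-up observations
n <- length(y)

# Recursive form (Equation 1): update the forecast one observation at a time.
yhat <- ell_zero
for (t in seq_len(n)) {
  yhat <- alpha * y[t] + (1 - alpha) * yhat
}

# Weighted average form (Equation 2): exponentially decaying weights.
j <- 0:(n - 1)
yhat_weighted <- sum(alpha * (1 - alpha)^j * y[n - j]) + (1 - alpha)^n * ell_zero

all.equal(yhat, yhat_weighted)
Both give the same number, and the weights attached to the observations fade exponentially the further back in time we go.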
2.2 Component form
The component form gets us to the same result, but it’s actually easier to see what is really going on. It consists of two equations. The first one is (surprise, surprise) called the forecast equation:
\[ \hat{y}_{t+h|t} = \mathcal{l}_{t} \tag{3}\]
Same as before: the future value, given the history up to time \(t\) (\(\hat{y}_{t+h|t}\)), is equal to \(\mathcal{l}_{t}\). But what is that? \(\mathcal{l}_{t}\) is the level, and it is updated by the second equation, which we call the smoothing equation:
\[ \mathcal{l}_{t} = \alpha{}y_{t} + (1 - \alpha{})\mathcal{l}_{t-1} \tag{4}\]
This is identical to the first formula in the weighted form, only the symbols are different. That’s it.
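To make Equation 4 concrete with made-up numbers: if \(\alpha{} = 0.3\), the previous level is \(\mathcal{l}_{t-1} = 100\) and the new observation is \(y_{t} = 120\), the new level moves 30% of the way towards the observation:
\[ \mathcal{l}_{t} = 0.3 \cdot 120 + 0.7 \cdot 100 = 106 \]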
Okay, enough with the theory. Let’s implement this.
3 Implementation
Exercise 1 is as follows:
Consider the number of pigs slaughtered in Victoria, available in the aus_livestock dataset. Use the ETS() function to estimate the equivalent model for simple exponential smoothing. Find the optimal values of \(\alpha{}\) and \(\mathcal{l}_{0}\), and generate forecasts for the next four months.
tidyverse and fpp3 are loaded, so we can go on with the solution.
Let’s forecast with ETS using tools from fable:
fit <- aus_pigs %>%
  model(ETS(Count ~ error("A") + trend("N") + season("N")))
report(fit)
Series: Count
Model: ETS(A,N,N)
Smoothing parameters:
alpha = 0.3221247
Initial states:
l[0]
100646.6
sigma^2: 87480760
AIC AICc BIC
13737.10 13737.14 13750.07
Remember those \(\mathcal{l}_{0}\) and \(\alpha{}\) values from the theory? Well, here they are again. The point of ETS is to optimize those parameters so that the sum of squared errors (SSE) is minimized. \(\mathcal{l}_{0}\) is the initial value for the ETS model, because the model itself cannot produce it. Why? For the simple reason that every ETS forecast requires the previous value in the recursion, and the first historical observation (the first point in the training data) has nothing before it, so we need \(\mathcal{l}_{0}\).
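A quick sanity check of my own (assuming the fitted values behave as described above): the first fitted value has no history behind it, so it should simply be \(\mathcal{l}_{0}\):
# First one-step "forecast" in the training data falls back to the initial level l[0].
augment(fit)$.fitted[1]
coef(fit) %>% filter(term == "l[0]")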
Let’s see the fit on the training data:
augment(fit) %>%
  select(Month, Count, .fitted) %>%
  rename(Training = Count, Fitted = .fitted) %>%
  pivot_longer(-Month, names_to = "Category", values_to = "Count") %>%
  mutate(Year = year(Month),
         MonthInt = month(Month)) %>%
  filter(Year >= 2003) %>%
  ggplot(aes(x = MonthInt, y = Count, color = Category)) +
  geom_line() +
  scale_y_continuous(labels = scales::label_comma(1, scale = 1/1e3, suffix = "K"),
                     breaks = seq(0, 200e3, 30000)) +
  scale_x_continuous(breaks = seq(0, 12, 2)) +
  labs(y = "Count of slaughtered pigs", x = "",
       title = "Number of pigs slaughtered in Victoria") +
  scale_color_brewer(palette = "Set1") +
  facet_wrap(. ~ Year, scales = "free_x")
Nomen est omen, and in ETS we can see the smoothed curve for each historical year.
What is the actual forecast?
fc <- fit %>%
  forecast(h = 4)
fc
# A fable: 4 x 4 [1M]
# Key: .model [1]
.model Month Count .mean
<chr> <mth> <dist> <dbl>
1 "ETS(Count ~ error(\"A\") + trend(\"N\") + ~ 2019 Jan N(95187, 8.7e+07) 95187.
2 "ETS(Count ~ error(\"A\") + trend(\"N\") + ~ 2019 Feb N(95187, 9.7e+07) 95187.
3 "ETS(Count ~ error(\"A\") + trend(\"N\") + ~ 2019 Mar N(95187, 1.1e+08) 95187.
4 "ETS(Count ~ error(\"A\") + trend(\"N\") + ~ 2019 Apr N(95187, 1.1e+08) 95187.
So, around 95k. Let’s plot this:
plot_all <- fc %>%
  autoplot(aus_pigs) +
  scale_y_continuous(labels = scales::label_comma(1, scale = 1/1e3, suffix = "K"),
                     breaks = seq(0, 300e3, 20000)) +
  labs(y = "Count of slaughtered pigs",
       title = "Number of pigs slaughtered in Victoria",
       x = "")

plot_zoom <- fc %>%
  autoplot(aus_pigs %>% filter(lubridate::year(Month) >= 2010)) +
  scale_y_continuous(labels = scales::label_comma(1, scale = 1/1e3, suffix = "K"),
                     breaks = seq(0, 300e3, 20000)) +
  labs(y = "Count of slaughtered pigs",
       title = "Zoomed plot (2010 onwards)",
       x = "")

cowplot::plot_grid(plot_all, plot_zoom, nrow = 1)
Seems reasonable. The 95% prediction interval accounts for the historical variation that is clearly visible in the past 5 years.
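If you want that interval as numbers rather than a shaded band, hilo() from fabletools gives it (using the fc object from above):
# 95% prediction intervals as lower/upper bounds instead of a plot.
fc %>% hilo(level = 95)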
Let’s try to implement ETS on our own. Our target is:
coef(fit) %>%
select(term, estimate)
# A tibble: 2 x 2
term estimate
<chr> <dbl>
1 alpha 0.322
2 l[0] 100647.
4 Coding the ETS method
The solution is made up of a few components, so bear with me.
4.1 exp_smoothing_manual()
exp_smoothing_manual <- function(y,
                                 arg_alpha,
                                 arg_ell_zero,
                                 bool_forecast_h1 = FALSE) {
  # If TRUE, also produce the one-step forecast beyond the last observation.
  if (bool_forecast_h1) {
    total_len <- length(y) + 1
  } else {
    total_len <- length(y)
  }

  anti_alpha <- (1 - arg_alpha)

  store_predictions <- rep(NA, total_len)
  store_predictions[1] <- arg_ell_zero

  for (i in seq_along(store_predictions)) {
    if (i == 1) {
      # No history before the first point, so both terms fall back to l[0].
      last_y <- store_predictions[i]
      last_yhat <- store_predictions[i]
    } else {
      last_y <- y[i - 1]
      last_yhat <- store_predictions[i - 1]
    }

    # Equation 4: weighted average of the previous observation and the
    # previous one-step forecast.
    term_01 <- arg_alpha * last_y
    term_02 <- anti_alpha * last_yhat

    yhat <- term_01 + term_02

    store_predictions[i] <- yhat
  }

  # yhat now holds the last one-step forecast.
  out <- yhat[length(yhat)]

  return(out)
}
This is nothing more than a function that implements the formula given in the theory \(\rightarrow{}\) Equation 4.
So, does this function give the same result as the ETS model from fable?
tbl_coef <- coef(fit) %>%
  select(term, estimate)

forecast_josip <- exp_smoothing_manual(
  y = aus_pigs$Count,
  arg_alpha = tbl_coef$estimate[tbl_coef$term == "alpha"],
  arg_ell_zero = tbl_coef$estimate[tbl_coef$term == "l[0]"],
  bool_forecast_h1 = TRUE
)

forecast_fable <- fc$.mean[1]

# Moment of truth:
forecast_josip == forecast_fable
[1] TRUE
Splendid!
But now we want to build our own ETS model. We will need a wrapper around exp_smoothing_manual() that returns the SSE.
4.2 optimize_exp_smoothing()
optimize_exp_smoothing <- function(pars = c(arg_alpha, arg_ell_zero), y) {

  out_predicted <- rep(NA, length(y))

  # One-step fitted value for every point in the training data.
  for (i in seq_along(out_predicted)) {
    out_predicted[i] <- exp_smoothing_manual(
      y = y[1:i],
      arg_alpha = pars[1],
      arg_ell_zero = pars[2]
    )
  }

  squared_errors <- (out_predicted - y) ^ 2
  out <- sum(squared_errors)
  return(out)
}
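As a quick check of my own before optimizing anything, we can evaluate this objective at the \(\alpha{}\) and \(\mathcal{l}_{0}\) that fable reported above (values copied from report(fit)), just to confirm the function returns a finite SSE of a sensible magnitude:
# SSE of the manual SES at fable's reported alpha and l[0].
optimize_exp_smoothing(pars = c(0.3221247, 100646.6), y = aus_pigs$Count)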
So, we need to find the \(\mathcal{l}_{0}\) and \(\alpha{}\) that give us the minimum sum of squared errors, where each squared error is defined as \((y - \hat{y})^2\).
Let’s optimize it.
4.3 optim()
R has a general-purpose optimization function called optim() (info). optim() requires starting values for its algorithms. For \(\alpha{}\), I’ve decided to go with 0.5, while for \(\mathcal{l}_0\) the first historical value should be fine.
optim_pigs <- optim(
  par = c(0.5, aus_pigs$Count[1]),
  y = aus_pigs$Count,
  fn = optimize_exp_smoothing,
  method = "Nelder-Mead"
)
optim_pigs
$par
[1] 3.221274e-01 9.922308e+04
$value
[1] 48642449532
$counts
function gradient
99 NA
$convergence
[1] 0
$message
NULL
Let’s compare these results with the parameters from ETS (fable):
coef(fit) %>%
  select(term, estimate) %>%
  mutate(estimate_me = optim_pigs$par,
         diff_abs = abs(estimate_me - estimate),
         diff_rel = diff_abs / estimate) %>%
  mutate(across(where(is.numeric), function(x) round(x, 4)))
# A tibble: 2 x 5
term estimate estimate_me diff_abs diff_rel
<chr> <dbl> <dbl> <dbl> <dbl>
1 alpha 0.322 0.322 0 0
2 l[0] 100647. 99223. 1424. 0.0141
So, I’ve got \(\alpha{}\) right - 0.322 - but I’m off on the \(\mathcal{l}_{0}\) term by 1.41%. Why? Honestly, I don’t know; I would have to check the source code of the fable ETS implementation for differences, or choose a different optimization algorithm in optim(). But this is good enough for me, at least for the purpose of solving this exercise.
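One cheap thing to try, as a sketch only (I haven’t verified that it closes the gap): the same objective with a different optim() algorithm, L-BFGS-B, which also lets us put box constraints on the parameters. The bounds below are my own arbitrary choice.
# Same objective, different algorithm; bounds keep alpha inside (0, 1)
# and l[0] within the range of the observed data.
optim_pigs_bfgs <- optim(
  par = c(0.5, aus_pigs$Count[1]),
  y = aus_pigs$Count,
  fn = optimize_exp_smoothing,
  method = "L-BFGS-B",
  lower = c(1e-4, min(aus_pigs$Count)),
  upper = c(1 - 1e-4, max(aus_pigs$Count))
)
optim_pigs_bfgs$par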