AI Basics: Exponential Smoothing

Preface:

It is widely accepted in the retail industry that people’s buying patterns have changed drastically due to the adverse effects of the pandemic. Customers look towards purchasing lower-priced items, buying fewer non-essential items, and stocking up on essentials in bulk, especially when there is pending news of lockdowns in cities. There are also visible shifts in the purchasing trends of many items, including fresh food, items with longer shelf lives, and self-care items. To make short-term sales forecasts for this “new normal”, the retail and FMCG industries can learn from post-COVID customer buying patterns using exponential smoothing techniques.

Questions to consider:

Some basic questions to answer before exploring these techniques: What is the target variable to predict? What is the correct forecast horizon? What is the correct granularity? How much history is available to learn from? What is the data volatility like? Are there any in-scope/out-of-scope concerns?

These are all questions to consider during the problem setup.

Let us look at three exponential smoothing techniques for making short-term sales forecasts based on post-COVID sales data:

T1: Simple Exponential Smoothing

The graph above shows the weekly trend of tea packet sales at a specific store or outlet. Simple exponential smoothing can be used to predict next week’s sales according to the following formula:

$latex \hat{y}_{t+1} = \alpha y_t + (1 - \alpha) \hat{y}_t$

; where $latex y_t$ is the current sales, $latex \hat{y}_t$ is the current forecast, and $latex 0 \le \alpha \le 1$ is a smoothing parameter that controls the decay over time.

If α ≈ 0, then more weight is given to older sales figures when forecasting sales for next week. If α ≈ 1, then more weight is given to newer sales figures when forecasting sales for next week.
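As a concrete sketch of this update, here is a minimal pure-Python version; the weekly tea-sales figures and the choice of seeding the level with the first observation are illustrative assumptions, not from the original analysis:

```python
def ses_forecast(sales, alpha):
    """Simple exponential smoothing: forecast for the next period.

    Seeds the initial level with the first observation (one common
    convention), then applies level = alpha*y + (1 - alpha)*level.
    """
    level = sales[0]
    for y in sales[1:]:
        level = alpha * y + (1 - alpha) * level
    return level

# Hypothetical weekly tea packet sales at one outlet
weekly_tea_sales = [120, 132, 101, 134, 90, 110, 125]
print(ses_forecast(weekly_tea_sales, alpha=0.9))  # weighted towards recent weeks
print(ses_forecast(weekly_tea_sales, alpha=0.1))  # weighted towards older weeks
```

With a high alpha the forecast tracks the latest weeks closely; with a low alpha it hovers near the long-run average of the series.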

To better facilitate longer-term predictions, say $latex h$ weeks ahead, we can use the following compound form, comprising a forecast equation and a smoothing equation:

$latex \hat{y}_{t+h} = l_t$

$latex l_t = \alpha y_t + (1 - \alpha) l_{t-1}$

; where $latex \hat{y}_{t+h}$ is the $latex h$-week-ahead forecast and $latex l_t$ is the level: a weighted average of the sales seen so far, with older records having an exponentially decreasing impact on the predictions.

The forecasting for a few weeks ahead is shown here in orange:

The best fit involves finding the pair $latex (\alpha, l_0)$ that minimizes the sum of squared errors. This technique is also commonly referred to as single exponential smoothing or flat forecasting.
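A crude way to find that best fit is a grid search over candidate values of $latex \alpha$ and the initial level $latex l_0$, scoring each pair by its sum of squared one-step-ahead errors; a production fit would use a numerical optimizer instead, and the candidate grid below is an illustrative assumption:

```python
def sse(sales, alpha, level0):
    """Sum of squared one-step-ahead forecast errors for (alpha, l0)."""
    level, total = level0, 0.0
    for y in sales:
        total += (y - level) ** 2              # error of this week's forecast
        level = alpha * y + (1 - alpha) * level
    return total

def fit_ses(sales):
    """Grid search for the (alpha, l0) pair minimizing the SSE."""
    candidates = [
        (a / 100, l0)
        for a in range(101)                    # alpha in 0.00 .. 1.00
        for l0 in (min(sales), sum(sales) / len(sales), max(sales))
    ]
    return min(candidates, key=lambda p: sse(sales, *p))
```

Restricting $latex l_0$ to the minimum, mean, and maximum of the series keeps the search small; a finer grid or an optimizer would refine both parameters.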

T2: Double Exponential Smoothing

Here is another graph of weekly sales of a different product, canned corn, at a store. Here, there is a visible trend pattern, whose forecast can be better represented using the following compound formula:

$latex \hat{y}_{t+h} = l_t + h b_t$

$latex l_t = \alpha y_t + (1 - \alpha)(l_{t-1} + b_{t-1})$

$latex b_t = \beta (l_t - l_{t-1}) + (1 - \beta) b_{t-1}$

; where $latex \hat{y}_{t+h}$ is the $latex h$-week-ahead sales forecast, $latex l_t$ is the level of the series at time $latex t$, $latex b_t$ is the approximation of the trend at time $latex t$, and $latex 0 \le \alpha, \beta \le 1$ are the smoothing parameters for the level and trend components.

Here, the assumption is that the forecast will keep increasing linearly. Typically, these types of forecasts tend to over-predict. Dampening is a technique that can be used to mitigate this effect, as follows:

$latex \hat{y}_{t+h} = l_t + (\phi + \phi^2 + \cdots + \phi^h) b_t$

$latex l_t = \alpha y_t + (1 - \alpha)(l_{t-1} + \phi b_{t-1})$

$latex b_t = \beta (l_t - l_{t-1}) + (1 - \beta) \phi b_{t-1}$

; where $latex 0 \le \phi \le 1$ is the dampening parameter that reduces the effect of the linear increase.

The forecasting for a few weeks ahead is shown here in orange (no dampening) and grey (with dampening).

Again, the best fit involves finding a set of values for $latex (\alpha, \beta, \phi, l_0, b_0)$ that minimizes the sum of squared errors. This technique is also commonly referred to as Holt’s Linear Method.
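Under these definitions, Holt’s method with optional dampening can be sketched as follows; seeding the level and trend from the first two observations is a simplifying assumption, and fitted parameter values would normally come from minimizing the SSE:

```python
def holt_forecast(sales, h, alpha, beta, phi=1.0):
    """Holt's linear method, h weeks ahead; phi < 1 applies dampening.

    phi = 1.0 reproduces the undamped forecast level + h*trend.
    """
    level, trend = sales[0], sales[1] - sales[0]   # simple initialization
    for y in sales[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (prev_level + phi * trend)
        trend = beta * (level - prev_level) + (1 - beta) * phi * trend
    # forecast: level + (phi + phi^2 + ... + phi^h) * trend
    return level + sum(phi ** i for i in range(1, h + 1)) * trend
```

On a perfectly linear series the undamped forecast simply extends the line, while any `phi < 1` pulls the prediction below it, which is the over-prediction mitigation described above.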

T3: Triple Exponential Smoothing

Here is another graph of weekly sales trends for a particular item at a specific store. Here, we can see not only a visible trend over time but also seasonal behavior. Forecasting sales for items that behave in this manner can be better represented by extending the previous technique to capture seasonality effects (one forecast equation and three smoothing equations: level, trend, and seasonality).

There are two ways in which the seasonality component can be captured: additive and multiplicative. The former works better when the seasonal variations are approximately constant, while the latter works when the seasonal variations change in proportion to the level of the series (in other words, the seasonal variations grow or decline with the trend).

Let us look at the compound form for additive equations first:

$latex \hat{y}_{t+h} = l_t + h b_t + s_{t+h-m(k+1)}$

$latex l_t = \alpha (y_t - s_{t-m}) + (1 - \alpha)(l_{t-1} + b_{t-1})$

$latex b_t = \beta (l_t - l_{t-1}) + (1 - \beta) b_{t-1}$

$latex s_t = \gamma (y_t - l_{t-1} - b_{t-1}) + (1 - \gamma) s_{t-m}$

; where $latex \hat{y}_{t+h}$ is the $latex h$-week-ahead sales forecast and $latex m$ is the frequency of the seasonality pattern; for example, if the pattern repeats every 4 weeks, then $latex m = 4$. $latex k$ is the integer part of $latex (h - 1)/m$, which ensures that only the latest seasonality estimates are used when deriving the forecast. $latex l_t$ is the level of the series at time $latex t$: a weighted average of the seasonally adjusted observation and the non-seasonal forecast. $latex b_t$ is the approximation of the trend at time $latex t$, the same as in double exponential smoothing. $latex s_t$ is the approximation of the seasonality component at time $latex t$: a weighted average of the current seasonal index and the seasonal index from $latex m$ periods ago. The smoothing parameters for these three equations satisfy $latex 0 \le \alpha, \beta, \gamma \le 1$.

The multiplicative form, with the same definitions of the variables, looks like this:

$latex \hat{y}_{t+h} = (l_t + h b_t) s_{t+h-m(k+1)}$

$latex l_t = \alpha \frac{y_t}{s_{t-m}} + (1 - \alpha)(l_{t-1} + b_{t-1})$

$latex b_t = \beta (l_t - l_{t-1}) + (1 - \beta) b_{t-1}$

$latex s_t = \gamma \frac{y_t}{l_{t-1} + b_{t-1}} + (1 - \gamma) s_{t-m}$
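Both the additive and multiplicative forms can be sketched in plain Python. Initializing the level and trend from the first two seasons and the seasonal indices from the first season is a simplifying assumption (so the series must cover at least two full seasons), and the 4-week toy series below is hypothetical:

```python
def holt_winters(sales, h, m, alpha, beta, gamma, multiplicative=False):
    """Triple exponential smoothing (Holt-Winters), h weeks ahead.

    m is the seasonal frequency (m = 4 for a pattern repeating every
    4 weeks). Requires at least two full seasons of history.
    """
    level = sum(sales[:m]) / m                       # mean of first season
    trend = (sum(sales[m:2 * m]) - sum(sales[:m])) / (m * m)
    if multiplicative:
        season = [sales[i] / level for i in range(m)]
    else:
        season = [sales[i] - level for i in range(m)]
    for t in range(m, len(sales)):
        y, s_prev, l_prev = sales[t], season[t % m], level
        if multiplicative:
            level = alpha * (y / s_prev) + (1 - alpha) * (l_prev + trend)
            season[t % m] = gamma * (y / (l_prev + trend)) + (1 - gamma) * s_prev
        else:
            level = alpha * (y - s_prev) + (1 - alpha) * (l_prev + trend)
            season[t % m] = gamma * (y - l_prev - trend) + (1 - gamma) * s_prev
        trend = beta * (level - l_prev) + (1 - beta) * trend
    s = season[(len(sales) + h - 1) % m]             # latest matching seasonal index
    return (level + h * trend) * s if multiplicative else level + h * trend + s

# Hypothetical series with a 4-week seasonal cycle and no trend
demo = [10, 20, 30, 40] * 3
print(holt_winters(demo, 1, 4, 0.2, 0.1, 0.3))
print(holt_winters(demo, 1, 4, 0.2, 0.1, 0.3, multiplicative=True))
```

Indexing the seasonal list by `t % m` keeps only the most recent index for each position in the cycle, which is exactly the role of the $latex s_{t+h-m(k+1)}$ term in the forecast equation.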

Final Thoughts

Short-term forecasting techniques are handy when making forecasts from a limited amount of data. When using these techniques, it is important to ensure that the problem setup is well thought out in terms of granularity, forecasting period, handling of data volatility, and so on. It is equally important to use the right technique: even when trend and seasonality are both theoretically evident, a shorter forecasting duration may not show the seasonality patterns, in which case it is best to select double exponential smoothing over triple. The results of these techniques can also serve as a good baseline when comparing against other models. Of course, if there is sufficient data, there are plenty of other techniques, such as ARIMA (whose linear models generalize the exponential smoothing family) or regression, that may provide better forecasting outcomes.

OCTAVE, the John Keells Group Centre of Excellence for Data and Advanced Analytics, is the cornerstone of the Group’s data-driven decision making.