\(\definecolor{first}{RGB}{0, 255, 221}\) \(\definecolor{second}{RGB}{181, 181, 98}\) \(\definecolor{third}{RGB}{18,110,213}\) \(\definecolor{fourth}{RGB}{114,0,172}\) \(\definecolor{5}{RGB}{45,177,93}\) \(\definecolor{6}{RGB}{251,0,29}\) \(\definecolor{7}{RGB}{255, 127, 0}\) \(\definecolor{8}{RGB}{255,0,255}\) \(\definecolor{9}{RGB}{0,255,0}\) \(\definecolor{10}{RGB}{255,0,0}\) \(\definecolor{normal}{RGB}{0,0,0}\)

An exponential moving average (EMA), also called an exponentially weighted moving average (EWMA), is a type of moving average where the weighting applied to generate the average decays exponentially the farther back in time you go. This is contrasted with a simple moving average (SMA) which always has a finite length but weights all the points in the series within that length equally. Due to the exponential weight decay of an EMA it does not necessitate a finite length as with an SMA. Because of this it can be a bit more efficient than an SMA when calculating the EMA on large time series data since each new data point can recursively apply the EMA from the previous data point’s EMA; contrast this with an SMA where each data point either must step through several previous points in the series to calculate the sum at each new point, or, must store the sum at each previous point in the series which is calculated before dividing by its length. Therefore if the sums are cached to improve computational efficiency there is a hit to space efficiency, if not then the computational efficiency is impaired instead.

However the efficiency advantage of an EMA only really applies if you are sequentially calculating the EMA one time through from some selected starting point in the time series to an end point in the series. This isnt always a viable use case, particularly if the time series data is extremely large, or even infinite, and an EMA must be calculate at arbitrary points or across small sub-sets of the data at any one time. For this reason it is common to use an EMA that has a finite length similar to an SMA, applied to past points for a finite time into the past. Because EMA weights exponentially decay, so long as the length is sufficiently large, it should closely approximate the usual case where the EMA length is infinite, or at least, goes back to the beginning of the dateset. However, like an SMA, when using a finite length there is no need to iterate across the length of past data points to calculate the EMA at each point if you cache previously calculated values. By storing the sum at each previous data point we can make this process efficient by only referencing the sum associated with the last data point and calculating our new EMA from there. Because the sum for an EMA is the same as the EMA itself, that is, there is no need to divide by the length as is the case with an SMA, we are able to calculate an EMA with a finite length with greater space efficiency than we would an SMA, only storing the EMA calculated at each point and no need to store an additional sum variable.

In this post I will discuss how to calculate the EMA value with a finite length efficiently without the need for iterating over past data points and how we derive the math to accomplish this.

Calculating EMA with Finite Length

First lets take a look at the equation used to calculate the EMA for any point. This equation assumes a starting point of \(V_0\); when dealing with an EMA of fixed length then our \(V_0\) would be the point in the time series that is the number of steps behind the current data point in the sequence by length.

$$ S_t = \begin{cases} V_0, & t = 0 \\ \alpha V_t + (1 - \alpha) \cdot S_{t-1}, & t > 0 \end{cases} $$

    \(\alpha\) is the weighting coefficient. It is a value between 0 and 1,
    \(t\) the time where 0 is the start time,
    \(V_t\) the value, \(V\), at time \(t\),
    \(S_t\) the sum at time \(t\), this is the same as the EMA at time \(t\).

Example of Length 5

Lets run through all the math we need to accomplish our end goal for a fixed length of 5, then we can hopefully pick out the patterns and generalized this to any length value.

Say we have some series values in an array as follows.

$$ V = [V_0, V_1, V_2, V_3, V_4] $$

Now if we take a EMA of the above series then each point in the series will have a corresponding EMA value associated with it, namely \(S_t\). The first value being trivial and is as follow.

$$ S_0 = V_0 $$

At this point each following EMA is a recursive relationship to the previous, back to, at most, the length parameter of an EMA or the first value in the series, whatever comes first. For simplicity lets presume we have an EMA length of 5. Now lets calculate the EMA at \(t=1\).

$$ S_1 = \left(1 - \alpha\right) \cdot S_0 + \alpha V_1 $$
$$ S_1 = \left(1 - \alpha\right) \cdot \underbrace{V_0}_{S_0} + \alpha V_1 $$

We can now take this process and apply it to the other elements in the series.

$$ S_2 = \left(1 - \alpha\right) \cdot S_1 + \alpha V_2 $$
$$ S_2 = \left(1 - \alpha\right) \cdot \overbrace{\left(\left(1 - \alpha\right) \cdot \underbrace{V_0}_{S_0} + \alpha V_1\right)}^{S_1} + \alpha V_2 $$
$$ S_3 = \left(1 - \alpha\right) \cdot S_2 + \alpha V_3 $$
$$ S_3 = \left(1 - \alpha\right) \cdot \underbrace{\left(\left(1 - \alpha\right) \cdot \overbrace{\left(\left(1 - \alpha\right) \cdot \underbrace{V_0}_{S_0} + \alpha V_1\right)}^{S_1} + \alpha V_2\right)}_{S_2} + \alpha V_3 $$
$$ S_4 = \left(1 - \alpha\right) \cdot S_3 + \alpha V_4 $$
$$ S_4 = \left(1 - \alpha\right) \cdot \overbrace{\left(\left(1 - \alpha\right) \cdot \underbrace{\left(\left(1 - \alpha\right) \cdot \overbrace{\left(\left(1 - \alpha\right) \cdot \underbrace{V_0}_{S_0} + \alpha V_1\right)}^{S_1} + \alpha V_2\right)}_{S_2} + \alpha V_3\right)}^{S_3} + \alpha V_4 $$

The problem here is that if we start the process over and repeat it for every point it is horribly inefficient. With a length of 5 as seen here this entire process needs to be repeated for every point. If we have a array, \(V\) of sufficient length, and a length parameter that is of any sizable length this process becomes particularly inefficient. Remember that if there was a 6th point here after \(v_5\) we would have to start over and start at \(V_1\) rather than \(V_0\) since the length is 5 and thus would not go back to the 0 index for subsequent values in the series.

One way to make this more efficient is by figuring out a way of dropping \(V_0\) from the previously calculated sum before we append the next value to it. For a SMA this would be trivial subtraction but for an EMA it is a bit more complex. To see how we accomplish that we need to first expand the series above and simplify it, once we do this some patterns become evident.

$$ S_4 = \left(1 - \alpha\right) \cdot \left(\left(1 - \alpha\right) \cdot \left(\left(1 - \alpha\right) \cdot \color{first} \left(\left(1 - \alpha\right) \cdot V_0 + \alpha V_1\right) \color{normal} + \alpha V_2\right) + \alpha V_3\right) + \alpha V_4 $$
$$ S_4 = \left(1 - \alpha\right) \cdot \left(\left(1 - \alpha\right) \cdot \left(\left(1 - \alpha\right) \cdot \color{first} \left(V_0 - \alpha V_0 + \alpha V_1\right) \color{normal} + \alpha V_2\right) + \alpha V_3\right) + \alpha V_4 $$
$$ S_4 = \left(1 - \alpha\right) \cdot \left(\left(1 - \alpha\right) \cdot \color{second} \left(\left(1 - \alpha\right) \cdot \left(V_0 - \alpha V_0 + \alpha V_1\right) + \alpha V_2\right) \color{normal} + \alpha V_3\right) + \alpha V_4 $$
$$ S_4 = \left(1 - \alpha\right) \cdot \left(\left(1 - \alpha\right) \cdot \color{second} \left(V_0 - \alpha V_0 - \alpha V_0 + {\alpha}^2 V_0 + \alpha V_1 - {\alpha}^2 V_1 + \alpha V_2\right) \color{normal} + \alpha V_3\right) + \alpha V_4 $$
$$ S_4 = \left(1 - \alpha\right) \cdot \left(\left(1 - \alpha\right) \cdot \color{second} \left(\left(1 - \alpha - \alpha + {\alpha}^2\right) V_0 + \left(\alpha - {\alpha}^2\right) V_1 + \alpha V_2\right) \color{normal} + \alpha V_3\right) + \alpha V_4 $$
$$ S_4 = \left(1 - \alpha\right) \cdot \color{third} \left(\left(1 - \alpha\right) \cdot \left(\left(1 - 2\alpha + {\alpha}^2\right) V_0 + \left(\alpha - {\alpha}^2\right) V_1 + \alpha V_2\right) + \alpha V_3\right) \color{normal} + \alpha V_4 $$
$$ S_4 = \left(1 - \alpha\right) \cdot \color{third} \left(\left(1 - 3\alpha + 3 {\alpha}^2 - {\alpha}^3\right) V_0 + \left(\alpha - 2 {\alpha}^2 + {\alpha}^3\right) V_1 + \left(\alpha - {\alpha}^2\right) V_2 + \alpha V_3\right) \color{normal} + \alpha V_4 $$

finally after simplifying the last step we arrive at our equation for a EMA of length 5.

$$ \begin{split} S_4 = \\ &\color{5}\left(1 - 4\alpha + 6{\alpha}^2 - 4{\alpha}^3 + {\alpha}^4\right)\color{normal} V_0 + \\ &\color{6}\left(\alpha - 3{\alpha}^2 + 3{\alpha}^3 - {\alpha}^4\right)\color{normal} V_1 + \\ &\color{7}\left(\alpha - 2{\alpha}^2 + {\alpha}^3\right)\color{normal} V_2 + \\ &\color{8}\left(\alpha - {\alpha}^2\right)\color{normal} V_3 + \\ &\color{9}\alpha\color{normal} V_4 \end{split} $$

Since all of our coefficients for \(V_t\) are effectively constants lets simplify that and see what it looks like.

$$ S_4 = \color{5}{\left(1 - \alpha\right)}^4\color{normal} V_0 + \color{6}\alpha{\left(1 - \alpha\right)}^3\color{normal} V_1 + \color{7}\alpha{\left(1 - \alpha\right)}^2\color{normal} V_2 + \color{8}\alpha{\left(1 - \alpha\right)}^1\color{normal} V_3 + \color{9}\alpha {\left(1 - \alpha\right)}^0\color{normal} V_4 $$

At this point the pattern becomes clear. what we effectively did was we took a somewhat confusing recursive equation and unrolled it into a flat structure. In fact we are now treating it as a much simpler weighted average where each element of \(V\) is weight by the coefficients marked in color in the above equation. Another important property to notice is how this equation evolves for each successive iteration of our EMA equation. Presuming our EMA had a length of 6 or longer instead then on the next element, \(V_5\) we would simply increment the exponent of each of our existing coefficients by \(1\) and then tack on to the end with addition the \(\alpha{\left(1 - \alpha\right)}^0 V_5\) and we would be good. However thats the easy scenario to handle, what we really want to do is handle the next iteration when length remains \(5\); that means dropping the \(V_0\) argument before adding on the \(V_5\) argument, thus reuse the sum and not needing to regenerate it. For clarity lets look at what our equation needs to look like if we wish to drop \(V_0\) but before we actually apply \(V_5\).

$$ {S_4}' = \color{6}{\left(1 - \alpha\right)}^3\color{normal} V_1 + \color{7}\alpha{\left(1 - \alpha\right)}^2\color{normal} V_2 + \color{8}\alpha{\left(1 - \alpha\right)}^1\color{normal} V_3 + \color{9}\alpha {\left(1 - \alpha\right)}^0\color{normal} V_4 $$

Notice that we didnt just drop the first \(v_0\) term but we also had to drop the leading \(\alpha\) from the\(V_1\) term. this is effectively what \(S_4\) would have looked like if it had a length of \(4\) instead of \(5\). the reason the first term always lacks the leading \(\alpha\) is due to the starting condition specified in the EMA equation. If we can get the equation in this form then applying the next iteration of the EMA equation would bring the \(S_5\) length back to \(5\) again, so we know we are on the right track. What we have to figure out now is how can we go from \(S_4\) to \({S_4}’\).

Dropping the first term itself is trivial, thats just simple subtraction.

$$ S_4 - {\left(1 - \alpha\right)}^4\color{normal} V_0 $$
$$ \color{10}\alpha\color{normal}{\left(1 - \alpha\right)}^3 V_1 + \alpha{\left(1 - \alpha\right)}^2 V_2 + \alpha{\left(1 - \alpha\right)}^1 V_3 + \alpha {\left(1 - \alpha\right)}^0 V_4 $$

Easy enough, but now we need to get rid of that leading \(\alpha\), so how do we do that?

We know that if we add another \(V_1\) term with some coefficient which we will call \(X\) then we can combine this with the existing \(V_1\) term and if we select our \(X\) carefully we can use it to cancel out the leading alpha. That would look something like this.

$$ \alpha{\left(1 - \alpha\right)}^3 V_1 + X V_1 + ... = {\left(1 - \alpha\right)}^3 V_1 + ... $$

So all we have to do is set that up as an equation and solve for X and we will quickly see what value of X will satisfy our requirement.

$$ \alpha{\left(1 - \alpha\right)}^3 + X = {\left(1 - \alpha\right)}^3 $$
$$ X = {\left(1 - \alpha\right)}^3 - \alpha{\left(1 - \alpha\right)}^3 $$
$$ X = \left(1 - \alpha\right) {\left(1 - \alpha\right)}^3 $$
$$ X = {\left(1 - \alpha\right)}^4 $$

It is always nice when equations play along with one’s expectations and give us nice clean terms and patterns start to emerge. Now that we have the value of X we know how we can get from \(S_4\) to \({S_4}’\).

$$ {S_4}' = S_4 - {\left(1 - \alpha\right)}^4 V_0 + {\left(1 - \alpha\right)}^4 V_1 $$

Finally now that we have \({S_4}’\) we simply apply the original EMA equation to that value in order to produce our value for \(S_5\) when EMA length is 5.

$$ S_5 = \left(1 - \alpha\right) \cdot {S_4}' + \alpha V_5 $$

From here on out the pattern holds for successive values as well, we can calculate each successive EMA value at any point without recursively iterating through the previous values but rather by modifying the previous value’s EMA in the series.

Generalization of Length N

At this point the pattern is pretty obvious, I hope. Lets sum things up real fast by generalizing what we learned above and rewrite our EMA equation for a fixed length of \(N\). When we want to calculate the EMA for a point that is more than N (length) away from the start of the time series then we have the following generalization from the above.

$$ S_t = \alpha V_t + \left(1 - \alpha\right) \cdot \left(S_{t-1} - {\left(1 - \alpha\right)}^N V_{t-N} + {\left(1 - \alpha\right)}^N V_{t-N+1}\right) $$

All we have to do now is simplify it and then handle the edge cases near the beginning of the time series and we have a complete solution. Lets start by simplifying.

$$ S_t = \left(1 - \alpha\right)^{N+1}\left(V_{t-N+1} - V_{t-N}\right) - \alpha S_{t-1} + S_{t-1} + \alpha V_t $$

Now lets just explicitly define the edge cases and we have our final equation.

$$ S_t = \begin{cases} V_0, & t = 0 \\ \alpha V_t + \left(1 - \alpha\right) \cdot S_{t-1}, & 0 < t < N \\ \left(1 - \alpha\right)^{N+1}\left(V_{t-N+1} - V_{t-N}\right) - \alpha S_{t-1} + S_{t-1} + \alpha V_t, & t \geq N \end{cases} $$

    \(N\) is the length,
    \(\alpha\) is the weighting coefficient. It is a value between 0 and 1,
    \(t\) the time where 0 is the start time,
    \(V_t\) the value, \(V\), at time \(t\),
    \(S_t\) the sum at time \(t\), this is the same as the EMA at time \(t\).