## Jeffrey Phillips Freeman discussing Bioalgorithms, Machine Learning, Computer Science, Electrical Engineering, HAM Radio and everything science.

An exponential moving average (EMA), also called an exponentially weighted moving average (EWMA), is a type of moving average in which the weight applied to each point decays exponentially the farther back in time you go. This contrasts with a simple moving average (SMA), which always has a finite length but weights every point within that length equally. Because of its exponential weight decay, an EMA does not require a finite length the way an SMA does. This can make it more efficient than an SMA on large time series, since each new point's EMA can be computed recursively from the previous point's EMA. An SMA, by contrast, must either step through several previous points in the series to recompute the sum at each new point, or store the running sum at each point before dividing by the length. So if the sums are cached to improve computational efficiency, space efficiency takes a hit; if not, computational efficiency suffers instead.

However, the efficiency advantage of an EMA only really applies if you are calculating it sequentially, in one pass from some chosen starting point in the time series to an end point. This isn't always viable, particularly when the time series is extremely large, or even infinite, and the EMA must be calculated at arbitrary points or across small subsets of the data at any one time. For this reason it is common to give an EMA a finite length, similar to an SMA, applying it only to a finite window of past points. Because EMA weights decay exponentially, as long as the length is sufficiently large this closely approximates the usual case where the EMA's length is infinite, or at least reaches back to the beginning of the dataset. As with an SMA, when using a finite length there is no need to iterate across the whole window at each point if previously calculated values are cached: by storing the value at each previous data point, we only need to reference the value associated with the last data point and calculate our new EMA from there. And because the sum for an EMA is the EMA itself (there is no division by the length as with an SMA), a finite-length EMA can be calculated with better space efficiency than an SMA: we store only the EMA at each point, with no additional sum variable.

In this post I will discuss how to calculate a finite-length EMA efficiently, without iterating over past data points, and derive the math that makes this possible.

# Calculating EMA with Finite Length

First, let's take a look at the equation used to calculate the EMA at any point. This equation assumes a starting point of $$V_0$$; when dealing with an EMA of fixed length, $$V_0$$ is the point in the time series that sits the length's number of steps behind the current data point in the sequence.

$$\definecolor{first}{RGB}{0, 255, 221} \definecolor{second}{RGB}{181, 181, 98} \definecolor{third}{RGB}{18,110,213} \definecolor{fourth}{RGB}{114,0,172} \definecolor{5}{RGB}{45,177,93} \definecolor{6}{RGB}{251,0,29} \definecolor{7}{RGB}{255, 127, 0} \definecolor{8}{RGB}{255,0,255} \definecolor{9}{RGB}{0,255,0} \definecolor{10}{RGB}{255,0,0} \definecolor{normal}{RGB}{0,0,0} S_t = \begin{cases} V_0, & t = 0 \\ \alpha V_t + (1 - \alpha) \cdot S_{t-1}, & t > 0 \end{cases}$$

Where
$$\alpha$$ is the weighting coefficient, a value between 0 and 1,
$$t$$ is the time, where 0 is the start time,
$$V_t$$ is the value, $$V$$, at time $$t$$,
$$S_t$$ is the sum at time $$t$$; this is the same as the EMA at time $$t$$.
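To make the recursion concrete, here is a minimal Python sketch of this equation computed in a single pass (the function and variable names are my own, not from any particular library):

```python
def ema(values, alpha):
    """Exponential moving average over a whole series.

    Implements S_0 = V_0 and S_t = alpha * V_t + (1 - alpha) * S_{t-1},
    returning the list of S_t for every t.
    """
    sums = []
    for t, v in enumerate(values):
        if t == 0:
            sums.append(v)  # starting condition: S_0 = V_0
        else:
            # each new EMA needs only the previous EMA, so the whole
            # series is processed in one O(n) pass
            sums.append(alpha * v + (1 - alpha) * sums[-1])
    return sums
```

Note how each step references only `sums[-1]`; this is the recursive efficiency described above.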

## Example of Length 5

Let's run through all the math needed to accomplish our end goal for a fixed length of 5; then we can pick out the patterns and generalize to any length.

Say we have a series of values in an array as follows.

$$V = [V_0, V_1, V_2, V_3, V_4]$$

Now if we take an EMA of the above series, each point in the series will have a corresponding EMA value associated with it, namely $$S_t$$. The first value is trivial and is as follows.

$$S_0 = V_0$$

From this point on, each EMA is defined recursively in terms of the previous one, reaching back, at most, to either the EMA's length parameter or the first value in the series, whichever comes first. For simplicity let's presume an EMA length of 5. Now let's calculate the EMA at $$t=1$$.

$$S_1 = \left(1 - \alpha\right) \cdot S_0 + \alpha V_1$$
$$S_1 = \left(1 - \alpha\right) \cdot \underbrace{V_0}_{S_0} + \alpha V_1$$

We can now take this process and apply it to the other elements in the series.

$$S_2 = \left(1 - \alpha\right) \cdot S_1 + \alpha V_2$$
$$S_2 = \left(1 - \alpha\right) \cdot \overbrace{\left(\left(1 - \alpha\right) \cdot \underbrace{V_0}_{S_0} + \alpha V_1\right)}^{S_1} + \alpha V_2$$
$$S_3 = \left(1 - \alpha\right) \cdot S_2 + \alpha V_3$$
$$S_3 = \left(1 - \alpha\right) \cdot \underbrace{\left(\left(1 - \alpha\right) \cdot \overbrace{\left(\left(1 - \alpha\right) \cdot \underbrace{V_0}_{S_0} + \alpha V_1\right)}^{S_1} + \alpha V_2\right)}_{S_2} + \alpha V_3$$
$$S_4 = \left(1 - \alpha\right) \cdot S_3 + \alpha V_4$$
$$S_4 = \left(1 - \alpha\right) \cdot \overbrace{\left(\left(1 - \alpha\right) \cdot \underbrace{\left(\left(1 - \alpha\right) \cdot \overbrace{\left(\left(1 - \alpha\right) \cdot \underbrace{V_0}_{S_0} + \alpha V_1\right)}^{S_1} + \alpha V_2\right)}_{S_2} + \alpha V_3\right)}^{S_3} + \alpha V_4$$

The problem is that restarting this process and repeating it for every point is horribly inefficient. With a length of 5, as seen here, this entire computation must be repeated for every point. If we have an array $$V$$ of sufficient length, and a length parameter of any substantial size, the process becomes particularly costly. Remember that if there were a 6th point, $$V_5$$, we would have to start over at $$V_1$$ rather than $$V_0$$, since the length is 5 and the window would no longer reach back to index 0 for subsequent values in the series.
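The inefficiency is easy to see in code. Here is a sketch of the naive finite-length approach (naming is my own): the recursion restarts at the front of the window for every output point, costing O(N) work per point and O(n · N) overall.

```python
def ema_fixed_naive(values, alpha, length):
    """Finite-length EMA computed the slow way: the recursion is
    restarted from scratch inside every window."""
    out = []
    for t in range(len(values)):
        start = max(0, t - length + 1)  # window of at most `length` points
        s = values[start]               # S_0 = V_0 for this window
        for v in values[start + 1:t + 1]:
            s = alpha * v + (1 - alpha) * s
        out.append(s)
    return out
```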

One way to make this more efficient is to find a way of dropping $$V_0$$ from the previously calculated sum before we append the next value to it. For an SMA this would be trivial subtraction, but for an EMA it is a bit more complex. To see how to accomplish that, we first need to expand the series above and simplify it; once we do, some patterns become evident.

$$S_4 = \left(1 - \alpha\right) \cdot \left(\left(1 - \alpha\right) \cdot \left(\left(1 - \alpha\right) \cdot \color{first} \left(\left(1 - \alpha\right) \cdot V_0 + \alpha V_1\right) \color{normal} + \alpha V_2\right) + \alpha V_3\right) + \alpha V_4$$
$$S_4 = \left(1 - \alpha\right) \cdot \left(\left(1 - \alpha\right) \cdot \left(\left(1 - \alpha\right) \cdot \color{first} \left(V_0 - \alpha V_0 + \alpha V_1\right) \color{normal} + \alpha V_2\right) + \alpha V_3\right) + \alpha V_4$$
$$S_4 = \left(1 - \alpha\right) \cdot \left(\left(1 - \alpha\right) \cdot \color{second} \left(\left(1 - \alpha\right) \cdot \left(V_0 - \alpha V_0 + \alpha V_1\right) + \alpha V_2\right) \color{normal} + \alpha V_3\right) + \alpha V_4$$
$$S_4 = \left(1 - \alpha\right) \cdot \left(\left(1 - \alpha\right) \cdot \color{second} \left(V_0 - \alpha V_0 - \alpha V_0 + {\alpha}^2 V_0 + \alpha V_1 - {\alpha}^2 V_1 + \alpha V_2\right) \color{normal} + \alpha V_3\right) + \alpha V_4$$
$$S_4 = \left(1 - \alpha\right) \cdot \left(\left(1 - \alpha\right) \cdot \color{second} \left(\left(1 - \alpha - \alpha + {\alpha}^2\right) V_0 + \left(\alpha - {\alpha}^2\right) V_1 + \alpha V_2\right) \color{normal} + \alpha V_3\right) + \alpha V_4$$
$$S_4 = \left(1 - \alpha\right) \cdot \color{third} \left(\left(1 - \alpha\right) \cdot \left(\left(1 - 2\alpha + {\alpha}^2\right) V_0 + \left(\alpha - {\alpha}^2\right) V_1 + \alpha V_2\right) + \alpha V_3\right) \color{normal} + \alpha V_4$$
$$S_4 = \left(1 - \alpha\right) \cdot \color{third} \left(\left(1 - 3\alpha + 3 {\alpha}^2 - {\alpha}^3\right) V_0 + \left(\alpha - 2 {\alpha}^2 + {\alpha}^3\right) V_1 + \left(\alpha - {\alpha}^2\right) V_2 + \alpha V_3\right) \color{normal} + \alpha V_4$$

Finally, after simplifying the last step, we arrive at our equation for an EMA of length 5.

$$\begin{split} S_4 = \\ &\color{5}\left(1 - 4\alpha + 6{\alpha}^2 - 4{\alpha}^3 + {\alpha}^4\right)\color{normal} V_0 + \\ &\color{6}\left(\alpha - 3{\alpha}^2 + 3{\alpha}^3 - {\alpha}^4\right)\color{normal} V_1 + \\ &\color{7}\left(\alpha - 2{\alpha}^2 + {\alpha}^3\right)\color{normal} V_2 + \\ &\color{8}\left(\alpha - {\alpha}^2\right)\color{normal} V_3 + \\ &\color{9}\alpha\color{normal} V_4 \end{split}$$

Since all of the coefficients on $$V_t$$ are effectively constants, let's simplify them and see what the result looks like.

$$S_4 = \color{5}{\left(1 - \alpha\right)}^4\color{normal} V_0 + \color{6}\alpha{\left(1 - \alpha\right)}^3\color{normal} V_1 + \color{7}\alpha{\left(1 - \alpha\right)}^2\color{normal} V_2 + \color{8}\alpha{\left(1 - \alpha\right)}^1\color{normal} V_3 + \color{9}\alpha {\left(1 - \alpha\right)}^0\color{normal} V_4$$
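This unrolled weighted-average form is easy to verify numerically against the recursive definition. A throwaway check for the length-5 case (the values and $$\alpha$$ here are arbitrary choices of mine):

```python
alpha = 0.3
V = [2.0, 4.0, 1.0, 3.0, 5.0]

# Recursive definition: S_0 = V_0, S_t = alpha*V_t + (1-alpha)*S_{t-1}
s = V[0]
for v in V[1:]:
    s = alpha * v + (1 - alpha) * s

# Unrolled weighted-average form: V_0 weighted by (1-alpha)^4,
# V_k (k > 0) weighted by alpha*(1-alpha)^(4-k)
unrolled = (1 - alpha) ** 4 * V[0] + sum(
    alpha * (1 - alpha) ** (4 - k) * V[k] for k in range(1, 5)
)

assert abs(s - unrolled) < 1e-12  # both forms agree
```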

At this point the pattern becomes clear: we took a somewhat confusing recursive equation and unrolled it into a flat structure. We are now treating it as a much simpler weighted average, where each element of $$V$$ is weighted by the coefficient marked in color in the equation above. Another important property to notice is how this equation evolves with each successive iteration of the EMA equation. If our EMA had a length of 6 or longer, then on the next element, $$V_5$$, we would simply increment the exponent of each existing coefficient by $$1$$ and append the term $$\alpha{\left(1 - \alpha\right)}^0 V_5$$, and we would be done. However, that is the easy scenario to handle; what we really want is to handle the next iteration when the length remains $$5$$. That means dropping the $$V_0$$ term before adding on the $$V_5$$ term, thus reusing the sum rather than regenerating it. For clarity, let's look at what our equation needs to look like if we drop $$V_0$$ but have not yet applied $$V_5$$.

$${S_4}' = \color{6}{\left(1 - \alpha\right)}^3\color{normal} V_1 + \color{7}\alpha{\left(1 - \alpha\right)}^2\color{normal} V_2 + \color{8}\alpha{\left(1 - \alpha\right)}^1\color{normal} V_3 + \color{9}\alpha {\left(1 - \alpha\right)}^0\color{normal} V_4$$

Notice that we didn't just drop the $$V_0$$ term; we also had to drop the leading $$\alpha$$ from the $$V_1$$ term. This is effectively what $$S_4$$ would have looked like with a length of $$4$$ instead of $$5$$. The reason the first term always lacks the leading $$\alpha$$ is the starting condition specified in the EMA equation. If we can get the equation into this form, then applying the next iteration of the EMA equation brings $$S_5$$ back to length $$5$$ again, so we know we are on the right track. What we have to figure out now is how to go from $$S_4$$ to $${S_4}'$$.

Dropping the first term itself is trivial; that's just simple subtraction.

$$S_4 - {\left(1 - \alpha\right)}^4\color{normal} V_0$$
$$\color{10}\alpha\color{normal}{\left(1 - \alpha\right)}^3 V_1 + \alpha{\left(1 - \alpha\right)}^2 V_2 + \alpha{\left(1 - \alpha\right)}^1 V_3 + \alpha {\left(1 - \alpha\right)}^0 V_4$$

Easy enough, but now we need to get rid of that leading $$\alpha$$, so how do we do that?

We know that if we add another $$V_1$$ term with some coefficient, which we will call $$X$$, we can combine it with the existing $$V_1$$ term; if we select $$X$$ carefully, we can use it to cancel out the leading $$\alpha$$. That would look something like this.

$$\alpha{\left(1 - \alpha\right)}^3 V_1 + X V_1 + ... = {\left(1 - \alpha\right)}^3 V_1 + ...$$

So all we have to do is set that up as an equation and solve for $$X$$, and we will quickly see what value satisfies our requirement.

$$\alpha{\left(1 - \alpha\right)}^3 + X = {\left(1 - \alpha\right)}^3$$
$$X = {\left(1 - \alpha\right)}^3 - \alpha{\left(1 - \alpha\right)}^3$$
$$X = \left(1 - \alpha\right) {\left(1 - \alpha\right)}^3$$
$$X = {\left(1 - \alpha\right)}^4$$
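A quick numeric sanity check of this result (any $$\alpha$$ strictly between 0 and 1 will do; the value here is my arbitrary choice):

```python
alpha = 0.3

existing = alpha * (1 - alpha) ** 3  # coefficient already on V_1
X = (1 - alpha) ** 4                 # the correction we solved for

# alpha*(1-alpha)^3 + (1-alpha)^4 == (1-alpha)^3
assert abs((existing + X) - (1 - alpha) ** 3) < 1e-12
```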

It is always nice when equations play along with one's expectations, giving us clean terms as patterns start to emerge. Now that we have the value of $$X$$, we know how to get from $$S_4$$ to $${S_4}'$$.

$${S_4}' = S_4 - {\left(1 - \alpha\right)}^4 V_0 + {\left(1 - \alpha\right)}^4 V_1$$

Finally, now that we have $${S_4}'$$, we simply apply the original EMA equation to it to produce our value of $$S_5$$ for an EMA of length 5.

$$S_5 = \left(1 - \alpha\right) \cdot {S_4}' + \alpha V_5$$

From here on the pattern holds for successive values as well: we can calculate each successive EMA value at any point without recursively iterating through the previous values, instead modifying the previous EMA value in the series.

## Generalization of Length N

At this point the pattern is, I hope, pretty obvious. Let's sum things up by generalizing what we learned above and rewriting our EMA equation for a fixed length of $$N$$. When we want to calculate the EMA at a point that is at least $$N$$ (the length) steps from the start of the time series, we have the following generalization of the above.

$$S_t = \alpha V_t + \left(1 - \alpha\right) \cdot \left(S_{t-1} - {\left(1 - \alpha\right)}^{N-1} V_{t-N} + {\left(1 - \alpha\right)}^{N-1} V_{t-N+1}\right)$$

All we have to do now is simplify it and handle the edge cases near the beginning of the time series, and we have a complete solution. Let's start by simplifying.

$$S_t = \left(1 - \alpha\right)^{N}\left(V_{t-N+1} - V_{t-N}\right) - \alpha S_{t-1} + S_{t-1} + \alpha V_t$$

Now let's explicitly define the edge cases, and we have our final equation.

$$S_t = \begin{cases} V_0, & t = 0 \\ \alpha V_t + \left(1 - \alpha\right) \cdot S_{t-1}, & 0 \lt t \lt N \\ \left(1 - \alpha\right)^{N}\left(V_{t-N+1} - V_{t-N}\right) - \alpha S_{t-1} + S_{t-1} + \alpha V_t, & t \geq N \end{cases}$$

Where
$$N$$ is the length,
$$\alpha$$ is the weighting coefficient, a value between 0 and 1,
$$t$$ is the time, where 0 is the start time,
$$V_t$$ is the value, $$V$$, at time $$t$$,
$$S_t$$ is the sum at time $$t$$; this is the same as the EMA at time $$t$$.
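Translating the three cases into code gives a finite-length EMA with constant work per point. A minimal Python sketch (function and variable names are my own):

```python
def ema_fixed(values, alpha, length):
    """Finite-length EMA with O(1) work per point.

    Cases: S_0 = V_0; the plain EMA recursion while t < N; and for
    t >= N the previous EMA is reused after dropping the oldest term.
    """
    decay = (1 - alpha) ** length  # (1 - alpha)^N
    out = []
    for t, v in enumerate(values):
        if t == 0:
            s = v
        elif t < length:
            s = alpha * v + (1 - alpha) * out[-1]
        else:
            # drop V_{t-N}'s contribution and promote V_{t-N+1} to the
            # window's leading term, then apply the ordinary recursion
            s = (alpha * v + (1 - alpha) * out[-1]
                 + decay * (values[t - length + 1] - values[t - length]))
        out.append(s)
    return out
```

For every point past the first $$N$$, this touches only the previous EMA and the two oldest values in the window, yet it matches the windowed recursion computed from scratch.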

I write a lot of technical documents, and it is important to me that they look nice. In the past I have always resorted to using LaTeX directly. It's an ugly language, hard to maintain, but it has always been the best solution. Microsoft Word completely fails at complex documents and is tedious and buggy if you try to do anything technical with it. For simpler documents without much math or special formatting I've always used Markdown.

Recently, however, I found the best of both worlds: R-flavored Markdown, which adds a tremendous amount of power to Markdown. The most important feature is that it uses pandoc to compile your Markdown text through a LaTeX template, which is then used to compile a PDF. This means I get all the power of LaTeX with all the simplicity of Markdown. That alone made it a winner for me, but it can also compile to HTML and a number of other formats, which makes it even more useful.

It also has one other very sweet bonus if you don't mind coding in R: you can put R code directly into your Markdown to produce charts or other output programmatically. R is a heavily math-oriented programming language, so it can be extremely powerful when writing math papers.

# Getting Started

If you want to get started using R-flavored Markdown to compile to LaTeX and PDF, I have provided my templates on GitHub. I also have another GitHub repository with a complete example of a Markdown document and a makefile that compiles it into a PDF. It is really simple to figure out: just clone the repository and start editing, making sure, of course, that you have R installed along with any prerequisites. Finally, I have a third GitHub repository with even more templates and examples. Some of the examples go in depth showing how you can use R inside the Markdown to generate plots, graphs, or other data. It is worth a look; there are templates for CVs/resumes, articles, PowerPoint presentations, syllabi, and more.
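If you just want to try a render without a makefile, the whole pipeline can be driven from the command line with `rmarkdown::render`. A sketch, where `paper.Rmd` is a placeholder filename (this assumes R with the rmarkdown package, pandoc, and a LaTeX distribution are installed):

```shell
# Compile an R Markdown file to PDF: knits the R chunks, runs pandoc
# against the LaTeX template, then invokes the LaTeX engine.
Rscript -e 'rmarkdown::render("paper.Rmd", output_format = "pdf_document")'
```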

R-flavored Markdown files use the extension “Rmd” and follow all the conventional Markdown syntax with some additions. All you need to do is add some front-matter to the beginning of the Markdown file, which provides hints to the LaTeX template for formatting. The front-matter looks like the following, taken directly from the example I linked; the rest of the file is just Markdown, very simple. You can see the full file with front-matter and Markdown as used in this example by clicking the link.

```yaml
---
output:
  pdf_document:
    citation_package: natbib
    keep_tex: true
    fig_caption: true
    latex_engine: pdflatex
    template: templates/syncleus-white.tex

# Required properties
title: "Conditional Probabilities and Bayes Theorem"
abstract: "A simple explanation of conditional probabilities."
author:
  name: Jeffrey Phillips Freeman
  affiliation: Syncleus
  correspondence: jeffrey.freeman@syncleus.com

# Optional Properties
other-authors:
  - name: John Doe
    affiliation: NoTech
tags: Statistics, Math
start-date: "April 6, 2017"
date: "`r format(Sys.time(), '%B %d, %Y')`"
draft: "`r format(Sys.time(), '%B %d, %Y')`"
revision: 1.1
tocdepth: 5
title-color: 0,0,90
# No indentation
indent: 0pt
parskip: 3mm
# Standard Anglo-american indentation
#indent: 15pt
#parskip: 0mm
archive: Pseudoscience Journal of America
geometry: left=2cm, right=2cm, top=2.25cm, bottom=2.25cm, headheight=13.6pt, letterpaper
bibliography: ./probabilities.bib
biblio-style: plainnat
biblio-title: References
---
```


Below you will see the output of the compiled PDF using this example; different templates give different layouts and formatting.

In the above example you can see how the date properties are generated dynamically, as the current date, using inline R. This gives you just a small taste of how you can use R directly inside Markdown files. Most of the properties above are self-explanatory, but let me provide a few descriptions for clarity all the same.

• title - The title of the document.
• abstract - A few sentences describing the document, used in the abstract box once rendered.
• author - Information about the author.
• other-authors - Optional supplemental authors.
• tags - Optional list of keywords.
• start-date - Date the author began working on the paper. This is used in the copyright information.
• date - Date the paper was released; this is also used in the copyright information.
• draft - Optional text which will appear in a draft watermark on the paper. If this property is left out, no draft watermark is added.
• revision - The revision version number of the paper, displayed in the upper right corner.
• tocdepth - The maximum depth of nested sections displayed in the table of contents.
• title-color - Three comma-separated integers from 0 to 255 representing the RGB color of the title text.
• indent - The size of the paragraph indentation.
• parskip - The vertical distance between paragraphs.
• archive - The name of the journal or archive the paper is to be published in.
• geometry - Optional parameters that override the LaTeX geometry; useful for setting custom margins.
• bibliography - Location of the natbib or BibTeX bibliography file.
• biblio-style - Optional override for the bibliography style.
• biblio-title - The title used for the bibliography section.

That's really all there is to it. One last thing: I made sure the properties I picked for my template match those expected by a Middleman blog written in Markdown. This adds the convenience that the same source file can be compiled by either Middleman or R Markdown.

# Introduction

Almost 8 years ago, on Aug 15, 2009, I invented a new game-changing algorithm called the Hyperassociative Map algorithm. It was released as part of the dANN v2.x library. The HAM algorithm, as it is often called, has since been used by countless developers and in hundreds of projects. HAM is a graph drawing algorithm similar to force-directed algorithms, but in a class all its own. Unlike other force-directed algorithms, HAM has no sense of momentum or acceleration, which makes it debatable whether it can even be called force-directed.

Below is a video demonstration of HAM in action. In this 3D visualization the vertices of the graph are displayed as grey spheres; the edges are not rendered. The graph's topology is relatively simple, containing 128 nodes in groups of 16, layered such that each group is fully connected to each adjacent group. This results in 256 edges between each pair of adjacent groups. Since the groups at either end are adjacent to only one other group, there are 1,792 edges in total. Despite this, the graph aligns quickly and smoothly on a single 1 GHz processor, as demonstrated in the video. It starts with randomized locations for each vertex and then aligns; after each alignment the graph is reset with new random starting positions to show that the same alignment is achieved every time.

What makes HAM so special is that it retains many of the advantages that have made force-directed algorithms so popular while simultaneously addressing their shortcomings. Wikipedia describes the following advantages of force-directed algorithms, all of which hold true for the HAM algorithm.

• Good-quality results - The output obtained usually has very good results based on the following criteria: uniform edge length, uniform vertex distribution and showing symmetry. This last criterion is among the most important ones and is hard to achieve with any other type of algorithm.
• Flexibility - Force-directed algorithms can be easily adapted and extended to fulfill additional aesthetic criteria. This makes them the most versatile class of graph drawing algorithms. Examples of existing extensions include the ones for directed graphs, 3D graph drawing, cluster graph drawing, constrained graph drawing, and dynamic graph drawing.
• Intuitive - Since they are based on physical analogies of common objects, like springs, the behavior of the algorithms is relatively easy to predict and understand. This is not the case with other types of graph-drawing algorithms.
• Simplicity - Typical force-directed algorithms are simple and can be implemented in a few lines of code. Other classes of graph-drawing algorithms, like the ones for orthogonal layouts, are usually much more involved.
• Interactivity - Another advantage of this class of algorithm is the interactive aspect. By drawing the intermediate stages of the graph, the user can follow how the graph evolves, seeing it unfold from a tangled mess into a good-looking configuration. In some interactive graph drawing tools, the user can pull one or more nodes out of their equilibrium state and watch them migrate back into position. This makes them a preferred choice for dynamic and online graph-drawing systems.
• Strong theoretical foundations - While simple ad-hoc force-directed algorithms often appear in the literature and in practice (because they are relatively easy to understand), more reasoned approaches are starting to gain traction. Statisticians have been solving similar problems in multidimensional scaling (MDS) since the 1930s, and physicists also have a long history of working with related n-body problems - so extremely mature approaches exist. As an example, the stress majorization approach to metric MDS can be applied to graph drawing as described above. This has been proven to converge monotonically. Monotonic convergence, the property that the algorithm will at each iteration decrease the stress or cost of the layout, is important because it guarantees that the layout will eventually reach a local minimum and stop. Damping schedules cause the algorithm to stop, but cannot guarantee that a true local minimum is reached.

However, the two disadvantages ascribed to force-directed algorithms, namely high running time and poor local minima, are corrected in the HAM algorithm. As described earlier, HAM is not a true force-directed algorithm because it lacks any sense of momentum. This is intentional: it ensures there is no need for a damping schedule to eliminate the oscillations that arise from the momentum of nodes. It has the added advantage that the algorithm does not prematurely come to rest at a local minimum, and it means fewer processing cycles are wasted modeling oscillations and vibrations throughout the network.

These properties alone make HAM a worthwhile algorithm for general study and real-world applications, but it is important to note that HAM was originally designed with a very specific use case in mind: distributing massive real-time graph processing networks. In that scenario each vertex in a graph processes some input data and produces some output data, as part of a large interdependent graph working on the data in real time. When distributing these tasks across a cluster of computers, it is critical that highly interconnected vertices reside on the same computer in the cluster, physically close to the computers housing the vertices that will ultimately receive their data, process it, and carry it through the rest of the network. HAM was created to model graphs so that each node in a compute cluster takes ownership of the tasks associated with vertices that are spatially close together in HAM's drawing of the compute graph.

In order for HAM to be successful at its job it needed to exhibit a few very specific properties. For starters, the interactivity property mentioned earlier was a must: HAM needed to work with a graph that is constantly changing its topology, with vertices added, removed, or reconfigured in real time. This is ultimately what led the algorithm to be modeled in a way similar to force-directed algorithms.

The other requirement was that the motion of the vertices had to be smooth, without oscillations, as they align. This is critical because if a vertex oscillated near the border that distinguishes one compute node from another, each crossing of that border would cause the associated task to be transferred between nodes in the cluster. Since this is an expensive operation, it was important that vertices didn't jitter across these borders excessively as HAM aligned them.

Finally, HAM needed to be parallelized and segmented: it had to scale well under multi-threading, but in such a way that each thread didn't need to be aware of the entire graph; instead each thread had to be capable of computing HAM's alignment on an isolated section of the graph. This is critical given the distributed nature of the compute graph, particularly if we want unbounded scaling. I basically wanted an algorithm that could succeed even on massively large graphs.

With almost 8 years of testing it has become evident that HAM is top in its class among graph drawing algorithms. Despite this, it is still scarcely understood by those studying graph drawing algorithms. For this reason I wanted to write this article sharing some of its internal workings so others can adapt and play with the algorithm in their own projects.

# The Algorithm

In this section I want to get into the internal workings of the Hyperassociative Map algorithm, HAM. Below is the pseudocode breakdown explaining the algorithm. Notice I use some math notation here for simplicity. Most notably I use vector notation, where every variable representing a vector has a small arrow above it, and the norm, or magnitude, of a vector is represented by double vertical bars on either side, for example $$||\vec{p}||$$. If you have trouble with vector notation or just want a concrete example, the full working Java code can be found at the end of this article for reference.

\begin{algorithm}
\caption{Hyperassociative Map}
\begin{algorithmic}
% equilibrium distance
\REQUIRE $\tilde{\chi} > 0$
%repulsion strength
\REQUIRE $\delta > 1$
% learning rate
\REQUIRE $\eta = 0.05$
% alignment threshold (determines when graph is aligned)
\REQUIRE $\beta = 0.005$
\PROCEDURE{HAM}{Vertex Set \textbf{as} $g$}
\STATE \CALL{Randomize}{$g$}
\WHILE{\CALL{AlignAll}{$g$} $> \beta \cdot \tilde{\chi}$}
\STATE optionally recenter the graph
\ENDWHILE
\ENDPROCEDURE
\PROCEDURE{AlignAll}{Vertex Set \textbf{as} $g$}
\STATE $\zeta = 0$
\FORALL{$v$ \textbf{in} $g$}
\STATE $\vec{{\scriptsize \triangle} p} =$ \CALL{Align}{$v$}
\IF{$||\vec{{\scriptsize \triangle} p}|| > \zeta$}
\STATE $\zeta = ||\vec{{\scriptsize \triangle} p}||$
\ENDIF
\STATE \CALL{Place}{$v$, \CALL{Position}{$v$} $+ \vec{{\scriptsize \triangle} p}$}
\ENDFOR
\RETURN $\zeta$
\ENDPROCEDURE
\PROCEDURE{Align}{Vertex \textbf{as} $v$}
\STATE $\vec{p} =$ \CALL{Position}{$v$}
\STATE $\vec{{\scriptsize \triangle} p} = 0$
\FORALL{$m$ \textbf{in} \CALL{Neighbors}{$v$}}
\STATE $\vec{q} =$ \CALL{Position}{$m$} - $\vec{p}$
\STATE $\vec{{\scriptsize \triangle} p} = \vec{{\scriptsize \triangle} p} + \vec{q} \cdot \frac{(||\vec{q}|| - \tilde{\chi}) \cdot \eta}{||\vec{q}||}$
\ENDFOR
\FORALL{$m$ \textbf{in} \CALL{NotNeighbors}{$v$}}
\STATE $\vec{q} =$ \CALL{Position}{$m$} - $\vec{p}$
\STATE $\vec{c} = \vec{q} \cdot \frac{-\eta}{{||\vec{q}||}^{\delta + 1}}$
\IF{$||\vec{c}|| > \tilde{\chi}$}
\STATE $\vec{c} = \vec{c} \cdot \frac{\tilde{\chi}}{||\vec{c}||}$
\ENDIF
\STATE $\vec{{\scriptsize \triangle} p} = \vec{{\scriptsize \triangle} p} + \vec{c}$
\ENDFOR
\RETURN $\vec{{\scriptsize \triangle} p}$
\ENDPROCEDURE
\PROCEDURE{Randomize}{Vertex Array \textbf{as} $g$}
\STATE randomize the position of every vertex in $g$
\ENDPROCEDURE
\PROCEDURE{Place}{Vertex \textbf{as} $v$, Vector \textbf{as} $\vec{p}$}
\STATE sets the position of $v$ to $\vec{p}$
\ENDPROCEDURE
\PROCEDURE{Neighbors}{Vertex \textbf{as} $v$}
\RETURN set of all vertices adjacent to $v$
\ENDPROCEDURE
\PROCEDURE{NotNeighbors}{Vertex \textbf{as} $v$}
\STATE $s =$ set of all vertices not adjacent to $v$
\STATE $w =$ set of all vertices whose position is close to that of $v$
\RETURN $s \cap w$
\ENDPROCEDURE
\PROCEDURE{Position}{Vertex \textbf{as} $v$}
\RETURN vector representing the position of $v$
\ENDPROCEDURE
\end{algorithmic}
\end{algorithm}


Obviously the pseudocode packs a lot of information into only a few lines, so I'll try to explain the more important parts to give you an idea of what you're looking at.

## Constants

First, let's consider the constants defined at the beginning. The variable $$\tilde{\chi}$$ is called the Equilibrium Distance. It defines the ideal distance between two vertices connected by an edge. If a pair of vertices connected by a single edge are the only vertices present, they will align such that they are approximately as far apart as the value of $$\tilde{\chi}$$. For simplicity we represent $$\tilde{\chi}$$ as a single constant here, but in practice a different value can be assigned to each edge, resulting in a graph with different aesthetic qualities. This value must always be a positive number greater than $$0$$. The default value, and the one used in the demonstration video, is $$1$$.

The second constant is called the Repulsion Strength and is represented by the $$\delta$$ variable. This constant determines how strong the repulsion between two unconnected vertices is, that is, two vertices not connected by an edge. Lower values of $$\delta$$ result in a stronger repulsion and larger values in a weaker one. The default value is $$2$$, and this is the value used in the demonstration video.

Next is the Learning Rate constant, $$\eta$$. This is simply a scaling factor applied when the vertices are aligned; it ensures the graph is aligned with equivalent effect across all the nodes rather than over-fitting to the last node processed.

The last constant is the Alignment Threshold, $$\beta$$, which represents the minimum movement threshold. Once the vertices move less than this value during an alignment cycle the graph is presumed to be sufficiently aligned and the loop ends.

## Align Procedure

The algorithm itself is represented such that it is broken up into three major procedures. The procedure named HAM is the entry point for the algorithm, the procedure named Align calculates the incremental alignment for a single vertex, and the procedure named AlignAll calculates alignment once for every vertex in the graph.

Let’s first explain what is going on in the Align procedure. Here we have a single value being passed in, the vertex to be aligned, $$v$$. On line 19 the current position of the vertex is obtained and represented as a Euclidean vector, $$\vec{p}$$. Next, on line 20, a zero vector is initialized and represented as $$\vec{{\scriptsize \triangle} p}$$. The vector $$\vec{{\scriptsize \triangle} p}$$ is accumulated throughout the procedure and represents the desired change to the vertex’s current position $$\vec{p}$$. In other words, once $$\vec{{\scriptsize \triangle} p}$$ is calculated it can be added to the current position of the vertex to produce its new position. Just as if we were summing forces, $$\vec{{\scriptsize \triangle} p}$$ is the sum of all the composite influences acting on the vertex; it therefore represents the overall influence exerted on the vertex at any time.

When calculating $$\vec{{\scriptsize \triangle} p}$$ the procedure must iterate through all the other vertices that have an effect on the vertex being aligned. There are two types of vertices, each with different effects: neighbors and non-neighbors. Neighbors are all the vertices connected directly to the current vertex by an edge; non-neighbors are all the other vertices, those not connected by an edge.

First the influence from the neighbor vertices is calculated on lines 21 - 24. The influence two neighbor vertices have on each other depends on how far apart they are: if they are closer than the Equilibrium Distance, $$\tilde{\chi}$$, the effect is repulsive; if they are farther apart than $$\tilde{\chi}$$, the effect is attractive. The calculation is represented on line 23. It takes the vector representing the difference between the position of the vertex being aligned and that of its neighbor and rescales its magnitude to $$(||\vec{q}|| - \tilde{\chi}) \cdot \eta$$. To look at it another way, if the magnitude were just $$||\vec{q}|| - \tilde{\chi}$$ then the vertex would move exactly to the Equilibrium Distance, but instead the movement is scaled to a fraction of this by $$\eta$$, which adjusts how quickly the vertex approaches its equilibrium point.
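To make the arithmetic concrete, here is a minimal standalone sketch of just this neighbor calculation in Java. The class and method names are my own for illustration, and points are plain 2D arrays rather than the Vector class used in the full example below.

```java
// Illustrative sketch of the neighbor influence (line 23), in 2D.
public class NeighborInfluence {
    // p, neighbor: 2D positions; chi: Equilibrium Distance; eta: Learning Rate
    static double[] influence(double[] p, double[] neighbor, double chi, double eta) {
        double qx = neighbor[0] - p[0];
        double qy = neighbor[1] - p[1];
        double dist = Math.sqrt(qx * qx + qy * qy);
        // rescale q so its magnitude becomes (||q|| - chi) * eta
        double scale = (dist - chi) * eta / dist;
        return new double[]{qx * scale, qy * scale};
    }

    public static void main(String[] args) {
        // neighbor 2 units away with chi = 1: positive x, an attraction
        System.out.println(influence(new double[]{0, 0}, new double[]{2, 0}, 1.0, 0.05)[0]); // 0.05
        // neighbor 0.5 units away with chi = 1: negative x, a repulsion
        System.out.println(influence(new double[]{0, 0}, new double[]{0.5, 0}, 1.0, 0.05)[0]); // -0.025
    }
}
```

Notice the sign of the result flips on its own depending on whether the neighbor is inside or outside the Equilibrium Distance; no explicit branching is needed.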

Next the influence between non-neighbor vertices is calculated on lines 25 - 32. Non-neighbor vertices, that is vertices not connected by an edge, always exert a purely repulsive influence. Line 27 calculates this with a similar technique as before: the difference between the positions of the two vertices is represented by $$\vec{q}$$ and its magnitude is then rescaled, negated to indicate that the force is repulsive. The equation only seems confusing in its simplified and compacted form; it was originally derived by calculating the new magnitude of $$\vec{q}$$ as the following.

$$\frac{-1}{{||\vec{q}||}^{\delta}} \cdot \eta$$

This makes a lot more sense since we know that in nature repulsive forces fall off with the inverse square of distance, so this accurately represents a repulsive influence that diminishes with distance. Once we actually apply that magnitude to the vector and simplify, we arrive at our final equation.

$$\vec{q} \cdot \frac{\frac{-1}{{||\vec{q}||}^{\delta}} \cdot \eta}{||\vec{q}||} \Rightarrow \vec{q} \cdot \frac{-\eta}{{||\vec{q}||}^{\delta + 1}}$$

The only caveat is seen in lines 28 to 30, where the distance moved as a result of the repulsive influence is checked; if it is greater than the Equilibrium Distance, $$\tilde{\chi}$$, its magnitude is scaled back to $$\tilde{\chi}$$. This is done because at very close distances the repulsive influence, which grows as an inverse power of the distance, becomes overwhelming. We want the majority of this influence to work at a distance, allowing the graph to spread apart while still letting the other influences dominate.
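The same kind of standalone sketch works for the repulsive influence, including the clamping step. Again the names are illustrative and the vectors are plain 2D arrays, not the Vector class from the full example.

```java
// Illustrative sketch of the non-neighbor repulsion (lines 27-30), in 2D.
public class RepulsionInfluence {
    // chi: Equilibrium Distance; delta: Repulsion Strength; eta: Learning Rate
    static double[] influence(double[] p, double[] other, double chi, double delta, double eta) {
        double qx = other[0] - p[0];
        double qy = other[1] - p[1];
        double dist = Math.sqrt(qx * qx + qy * qy);
        // c = q * (-eta / ||q||^(delta + 1)), the simplified equation from the text
        double scale = -eta / Math.pow(dist, delta + 1.0);
        double cx = qx * scale;
        double cy = qy * scale;
        // clamp the movement so it never exceeds the Equilibrium Distance
        double mag = Math.sqrt(cx * cx + cy * cy);
        if (mag > chi) {
            cx *= chi / mag;
            cy *= chi / mag;
        }
        return new double[]{cx, cy};
    }

    public static void main(String[] args) {
        // far apart: a small push away from the other vertex
        System.out.println(influence(new double[]{0, 0}, new double[]{2, 0}, 1.0, 2.0, 0.05)[0]);
        // very close: the raw push would be huge, so it is clamped to chi
        System.out.println(influence(new double[]{0, 0}, new double[]{0.05, 0}, 1.0, 2.0, 0.05)[0]);
    }
}
```

The second call shows why the clamp matters: at a separation of $$0.05$$ the unclamped movement would be about $$20$$ units, which would fling the vertex far past every other influence.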

At this point the computed change in position for the vertex is simply returned at line 33 for further processing by the AlignAll procedure.

## AlignAll Procedure

The AlignAll procedure is extremely straightforward. It is passed the set of all vertices in the graph as $$g$$ and iterates over the set, aligning them one at a time. Each vertex is aligned once per call to the procedure, which means the procedure will usually need to be called multiple times.

On line 8 the Maximum Vertex Movement variable, represented as $$\zeta$$, is initialized to $$0$$. This variable represents the greatest distance any vertex moved during the alignment; after being calculated its value is returned on line 15. The Maximum Vertex Movement is important for determining when the HAM algorithm has finished processing.

Other than that this procedure doesn’t do anything special: the vertex alignment vector is calculated on line 10 and the new position for the vertex is set on line 14.

## HAM Procedure

The HAM procedure is another rather straightforward procedure to explain. It starts by assigning initial random coordinates to each vertex in the graph. After that it repeatedly calls AlignAll in a loop until the graph is sufficiently aligned.

On line 3 the AlignAll procedure is called in a loop until the Max Vertex Movement returned is less than $$\beta \cdot \tilde{\chi}$$. This is just the Alignment Threshold normalized by the Equilibrium Distance constant. The Alignment Threshold is sufficiently small such that if the movements in the graph are less than this value then they are considered negligible and the alignment can end.
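The loop can be sketched like this. Here alignAll is only a stand-in that returns a geometrically shrinking maximum movement, since the point is just to show the termination condition $$\beta \cdot \tilde{\chi}$$; the real AlignAll would align every vertex and return $$\zeta$$.

```java
// Illustrative sketch of the HAM outer loop (line 3).
public class HamLoop {
    static final double BETA = 0.005; // Alignment Threshold
    static final double CHI = 1.0;    // Equilibrium Distance

    static int runUntilAligned() {
        double movement = 1.0;
        int iterations = 0;
        double maxMove;
        do {
            movement *= 0.5;  // stand-in for AlignAll's returned max movement
            maxMove = movement;
            iterations++;
        } while (maxMove >= BETA * CHI);
        return iterations;
    }

    public static void main(String[] args) {
        System.out.println(runUntilAligned()); // 8 with this stand-in
    }
}
```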

As an optional step after each alignment iteration it may be desirable to translate the entire graph so it is centered around the zero vector. A small amount of drift occurs as the alignment of the graph is calculated, and recentering ensures the graph remains in the center of the view when rendered. The drift is usually negligible, however, so this step is entirely optional. In the full Java example below the logic for centering the graph is included.
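Recentering itself is just a centroid subtraction. A minimal version over plain 2D points might look like the following; the full implementation below does the equivalent with its Vector class in processLocally and recenterNodes.

```java
// Illustrative sketch of the optional recentering step: translate every
// point so the centroid of the graph sits at the origin.
public class Recenter {
    static void recenter(double[][] points) {
        // compute the centroid (average point)
        double cx = 0, cy = 0;
        for (double[] p : points) {
            cx += p[0];
            cy += p[1];
        }
        cx /= points.length;
        cy /= points.length;
        // shift every point by the centroid
        for (double[] p : points) {
            p[0] -= cx;
            p[1] -= cy;
        }
    }

    public static void main(String[] args) {
        double[][] pts = {{1, 1}, {3, 1}, {2, 4}};
        recenter(pts);
        System.out.println(pts[0][0] + ", " + pts[0][1]); // -1.0, -1.0
    }
}
```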

# Appendix: Full Java Code

public class HyperassociativeMap<G extends Graph<N, ?>, N> implements GraphDrawer<G, N> {
    private static final double REPULSIVE_WEAKNESS = 2.0;
    private static final double DEFAULT_LEARNING_RATE = 0.05;
    private static final double EQUILIBRIUM_DISTANCE = 1.0;
    private static final double EQUILIBRIUM_ALIGNMENT_FACTOR = 0.005;

    private final G graph;
    private final int dimensions;
    private Map<N, Vector> coordinates = Collections.synchronizedMap(new HashMap<N, Vector>());
    private static final Random RANDOM = new Random();
    private final boolean useWeights;
    private double equilibriumDistance;
    private double learningRate = DEFAULT_LEARNING_RATE;
    private double maxMovement = 0.0;

    public HyperassociativeMap(final G graph, final int dimensions, final double equilibriumDistance, final boolean useWeights) {
        if (graph == null)
            throw new IllegalArgumentException("Graph can not be null");
        if (dimensions <= 0)
            throw new IllegalArgumentException("dimensions must be 1 or more");

        this.graph = graph;
        this.dimensions = dimensions;
        this.equilibriumDistance = equilibriumDistance;
        this.useWeights = useWeights;

        // refresh all nodes
        for (final N node : this.graph.getNodes()) {
            this.coordinates.put(node, randomCoordinates(this.dimensions));
        }
    }

    @Override
    public G getGraph() {
        return graph;
    }

    public double getEquilibriumDistance() {
        return equilibriumDistance;
    }

    public void setEquilibriumDistance(final double equilibriumDistance) {
        this.equilibriumDistance = equilibriumDistance;
    }

    public void resetLearning() {
        maxMovement = 0.0;
    }

    @Override
    public void reset() {
        resetLearning();
        // randomize all nodes
        for (final N node : coordinates.keySet()) {
            coordinates.put(node, randomCoordinates(dimensions));
        }
    }

    @Override
    public boolean isAlignable() {
        return true;
    }

    @Override
    public boolean isAligned() {
        return isAlignable()
                && (maxMovement < (EQUILIBRIUM_ALIGNMENT_FACTOR * equilibriumDistance))
                && (maxMovement > 0.0);
    }

    @Override
    public void align() {
        // refresh all nodes
        if (!coordinates.keySet().equals(graph.getNodes())) {
            final Map<N, Vector> newCoordinates = new HashMap<N, Vector>();
            for (final N node : graph.getNodes()) {
                if (coordinates.containsKey(node)) {
                    newCoordinates.put(node, coordinates.get(node));
                } else {
                    newCoordinates.put(node, randomCoordinates(dimensions));
                }
            }
            coordinates = Collections.synchronizedMap(newCoordinates);
        }

        maxMovement = 0.0;
        Vector center;

        center = processLocally();

        // divide each coordinate of the sum of all the points by the number of
        // nodes in order to calculate the average point, or center of all the
        // points
        for (int dimensionIndex = 1; dimensionIndex <= dimensions; dimensionIndex++) {
            center = center.setCoordinate(center.getCoordinate(dimensionIndex) / graph.getNodes().size(), dimensionIndex);
        }

        recenterNodes(center);
    }

    @Override
    public int getDimensions() {
        return dimensions;
    }

    @Override
    public Map<N, Vector> getCoordinates() {
        return Collections.unmodifiableMap(coordinates);
    }

    private void recenterNodes(final Vector center) {
        for (final N node : graph.getNodes()) {
            coordinates.put(node, coordinates.get(node).calculateRelativeTo(center));
        }
    }

    public boolean isUsingWeights() {
        return useWeights;
    }

    Map<N, Double> getNeighbors(final N nodeToQuery) {
        final Map<N, Double> neighbors = new HashMap<N, Double>();
        for (final TraversableCloud<N> neighborEdge : graph.getAdjacentEdges(nodeToQuery)) {
            final Double currentWeight = (((neighborEdge instanceof Weighted) && useWeights)
                    ? ((Weighted) neighborEdge).getWeight()
                    : equilibriumDistance);
            for (final N neighbor : neighborEdge.getNodes()) {
                if (!neighbor.equals(nodeToQuery)) {
                    neighbors.put(neighbor, currentWeight);
                }
            }
        }
        return neighbors;
    }

    private Vector align(final N nodeToAlign) {
        // calculate equilibrium with neighbors
        final Vector location = coordinates.get(nodeToAlign);
        final Map<N, Double> neighbors = getNeighbors(nodeToAlign);

        Vector compositeVector = new Vector(location.getDimensions());
        // align with neighbours
        for (final Entry<N, Double> neighborEntry : neighbors.entrySet()) {
            final N neighbor = neighborEntry.getKey();
            final double associationEquilibriumDistance = neighborEntry.getValue();

            Vector neighborVector = coordinates.get(neighbor).calculateRelativeTo(location);
            double newDistance = Math.abs(neighborVector.getDistance()) - associationEquilibriumDistance;
            newDistance *= learningRate;
            neighborVector = neighborVector.setDistance(newDistance);
            compositeVector = compositeVector.add(neighborVector);
        }
        // calculate repulsion with all non-neighbors
        for (final N node : graph.getNodes()) {
            if ((!neighbors.containsKey(node)) && (node != nodeToAlign)
                    && (!graph.getAdjacentNodes(node).contains(nodeToAlign))) {
                Vector nodeVector = coordinates.get(node).calculateRelativeTo(location);
                double newDistance = -1.0 / Math.pow(nodeVector.getDistance(), REPULSIVE_WEAKNESS);
                if (Math.abs(newDistance) > Math.abs(equilibriumDistance)) {
                    newDistance = Math.copySign(equilibriumDistance, newDistance);
                }
                newDistance *= learningRate;
                nodeVector = nodeVector.setDistance(newDistance);
                compositeVector = compositeVector.add(nodeVector);
            }
        }
        Vector newLocation = location.add(compositeVector);
        final Vector oldLocation = coordinates.get(nodeToAlign);
        double moveDistance = Math.abs(newLocation.calculateRelativeTo(oldLocation).getDistance());

        if (moveDistance > maxMovement) {
            maxMovement = moveDistance;
        }

        coordinates.put(nodeToAlign, newLocation);
        return newLocation;
    }

    public static Vector randomCoordinates(final int dimensions) {
        final double[] randomCoordinates = new double[dimensions];
        for (int randomCoordinatesIndex = 0; randomCoordinatesIndex < dimensions; randomCoordinatesIndex++) {
            randomCoordinates[randomCoordinatesIndex] = (RANDOM.nextDouble() * 2.0) - 1.0;
        }

        return new Vector(randomCoordinates);
    }

    private Vector processLocally() {
        Vector pointSum = new Vector(dimensions);
        for (final N node : graph.getNodes()) {
            final Vector newPoint = align(node);
            for (int dimensionIndex = 1; dimensionIndex <= dimensions; dimensionIndex++) {
                pointSum = pointSum.setCoordinate(pointSum.getCoordinate(dimensionIndex) + newPoint.getCoordinate(dimensionIndex), dimensionIndex);
            }
        }
        return pointSum;
    }
}


I recently updated the code for my blog (yes, this one) to render algorithms as pseudocode in the classic LaTeX style. This style has been used extensively since the 80s and continues to be a standard format for rendering algorithms as pseudocode. Below is an example of what I’m talking about: it’s the Quicksort algorithm rendered using this style.

% This quicksort algorithm is extracted from Chapter 7, Introduction to Algorithms (3rd edition)
\begin{algorithm}
\caption{Quicksort}
\begin{algorithmic}
\PROCEDURE{Quicksort}{$A, p, r$}
\IF{$p < r$}
\STATE $q =$ \CALL{Partition}{$A, p, r$}
\STATE \CALL{Quicksort}{$A, p, q - 1$}
\STATE \CALL{Quicksort}{$A, q + 1, r$}
\ENDIF
\ENDPROCEDURE
\PROCEDURE{Partition}{$A, p, r$}
\STATE $x = A[r]$
\STATE $i = p - 1$
\FOR{$j = p$ \TO $r - 1$}
\IF{$A[j] \leq x$}
\STATE $i = i + 1$
\STATE exchange $A[i]$ with $A[j]$
\ENDIF
\ENDFOR
\STATE exchange $A[i + 1]$ with $A[r]$
\RETURN $i + 1$
\ENDPROCEDURE
\end{algorithmic}
\end{algorithm}


If you’d like to use it yourself it isn’t too hard. You can get most of the magic from a nice little JavaScript library on GitHub called pseudocode.js. The problem is that you will only get halfway there using the JavaScript alone: the parts that are there work great, but it won’t search your HTML and render everything for you, you have to write that yourself. If you put the following block of code at the end of your page’s source, and make sure the project I linked earlier is already installed, then it will solve that problem for you.

var blocks = document.getElementsByClassName("pseudocode");
for (var blockId = 0; blockId < blocks.length; blockId++) {
    var block = blocks[blockId];

    var code = block.textContent;
    var options = {
        lineNumber: true
    };

    var outputEl = document.createElement('div');
    outputEl.className += " pseudocode-out";
    block.parentNode.insertBefore(outputEl, block.nextSibling);

    pseudocode.render(code, outputEl, options);
}

// getElementsByClassName returns a live collection, so removing each source
// element also shrinks the list until it is empty
while (blocks.length > 0) {
    blocks[0].parentNode.removeChild(blocks[0]);
}



Now all you have to do is add an element with a class of “pseudocode” and write the actual pseudocode inside the element, following the format specified in the pseudocode.js project. I prefer to define this as a “pre” element so that if JavaScript is turned off it will still produce a human-readable block. Other than that I didn’t have to do anything too special to get it working.