Jeffrey Phillips Freeman discussing Bioalgorithms, Machine Learning, Computer Science, Electrical Engineering, HAM Radio and everything science.

An Efficient Exponential Moving Average of Finite Length

\(\definecolor{first}{RGB}{0, 255, 221}\) \(\definecolor{second}{RGB}{181, 181, 98}\) \(\definecolor{third}{RGB}{18,110,213}\) \(\definecolor{fourth}{RGB}{114,0,172}\) \(\definecolor{5}{RGB}{45,177,93}\) \(\definecolor{6}{RGB}{251,0,29}\) \(\definecolor{7}{RGB}{255, 127, 0}\) \(\definecolor{8}{RGB}{255,0,255}\) \(\definecolor{9}{RGB}{0,255,0}\) \(\definecolor{10}{RGB}{255,0,0}\) \(\definecolor{normal}{RGB}{0,0,0}\)

An exponential moving average (EMA), also called an exponentially weighted moving average (EWMA), is a type of moving average where the weighting applied to generate the average decays exponentially the farther back in time you go. This is contrasted with a simple moving average (SMA) which always has a finite length but weights all the points in the series within that length equally. Due to the exponential weight decay of an EMA it does not necessitate a finite length as with an SMA. Because of this it can be a bit more efficient than an SMA when calculating the EMA on large time series data since each new data point can recursively apply the EMA from the previous data point’s EMA; contrast this with an SMA where each data point either must step through several previous points in the series to calculate the sum at each new point, or, must store the sum at each previous point in the series which is calculated before dividing by its length. Therefore if the sums are cached to improve computational efficiency there is a hit to space efficiency, if not then the computational efficiency is impaired instead.

However the efficiency advantage of an EMA only really applies if you are sequentially calculating the EMA one time through from some selected starting point in the time series to an end point in the series. This isn’t always a viable use case, particularly if the time series data is extremely large, or even infinite, and an EMA must be calculated at arbitrary points or across small subsets of the data at any one time. For this reason it is common to use an EMA that has a finite length similar to an SMA, applied to past points for a finite time into the past. Because EMA weights exponentially decay, so long as the length is sufficiently large, it should closely approximate the usual case where the EMA length is infinite, or at least goes back to the beginning of the dataset. However, like an SMA, when using a finite length there is no need to iterate across the length of past data points to calculate the EMA at each point if you cache previously calculated values. By storing the sum at each previous data point we can make this process efficient by only referencing the sum associated with the last data point and calculating our new EMA from there. Because the sum for an EMA is the same as the EMA itself, that is, there is no need to divide by the length as is the case with an SMA, we are able to calculate an EMA with a finite length with greater space efficiency than we would an SMA, only storing the EMA calculated at each point with no need to store an additional sum variable.

In this post I will discuss how to calculate the EMA value with a finite length efficiently without the need for iterating over past data points and how we derive the math to accomplish this.

Calculating EMA with Finite Length

First let’s take a look at the equation used to calculate the EMA for any point. This equation assumes a starting point of \(V_0\); when dealing with an EMA of fixed length, \(V_0\) is the oldest point still inside the window, that is, the point that sits length minus one steps behind the current data point in the sequence.

$$ S_t = \begin{cases} V_0, & t = 0 \\ \alpha V_t + (1 - \alpha) \cdot S_{t-1}, & t > 0 \end{cases} $$

    \(\alpha\) is the weighting coefficient. It is a value between 0 and 1,
    \(t\) the time where 0 is the start time,
    \(V_t\) the value, \(V\), at time \(t\),
    \(S_t\) the sum at time \(t\), this is the same as the EMA at time \(t\).
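
As a minimal sketch of the recursive definition above, the following Python function walks a series once and returns every \(S_t\). The function name `ema` and the sample values are illustrative assumptions, not something from a particular library.

```python
def ema(values, alpha):
    """Return the list of EMA values S_t for a sequence of values V_t."""
    if not values:
        return []
    sums = [values[0]]                      # S_0 = V_0
    for v in values[1:]:
        # S_t = alpha * V_t + (1 - alpha) * S_{t-1}
        sums.append(alpha * v + (1 - alpha) * sums[-1])
    return sums


print(ema([1.0, 2.0, 3.0], 0.5))  # [1.0, 1.5, 2.25]
```

Each new value only needs the previous EMA, which is where the efficiency of the unbounded EMA comes from.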

Example of Length 5

Let’s run through all the math we need to accomplish our end goal for a fixed length of 5, then we can hopefully pick out the patterns and generalize this to any length value.

Say we have some series values in an array as follows.

$$ V = [V_0, V_1, V_2, V_3, V_4] $$

Now if we take an EMA of the above series then each point in the series will have a corresponding EMA value associated with it, namely \(S_t\). The first value is trivial and is as follows.

$$ S_0 = V_0 $$

At this point each following EMA is a recursive relationship to the previous, back to, at most, the length parameter of an EMA or the first value in the series, whichever comes first. For simplicity let’s presume we have an EMA length of 5. Now let’s calculate the EMA at \(t=1\).

$$ S_1 = \left(1 - \alpha\right) \cdot S_0 + \alpha V_1 $$
$$ S_1 = \left(1 - \alpha\right) \cdot \underbrace{V_0}_{S_0} + \alpha V_1 $$

We can now take this process and apply it to the other elements in the series.

$$ S_2 = \left(1 - \alpha\right) \cdot S_1 + \alpha V_2 $$
$$ S_2 = \left(1 - \alpha\right) \cdot \overbrace{\left(\left(1 - \alpha\right) \cdot \underbrace{V_0}_{S_0} + \alpha V_1\right)}^{S_1} + \alpha V_2 $$
$$ S_3 = \left(1 - \alpha\right) \cdot S_2 + \alpha V_3 $$
$$ S_3 = \left(1 - \alpha\right) \cdot \underbrace{\left(\left(1 - \alpha\right) \cdot \overbrace{\left(\left(1 - \alpha\right) \cdot \underbrace{V_0}_{S_0} + \alpha V_1\right)}^{S_1} + \alpha V_2\right)}_{S_2} + \alpha V_3 $$
$$ S_4 = \left(1 - \alpha\right) \cdot S_3 + \alpha V_4 $$
$$ S_4 = \left(1 - \alpha\right) \cdot \overbrace{\left(\left(1 - \alpha\right) \cdot \underbrace{\left(\left(1 - \alpha\right) \cdot \overbrace{\left(\left(1 - \alpha\right) \cdot \underbrace{V_0}_{S_0} + \alpha V_1\right)}^{S_1} + \alpha V_2\right)}_{S_2} + \alpha V_3\right)}^{S_3} + \alpha V_4 $$

The problem here is that if we start the process over and repeat it for every point it is horribly inefficient. With a length of 5 as seen here this entire process needs to be repeated for every point. If we have an array \(V\) of sufficient length, and a length parameter of any sizable value, this process becomes particularly inefficient. Remember that if there were a 6th point here, \(V_5\), after \(V_4\), we would have to start over and start at \(V_1\) rather than \(V_0\), since the length is 5 and thus we would not go back to the 0 index for subsequent values in the series.
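
To make the cost concrete, here is a hedged sketch of the naive approach just described: for every point it restarts the recursion at the oldest value still inside the window. The name `ema_window_naive` is an assumption made for this illustration.

```python
def ema_window_naive(values, alpha, length):
    """Finite-length EMA, recomputed from scratch at every point."""
    result = []
    for t in range(len(values)):
        start = max(0, t - length + 1)      # oldest index still in the window
        s = values[start]                   # restart condition: S = V_start
        for v in values[start + 1 : t + 1]:
            s = alpha * v + (1 - alpha) * s
        result.append(s)
    return result


print(ema_window_naive([1.0, 2.0, 3.0], 0.5, 2))  # [1.0, 1.5, 2.5]
```

The inner loop runs up to `length - 1` times for each point, so the total work grows with the series size multiplied by the length parameter.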

One way to make this more efficient is by figuring out a way of dropping \(V_0\) from the previously calculated sum before we append the next value to it. For an SMA this would be trivial subtraction but for an EMA it is a bit more complex. To see how we accomplish that we need to first expand the series above and simplify it; once we do this some patterns become evident.

$$ S_4 = \left(1 - \alpha\right) \cdot \left(\left(1 - \alpha\right) \cdot \left(\left(1 - \alpha\right) \cdot \color{first} \left(\left(1 - \alpha\right) \cdot V_0 + \alpha V_1\right) \color{normal} + \alpha V_2\right) + \alpha V_3\right) + \alpha V_4 $$
$$ S_4 = \left(1 - \alpha\right) \cdot \left(\left(1 - \alpha\right) \cdot \left(\left(1 - \alpha\right) \cdot \color{first} \left(V_0 - \alpha V_0 + \alpha V_1\right) \color{normal} + \alpha V_2\right) + \alpha V_3\right) + \alpha V_4 $$
$$ S_4 = \left(1 - \alpha\right) \cdot \left(\left(1 - \alpha\right) \cdot \color{second} \left(\left(1 - \alpha\right) \cdot \left(V_0 - \alpha V_0 + \alpha V_1\right) + \alpha V_2\right) \color{normal} + \alpha V_3\right) + \alpha V_4 $$
$$ S_4 = \left(1 - \alpha\right) \cdot \left(\left(1 - \alpha\right) \cdot \color{second} \left(V_0 - \alpha V_0 - \alpha V_0 + {\alpha}^2 V_0 + \alpha V_1 - {\alpha}^2 V_1 + \alpha V_2\right) \color{normal} + \alpha V_3\right) + \alpha V_4 $$
$$ S_4 = \left(1 - \alpha\right) \cdot \left(\left(1 - \alpha\right) \cdot \color{second} \left(\left(1 - \alpha - \alpha + {\alpha}^2\right) V_0 + \left(\alpha - {\alpha}^2\right) V_1 + \alpha V_2\right) \color{normal} + \alpha V_3\right) + \alpha V_4 $$
$$ S_4 = \left(1 - \alpha\right) \cdot \color{third} \left(\left(1 - \alpha\right) \cdot \left(\left(1 - 2\alpha + {\alpha}^2\right) V_0 + \left(\alpha - {\alpha}^2\right) V_1 + \alpha V_2\right) + \alpha V_3\right) \color{normal} + \alpha V_4 $$
$$ S_4 = \left(1 - \alpha\right) \cdot \color{third} \left(\left(1 - 3\alpha + 3 {\alpha}^2 - {\alpha}^3\right) V_0 + \left(\alpha - 2 {\alpha}^2 + {\alpha}^3\right) V_1 + \left(\alpha - {\alpha}^2\right) V_2 + \alpha V_3\right) \color{normal} + \alpha V_4 $$

Finally, after simplifying the last step we arrive at our equation for an EMA of length 5.

$$ \begin{split} S_4 = \\ &\color{5}\left(1 - 4\alpha + 6{\alpha}^2 - 4{\alpha}^3 + {\alpha}^4\right)\color{normal} V_0 + \\ &\color{6}\left(\alpha - 3{\alpha}^2 + 3{\alpha}^3 - {\alpha}^4\right)\color{normal} V_1 + \\ &\color{7}\left(\alpha - 2{\alpha}^2 + {\alpha}^3\right)\color{normal} V_2 + \\ &\color{8}\left(\alpha - {\alpha}^2\right)\color{normal} V_3 + \\ &\color{9}\alpha\color{normal} V_4 \end{split} $$

Since all of our coefficients for \(V_t\) are effectively constants, let’s factor them and see what that looks like.

$$ S_4 = \color{5}{\left(1 - \alpha\right)}^4\color{normal} V_0 + \color{6}\alpha{\left(1 - \alpha\right)}^3\color{normal} V_1 + \color{7}\alpha{\left(1 - \alpha\right)}^2\color{normal} V_2 + \color{8}\alpha{\left(1 - \alpha\right)}^1\color{normal} V_3 + \color{9}\alpha {\left(1 - \alpha\right)}^0\color{normal} V_4 $$
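
The factored form can be checked numerically. The sketch below builds the coefficients from the equation above and compares the weighted sum with the recursive result; the helper name `ema_weights` and the sample values are assumptions chosen for illustration.

```python
import math


def ema_weights(count, alpha):
    """Weights applied to V_0 .. V_{count-1} in the unrolled EMA."""
    last = count - 1
    weights = [(1 - alpha) ** last]         # V_0 carries no leading alpha
    weights += [alpha * (1 - alpha) ** (last - i) for i in range(1, count)]
    return weights


alpha = 0.25
values = [3.0, 1.0, 4.0, 1.0, 5.0]          # arbitrary V_0 .. V_4
weights = ema_weights(len(values), alpha)

# Recursive form: S_0 = V_0, S_t = alpha * V_t + (1 - alpha) * S_{t-1}
s = values[0]
for v in values[1:]:
    s = alpha * v + (1 - alpha) * s

# Unrolled form: a plain weighted sum of the values
unrolled = sum(w * v for w, v in zip(weights, values))

assert math.isclose(s, unrolled)
assert math.isclose(sum(weights), 1.0)      # the weights sum to 1
```

The second assertion shows why the unrolled form really is a weighted average: the coefficients always sum to 1.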

At this point the pattern becomes clear. What we effectively did was take a somewhat confusing recursive equation and unroll it into a flat structure. In fact we are now treating it as a much simpler weighted average where each element of \(V\) is weighted by the coefficients marked in color in the above equation. Another important property to notice is how this equation evolves for each successive iteration of our EMA equation. Presuming our EMA had a length of 6 or longer, then on the next element, \(V_5\), we would simply increment the exponent of each of our existing coefficients by \(1\) and then tack \(\alpha{\left(1 - \alpha\right)}^0 V_5\) onto the end with addition and we would be good. However that’s the easy scenario to handle; what we really want to do is handle the next iteration when length remains \(5\). That means dropping the \(V_0\) term before adding on the \(V_5\) term, thus reusing the sum rather than regenerating it. For clarity let’s look at what our equation needs to look like if we wish to drop \(V_0\) but before we actually apply \(V_5\).

$$ {S_4}' = \color{6}{\left(1 - \alpha\right)}^3\color{normal} V_1 + \color{7}\alpha{\left(1 - \alpha\right)}^2\color{normal} V_2 + \color{8}\alpha{\left(1 - \alpha\right)}^1\color{normal} V_3 + \color{9}\alpha {\left(1 - \alpha\right)}^0\color{normal} V_4 $$

Notice that we didn’t just drop the first \(V_0\) term but we also had to drop the leading \(\alpha\) from the \(V_1\) term. This is effectively what \(S_4\) would have looked like if it had a length of \(4\) instead of \(5\). The reason the first term always lacks the leading \(\alpha\) is due to the starting condition specified in the EMA equation. If we can get the equation in this form then applying the next iteration of the EMA equation would bring the \(S_5\) length back to \(5\) again, so we know we are on the right track. What we have to figure out now is how we can go from \(S_4\) to \({S_4}'\).

Dropping the first term itself is trivial, that’s just simple subtraction.

$$ S_4 - {\left(1 - \alpha\right)}^4\color{normal} V_0 $$
$$ \color{10}\alpha\color{normal}{\left(1 - \alpha\right)}^3 V_1 + \alpha{\left(1 - \alpha\right)}^2 V_2 + \alpha{\left(1 - \alpha\right)}^1 V_3 + \alpha {\left(1 - \alpha\right)}^0 V_4 $$

Easy enough, but now we need to get rid of that leading \(\alpha\), so how do we do that?

We know that if we add another \(V_1\) term with some coefficient which we will call \(X\) then we can combine this with the existing \(V_1\) term and if we select our \(X\) carefully we can use it to cancel out the leading alpha. That would look something like this.

$$ \alpha{\left(1 - \alpha\right)}^3 V_1 + X V_1 + ... = {\left(1 - \alpha\right)}^3 V_1 + ... $$

So all we have to do is set that up as an equation and solve for X and we will quickly see what value of X will satisfy our requirement.

$$ \alpha{\left(1 - \alpha\right)}^3 + X = {\left(1 - \alpha\right)}^3 $$
$$ X = {\left(1 - \alpha\right)}^3 - \alpha{\left(1 - \alpha\right)}^3 $$
$$ X = \left(1 - \alpha\right) {\left(1 - \alpha\right)}^3 $$
$$ X = {\left(1 - \alpha\right)}^4 $$

It is always nice when equations play along with one’s expectations and give us nice clean terms and patterns start to emerge. Now that we have the value of X we know how we can get from \(S_4\) to \({S_4}'\).

$$ {S_4}' = S_4 - {\left(1 - \alpha\right)}^4 V_0 + {\left(1 - \alpha\right)}^4 V_1 $$

Finally, now that we have \({S_4}'\) we simply apply the original EMA equation to that value in order to produce our value for \(S_5\) when EMA length is 5.

$$ S_5 = \left(1 - \alpha\right) \cdot {S_4}' + \alpha V_5 $$

From here on out the pattern holds for successive values as well. We can calculate each successive EMA value at any point without recursively iterating through the previous values, but rather by modifying the previous value’s EMA in the series.
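
The two steps of the length-5 example can be verified with a few lines of Python. This is only a sketch with arbitrary sample values; the helper `ema_from` is a hypothetical name for the plain recursion restarted at the first element it is given.

```python
import math


def ema_from(values, alpha):
    """Plain recursive EMA whose starting condition is values[0]."""
    s = values[0]
    for x in values[1:]:
        s = alpha * x + (1 - alpha) * s
    return s


alpha = 0.3
v = [2.0, 7.0, 1.0, 8.0, 2.0, 8.0]          # V_0 .. V_5, arbitrary

s4 = ema_from(v[0:5], alpha)                # window V_0 .. V_4

# Drop V_0 and cancel the leading alpha on the V_1 term
s4_prime = s4 - (1 - alpha) ** 4 * v[0] + (1 - alpha) ** 4 * v[1]

# Apply the ordinary EMA step to bring the length back to 5
s5 = (1 - alpha) * s4_prime + alpha * v[5]

assert math.isclose(s4_prime, ema_from(v[1:5], alpha))  # length-4 EMA of V_1..V_4
assert math.isclose(s5, ema_from(v[1:6], alpha))        # length-5 EMA of V_1..V_5
```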

Generalization of Length N

At this point the pattern is pretty obvious, I hope. Let’s sum things up real fast by generalizing what we learned above and rewrite our EMA equation for a fixed length of \(N\). When we want to calculate the EMA for a point that is at least \(N\) (the length) steps away from the start of the time series then we have the following generalization from the above.

$$ S_t = \alpha V_t + \left(1 - \alpha\right) \cdot \left(S_{t-1} - {\left(1 - \alpha\right)}^{N-1} V_{t-N} + {\left(1 - \alpha\right)}^{N-1} V_{t-N+1}\right) $$

The exponent is \(N-1\) because the oldest value in a window of \(N\) points has been decayed \(N-1\) times, just as \(V_0\) carried \({\left(1 - \alpha\right)}^4\) when the length was \(5\). All we have to do now is simplify it and then handle the edge cases near the beginning of the time series and we have a complete solution. Let’s start by simplifying.

$$ S_t = \left(1 - \alpha\right)^{N}\left(V_{t-N+1} - V_{t-N}\right) - \alpha S_{t-1} + S_{t-1} + \alpha V_t $$

Now let’s just explicitly define the edge cases and we have our final equation.

$$ S_t = \begin{cases} V_0, & t = 0 \\ \alpha V_t + \left(1 - \alpha\right) \cdot S_{t-1}, & 0 < t < N \\ \left(1 - \alpha\right)^{N}\left(V_{t-N+1} - V_{t-N}\right) - \alpha S_{t-1} + S_{t-1} + \alpha V_t, & t \geq N \end{cases} $$

    \(N\) is the length,
    \(\alpha\) is the weighting coefficient. It is a value between 0 and 1,
    \(t\) the time where 0 is the start time,
    \(V_t\) the value, \(V\), at time \(t\),
    \(S_t\) the sum at time \(t\), this is the same as the EMA at time \(t\).
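
The following is a minimal Python sketch of the three cases, checked against a brute-force reference that restarts the recursion for every window. It takes its exponent from the length-5 example, where \(V_0\) carried \({\left(1 - \alpha\right)}^4\) inside \(S_4\), so after the outer multiplication by \(\left(1 - \alpha\right)\) the combined factor for a window of `length` points is `(1 - alpha) ** length`. The function names and sample data are assumptions for illustration only.

```python
import math


def ema_window(values, alpha, length):
    """Finite-length EMA with constant work per point (length >= 1)."""
    decay = (1 - alpha) ** length
    sums = []
    for t, v in enumerate(values):
        if t == 0:
            s = v                                        # S_0 = V_0
        elif t < length:
            s = alpha * v + (1 - alpha) * sums[-1]       # window not yet full
        else:
            # Swap the window start from V_{t-N} to V_{t-N+1}, then step
            s = (decay * (values[t - length + 1] - values[t - length])
                 + (1 - alpha) * sums[-1] + alpha * v)
        sums.append(s)
    return sums


def ema_window_reference(values, alpha, length):
    """Brute-force check: restart the recursion for every window."""
    out = []
    for t in range(len(values)):
        start = max(0, t - length + 1)
        s = values[start]
        for v in values[start + 1 : t + 1]:
            s = alpha * v + (1 - alpha) * s
        out.append(s)
    return out


data = [2.0, 7.0, 1.0, 8.0, 2.0, 8.0, 1.0, 8.0, 2.0, 8.0]
fast = ema_window(data, 0.3, 5)
slow = ema_window_reference(data, 0.3, 5)
assert all(math.isclose(a, b) for a, b in zip(fast, slow))
```

Only the previous EMA and two old values are touched per step, so the work per point no longer depends on the length parameter.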

An In-depth Look at Duals and Their Circuits

\(\definecolor{current}{RGB}{0, 255, 221}\) \(\definecolor{voltage}{RGB}{181, 181, 98}\) \(\definecolor{impedance}{RGB}{18,110,213}\) \(\definecolor{resistance}{RGB}{114,0,172}\) \(\definecolor{reactance}{RGB}{45,177,93}\) \(\definecolor{imaginary}{RGB}{251,0,29}\) \(\definecolor{capacitance}{RGB}{255, 127, 0}\) \(\definecolor{inductance}{RGB}{255,0,255}\) \(\definecolor{permeability}{RGB}{0,255,0}\) \(\definecolor{permittivity}{RGB}{255,0,0}\) \(\definecolor{normal}{RGB}{0,0,0}\)

Duality is an approach that has been applied across countless disciplines where one takes an existing structure and transforms it into an equivalent structure, often with the intention of making it more useful for a particular context. In electronic circuits this usually means we take an existing circuit schematic and transform it in such a way that it serves a similar purpose but suited to our specific use case. One extremely trivial example of this would be converting two resistors in series, which act as a voltage divider, into two resistors in parallel producing a current divider. Another similarly trivial example would be to take a voltage divider and double the values of its resistors such that it divides the voltage by the same ratio but uses half the current to do so. In both cases the fundamental idea of dividing a value by a given ratio is the same, we just transform the circuit in different ways that are suited to our needs.

One of the key advantages of duality is that it is feature-preserving: if the original circuit has particular desirable features but otherwise may not be well suited for our application, we can transform the circuit into a dual in such a way as to preserve the desirable features but transform the undesirable ones. For example a resistor based voltage divider has the advantageous feature of being relatively stable across various frequencies where other types of voltage dividers may have a very limited frequency range, so a resistor based voltage divider may be best suited for an extremely broadband application (large range of frequencies). However our application may require low power consumption, and the voltage divider may not need to drive a load but only feed an input to an IC with high impedance. As such if our reference circuit is for a voltage divider that uses relatively small values for resistors, and thus draws too much power, we can transform the voltage divider into its dual that preserves the frequency stability of the original but reduces the current draw. This of course is a very trivial example; the concept of duality can be, and often is, applied to much larger complex circuits as well. For this reason understanding circuit duality, and duality in general, specifically how to recognize it, how to apply it, and its common circuit applications, is a vital tool in anyone’s mental toolbox.

What is a Dual?

Duality is the transformation of a mathematical model or structure into an equivalent mathematical model or structure such that each element of the original has a one-to-one relationship with an element in the result. The transformation between its elements usually represents an involution, which means if the same transformation is applied twice you wind up with the original; at the very least the transformation must be reversible (invertible) back to its original form. This implies that the transformation must be unique for any given input. Once converted, the overall structure is said to be the dual of the original, and likewise each individual element is considered the dual of its counterpart in the dual structure.

Taking the reciprocal is an example of an involution, and thus a trivial example of duality.

$$ f(x) = \frac{1}{x} $$

Since \(f(x)\) is an involution the following must be true for any involution function.

$$ f(f(x)) = x $$

Of course for the reciprocal this holds true.

$$ x = 5 $$
$$ f(5) = \frac{1}{5} $$
$$ f(\frac{1}{5}) = 5 $$

Therefore we can say \(5\) is the dual of \(\frac{1}{5}\) under the reciprocal transformation.

An involution can sometimes be an identity function, and thus create fixed points where the involution of one element is unchanged. This of course still holds true to the rules of a duality transformation whereby if the involution is applied twice you still wind up at the original, it is invertible. The identity transformation would be:

$$ f(x) = x $$

Therefore it is trivial to see this holds true as an involution since the following is true.

$$ f(f(x)) = x $$

In this case any value is the dual of itself under the identity transformation. Which isn’t really saying much, but it is important to understand a fixed point is still a dual caused by an involution.

As stated earlier duals are usually transformed between each other through an involution function, but this does not need to be the case. Any invertible (reversible) function can be a valid way to express duality. The technical term for the property of a function to be reversible is to say it is bijective, but this is just a fancy way of saying you can reverse the function to get to where you started. For example adding one to a value is a bijective function since you can also subtract one and always get to where you started and there is no ambiguity in doing so. However multiplying by 0 is not bijective (invertible) because once you multiply by 0 you have no way of getting back to where you started; all numbers would transform into 0. In other words, in order for a function to be invertible the output of the function must be a unique value for any given input of the function, otherwise ambiguity is introduced and there would be no way to reverse the process.

$$ f(x) = x + 1 \label{addone} $$

As stated equation \(\eqref{addone}\) is invertible as you can always subtract \(1\) and get back to where you started.

$$ f^{-1}(x) = x - 1 $$
$$ f^{-1}(f(x)) = x \label{inv} $$

In this case \(f^{-1}(x)\) is called the inverse function to \(f(x)\) and the notation used in equation \(\eqref{inv}\) is the typical notation used to represent an inverse function. It should be trivially obvious, but keep in mind an inverse function must work in both directions, in other words:

$$ f(f^{-1}(x)) = f^{-1}(f(x)) = x $$
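
As a small sketch of the add-one example, the Python below checks the inverse in both directions and shows why multiplying by 0 cannot be reversed. The function names `f`, `f_inv`, and `times_zero` are chosen here purely for illustration.

```python
def f(x):
    return x + 1


def f_inv(x):
    return x - 1


def times_zero(x):
    return x * 0


for x in [-3, 0, 5, 42]:
    assert f_inv(f(x)) == x     # undo after apply
    assert f(f_inv(x)) == x     # apply after undo

# Distinct inputs collapse to the same output, so nothing can tell them apart
assert times_zero(5) == times_zero(-7) == 0
```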

It should be noted that an involution function is closely related to a function and its inverse. All an involution function really is is a function where its inverse is itself.

Also bear in mind not all values will have a dual, consider the reciprocal function, which is an involution (its own inverse) and thus a valid transformation.

$$ f(x) = f^{-1}(x) = \frac{1}{x} $$

In the case of the reciprocal function a value of 0 for x is undefined since you cannot divide by 0; any real value other than 0, however, is valid. Therefore we can say that under the reciprocal transformation all real number values have a dual except for 0, which does not have a dual.
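
Here is a hedged sketch of the reciprocal as an involution, using Python’s `fractions.Fraction` so that the round trip is exact rather than subject to floating-point rounding. The function name `reciprocal` is an assumption for this example.

```python
from fractions import Fraction


def reciprocal(x):
    """Reciprocal transformation on exact rational numbers."""
    return 1 / Fraction(x)


assert reciprocal(5) == Fraction(1, 5)
assert reciprocal(reciprocal(5)) == 5                 # f(f(x)) = x
assert reciprocal(1) == 1 and reciprocal(-1) == -1    # fixed points

try:
    reciprocal(0)
except ZeroDivisionError:
    print("0 has no dual under the reciprocal transformation")
```

Note that 1 and -1 map to themselves, so the reciprocal has fixed points even though it is not the identity function.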

Examples of Duals

There are many common examples of duals in almost every subject, from philosophy to mechanical engineering; it is a pervasive idea that can often be useful in many fields. Here are a few example values and their duals under different invertible transformations.

  • True is the dual of False under the negation transformation
  • 10 is the dual of 0.1 under the reciprocal transformation
  • 5 is the dual of -5 under the negation transformation.
  • A current divider circuit is the dual of a voltage divider circuit under series-parallel transformation
  • A capacitor based high-pass filter is the dual of an inductor based high-pass filter under reciprocal impedance transformation
  • A bandpass filter is the dual of a band-stop filter under series-parallel transformation
  • Position is the dual of velocity under the derivative/integral transformation
  • Up is the dual of Down under vertical flip transformation
  • In philosophy the mind is the dual of the physical world under dual-aspect theory

Similarly here are some examples of transformations and their inverse that are therefore capable of producing duals

  • Reciprocal transformation is its own inverse, an involution.
  • Negation transformation is its own inverse, an involution.
  • Derivative transformation is the inverse of an integral transformation.
  • A geometric flip transformation is its own inverse, an involution.
  • Doubling a value is the inverse of halving a value.

The Dual of a function

Just as we have shown above that individual variables and values have a dual under an invertible function, likewise functions can also have duals in the same manner. Imagine we have an invertible function \(T(x)\) which will convert something to its dual, and we have some function \(f(x)\) we wish to find the dual of, then simply by passing the function into \(T\) we can produce its dual. Specifically \(T(f(x)) = f^T(x)\) where the functions \(f(x)\) and \(f^T(x)\) are duals of each other. It is important to note here only \(T(x)\) needs to be invertible; neither \(f(x)\) nor \(f^T(x)\) needs to have this property. For example, say the transformation under which we create the duals is the reciprocal function, which is invertible, but \(f(x)\) is the square function, which is not invertible. We know it isn’t invertible because 10 squared is 100 and -10 squared is also 100. So there is no way to reverse the value of 100 and get the original value since some information was lost; we no longer know if the original value was positive or negative.

$$ T(x) = \frac{1}{x} $$
$$ f(x) = x^2 $$
$$ f^T(x) = T(f(x)) = \frac{1}{x^2} $$

We can now see the function \(f(x) = x^2\) is the dual of \(f^T(x) = \frac{1}{x^2}\) under the reciprocal transformation.
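
The same construction can be sketched in Python by composing the transformation with the function. The helper name `dual` is hypothetical, and exact fractions are used only to keep the comparisons free of rounding.

```python
from fractions import Fraction


def T(value):
    """The reciprocal transformation, which is invertible."""
    return 1 / Fraction(value)


def dual(f, transform):
    """Return f^T, defined by f^T(x) = transform(f(x))."""
    return lambda x: transform(f(x))


def square(x):
    return x * x


square_dual = dual(square, T)

assert square_dual(10) == Fraction(1, 100)
assert square_dual(-10) == Fraction(1, 100)   # square itself is not invertible
assert dual(square_dual, T)(10) == 100        # applying T again recovers square
```

The last assertion shows the involution at work: because \(T\) is its own inverse, taking the dual of the dual gives back the original function.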

Manipulating A System of Equations

Things get slightly more complicated when we start talking about systems rather than single variables or functions. A system is a collection of variables where some or all of the variables are dependent on the others; in other words, two or more mathematical functions dictate the value of one or more variables in relationship to other variables. In fact if you’re reading this blog you are already familiar with one very important type of system we all care about, an electrical circuit. In an electrical circuit, the variables are things like the voltage at various points in the circuit and the system of functions are the electrical components that connect these points.

Mapping a System of Equations

From this point on we need a good way to visualize systems of equations so I can do a better job talking about them. So I want to describe a graphical language for diagramming systems of equations. Let’s start with a simple generic component that provides an impedance, doesn’t matter just yet if it’s a resistor or something else. The following is a simple schematic of a lone component where some of the variables we care about are labeled.

Simple Resistor

Keep in mind when we talk about Ohm’s Law, we usually talk about the voltage across a component but here I have separated out the voltage on each terminal instead; it will make things a little more straightforward. Just keep in mind the voltage across the component is simply \(\color{voltage}V_+\color{normal} - \color{voltage}V_-\color{normal}\) in this case. Let’s represent that component in terms of Ohm’s Law which will give us the relationship between its voltage, impedance, and current. I will intentionally arrange the equation in such a way that doesn’t give any one variable preferential treatment by solving for 0; this is to emphasize the fact that it is a relationship between all of the variables.

$$ \color{current}I\color{normal} \cdot \color{impedance}Z\color{normal} + \color{voltage}V_-\color{normal} - \color{voltage}V_+\color{normal} = 0 $$

The way I want to diagram an equation such as this would be as follows.

In this diagram we have variables in circles; these can have any arbitrary name and don’t need to be the same name as the variables in the equations (you will see why that is necessary soon). The rectangles contain the equation, and the lines that connect it associate the shared variables, in circles, with the local variables defined in the equation. So here we have the variable \(W\) associated with \(\color{current}I\color{normal}\) for example. Let’s apply this to a slightly more complex example, a simple voltage divider circuit.

As you can see here we now have two equations, one for each component in our circuit, which together are capable of relating every variable/point in our circuit to every other. Some variables like current are shared between both components and relate to the same local variable in each equation. However others, like \(\color{voltage}V_{out}\color{normal}\), are also shared between both components, and both equations, but represent a different variable in each equation. This is the reason we needed to pick the specific format for our diagrams where the shared variables depicted in circles often have different names from the local variables in the equations. One important rule these diagrams will always follow is that the number of lines that connect to any particular equation will be exactly equal to the number of variables in the equation. Also note that at this stage we don’t make any distinction between whether a variable is a constant or an unknown; every variable is treated the same. We will call this form of the diagram the variable relationship diagram as it shows the relationship between variables.

The diagram as it stands now, however, is only the first step if we want to actually solve for the variables and, for example, determine the actual voltage at \(\color{voltage}V_{out}\color{normal}\); for that we need to actually figure out what variables have known quantities, and the relationship between them. The first step to do that is to pick whichever variables have known values and plug in the actual values for those variables into the circles. We also need to change the lines in the diagram into one-directional arrows. Any known variables will always have all of the connections facing outward. Presuming we know the source voltage and the impedance of our two components, let’s fill in some arbitrary values now and show how that might look.

Here the known variables are highlighted in green, the unknown variables are highlighted in red, and the appropriate arrows were added. Notice we have two shared variables left, since we have two simultaneous equations we already know these can be treated as unknowns and solved for. Since in the convention we are using an arrow pointing into an equation is a variable that will be plugged into it, then an arrow pointing out of an equation will be a value the equation will solve for. As we know we can solve for any single variable in an equation and as long as all the other variables are known then we can turn our equation into a function and solve it. For example we could rearrange our equation into the following function.

$$ \color{voltage}I \cdot Z\color{normal} + \color{voltage}V_-\color{normal} - \color{voltage}V_+\color{normal} = 0 $$
$$ \color{voltage}V_+\color{normal} = \color{voltage}I \cdot Z\color{normal} + \color{voltage}V_-\color{normal} $$

All we did was isolate the \(\color{voltage}V_+\color{normal}\) on one side of the equation and now it is a function. We can represent this in our diagram by making a rule for ourselves such that when we are trying to solve for a system of equations every equation in a box in our diagram should have one and only one outgoing arrow and all other arrows should be incoming. The outgoing arrow simply represents which of the variables we are choosing to solve for in the local equation. The other rule is that any unknown we can solve for must have at least one (and usually only one) incoming arrow and the rest of the arrows are outgoing. Finally, all lines must be converted to arrows. If we can follow these stated rules and ensure each unknown we care about has an incoming arrow then the system of equations should be solvable. For example here is what the final solvable diagram would look like in this case.

Keep in mind in the above diagram there is more than one way we could have made the system solvable and still followed the outlined rules. For example the arrows connected to \(\color{voltage}V_{out}\color{normal}\) could have been reversed along with the arrows connected to \(\color{current}I\color{normal}\) also being reversed, in which case we would also have had a solvable system of equations. We will call this final form of the diagram the variable dependency diagram.

Let’s quickly recap the rules for a variable relationship diagram:

  1. Each equation, denoted by a rectangular box, should have exactly one line for each variable in the equation.
  2. Each global variable, denoted by a circle, should have one line connecting it to each equation it is used in.
  3. Lines can only connect variables (circles) with equations (rectangles)
  4. The variable name next to a line should match one of the variable names in the equation it connects to.
  5. The global variable name depicted inside a circle does not need to match the variable name on a line that connects it, but it is allowed to.
  6. At this stage none of the lines should have arrows associated with them.

Similarly let’s recap the rules for a variable dependency diagram:

  1. Rules 1 through 5 above also apply here.
  2. All lines in the diagram should be depicted as one-directional arrows.
  3. For each equation in the diagram one and only one line should be an outgoing arrow, all other arrows should be inbound.
  4. All known global variables, depicted with a circle, should have all of their lines as outgoing arrows only.
  5. All unknown global variables, depicted with a circle, should have at least one, and usually only one, incoming arrow, all others should be outgoing. If this rule cannot be satisfied then the system is not solvable.
  6. There may be more than one configuration that satisfies the above rules; if so, pick any layout that satisfies them.
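One way to read the dependency diagram rules above is as a matching problem: every equation needs exactly one outgoing arrow and every unknown needs an incoming arrow. The following Python sketch tries to find such an assignment for the voltage divider. The function name `orient`, the equation labels `left` and `right`, and the simplifying assumption that each unknown receives exactly one incoming arrow (the usual case) are ours for illustration and are not part of the method as described.

```python
def orient(equations, knowns):
    """Try to give each equation one outgoing arrow to a distinct unknown.

    equations: dict mapping an equation label to the set of variables it uses.
    knowns:    set of variables whose values are given.
    Returns a dict {unknown: equation that solves for it}, or None when no
    assignment satisfies the rules (the system is then treated as unsolvable).
    """
    unknowns = set().union(*equations.values()) - set(knowns)
    solved_by = {}  # unknown -> equation whose outgoing arrow points at it

    def try_assign(eq, seen):
        # Augmenting-path step of a simple bipartite matching.
        for var in sorted(equations[eq]):
            if var in knowns or var in seen:
                continue
            seen.add(var)
            if var not in solved_by or try_assign(solved_by[var], seen):
                solved_by[var] = eq
                return True
        return False

    for eq in equations:
        if not try_assign(eq, set()):
            return None  # this equation has nothing left to solve for
    if set(solved_by) != unknowns:
        return None  # some unknown has no incoming arrow
    return solved_by


# The two-component voltage divider: one equation per component.
equations = {
    "left": {"I", "Z2", "Gnd", "V_out"},
    "right": {"I", "Z1", "V_out", "V_s"},
}
knowns = {"Z1", "Z2", "Gnd", "V_s"}

print(orient(equations, knowns))
```

For the voltage divider this assigns \(V_{out}\) to the left-hand equation and \(I\) to the right-hand one. Other valid orientations may exist, as rule 6 notes, and the sketch simply returns the first one it finds.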

Let’s finish up by actually solving for the unknowns in the above variable dependency diagram.

First let’s take the left-hand equation.

$$ \color{voltage}I \cdot Z\color{normal} + \color{voltage}V_-\color{normal} - \color{voltage}V_+\color{normal} = 0 $$
$$ \color{voltage}I \cdot 7\Omega\color{normal} + \color{voltage}0V\color{normal} - \color{voltage}V_{out}\color{normal} = 0 $$
$$ \color{voltage}V_{out}\color{normal} = \color{voltage}I \cdot 7\Omega\color{normal} + \color{voltage}0V\color{normal} $$
$$ \color{voltage}V_{out}\color{normal} = \color{voltage}I\cdot 7\Omega\color{normal} $$

This now represents the value of \(\color{voltage}V_{out}\color{normal}\) which can then be used when we solve the right-hand equation.

$$ \color{voltage}I \cdot Z\color{normal} + \color{voltage}V_-\color{normal} - \color{voltage}V_+\color{normal} = 0 $$
$$ \color{voltage}I \cdot 10\Omega\color{normal} + \color{voltage}I \cdot 7\Omega\color{normal} - \color{voltage}10V\color{normal} = 0 $$
$$ \color{voltage}I \cdot 10\Omega\color{normal} + \color{voltage}I \cdot 7\Omega\color{normal} = \color{voltage}10V\color{normal} $$
$$ \color{voltage}I \cdot (10\Omega + 7\Omega)\color{normal} = \color{voltage}10V\color{normal} $$
$$ \color{voltage}I \cdot 17\Omega\color{normal} = \color{voltage}10V\color{normal} $$
$$ \color{current}I\color{normal} = \color{current}\frac{10}{17}A \color{normal} $$

Now that we have solved for \(\color{current}I\color{normal}\) we can finish solving the left-hand equation where we left off.

$$ \color{voltage}V_{out}\color{normal} = \color{voltage}I \cdot 7\Omega\color{normal} $$
$$ \color{voltage}V_{out}\color{normal} = \color{voltage}\frac{10}{17}A \cdot 7\Omega\color{normal} $$
$$ \color{voltage}V_{out}\color{normal} = \color{voltage}\frac{70}{17}V\color{normal} $$
$$ \color{voltage}V_{out}\color{normal} = \color{voltage}4 \frac{2}{17}V\color{normal} $$
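As a quick sanity check, the same substitution can be carried out with exact fractions in Python. This is only a sketch of the arithmetic above; the variable names are ours.

```python
from fractions import Fraction

# Known values from the example.
V_s = Fraction(10)   # volts
Z1 = Fraction(10)    # ohms
Z2 = Fraction(7)     # ohms
Gnd = Fraction(0)    # volts

# Right-hand equation with V_out = I * Z2 + Gnd substituted in:
#   I * Z1 + (I * Z2 + Gnd) - V_s = 0
I = (V_s - Gnd) / (Z1 + Z2)

# Left-hand equation, now that I is known.
V_out = I * Z2 + Gnd

print(I, V_out)  # 10/17 70/17
```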

Easy right?

Manipulating a Variable Relationship Diagram

One thing we also need to know how to do is manipulate the variable relationship diagram, which is exactly the same as manipulating a system of equations. Obviously if we have two equations that share a common variable we can combine them into a single equation which eliminates the common variable by simple substitution. A simple example of that is as follows.

$$ f = 2x+t $$
$$ t = 7+y $$

These can be combined into:

$$ f = 2x + 7 + y \label{combined} $$

Notice the variable \(t\) disappears when we do this. Of course the same is true in reverse, we can pull out some terms in an equation, replace it with a variable, and get two new equations as a result and one additional variable. For example in the equation \(\eqref{combined}\) we can pull out \(7 + y\) as a term, replace it with \(t\) and wind up with the two equations we started with.
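A small Python sketch makes the equivalence concrete: evaluating the two-equation form and the combined form gives the same \(f\) for any inputs. The function names are ours.

```python
def f_split(x, y):
    # Two equations sharing the intermediate variable t.
    t = 7 + y
    return 2 * x + t


def f_combined(x, y):
    # The single equation after substituting t away.
    return 2 * x + 7 + y


# Both forms agree on a grid of sample inputs.
agree = all(
    f_split(x, y) == f_combined(x, y)
    for x in range(-5, 6)
    for y in range(-5, 6)
)
print(agree)  # True
```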

We can likewise do the same with our variable relationship diagram, after all it is just a way of visualizing systems of equations. Let’s say we took our earlier example of a voltage divider and its corresponding variable relationship diagram. The following is what it would look like if we split up the equation \(\color{voltage}I \cdot Z\color{normal} + \color{voltage}V_-\color{normal} - \color{voltage}V_+\color{normal} = 0\) by pulling out the \(\color{voltage}I \cdot Z\color{normal}\) term and replacing it with a new variable we will call \(\color{voltage}V\color{normal}\).

Separated system of equations

Here we can see two new shared variables are created, one called \(\color{voltage}V_1\color{normal}\) and the other \(\color{voltage}V_2\color{normal}\); both represent the voltage across their respective components. Other than the addition of two new variables, and changing two equations into four, this diagram still represents the same voltage divider as before. It can often be useful to break up equations in this way when we are considering how to construct duals for a system of equations, but we will get to that later. Usually what we find though is that breaking up equations into smaller ones is only useful down to three variables; any more than that and we just get long chains that add a lot of verbosity to the diagram but don’t really add much value beyond that. So typically when transforming to a dual we start by representing a system with three-variable equations and then we can make it more compact as we work with it.

One interesting aspect of working with things visually is it gives us clues as to which equations can be combined simply due to how the diagram is laid out. For example if you start at any equation and draw a path following the connecting lines and variables to any other equation, even if you go through multiple equations in the diagram to get there, then the equations along the path you created can be combined into a single equation. For example if we create a vertical path from the equation \(\color{voltage}I \cdot Z\color{normal} - \color{voltage}V\color{normal} = 0\) up to \(\color{voltage}V\color{normal} + \color{voltage}V_-\color{normal} - \color{voltage}V_+\color{normal} = 0\) then those two equations can be combined and we would wind up back where we started before we split the equations up. Likewise we can also create a path starting at any of the four equations creating a loop through the other three such that the path goes through all four equations and in doing so can compact all four equations into one big equation. In fact being able to combine all the equations into one big equation is always possible as long as the system of equations we are representing is solvable.

The Dual of a Circuit

Now let’s reiterate what we said earlier when we defined what a dual is: The transformation of a mathematical model or structure into an equivalent model or structure such that each element of the original has a one-to-one relationship with an element in the result, where the transformation between its elements is invertible.

At this point it should be obvious that when we talk about a mathematical model it is analogous to a system of equations, and for our focus here the equations describe an electrical circuit. Let’s take a minute and consider what our definition of a dual means when we talk about a circuit. We are saying that a dual of a circuit must be functionally equivalent, in other words it is at least partially feature-preserving, and each of its components must have a one-to-one relationship between the circuits. This doesn’t necessarily mean the dual will have the same components or that the voltage, current, resistance, or any other value, will be the same in one circuit or its dual, however they do need a one-to-one relationship where some value at some point in one circuit can serve the same function at some point in the dual of that circuit. For example if we create a dual circuit and find the output of the circuit is inverted from the original then that is acceptable since it can be seen as representing the same information in a different but equivalent way. In other words for every one-to-one relationship in a circuit and its dual, the values or components in each must also be duals of each other. In fact that’s all the dual of a system really is, some system where you replace part or all of the system with its dual in such a way that the utility of the overall system is preserved.

The Circuit Dual Under Reciprocal Impedance

There are many different types of circuit duals and even more types of duals in the general sense. The most commonly taught circuit dual is the reciprocal impedance circuit dual. In this form of dual circuit each component is replaced with an equivalent component that has reciprocal impedance characteristics. Simultaneously power sources such as batteries have their polarity reversed. This will result in a dual circuit where the amplitudes of the signals are unchanged but appear at different locations in the new circuit. However, if you flip the components around the power source instead of flipping the power source, which is functionally the same thing, the position of the voltage points will remain unchanged, which tends to be a more convenient way to visualize it. The phase of the signals, as well as the current, at any point undergoes a transformation however and is not preserved. Generally forward phase shifts become negative phase shifts of the same degree. In this type of circuit inductors become the dual of capacitors as they always have reciprocal impedance of each other for the same value component.

The type of circuit dual here, the dual under reciprocal impedance, is typically the type of dual that is intended when an electrical engineer talks about a circuit dual. Let’s start with a really simple example to show what we mean. Consider the following RC circuit that acts as a low-pass filter.

Capacitor Low Pass Filter

It should be apparent right away that this is very similar to our voltage divider from before except that now instead of specifying some generic unnamed components we are defining \(\color{impedance}Z_1\color{normal}\) to be the value from a resistor and \(\color{impedance}Z_2\color{normal}\) to be the value from a capacitor. This will function as a low pass filter allowing low frequency signals from \(\color{voltage}V_s\color{normal}\) to pass relatively unattenuated out of \(\color{voltage}V_{out}\color{normal}\) while attenuating higher frequency signals. A DC signal will pass through completely unaffected and as the frequency increases less of the signal will make it out.

For those of you who are familiar with this circuit you probably already know there is another way you can construct a low-pass filter that is very similar but uses an inductor in place of a capacitor. That circuit would look like this.

Inductor Low Pass Filter

Assuming the correct values are picked for the components in each of these circuits then these two circuits will behave similarly and either circuit would be an effective low-pass filter.
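To make "behave similarly" a little more concrete, here is a minimal Python sketch that treats both circuits as voltage dividers and computes the magnitude of \(V_{out}/V_s\). The component values are hypothetical, and we assume the usual arrangement in which the capacitor sits between \(V_{out}\) and ground in the first circuit while the inductor sits in series between \(V_s\) and \(V_{out}\) in the second.

```python
import math


def rc_lowpass_gain(f_hz, r_ohms, c_farads):
    """|V_out / V_s| with the resistor as Z1 and the capacitor as Z2."""
    z1 = r_ohms
    z2 = 1 / (2j * math.pi * f_hz * c_farads)
    return abs(z2 / (z1 + z2))


def lr_lowpass_gain(f_hz, l_henries, r_ohms):
    """|V_out / V_s| with the inductor as Z1 and the resistor as Z2."""
    z1 = 2j * math.pi * f_hz * l_henries
    z2 = r_ohms
    return abs(z2 / (z1 + z2))


# Hypothetical values picked so both filters share the same cutoff (about 159 Hz).
R = 1000.0  # ohms
C = 1e-6    # farads
L = 1.0     # henries

for f in (10.0, 159.0, 1000.0, 10000.0):
    print(f, rc_lowpass_gain(f, R, C), lr_lowpass_gain(f, L, R))
```

With these values both gains stay close to 1 at low frequencies and fall off together as the frequency rises.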

In this case these two circuits are what we would call duals of each other whereby the inductor is the dual of the capacitor, the positions of the two components are reversed, but everything in one circuit has a one-to-one relationship with the other one. This is one of the simplest examples of a circuit dual. Similarly the systems of equations that describe these two circuits will also be duals of each other, so let’s take a look at that.

Since the above two circuits are essentially voltage dividers, where the impedance of the capacitor and inductor change according to frequency, we can describe both systems using a similar set of equations as we did above, just with a bit of extra detail. Let’s start by looking at the system of equations that would describe the capacitor based low-pass filter. We already know the impedance of a resistor is simply its resistance; the impedance of a capacitor is as follows.

$$ \color{impedance}Z_2\color{normal} = \color{impedance}\frac{1}{2\pi f C j}\color{normal} = \color{reactance}-\frac{1}{2\pi f C}\color{imaginary}j\color{normal} $$

    \(f\) is the frequency applied in Hz,
    \(\color{capacitance}C\color{normal}\) is capacitance in Farad, and
    \(\color{imaginary}j\color{normal}\) is the imaginary unit.
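Python’s built-in complex numbers make this formula easy to try out. The sketch below uses a hypothetical 1 µF capacitor; the function name is ours, and note that it cannot be evaluated at exactly 0 Hz, where the impedance is unbounded.

```python
import math


def capacitor_impedance(f_hz, c_farads):
    """Impedance of an ideal capacitor, Z = 1 / (2*pi*f*C*j)."""
    return 1 / (2j * math.pi * f_hz * c_farads)


C = 1e-6  # farads, hypothetical value

for f in (100.0, 1000.0, 10000.0):
    z = capacitor_impedance(f, C)
    # The real part is zero and the imaginary part is negative;
    # the magnitude shrinks as the frequency grows.
    print(f, z, abs(z))
```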

If complex numbers confuse you, don’t worry; for most of what we will be discussing you can simply ignore the imaginary unit. If we apply this to our earlier variable relationship diagram then it would look something like this.

If we want to create a dual for the circuit represented by the above variable relationship diagram we can start by rearranging the diagram following certain rules. Keep in mind, however, that in the context here the system of equations we are working with is intended to represent a circuit. If we were dealing with a pure system of equations, that were not intended to represent something physical like a circuit, then we would have a great deal of freedom in the sorts of duals we could create by rearranging the system of equations. But since we are specifically interested in talking about circuits, we need to be mindful that any rearrangement we do can actually be represented as a circuit in the real world; it is more than just a lump of equations on a piece of paper. One simple mental check you can do when working with physical systems, like a circuit, is when creating a dual make sure when you move something around that the units associated with the various variables still make sense. If \(\color{impedance}Z_1\color{normal}\) is an impedance you can’t just go plugging it into an equation for a variable where voltage is expected, for example, that just wouldn’t make much sense.

The other important factor to consider here is that because we are talking about a dual, and because a dual needs to maintain a one-to-one relationship between elements of its dual, we need to ensure that all of the relevant elements in our circuit are represented in our variable relationship diagram. This means that at a minimum there should be one shared variable representing the voltage at each node in our circuit, here those would be \(\color{voltage}Gnd\color{normal}\), \(\color{voltage}V_{out}\color{normal}\), and \(\color{voltage}V_s\color{normal}\). We also need to represent the current for each mesh in the circuit; since our circuit here is just a single loop we only have one variable representing current, \(\color{current}I\color{normal}\). Naturally since the components themselves represent impedances, and ultimately the components will be what get manipulated in order to create our dual, it is useful to also ensure the impedance for each component is represented in our diagram as well; here we have \(\color{impedance}Z_1\color{normal}\) and \(\color{impedance}Z_2\color{normal}\) doing just that.

One way we can create a dual from our variable relationship diagram is to pick two shared variables of the same type, such as impedances, with the intention of flipping their positions. In this case we want to flip the impedance of our two components so we will pick \(\color{impedance}Z_2\color{normal}\) and \(\color{impedance}Z_1\color{normal}\) to be flipped to create our dual. Let’s color that in our diagram blue. At the same time take any equations connected to the selected nodes that do not lie between some path between the nodes and color them green. For example the equation for the impedance of a capacitor fits that description so we will color it green, however, the equation \(\color{voltage}I \cdot Z\color{normal} - \color{voltage}V\color{normal} = 0\) does lie on a path between the two blue nodes, so we will not color it green. Next, repeat the process such that everything you just colored green has everything attached to it that is not already colored changed to green as well; repeat until there is nothing left to color. When we are done the parts colored green should not lie on a path between the two blue points. After this the diagram would be as follows.

The areas in green in our graph are the parts we intend to flip across the blue shared variables. The next step is to start at each of the two shared variables colored in blue, in this case \(\color{impedance}Z_1\color{normal}\) and \(\color{impedance}Z_2\color{normal}\), and find the shortest path between them and color it blue. We want to ensure that every non-green line connected to these variables is blue when we are done; if needed we may need to pick more than one path to accomplish that, but using the least number of paths to do so is ideal. In this case we can select a single path and satisfy that condition.

Next combine the equations in blue into a single equation.

$$ \color{voltage}I \cdot Z\color{normal} - \color{voltage}V\color{normal} = 0 $$
$$ \color{voltage}I \cdot Z\color{normal} = \color{voltage}V\color{normal} $$
$$ \color{current}I\color{normal} = \color{current}\frac{V}{Z}\color{normal} $$
$$ \color{current}\frac{V_1}{Z_1}\color{normal} = \color{current}\frac{V_2}{Z_2}\color{normal} $$
$$ \color{current}\frac{V_1}{Z_1}\color{normal} - \color{current}\frac{V_2}{Z_2}\color{normal} = 0 $$

Giving us the following diagram.

This of course creates a problem, we eliminated the global variable for current, \(\color{current}I\color{normal}\), and as stated earlier when creating duals we care about the value of \(\color{current}I\color{normal}\) so we need to add it back somewhere else in the system of equations. There are multiple places we can add \(\color{current}I\color{normal}\) back into our equation, for example the voltage across either of our two components in the circuit, divided by the impedance of that component, would give us \(\color{current}I\color{normal}\); as such we could create the equation for that and connect the \(\color{voltage}V_2\color{normal}\) shared variable and the \(\color{impedance}Z_2\color{normal}\) shared variable to it and add \(\color{current}I\color{normal}\) hanging off in our diagram on the left side. We can also do the same on the right side for the other component and get the same value of \(\color{current}I\color{normal}\). However if we did it this way we would find we would need to rearrange the diagram because if you recall the rules earlier when we defined our paths, colored in blue, this would mean we would have to create an additional path through the new equation. The path in either of those cases would mean we would have to combine and manipulate the additional path in the same way we are currently doing with the one path we have. Therefore the easier way to add \(\color{current}I\color{normal}\) back for our purposes is as a relationship between our two impedances and \(\color{voltage}V_s\color{normal}\) with the following equation.

$$ \color{current}I\color{normal} = \color{current}\frac{V_s}{Z_1 + Z_2}\color{normal} $$
$$ \color{current}\frac{V_s}{Z_1 + Z_2}\color{normal} - \color{current}I\color{normal} = 0 $$

This equation is just Ohm’s law applied to the whole circuit with the voltage across the circuit being \(\color{voltage}V_s\color{normal}\) and the total impedance of the circuit being the sum of our two individual impedances. Defining it in this way isn’t necessary but it will save us the headache of doing additional work manipulating our diagram to get it into the proper form to calculate our dual.

Next if the combined equation has more than three variables associated with it, split out the variables that are not part of the path. Let’s start with the equation on the left hand side; in that equation we want to split out \(\color{voltage}V_1\color{normal}\) and \(\color{voltage}V_2\color{normal}\).

$$ \color{current}\frac{V_1}{Z_1}\color{normal} - \color{current}\frac{V_2}{Z_2}\color{normal} = 0 $$
$$ \color{current}\frac{V_1}{Z_1}\color{normal} = \color{current}\frac{V_2}{Z_2}\color{normal} $$
$$ \color{voltage}\frac{V_1 \cdot Z_2}{Z_1}\color{normal} = \color{voltage}V_2\color{normal} $$
$$ \frac{\color{impedance}Z_2\color{normal}}{\color{impedance}Z_1\color{normal}} = \frac{\color{voltage}V_2\color{normal}}{\color{voltage}V_1\color{normal}} $$
$$ \frac{\color{impedance}Z_2\color{normal}}{\color{impedance}Z_1\color{normal}} - \frac{\color{voltage}V_2\color{normal}}{\color{voltage}V_1\color{normal}} = 0 $$
$$ \frac{\color{impedance}Z_2\color{normal}}{\color{impedance}Z_1\color{normal}} - X = 0 $$

That is one equation, and the second:

$$ X = \frac{\color{voltage}V_2\color{normal}}{\color{voltage}V_1\color{normal}} $$
$$ \frac{\color{voltage}V_2\color{normal}}{\color{voltage}V_1\color{normal}} - X = 0 $$

Again let’s apply this to our diagram.

We also want to split the equation on the right hand side for the same reason.

$$ \color{current}\frac{V_s}{Z_1+Z_2}\color{normal} - \color{current}I\color{normal} = 0 $$
$$ \color{current}\frac{V_s}{Z_1+Z_2}\color{normal} = \color{current}I\color{normal} $$
$$ \color{impedance}\frac{1}{Z_1+Z_2}\color{normal} = \color{impedance}\frac{I}{V_s}\color{normal} $$
$$ \color{impedance}Z_1\color{normal} + \color{impedance}Z_2\color{normal} = \color{impedance}\frac{V_s}{I}\color{normal} $$
$$ \color{impedance}Z_1\color{normal} + \color{impedance}Z_2\color{normal} - \color{impedance}\frac{V_s}{I}\color{normal} = 0 $$

Which we can split into our two new equations.

$$ \color{impedance}Z_1\color{normal} + \color{impedance}Z_2\color{normal} - Y = 0 $$
$$ \color{impedance}\frac{V_s}{I}\color{normal} - Y = 0 $$
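To see that the intermediate variables \(X\) and \(Y\) really do tie the two sides of each split equation together, here is a small Python sketch. It reuses the numbers from the earlier resistive voltage divider (10 V across 10 Ω and 7 Ω) purely as an illustration; those values are our assumption and are not part of the low-pass filter example.

```python
from fractions import Fraction

# Hypothetical values borrowed from the earlier resistive divider.
V_s = Fraction(10)      # volts
Z1 = Fraction(10)       # ohms
Z2 = Fraction(7)        # ohms

I = V_s / (Z1 + Z2)     # current around the single loop
V1 = I * Z1             # voltage across the first component
V2 = I * Z2             # voltage across the second component

# X is shared by Z2/Z1 - X = 0 and V2/V1 - X = 0.
X_from_impedance = Z2 / Z1
X_from_voltage = V2 / V1

# Y is shared by Z1 + Z2 - Y = 0 and V_s/I - Y = 0.
Y_from_impedance = Z1 + Z2
Y_from_source = V_s / I

print(X_from_impedance, X_from_voltage)  # 7/10 7/10
print(Y_from_impedance, Y_from_source)   # 17 17
```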

Representing that in the diagram we get.

We should now take a moment to consider what properties the dual circuit we want to create needs to have in order for it to meet our use case. For the sake of simplicity let’s start by saying we don’t care about the phase of the AC signals throughout the circuit, only the amplitude. This lets us get rid of the imaginary number, \(\color{imaginary}j\color{normal}\), in our equation for capacitor impedance and lets us treat all the variables throughout our system of equations as real numbers. We could handle the complex case if we wanted but for now I think it’s important we keep this example simple.

Since we’ve done away with imaginary and complex numbers let’s consider what shared variables in our system of equations need to remain fixed points when we transform it to our dual and what variables are free to change. Keep in mind all the variables in the end will always be a dual of their counterpart in the original, we just decide which variables are fixed points, and thus their own dual, and which are not fixed. Remember an element in a system can always be a dual of itself so this isn’t a problem.

Well, \(\color{voltage}V_s\color{normal}\) represents our input signal and \(\color{voltage}Gnd\color{normal}\) defines the bias, or in this case lack of a bias, applied to that signal, so we know those values must be fixed. Similarly we know we want our dual circuit to behave as an equivalent low-pass filter to the original therefore we can also assert that \(\color{voltage}V_{out}\color{normal}\) must likewise be fixed. The rest of the variables we aren’t too concerned with, so let’s start by coloring our fixed points red.

By defining a certain point as fixed this will cause this effect to propagate throughout a portion of our diagram. Essentially any variables in our diagram which are dependent solely on other fixed variables will themselves become fixed variables. To illustrate this first find any equations where either all but one variable directly connected to it is red, or where all variables are red, and make that equation red. Likewise, after doing that, take any red equation that connects to a black variable and make that variable red. Repeat this until there are no more changes to be made. Note that at no point will you change the color of anything that is in blue in the diagram.

At this point we have defined our expectations and appropriately annotated our diagram. Everything in red represents the portion of the system that will remain fixed while the black portion will be free to transform. It is important to note that not all choices for fixed points are possible or useful. For example in this case if we had also defined current, \(\color{current}I\color{normal}\), as a fixed point then there would be no valid transformation that would satisfy this; we will see why in a moment.
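The propagation rule just described can be sketched as a small fixed-point loop in Python. The equation set below is our own reconstruction of the split diagram from the equations in the text (the green capacitor equation is left out for brevity), and the names `propagate_fixed`, `blue_vars` and `blue_eqs` are hypothetical.

```python
def propagate_fixed(equations, fixed, blue_vars, blue_eqs):
    """Spread the red (fixed) color through the diagram.

    An equation turns red when at most one of its variables is not red;
    a red equation then turns its remaining black variable red.
    Blue variables and blue equations are never recolored.
    """
    red_vars = set(fixed)
    red_eqs = set()
    changed = True
    while changed:
        changed = False
        for name, variables in equations.items():
            if name in blue_eqs or name in red_eqs:
                continue
            not_red = [v for v in variables if v not in red_vars]
            if len(not_red) <= 1:
                red_eqs.add(name)
                red_vars.update(v for v in not_red if v not in blue_vars)
                changed = True
    return red_vars, red_eqs


# Assumed reconstruction of the diagram, one entry per equation.
equations = {
    "V1 + V_out - V_s = 0": {"V1", "V_out", "V_s"},
    "V2 + Gnd - V_out = 0": {"V2", "Gnd", "V_out"},
    "V2/V1 - X = 0": {"V2", "V1", "X"},
    "Z2/Z1 - X = 0": {"Z2", "Z1", "X"},
    "Z1 + Z2 - Y = 0": {"Z1", "Z2", "Y"},
    "V_s/I - Y = 0": {"V_s", "I", "Y"},
}
blue_vars = {"Z1", "Z2"}
blue_eqs = {"Z2/Z1 - X = 0", "Z1 + Z2 - Y = 0"}

red_vars, red_eqs = propagate_fixed(
    equations, {"V_s", "Gnd", "V_out"}, blue_vars, blue_eqs
)
print(sorted(red_vars))
```

Under these assumptions the red region grows to include \(V_1\), \(V_2\) and \(X\), while \(I\) and \(Y\) stay black and remain free to transform.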

In our diagram any blue equations connected with a red portion will be the equation that will dictate the transform we must use when flipping the green portion of our graph to the other shared variable. If both of the blue equations were connected to a red variable then that would mean the transform, which we will calculate in a minute, must satisfy both of these equations simultaneously. Often that is not possible; it would only be possible if both equations somehow dictated the same transform. As we will see in a minute that is not the case here so we know that only one of the two blue equations is allowed to be connected to a red segment. As such we also know that given the fixed points we already selected, annotated in red, it is not possible to also make \(\color{current}I\color{normal}\) fixed. With that said if all we really cared about was \(\color{current}I\color{normal}\) and \(\color{voltage}V_s\color{normal}\) remaining fixed, and allowed the other variables to transform, then that would be doable as well. But if we did that we would be solving a very different problem since we are trying to create a dual which is feature-preserving with regards to still being a low-pass filter.

The final step is to figure out the transformation the left-hand blue equation will dictate for a flip across \(\color{impedance}Z_2\color{normal}\) and \(\color{impedance}Z_1\color{normal}\). To do that pick one of the shared variables at either end and solve for it. In this case we will start with \(\color{impedance}Z_1\color{normal}\).

$$ \frac{\color{impedance}Z_2\color{normal}}{\color{impedance}Z_1\color{normal}} - X = 0 $$
$$ \color{impedance}Z_2\color{normal} = X \cdot \color{impedance}Z_1\color{normal} $$
$$ \color{impedance}Z_1\color{normal} = \color{impedance}Z_2\color{normal} \cdot \frac{1}{X} $$

Now solve for the other variable, in this case \(\color{impedance}Z_2\color{normal}\)

$$ \frac{\color{impedance}Z_2\color{normal}}{\color{impedance}Z_1\color{normal}} - X = 0 $$
$$ \frac{\color{impedance}Z_2\color{normal}}{\color{impedance}Z_1\color{normal}} = X $$
$$ \color{impedance}Z_2\color{normal} = X \cdot \color{impedance}Z_1\color{normal} $$

These two equations represent inverses of each other, obviously. Next let’s represent them as functions of \(X\) and also make \(\color{impedance}Z_1\color{normal}\) and \(\color{impedance}Z_2\color{normal}\) the same variable as follows.

$$ \color{impedance}Z_1(X)\color{normal} = \frac{1}{X} \cdot \color{impedance}Z\color{normal} $$
$$ \color{impedance}Z_2(X)\color{normal} = X \cdot \color{impedance}Z\color{normal} $$

Next we want to figure out what transformation we have to do to \(X\) in one function in order to produce the other function. In other words, we want to figure out what the function \(T(X)\) needs to be in the following equation.

$$ \color{impedance}Z_1(T(X))\color{normal} = \color{impedance}Z_2(X)\color{normal} \label{argtrans} $$

Since the equation here is simple it is probably already obvious, but let’s expand our functions and work through it anyway; expanding our two functions for equation \(\eqref{argtrans}\) and we get.

$$ \color{impedance}Z\color{normal} \cdot \frac{1}{T(X)} = X \cdot \color{impedance}Z\color{normal} $$

Let’s just solve for \(T(X)\) to see what the function definition is.

$$ \frac{1}{T(X)} = \frac{X \cdot \color{impedance}Z\color{normal}}{\color{impedance}Z\color{normal}} $$
$$ \frac{1}{T(X)} = X $$
$$ T(X) = \frac{1}{X} $$

We also need to find the inverse function of \(T(X)\) which would be the function that satisfies the following.

$$ \color{impedance}Z_2(T^{-1}(X))\color{normal} = \color{impedance}Z_1(X)\color{normal} $$

Another way of stating the same thing is you just take the function \(T(X)\) and solve for \(X\).

$$ T(X) = \frac{1}{X} $$
$$ y = \frac{1}{X} $$
$$ X = \frac{1}{y} $$
$$ T^{-1}(X) = \frac{1}{X} $$

As we can see, in this case the inverse of \(T(X)\) is itself. This isn’t always the case; as we discussed earlier, being its own inverse makes this transform an involution.

Now we know in order to perform the flip between the two components in the circuit we must perform the reciprocal transform on each component when we do. To do this we can flip the two nodes and insert our reciprocal transformation between each of them. If the transform were not an involution then we would perform the inverse transform on \(\color{impedance}Z_2\color{normal}\) to get \(\color{impedance}Z_{\bar{1}}\color{normal}\) and vice versa as follows.
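Both properties are easy to check numerically. The following Python sketch uses exact fractions and a handful of arbitrary sample values; the function names simply mirror the symbols above.

```python
from fractions import Fraction


def Z1(X, Z):
    # Z_1(X) = (1 / X) * Z
    return Z / X


def Z2(X, Z):
    # Z_2(X) = X * Z
    return X * Z


def T(X):
    # The reciprocal transform derived above.
    return 1 / X


Z = Fraction(50)  # arbitrary shared impedance value
samples = [Fraction(1, 10), Fraction(7, 10), Fraction(3), Fraction(25)]

# Z_1(T(X)) = Z_2(X) for every sample.
transform_ok = all(Z1(T(X), Z) == Z2(X, Z) for X in samples)

# T is its own inverse, so applying it twice returns the input.
involution_ok = all(T(T(X)) == X for X in samples)

print(transform_ok, involution_ok)  # True True
```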

$$ \color{impedance}Z_{\bar{1}}\color{normal} = T^{-1}(\color{impedance}Z_2\color{normal}) $$
$$ \color{impedance}Z_{\bar{2}}\color{normal} = T(\color{impedance}Z_1\color{normal}) $$

If we were talking about a simple voltage divider made with two resistors, where \(\color{impedance}Z_1\color{normal} = \color{resistance}100\Omega\color{normal}\) and \(\color{impedance}Z_2\color{normal} = \color{resistance}10\Omega\color{normal}\), then we could flip the position of these components, take the reciprocal of each and we would have an equivalent system, where \(\color{impedance}Z_{\bar{1}}\color{normal} = \color{resistance}\frac{1}{10}\Omega\color{normal}\) and \(\color{impedance}Z_{\bar{2}}\color{normal} = \color{resistance}\frac{1}{100}\Omega\color{normal}\), and in doing so all other variables in red will remain fixed. Meanwhile, as expected, current, \(\color{current}I\color{normal}\), will change significantly. However in this case we are working with a capacitor and a resistor and not two resistors so that changes things. Let’s illustrate our new diagram under the transform.

Let’s expand the transform function and we get the following.

We quickly notice that if we combine the reciprocal equation and the impedance equation for the capacitor, it would take the reciprocal and thus eliminate the fraction. What we wind up with at that point looks identical to the equation for an inductor. That is because the impedance of an inductor is always the dual of the impedance of a capacitor of the same value under the reciprocal transform. As such our capacitor becomes an inductor, so we can combine these two equations in our diagram and arrive at the equation for an inductor by changing the variable \(\color{capacitance}C\color{normal}\) to \(\color{inductance}L\color{normal}\), which is the convention for representing inductance. Similarly the dual of a resistor’s impedance under the reciprocal transform would be its admittance, which is just the reciprocal of impedance and for a resistor is simply the reciprocal of its resistance. Therefore we can likewise replace the resistor in our original circuit with a new resistor, placed where the capacitor used to be, whose admittance has the same value as the resistance of the old resistor. The standard variable for admittance is \(Y\) so let’s likewise make that change in our diagram.

This final diagram is our dual, where the inductance \(\color{inductance}L\color{normal}\) takes the same numeric value as the capacitance \(\color{capacitance}C\color{normal}\) in the original circuit and \(Y_{\bar{2}} = \color{impedance}Z_1\color{normal}\) as well. This now represents an inductor based low-pass filter as we illustrated in the circuit diagram earlier.

To recap, we know that the impedance equation for an inductor is the reciprocal of that of a capacitor, so in our capacitor based low-pass filter we know we can swap the position of the capacitor and the resistor, use the reciprocal value for the new resistor, and since an inductor is already the reciprocal of a capacitor, we just ensure the inductor has the same inductance in henries as the capacitor has capacitance in farads. In doing so we transform our capacitor based low-pass filter into its inductor based dual.
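The recap can be checked numerically with a short Python sketch. The component values are hypothetical, and the helper `divider_gain` is ours; it compares only the magnitude of \(V_{out}/V_s\), in keeping with the decision above to set phase aside.

```python
import math


def divider_gain(z1, z2):
    """|V_out / V_s| for a series divider with the output taken across z2."""
    return abs(z2 / (z1 + z2))


R = 1000.0      # ohms, resistor in the original circuit (hypothetical)
C = 1e-6        # farads, capacitor in the original circuit (hypothetical)
L = C           # henries: same numeric value as the capacitance
R_dual = 1 / R  # ohms: reciprocal of the original resistance

for f in (10.0, 159.0, 1000.0, 10000.0):
    w = 2 * math.pi * f
    # Original: resistor in position 1, capacitor in position 2.
    original = divider_gain(R, 1 / (1j * w * C))
    # Dual: inductor in position 1, reciprocal resistor in position 2.
    dual = divider_gain(1j * w * L, R_dual)
    print(f, original, dual)
```

The two gain columns agree at every frequency, while the current through the loop does not, since the total impedance of the dual is very different.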

The Voltage-current Dual Under Parallel-series Transformation

A voltage-current circuit dual under parallel-series transformation is another valid type of dual. In this sort of dual points in the circuit that represent voltage signals are transformed into equivalent current signals and vice versa. Every point in the original circuit that represents a voltage has an equivalent current through a mesh in the dual circuit. As such the number of nodes in the original circuit (not counting ground) is equal to the number of meshes in the dual circuit and vice versa. This is accomplished by transforming each component from its series topology to a parallel topology and vice versa.

The process for working with the system of equations that define this type of dual is similar to the process we used earlier except we are simply flipping around different parts of the graph and choosing different relationships. I encourage you to give it a try for yourself.

Under this type of dual a voltage divider becomes its equivalent, a current divider. Take the following voltage divider as an example.

Likewise here is its voltage-current dual under parallel-series transformation, a current divider.

As you can see \(\color{current}I_1\color{normal}\) becomes equivalent to \(\color{voltage}V_1\color{normal}\) and \(\color{current}I_2\color{normal}\) becomes equivalent to \(\color{voltage}V_2\color{normal}\). We can also see that the second circuit has two meshes and one node; meanwhile, the first circuit has two nodes and one mesh. More importantly, however, the ratio of the current being divided between \(\color{current}I_1\color{normal}\) and \(\color{current}I_2\color{normal}\) in the current divider is the same as the ratio between voltages \(\color{voltage}V_1\color{normal}\) and \(\color{voltage}V_2\color{normal}\) in the voltage divider.
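The ratio claim above can be checked numerically. This is a minimal sketch that assumes each series resistance in the voltage divider becomes a parallel conductance with the same numeric value in the current divider; the function names and the component values are hypothetical.

```python
def voltage_divider(v_source, r1, r2):
    """Voltages across two series resistors fed by a voltage source."""
    total = r1 + r2
    return v_source * r1 / total, v_source * r2 / total

def current_divider(i_source, g1, g2):
    """Currents through two parallel conductances fed by a current source."""
    total = g1 + g2
    return i_source * g1 / total, i_source * g2 / total

r1, r2 = 100.0, 300.0  # assumed resistances in ohms

v1, v2 = voltage_divider(12.0, r1, r2)

# Assumption: in the dual, each resistance (ohms) is replaced by a
# conductance (siemens) with the same numeric value.
i1, i2 = current_divider(12.0, r1, r2)

print(v1 / v2, i1 / i2)
```

Both ratios come out equal to the ratio of the two component values, which is the sense in which the two dividers are duals.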

The Electric-magnetic Dual under Capacitance-permeance Transformation

There is another type of circuit dual that is far more obscure, rarely understood, and almost never considered: the circuit dual created by swapping the magnetic field and the electric field, in much the same way we can create a circuit dual that swaps the voltage and current values. Every electric circuit has an equivalent magnetic circuit as its dual, though in practice it isn’t always all that useful to create such a dual. In a magnetic circuit dual of an electric circuit resistance becomes magnetic resistance, electric fields are replaced with magnetic fields, inductors behave like capacitors, capacitors behave like inductors, Electromotive Force (EMF), measured in volts, is replaced with Magnetomotive Force (MMF), measured in amps, and current is replaced with magnetic current, which is the rate of change of magnetic flux and conveniently enough has the units of volts. In other words a magnet or inductor where the field is collapsing or growing at a constant rate in a magnetic circuit is equivalent to DC in an electric circuit. Similarly a spinning magnet, which is constantly accelerating at the edges where its two poles are located (remember rotating objects are constantly accelerating at their edge by definition), would be equivalent to an AC electric circuit.

Resistance is the property of a material to impede the flow of current and reluctance is the property of a material to impede the propagation of a magnetic field. Magnetic resistance is closely related to reluctance, at least when represented as a complex number which we will calculate later on. Low reluctance, coupled with high electrical resistivity, gives us low magnetic resistance, however reluctance is a complex value and varies with frequency. Energy isn’t dissipated when a static magnetic field is applied, and thus a static magnetic field does not represent current flow in a magnetic circuit. When a changing magnetic field is applied magnetic resistance does dissipate energy from the magnetic field, so it is functionally equivalent with regard to its ability to do work just as a resistor would be in an electrical circuit. Because of this distinction even copper wires, which tend to have relatively low resistance but high reluctance, would need to be replaced with iron wires that have a higher electrical resistance but a very low reluctance. So in the end the dual of an electric circuit creates a magnetic circuit that really doesn’t look much like what we think of as a circuit at all, but functionally, and in terms of its ability to do work, it will be equivalent.

Even though in a magnetic circuit the units for voltage and current get swapped this is very different than the voltage-current dual under parallel-series transformation. In this case we aren’t actually swapping the arrangement of components but rather voltage and current become swapped in place and are represented by entirely different forces, namely those caused by the magnetic field. As a result the transform is no longer a parallel-series transform; it is called the capacitance-permeance transform, which we will explain later on.

Keep in mind there are actually two types of magnetic duals, both of which are referred to as a magnetic circuit, and they tend to be a bit different. The type we are describing here is an equivalent-work dual in the sense that if you have an electric circuit that does work, then its magnetic dual as described here will also be doing the same amount of work. This is called the gyrator-capacitor model or less commonly the capacitor-permeance model for a magnetic circuit. There is another type of magnetic circuit dual which is more often described but does not preserve work-equivalence; that is called the resistance-reluctance model.


First let’s start with an illustration to show how in a magnetic circuit the magnetic field can propagate through an iron wire in much the same way an electric field propagates through a copper wire.

Here we see a permanent bar magnet at the top of each segment and a compass at the bottom. Assume that the distance between the bar magnet at the top and the compass at the bottom is large enough that, without the iron wire to propagate the magnetic field around the outside, the bar magnet would have no effect on the compass. We see that by adding the iron wire to propagate the magnetic field we can produce a similar magnetic field at the far end where the compass is located.

Now in this image the bar magnet in each frame is intended to illustrate a stationary magnet. As such under ordinary conditions the magnet can not do any actual work, the compass will remain in its fixed position. This would be similar to an ordinary electric circuit with a voltage applied and no current flowing, with a volt meter attached in place of the compass. This is not intended to represent the circuit dual of the above but only a demonstration of how magnetic flux propagates through a magnetic conductor in much the same way electric fields propagate through an electric wire.

Before we show actual examples of circuit duals let’s cover some of the relevant properties of an electric circuit and their dual in a magnetic circuit.

| Magnetic quantity | Definition | Unit | Electric quantity | Definition | Unit |
| --- | --- | --- | --- | --- | --- |
| Magnetomotive force (MMF) | \(\color{voltage}\mathcal{F}\color{normal} = \color{voltage}\int \mathbf{H}\cdot\operatorname{d}\mathbf{l}\color{normal} \) | ampere | Electromotive force (EMF) | \( \color{voltage}V\color{normal} = \color{voltage}\int \mathbf{E}\cdot\operatorname{d}\mathbf{l}\color{normal} \) | volt |
| Magnetic field | \(\mathbf{H}\) | ampere/meter = newton/weber | Electric field | \(\mathbf{E}\) | volt/meter = newton/coulomb |
| Magnetic flux | \(\Phi\) | weber | Electric charge | \(Q\) | coulomb |
| Magnetic current | \( \color{current}\dot \Phi\color{normal} = \color{current}\frac{d\Phi}{dt}\color{normal} \) | weber/second = volt | Current | \( \color{current}I\color{normal} \) | coulomb/second = ampere |
| Magnetic impedance | \( \color{impedance}\mathcal{Z}(\omega)\color{normal} = \color{impedance}\frac{\mathcal{F}(\omega)}{\dot \Phi(\omega)}\color{normal} \) | 1/ohm = mho = siemens | Impedance | \( \color{impedance}Z(\omega)\color{normal} = \color{impedance}\frac{V(\omega)}{I(\omega)}\color{normal} \) | ohm |
| Magnetic resistance | \( \color{resistance}\mathcal{R}\color{normal} = \color{resistance}\operatorname{Re}(\mathcal{Z}(\omega))\color{normal} \) | 1/ohm = mho = siemens | Resistance | \( \color{resistance}R\color{normal} = \color{resistance}\operatorname{Re}(Z(\omega))\color{normal} \) | ohm |
| Magnetic reactance | \( \color{reactance}\mathcal{X}\color{normal} = \color{reactance}\operatorname{Im}(\mathcal{Z}(\omega))\color{normal} \) | 1/ohm = mho = siemens | Reactance | \( \color{reactance}X\color{normal} = \color{reactance}\operatorname{Im}(Z(\omega))\color{normal} \) | ohm |
| Magnetic admittance | \( \mathcal {Y}(\omega)=\frac{\color{current}\dot \Phi(\omega)\color{normal}}{\color{voltage}\mathcal{F}(\omega)\color{normal}}\) | ohm | Admittance | \( Y(\omega)=\frac{\color{current}I(\omega)\color{normal}}{\color{voltage}\mathcal{E}(\omega)\color{normal}} \) | 1/ohm = mho = siemens |
| Magnetic conductance | \( \mathcal{G} = \operatorname{Re}(\mathcal{Y}(\omega)) \) | ohm | Electric conductance | \( G = \operatorname{Re}(Y(\omega)) \) | 1/ohm = mho = siemens |
| Magnetic susceptance | \( \mathcal{B} = \operatorname{Im}(\mathcal{Y}(\omega)) \) | ohm | Electric susceptance | \( B = \operatorname{Im}(Y(\omega)) \) | 1/ohm = mho = siemens |
| Magnetic inductance | \( \color{inductance}\mathcal{L}\color{normal} = \color{inductance}\frac{\mathcal{X}(\omega)}{\omega}\color{normal} \) | farad | Inductance | \( \color{inductance}L\color{normal} = \color{inductance}\frac{X(\omega)}{\omega}\color{normal} \) | henry |
| Permeance / magnetic capacitance | \( \color{capacitance}\mathcal{C}\color{normal} = \color{capacitance}\frac{\mathcal{B}(\omega)}{\omega}\color{normal} \) | henry | Capacitance | \( \color{capacitance}C\color{normal} = \color{capacitance}\frac{B(\omega)}{\omega}\color{normal} \) | farad |
| Power | \( P = \color{voltage}\mathcal{F}\color{normal} \cdot \color{current}\bar{\dot \Phi}\color{normal} \) | watt = joule/second | Power | \( P = \color{voltage}V\color{normal} \cdot \color{current}\bar{I}\color{normal} \) | watt = joule/second |

Ohm’s law also has its dual for use in a magnetic circuit. It is called Hopkinson’s law and is defined in a similar way to Ohm’s law, simply substituting our duals for each property as noted in the above table [1].

$$ \color{voltage}\mathcal{F}\color{normal} = \color{current}\frac{d \Phi}{dt}\color{normal} \cdot \color{impedance}\mathcal{Z}\color{normal} $$

Which is sometimes simplified to just the following.

$$ \color{voltage}\mathcal{F}\color{normal} = \color{current}{\dot \Phi}\color{normal} \cdot \color{impedance}\mathcal{Z}\color{normal} $$

    \(\color{voltage}\mathcal{F}\color{normal}\) is the magnetomotive force, also called magnetic voltage, and is in ampere,
    \(\color{current}\frac{d \Phi}{dt}\color{normal}\) or \(\color{current}{\dot \Phi}\color{normal}\) is the rate of change of the magnetic flux also called the magnetic current, measured in volts, and
    \(\color{impedance}\mathcal{Z}\color{normal}\) is the magnetic impedance, which is measured in siemens, the reciprocal unit of the ohm.

As long as these equivalences are kept in mind then we can easily calculate the power of our magnetic circuit in the same way we would an electric circuit. Since this model is work-equivalent a magnetic circuit which is the dual of an electric circuit will always have the same power per component, at least when we represent the components as ideal components. Since power is a measure of the rate at which work is being done we likewise will always have the same work being done by these two circuit duals. This is defined by Joule’s Law which in a magnetic circuit is as follows [1].

$$ P = \color{voltage}\mathcal{F}\color{normal} \cdot \bar{\color{current}\dot \Phi\color{normal}} $$

    \(P\) is the power in watts,
    \(\color{voltage}\mathcal{F}\color{normal}\) is the MMF, in ampere, and
    \(\bar{\color{current}\dot \Phi\color{normal}}\) is the conjugate of the magnetic current, in volts.

Of course you only need to take the conjugate when dealing with complex numbers in the frequency domain. In the time domain, where we use real numbers, the conjugate of a real number is always itself, so this can be ignored. Keep in mind in the frequency domain these values must be in RMS form and not absolute magnitudes.
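To make Hopkinson's law and the magnetic form of Joule's law concrete, here is a minimal sketch working with RMS phasors in the frequency domain. The function names and the numeric values are assumptions chosen purely for illustration.

```python
import cmath

def magnetomotive_force(magnetic_current, magnetic_impedance):
    """Hopkinson's law in this model: F = (dPhi/dt) * Z."""
    return magnetic_current * magnetic_impedance

def complex_power(mmf, magnetic_current):
    """Joule's law for a magnetic circuit: P = F * conj(dPhi/dt)."""
    return mmf * magnetic_current.conjugate()

# Assumed RMS phasor values, for illustration only.
phi_dot = cmath.rect(2.0, cmath.pi / 6)  # magnetic current, in volts RMS
z_mag = 0.5 + 0.25j                      # magnetic impedance, in siemens

mmf = magnetomotive_force(phi_dot, z_mag)  # MMF, in amperes RMS
power = complex_power(mmf, phi_dot)

print(power.real, power.imag)
```

Because the MMF here is derived from the same magnetic current, the result reduces to the squared magnitude of the magnetic current times the magnetic impedance, so the phase of the magnetic current cancels out. In the time domain the same functions work with plain real numbers, since Python floats also provide `conjugate()`.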

One other minor consideration here is that a magnetic dual circuit will have magnetic field lines that are perpendicular to the orientation of magnetic fields in its electric dual. Similarly the magnetic field lines in the magnetic circuit will be parallel to the electric field lines in the electric circuit. For example the magnetic field lines which surround the wires in a magnetic circuit run parallel and along the wires unlike in an electric circuit where they form concentric circles around the wires.

Magnetic DC and AC Current

When we talk about magnetic DC or AC current things get a bit confusing. In practice most authors would simply avoid these terms altogether, and for good reason. But as should be obvious at this point every concept in our electric circuit model has a dual in the magnetic circuit model, so it is worth touching on this.

The reason it is a bit misleading is because in an electric circuit a DC current implies electrons are moving in a single direction at a constant rate; the electrons never speed up or slow down. However remember that in a magnetic circuit current is defined as the rate of change of flux, \(\color{current}\dot{\Phi}\color{normal} = \color{current}\frac{d\Phi}{dt}\color{normal}\). This means the dual of a DC current from an electric circuit would be a constantly increasing (or decreasing) flux, \(\Phi\), in a magnetic circuit. Such an effect could be produced if the source of the magnetic field feeding into our magnetic circuit happened to be an ideal voltage source connected to an ideal inductor. As you know, an inductor decreases its impedance in the time-domain as it charges; therefore in order to maintain a constant voltage across an inductor the current through the inductor would have to increase at a constant rate, thus resulting in a constant rate of increase in the magnetic flux it produces.

In a theoretical context where we are modeling a magnetic circuit with ideal components this sense of a DC circuit works just fine. However in the real world parasitic resistance would quickly overwhelm the ever increasing current and make any magnetic DC current that is sustained for an appreciable length of time impractical. So while we can talk about magnetic DC currents as a way of better understanding the duality, in practice, they’re not something we are likely to employ for any length of time as we would with DC in an electric circuit.

On the other hand AC magnetic circuits are perfectly fine and in fact the norm. If the same voltage source combined with an inductor was used as the power source but the voltage source was made into a sinusoidal AC voltage instead then the resulting changing magnetic flux would represent an AC magnetic circuit. This would also be equivalent to a rotating magnet with a constant angular velocity in place of the inductor and voltage source.
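The DC and AC cases described above can be sketched numerically. The example below assumes an ideal coil of \(N\) turns driven by an ideal voltage source and uses Faraday's law, \(\dot\Phi = V / N\), to integrate the flux over time. The function names, the number of turns, and the drive voltages are hypothetical.

```python
import math

def magnetic_current(voltage, turns):
    """Rate of change of flux from Faraday's law, dPhi/dt = V / N."""
    return voltage / turns

def flux_samples(voltage_fn, turns, t_end, steps):
    """Integrate dPhi/dt with the trapezoidal rule, starting from zero flux."""
    dt = t_end / steps
    flux = [0.0]
    for k in range(steps):
        a = magnetic_current(voltage_fn(k * dt), turns)
        b = magnetic_current(voltage_fn((k + 1) * dt), turns)
        flux.append(flux[-1] + 0.5 * (a + b) * dt)
    return flux

turns = 50  # assumed number of turns

# Constant voltage: the flux ramps linearly, a "DC" magnetic current.
dc = flux_samples(lambda t: 5.0, turns, t_end=1.0, steps=100)

# Sinusoidal voltage over one full period: the flux returns to where it began.
ac = flux_samples(lambda t: 5.0 * math.sin(2 * math.pi * t), turns,
                  t_end=1.0, steps=100)

print(dc[-1], ac[-1])
```

The ideal model lets the DC flux grow without bound, which is exactly why a sustained magnetic DC current is impractical with real components.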

The Magnetic Capacitor

Much as an electric capacitor is a component which acts as a reservoir able to store energy in the form of an electric field or convert that stored energy back into current in an electric circuit, a magnetic capacitor is a component of a magnetic circuit that acts as a reservoir for energy in the form of a magnetic field. Similarly, just as an electric capacitor is made of two electrically conductive plates with a material in between them which has high permittivity and high resistance, a good magnetic capacitor would consist of two plates made of material with high magnetic conductivity with a material between them which has high permeability and a low magnetic resistance, which we will cover in the next section. Remember as we pointed out in our table of duals above, permeability is to the magnetic field what permittivity is to the electric field, and electric resistance has the reciprocal units of magnetic resistance. We cover the relationship between electrical resistance and magnetic resistance in more detail in the next section but for now just keep in mind that higher electrical resistance of a material results in lower magnetic resistance for that same material. As such a good magnetic capacitor will usually be made of a material with high electrical resistance.

In practice since the wires that connect the components in a magnetic circuit tend to have low magnetic resistance by design, usually made of iron ferrite, it is typical that the plates of a magnetic capacitor are made of the same material as that which is filling the space between the plates, where you would normally expect a dielectric to be in an electric capacitor. As such a magnetic capacitor is nothing more than a block of material usually of the same construction as your magnetic wires, and there are effectively no actual plates of any kind, just the surface at each end of the block. Nonetheless the surface area of the plane at each end of the block of material determines the magnetic capacitance in much the same way the surface area of the plates in an electric capacitor would, that is, the larger the surface area the greater the capacitance in both the electric and magnetic duals of a capacitor. Similarly in much the same way a greater distance between the two plates lowers the capacitance in an electric capacitor, a greater length of the block of material making up our magnetic capacitor also lowers its magnetic capacitance.

This contrast may seem odd at first but when you think about it a bit further it actually makes a lot of sense. In an electric circuit an inductor is really just a long wire; we tend to coil them up to save some space or help direct the magnetic field, but a copper wire is just an inductor. The only real difference between an inductor and a wire is the geometry, and the specific geometry of a wire, particularly if it is very long, along with the frequency traveling through it, determines if the inductive effects are significant or not. In electric circuits when an ordinary wire happens to be long enough that the inductive effects become significant we would call that parasitic inductance. Likewise if a magnetic wire has dimensions that are sufficient to cause it to have a significant and unwanted magnetic capacitance that would simply be parasitic capacitance for a magnetic circuit. So it’s really not that odd at all that every magnetic wire is effectively a magnetic capacitor since in the electric domain every electric wire is effectively an inductor.

The fact that a magnetic capacitor is just a block of iron or iron ferrite actually makes a lot of sense when you think about it. After all, in a ferrite core inductor the core’s purpose is to act as a reservoir to hold the magnetic field in a smaller space than what would be needed for an air core. So even in an electric circuit iron ferrite acts as a magnetic field reservoir.

Note that magnetic capacitance is just another word for permeance; the two terms can be used interchangeably. Likewise permeance is also simply the reciprocal of reluctance. Usually when modeling capacitors in either an electric circuit or magnetic circuit we treat the capacitors as ideal components that experience no resistive loss; resistive loss in a capacitor is called ESR, equivalent series resistance. However in the real world if we wish to model a real electrical or magnetic capacitor we must measure their respective capacitance as complex values instead, which enables us to calculate the ESR. For the same reason we also use complex permittivity or complex permeability when calculating the capacitance/permeance, at least when we aren’t dealing with an ideal model. We cover all this in more detail in the section on magnetic resistance, but for now just keep in mind magnetic capacitance can be represented as a complex value if we wish to determine the ESR. Many of you who are used to doing models of electric circuits likely only ever considered capacitors as ideal components and thus are only familiar with real number values for capacitance; the same ideal model approach can be used here. If considering a magnetic capacitor as ideal then it is usually acceptable to simply represent its capacitance as a real number as we would with analysis of electric circuits.

The following is the equation for determining the complex magnetic capacitance of a component [2].

$$ \color{capacitance}\mathcal{C}_{_{real}}\color{normal} = \color{capacitance}\mu\cdot\frac{S}{L}\color{normal} \label{realcap} $$

    \(\color{capacitance}\mathcal{C}_{_{real}}\color{normal}\) is the complex magnetic capacitance, also called permeance, the unit of which is the Henry,
    \(\color{permeability}\mu\color{normal}\) is the complex permeability [2] of the material at the given frequency,
    \(S\) is the cross sectional surface area of the material, and
    \(L\) is the length of the material.

If we wish to calculate the real-value magnetic capacitance of an idealized capacitor the equation is the same except \(\color{permeability}\mu\color{normal}\) becomes a real number; specifically it loses its real component and only represents the magnitude of the imaginary part of the complex permeability [1]. The specific equation would then look like the following.

$$ \color{capacitance}\mathcal{C}_{_{ideal}}\color{normal} = \color{capacitance}\mu'' \frac{S}{L}\color{normal} \label{idealcap} $$

    \(\color{capacitance}\mathcal{C}_{_{ideal}}\color{normal}\) is the idealized real-value magnetic capacitance, and
    \(\mu^{\prime\prime}\) is the imaginary component of the complex permeability [2] of the material at the given frequency.

Specifically \(\mu^{\prime\prime}\) relates to the complex permeability as follows.

$$ \color{permeability}\mu\color{normal} = \color{permeability}\mu' - j\mu''\color{normal} $$

The following diagram shows an example demonstrating the above equation for a block of material.

Keep in mind that whether you use the complex capacitance equation above \(\eqref{realcap}\) or the idealized form \(\eqref{idealcap}\) affects how you construct the reactance and impedance equations. The ideal form drops the imaginary number and as such is always positive, so when constructing the impedance you must explicitly add the imaginary number back in. This is the approach most people are used to since capacitors are usually modeled as ideal. However in the real-world capacitance equation the imaginary number is already included in the complex value \(\color{permeability}\mu\color{normal}\), therefore when using that form of the equation to construct your impedance values you do not need to add the imaginary number back in. When using it to construct the reactance value, however, you must make sure to take the imaginary part, which will be negative, and drop the \(\color{imaginary}j\color{normal}\). Whichever approach you use, be sure to be consistent. The following demonstrates this.

$$ \color{impedance}\mathcal{Z}\color{normal} = \frac{1}{\omega\color{capacitance}\mathcal{C_{_{ideal}}}\color{imaginary}j\color{normal}} = \frac{1}{\omega\color{capacitance}\mathcal{C_{_{real}}}\color{normal}} $$
$$ \color{reactance}\mathcal{X}\color{normal} = -\frac{1}{\omega\color{capacitance}\mathcal{C_{_{ideal}}}\color{normal}} = \operatorname{Im}\left(\frac{1}{\omega\color{capacitance}\mathcal{C_{_{real}}}\color{normal}}\right) $$
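As a rough illustration of the capacitance equations above, the following sketch computes the complex and idealized magnetic capacitance of a block of material as the equations are written, and then builds the impedance and reactance from the idealized form only. The function names, the permeability values, and the block dimensions are assumptions made up for the example, not measured data.

```python
import math

def complex_permeance(mu_real, mu_imag, area, length):
    """Complex magnetic capacitance, C = mu * S / L, with mu = mu' - j*mu''."""
    mu = complex(mu_real, -mu_imag)
    return mu * area / length

def ideal_permeance(mu_imag, area, length):
    """Idealized real-valued magnetic capacitance, C = mu'' * S / L."""
    return mu_imag * area / length

# Assumed material and geometry, for illustration only.
mu_real, mu_imag = 2.5e-3, 1.0e-4  # mu' and mu'' in henry/meter
area, length = 1.0e-4, 0.05        # cross section in m^2, length in m
omega = 2 * math.pi * 10_000.0     # assumed frequency of 10 kHz, in rad/s

c_real = complex_permeance(mu_real, mu_imag, area, length)
c_ideal = ideal_permeance(mu_imag, area, length)

# Idealized form: the imaginary unit is added back in explicitly.
z_ideal = 1 / (1j * omega * c_ideal)
x_ideal = -1 / (omega * c_ideal)

# Geometry scaling: a larger area raises the capacitance, a longer block lowers it.
c_wide = ideal_permeance(mu_imag, 2 * area, length)
c_long = ideal_permeance(mu_imag, area, 2 * length)

print(c_real, c_ideal, z_ideal, x_ideal)
```

For the idealized form the reactance is simply the imaginary part of the impedance, which is why the sign is negative.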

The Magnetic Inductor and Gyrator

As discussed in the previous section in an electric circuit a capacitor is a component which acts as a reservoir for an electric field where it can store energy from the circuit as well as deliver that energy back into the circuit. Similarly in an electric circuit an inductor is a component which acts as a reservoir for a magnetic field which can store or deliver energy from, and to, the circuit. In addition a capacitor in an electrical circuit is storing the type of field, the electric field, being worked within the wires of an electrical circuit, this should be obvious as an electric voltage between two points is the measure of the difference in the strength of the electric field between those points, and electric current is the movement of electrons, an electric charge carrier. This is why we can measure the voltage across a capacitor and use that to indicate how full the capacitor happens to be. Likewise as we recently covered the magnetic capacitor stores a magnetic field instead, which is similarly the type of field inherent to a magnetic circuit. This is contrasted with an electric inductor; in an electric circuit, an inductor stores energy in the form of a magnetic field, which is not the type of field propagating through an electric circuit’s copper wires, outside of parasitic effects or inductors anyway, though even in that case the magnetic field largely manifests outside of the copper wires themselves. So in a sense the purpose of an inductor in an electric circuit is to convert the intrinsic electric field energy of the circuit into magnetic field energy, and then store it. In the case of an electric inductor this is stored in the ferrite core, in the case of an air core inductor it is stored in the space around the inductor. 
This analogy also applies when we talk about magnetic inductors, that is, their purpose is to convert the intrinsic magnetic field energy of a magnetic circuit into electrical field energy and store this energy, or to deliver that stored energy back into the magnetic circuit.

One thing that may be obvious from the description I just gave is that an electric inductor, like its magnetic inductor counterpart, is a component that is capable of converting electric energy into magnetic energy, and its magnetic inductor counterpart must reverse this process and be able to convert magnetic energy into electric energy. However we already know that an electric inductor works in both directions; just as the electric energy of the circuit can charge it, converting it to magnetic energy stored in its ferrite core, it can also take that stored magnetic energy in the ferrite core and deliver it back into the circuit in the form of electric energy. In magnetic circuits a component that is able to convert electric energy into magnetic energy, and magnetic energy into electric energy, is called a gyrator, and it’s really no different than what we think of as an electric inductor already, just working in the opposite direction. Keep in mind a gyrator is not itself the magnetic dual of an electric inductor (that dual is the magnetic inductor), but it is essential in getting there, which we will do in a moment. First let me finish explaining what a gyrator is.

Again, a gyrator is nothing more than an electrical inductor wrapped around a magnetic conductor of some type, usually a ferrite core or a piece of iron. Despite this, because it is viewed a bit differently than we tend to view an inductor used in an electric circuit it is given a special name, and even its own circuit symbol. It is best to view a gyrator as a component that converts between electrical and magnetic energy. The following is a diagram that demonstrates this.

In the above diagram we see the physical gyrator on the left hand side as an inductor wrapping around a ferrite core, with electrical leads protruding to the left, and then the ferrite core becomes the wires of the magnetic circuit. The leads to the magnetic circuit extend to the right of the diagram.  On the right hand side of the diagram we see what the schematic symbol for a gyrator looks like in a magnetic circuit schematic.

One interesting point is that your instinct might be to think of the electrical current through the inductor as being translated to the magnetic current in the core, but in fact that isn’t the case. If you recall from our table above relating the duals in a magnetic circuit to those in an electric circuit you can see that the dual for the EMF (Electromotive Force), measured in volts, of an electric circuit is the MMF (Magnetomotive Force), measured in ampere [1][3], in a magnetic circuit; likewise the dual for the electric current, measured in ampere, is the magnetic current, measured in volts, and is equivalent to the rate of change of the magnetic flux in the circuit, \(\dot{\Phi}\) [3][4]. This may seem odd but it actually makes some sense when you consider the context of a gyrator. Consider Faraday’s law which relates the rate of change of the magnetic flux of an inductor with the voltage across it.

$$ \color{voltage}V\color{normal} = N \cdot \color{current}\frac{d\Phi}{dt}\color{normal} = N \cdot \color{current}\dot{\Phi}\color{normal} $$

    \(\color{voltage}V\color{normal}\) is the EMF (Electromotive Force) in volts,
    \(N\) is the number of turns of the inductor, and
    \(\color{current}\frac{d\Phi}{dt}\color{normal}\) and \(\color{current}\dot\Phi\color{normal}\) are the rate of change of the magnetic flux, also called the magnetic current, in volts.

It should be clear from this relationship that the magnetic current, \(\color{current}\dot\Phi\color{normal}\), is simply the voltage across the inductor divided by the number of turns in the inductor.

$$ \color{current}\dot\Phi\color{normal} = \frac{\color{voltage}V\color{normal}}{N} $$

Since \(N\) is a dimensionless quantity we can presume that \(\color{current}\dot\Phi\color{normal}\) uses the same units as \(\color{voltage}V\color{normal}\), volts. Similarly the Magnetomotive Force, MMF, of an inductor is defined as the current through the inductor multiplied by the number of turns of the inductor [1].

$$ \color{voltage}\mathcal{F}\color{normal} = N \cdot \color{current}I\color{normal} $$

    \(\color{voltage}\mathcal{F}\color{normal}\) is the MMF (Magnetomotive Force) in ampere,
    \(N\) is the number of turns of the inductor, and
    \(\color{current}I\color{normal}\) is the current through the inductor in ampere.
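The two relationships above, Faraday's law and the definition of MMF, are all that is needed to sketch an ideal gyrator in code. The function name and the values below are assumptions for illustration. The point of the example is that the power on the electric side equals the power on the magnetic side, which is what makes this model work-equivalent.

```python
import math

def gyrator(voltage, current, turns):
    """Map the electric side (V, I) of an ideal gyrator to its magnetic side."""
    magnetic_current = voltage / turns  # dPhi/dt = V / N, in volts
    mmf = turns * current               # F = N * I, in amperes
    return magnetic_current, mmf

# Assumed electric-side values and number of turns.
v, i, n = 12.0, 0.5, 40

phi_dot, mmf = gyrator(v, i, n)

electric_power = v * i          # P = V * I
magnetic_power = mmf * phi_dot  # P = F * dPhi/dt (real values, no conjugate needed)

print(phi_dot, mmf, math.isclose(electric_power, magnetic_power))
```

Note that the voltage maps to the magnetic current and the current maps to the MMF, so the factors of \(N\) cancel in the product.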

One problem with the standard diagram for a gyrator as shown above is it doesn’t actually indicate what side is the electric side housing the leads into the electric inductor, and what side is the magnetic side. This is somewhat intentional as it is intended to be a generic component acting as a black box which inverts voltage and current from one side onto the inverse on the other. Since in the real world the distinction between the two halves of a gyrator is important let’s use the following block diagram for a gyrator instead so I can explicitly illustrate the electric side, on the left in this block diagram, and the magnetic side, on the right.

Here is the cool part, if we put two gyrators in series connected by the magnetic side, and optionally adjust the number of turns in the inductor, we have a plain old electric transformer with an iron or iron ferrite core, such as the following.

Likewise if we connect two gyrators in series but this time connect them on their electric side, again optionally varying the number of turns, we would have the magnetic circuit dual of a transformer.
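The electric transformer built from two gyrators joined on their magnetic side can be sketched as follows. This assumes an ideal, lossless core so that the same magnetic current and MMF reach the second winding, and it ignores sign and winding-direction conventions. The function names and values are hypothetical.

```python
import math

def electric_to_magnetic(voltage, current, turns):
    """First gyrator: electric side to magnetic side."""
    return voltage / turns, turns * current

def magnetic_to_electric(magnetic_current, mmf, turns):
    """Second gyrator: magnetic side back to an electric side."""
    return turns * magnetic_current, mmf / turns

def ideal_transformer(v_primary, i_primary, n_primary, n_secondary):
    """Two gyrators sharing one core, assumed lossless."""
    phi_dot, mmf = electric_to_magnetic(v_primary, i_primary, n_primary)
    return magnetic_to_electric(phi_dot, mmf, n_secondary)

# Assumed primary values with a 100:10 turns ratio.
v2, i2 = ideal_transformer(120.0, 2.0, 100, 10)

print(v2, i2, math.isclose(120.0 * 2.0, v2 * i2))
```

The familiar turns-ratio behavior falls out directly: the voltage scales with the ratio of secondary to primary turns while the current scales with its inverse, so power is preserved.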

So by now you might be wondering how all this ties into the magnetic circuit dual of an inductor, the magnetic inductor. Well as described in the introduction to this section a magnetic inductor is a component that converts the magnetic energy in a magnetic circuit into an electric field and stores it, and is able to deliver that energy back into the circuit. To do that we simply use a gyrator to convert the magnetic energy into electric, and then use an electric capacitor hooked up to the leads in order to act as the reservoir for the energy stored as an electric field. The following diagram would illustrate a magnetic inductor.
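To see why a gyrator terminated in an electric capacitor behaves like an inductor from the magnetic side, the sketch below chains the gyrator relationships with the capacitor's admittance. The result, a magnetic inductance of \(N^2 C\) in Farads, is my own derivation under the assumption of an ideal gyrator and an ideal capacitor, and is not stated in the text above. The names and values are illustrative.

```python
import math

def magnetic_inductance(turns, capacitance):
    """Magnetic inductance seen from the core, N^2 * C, in farads."""
    return turns ** 2 * capacitance

def magnetic_impedance_formula(turns, capacitance, omega):
    """Z = j * omega * N^2 * C, in the same form as an electric inductor."""
    return 1j * omega * magnetic_inductance(turns, capacitance)

def magnetic_impedance_chain(turns, capacitance, omega):
    """Derive Z = F / (dPhi/dt) step by step through the gyrator."""
    phi_dot = 1.0                       # unit magnetic current phasor
    v = turns * phi_dot                 # Faraday's law: V = N * dPhi/dt
    i = 1j * omega * capacitance * v    # capacitor: I = j * omega * C * V
    mmf = turns * i                     # MMF: F = N * I
    return mmf / phi_dot

n, c = 20, 1.0e-6          # assumed turns and capacitance in farads
omega = 2 * math.pi * 50   # assumed frequency of 50 Hz, in rad/s

z_formula = magnetic_impedance_formula(n, c, omega)
z_chain = magnetic_impedance_chain(n, c, omega)

print(magnetic_inductance(n, c), z_formula, z_chain)
```

The impedance is purely imaginary, positive, and proportional to frequency, which matches the table entry giving magnetic inductance the unit of the Farad.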

Magnetic Impedance, Resistance, and Reactance

Magnetic resistance is the dual of electrical resistance; just as electrical resistance impedes the flow of electric current by converting its energy into heat, magnetic resistance impedes the flow of a changing magnetic field, magnetic current, by converting some portion of that energy into heat.

In the resistance-reluctance model of a magnetic circuit, which is not the model we have been discussing, the dual of resistance is simply reluctance. Reluctance does change with frequency, and can attenuate the propagation of magnetic flux through it. Materials with high reluctance will show the magnetic flux declining at a greater rate due to distance. However this is not necessarily due to energy dissipation of the magnetic field itself; particularly when dealing with a static magnetic field, no energy is lost due to reluctance. Reluctance of a material will affect a static magnetic field such as a permanent magnet and not just changing fields. Since it doesn’t necessarily reduce the quantity of energy in the system, then by definition, it is not resistance.

In the model for magnetic circuits we are discussing, the capacitor-permeance model, the dual of electrical resistance is called magnetic resistance. Bear in mind that in this particular model both magnetic current and magnetic resistance mean something different than in the resistance-reluctance model. In the resistance-reluctance model magnetic current is the magnetic field itself, and as such a static field, such as with a permanent magnet, constantly has current, and reluctance impedes that current. However as mentioned earlier, because that model is not work-equivalent to an electric circuit, the presence of current or resistance in that case does not imply any loss of energy as we normally would think of when we talk about a current relative to resistance. This is the reason the resistance-reluctance model is rarely used; it isn’t consistent with the expected meaning of those terms. However in the capacitor-permeance model we have been discussing the current is not the magnetic field but rather the rate of change of the magnetic field, and resistance is not reluctance but magnetic resistance, which is a material’s ability to convert a changing magnetic field into heat. This is analogous to the dielectric loss of a capacitor in a changing electric field, which we often see in capacitors and which gives rise to their ESR, and thus energy loss as heat. In the case of magnetic resistance it is the same effect but with regards to the magnetic field instead.

Remember from earlier in the section where we discussed magnetic capacitors that we pointed out that a magnetic capacitor is the same as a magnetic wire, a ferrite core usually, with dimensions sufficient for the capacitance to be relevant, and like electric capacitors a magnetic capacitor is designed using materials that ensure it has a low magnetic resistance, which implies it will also have a high electrical resistance. So the only real difference between a magnetic capacitor and a magnetic wire is that the material and dimensions of a magnetic wire are selected to minimize both magnetic resistance and magnetic capacitance, while a magnetic capacitor has materials and dimensions selected to minimize magnetic resistance but maximize magnetic capacitance. In the same sense a magnetic resistor is no different; it is a block of material representing the component where the materials and dimensions are selected so as to have the magnetic resistance match the intended resistance value while minimizing magnetic capacitance. This is not all that different from an electric resistor in the sense that its materials and dimensions are selected to match a particular electrical resistance while minimizing its electrical inductance. This is why we don’t use long copper wires for resistors; if the wire is long enough it can have the desired electrical resistance but its inductance would be too high to be useful, so we usually try to design electric resistors to be as short as possible.

Before we can understand magnetic resistance I think it’s important to understand the analogous concept of electrical resistance of materials and how it arises at various frequencies, especially since most of the time it is grossly oversimplified and misunderstood even in electrical circuits. Contrary to typical thinking electrical resistance isn’t simply a static property where materials have inherent resistance, although that can sometimes be the case. A resistor for example always has about the same resistance; sure it varies a little with heat or other conditions, but for the most part the resistance will stay about the same regardless of frequency or other influences. But a resistor is just one very straightforward and simple example of how electrical resistance may present in a circuit. In reality resistance is a much more generalized idea; electrical resistance represents the ability of a material or component to convert current into energy that escapes the circuit in some way. In the case of an electrical resistor it is converting energy from the electrical current, and thus reducing the flow of current by doing so, and releasing that energy into the surrounding air in the form of heat. The energy contained in the heat released will be the same quantity of energy as the energy that was removed from the circuit.

When we talk about power in a circuit, more specifically real power, which is the real value component of power when power is represented as a complex number, we are talking about the rate at which a circuit does work. This is another way of saying, in the case of resistors anyway, the rate at which energy in the circuit is converted into heat energy. This definition is the same whether we are talking about magnetic circuits or electric, therefore the real measured resistance of a circuit also means the same thing in both an electric and a magnetic circuit, that is, the ability to convert their respective forms of current into heat. Where the difference between these crops up is in the nuance, in realizing that measured real world resistance is not completely determined by circuit resistance due to resistivity. Let me give a real world example of this, again, in the simple to understand electrical circuit model. If you have an antenna and that antenna is well tuned to the frequency of the signal in a circuit it will typically be said to have a 50 ohm impedance; this is a real value impedance, therefore it is resistance, not reactance. In other words at that particular frequency it is behaving as if it were a physical 50 ohm resistor. The difference is that while a resistor is doing work that causes heat to be produced the antenna is doing work that causes RF waves to be propagated out into space. But in terms of your instruments, if you measure the resistance in the circuit due to the antenna at the given frequency it will be real (not imaginary) and will look indistinguishable from a resistor; the same energy is being lost at the same rate as in a resistor but simply as RF energy rather than heat. In other words the circuit has real resistance despite the fact that the component itself has 0 resistivity; if you measured resistance using an ohm meter, which measures DC resistance only, it would not be the same. 
As an important side note, the antenna would be distinguishable, however, from a capacitor or inductor with a 50 ohm impedance. In that case the impedance of a capacitor or inductor at 50 ohms would be entirely imaginary, not real, and as such would not look at all like the antenna or a resistor in our case; we would call this reactance, not resistance. Because capacitors and inductors have minimal real resistance they don’t do any work, no energy is released from the circuit, instead the energy is stored in the capacitor or inductor and usually injected back into the circuit at a later point, which is not at all the same thing.
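To make the distinction between real resistance and reactance a bit more concrete, here is a minimal Python sketch. The 10 volt RMS source and the helper name `real_power` are assumptions picked purely for illustration; the only point is that a purely real 50 ohm impedance removes energy from the circuit, whether as heat or as RF, while a purely imaginary 50 ohm impedance does not.

```python
def real_power(v_rms: float, z: complex) -> float:
    """Real power, in watts, delivered to impedance z by an RMS voltage.

    Only the real part of the impedance does work; the imaginary part
    (reactance) stores energy and hands it back later in the cycle.
    """
    return v_rms ** 2 * z.real / abs(z) ** 2


v = 10.0  # hypothetical RMS voltage across the component, in volts

antenna = complex(50, 0)     # well tuned antenna: purely real 50 ohms
resistor = complex(50, 0)    # physical 50 ohm resistor
capacitor = complex(0, -50)  # ideal capacitor with 50 ohms of reactance

print("antenna  :", real_power(v, antenna), "W leaving as RF")
print("resistor :", real_power(v, resistor), "W leaving as heat")
print("capacitor:", real_power(v, capacitor), "W leaving the circuit")
```

From the circuit's point of view the antenna and the resistor are identical here, while the capacitor dissipates nothing even though the magnitude of its impedance is the same 50 ohms.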

So now that we understand that the real measured resistance of a circuit at a given frequency is not the same as the resistance of the circuit at DC, or its resistivity, let’s try to express how we define that in the electric world using some actual math. This should give us an understanding that we can use to see how this is similar to the same sorts of equations in our magnetic circuit dual. A great example of this in electric circuits is the ESR, equivalent series resistance, that arises when we add a capacitor to a circuit that uses a dielectric material other than a vacuum. Bear in mind a capacitor is very similar to an open circuit where the high resistance is due to a dielectric rather than air. In this case the ESR arises due to two factors. One is the actual real resistance of the material at DC, which tends to be very high, and as such has minimal effect on ESR. The other is due to losses caused by a changing electric field through the dielectric material; it is this part of the real resistance which will vary significantly with frequency, and as such the ESR of a capacitor also varies significantly with frequency.

Keep in mind that because in a magnetic circuit wires, capacitors, and resistors are all modeled similarly we will find the equations we go over when talking about electrical capacitors will have direct relevance in describing magnetic resistance, so just bear with me and this will all make sense by the end.

The loss rate of energy in a material at a particular frequency is called the loss tangent, also called the dissipation factor; it is a unitless ratio and is almost always a positive value. A value of infinity indicates that all the energy it takes from the circuit is lost. For example a resistor has a loss tangent that is extremely large at most frequencies since almost all of the energy it consumes is lost as heat. Likewise a perfectly resonant antenna presenting as 50 ohms real resistance at a given frequency has a similarly extremely large loss tangent at that given frequency; though unlike the resistor the loss tangent will vary as the frequency changes and the antenna is no longer in tune. Similarly an ideal capacitor or inductor, which by definition would have an ESR of 0, has a loss tangent value of 0 because all the energy consumed by the capacitor or inductor is stored in its respective field and is not lost to the circuit; at a later point in the AC cycle that energy can be, and often is, recovered by injecting it back into the circuit as the component discharges. In reality most real world components exhibit a finite, non-zero, loss tangent. In addition there are several types of loss tangent, and more generally energy loss, for any component, at a given frequency, which combine to give the overall loss tangent. For example there is a loss tangent due to the magnetic field, a loss tangent due to the electric field, and energy loss due to resistivity, which can be measured as DC resistance. In some cases, depending on how deep your analysis is going, each of these loss tangents may be broken down further into multiple loss tangents that add up as well, but we won’t go that deep here. 
Typically in many real world applications only one of the two types of loss tangent mentioned is dominant or relevant; for example in an electrical capacitor the majority of the loss is due to the dielectric’s interaction with the changing electric field, since that is the dominant field passing through the dielectric. Similarly for an electrical inductor it is mostly the magnetic loss that is dominant, and for an electrical resistor it is mostly loss due to resistivity. However even though only one tends to dominate for each component, in reality all three do have some effect in order to add up to the total loss tangent.

As mentioned for an electrical capacitor the dominant loss tangent is the electric loss tangent, however even though the loss due to resistivity is minimal it does have an effect, and that effect becomes significant if the dielectric is not a very good resistor. When we analyze a capacitor we usually consider the combined effect of the electric loss tangent and the resistive loss, which is called the dielectric loss tangent. The following is the definition for the dielectric loss tangent5:

$$ \tan{(\delta_d)}  = \frac{\omega\varepsilon'' + \sigma}{\omega\varepsilon'} \label{dieleclosstan} $$

    \(\omega\) is the angular frequency of the signal,
    \(\sigma\) is the DC conductivity of the material, and
    \(\varepsilon'\) and \(\varepsilon^{\prime\prime}\) are components of the complex permittivity, \(\color{permittivity}\varepsilon\color{normal}\), of the material such that:

$$ \color{permittivity}\varepsilon\color{normal} = \color{permittivity}\varepsilon' - j\varepsilon''\color{normal} $$

As mentioned the dielectric loss tangent present in equation \(\eqref{dieleclosstan}\) is the combination of two different types of loss, the loss due to the changing electric field’s interaction with the dielectric, this is determined by the complex permittivity of a material, and loss due to resistivity, the DC resistance loss, which is the result of a dielectric having some finite resistance. When the resistance of the dielectric approaches infinity, or to put it another way when the conductivity, \(\sigma\), of the dielectric material approaches 0, the limit of this will give us the electric loss tangent5. Since most dielectrics used in capacitors have low conductivity the electric loss tangent is usually a good approximation in that case.

$$ \tan{(\delta_e)}  = \lim_{\sigma \to 0} \frac{\omega \varepsilon'' + \sigma} {\omega \varepsilon'} \label{diaelectloss} $$
$$ \tan{(\delta_e)}  = \frac{\varepsilon''}{\varepsilon'} \label{eleclosstan} $$

In this context we would call \(\tan{(\delta_e)}\) the electric loss tangent, while \(\delta_e\) we would call the electric loss tangent angle.
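As a quick numeric sketch of the two definitions above, the following Python evaluates the dielectric loss tangent for a shrinking conductivity and compares it against the electric loss tangent. The permittivity components, the frequency, and the function names are hypothetical values chosen only for illustration, not measurements of any real dielectric.

```python
import math

EPS0 = 8.8541878128e-12  # vacuum permittivity, in farads per meter


def dielectric_loss_tangent(omega, eps_real, eps_imag, sigma):
    """tan(delta_d) = (omega * eps'' + sigma) / (omega * eps')."""
    return (omega * eps_imag + sigma) / (omega * eps_real)


def electric_loss_tangent(eps_real, eps_imag):
    """tan(delta_e) = eps'' / eps', the limit as sigma goes to 0."""
    return eps_imag / eps_real


# Hypothetical dielectric measured at 1 MHz.
omega = 2 * math.pi * 1e6
eps_real = 4.0 * EPS0    # eps'
eps_imag = 0.02 * EPS0   # eps''

tan_e = electric_loss_tangent(eps_real, eps_imag)
for sigma in (1e-6, 1e-9, 1e-12, 0.0):  # DC conductivity, in S/m
    tan_d = dielectric_loss_tangent(omega, eps_real, eps_imag, sigma)
    angle = math.degrees(math.atan(tan_d))  # the loss tangent angle
    print(f"sigma={sigma:g}  tan(delta_d)={tan_d:.6f}  "
          f"angle={angle:.4f} deg  tan(delta_e)={tan_e:.6f}")
```

As the conductivity shrinks, the dielectric loss tangent settles onto the electric loss tangent, which is why the simpler expression is usually good enough for a low conductivity dielectric.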

The dielectric loss tangent for a component is the result of many different nuanced and complicated effects including, but not limited to, polarization of a dielectric, concentration of current due to the skin effect, inadvertent dielectric coupling, RF emissions, and DC resistance. The result of such complicated interactions is that it can be difficult to predict the loss tangent of an electrical capacitor at a particular frequency. However once the complex permittivity and conductivity are known we can use equation \(\eqref{dieleclosstan}\) to calculate a reasonable approximation of the overall loss tangent.

The dielectric loss tangent angle is the angle on the complex plane between the vector representing the complex impedance of the capacitor at the desired frequency and the negative axis for reactance. Remember that the real axis part of impedance represents real measured resistance at a given frequency, which in this case is the ESR, and is different from the DC resistance, however it is always less than the DC resistance; meanwhile the negative reactance part of impedance represents mostly the ideal capacitance of our capacitor. We can illustrate this with the following diagram of the complex plane.

The angle here is the dielectric loss tangent angle specifically, which would be equal to the electric loss tangent angle if the dielectric has negligible conductivity. If we were measuring the magnetic loss tangent angle then the angle would be measured relative to the positive reactive axis instead. As such any loss tangent angle is always a positive value between 0 and 90 degrees inclusive6. From this diagram it should also become obvious that the tangent of the dielectric loss tangent angle is the ratio of the ESR over the reactance as a positive value. If this isn’t immediately obvious to you try to remember your geometry classes where the tangent of the angle is defined as the ratio of the opposite side from the angle (ESR) divided by the adjacent side (reactance). Therefore we can define the following equation.

$$ \color{impedance}Z\color{normal} = \color{resistance}R_{_{ESR}}\color{normal} + \color{reactance}X\color{imaginary}j\color{normal} $$
$$ \tan{(\delta)} = \frac{\color{resistance}R_{_{ESR}}\color{normal}}{|\color{reactance}X\color{normal}|} \label{losstan} $$

    \(|\color{reactance}X\color{normal}|\) is the positive value of the impedance’s reactive component, it should not include the imaginary unit \(\color{imaginary}j\color{normal}\).

Notice that in the above equation we did not specify the \(e\) subscript, that is because the same equation will work for both the loss tangent of a capacitor or an inductor owing to the fact that each respective version would always take the loss tangent angle relative to either the negative reactive axis or the positive. So this works equally well for either situation.
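The same relationship can be sketched in a few lines of Python using the built-in complex type. The impedance values and function names below are hypothetical and only meant to show that a capacitor and an inductor with the same ESR and the same magnitude of reactance give the same loss tangent, since the sign of the reactance is dropped by the absolute value.

```python
import math


def loss_tangent(z: complex) -> float:
    """tan(delta) = R_ESR / |X| for a complex impedance z = R + Xj."""
    if z.imag == 0:
        return math.inf  # purely resistive: every bit of energy is lost
    return z.real / abs(z.imag)


def loss_angle_degrees(z: complex) -> float:
    """The loss tangent angle, between 0 and 90 degrees inclusive."""
    return math.degrees(math.atan(loss_tangent(z)))


z_capacitor = complex(2.0, -200.0)  # hypothetical: 2 ohm ESR, capacitive
z_inductor = complex(2.0, 200.0)    # hypothetical: 2 ohm ESR, inductive
z_resistor = complex(50.0, 0.0)     # hypothetical: purely resistive

for name, z in (("capacitor", z_capacitor),
                ("inductor", z_inductor),
                ("resistor", z_resistor)):
    print(f"{name}: tan(delta)={loss_tangent(z)}  "
          f"angle={loss_angle_degrees(z):.3f} deg")
```

The purely resistive case is handled separately because the ratio is undefined when the reactance is zero; treating it as infinity matches the earlier description of a resistor's loss tangent.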

At this point all we know is the ratio of real resistance to reactance of a material at the given frequency, but much like conductivity and resistivity you can’t really use this information to define the impedance or resistance of a component; for that you need to know its physical dimensions or its reactance. So the loss tangent is a property of the material and at a given frequency is more or less the same regardless of the size of the material, as are the complex permittivity and conductivity that define the dielectric loss tangent. Therefore to actually go from this to the impedance value we need to take a few extra steps.

Let’s start by defining the reactance of a capacitor relative to its capacitance.

$$ \color{reactance}X\color{normal} = \color{reactance}-\frac{1}{\omega C_{_{ideal}}}\color{normal} $$

From this we know that we need a capacitor’s reactance and dielectric loss tangent in order to calculate the ESR as follows.

$$ \color{resistance}R_{_{ESR}}\color{normal} = \left\lvert\color{reactance}-\frac{1}{\omega C_{_{ideal}}}\color{normal}\right\rvert \cdot \tan{(\delta_e)} $$

Since we are trying to cancel out the denominator in equation \(\eqref{losstan}\), which is an absolute value, we should also take the absolute value of our reactance here. We can also define the ESR in terms of physical dimensions of the capacitor instead with the following equation.

$$ \color{capacitance}C_{_{real}}\color{normal} = \color{permittivity}\varepsilon\color{normal}\frac{A}{D} $$

    \(\color{capacitance}C_{_{real}}\color{normal}\) is the complex capacitance, in farads,
    \(A\) is the area of overlap of the two plates, in square meters,
    \(\color{permittivity}\varepsilon\color{normal}\) is the complex absolute permittivity of the dielectric, and
    \(D\) is the separation between the plates, in meters.

One important point here, is that when dealing with an ideal capacitor the complex absolute permittivity, \(\color{permittivity}\varepsilon\color{normal}\), would have a finite imaginary component, representing the reactance, but the real component would effectively be infinite, representing a resistance of 0. The real component would be where the dielectric loss would be represented if it were a finite value. If we used a capacitors real-world complex absolute permittivity, this would similarly result in a complex-valued capacitance, which could be used as an alternative route to calculating the ESR from the approach we are currently taking. Since most people are used to working with circuit models that represent capacitors as ideal they are also used to a simplified equation here where the permittivity is represented as a real number which is the magnitude of the imaginary component of the complex permittivity; this in turn causes the capacitance to be a real number as well. This is the reason the equation for the impedance of a capacitor is usually represented as \(\color{impedance}\frac{1}{\omega C j}\color{normal}\); basically the imaginary number needs to be added back in. If you use the complex-valued capacitance for \(\color{capacitance}C\color{normal}\) the imaginary number is already a part of the capacitance value and wouldn’t be added in when calculating the complex impedance. In our situation we are accounting for the ESR through the electric loss tangent side of the equation, and we are trying to find the purely imaginary part of the complex capacitor, the reactance, so instead we would want to use the equation for an ideal capacitor, which would be as follows.

$$ \color{capacitance}C_{_{ideal}}\color{normal} = \color{capacitance}\varepsilon''\frac{A}{D}\color{normal} $$

    \(\color{capacitance}C_{_{ideal}}\color{normal}\) is the positive real-valued capacitance, in farads,
    \(A\) is the area of overlap of the two plates, in square meters,
    \(\varepsilon^{\prime\prime}\) is the magnitude of the imaginary component of the complex absolute permittivity for the dielectric, and
    \(D\) is the separation between the plates, in meters.

From this we can substitute the above equation in for capacitance and arrive at our ESR as a function of the capacitor’s physical dimensions, the dielectric’s complex absolute permittivity, and the frequency of the signal.

$$ \color{resistance}R_{_{ESR}}\color{normal} = \left\lvert\color{reactance}-\frac{1}{\omega (\varepsilon''\frac{A}{D})}\color{normal}\right\rvert \cdot \tan{(\delta_e)} $$
$$ \color{resistance}R_{_{ESR}}\color{normal} = \left\lvert\color{reactance}-\frac{D}{\omega\varepsilon'' A}\color{normal}\right\rvert \cdot \tan{(\delta_e)} $$

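Likewise we can also substitute in the dielectric loss tangent from equation \(\eqref{dieleclosstan}\), which accounts for the conductivity of the dielectric as well, in place of the electric loss tangent and arrive at the following.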

$$ \color{resistance}R_{_{ESR}}\color{normal} = \left\lvert\color{reactance}-\frac{D}{\omega\varepsilon'' A}\color{normal}\right\rvert \cdot \frac{\omega\varepsilon'' + \sigma}{\omega\varepsilon'} $$
$$ \color{resistance}R_{_{ESR}}\color{normal} = \color{resistance}D \cdot \frac{\omega\varepsilon'' + \sigma}{\omega^2\varepsilon'\varepsilon'' A}\color{normal} $$
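As a sanity check on the algebra, the following Python sketch computes the ESR two ways: step by step as the magnitude of the reactance times the dielectric loss tangent, and directly from the final closed form. It follows this article's definition of \(C_{_{ideal}}\) in terms of \(\varepsilon''\); the material values, plate dimensions, and function names are hypothetical and chosen only for illustration.

```python
import math

EPS0 = 8.8541878128e-12  # vacuum permittivity, in farads per meter


def esr_step_by_step(omega, eps_real, eps_imag, sigma, area, distance):
    """ESR as |X| * tan(delta_d), following the derivation in the text."""
    c_ideal = eps_imag * area / distance   # the article's C_ideal
    reactance = -1.0 / (omega * c_ideal)   # capacitive reactance
    tan_delta_d = (omega * eps_imag + sigma) / (omega * eps_real)
    return abs(reactance) * tan_delta_d


def esr_closed_form(omega, eps_real, eps_imag, sigma, area, distance):
    """ESR from the final equation: D(w e'' + s) / (w^2 e' e'' A)."""
    return (distance * (omega * eps_imag + sigma)
            / (omega ** 2 * eps_real * eps_imag * area))


# Hypothetical capacitor: 1 cm^2 plates, 0.1 mm apart, driven at 1 MHz.
params = dict(
    omega=2 * math.pi * 1e6,
    eps_real=4.0 * EPS0,
    eps_imag=0.02 * EPS0,
    sigma=1e-9,
    area=1e-4,
    distance=1e-4,
)

print("step by step:", esr_step_by_step(**params))
print("closed form :", esr_closed_form(**params))
```

Both routes should agree to within floating point rounding, which confirms that the substitution and simplification above did not drop a term.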

Now that we have a reasonable understanding of how electrical resistance can arise due to energy loss from a changing electric field, let’s go through the same process and define the equivalent equation for the losses experienced due to a changing magnetic field. Much like the case with the electric loss tangent, demonstrated in equation \(\eqref{eleclosstan}\), the magnetic loss tangent is calculated from the complex permeability of a material. In the real world, complex permeability is also a highly complicated and nuanced property that is the result of numerous effects. Some of the factors that determine complex permeability at a particular frequency include hysteresis loss due to a static hysteresis loop, eddy current loss due to electrical resistivity, and residual loss due to magnetic domain wall and spin rotational resonances7. To give an idea of just how varied the complex permeability of materials can be, the following chart shows some measured values for the complex permeability of different ferrite cores across a range of frequencies.

One really important thing to notice in the above plot is that even at relatively low frequencies of 10 kHz ferrite cores have a relatively low real component for their permeability. Lower values of the real component indicate higher losses; therefore ferrite cores tend to be relatively poor magnetic conductors. This is one among a few reasons magnetic circuits tend to be significantly more lossy than their electric circuit equivalents and are not used very often.

As a general rule the complex permeability cannot be easily determined mathematically and it must be measured for a particular material at a specific frequency to find out what it is. However, much like the electric loss tangent, once we know the complex permeability of a material at a particular frequency we can calculate its magnetic loss tangent with the following equation7.

$$ \tan{(\delta_m)}  = \frac{\mu''}{\mu'} \label{maglosstan} $$

    \(\mu'\) and \(\mu^{\prime\prime}\) are components of the complex permeability, \(\color{permeability}\mu\color{normal}\), of the material such that:

$$ \color{permeability}\mu\color{normal} = \color{permeability}\mu' - j\mu''\color{normal} $$

However notice here that equation \(\eqref{maglosstan}\) does not include a conductivity term or an angular frequency term as it did with our dielectric loss tangent, equation \(\eqref{dieleclosstan}\). The complex permeability is still dependent on the frequency it is measured at, so we haven’t actually eliminated our \(\omega\) term, it is just that now it happens to be rolled into the \(\color{permeability}\mu\color{normal}\) term indirectly. However the conductivity term is missing due to a more nuanced reason. If we were trying to determine the loss tangent of an electrical inductor with a ferrite core then the equation would, in fact, look much like our equation for the dielectric loss tangent \(\eqref{dieleclosstan}\); it would use the complex permeability, \(\color{permeability}\mu\color{normal}\), instead of complex permittivity, \(\color{permittivity}\varepsilon\color{normal}\), and would use resistivity, \(\rho\), instead of conductivity, \(\sigma\), but otherwise would look similar and include a term relating the DC resistance of the component. However in the case of an electrical inductor the resistivity term is intended to account for the DC resistance across the inductor, and thus the resistivity is due to the copper itself, and not the resistivity of the ferrite core. This makes it semantically a bit different than the case of a capacitor where the plates are electrically connected to the dielectric and thus the conductivity of the dielectric itself determines DC resistance. In the case of an inductor the DC resistance of the ferrite core does not affect the DC resistance of the inductor itself. However despite this fact the DC resistance of the ferrite core is not absent from our magnetic loss tangent equation \(\eqref{maglosstan}\); it still plays a role but in a very different way, it is reflected in the complex permeability itself, \(\color{permeability}\mu\color{normal}\), instead. 
As mentioned earlier one of the properties that affects the complex permeability is loss due to eddy currents; this is proportional to the conductivity of the material, with materials of low conductivity/high resistivity having less loss due to eddy currents7. This is where conductivity/resistivity is hiding inside our complex permeability. However the effects are more complicated than the dielectric case because the magnitude of the eddy currents is also influenced by frequency. Also keep in mind we are talking about a magnetic circuit here, therefore, the changing magnetic field is considered directly when talking about magnetic resistance of a material, and the cause of the changing magnetic field may be manifold; it may be caused by an electrical inductor just as easily as it may be caused by a permanent magnet. Therefore the DC resistance of an electrical inductor, which is normally a consideration in determining an electrical inductor’s loss tangent, isn’t relevant here. In short, we only care about the magnetic loss tangent as represented by equation \(\eqref{maglosstan}\) without some of the complexities we faced when dealing with the dielectric loss tangent.
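Since the magnetic loss tangent only needs the two components of the complex permeability, it can be sketched in a couple of lines of Python. The relative permeability used here is a made-up value for illustration, not a measurement of any real ferrite, and the function name is my own; note the sign convention \(\mu = \mu' - j\mu''\) from the definition above.

```python
import math

MU0 = 4e-7 * math.pi  # approximate vacuum permeability, in henries per meter


def magnetic_loss_tangent(mu: complex) -> float:
    """tan(delta_m) = mu'' / mu', with mu written as mu' - j*mu''."""
    mu_real = mu.real    # mu'
    mu_imag = -mu.imag   # mu'', positive under the mu' - j*mu'' convention
    return mu_imag / mu_real


# Hypothetical core material at one specific frequency:
# relative permeability of 2000 - 40j.
mu = MU0 * complex(2000.0, -40.0)

tan_m = magnetic_loss_tangent(mu)
print("tan(delta_m) =", tan_m)
print("delta_m      =", math.degrees(math.atan(tan_m)), "degrees")
```

Because the frequency and the eddy current losses are already folded into the measured permeability, a different frequency simply means looking up a different complex value and calling the same function again.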

As stated earlier this basically means that good magnetic conductors, meaning low loss, need to have both high electrical resistivity and a complex permeability with a high real component. Not many materials naturally have both these qualities so ferrite cores are produced by encapsulating iron granules in a resistive coating and then pressing them into the desired shape for the core. This gives it a unique combination of permeability and resistance. The following chart shows the relationship between permeability and resistance of several materials to illustrate the uniqueness of ferrite core materials.

In the same way that our electrical loss tangent angle was represented as the angle relative to the negative reactive axis in the complex plane our magnetic loss tangent angle is represented by the angle in the complex plane relative to the positive reactive axis as can be seen in the following diagram.

Notice from the above diagram we are no longer calling the real portion of the complex vector ESR as we did with a capacitor. It is now referred to as magnetic resistance and represented by \(\color{resistance}R_m\color{normal}\). Similarly the earlier loss tangent equation, equation \(\eqref{losstan}\), remains the same except that \(\color{resistance}R_{_{ESR}}\color{normal}\) is replaced with \(\color{resistance}R_m\color{normal}\). Another thing to take note of here is that earlier in our table of duals we pointed out that magnetic resistance is measured in siemens, which is the reciprocal of ohms. You may think that means we need to take the reciprocal of the ESR to get the magnetic resistance, however that is not the case. Remember that in both a dielectric as well as in a ferrite core or other material, a higher resistivity of the material results in a lower ESR. In other words the ESR has a reciprocal relationship to the material’s DC resistance. This is where the reciprocal relationship in the units comes from and is therefore already represented in the equations without needing to take the reciprocal when dealing with magnetic circuits.

Similarly in the above diagram the \(\color{reactance}X_m\color{normal}\) component would be our magnetic reactance. Much like magnetic resistance the magnetic reactance has a reciprocal relationship with its electrical dual; we see a similar relationship between electrical inductive reactance and electrical capacitive reactance. Remember earlier when we said that the magnetic loss tangent is taken relative to the positive reactive axis and the electric loss tangent is taken relative to the negative reactive axis. This presented itself in equations such as equation \(\eqref{losstan}\) as taking the absolute value of the reactance. Because reactance is the imaginary component of the complex impedance, the difference in sign between the negative reactance of a capacitor and the positive reactance of an inductor is due to a reciprocal relationship; remember the reciprocal of the imaginary number is its negative. Consider the equations for determining the impedance of an electric capacitor, or that of an inductor, they are effectively reciprocals of each other.

$$ \color{impedance}Z_{_C}\color{normal} = \color{impedance}\frac{1}{\omega C j}\color{normal} = \color{reactance}-\frac{1}{\omega C}\color{imaginary}j\color{normal} $$
$$ \color{impedance}Z_{_L}\color{normal} = \color{reactance}\omega L\color{imaginary} j\color{normal} $$
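Python's built-in complex numbers make it easy to see where the sign difference comes from. This is only a small sketch; the component values, the frequency, and the function names are arbitrary choices for illustration.

```python
import math


def capacitor_impedance(omega: float, capacitance: float) -> complex:
    """Z_C = 1 / (omega * C * j) for an ideal capacitor."""
    return 1 / (1j * omega * capacitance)


def inductor_impedance(omega: float, inductance: float) -> complex:
    """Z_L = omega * L * j for an ideal inductor."""
    return 1j * omega * inductance


# The reciprocal of the imaginary unit is its negative.
print("1/j =", 1 / 1j)

omega = 2 * math.pi * 1e3                 # hypothetical 1 kHz signal
z_c = capacitor_impedance(omega, 1e-6)    # hypothetical 1 uF capacitor
z_l = inductor_impedance(omega, 1e-3)     # hypothetical 1 mH inductor

print("Z_C =", z_c)  # lands on the negative reactive axis
print("Z_L =", z_l)  # lands on the positive reactive axis
```

Moving the imaginary unit out of the denominator is what flips the sign, so the capacitor ends up on the negative reactive axis while the inductor stays on the positive one.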

The same reciprocal relationship we see between electrical resistance and magnetic resistance is likewise baked into the relationship between electrical reactance and magnetic reactance by virtue of the fact that both are represented in their positive forms by way of taking the absolute values in our equations. Since magnetic impedance is the simple addition of our magnetic resistance and magnetic reactance, it too has a similar reciprocal relationship with electrical impedance as its components have. It is for this reason that magnetic resistance, magnetic reactance, and magnetic impedance all have the reciprocal unit of the ohm, the siemens.

Now that we got that out of the way we can take the final steps in showing how we can go from the complex permeability of a material at a certain frequency to actually determining its magnetic resistance. One way we can do this is use the equation for the ideal capacitance from equation \(\eqref{idealcap}\) and then use this to calculate \(\color{reactance}\mathcal{X}\color{normal}\). We can then use the same process we used earlier in electric capacitors to find the resistance.

$$ \color{reactance}\mathcal{X}\color{normal} = \frac{1}{\omega \color{capacitance}\mathcal{C}\color{normal}} \label{finalabs} $$
$$ \tan{(\delta_m)} = \frac{\color{resistance}\mathcal{R}\color{normal}}{|\color{reactance}\mathcal{X}\color{normal}|} $$
$$ \color{resistance}\mathcal{R}\color{normal} = |\color{reactance}\mathcal{X}\color{normal}| \cdot \tan{(\delta_m)} $$
$$ \color{resistance}\mathcal{R}\color{normal} = \left\lvert\color{reactance}\frac{1}{\omega \mathcal{C}}\color{normal}\right\rvert \cdot \tan{(\delta_m)} $$
$$ \color{resistance}\mathcal{R}\color{normal} = \left\lvert\color{reactance}\frac{1}{\omega \mathcal{C}}\color{normal}\right\rvert \cdot \frac{\mu''}{\mu'} $$
$$ \color{resistance}\mathcal{R}\color{normal} = \frac{\mu''}{\omega \color{capacitance}\mathcal{C}\color{normal}\mu'} $$

Similarly we can now substitute in equation \(\eqref{idealcap}\) and get our equation relative to physical dimensions.

$$ \color{resistance}\mathcal{R}\color{normal} = \frac{\mu''}{\omega \color{capacitance}(\mu'' \frac{S}{L})\color{normal} \mu'} $$
$$ \color{resistance}\mathcal{R}\color{normal} = \color{resistance}\frac{\mu''L}{\omega\mu''\mu'S}\color{normal} $$
$$ \color{resistance}\mathcal{R}\color{normal} = \color{resistance}\frac{L}{\omega\mu'S}\color{normal} $$

We can also do the same substitution of equation \(\eqref{idealcap}\) on \(\mathcal{X}\) from equation \(\eqref{finalabs}\) as well if we want to determine the reactance component in order to construct the full impedance.

$$ \color{reactance}\mathcal{X}\color{normal} = \frac{1}{\omega \color{capacitance}(\mu'' \frac{S}{L})\color{normal}} $$
$$ \color{reactance}\mathcal{X}\color{normal} = \color{reactance}\frac{L}{\omega \mu'' S}\color{normal} $$

Therefore our full impedance would be

$$ \color{impedance}\mathcal{Z}\color{normal} = \color{resistance}\mathcal{R}\color{normal} + \color{reactance}\mathcal{X} \color{imaginary}j\color{normal} $$
$$ \color{impedance}\mathcal{Z}\color{normal} = \color{resistance}\frac{L}{\omega\mu'S}\color{normal} - \color{reactance}\frac{L}{\omega \mu'' S} \color{imaginary}j\color{normal} \label{finalimpcomplex} $$
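As a quick numeric illustration, the resistance and reactance expressions above can be evaluated directly. The following is only a minimal Python sketch under stated assumptions: the function name `magnetic_impedance` is hypothetical, and the permeability, geometry, and frequency values are made up purely to exercise the formula.

```python
import math


def magnetic_impedance(mu_real, mu_imag, length, area, freq_hz):
    """Evaluate R = L/(omega*mu'*S) and X = L/(omega*mu''*S), then Z = R - Xj."""
    omega = 2.0 * math.pi * freq_hz            # angular frequency in rad/s
    resistance = length / (omega * mu_real * area)
    reactance = length / (omega * mu_imag * area)
    return complex(resistance, -reactance)


# Hypothetical values, chosen only for illustration.
mu_0 = 4e-7 * math.pi                          # permeability of free space, H/m
z = magnetic_impedance(mu_real=1000 * mu_0, mu_imag=10 * mu_0,
                       length=0.1, area=1e-4, freq_hz=1e3)
print(z)

# The ratio R/|X| should equal mu''/mu', the loss tangent used earlier.
print(z.real / abs(z.imag))
```

The last line is a consistency check against the \(\tan{(\delta_m)}\) relation used in the derivation: dividing the resistance by the magnitude of the reactance should give back \(\frac{\mu''}{\mu'}\).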

That’s all there is to it; we now have our complex impedance. We can also pull out our permeability values and change that back into a complex permeability for a more compact equation, but it is usually more helpful to keep it in the complex number form above. But just to show that relationship we would do that as follows

$$ \color{impedance}\mathcal{Z}\color{normal} = \frac{L}{\omega S} \cdot \color{permeability}( \mu' - \mu'' j)\color{normal} $$
$$ \color{impedance}\mathcal{Z}\color{normal} = \frac{L}{\omega S} \cdot \color{permeability}\mu\color{normal} \label{finalimp} $$

    \(\color{impedance}\mathcal{Z}\color{normal}\) is the complex magnetic impedance of the component,
    \(L\) is the length of the component,
    \(S\) is the cross sectional area of the component,
    \(\omega\) is the angular frequency of the signal, and
    \(\color{permeability}\mu\color{normal}\) is the absolute complex permeability of the material.

Keep in mind that where the vertical bars that typically mean absolute value surround a complex number in the above equations, they mean to take the magnitude of that complex number. That effectively means dropping the imaginary number and taking the absolute value.

For the sake of clarity let’s show how you can derive the same equation for magnetic impedance using the non-idealized form of the equation for capacitance shown in equation \(\eqref{realcap}\). Recall in this case the imaginary number will be part of our complex capacitance, so while equation \(\eqref{finalabs}\) was for reactance, since it lacked the imaginary number, now the same equation represents impedance instead when used with a complex capacitance because our complex capacitance now provides the imaginary number.

$$ \color{impedance}\mathcal{Z}\color{normal} = \frac{1}{\omega \color{capacitance}\mathcal{C}\color{normal}} $$
$$ \color{impedance}\mathcal{Z}\color{normal} = \frac{1}{\omega \cdot \color{capacitance}(\mu \frac{S}{L})\color{normal}} $$
$$ \color{impedance}\mathcal{Z}\color{normal} = \frac{L}{\omega \color{permeability}\mu\color{normal} S} $$
$$ \color{impedance}\mathcal{Z}\color{normal} = \frac{L}{\omega S} \cdot \color{permeability}\mu\color{normal} $$

At this point our equation is identical to equation \(\eqref{finalimp}\) so it should be clear that this alternative approach works as well. It should also be clear that if we continued to expand the equation and put it in complex number form, as we did with equation \(\eqref{finalimpcomplex}\), we would see the imaginary number we expect come out of the \(\color{permeability}\mu\color{normal}\) argument. So this also serves to demonstrate nicely what I mentioned earlier: when using the real-world equation for a capacitor with complex capacitance we do not need to add the imaginary number back in manually, as this is already baked into the complex value of \(\color{permeability}\mu\color{normal}\) for us.

Examples of Magnetic Circuits

In this section I just want to show some sketches of simple magnetic circuits that show how any electric circuit can have a magnetic circuit dual. Keep in mind these may not be particularly practical in real life; generally the electrical equivalent will be more efficient. The only purpose is to show that every electric circuit can be represented with a magnetic circuit dual.

DC Magnetic Lamp

In this circuit we have an ideal voltage source connected to an inductor, which will produce a magnetic field that will constantly increase. This will give us a magnetic current that flows consistently in one direction and remains constant. Keep in mind in real life the current through the inductor will also increase steadily, so while a real life version of this circuit will work for a short time, eventually you will see smoke and fire once the current gets too high for the inductor or voltage source to handle. The purpose in sharing this is to show that DC current does work in theory in a magnetic circuit, even if it isn’t practical.

In this circuit, like other circuits, the wires are represented by the rectangular path. These would be made out of iron ferrite or plain iron. The lamp at the bottom of the circuit, however, would not be any ordinary light bulb. A light bulb in an electric circuit is just a resistive component that has its resistance selected to produce a specific power. Electric lamps work by heating due to their electrical resistance until the heat is enough to make the element glow white hot, and thus produce light. In a magnetic circuit this would be no different, a lamp would just be a block of material with a specifically chosen magnetic resistance such that it would convert the magnetic energy to heat at such a rate that it would glow white hot and produce light.

If you think about it an induction furnace is in some ways very similar to this circuit; it uses magnetic energy to heat up a piece of metal until it turns white hot and glows with light. The only differences are that an induction furnace uses an AC magnetic current instead of DC and it heats the material to the point of melting it. In a lamp you’d want to heat it enough to get it white hot but not so much that it would actually melt of course.

Magnetic High-pass Filter

Here we have a simple magnetic high-pass filter where the magnetic lamp presents a resistance and would be factored into the component values. We added a magnetic capacitor in series with the lamp; there is also a horizontal connection just before the lamp representing a magnetic resistor acting as the resistive component to form the high-pass filter. If these were ideal components then a DC magnetic current would result in the lamp not lighting at all, and as the frequency of the sine wave generator increased the lamp would glow increasingly brighter.

Magnetic Oscillator and Bandpass Filter

Finally this is the schematic for a magnetic oscillator or bandpass filter, depending on how it is used. If the power source is a sine wave generator then as the frequency approaches the resonance between the magnetic capacitor and the magnetic inductor the lamp would glow brighter; as it goes below or above the resonance the lamp would dim.

This would also work as an oscillator in the sense that if the sine wave generator produced a pulse instead of a steady sine wave, then the energy would oscillate between the capacitor and the inductor in the circuit at their resonant frequency until all the energy is dissipated by the lamp.

Permanent magnets as a Power Source

One final related point I wanted to make was on the topic of using permanent magnets as a DC power source in a magnetic circuit. Typically permanent magnets can not do work. This is because the energy stored in their magnetic field, in order to be consumed, must deplete the magnetic field. Just as an inductor would stop acting as an electromagnet if the energy in its magnetic field were consumed, so would a permanent magnet need to become demagnetized in order to consume the energy in its magnetic field. Since permanent magnets don’t usually spontaneously become demagnetized during ordinary use it isn’t typically the case that they can provide any power on their own when stationary, or indeed do any work at all.

However a permanent magnet’s field does contain energy and under some special circumstances that energy can be extracted, though it wouldn’t be practical to do so or to use them as an energy source; it is at best a curiosity to demonstrate how this is possible. In the real world the energy extracted would not be significant enough to be practical. Nonetheless, the energy that is stored in, and thus able to be recovered from, a permanent magnet’s magnetic field is exactly equal to the energy delivered by an inductor that has a current going through it such that the field strength of the inductor is equal to that of the magnet. In both these cases the amount of power they could deliver to a circuit, given the correct circumstances, would be entirely equivalent.

In the case of the inductor, as we already illustrated, all that would be necessary would be to take the charged inductor, make sure it is wound around the iron wiring of our magnetic circuit, and then let the magnetic field collapse by cutting the current to the inductor. The collapsing magnetic field would cause the flux in the magnetic circuit to decrease at a constant rate and thus produce a DC magnetic current until the energy in the inductor is exhausted.

Similarly one could insert a permanent magnet of equivalent field strength into a gap cut out of the iron wiring of the magnetic circuit and let the field of the permanent magnet collapse, creating an identical DC magnetic current. How do you get the field of a permanent magnet to collapse you may ask? Simple: just heat it up past a critical temperature called the Curie point. At that temperature the magnet loses its ferromagnetic properties and the field around it will collapse and provide all the same energy as the inductor did.

Now your first thought might be that the heat involved to get the magnet to the Curie point is in fact the source of the energy and not the magnetic field itself. But that isn’t the case and can be easily demonstrated by the fact that the energy extracted from such a scenario would be identical regardless of the amount of energy needed to raise the temperature of the magnet to the Curie point. A simple thought experiment can demonstrate this. Imagine you had a magnet where the Curie point of the magnet is infinitesimally higher than the ambient temperature in the room you are in and set up the circuit as previously described. At this point you could simply touch the magnet and the added warmth from your body will raise its temperature ever so slightly past the Curie point. This would represent an infinitesimal amount of energy transferred into the magnet, and thus the energy to do so would be negligible. Nonetheless at this point the magnetic field of the magnet would begin to collapse and the energy of the magnet would be successfully extracted as DC magnetic current.

You could literally run circuits off the energy stored in permanent magnets… well as much as you could from a charged inductor, which is to say, you won’t be able to deliver enough power to really be useful as a power source. But hey, still cool to know that magnets can be used as a power source all the same. Just don’t go getting any crackpot ideas about over unity devices or infinite energy. There is a finite and small amount of energy in a magnet and once you use it up, it’s gone, so no infinite energy anytime soon I’m afraid.

Special Thanks

Special thanks to the following people for helping me proof-read and edit this article prior to its release:

  • Faith Aydin - He submitted several corrections through GitLab catching numerous spelling and grammatical errors.


  1. González, G. G., & Ehsani, M. (2018). “Power-Invariant Magnetic System Modeling”, International Journal of Magnetics and Electromagnetism, 4(1), 1-9. doi:10.35840/2631-5068/6512. ISSN 2631-5068

  2. Popov, V. P. (1985). The Principles of Theory of Circuits (in Russian). M.: Higher School.

  3. Mohammad, Muneer (2014-04-22). An Investigation of Multi-Domain Energy Dynamics (PhD thesis).

  4. Lambert, M., Mahseredjian, J., Martinez-Duro, M., & Sirois, F. (2015). Magnetic Circuits Within Electric Circuits: Critical Review of Existing Methods and New Mutator Implementations. IEEE Transactions on Power Delivery, 30(6), 2427–2434. doi:10.1109/tpwrd.2015.2391231

  5.  2

  6. “Considerations for a High Performance Capacitor”. Archived from the original on 2008-11-19.

  7. Hong, Y.-K., & Lee, J. (2013). Ferrites for RF Passive Devices. In Recent Advances in Magnetic Insulators – From Spintronics to Microwave Applications (pp. 237–329). Elsevier. doi:10.1016/b978-0-12-408130-7.00008-3

Understanding the Reflection Coefficient

Today I hope to answer a rather complex question: What does the Reflection Coefficient mean exactly, how do we measure it, and what can we do with it once we do? For example if we have a reflection coefficient of \(0.5 \angle 30^{\circ}\) at some point in a feedline what does that mean and how can we use it?

There is a lot to unpack in such a short question, there are many things the reflection coefficient can tell us on its own, and quite a few more things it can tell us when we know a few other variables.

In the simplest English terms it means that the reflected wave is half the peak voltage of the forward wave, and that at any moment the reflected wave is 30 degrees ahead in its phase compared to the forward wave. So if measured on an oscilloscope, comparing the waves would look something like this.

Keep in mind the waves on an oscilloscope would both be moving to the right on the display at the same speed, so they will always have the same orientation relative to each other. Meanwhile the actual waves in the feedline are moving in opposite directions so their peaks are constantly moving away from each other. Consequently this is why the phase relationship between the two waves will vary depending on the position you measure it at.

Here is what the actual waves would look like in the feedline where the x-axis here would be the position on the feedline (not to be confused with the above image that you would see on an oscilloscope).

So imagining the above image is voltage we see the green wave moving in one direction on the feedline and the blue wave moving in the opposite direction. The red line is the actual voltage at the respective point on the feedline as it changes with time. The situation here is what you would see if the far end of the feedline, where the antenna should be, had either an open connection or was short circuited. The red wave we see is what we call a Standing Wave. So what we are really doing when we measure the reflection coefficient is measuring the red wave in the above image at a particular point in the feedline for voltage, then doing the same for current, and by comparing the two reconstructing the forward and reflected waves.

Now let’s talk a little about how knowing the reflection coefficient is useful and how you can calculate it.

Calculating \(\Gamma\)

As was already pointed out the reflection coefficient tells you the signal that is reflected relative to the forward signal. So per the example I gave at the beginning you would say:

$$ \Gamma = 0.5 \angle 30^{\circ} $$

The above is in polar form but it’s good to remember this is little more than a complex number closely related to phasors (both voltage and current phasors). In complex form we have:

$$ \Gamma = 0.43 + 0.25 i\mkern1mu $$

Now the first thing it can tell us other than the relationship between forward and reverse voltage signals is it can also tell us the relationship between forward and reverse current signals. The relationship being the same but of opposite sign.

$$ \Gamma = -\frac{I_{refl}}{I_{fwd}} = \frac{V_{refl}}{V_{fwd}} $$

Where \(I\) and \(V\) are their respective current and voltage phasors. Remember a phasor represents the amplitude and phase of the signal relative to some reference point, usually whatever we consider ground. So from this it tells us that in our example the reflected current signal will have an amplitude of 0.5 relative to the forward current and a phase of 210 degrees, or -150 degrees whichever you’d like.
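The polar-to-rectangular conversion and the sign flip for the current ratio can be checked numerically. This is just a small sketch using Python’s standard `cmath` module; the variable names are my own choices for illustration.

```python
import cmath
import math

# Reflection coefficient from the running example: 0.5 at 30 degrees.
gamma = cmath.rect(0.5, math.radians(30.0))
print(f"{gamma.real:.2f} + {gamma.imag:.2f}i")

# The current ratio I_refl / I_fwd is the negative of gamma.
current_ratio = -gamma
magnitude = abs(current_ratio)
phase_deg = math.degrees(cmath.phase(current_ratio))   # in the range (-180, 180]
print(magnitude, phase_deg, phase_deg % 360.0)
```

The magnitude stays at 0.5 while the phase comes out at roughly -150 degrees, which is the same angle as 210 degrees once wrapped into the 0 to 360 range.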

Calculating SWR from \(\Gamma\)

The other thing we can calculate directly from the reflection coefficient is the SWR, which is no longer a complex value, it’s a dimensionless ratio. We lose a bit of information (the complex part) in doing this conversion but it is often a useful number used in tuning radio systems. I will explain exactly how SWR is helpful in a minute but first let’s show how to calculate it.

$$ SWR = \frac{1 + \mid \Gamma \mid}{1 - \mid \Gamma \mid} $$

So again taking the initial example we would have the following SWR:

$$ SWR = \frac{1 + 0.5}{1 - 0.5} $$
$$ SWR = \frac{1.5}{0.5} $$
$$ SWR = \frac{3}{1} $$

So we would say here we have an SWR of \(3:1\) . SWR basically tells us how bad of a mismatch we have without worrying about if the mismatch is resistive or reactive. In a perfectly matched system there would be no reflected wave so your SWR is always 1:1 and thus shows us a perfect impedance match. Similarly the worst possible match we could have would be an open circuit or a short circuit, both of which would produce an infinite SWR.
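The SWR calculation is easy to wrap in a small helper. The sketch below is illustrative only; the function name `swr_from_gamma` is hypothetical, and treating a magnitude of one or more as an infinite SWR is my own handling of the open and short circuit cases described above.

```python
import cmath
import math


def swr_from_gamma(gamma):
    """Return the standing wave ratio for a complex reflection coefficient."""
    magnitude = abs(gamma)
    if magnitude >= 1.0:
        # Total reflection (open or short circuit): the SWR is unbounded.
        return math.inf
    return (1.0 + magnitude) / (1.0 - magnitude)


gamma = cmath.rect(0.5, math.radians(30.0))
print(swr_from_gamma(gamma))    # the running example, close to 3 (3:1)
print(swr_from_gamma(0.0))      # perfect match
print(swr_from_gamma(-1.0))     # short circuit
```

Note that only the magnitude of \(\Gamma\) enters the calculation, which is why the phase information is lost in the conversion.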

Now it’s important to note it only tells us about what the impedance match is at the point in the circuit we measure. A 1:1 SWR, or a reflection coefficient of 0, tells us that whatever feedline and antenna is on the load end of the meter has, as a whole, the same impedance as the feedline and transmitter system on the source side of the meter. By itself it tells us nothing about if the antenna is well matched or well tuned, or the efficiency of the system, or even what the SWR might be at any other point in the feedline. To figure out any of that we would either need to measure at multiple points or need some more information about the components in the system.

Typically an SWR meter, and therefore indirectly the reflection coefficient, is most useful when it is measured at the point where a transmitter connects to a long feedline which ultimately feeds some load (usually an antenna). A large mismatch at this point will cause any power a transmitter creates intended for the antenna to be reflected back into the transmitter at its outgoing port rather than making it onto the feedline. This causes that energy to be dissipated by the transmitter and ultimately will heat up the transmitter and in some cases can fry it. So it is important to have an SWR that is relatively low for the safety of the transmitter.

Relationship of Load and Source Impedance

From this point on I want to be clear on some terminology I am about to use. If I say “load impedance” I will be talking about the total impedance of the system from the point the reflection coefficient was measured all the way to the far end of the transmission line. This means we are talking about the impedance of that whole half of the system, usually a transmission line, antenna, and maybe even a tuner. It does not refer to just what is connected at the end of the transmission line itself (usually the antenna), we will get to that later. Similarly when I say “source impedance” I will also be talking about the whole system on the transmitting side of where the reflection coefficient was measured.

So with that said the other thing the reflection coefficient tells us is the relationship between the load impedance and the source impedance. The equation for that is as follows:

$$ \Gamma = \frac{Z_L - Z_S}{Z_L + Z_S} $$

Therefore if we have a transmitter that connects directly to our meter and the transmitter has a \(50\Omega\) antenna port on it then we know the source impedance is \(50\Omega\) and can then calculate the impedance of our load. So again going back to the initial example if given the situation I just explained we would calculate the load impedance as follows:

$$ \Gamma = \frac{Z_L - 50}{Z_L + 50} $$
$$ Z_L = \frac{-50 \cdot (\Gamma + 1)}{\Gamma - 1} $$

Note that if \(\Gamma\) is one the equation is undefined, but that would mean the load impedance is infinite, an open circuit.

$$ Z_L = \frac{-50 \cdot (0.43 + 0.25 i\mkern1mu + 1)}{0.43 + 0.25 i\mkern1mu - 1} $$
$$ Z_L = \frac{-50 \cdot (1.43 + 0.25 i\mkern1mu)}{-0.57 + 0.25 i\mkern1mu} $$
$$ Z_L = \frac{-71.5 - 12.5 i\mkern1mu}{-0.57 + 0.25 i\mkern1mu} $$
$$ Z_L \approx 97.1347 + 64.5328 i\mkern1mu $$
$$ Z_L \approx 116.6174610 \angle 33.598633^{\circ} $$
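The same arithmetic can be reproduced with Python’s built-in complex numbers. This is a minimal sketch; the function name `load_impedance` is hypothetical, and it uses the rounded rectangular value \(0.43 + 0.25 i\) from the steps above rather than the exact polar value.

```python
import cmath
import math


def load_impedance(gamma, z_source=50.0):
    """Solve gamma = (Z_L - Z_S) / (Z_L + Z_S) for Z_L."""
    if gamma == 1:
        # Total in-phase reflection corresponds to an open circuit.
        return complex(math.inf, 0.0)
    return -z_source * (gamma + 1) / (gamma - 1)


# Rounded rectangular form of 0.5 at 30 degrees, as used in the text.
z_load = load_impedance(0.43 + 0.25j)
magnitude, phase = cmath.polar(z_load)
print(z_load)
print(magnitude, math.degrees(phase))
```

For this rounded input the result is roughly \(97.13 + 64.53 i\), with a magnitude near 116.62 and a phase near 33.6 degrees, as expected for an impedance whose real and imaginary parts are both positive.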

Relationship of Feedline Length and Phase

If we know the position on the feedline that we measured the signal relative to the far end of the load, where the antenna normally is, then we can calculate a few other meaningful things. Keep in mind in the real world the speed at which an electrical signal travels through the feedline is close to the speed of light but not quite. Each feedline is a bit different and we would look at a datasheet for our particular feedline to get what is called the Velocity Factor. This is a percentage or ratio that tells us the percentage of the speed of light a wave will propagate through the feedline. So we would calculate the actual speed of our waves as follows.

$$ c = C \cdot V_f $$

Because of this not only will the wave move slower through the feedline but it will also have a shorter wavelength than what it would when propagating through a vacuum. Let’s look at the equation for wavelength real quick.

$$ \lambda = \frac{c}{f} $$

Where c is the speed of the wave through the medium as we calculated above and f is the frequency, giving us \(\lambda\) as our wavelength.

When talking about a reflection coefficient we are talking about the reflected wave relative to the forward wave. So we can consider the forward wave as our reference wave and take that as our phase reference point. We know that the reflected wave needs to travel from the point being measured to the far end of the load side and then back again, so it travels a total of twice the distance of the load side. Therefore we can calculate the phase shift with the following equation.

$$ \phi = \{ \frac{2 \cdot l_L}{\lambda} \} \cdot 360^{\circ} $$

Where \(l_L\) is the length from the point being measured to the far end of the load, \(\lambda\) is the adjusted wavelength from earlier, and \(\phi\) is the difference in phase shift of the reflected wave relative to the forward wave. Also the curly brackets are a mathematical notation saying to take the fractional part (drop the whole number and just keep the decimal). As you can see by varying the length of the transmission line on the load side we can vary the phase as we wish and thus modify our reflection coefficient to some extent.
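To make the wavelength and phase equations concrete, here is a small Python sketch. The function names are hypothetical, and the velocity factor of 0.66 and frequency of 146 MHz are assumed values chosen only as an example; a real velocity factor should come from the feedline’s datasheet.

```python
SPEED_OF_LIGHT = 299_792_458.0  # metres per second in a vacuum


def wavelength_in_feedline(freq_hz, velocity_factor):
    """Wavelength inside the feedline: (C * V_f) / f."""
    return SPEED_OF_LIGHT * velocity_factor / freq_hz


def reflected_phase_deg(load_length, wavelength):
    """Phase shift from the round trip: frac(2 * l_L / lambda) * 360."""
    round_trip_wavelengths = 2.0 * load_length / wavelength
    fractional_part = round_trip_wavelengths % 1.0
    return fractional_part * 360.0


# Hypothetical feedline: velocity factor 0.66 at 146 MHz.
lam = wavelength_in_feedline(146e6, 0.66)
print(lam)

# A load side one twenty-fourth of a wavelength long shifts the phase
# by about 30 degrees, since the wave covers that distance twice.
print(reflected_phase_deg(lam / 24.0, lam))
```

The modulo operation plays the role of the curly brackets: any whole number of round-trip wavelengths contributes nothing to the phase.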

Measuring \(\Gamma\)

One very important thing to point out here, because this is where a lot of people get things wrong. Since we are measuring a single point in the feedline we are measuring the sum of the actual forward and reverse waves at that point and we can’t measure the two waves directly; all we know is how the voltage and current are changing at that one point in the line. So to say we are measuring the reflected wave at all is a bit of a lie, we are really just measuring the voltage and current values at a single point and then reconstructing the forward and reverse waves from that. While this may confuse your current understanding this is very important because this is where almost everyone goes wrong on understanding these concepts. But keep in mind just because we can’t measure them directly, the two waves are still there. The following is a schematic showing a circuit called a Directional Coupler; this is how we would measure the forward and reverse waves at a point in the feedline.

Notice from the above schematic all we are really doing is sampling the forward current with \(X_1\) and sampling the forward voltage with \(X_2\) and then biasing the forward signal by the reflected and vice versa. This is how we reconstruct forward and reverse signals when all we know is the voltage and current at a single point.

Imagine we have a perfectly matched system where the characteristic impedance of the feedline is the same as the load and source impedance. What we would see is only a single forward moving wave, no reflected wave at all. Also, recall that a resistor always has its current in phase with its voltage; this holds true in a matched feedline as well since all the components are real resistance with no reactance. So we would expect the forward voltage wave and the forward current wave to both be in phase without any reflected wave to interfere with them. Looking back at the above schematic we see that the \(X_2\) transformer would sample the forward voltage, which would cause the FWD output to cycle through positive and negative while the other terminal would want to swing the opposite way: when FWD is high the other terminal will try to go negative. However it is biased by the reflected power, so we have to consider that as well. Since the current is in phase, the \(X_1\) transformer is similarly going to swing in phase with the FWD port, but since it is connected to the opposite terminal of \(X_1\) it will essentially cancel out and the reflected port will stay at ground. However if the phase of the current and voltage were not the same then the circuit would respond very differently and we would see a signal out of the reflected port. So really the circuit is measuring the phase difference between voltage and current and using this to reconstruct the forward and reverse waves.

As an example here is what the voltage and phase relationship would look like in a feedline with an open circuit at the antenna end:

As we know impedance in its polar form has an amplitude and a phase component just like our reflection coefficient does. The phase component of an impedance value basically just tells you if you apply a voltage signal across the device how much the voltage and current signals will be out of phase with each other. A resistor always has an impedance that is equal to its resistance and has no imaginary component, and also has a phase of 0 degrees. This agrees with what I said earlier regarding a resistor’s voltage and current always being in phase with each other. We also know that an ideal capacitor or inductor always has its current 90 degrees out of phase with its voltage.

We just learned from the above schematic that the voltage-current relationship is in fact equivalent to the forward-reflected wave relationship. One can be used to determine the other and vice versa. Therefore we know that the impedance of the antenna can affect not just the amplitude of the wave it reflects back, but also its phase.
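One way to make this equivalence concrete is with the usual transmission line relations, which are not spelled out in this article and are brought in here as an assumption: the total voltage is the sum of the forward and reflected voltage waves, while the total current is their difference divided by the reference impedance. Solving those two relations gives the sketch below; the function name `split_waves` and the sample values are hypothetical.

```python
def split_waves(voltage, current, z0=50.0):
    """Split total voltage and current phasors into forward and reflected voltage waves.

    Assumes V = V_fwd + V_refl and I = (V_fwd - V_refl) / z0.
    """
    v_forward = (voltage + z0 * current) / 2.0
    v_reflected = (voltage - z0 * current) / 2.0
    return v_forward, v_reflected


# Matched case: voltage and current in phase with V / I equal to z0.
matched_fwd, matched_refl = split_waves(voltage=50.0, current=1.0)
print(matched_fwd, matched_refl)

# Mismatched case: a hypothetical point where V / I is 150 ohms.
v_fwd, v_refl = split_waves(voltage=150.0, current=1.0)
print(v_refl / v_fwd)
```

In the matched case the reflected term cancels to zero, mirroring how the reflected port of the coupler stays at ground. In the mismatched case the ratio of reflected to forward voltage is the reflection coefficient at that point.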

Feedline as an Impedance Transformer

We mentioned earlier how the reflection coefficient can be calculated by simply knowing the total impedance on one side of the point being measured vs that on the other side. I also pointed out how the load impedance in that calculation described the entire system on the load side including the feedline and wasn’t necessarily the same as the load at the terminating end of the feedline, usually an antenna. Since we now know that the impedance of the antenna dictates not just the amplitude of the wave reflected, but also its phase, and we also know that the length of the feedline itself can shift the phase as well, it should be obvious that we can view a transmission line as an impedance transformer where the impedance of the antenna is transformed into a different impedance based on the length of the transmission line.

In essence we can tweak the load end of the transmission line by making it longer up to one wavelength and as such adjust our reflected wave’s phase to whatever value we want, thus allowing us to change the reflection coefficient we see which is equivalent to changing the load side’s impedance.

Going back to our original example, if the reflected wave is 30 degrees out of phase let’s see what would happen if we brought it in phase to 0 degrees. To do that let’s calculate the length change of the feedline we would need; we will assume we are working with a wavelength of one meter.

$$ \phi = \{ \frac{2 \cdot l_L}{\lambda} \} \cdot 360^{\circ} $$
$$ -30^{\circ} = \frac{2 \cdot l_L}{1} \cdot 360^{\circ} $$
$$ \frac{-30^{\circ}}{360^{\circ}} = 2 \cdot l_L $$
$$ \frac{-30^{\circ}}{2 \cdot 360^{\circ}} = l_L $$
$$ \frac{-1}{24} = l_L $$

So we know that if we subtract \(\frac{1}{24}\) of a meter off we will get the desired effect, or of course we could add \(\frac{23}{24}\) of a meter and get the same effect. This would change our reflection coefficient to:

$$ \Gamma = 0.5 \angle 0^{\circ} $$


$$ \Gamma = 0.5 + 0 i\mkern1mu $$

What is interesting is, as I said, this also changes what the load impedance looks like (the feedline plus antenna). Where before the impedance appeared mostly resistive with a small reactive component it now looks indistinguishable to our meter from a purely resistive load impedance, albeit still a mismatched one. If we take our impedance equation from earlier and calculate it for our new reflection coefficient we can see exactly what that would be.

$$ Z_L = \frac{-50 \cdot (\Gamma + 1)}{\Gamma - 1} $$
$$ Z_L = \frac{-50 \cdot (0.5 + 1)}{0.5 - 1} $$
$$ Z_L = \frac{-50 \cdot 1.5}{-0.5} $$
$$ Z_L = \frac{-75}{-0.5} $$
$$ Z_L = 150 $$

So we effectively changed the old impedance of the load side from \(116.62 \angle 33.60^{\circ} \Omega\) to just \(150 \Omega\), pretty neat.

Similarly we can look at this slightly differently. We can say if we know the feedline’s length, and the complex impedance of the antenna, then what would be the impedance we see if we measure the antenna through the feedline. For that the equation is as follows:

$$ Z_L =  Z_0 \cdot \frac{Z_{ANT} + Z_0 \cdot \tan(\frac{2\pi}{\lambda} \cdot l) i\mkern1mu}{Z_0 + Z_{ANT} \cdot \tan(\frac{2\pi}{\lambda} \cdot l) i\mkern1mu} $$

Where \(Z_L\) is the impedance measured through the feedline, \(Z_0\) is the characteristic impedance of the feedline, \(l\) is the length of the feedline, \(\lambda\) is the wavelength of the signal in the feedline, and \(Z_{ANT}\) is the impedance of the antenna at the far end of the feedline, or some other load.
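The formula translates almost directly into code. The following is a sketch rather than a definitive implementation: the function name `impedance_through_feedline` is hypothetical, the 100 ohm antenna and one meter wavelength are made-up example values, and no feedline loss is modeled since the formula as written has no loss term.

```python
import math


def impedance_through_feedline(z_ant, z0, length, wavelength):
    """Impedance seen through the feedline, following the formula above."""
    t = math.tan(2.0 * math.pi * length / wavelength)
    return z0 * (z_ant + 1j * z0 * t) / (z0 + 1j * z_ant * t)


# Hypothetical 100 ohm antenna on a 50 ohm feedline, wavelength of one meter.
half_wave = impedance_through_feedline(100.0, 50.0, 0.5, 1.0)
quarter_wave = impedance_through_feedline(100.0, 50.0, 0.25, 1.0)
eighth_wave = impedance_through_feedline(100.0, 50.0, 0.125, 1.0)
print(half_wave)      # very close to the antenna's own 100 ohms
print(quarter_wave)   # very close to z0 squared over z_ant, 25 ohms
print(eighth_wave)    # a complex value, close to 40 - 30i
```

The three lengths show the transformer behavior: a half wavelength repeats the antenna impedance, a quarter wavelength inverts it around \(Z_0\), and lengths in between make a purely resistive antenna look partly reactive.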

Rotations in N dimensions

Several years ago I was writing a Machine Learning paper that required me to do rotations in an arbitrary number of dimensions. As such I had an entire section of the paper devoted to explaining how that was done before moving on to the actual algorithm. Here I extracted the portion where I explain N-dimensional rotations, basically rotations in 4-dimensional space or higher, and created its own PDF out of it. I hope this will be of use to some of you to help explain the process. It’s a bit math-heavy but I am, as always, happy to answer any questions.

Hyperassociative Map Explanation


Almost 8 years ago, on Aug 15, 2009, I invented a new game-changing algorithm called the Hyperassociative Map algorithm. It was released as part of the dANN v2.x library. The HAM algorithm, as it is often called, has since been used by countless developers and in hundreds of projects. HAM is a Graph Drawing algorithm that is similar to force-directed algorithms but in a class all its own. Unlike other force-directed algorithms HAM does not have a sense of momentum or acceleration which makes it debatable if it can even still be called force-directed.

Below is a video demonstration of HAM in action. In this 3D visualization the vertices of the graph are displayed as grey spheres but the edges are not rendered. The graph’s topology is relatively simple, containing 128 nodes in groups of 16 layered such that each group is fully connected to each adjacent group. This results in 256 edges between each adjacent group. Since the groups on either end only have one other group they are adjacent to, that means there is a total of 1,792 edges. Despite this the graph aligns quickly and smoothly on a single 1 GHz processor as demonstrated in the video. It starts with randomized locations for each vertex and then aligns. After each alignment the graph is reset with new random starting positions to show that the same alignment is achieved every time.
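The edge count for this topology can be verified by building the graph explicitly. The sketch below only constructs the layered edge list described above; it is not HAM itself and does not use the dANN library, and the vertex numbering is an arbitrary choice for illustration.

```python
from itertools import product

GROUP_COUNT = 8    # 128 vertices split into groups of 16
GROUP_SIZE = 16

# Label vertices 0..127; group g holds vertices g*16 through g*16 + 15.
groups = [list(range(g * GROUP_SIZE, (g + 1) * GROUP_SIZE))
          for g in range(GROUP_COUNT)]

# Fully connect each group to the next one in the chain.
edges = []
for left, right in zip(groups, groups[1:]):
    edges.extend(product(left, right))

print(len(groups[0]) * len(groups[1]))   # edges between one adjacent pair
print(len(edges))                        # total edges in the graph
```

With 8 groups there are 7 adjacent pairs, each contributing 16 times 16 edges, which is where the total of 1,792 comes from.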

What makes HAM so special is that it retains many of the advantages that have made force-directed algorithms so popular while simultaneously addressing their shortcomings. Wikipedia describes the following advantages to using force-directed algorithms, all of which hold true for the HAM algorithm.

  • Good-quality results - The output obtained usually have very good results based on the following criteria: uniform edge length, uniform vertex distribution and showing symmetry. This last criterion is among the most important ones and is hard to achieve with any other type of algorithm.
  • Flexibility - Force-directed algorithms can be easily adapted and extended to fulfill additional aesthetic criteria. This makes them the most versatile class of graph drawing algorithms. Examples of existing extensions include the ones for directed graphs, 3D graph drawing, cluster graph drawing, constrained graph drawing, and dynamic graph drawing.
  • Intuitive - Since they are based on physical analogies of common objects, like springs, the behavior of the algorithms is relatively easy to predict and understand. This is not the case with other types of graph-drawing algorithms.
  • Simplicity - Typical force-directed algorithms are simple and can be implemented in a few lines of code. Other classes of graph-drawing algorithms, like the ones for orthogonal layouts, are usually much more involved.
  • Interactivity - Another advantage of this class of algorithm is the interactive aspect. By drawing the intermediate stages of the graph, the user can follow how the graph evolves, seeing it unfold from a tangled mess into a good-looking configuration. In some interactive graph drawing tools, the user can pull one or more nodes out of their equilibrium state and watch them migrate back into position. This makes them a preferred choice for dynamic and online graph-drawing systems.
  • Strong theoretical foundations - While simple ad-hoc force-directed algorithms often appear in the literature and in practice (because they are relatively easy to understand), more reasoned approaches are starting to gain traction. Statisticians have been solving similar problems in multidimensional scaling (MDS) since the 1930s, and physicists also have a long history of working with related n-body problems - so extremely mature approaches exist. As an example, the stress majorization approach to metric MDS can be applied to graph drawing as described above. This has been proven to converge monotonically. Monotonic convergence, the property that the algorithm will at each iteration decrease the stress or cost of the layout, is important because it guarantees that the layout will eventually reach a local minimum and stop. Damping schedules cause the algorithm to stop, but cannot guarantee that a true local minimum is reached.

However the two disadvantages described of force-directed algorithms, namely high running time and poor local minima, have been corrected in the HAM algorithm. As described earlier HAM is not a true force-directed algorithm because it lacks any sense of momentum. This was intentional as it ensures there is no need for a damping schedule to eliminate oscillations that arise from the momentum of nodes. This has the added advantage that the algorithm does not prematurely come to rest at a local minimum. It also means fewer processing cycles are wasted on modeling oscillations and vibrations throughout the network.

These properties alone already make HAM a worthwhile algorithm for general study and real-world applications, however it is important to note that HAM was originally designed with a very specific use case in mind. Originally HAM was designed to facilitate the distribution of massive real-time graph processing networks. This is the sort of scenario where each vertex in a graph has to process some input data and produce some output data, and where each vertex is part of a large interdependent graph working on the data in real time. When distributing the tasks across a cluster of computers it is critical that vertices that are highly interconnected reside on the same computer in the cluster, and physically close to the computers housing the vertices that will ultimately receive the data, process it and then carry it throughout the rest of the network. For this purpose HAM was created to model graphs such that each node in a compute cluster took ownership of tasks associated with vertices that were spatially close to each other according to HAM’s drawing of the compute graph.

In order for HAM to be successful at its job it needed to exhibit a few very specific properties. For starters the interactivity property mentioned earlier was a must. HAM needed to be able to work with a graph that is constantly changing its topology, with vertices able to be added, removed, or reconfigured in real time. This is ultimately what led the algorithm to be modeled in a way that made it similar to force-directed algorithms.

The other requirement was that the motion of the vertices had to be smooth, without any oscillations as they align. This was critical because if a vertex oscillated while it was near the border that distinguishes one compute node from another, then each oscillation across that border would cause the task to be transferred between the nodes in the compute cluster. Since this is an expensive operation it was important that, as HAM aligned, the vertices didn’t jitter and cross these borders excessively.
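To make the cost of jitter concrete, here is a small sketch that counts how often a vertex changes owner as it moves. It is only an illustration under stated assumptions: the article does not say how the borders between compute nodes are drawn, so the sketch assumes each compute node owns whatever lies nearest to a hypothetical anchor point, and it works in one dimension for brevity. The names `OwnershipSketch`, `owner` and `countTransfers` are made up for this example.

```java
// Hypothetical illustration; the anchor-based ownership rule is an assumption.
final class OwnershipSketch {
    // Index of the compute node whose anchor is nearest to the vertex position.
    static int owner(double position, double[] anchors) {
        int best = 0;
        for (int i = 1; i < anchors.length; i++) {
            if (Math.abs(position - anchors[i]) < Math.abs(position - anchors[best])) {
                best = i;
            }
        }
        return best;
    }

    // Number of times the owning compute node changes along a trajectory.
    static int countTransfers(double[] trajectory, double[] anchors) {
        int transfers = 0;
        for (int t = 1; t < trajectory.length; t++) {
            if (owner(trajectory[t], anchors) != owner(trajectory[t - 1], anchors)) {
                transfers++;
            }
        }
        return transfers;
    }

    public static void main(String[] args) {
        double[] anchors = {-1.0, 1.0}; // two compute nodes, border at 0
        double[] smooth = {-0.3, -0.1, 0.1, 0.3};
        double[] jitter = {-0.1, 0.1, -0.1, 0.1, -0.1};
        System.out.println(countTransfers(smooth, anchors));
        System.out.println(countTransfers(jitter, anchors));
    }
}
```

A vertex that glides across the border is handed over once, while one that oscillates around the border is handed over on every swing.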

Finally HAM needed to be able to be parallelized and segmented. That means that it needed to scale well for multi-threading but in such a way that each thread didn’t need to be aware of the entire graph in order to process it; instead each thread had to be capable of computing the alignment of HAM on an isolated section of the graph. This is obviously critical because of the distributed nature of the compute graph, particularly if we want something capable of unbounded scaling. I basically wanted an algorithm that could be successful on even massively large graphs.

With almost 8 years of testing it has become evident that HAM is top in its class compared to many graph drawing algorithms. Despite this it is still scarcely understood by those studying graph drawing algorithms. For this reason I wanted to write this article to share some of its internal workings so others can adapt and play with the algorithm for their own projects.

The Algorithm

In this section I want to get into the internal workings of the Hyperassociative Map algorithm, HAM. Below is the pseudocode breakdown explaining the algorithm. Notice I use some math notation here for simplicity. Most notably I use vector notation where all variables representing vectors have a small arrow above the variable and the norm, or magnitude, of the vector is represented by double vertical bars on either side, for example \(||\vec{p}||\). If you have trouble with vector notation or just want to see a concrete example the full working Java code can be found at the end of this article for reference.

\begin{algorithm}
\caption{Hyperassociative Map}
\begin{algorithmic}
% Line numbers cited in the text count from the first \PROCEDURE line and include the closing \END lines.
% equilibrium distance
\REQUIRE $\tilde{\chi} > 0$
% repulsion strength
\REQUIRE $\delta > 1$
% learning rate
\REQUIRE $\eta = 0.05$
% alignment threshold (determines when graph is aligned)
\REQUIRE $\beta = 0.005$
\PROCEDURE{HAM}{Vertex Set \textbf{as} $g$}
  \STATE \CALL{Randomize}{$g$}
  \WHILE{\CALL{AlignAll}{$g$} $> \beta \cdot \tilde{\chi}$}
    \STATE optionally recenter the graph
  \ENDWHILE
\ENDPROCEDURE
\PROCEDURE{AlignAll}{Vertex Set \textbf{as} $g$}
  \STATE $\zeta = 0$
  \FORALL{$v$ \textbf{in} $g$}
    \STATE $\vec{{\scriptsize \triangle} p} =$ \CALL{Align}{$v$}
    \IF{$||\vec{{\scriptsize \triangle} p}|| > \zeta$}
      \STATE $\zeta = ||\vec{{\scriptsize \triangle} p}||$
    \ENDIF
    \STATE \CALL{Place}{$v$, \CALL{Position}{$v$} $+ \vec{{\scriptsize \triangle} p}$}
  \ENDFOR
  \RETURN $\zeta$
\ENDPROCEDURE
\PROCEDURE{Align}{Vertex \textbf{as} $v$}
  \STATE $\vec{p} =$ \CALL{Position}{$v$}
  \STATE $\vec{{\scriptsize \triangle} p} = 0$
  \FORALL{$m$ \textbf{in} \CALL{Neighbors}{$v$}}
    \STATE $\vec{q} =$ \CALL{Position}{$m$} $- \vec{p}$
    \STATE $\vec{{\scriptsize \triangle} p} = \vec{{\scriptsize \triangle} p} + \vec{q} \cdot \frac{(||\vec{q}|| - \tilde{\chi}) \cdot \eta}{||\vec{q}||}$
  \ENDFOR
  \FORALL{$m$ \textbf{in} \CALL{NotNeighbors}{$v$}}
    \STATE $\vec{q} =$ \CALL{Position}{$m$} $- \vec{p}$
    \STATE $\vec{c} = \vec{q} \cdot \frac{-\eta}{{||\vec{q}||}^{\delta + 1}}$
    \IF{$||\vec{c}|| > \tilde{\chi}$}
      \STATE $\vec{c} = \vec{c} \cdot \frac{\tilde{\chi}}{||\vec{c}||}$
    \ENDIF
    \STATE $\vec{{\scriptsize \triangle} p} = \vec{{\scriptsize \triangle} p} + \vec{c}$
  \ENDFOR
  \RETURN $\vec{{\scriptsize \triangle} p}$
\ENDPROCEDURE
\PROCEDURE{Randomize}{Vertex Set \textbf{as} $g$}
  \STATE randomize the position of every vertex in $g$
\ENDPROCEDURE
\PROCEDURE{Place}{Vertex \textbf{as} $v$, Vector \textbf{as} $\vec{p}$}
  \STATE set the position of $v$ to $\vec{p}$
\ENDPROCEDURE
\PROCEDURE{Neighbors}{Vertex \textbf{as} $v$}
  \RETURN set of all vertices adjacent to $v$
\ENDPROCEDURE
\PROCEDURE{NotNeighbors}{Vertex \textbf{as} $v$}
  \STATE $s =$ set of all vertices not adjacent to $v$
  \STATE $w =$ set of all vertices whose position is close to that of $v$
  \RETURN $s \cap w$
\ENDPROCEDURE
\PROCEDURE{Position}{Vertex \textbf{as} $v$}
  \RETURN vector representing the position of $v$
\ENDPROCEDURE
\end{algorithmic}
\end{algorithm}

Obviously the pseudocode packs a lot of information into only a few lines so I’ll try to explain some of the more important parts so you have an idea of what you’re looking at.


First, let’s consider the constants defined at the beginning. The variable \(\tilde{\chi}\) is called the Equilibrium Distance. It defines the ideal distance between two vertices connected by an edge. If a pair of vertices connected by a single edge are the only vertices present then they will align such that they are approximately as far apart as the value of \(\tilde{\chi}\). For simplicity here we have represented \(\tilde{\chi}\) as a single constant but in practice it is also possible to assign a different value to this constant for each edge, resulting in a graph with different aesthetic qualities. This value must of course always be greater than \(0\). The default value, and the one used in the demonstration video, is \(1\).

The second constant is called the Repulsion Strength and is represented by the \(\delta\) variable. This constant determines how strong the repulsion between two unconnected vertices is, that is, two vertices not connected by an edge. Because \(\delta\) is the exponent applied to the distance, lower values for \(\delta\) result in a stronger repulsion and larger values in a weaker repulsion for vertices more than one unit of distance apart. The default value is \(2\) and this is the value used in the demonstration video.
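As a rough numerical illustration of how \(\delta\) shapes the repulsion, the sketch below evaluates the unclamped magnitude \(\eta / ||\vec{q}||^{\delta}\) taken from the pseudocode for a couple of distances. The class and method names are hypothetical and are not part of dANN.

```java
// Hypothetical helper for experimenting with the Repulsion Strength constant.
final class RepulsionStrengthDemo {
    // Unclamped magnitude of the repulsive influence between two
    // non-neighbors that are `distance` apart: eta / distance^delta.
    static double repulsionMagnitude(double distance, double delta, double eta) {
        return eta / Math.pow(distance, delta);
    }

    public static void main(String[] args) {
        double eta = 0.05;
        for (double distance : new double[] {0.5, 2.0}) {
            for (double delta : new double[] {2.0, 3.0}) {
                System.out.println("distance=" + distance + " delta=" + delta
                        + " magnitude=" + repulsionMagnitude(distance, delta, eta));
            }
        }
    }
}
```

At a distance of 2 the larger exponent gives the weaker push, while at a distance of 0.5 the ordering flips, which is the range where the Equilibrium Distance cap in the Align procedure comes into play.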

Next is the Learning Rate constant, \(\eta\). This is simply a scaling factor applied when the vertices are aligned to ensure the graph is aligned to each node with equivalent effect rather than over-fitting to the last node processed.

The last constant is the Alignment Threshold, \(\beta\), which represents the minimum movement threshold. Once the vertices move less than this value during an alignment cycle it is presumed the graph is sufficiently aligned and the loop ends.
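To see how these constants interact, here is a minimal one-dimensional sketch of the simplest possible graph: two vertices joined by one edge. It applies the neighbor update from the pseudocode with the default values and stops once the largest move in a cycle is no more than \(\beta \cdot \tilde{\chi}\). The class and method names are hypothetical, and the sketch assumes the two vertices start at different positions.

```java
// Hypothetical demo class; not part of dANN.
final class TwoVertexDemo {
    static final double EQUILIBRIUM_DISTANCE = 1.0;  // chi
    static final double LEARNING_RATE = 0.05;        // eta
    static final double ALIGNMENT_THRESHOLD = 0.005; // beta

    // Change in position for a vertex at p caused by its neighbor at m.
    static double neighborDelta(double p, double m) {
        double q = m - p;
        double distance = Math.abs(q);
        return q * (distance - EQUILIBRIUM_DISTANCE) * LEARNING_RATE / distance;
    }

    // Aligns the pair until the largest move in a cycle is at most
    // beta * chi, then returns the distance between the two vertices.
    static double alignPair(double a, double b) {
        double maxMove;
        do {
            double moveA = neighborDelta(a, b);
            a += moveA;
            double moveB = neighborDelta(b, a); // sees the updated position of a
            b += moveB;
            maxMove = Math.max(Math.abs(moveA), Math.abs(moveB));
        } while (maxMove > ALIGNMENT_THRESHOLD * EQUILIBRIUM_DISTANCE);
        return Math.abs(b - a);
    }

    public static void main(String[] args) {
        System.out.println(alignPair(0.0, 3.0));  // starts too far apart
        System.out.println(alignPair(0.0, 0.25)); // starts too close together
    }
}
```

Each update shrinks the gap between the current distance and \(\tilde{\chi}\) by a factor of \(1 - \eta\), and with these defaults the loop stops once that gap is within roughly a tenth of \(\tilde{\chi}\). That is why the pair ends up approximately, rather than exactly, at the Equilibrium Distance.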

Align Procedure

The algorithm itself is represented such that it is broken up into three major procedures. The procedure named HAM is the entry point for the algorithm, the procedure named Align calculates the incremental alignment for a single vertex, and the procedure named AlignAll calculates alignment once for every vertex in the graph.

Let’s first explain what is going on in the Align procedure. Here we have a single value being passed in, the vertex to be aligned, \(v\). On line 19 the current position of the vertex is obtained and represented as a Euclidean vector, \(\vec{p}\). Next on line 20 a zero vector is initialized and represented as \(\vec{{\scriptsize \triangle} p}\). The vector \(\vec{{\scriptsize \triangle} p}\) is calculated throughout the procedure and represents the desired change to the current position of the vertex, \(\vec{p}\). In other words once \(\vec{{\scriptsize \triangle} p}\) is calculated it can be added to the current position of the vertex and will result in the new position for the vertex. Just as when calculating forces, \(\vec{{\scriptsize \triangle} p}\) will be the sum of all the composite influences acting on the vertex; so it represents the overall influence exerted on the vertex at any time.

When calculating \(\vec{{\scriptsize \triangle} p}\) the procedure must iterate through all the other vertices that have an effect on the vertex being aligned. There are two types of vertices, each with different effects: neighbors and non-neighbors. Neighbors are all the vertices connected directly to the current vertex by an edge; non-neighbors are all the other vertices not connected by an edge.

First the influence from the neighbor vertices is calculated on lines 21 - 24. The influence two neighbor vertices have on each other is different depending on how far apart they are. If they are closer than the Equilibrium Distance, \(\tilde{\chi}\), then the effect is repulsive. If they are farther apart than \(\tilde{\chi}\) then the effect is attractive. The calculation for this is represented by line 23. It basically calculates the vector that represents the difference between the position of the vertex being aligned and its neighbor and rescales that vector so its magnitude is \((||\vec{q}|| - \tilde{\chi}) \cdot \eta\). To look at it another way, if the magnitude was just \(||\vec{q}|| - \tilde{\chi}\) then the new position of the vertex would be exactly at the Equilibrium Distance, but instead it is scaled to a fraction of this by \(\eta\), which adjusts how quickly the vertex will approach its equilibrium point.
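The neighbor calculation on line 23 can be sketched in isolation as follows. This is a minimal illustration using plain `double[]` arrays rather than the dANN `Vector` type, and the class and method names are hypothetical.

```java
// Hypothetical helper illustrating the neighbor term of the Align procedure.
final class NeighborInfluence {
    static double norm(double[] v) {
        double sum = 0.0;
        for (double x : v) {
            sum += x * x;
        }
        return Math.sqrt(sum);
    }

    // Contribution of one neighbor at position m to the change in position
    // of the vertex at position p: q * ((||q|| - chi) * eta) / ||q||.
    static double[] neighborInfluence(double[] p, double[] m, double chi, double eta) {
        double[] q = new double[p.length];
        for (int i = 0; i < p.length; i++) {
            q[i] = m[i] - p[i];
        }
        double distance = norm(q);
        double scale = (distance - chi) * eta / distance;
        for (int i = 0; i < q.length; i++) {
            q[i] *= scale;
        }
        return q;
    }

    public static void main(String[] args) {
        double[] far = neighborInfluence(new double[] {0, 0}, new double[] {3, 4}, 1.0, 0.05);
        double[] near = neighborInfluence(new double[] {0, 0}, new double[] {0.5, 0}, 1.0, 0.05);
        System.out.println(far[0] + ", " + far[1]);   // points toward the neighbor
        System.out.println(near[0] + ", " + near[1]); // points away from the neighbor
    }
}
```

With the neighbor 5 units away the result points toward it with magnitude \((5 - 1) \cdot 0.05 = 0.2\); with the neighbor only 0.5 units away the sign flips and the vertex is pushed away instead.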

Next the influence between non-neighbor vertices is calculated on lines 25 - 32. Non-neighbor vertices, that is vertices not connected by an edge, always exhibit a purely repulsive influence. Line 27 calculates this using a similar technique as before. That is, the difference between the position of the two vertices is represented by \(\vec{q}\) and then its magnitude is scaled. Of course it’s also negative to indicate that the influence is repulsive. The equation only seems confusing in its simplified and compacted form. Initially it was derived by calculating the new magnitude of \(\vec{q}\) as the following.

$$ \frac{-1}{{\mid\mid\vec{q}\mid\mid}^{\delta}} \cdot \eta $$

This makes a lot more sense when we recall that many repulsive forces in nature, such as the electrostatic force between like charges, fall off with the inverse square of the distance, which corresponds to \(\delta = 2\). So this represents a repulsive influence that diminishes with distance. Once we actually apply that magnitude to the vector and simplify we arrive at our final equation.

$$ \vec{q} \cdot \frac{\frac{-1}{{\mid\mid\vec{q}\mid\mid}^{\delta}} \cdot \eta}{\mid\mid\vec{q}\mid\mid} \Rightarrow \vec{q} \cdot \frac{-\eta}{{||\vec{q}||}^{\delta + 1}} $$

The only caveat to this is seen in lines 28 to 30 where it checks the distance moved as a result of the repulsive influence. If it is greater than the Equilibrium Distance, \(\tilde{\chi}\), then its magnitude is scaled back to be \(\tilde{\chi}\). This is done because the repulsive influence grows without bound as the distance approaches zero, so at very close distances it becomes overwhelming. We want to ensure the majority of this influence works at a distance to allow the graph to spread apart but still allow the other influences to be the dominant influences on the graph.
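The repulsive term and its cap can be sketched the same way. This version follows the pseudocode, where the cap is applied after the Learning Rate has already been folded in; the Java appendix applies its cap before multiplying by the learning rate, so the two differ slightly at very short range. The class and method names are hypothetical.

```java
// Hypothetical helper illustrating the non-neighbor term of the Align procedure.
final class NonNeighborInfluence {
    static double norm(double[] v) {
        double sum = 0.0;
        for (double x : v) {
            sum += x * x;
        }
        return Math.sqrt(sum);
    }

    // Repulsive contribution of a non-neighbor at position m on the vertex
    // at position p, capped so that its magnitude never exceeds chi.
    static double[] repulsion(double[] p, double[] m, double chi, double delta, double eta) {
        double[] c = new double[p.length];
        for (int i = 0; i < p.length; i++) {
            c[i] = m[i] - p[i]; // q
        }
        double scale = -eta / Math.pow(norm(c), delta + 1.0);
        for (int i = 0; i < c.length; i++) {
            c[i] *= scale;
        }
        double magnitude = norm(c);
        if (magnitude > chi) {
            for (int i = 0; i < c.length; i++) {
                c[i] *= chi / magnitude;
            }
        }
        return c;
    }

    public static void main(String[] args) {
        double[] far = repulsion(new double[] {0, 0}, new double[] {2, 0}, 1.0, 2.0, 0.05);
        double[] close = repulsion(new double[] {0, 0}, new double[] {0.01, 0}, 1.0, 2.0, 0.05);
        System.out.println(far[0] + ", " + far[1]);     // small push away
        System.out.println(close[0] + ", " + close[1]); // capped at chi
    }
}
```

At a distance of 2 the push has magnitude \(0.05 / 2^2 = 0.0125\). At a distance of 0.01 the uncapped magnitude would be 500, so the cap reduces it to \(\tilde{\chi} = 1\).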

At this point the computed change in position for the vertex is simply returned at line 33 for further processing by the AlignAll procedure.

AlignAll Procedure

The AlignAll procedure is extremely straightforward. It is passed the set of all vertices in the graph as \(g\) and iterates over the set, aligning the vertices one at a time. Each vertex gets aligned once per call to the procedure, which means the procedure will usually need to be called multiple times.

On line 8 the Maximum Vertex Movement variable, represented as \(\zeta\), is initialized to \(0\). This variable represents the greatest distance any vertex moved during the alignment; after being calculated its value is returned on line 16. The Maximum Vertex Movement is important for determining when the HAM algorithm has finished processing.

Other than that this procedure doesn’t do anything special. The vertex alignment vector is calculated on line 10 and the new position for the vertex is set on line 14.
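A minimal sketch of a single AlignAll pass is shown below. To keep it self-contained the Align step is passed in as a function from a vertex index to its change in position; that signature, along with the class and method names, is an assumption made for illustration and is not how dANN structures it.

```java
import java.util.function.IntFunction;

// Hypothetical sketch of one AlignAll pass over positions stored as arrays.
final class AlignAllSketch {
    // Moves every vertex once, in place, and returns the largest distance
    // any single vertex moved (the Maximum Vertex Movement).
    static double alignAll(double[][] positions, IntFunction<double[]> align) {
        double maxMovement = 0.0; // zeta
        for (int v = 0; v < positions.length; v++) {
            double[] delta = align.apply(v);
            double squared = 0.0;
            for (int i = 0; i < delta.length; i++) {
                squared += delta[i] * delta[i];
                positions[v][i] += delta[i]; // Place(v, Position(v) + delta)
            }
            maxMovement = Math.max(maxMovement, Math.sqrt(squared));
        }
        return maxMovement;
    }

    public static void main(String[] args) {
        double[][] positions = {{0, 0}, {1, 1}, {2, 2}};
        // Stand-in for Align: vertex v is nudged by (0.3 * v, 0.4 * v).
        double max = alignAll(positions, v -> new double[] {0.3 * v, 0.4 * v});
        System.out.println(max);
    }
}
```

Because each position is updated in place before the next vertex is processed, a real Align step reading from `positions` sees the already-moved locations of the vertices handled earlier in the same pass.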

HAM Procedure

The HAM procedure is another rather straightforward procedure to explain. It starts by assigning some initial random coordinates to each vertex in the graph. After that it continually loops calling AlignAll until the graph is sufficiently aligned.

On line 3 the AlignAll procedure is called in a loop until the Maximum Vertex Movement returned is no longer greater than \(\beta \cdot \tilde{\chi}\). This is just the Alignment Threshold scaled by the Equilibrium Distance constant. The Alignment Threshold is sufficiently small that if the movements in the graph are less than this value then they are considered negligible and the alignment can end.

As an optional step after each alignment iteration it may be desirable to translate the entire graph so it is centered around the zero vector. There is a small amount of drift as the alignment of the graph is calculated, and doing this ensures the graph remains in the center of the view when rendered. The drift is usually negligible however, so this step is entirely optional. In the full Java example below the logic for centering the graph is included.
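Putting the three procedures together, here is a compact sketch of the whole loop on a tiny path graph (0 - 1 - 2). It is a simplified illustration and not the dANN implementation: positions are plain arrays, the graph is a hypothetical adjacency matrix, every non-adjacent vertex is treated as a non-neighbor regardless of distance, the starting positions are fixed instead of random, and a pass limit is added as a safety net.

```java
// Hypothetical, simplified sketch of the HAM loop; not the dANN implementation.
final class HamSketch {
    static final double CHI = 1.0;    // equilibrium distance
    static final double DELTA = 2.0;  // repulsion strength
    static final double ETA = 0.05;   // learning rate
    static final double BETA = 0.005; // alignment threshold

    static double norm(double[] v) {
        double sum = 0.0;
        for (double x : v) sum += x * x;
        return Math.sqrt(sum);
    }

    static double distance(double[] a, double[] b) {
        double[] d = new double[a.length];
        for (int i = 0; i < a.length; i++) d[i] = a[i] - b[i];
        return norm(d);
    }

    // Align: change in position for vertex v given the current layout.
    static double[] align(int v, double[][] pos, boolean[][] adjacent) {
        double[] delta = new double[pos[v].length];
        for (int m = 0; m < pos.length; m++) {
            if (m == v) continue;
            double dist = distance(pos[m], pos[v]);
            double scale;
            if (adjacent[v][m]) {
                scale = (dist - CHI) * ETA / dist;
            } else {
                scale = -ETA / Math.pow(dist, DELTA + 1.0);
                // Cap the repulsive step at the equilibrium distance.
                if (Math.abs(scale) * dist > CHI) scale = -CHI / dist;
            }
            for (int i = 0; i < delta.length; i++) {
                delta[i] += (pos[m][i] - pos[v][i]) * scale;
            }
        }
        return delta;
    }

    // AlignAll: moves each vertex once and returns the largest move.
    static double alignAll(double[][] pos, boolean[][] adjacent) {
        double maxMovement = 0.0;
        for (int v = 0; v < pos.length; v++) {
            double[] delta = align(v, pos, adjacent);
            maxMovement = Math.max(maxMovement, norm(delta));
            for (int i = 0; i < delta.length; i++) pos[v][i] += delta[i];
        }
        return maxMovement;
    }

    // Optional step: translate the graph so its average position is zero.
    static void recenter(double[][] pos) {
        for (int i = 0; i < pos[0].length; i++) {
            double mean = 0.0;
            for (double[] p : pos) mean += p[i] / pos.length;
            for (double[] p : pos) p[i] -= mean;
        }
    }

    // HAM: repeats AlignAll until the largest move is at most BETA * CHI,
    // or maxPasses is reached. Returns the number of passes performed.
    static int ham(double[][] pos, boolean[][] adjacent, int maxPasses) {
        int passes = 0;
        while (passes < maxPasses) {
            passes++;
            if (alignAll(pos, adjacent) <= BETA * CHI) break;
            recenter(pos);
        }
        return passes;
    }

    public static void main(String[] args) {
        boolean[][] adjacent = {
            {false, true, false},
            {true, false, true},
            {false, true, false}
        };
        double[][] pos = {{0, 0}, {2, 0}, {2, 2}};
        System.out.println(ham(pos, adjacent, 10000) + " passes");
        System.out.println("0-1: " + distance(pos[0], pos[1]));
        System.out.println("1-2: " + distance(pos[1], pos[2]));
        System.out.println("0-2: " + distance(pos[0], pos[2]));
    }
}
```

In this example the two connected pairs should settle somewhat beyond the Equilibrium Distance, because the two end vertices also repel each other, while the unconnected pair ends up farthest apart.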

Appendix: Full Java Code

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Map.Entry;
import java.util.Random;

// Graph, GraphDrawer, TraversableCloud, Weighted and Vector are types from
// the dANN library; their imports are omitted here.
public class HyperassociativeMap<G extends Graph<N, ?>, N> implements
        GraphDrawer<G, N> {
    private static final double REPULSIVE_WEAKNESS = 2.0;
    private static final double DEFAULT_LEARNING_RATE = 0.05;
    private static final double EQUILIBRIUM_DISTANCE = 1.0;
    private static final double EQUILIBRIUM_ALIGNMENT_FACTOR = 0.005;

    private final G graph;
    private final int dimensions;
    private Map<N, Vector> coordinates = Collections.synchronizedMap(new
            HashMap<N, Vector>());
    private static final Random RANDOM = new Random();
    private final boolean useWeights;
    private double equilibriumDistance;
    private double learningRate = DEFAULT_LEARNING_RATE;
    private double maxMovement = 0.0;

    public HyperassociativeMap(final G graph, final int dimensions, final
    double equilibriumDistance, final boolean useWeights) {
        if (graph == null)
            throw new IllegalArgumentException("Graph can not be null");
        if (dimensions <= 0)
            throw new IllegalArgumentException("dimensions must be 1 or more");

        this.graph = graph;
        this.dimensions = dimensions;
        this.equilibriumDistance = equilibriumDistance;
        this.useWeights = useWeights;

        // refresh all nodes
        for (final N node : this.graph.getNodes()) {
            this.coordinates.put(node, randomCoordinates(this.dimensions));
        }
    }

    public G getGraph() {
        return graph;
    }

    public double getEquilibriumDistance() {
        return equilibriumDistance;
    }

    public void setEquilibriumDistance(final double equilibriumDistance) {
        this.equilibriumDistance = equilibriumDistance;
    }

    public void resetLearning() {
        maxMovement = 0.0;
    }

    public void reset() {
        // randomize all nodes
        for (final N node : coordinates.keySet()) {
            coordinates.put(node, randomCoordinates(dimensions));
        }
    }

    public boolean isAlignable() {
        return true;
    }

    // aligned once the largest move in the last pass is below the threshold
    public boolean isAligned() {
        return isAlignable()
                && (maxMovement < (EQUILIBRIUM_ALIGNMENT_FACTOR *
                equilibriumDistance))
                && (maxMovement > 0.0);
    }

    // performs one alignment pass over every node, then recenters the graph
    public void align() {
        // refresh all nodes
        if (!coordinates.keySet().equals(graph.getNodes())) {
            final Map<N, Vector> newCoordinates = new HashMap<N, Vector>();
            for (final N node : graph.getNodes()) {
                if (coordinates.containsKey(node)) {
                    newCoordinates.put(node, coordinates.get(node));
                } else {
                    newCoordinates.put(node, randomCoordinates(dimensions));
                }
            }
            coordinates = Collections.synchronizedMap(newCoordinates);
        }

        maxMovement = 0.0;
        Vector center;

        center = processLocally();

        // divide each coordinate of the sum of all the points by the number of
        // nodes in order to calculate the average point, or center of all the
        // points
        for (int dimensionIndex = 1; dimensionIndex <= dimensions;
             dimensionIndex++) {
            center = center.setCoordinate(center.getCoordinate
                    (dimensionIndex) / graph.getNodes().size(), dimensionIndex);
        }

        // translate every node so the graph is centered on the origin
        recenterNodes(center);
    }

    public int getDimensions() {
        return dimensions;
    }

    public Map<N, Vector> getCoordinates() {
        return Collections.unmodifiableMap(coordinates);
    }

    private void recenterNodes(final Vector center) {
        for (final N node : graph.getNodes()) {
            coordinates.put(node, coordinates.get(node).calculateRelativeTo
                    (center));
        }
    }

    public boolean isUsingWeights() {
        return useWeights;
    }

    // maps each neighbor of the node to the equilibrium distance of its edge
    Map<N, Double> getNeighbors(final N nodeToQuery) {
        final Map<N, Double> neighbors = new HashMap<N, Double>();
        for (final TraversableCloud<N> neighborEdge : graph.getAdjacentEdges
                (nodeToQuery)) {
            final Double currentWeight = (((neighborEdge instanceof Weighted)
                    && useWeights) ? ((Weighted) neighborEdge).getWeight() :
                    equilibriumDistance);
            for (final N neighbor : neighborEdge.getNodes()) {
                if (!neighbor.equals(nodeToQuery)) {
                    neighbors.put(neighbor, currentWeight);
                }
            }
        }
        return neighbors;
    }

    // moves a single node and returns its new location
    private Vector align(final N nodeToAlign) {
        // calculate equilibrium with neighbors
        final Vector location = coordinates.get(nodeToAlign);
        final Map<N, Double> neighbors = getNeighbors(nodeToAlign);

        Vector compositeVector = new Vector(location.getDimensions());
        // align with neighbours
        for (final Entry<N, Double> neighborEntry : neighbors.entrySet()) {
            final N neighbor = neighborEntry.getKey();
            final double associationEquilibriumDistance = neighborEntry
                    .getValue();

            Vector neighborVector = coordinates.get(neighbor)
                    .calculateRelativeTo(location);
            double newDistance = Math.abs(neighborVector.getDistance()) -
                    associationEquilibriumDistance;
            newDistance *= learningRate;
            neighborVector = neighborVector.setDistance(newDistance);
            compositeVector = compositeVector.add(neighborVector);
        }
        // calculate repulsion with all non-neighbors
        for (final N node : graph.getNodes()) {
            if ((!neighbors.containsKey(node)) && (node != nodeToAlign)
                    && (!graph.getAdjacentNodes(node).contains(nodeToAlign))) {
                Vector nodeVector = coordinates.get(node).calculateRelativeTo
                        (location);
                double newDistance = -1.0 / Math.pow
                        (nodeVector.getDistance(), REPULSIVE_WEAKNESS);
                // cap the repulsion at the equilibrium distance
                if (Math.abs(newDistance) > Math.abs(equilibriumDistance)) {
                    newDistance = Math.copySign(equilibriumDistance,
                            newDistance);
                }
                newDistance *= learningRate;
                nodeVector = nodeVector.setDistance(newDistance);
                compositeVector = compositeVector.add(nodeVector);
            }
        }
        Vector newLocation = location.add(compositeVector);
        final Vector oldLocation = coordinates.get(nodeToAlign);
        double moveDistance = Math.abs(newLocation.calculateRelativeTo
                (oldLocation).getDistance());

        // track the largest movement seen during this pass
        if (moveDistance > maxMovement) {
            maxMovement = moveDistance;
        }

        coordinates.put(nodeToAlign, newLocation);
        return newLocation;
    }

    // random point with each coordinate in the range [-1, 1)
    public static Vector randomCoordinates(final int dimensions) {
        final double[] randomCoordinates = new double[dimensions];
        for (int randomCoordinatesIndex = 0; randomCoordinatesIndex <
                dimensions; randomCoordinatesIndex++) {
            randomCoordinates[randomCoordinatesIndex] = (RANDOM.nextDouble()
                    * 2.0) - 1.0;
        }

        return new Vector(randomCoordinates);
    }

    // aligns every node once and returns the sum of their new positions
    private Vector processLocally() {
        Vector pointSum = new Vector(dimensions);
        for (final N node : graph.getNodes()) {
            final Vector newPoint = align(node);
            for (int dimensionIndex = 1; dimensionIndex <= dimensions;
                 dimensionIndex++) {
                pointSum = pointSum.setCoordinate(pointSum.getCoordinate
                        (dimensionIndex) + newPoint.getCoordinate
                        (dimensionIndex), dimensionIndex);
            }
        }
        return pointSum;
    }
}

Conditional Probabilities and Bayes Theorem

I’ve been getting a lot of questions from friends lately about what Bayes Theorem means. The confusion is understandable because it appears in a few models that seem to be completely unrelated to each other. For example we have Naive Bayes Classifiers and Bayesian Networks which operate on completely different principles. Moreover this is compounded with a lack of understanding regarding unconditional and conditional probabilities.

In this article I offer a tutorial to help bring the lay person up to speed with some basic understanding on these concepts and how Bayes Theorem can be applied.

Probability Space

We have to start by explaining a few terms and how they are used.

A Random Trial is a trial where we perform some random experiment. For example it might be flipping a coin or rolling dice.

The Sample Space of a Random Trial, typically denoted by \(\Omega\), represents all possible outcomes for the Random Trial being performed. So for flipping a coin the outcome can be either heads or tails, so the Sample Space would be a set containing only these two values.

$$ \Omega = \{Heads, Tails\} $$

For rolling dice the Sample Space would be the set of all the various faces for the dice being rolled. When rolling only one standard six-sided die the Sample Space would be as follows.

$$ \Omega = \{1, 2, 3, 4, 5, 6\} $$

In both of these examples the Random Trial being performed will select an outcome from their respective Sample Space at random. In these trials each outcome has an equal chance of being selected, though that does not necessarily need to be the case.

For our purposes here I want to formulate a Random Trial thought experiment that simulates a medical trial consisting of 10 patients. We will be using this example throughout much of this tutorial. Therefore our Sample Space will be a set consisting of 10 elements, each element represents a single unique patient in the trial. Patients are represented with the variable x with a subscript from 1 to 10 that uniquely identifies each patient.

$$ \Omega = \{x_1, x_2, x_3, x_4, x_5, x_6, x_7, x_8, x_9, x_{10}\} $$

An Event in the context of probabilities is a set of outcomes that can be satisfied. Typically these sets are represented using lowercase Greek letters such as \(\alpha\) or \(\beta\). For example if we were rolling a single die and wanted it to land on an odd number then the Event representing an odd outcome would be represented as the following set.

$$ \alpha = \{1, 3, 5\} $$

Similarly if we simply wanted to roll the number 6 then the Event would be a set containing only that one number.

$$ \alpha = \{6\} $$

The Event Space, often denoted by \(\mathcal{F}\), is the set of all Events to be observed. It is a set of sets that represents every possible combination of subsets of the Sample Space or some part thereof. Not all Events in the event space need to be possible however. For example if we are talking about flipping a coin then the Event Space would have, at most, 4 members representing outcomes for Heads, Tails, either Heads or Tails, and neither Heads nor Tails. We could represent this with the following set notation.

$$ \mathcal{F} = \{\{\}, \{Heads\}, \{Tails\}, \{Heads, Tails\}\} $$

The empty set is usually represented with the \(\emptyset\) symbol. So the previous set can be rewritten using this shorthand as follows.

$$ \mathcal{F} = \{\emptyset, \{Heads\}, \{Tails\}, \{Heads, Tails\}\} $$

Notice that one of the members of the Event Space is equivalent to the Sample Space. The fact that it contains both the empty set and the Sample Space as members is more a matter of mathematical completeness and plays a role in making some mathematical proofs easier to carry out. For our purposes here they will largely be ignored.

At this point I want to go over a little bit of mathematical notation that may help when reading other texts on the subject. The first is the concept of a Power Set. The Power Set of a particular set is simply the set of all of its possible subsets. In the example above regarding the coin toss Event Space we can say that the Event Space specified is the Power Set of the Sample Space. The notation for the Power Set is the number 2 with an exponent that is a set. For example shorthand for the above Event Space definition could have been the following.

$$ \mathcal{F} = 2^{\Omega} $$

Every Event Space must be either equal to, or a subset of, the Power Set of the Sample Space. We can represent that with the following set notation.

$$ \mathcal{F} \subseteq 2^{\Omega} $$
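For readers who prefer code to notation, the following is a small sketch that enumerates the Power Set of a Sample Space. The class and method names are hypothetical, and the bit-mask approach assumes the Sample Space is small enough for its subsets to be counted with an `int`.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper that builds the Power Set of a small Sample Space.
final class PowerSetSketch {
    // Each bit of the mask decides whether the outcome at that index is
    // included, so the masks 0 .. 2^n - 1 enumerate every possible subset.
    static <T> Set<Set<T>> powerSet(List<T> sampleSpace) {
        Set<Set<T>> events = new LinkedHashSet<>();
        int n = sampleSpace.size();
        for (int mask = 0; mask < (1 << n); mask++) {
            Set<T> event = new LinkedHashSet<>();
            for (int i = 0; i < n; i++) {
                if ((mask & (1 << i)) != 0) {
                    event.add(sampleSpace.get(i));
                }
            }
            events.add(event);
        }
        return events;
    }

    public static void main(String[] args) {
        // Lists the empty set, {Heads}, {Tails} and {Heads, Tails}.
        System.out.println(powerSet(Arrays.asList("Heads", "Tails")));
    }
}
```

A Sample Space with \(n\) outcomes has \(2^n\) subsets, which is where the \(2^{\Omega}\) notation comes from: 4 Events for the coin and 1,024 for a Sample Space of 10 patients.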

Going back to our example of patients in a clinical trial we might want to know what the chance is of selecting a patient at random that has a fever. In that case the Event would be the set of all patients that have a fever and the outcome would be a single patient selected at random. Each Event is an element in the Event Space. So we will denote it as \(\mathcal{F}\) with a subscript, which is easier to read than the arbitrary lowercase Greek letters that are the usual convention. If 3 of the 10 patients in our Sample Space have a fever we can represent the fever Event as follows.

$$ \mathcal{F}_{fever} = \{x_1, x_6, x_8\} $$

This means that if we select a patient at random and that patient is a member of the \(\mathcal{F}_{fever}\) set then that patient has a fever and the outcome has satisfied the event. Similarly we can define the event representing patients that have the flu with the following notation.

$$ \mathcal{F}_{flu} = \{x_2, x_4, x_6, x_8\} $$

As stated earlier Events are simply members of the Event Space. This can be indicated using the following set notation which simply states that the flu Event is a member of the Event Space and the fever Event is also a member of the Event Space.

$$ \mathcal{F}_{fever} \in \mathcal{F} $$
$$ \mathcal{F}_{flu} \in \mathcal{F} $$

Similarly if we wish to indicate that the fever and flu Events are subsets of the Sample Space we can do so using the following notation.

$$ \mathcal{F}_{fever} \subset \Omega $$
$$ \mathcal{F}_{flu} \subset \Omega $$
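The same relationships can be checked with ordinary Java sets. This is just an illustrative sketch: the class name, the helper method and the use of strings such as `"x6"` to stand for patients are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical encoding of the clinical trial example as Java sets.
final class EventSketch {
    static Set<String> setOf(String... members) {
        return new LinkedHashSet<>(Arrays.asList(members));
    }

    static final Set<String> OMEGA =
            setOf("x1", "x2", "x3", "x4", "x5", "x6", "x7", "x8", "x9", "x10");
    static final Set<String> FEVER = setOf("x1", "x6", "x8");
    static final Set<String> FLU = setOf("x2", "x4", "x6", "x8");

    // An outcome satisfies an Event when it is a member of the Event's set.
    static boolean satisfies(String outcome, Set<String> event) {
        return event.contains(outcome);
    }

    public static void main(String[] args) {
        System.out.println(satisfies("x6", FEVER));   // patient 6 has a fever
        System.out.println(satisfies("x2", FEVER));   // patient 2 does not
        System.out.println(OMEGA.containsAll(FEVER)); // fever Event is a subset of the Sample Space
        System.out.println(OMEGA.containsAll(FLU));   // flu Event is a subset of the Sample Space
    }
}
```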

The only term left to define is the Probability Space. This is just the combination of the Event Space, the Sample Space, as well as the probability of each of the Events taking place. It represents all the information we need to determine the chance of any possible outcome occurring. It is denoted as a 3-tuple containing these three things. The probability, P, represents a function that maps Events in \(\mathcal{F}\) to probabilities.

$$ (\Omega, \mathcal{F}, P) $$

Unconditional Probability

This is where things get interesting. Since we have all the important terms defined we can start talking about actual probabilities. We start with the simplest type of probability, the Unconditional Probability. This is the sort of probability most people are familiar with. It is the chance that an outcome will occur independent of any other Events. For example if I flip a coin I can say the probability of it landing on Heads is 50%; this would be an Unconditional Probability.

If all the outcomes in our Sample Space have the same chance of being selected by a Random Trial then calculating the Unconditional Probability is rather easy. The Event would represent all desired outcomes from our Sample Space. So if we wanted to flip a coin and get heads then our Event is a set with a single member and the Sample Space consists of only two members, the possible outcomes. We can write this as the following.

$$ \Omega = \{Heads, Tails\} $$
$$ \mathcal{F}_{Heads} = \{Heads\} $$

If the above event is satisfied by the flip of a coin it means the outcome of the coin toss was heads. To calculate the probability for this event we simply count the number of members in the Event set and divide it by the number of members in the Sample Space. In this case the result is 50%, which we can represent as follows.

$$ P(\mathcal{F}_{Heads}) = \frac{1}{2} $$

The number of members in a set is called its Cardinality. We can represent it by placing the set between vertical bars, the same notation used for absolute value. Therefore we can represent the previous equation using the following notation.

$$ P(\mathcal{F}_{Heads}) = \frac{\mid \mathcal{F}_{Heads} \mid}{\mid \Omega \mid} = \frac{1}{2} $$

We can generalize this for any Event represented as \(\mathcal{F}_{i}\) with the following definition for calculating an Unconditional Probability.

$$ P(\mathcal{F}_{i}) = \frac{\mid \mathcal{F}_{i} \mid}{\mid \Omega \mid} $$

Now let’s apply this to our clinical trial example from earlier. Say we wanted to calculate the chance of selecting someone from the 10 patients in the trial, at random, such that the person selected has a fever. We can calculate that with the following.

$$ P(\mathcal{F}_{fever}) = \frac{\mid \mathcal{F}_{fever} \mid}{\mid \Omega \mid} = \frac{3}{10} $$

We can also do the same for calculating the chance of randomly selecting a patient that has the flu.

$$ P(\mathcal{F}_{flu}) = \frac{\mid \mathcal{F}_{flu} \mid}{\mid \Omega \mid} = \frac{4}{10} = \frac{2}{5} $$
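The counting rule above can be sketched in a few lines of Python using the built-in `set` type and `fractions.Fraction`. The patient labels `x1` through `x10` and the helper name `unconditional` are illustrative assumptions; the rule itself only holds when every outcome is equally likely.

```python
from fractions import Fraction

# Sample Space: the 10 patients in the trial (labels are illustrative).
omega = {f"x{i}" for i in range(1, 11)}

# Events, using the memberships given in the article.
fever = {"x1", "x6", "x8"}
flu = {"x2", "x4", "x6", "x8"}

def unconditional(event, sample_space):
    """P(event) = |event| / |sample_space|, assuming equally likely outcomes."""
    return Fraction(len(event), len(sample_space))

p_fever = unconditional(fever, omega)
p_flu = unconditional(flu, omega)
print(p_fever, p_flu)  # 3/10 2/5
```

Using `Fraction` rather than floats keeps the results in the same exact form as the equations above.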

Conditional Probability

A Conditional Probability takes this idea one step further. A Conditional Probability specifies the probability of an event being satisfied if it is known that another event was also satisfied. For example using our clinical trial thought experiment one might ask what is the probability of someone having the flu if we know that person has a fever. This would be represented with the following notation.

$$ P(\mathcal{F}_{flu} \mid \mathcal{F}_{fever}) $$

Assuming that having a fever has some effect on the likelihood of having the flu, this probability would be different from the chance of just any randomly selected patient having the flu. After all, people with fevers are more likely to have the flu than people without a fever.

Since we already know which patients have the flu and which have a fever it is easy to determine an answer to this question. To calculate the probability we can look at how many patients in our Sample Space have a fever and what percentage of those patients with fever also have the flu. By looking at the data we can see that there are 3 patients with fevers and of those patients only 2 of them have the flu. So the answer is \(\frac{2}{3}\).

$$ \mathcal{F}_{fever} = \{x_1, x_6, x_8\} $$
$$ \mathcal{F}_{flu} = \{x_2, x_4, x_6, x_8\} $$
$$ P(\mathcal{F}_{flu} \mid \mathcal{F}_{fever}) = \frac{2}{3} $$

We can generalize this statement by saying that we take the intersection of the sets that represent the Event for patients with the flu and patients with a fever. The intersection is just the set of all the elements that those two sets have in common.

The symbol for intersection is \(\cap\), therefore we can show the intersection of these two sets as follows.

$$ \mathcal{F}_{flu} \cap \mathcal{F}_{fever} = \{x_6, x_8\} $$

Another way to look at calculating the Conditional Probability would be to take the Cardinality of the intersection of these two Events and divide it by the cardinality of the conditional Event that has been satisfied. So now we have the following.

$$ P(\mathcal{F}_{flu} \mid \mathcal{F}_{fever}) = \frac{\mid \mathcal{F}_{flu} \cap \mathcal{F}_{fever} \mid}{\mid \mathcal{F}_{fever} \mid} = \frac{2}{3} $$

We can also ask a similar, but markedly different, question: if we know a patient has the flu, what is the chance that the same patient has a fever? For this we can use the same logic as above and come up with the following.

$$ P(\mathcal{F}_{fever} \mid \mathcal{F}_{flu}) = \frac{\mid \mathcal{F}_{flu} \cap \mathcal{F}_{fever} \mid}{\mid \mathcal{F}_{flu} \mid} = \frac{2}{4} = \frac{1}{2} $$

As you can see the only thing that changed is the denominator which is now the Cardinality of the flu Event rather than the fever Event. We can generalize the equation for calculating a Conditional Probability as follows.

$$ P(\mathcal{F}_{i} \mid \mathcal{F}_{j}) = \frac{\mid \mathcal{F}_{i} \cap \mathcal{F}_{j} \mid}{\mid \mathcal{F}_{j} \mid} $$
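As a minimal sketch, the generalized equation maps directly onto Python's set intersection operator `&`. The function name `conditional` is a hypothetical helper, and the Event sets are the ones from the clinical trial example.

```python
from fractions import Fraction

fever = {"x1", "x6", "x8"}
flu = {"x2", "x4", "x6", "x8"}

def conditional(event, given):
    """P(event | given) = |event ∩ given| / |given|."""
    if not given:
        raise ValueError("cannot condition on an empty Event")
    return Fraction(len(event & given), len(given))

p_flu_given_fever = conditional(flu, fever)
p_fever_given_flu = conditional(fever, flu)
print(p_flu_given_fever, p_fever_given_flu)  # 2/3 1/2
```

Swapping the two arguments changes only the denominator, which is why the two questions give different answers.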

Bayes Theorem

Bayes Theorem itself is remarkably simple on the surface yet immensely useful in practice. In its simplest form it lets us calculate a Conditional Probability when we have limited information to work with. If we only knew, for example, the probabilities \(P(\mathcal{F}_{j} \mid \mathcal{F}_{i})\), \(P(\mathcal{F}_{i})\), and \(P(\mathcal{F}_{j})\), then using Bayes Theorem we could calculate the probability \(P(\mathcal{F}_{i} \mid \mathcal{F}_{j})\). The precise equation for Bayes Theorem is as follows.

$$ P(\mathcal{F}_{i} \mid \mathcal{F}_{j}) = \frac{ P(\mathcal{F}_{j} \mid \mathcal{F}_{i}) \cdot P(\mathcal{F}_{i}) }{ P(\mathcal{F}_{j}) } $$
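The theorem follows from the counting definition of Conditional Probability given in the previous section. As a short sketch, assuming equally likely outcomes as in the earlier examples, we can multiply and divide by \(\mid \mathcal{F}_{i} \mid\) and \(\mid \Omega \mid\) and then recognize each fraction as a probability we have already defined.

$$ P(\mathcal{F}_{i} \mid \mathcal{F}_{j}) = \frac{\mid \mathcal{F}_{i} \cap \mathcal{F}_{j} \mid}{\mid \mathcal{F}_{j} \mid} = \frac{ \frac{\mid \mathcal{F}_{i} \cap \mathcal{F}_{j} \mid}{\mid \mathcal{F}_{i} \mid} \cdot \frac{\mid \mathcal{F}_{i} \mid}{\mid \Omega \mid} }{ \frac{\mid \mathcal{F}_{j} \mid}{\mid \Omega \mid} } = \frac{ P(\mathcal{F}_{j} \mid \mathcal{F}_{i}) \cdot P(\mathcal{F}_{i}) }{ P(\mathcal{F}_{j}) } $$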

Let’s say we didn’t know all the details of the clinical trial from earlier; we have no idea what the Sample Space is or what members belong to each Event set. All we know is the probability that someone will have a fever at any given time, the probability they will have the flu, and the probability that someone with the flu has a fever. From this limited information, and using Bayes Theorem it would be possible to infer the probability of having the flu if you have a fever. First let’s copy the probabilities we know to match what we previously calculated manually.

$$ P(\mathcal{F}_{fever}) = \frac{3}{10} $$
$$ P(\mathcal{F}_{flu}) = \frac{2}{5} $$
$$ P(\mathcal{F}_{fever} \mid \mathcal{F}_{flu}) = \frac{1}{2} $$

Using only this information, along with Bayes Theorem, we can calculate the probability of someone having the flu if they have a fever as follows.

$$ P(\mathcal{F}_{flu} \mid \mathcal{F}_{fever}) = \frac{ P(\mathcal{F}_{fever} \mid \mathcal{F}_{flu}) \cdot P(\mathcal{F}_{flu}) }{ P(\mathcal{F}_{fever}) } $$
$$ P(\mathcal{F}_{flu} \mid \mathcal{F}_{fever}) = \frac{ \frac{1}{2} \cdot \frac{2}{5} }{ \frac{3}{10} } $$
$$ P(\mathcal{F}_{flu} \mid \mathcal{F}_{fever}) = \frac{2}{3} $$

This solution of course agrees with our earlier results when we were able to calculate the answer by manually counting the data. However, this time we did not have to use the data directly.
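The same calculation can be written as a small Python function. This is only a sketch: the name `bayes` and its argument order are assumptions made for illustration, and it takes the three known probabilities without ever touching the underlying patient data.

```python
from fractions import Fraction

def bayes(p_j_given_i, p_i, p_j):
    """Return P(F_i | F_j) from P(F_j | F_i), P(F_i), and P(F_j)."""
    if p_j == 0:
        raise ValueError("P(F_j) must be non-zero")
    return p_j_given_i * p_i / p_j

# P(fever | flu) = 1/2, P(flu) = 2/5, P(fever) = 3/10
p_flu_given_fever = bayes(Fraction(1, 2), Fraction(2, 5), Fraction(3, 10))
print(p_flu_given_fever)  # 2/3
```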

Let’s do one more example to drive the point home. Say we have a test for Tuberculosis, TB, that is 95% accurate. That is to say that if you have TB then 95% of the time the test will give you a positive result. Similarly if you do not have TB then only 95% of the time will you get a negative result. We can represent this as follows.

$$ P(\mathcal{F}_{positive} \mid \mathcal{F}_{infected}) = \frac{19}{20} $$

Furthermore let’s say we know that only one in a thousand members of the population are infected with TB at any one time. We can demonstrate this as follows.

$$ P(\mathcal{F}_{infected}) = \frac{1}{1000} $$

Finally let’s say when tested on the general population that 509 out of every 10,000 people received a positive result. We can represent that with the following.

$$ P(\mathcal{F}_{positive}) = \frac{509}{10000} $$

With this information it is possible to calculate the probability someone will have TB if they receive a positive test result. Using Bayes Theorem we can solve for the probability as follows.

$$ P(\mathcal{F}_{infected} \mid \mathcal{F}_{positive}) = \frac{ P(\mathcal{F}_{positive} \mid \mathcal{F}_{infected}) \cdot P(\mathcal{F}_{infected}) }{ P(\mathcal{F}_{positive}) } $$
$$ P(\mathcal{F}_{infected} \mid \mathcal{F}_{positive}) = \frac{ \frac{19}{20} \cdot \frac{1}{1000} }{ \frac{509}{10000} } $$
$$ P(\mathcal{F}_{infected} \mid \mathcal{F}_{positive}) = \frac{ 19 }{ 1018 } \approx 0.018664 = 1.8664\% $$

This gives us a very surprising result. It says that of the people who take the TB test and show up positive less than 2% of them actually have TB. This demonstrates the importance of using very accurate clinical tests when testing for diseases that have a low occurrence in the population. Even a small error in the test can give false positives at an alarmingly high rate.
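To see where the 509 in 10,000 figure comes from, the sketch below rebuilds it from the test's accuracy and the infection rate using the law of total probability, then applies Bayes Theorem. The variable names `sensitivity`, `specificity`, and `prevalence` are labels chosen here for the three quantities described above, not terms used elsewhere in this article.

```python
from fractions import Fraction

sensitivity = Fraction(19, 20)   # P(positive | infected)
specificity = Fraction(19, 20)   # P(negative | not infected)
prevalence = Fraction(1, 1000)   # P(infected)

# Positives are either true positives or false positives.
p_positive = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)

# Bayes Theorem: P(infected | positive).
p_infected_given_positive = sensitivity * prevalence / p_positive

print(p_positive)                 # 509/10000
print(p_infected_given_positive)  # 19/1018
```

Out of every 20,000 people tested, 19 are true positives while 999 are false positives, which is why the false positives dominate.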

Restricted Logarithmic Growth with Injection

The Logistic Function, sometimes with modifications, has been used successfully to model a large range of natural systems. Some examples include bacterial growth, tumor growth, animal populations, neural network transfer functions, chemical reaction rates, language adoption, and diffusion of innovation, to name a few. The real world applications are simply staggering.

Because of the utility of this powerful little equation it has led me to investigate more thoroughly how it works, and how it can be applied to novel innovations.

There have been many contributions that have led to numerous variations of the Logistic function. One that caught my attention in particular is the Verhulst Equation, sometimes referred to descriptively as the “Restricted Logarithmic Growth Function”, or simply “Logistic Growth Function”. It is used in ecology to model the expected growth of a population while taking into consideration that resources, such as food, are finite, resulting in a maximum sustainable population, called the carrying capacity. The model demonstrates an exponential growth of the population when it is significantly below this carrying capacity, but as the population approaches the carrying capacity, and resource competition increases, the growth rate slows and the population levels off asymptotically. The Verhulst Equation has been tested against numerous real world populations, including seals and elk, and has been shown to be a relatively accurate model.

Of course the Verhulst Equation isn’t limited to modeling animal populations; it has also been successfully used to model diffusion of innovation. In this sense it can be used to represent how ideas spread throughout a population. It is this particular application that was most interesting to me.

Unrestricted Exponential Growth

As with any complex idea we have to start with the basics. Whether we are talking about an idea or an organism, if it is capable of spreading through multiplying itself, then obviously the more it spreads, the faster it will spread. You start out with one, which turns into two, then four, then eight; within a few dozen iterations you will have billions. If there is nothing to impede the growth, then it will follow an exponential curve.

We can model this sort of growth quite simply; the rate at which new members of the population will be observed can be represented as the growth rate, G, multiplied by the population, p.

$$ \frac{dp}{dt} = G \cdot p $$

You will notice in the above equation that there is a derivative on the left hand side of the equation. The equation can therefore be read as “The rate of change in the population for each unit of time is equal to the growth rate multiplied by the current population”. If the population is seen to double each year, for example, then time would have a unit of years, and G would be the natural logarithm of 2, approximately 0.693, since that is the continuous rate at which a population doubles over one unit of time.

Of course the equation becomes much more useful if we can get rid of the derivative and simply express the total population at any point in time. To do that we integrate the equation and solve for p, arriving at the following equation.

$$ p(t) = P_0 \cdot e^{G \cdot t} $$

Here the constant e is Euler’s number, and P0 is our constant of integration, which represents the population when time, t, is equal to 0.

To help visualize what’s going on let’s try some arbitrary values for our constants. Let’s suppose the growth rate is 0.1, and our P0 is 1. This would represent an initial population with only one member, growing at a continuous rate of 10% of the current population per unit of time. If we plug these values in and simplify we arrive at the following equation.

$$ p(t) = e^{\frac{t}{10}} $$

It is immediately obvious this is an exponential function, nothing too special about that. We can see from the graph below that the population would continue to grow exponentially and unrestricted.
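As a rough numerical check on the unrestricted model, the sketch below evaluates the closed-form exponential and compares it against a simple forward-Euler integration of the growth rate equation. The function names and the step size `dt` are arbitrary choices made for this example.

```python
import math

def exponential_growth(t, growth_rate=0.1, p0=1.0):
    """Closed form p(t) = P0 * e^(G * t)."""
    return p0 * math.exp(growth_rate * t)

def euler_growth(t_end, growth_rate=0.1, p0=1.0, dt=0.001):
    """Forward-Euler integration of dp/dt = G * p, as a numerical cross-check."""
    steps = round(t_end / dt)
    p = p0
    for _ in range(steps):
        p += growth_rate * p * dt
    return p

for t in (0, 10, 50, 100):
    print(t, exponential_growth(t), euler_growth(t))
```

With G = 0.1 the population doubles roughly every 6.93 units of time, which is the natural logarithm of 2 divided by G, and nothing in the model ever slows that doubling down.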

Restricted Logarithmic Growth

Of course in the real world it is rare that anything will truly grow in an unrestricted manner. Space is finite, resources are finite, and eventually everything must stop growing. This brings us to the Restricted Logarithmic Growth model, also called the Verhulst Equation.

Verhulst recognized that, at least amongst animals, resources are limited. When unlimited resources are available to an organism it will grow and reproduce unrestricted. However as the population grows resources become scarce. The more scarce the resources, the greater the restriction on growth and reproduction, and this effect will scale according to the proportion of the resources which are free. Therefore as the population approaches the carrying capacity of the environment the growth rate should approach 0 in a linear fashion.

Therefore if we take the growth rate equation from before, we can add an additional term to represent the availability of resources.

$$ \frac{dp}{dt} = G \cdot p \cdot \left(1 - \frac{p}{k(t)}\right) $$

In the above equation we can see that we multiply by a term that scales proportionally to the population. The term will evaluate to 1 when the population is 0, and will approach 0 as the population approaches the carrying capacity, represented as k(t). This is where the population’s growth restriction is represented.

Before we can actually solve the equation above, we have to actually know what the k(t) function evaluates to; after all we can’t perform integration on a function if we don’t know what that function is. So let’s just assume the carrying capacity is a constant we will call K.

$$ \frac{dp}{dt} = G \cdot p \cdot \left(1 - \frac{p}{K}\right) $$

At this point we can solve for p and integrate the equation as before. This will give us an equation which models the total population as a function of time.

$$ p(t) = \frac{K \cdot P_0 \cdot e^{G \cdot t}}{K + P_0 \cdot \left(e^{G \cdot t} - 1\right)} $$

The above equation is usually what people refer to when they talk about the Verhulst equation, but it might help to see what it looks like with some actual values plugged in for the constants. Let’s pick similar values with a growth rate, G, of 0.1, an initial population, P0, of 1, and a carrying capacity, K, of 100.

$$ p(t) = \frac{100 \cdot e^{\frac{t}{10}}}{e^{\frac{t}{10}} + 99} $$

Once we plug these values in and simplify we arrive at the above equation. It is a specific form of the Logistic Function. If we graph that we will get the following graph.

In the above graph the dashed horizontal orange line represents the carrying capacity, K. This is of course a constant with a value of 100. We can see that early on the population has what appears to be an exponential growth pattern, but as it begins to approach the orange dashed line it asymptotically tapers off. Of course as time (the x axis) approaches infinity the population approaches the carrying capacity.
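The behavior described for the graph can also be checked numerically. The sketch below evaluates the constant-capacity Verhulst solution with G = 0.1, P0 = 1, and K = 100. It is written in the algebraically equivalent form K / (1 + ((K - P0) / P0) * e^(-G t)), a choice made here only so that large values of t do not overflow; the function name is hypothetical.

```python
import math

def logistic_growth(t, growth_rate=0.1, p0=1.0, capacity=100.0):
    """Verhulst solution with constant carrying capacity K.

    Equivalent to K*P0*e^(G t) / (K + P0*(e^(G t) - 1)), rearranged to use
    a decaying exponential so that large t stays numerically safe.
    """
    decay = math.exp(-growth_rate * t)
    return capacity / (1.0 + ((capacity - p0) / p0) * decay)

for t in (0, 20, 40, 60, 80, 100, 200):
    print(f"t={t:4d}  p={logistic_growth(t):8.3f}")

# The curve crosses half the carrying capacity at t = ln((K - P0) / P0) / G.
midpoint = math.log(99.0) / 0.1
print(midpoint, logistic_growth(midpoint))
```

Before the midpoint, at roughly t = 46, the curve is close to the unrestricted exponential; after it the growth slows and the population flattens out toward K.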

Time-varying Carrying Capacity

If we want to play with this equation a bit more we can make things more complex (and fun) by picking a carrying capacity that actually changes with time. If we go back to the original growth rate equation we can make k(t) a function equal to (t/10)+100, rather than a constant. If we do this, and keep the growth rate and P0 constants the same as before, then we will get the following equation.

$$ \frac{dp}{dt} = \frac{p}{10} \cdot \left(1 - \frac{p}{\frac{t}{10} + 100}\right) $$

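Again we can solve for p, integrate, and simplify and we will arrive at the following equation.

$$ p(t) = \frac{ e^{\frac{t}{10}} }{ 1 + e^{-100} \cdot \left( \operatorname{Ei}\left(\frac{t}{10} + 100\right) - \operatorname{Ei}(100) \right) } $$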

In the above equation Ei is the Exponential Integral Function. If you don’t know what that is don’t worry, we can just graph the equation below to get a better understanding of how it behaves.

You can see the equation behaves similarly in this graph as in the last graph. The only difference is that the carrying capacity starts at 100 and slowly increases over time. As this happens the population also increases to follow the carrying capacity without ever exceeding it. Pretty much how we would expect it to work.
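If the Exponential Integral feels opaque, the same behavior can be reproduced by integrating the growth rate equation numerically. The following is a minimal sketch using a classic fourth-order Runge-Kutta step; the function names, the step size, and the end time of 300 are all arbitrary choices made for illustration.

```python
def carrying_capacity(t):
    """k(t) = t/10 + 100, the example capacity used in this section."""
    return t / 10.0 + 100.0

def growth_rate_of_change(t, p, growth_rate=0.1):
    """Right-hand side dp/dt = G * p * (1 - p / k(t))."""
    return growth_rate * p * (1.0 - p / carrying_capacity(t))

def integrate_rk4(f, p0, t_end, dt=0.01):
    """Classic fourth-order Runge-Kutta; returns a list of (t, p) samples."""
    steps = round(t_end / dt)
    t, p = 0.0, p0
    samples = [(t, p)]
    for n in range(steps):
        k1 = f(t, p)
        k2 = f(t + dt / 2, p + dt * k1 / 2)
        k3 = f(t + dt / 2, p + dt * k2 / 2)
        k4 = f(t + dt, p + dt * k3)
        p += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t = (n + 1) * dt
        samples.append((t, p))
    return samples

samples = integrate_rk4(growth_rate_of_change, p0=1.0, t_end=300.0)
for t, p in samples[::5000]:
    print(f"t={t:6.1f}  p={p:8.3f}  k={carrying_capacity(t):8.3f}")
```

In this sketch the population trails just below the rising capacity: once the early growth phase is over, it stays about one member short of k(t) while both continue to climb.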

Restricted Logarithmic Growth with Injection

This is where the lessons of history end, and my own contribution begins. I was left considering the effects of a modern world and how they could be reflected in the Verhulst model.

As I mentioned in the introduction the Verhulst equation has since been used in countless applications outside of ecology models. One that was particularly interesting is that it has been used successfully to model the Diffusion of Innovation. Essentially it can accurately describe the adoption of an idea within a population. For example it can model the proliferation of linguistic changes such as the change in the spelling of a word, or even the adoption of an entirely new word.

All of this, of course, has implications in marketing, where the market penetration of a product can spread through word of mouth alone and can therefore be modeled using the Verhulst equation. But how could we use this to also account for the effects of advertising, which can often act as an accelerant to the spreading of an idea that is still primarily driven by word of mouth? That’s when inspiration struck.

If we accept that ideas spreading through word of mouth will often fit a model similar to population growth, then advertising is a bit like artificially injecting new members into a population as it grows. The advertising acts as a jump start to the population growth, accelerating it, and magnifying the effects that would be seen from word of mouth alone.

If I could incorporate this effect into the model not only could it be used for modeling product adoption, but conservation efforts as well. It is not uncommon for conservationists to breed animals in captivity and then release them in the hopes of supplementing a species’ population. In this sense it would be modeled with the same artificial injection.

In fact almost all the examples given earlier where the Verhulst model has been used in the past could potentially benefit from this addition. Consider chemical reactions; an artificial injection factor could represent the effects of slowly adding new reactants to a reaction as it progresses.

So I had to figure out where this new function might fit in. When we go back and take a look at our original differential growth rate equation we have two terms: the left hand term represents the unrestricted growth rate, and the right hand term represents the constraint on growth as resource contention grows. So all it really takes is adding an injection factor to the left hand term. The new equation looks like the following.

$$ \frac{dp}{dt} = \left(G \cdot p + i(t)\right) \cdot \left(1 - \frac{p}{k(t)}\right) $$

You will notice in the above equation all we did was add the new injection function, shown as i(t). This is just the number of artificially injected members of the population per unit of time. By representing it as a function it can of course change over time such that we can inject a varying number of members over time.

It is important to keep in mind that this injection is still scaled by the right hand term. The more resource contention there is the less effective the injection is likely to be. If we consider the earlier example of breeding animals in captivity and releasing them into the wild then this represents the fact that if resources are scarce many of the animals released are likely to die of starvation before they have a chance to have children. In the case of advertisement, or Diffusion of Innovation in general, then it is reasonable to expect that the more people that know about an idea the harder it would be for your advertisements to reach new people who have yet to hear word of it. If Pepsi plays a commercial on TV, for example, you might expect the vast majority of people who see it to already know about the Pepsi brand.

As before if we want to play around with the equation a bit to see how the model plays out we are going to have to create some definitions for our functions.

$$ \frac{dp}{dt} = \left(G \cdot p + I\right) \cdot \left(1 - \frac{p}{K}\right) $$

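For the sake of simplicity we turned our injection function, i(t), into the constant I, and the carrying capacity, k(t), into the constant K. This just gives us something to play with and allows us to solve for p and integrate, giving the following equation.

$$ p(t) = \frac{ K \cdot (G \cdot P_0 + I) \cdot e^{\left(G + \frac{I}{K}\right) t} - I \cdot (K - P_0) }{ (G \cdot P_0 + I) \cdot e^{\left(G + \frac{I}{K}\right) t} + G \cdot (K - P_0) } $$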

Now let’s plug in some arbitrary values for our constants and see how it graphs out. Let’s pick a value of 1/2 for I, 1/10 for G, 100 for K, and 1 for P0. If we plug these values in and simplify we arrive at the following equation.

$$ p(t) = \frac{ 200 \cdot e^{\frac{21 t}{200}} - 165 }{ 2 \cdot e^{\frac{21 t}{200}} + 33 } $$

Finally, let’s graph the carrying capacity and growth model as before. In the graph below carrying capacity is once again represented by the dashed yellow line, and the solid blue line is the population. We can immediately notice that while the graph still has a logistic curve to it, the population grows significantly quicker in the beginning than with the other models, but despite the injection it still does not exceed the carrying capacity at any point.
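The comparison described above can be sketched numerically. The function below is one way to write the solution of the injection model with constant I and K, obtained by separating variables; it is a sketch under those assumptions rather than a tested implementation, and the function and variable names are invented for this example. Setting the injection to 0 reduces it to the ordinary Verhulst curve, which makes a side-by-side comparison easy.

```python
import math

def injected_growth(t, growth_rate=0.1, injection=0.5, capacity=100.0, p0=1.0):
    """Solution of dp/dt = (G*p + I) * (1 - p/K) with p(0) = P0.

    Written with a decaying exponential so that large t does not overflow.
    """
    rate = growth_rate + injection / capacity  # combined rate G + I/K
    seed = growth_rate * p0 + injection        # G*P0 + I
    room = capacity - p0                       # K - P0
    decay = math.exp(-rate * t)
    numerator = capacity * seed - injection * room * decay
    denominator = seed + growth_rate * room * decay
    return numerator / denominator

for t in (0, 10, 20, 40, 60, 100):
    with_injection = injected_growth(t)
    without_injection = injected_growth(t, injection=0.0)
    print(f"t={t:4d}  injected={with_injection:8.3f}  plain={without_injection:8.3f}")
```

With these constants the injected population pulls well ahead of the plain logistic curve early on, yet both level off at the same carrying capacity.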

While this model has yet to be tested against real world data I am hopeful that it will provide an accurate representation of an artificial population injection in growth models. I am looking forward to seeing how I and others might apply this and similar models to data in the future.