GCD, Euclid and Bezout: an Eternal Golden Braid

This blogpost is written to give some intuition to a few elementary concepts in number theory, which you might encounter in your first discrete mathematics course in university. Prerequisites: familiarity with numbers and some familiarity with thinking. Familiarity with Python is useful, but not required.

We will start with a puzzle. You have two water jugs, one can hold 5 liters, the other 3. Your goal is to measure out exactly 1 liter of water by filling the jugs, emptying them or pouring water between them.

Think about it for a second before watching the solution below.

0:00

As the video shows, we can fill the small jug, pour it into the big one, fill it again and pour 2 liters of it into the big one, which leaves us with 1 liter.

What if our jugs have sizes 4 and 6? Can we still measure out 1 liter? If we try, it seems like we only end up with an even number. If we think about it, it should be obvious why. When pouring water between the jugs, we are basically adding and subtracting the sizes of the jugs. The solution for the original puzzle can be thought of as \(2\cdot 3 - 1\cdot 5\ = 1\), because we're filling the small jug twice, and emptying the big jug once. So a possible solution for sizes 4 and 6 would similarly be of the form \(4x + 6y\), for integers \(x\) and \(y\). But this can never equal 1, because \(4x + 6y = 2(2x + 3y)\) is always even. Therefore the puzzle is impossible for sizes 4 and 6.

The Greatest Common Divisor

The greatest common divisor (GCD) of two numbers \(a\) and \(b\) is exactly what it says on the tin - the biggest number that divides both \(a\) and \(b\), which we will write as \(\gcd(a, b)\). A few properties should be fairly obvious from the definition, namely \(\gcd(a, b) = \gcd(b, a)\) and \(\gcd(a, a) = a\). Also note that if \(d\) divides \(a\) and \(b\), then \(d \leq \gcd(a, b)\).

If we represent a number as a multiset of its prime factors (12 is \(\{2, 2, 3\}\), 21 is \(\{3, 7\}\), etc.), then the GCD is equivalent to taking the intersection of the sets of prime factors (the GCD of 12 and 21 is \(\{2, 2, 3\} \cap \{3, 7\} = \{3\}\)). A related concept, the least common multiple, would be equivalent to taking the union of the sets. (If we put on our abstraction hat for a second and say that for a set \(A\) to be "smaller" than another set \(B\) means that it is \(A\) subset of \(B\), while for a number \(a\) to be "smaller" than another number \(b\) means that \(a\) divides \(b\), then we see that both the set intersection and the GCD is in some sense the biggest object that's smaller than both its operands. This kind of abstraction is formalized in the concept of preorders, where set intersection and GCD are examples of meets in their respective preorders).

We'll call an expression of the form \(ax + by\) a linear combination of \(a\) and \(b\), with integer coefficients \(x\) and \(y\). In general, a linear combination of \(a\) and \(b\) is always a multiple of their greatest common divisor. In other words,

\[ ax + by = \gcd(a, b)\cdot k \tag{1}\]

for some integer \(k\). This is just saying that you can factor out the GCD from \(ax + by\) - it's just a consequence of the distributivity of multiplication over addition. This is (the start of) an important result, because it's says something about what kind of numbers we can create using linear combinations. It immediately tells us that if the volumes of the jugs share a factor bigger than one, we can't solve the puzzle. We would like to know if we can find coefficients \(x, y\) that makes \(k=1\), because then we know we can always solve the water jug puzzle when the volumes have no factors in common. Before we can figure this out we need to take a detour to ancient Greece.

The Euclidean Algorithm

Described by Knuth as the granddaddy of all algorithms, the Euclidean algorithm was known by Euclid more than 2000 years ago, and probably discovered even earlier. We will attempt to rediscover it using what little we know of linear combinations.

We start with positive integers \(a, b\), and let's assume \(a \leq b\) to begin with. The goal of the algorithm is to find \(\gcd(a, b)\). We will do this by repeatedly transforming the input \((a, b)\) into something smaller until the answer is obvious. In particular, we want to show that \(\gcd(a, b) = \gcd(a, b-a)\). First notice that \(b-a\) is a multiple of \(\gcd(a, b)\), since \(b-a\) is a linear combination of \(a\) and \(b\) (a consequence of equation (1)). Therefore \(\gcd(a, b)\) is a divisor of both \(a\) and \(b-a\), which gives

\[ \gcd(a, b) \leq \gcd(a, b-a) \]

Now we can do the same argument in reverse - we have \(b = a + (b-a)\), which means that \(b\) is a linear combination of \(a\) and \(b-a\). Therefore, \(\gcd(a, b-a)\) divides both \(a\) and \(b\), which gives

\[ \gcd(a, b-a) \leq \gcd(a, b) \]

Combining the inequalities, we get

\[ \gcd(a, b) = \gcd(a, b - a) \tag{2}\]

This is all we need to know to recover the ancient algorithm. If we want to calculate \(\gcd(12, 21)\) we can just calculate \(\gcd(12, 21-12) = \gcd(12, 9)\) instead. We can do this transformation multiple times, making sure to subtract the smaller number from the larger. Let's write it in Python! It'll be recursive, just how we like it. Here it goes:

def gcd(a, b):
    if a < b:
        return gcd(b-a, a)
    elif a > b:
        return gcd(a-b, b)
    else: # a == b
        return a

It's not important to understand this code, so keep reading even if you can't code.

We can convince ourselves of the correctness of this algorithm in two steps: firstly, the answer must be correct because the first two branches (the recursive step) uses equation (2) and \(\gcd(a, b) = \gcd(b, a)\), while the base case uses \(\gcd(a, a) = a\). Secondly, the algorithm terminates in a finite amount of time because the recursive step transforms the input into smaller positive integers, but there are only a finite number of positive integers less than \(a\) and \(b\), so we must reach the base case eventually.

Shown below is a visualization of the algorithm with \(a=9=3\cdot3\) and \(b=15=3\cdot5\).

0:00

The algorithm calculates that \(\gcd(9, 15) = \gcd(6, 9) = \gcd(3, 6) = \gcd(3, 3) = 3\). Note that I have factored the input so we can see clearly how the GCD is unchanging at each step of the algorithm, but this is just a visual aid. The point of the algorithm is that it lets you calculate the GCD of two numbers quickly without having to factor them.

We can improve this algorithm somewhat. See what happens when we run it with \(a=2, b=9\):

0:00

It goes through the steps \((2, 9) \to (7, 2) \to (2, 5) \to (3, 2) \to (2, 1)\), but ideally we would like it to just immediately subtract \(8 = 2\cdot4\) from 9. That is, instead of \((a, b) \to (a, b-a)\), we want to subtract from \(b\) as many copies of \(a\) as possible. Let \(q\) be the number of copies we subtract, and \(r = b - qa\) the result. Now notice that since division is just repeated subtraction, \(q\) is actually just \(b\) divided by \(a\) rounded down (ie. the integer division of \(b\) and \(a\)), which means that \(r\) is the remainder, written \(b \bmod a\). That is, maximally subtracting \(a\) from \(b\) is the same as taking the remainder \(b \bmod a\). This is worth taking note of: \(b \bmod a = b - qa\) is a linear combination of \(b\) and \(a\), with coefficients \(1\) and \(-q = -\lfloor b/a\rfloor\) respectively.

If we go back to our derivation of \(\gcd(a, b) = \gcd(a, b-a)\) we see that the same arguments apply using \(b - qa\) (ie. \(b \bmod a\)) instead, so we can conclude

\[ \gcd(a, b) = \gcd(a, b \bmod a) \tag{3}\]

In Python \(b \bmod a\) is written b % a, so the improved algorithm looks like this:

def gcd(a, b):
    if a == 0:
        return b
    return gcd(b % a, a)

Here the base case is \(\gcd(0, b) = b\), which is true because \(0\) is divisible by every number.

Here's what it looks like using \(a=4\) and \(b=18\):

0:00

There is one more improvement we have to do to the algorithm. Convince yourself for a moment that a linear combination of linear combinations is still just a linear combination. The Euclidean algorithm replaces the inputs with linear combinations, which are again replaced by linear combinations... It's linear combinations all the way down, and when we end up at the base case with \(\gcd(0, a') = a'\), we know that \(a'\) is a linear combination of the original inputs \(a\) and \(b\). Here, let me show you how it works with \(a=5, b=7\):

0:00

By keeping track of the coefficients at each step, we can find the coefficients \(x, y\) such that

\[ ax + by = \gcd(a, b) \]

This is called the extended Euclidean Algorithm. Let's just quickly implement it in Python. There are probably a lot of ways to do this, but I like to think of the coefficients as vectors. To begin with, \(a\) has the coefficient vector \(\begin{bmatrix}1 & 0\end{bmatrix}\):

\[ a = \begin{bmatrix}1 & 0\end{bmatrix}\begin{bmatrix}a \\ b\end{bmatrix} \]

Don't be scared if you don't know linear algebra - this is just the same as writing \(a = 1\cdot a + 0 \cdot b\). Similarly, \(b\) has coefficient vector \(\begin{bmatrix}0 & 1\end{bmatrix}\).

What's nice with using vectors in this way is that we can do the exact same transformation to the coefficient vectors as we do to \(a\) and \(b\) themselves. If we call the coefficient vectors of \(a, b\) \(C_a\) and \(C_b\) respectively, the coefficient vector of \(b - qa\) would be \(C_b - q C_a\), because

\[ \begin{align*} b - qa &= C_b\begin{bmatrix}a_0 \\ b_0\end{bmatrix} - qC_a\begin{bmatrix}a_0 \\ b_0\end{bmatrix} \\ &= \left(C_b - q C_a\right) \begin{bmatrix}a_0 \\ b_0\end{bmatrix} \end{align*} \]

Now the implementation is just a small modification of the implementation of the ordinary Euclidean algorithm. We'll be using numpy to get the vector arithmetic we need.

import numpy as np

def gcd_helper(a, c_a, b, c_b):
    if a == 0:
        return b, c_b
    q = b // a
    return gcd_helper(b - q * a, c_b - q * c_a, a, c_a)

def extended_gcd(a, b):
    return gcd_helper(a, np.array([1, 0]), b, np.array([0, 1]))

It's a bit ugly, but it works. Note that q = b // a is just the integer division of b and a, in which case b - q*a is just the same as b % a. If we run extended_gcd(100, 123), we get (1, array([ 16, -13])), and we can verify that \(16\cdot 100 - 13\cdot 123 = 1\).

Bézout's Identity

Going back to the present, what have we learned from this detour? The gist is that, using the extended Euclidean algorithm, we can always find integers \(x, y\) such that

\[ ax + by = \gcd(a, b) \tag{4}\]

This is called Bézout's identity, and it fully resolves the water jug puzzle we started with. As long as the volume of the jugs have a GCD equal to 1, we can measure out 1 liter of water. For example, if the jugs have volumes of 34 and 55 respectively, we can run extended_gcd(34, 55), which returns (1, array([ -21, 13])). Under the interpretation that a negative coefficient means how many times we empty the jug, and a positive coefficient means how many times we fill it, this basically gives us the solution.

Note that we can multiply both sides of equation (4) by an arbitrary integer \(k\):

\[ a(kx) + b(ky) = k\gcd(a, b) \]

which means that a linear combination of \(a\) and \(b\) can become any multiple of the GCD. That is, every linear combination of \(a\) and \(b\) is a multiple of \(\gcd(a, b)\) and every multiple of \(\gcd(a, b)\) is a linear combination of \(a\) and \(b\) - or in other words, these two (infinite) sets are the same:

\[ \{\ ax + by\ |\ x, y \in \mathbb{Z}\ \} = \{\ k\cdot\gcd(a, b)\ |\ k \in \mathbb{Z}\ \} \]

For the water jug puzzle, this means that if we have jugs of volumes 3 and 5, we can measure out 1, 2, 3, 4 or 5 liters, whereas with jugs of volumes 4 and 6, we can only measure out 2, 4 or 6 liters.

If I had to summarize Bézout's identity in plain English, it would be this: the more dissimilar two numbers are in terms of prime factorization, the more they can create together as linear combinations.

There is more that could be said about Bézout's identity and its applications, but I have to wrap it up. I will leave you with some questions to test your understanding: ignoring leap years, why does Christmas Day fall on a different weekday each year? What if there were 5 days in a week? What if there were 15 days in a week?