The Mathematics of Harmony

Garry Ho

This essay was originally written as a submission to the 2026 Leonardo Competition.

Introduction

I would like to begin this essay with an interactive activity. On whichever polyphonic instrument is most convenient, I invite the reader to play the note A₄ and, at the same time, an E₅. You will notice that these two notes sound good together.

Figure 1.1. Perfect fifth

Sheet music in treble clef and 4/4 time: first measure shows half-note A4 followed by half-note E5; second measure shows whole note A4 together with E5.

However, if you play A₄ at the same time as E♭₅, you will find this combination of notes to be dissonant.

Figure 1.2. Tritone

Sheet music in treble clef and 4/4 time: first measure shows half-note A4 followed by half-note Eb5; second measure shows whole note A4 together with Eb5.

Playing A₄ together with A♯₄ arguably sounds even worse.

Figure 1.3. Minor second

Sheet music in treble clef and 4/4 time: first measure shows half-note A4 followed by half-note A#4; second measure shows whole note A4 together with A#4.

The reason for this has to do with the frequencies of these pitches. A₄ has a frequency of \(440\,\mathrm{Hz}\) and E₅ has a frequency of \(660\,\mathrm{Hz}\). These are in the ratio \(3:2\) (E₅ : A₄), and so they sound good together. Two pitches whose frequencies are in this ratio are said to form a perfect fifth. In fact, perfect fifths sound so good that Billie Joe Armstrong only plays power chords in his songs. By contrast, the frequencies of E♭₅ and A₄ are in the ratio \(64:45\), a tritone, which sounds nasty because the ratio is far from simple. This interval has even been called the Devil’s interval because of its profound dissonance. Similarly, the frequencies of A♯₄ and A₄ are in the ratio \(16:15\), a minor second.
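As a quick check of the arithmetic above, the sketch below reduces a pair of frequencies to a ratio in lowest terms using Python's `fractions.Fraction`, and measures each interval in cents (hundredths of a semitone, 1200 per octave, a unit used again in the next section). The helper names `frequency_ratio` and `interval_cents` are my own, and the frequencies are the simple-ratio values used in this section.

```python
import math
from fractions import Fraction

def frequency_ratio(f_high: int, f_low: int) -> Fraction:
    """Ratio of two frequencies, automatically reduced to lowest terms."""
    return Fraction(f_high, f_low)

def interval_cents(ratio: Fraction) -> float:
    """Size of an interval in cents: 1200 cents per octave (ratio 2:1)."""
    return 1200 * math.log2(float(ratio))

# Perfect fifth: E5 (660 Hz) over A4 (440 Hz) reduces to 3/2.
fifth = frequency_ratio(660, 440)
print(fifth, round(interval_cents(fifth), 1))

# The tritone and minor second ratios quoted above.
for name, r in [("tritone", Fraction(64, 45)), ("minor second", Fraction(16, 15))]:
    print(name, r, round(interval_cents(r), 1))
```

Because `Fraction` always stores its value in lowest terms, the simplicity (or otherwise) of a ratio is visible as soon as it is printed.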

However, one might astutely observe that this doesn’t really answer anything at all. Surely our ears can’t tell whether the frequencies of two pitches are in a simple or complex ratio. Many people don’t even know what a ratio is. So how can our ears possibly differentiate between consonant and dissonant intervals? The true answer lies in the harmonic series.

Harmonic series

You may already know that every sound you hear can be broken down into sine waves, and that for a single sine wave, amplitude corresponds to volume and frequency corresponds to pitch. Most sounds, however, are not a single sine wave. For example, listen to the two audio clips below of the bassline for “Money” by Pink Floyd. The first is played by a pure sine wave, whereas the second is played by Matthew Duan (L5th Head’s).

Figure 2.1. Bassline for “Money” by Pink Floyd

Sheet music in B minor in bass clef and 7/4 time with swung eighths: the first and only measure shows quarter-note B2, then eighth-notes B3 and F#3, then staccato quarter-note B2, then quarter-notes F#2 and A2, then staccato quarter-note B2, then quarter-note D3.

One can clearly hear the difference between a pure sine wave and an instrument. The reason is that the sound wave produced by an instrument is actually a sum of many sine waves with different amplitudes and frequencies. Consider, for example, an electric bass playing the first note of the bassline above. If we decompose the sound coming out of the amplifier into sine waves (using a Fourier transform), we would see that it contains not only B₁ (known as the fundamental frequency) but also B₂, an F♯₃ about 2 cents sharp, B₃ and numerous other pitches at gradually lower volumes. These are known as the overtones or harmonics of B₁. Together, they form the overtone series or harmonic series of B₁ (not to be confused with \(\sum_{n=1}^\infty\frac{1}{n}\), although the two are related). The figure below shows the first 16 harmonics of A₁, where the numbers indicate how sharp or flat each pitch is in cents (hundredths of a semitone).

Figure 2.2. Harmonic series of A₁

Sheet music showing first 16 harmonics of A1: A1, A2, E3 + 2 cents, A3, C#4 - 14 cents, E4 + 2 cents, G4 - 31 cents, A4, B4 + 4 cents, C#5 - 14 cents, D#5 - 49 cents, E5 + 2 cents, F5 + 41 cents, G5 - 31 cents, G#5 - 12 cents and A5.
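To make the idea of decomposing a sound with a Fourier transform concrete, here is a minimal pure-Python sketch. It builds a toy "instrument" tone on A₁ (55 Hz) from three harmonics with made-up amplitudes (1, 0.5 and 0.25; these are illustrative assumptions, not measurements of a real bass), then runs a naive discrete Fourier transform to recover those components. The sample rate of 2000 Hz and the 0.2-second length are also arbitrary choices, picked so that every harmonic lands exactly on a frequency bin.

```python
import cmath
import math

SAMPLE_RATE = 2000   # samples per second (assumed, for illustration)
N_SAMPLES = 400      # 0.2 s of audio, so frequency bins are 5 Hz apart

def synthesize(fundamental, amplitudes):
    """Toy tone: a sum of sine waves at 1x, 2x, 3x, ... the fundamental."""
    return [
        sum(a * math.sin(2 * math.pi * (h + 1) * fundamental * t / SAMPLE_RATE)
            for h, a in enumerate(amplitudes))
        for t in range(N_SAMPLES)
    ]

def dft_amplitudes(signal):
    """Naive O(N^2) discrete Fourier transform. Returns {frequency in Hz:
    amplitude} for each bin from 0 Hz up to the Nyquist frequency
    (the factor of 2 converts bin magnitude to sine amplitude)."""
    n = len(signal)
    spectrum = {}
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(signal))
        spectrum[k * SAMPLE_RATE / n] = 2 * abs(coeff) / n
    return spectrum

tone = synthesize(55, [1.0, 0.5, 0.25])  # hypothetical harmonic amplitudes
peaks = {f: round(a, 3) for f, a in dft_amplitudes(tone).items() if a > 1e-6}
print(peaks)
```

Only the bins at 55, 110 and 165 Hz survive the threshold, each with the amplitude that was put in. A real recording would need a fast Fourier transform and windowing, but the principle is the same.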

I have also embedded a nice interactive demo of harmonics below. You can click on its strings to play the harmonics of the pitch corresponding to \(100\,\mathrm{Hz}\) (about 35 cents sharp of G₂). Make sure to unmute it at the top right-hand corner.

Demo 2.3. Demonstration of harmonics by Alexander Chen

The relative amplitudes of the overtones produced by an instrument are what determine its unique sound, or timbre. As a side note, harmonics also correspond to the natural harmonics played on stringed instruments. For example, the second harmonic corresponds to twelfth-fret harmonics on guitar, and the third harmonic corresponds to seventh-fret or nineteenth-fret harmonics. This is because the twelfth fret lies halfway along the string, while the seventh and nineteenth frets lie \(\frac{1}{3}\) of the way along the string, measured from the nut and from the bridge respectively. But what are the pitches in a harmonic series? It turns out that the frequencies of the harmonics are precisely the integer multiples of the fundamental frequency. For example, the harmonics of A₁ (\(55\,\mathrm{Hz}\)) are A₂ (\(110\,\mathrm{Hz}\)), E₃ (\(165\,\mathrm{Hz}\)), A₃ (\(220\,\mathrm{Hz}\)), and so on.
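The statement that harmonics are integer multiples of the fundamental can be checked against Figure 2.2 with a short sketch. It assumes 12 TET with A₄ = 440 Hz (introduced in the last section of this essay) as the reference for note names and cent offsets; `nearest_note` and `harmonic_series` are hypothetical helper names, and sharps are used for every black key.

```python
import math

NOTE_NAMES = ["A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#"]

def nearest_note(freq: float, a4: float = 440.0) -> tuple[str, int]:
    """Nearest 12 TET note name and how many cents sharp (+) or flat (-) freq is."""
    semitones_from_a4 = 12 * math.log2(freq / a4)
    nearest = round(semitones_from_a4)
    cents = round(100 * (semitones_from_a4 - nearest))
    # Octave numbers change at C; A sits 9 semitones above the C starting its octave.
    octave = 4 + (nearest + 9) // 12
    return f"{NOTE_NAMES[nearest % 12]}{octave}", cents

def harmonic_series(fundamental: float, count: int = 16):
    """The first `count` harmonics: the integer multiples n * fundamental."""
    return [(n, n * fundamental, *nearest_note(n * fundamental))
            for n in range(1, count + 1)]

for n, freq, name, cents in harmonic_series(55):
    print(f"{n:2d}: {freq:5} Hz  {name} {cents:+d} cents")
```

The sixteen rows printed for A₁ match the note names and cent offsets in Figure 2.2, and `nearest_note(100)` returns `('G2', 35)`, in agreement with the description of the demo above.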

Now, going back to our original question: simple ratios sound good and complex ratios sound bad because when, say, a perfect fifth is played, the waves line up very often, every 2 periods of the lower-frequency wave and every 3 periods of the higher-frequency wave. Relatedly, this interval appears early in the harmonic series, between the second and third harmonics, whereas more dissonant intervals appear later on; the minor second \(16:15\), for instance, first appears between the 15th and 16th harmonics. Furthermore, playing minor seconds or any two close frequencies can also cause beating, where the waves interfere with each other, producing a periodic rise and fall in perceived volume at a rate equal to the difference between the two frequencies. This is clearly heard when you play two notes a semitone apart on an electric guitar with heavy distortion. You can even bend the lower note up by a small amount, such as a quarter tone, to make the beating slower and easier to hear.
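Both effects can be put into numbers. In the sketch below (helper names are my own), the time until two waves realign follows from their ratio in lowest terms, and the beat rate is taken as the difference of the two frequencies, the standard result for two sine waves of similar amplitude. The quarter-tone bend is modelled as exactly 50 cents, an assumption made for illustration.

```python
from fractions import Fraction

def coincidence_time(f_low: float, ratio: Fraction) -> float:
    """Seconds until two waves with frequency ratio p:q (in lowest terms)
    line up again: q periods of the lower wave = p periods of the higher."""
    return ratio.denominator / f_low

def beat_frequency(f1: float, f2: float) -> float:
    """Rate of the perceived volume wobble between two close frequencies."""
    return abs(f1 - f2)

# A perfect fifth on A4 realigns after 2 periods of 440 Hz, i.e. 1/220 s.
print(coincidence_time(440, Fraction(3, 2)))

a4 = 440.0
a_sharp4 = a4 * 16 / 15            # just minor second above A4
bent_a4 = a4 * 2 ** (0.5 / 12)     # A4 bent up a quarter tone (50 cents)
print(beat_frequency(a4, a_sharp4))        # beats per second before the bend
print(beat_frequency(bent_a4, a_sharp4))   # fewer beats per second after it
```

The same pair of numbers, 2 and 3, appears both as the period counts and as the positions of the fifth in the harmonic series, which is the connection the paragraph above describes.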

A fundamental problem

From here, we may define all the different intervals in terms of the harmonic series. An octave is \(2:1\), a perfect fifth is \(3:2\) and a perfect fourth is \(4:3\). We can even define a minor seventh as \(7:4\), the ratio between the seventh and fourth harmonics. This type of tuning system, based on harmonics, is known as just intonation. However, a problem immediately arises. Suppose we justly tune the A major scale with respect to A, letting A₄ be \(440\,\mathrm{Hz}\). We end up with the following:

Table 3.1. Justly tuned A major scale

Pitch	A₄	B₄	C♯₅	D₅	E₅	F♯₅	G♯₅
Ratio from A₄	\(1:1\)	\(9:8\)	\(5:4\)	\(4:3\)	\(3:2\)	\(5:3\)	\(15:8\)
Frequency / \(\mathrm{Hz}\) (rounded)	\(440\)	\(495\)	\(550\)	\(587\)	\(660\)	\(733\)	\(825\)

You’ll notice that in this tuning system, known as Ptolemaic tuning, A₄ and E₅ sound very consonant together because of the perfect \(3:2\) ratio. However, if you look at the ratio between B₄ and F♯₅, which is also supposed to be a perfect fifth, you’ll see that \(\frac{5}{3}:\frac{9}{8}\) is actually \(40:27\), about \(2.96:2\), or roughly 21.5 cents narrower than a true \(3:2\). This interval is known as a wolf fifth or an imperfect fifth, and it sounds horrible.
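The wolf fifth can be found mechanically by checking every fifth in the scale. This sketch uses the ratios from Table 3.1 with exact fractions; the function name `scale_fifths` is hypothetical, and "fifth" here simply means the degree four steps higher in the scale.

```python
import math
from fractions import Fraction

# Ratios from A4 for the justly tuned A major scale (Table 3.1).
JUST_A_MAJOR = {
    "A": Fraction(1), "B": Fraction(9, 8), "C#": Fraction(5, 4),
    "D": Fraction(4, 3), "E": Fraction(3, 2), "F#": Fraction(5, 3),
    "G#": Fraction(15, 8),
}

def cents(ratio: Fraction) -> float:
    return 1200 * math.log2(float(ratio))

def scale_fifths(scale):
    """Ratio from each degree up to the degree four steps higher,
    adding an octave (x2) when that degree wraps past the top."""
    names = list(scale)
    fifths = {}
    for i, low in enumerate(names):
        j = (i + 4) % len(names)
        high = scale[names[j]] * (2 if j < i else 1)
        fifths[f"{low}-{names[j]}"] = high / scale[low]
    return fifths

for pair, ratio in scale_fifths(JUST_A_MAJOR).items():
    print(pair, ratio, f"{cents(ratio):.1f} cents")

# Frequencies in Hz with A4 = 440, rounded as in Table 3.1.
print({name: round(440 * r) for name, r in JUST_A_MAJOR.items()})
```

Five of the fifths come out as exactly 3/2, while B-F# comes out as 40/27. G#-D prints as 64/45, but that interval is a diminished fifth (the tritone from the introduction), so it was never meant to be a perfect fifth.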

Figure 3.2. Wolf fifth

Sheet music in treble clef and 4/4 time: first measure shows half-note B4 followed by half-note F#5; second measure shows whole note B4 together with F#5.

The reason is that just intonation is not equally tempered. In other words, the same interval can correspond to different ratios depending on where it sits in the scale. For example, the ratio between A₄ and B₄ is \(9:8\), whereas the ratio between B₄ and C♯₅ is \(10:9\), even though both intervals are major seconds.
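Dividing each scale degree by the one below it shows the unequal steps directly. This minimal sketch again uses the Table 3.1 ratios, with the octave \(2:1\) appended so the last step back to A is included; `step_ratios` is a name of my own.

```python
from fractions import Fraction

# Table 3.1 ratios from A4, plus the octave A5 (2:1) at the end.
JUST_A_MAJOR = [Fraction(1), Fraction(9, 8), Fraction(5, 4), Fraction(4, 3),
                Fraction(3, 2), Fraction(5, 3), Fraction(15, 8), Fraction(2)]
NAMES = ["A", "B", "C#", "D", "E", "F#", "G#", "A"]

def step_ratios(ratios):
    """Ratio between each pair of neighbouring scale degrees."""
    return [high / low for low, high in zip(ratios, ratios[1:])]

for low, high, r in zip(NAMES, NAMES[1:], step_ratios(JUST_A_MAJOR)):
    print(f"{low}-{high}: {r}")
```

The major seconds come in two sizes, 9/8 and 10/9, while both minor seconds (C#-D and G#-A) are 16/15.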

There have been many proposed solutions to this problem. One is Pythagorean tuning, which, because Pythagoras didn’t believe in irrational numbers, builds every interval from ratios of powers of \(2\) and \(3\) alone. In effect, every pitch is produced by repeatedly raising or lowering a reference pitch by perfect fifths and octaves. This gives the following ratios for intervals:

Table 3.3. Pythagorean tuning

Semitones	\(0\)	\(1\)	\(2\)	\(3\)	\(4\)	\(5\)	\(6\)	\(7\)	\(8\)	\(9\)	\(10\)	\(11\)
Ratio	\(1:1\)	\(256:243\)	\(9:8\)	\(32:27\)	\(81:64\)	\(4:3\)	\(1024:729\)	\(3:2\)	\(128:81\)	\(27:16\)	\(16:9\)	\(243:128\)

The ratio for 6 semitones can also be \(729:512\), depending on whether the interval is treated as a diminished fifth (\(1024:729\)) or an augmented fourth (\(729:512\)); in ordinary tuning, both are simply a tritone. As an exercise, I invite the reader to spot the powers of \(2\) and \(3\) in the ratios.
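The construction by stacked fifths can be reproduced exactly with rational arithmetic. The sketch below (function names are my own) moves up or down by \(3:2\) and then folds the result back into a single octave by multiplying or dividing by 2. Taking six fifths down and five fifths up reproduces Table 3.3, including the \(1024:729\) choice for six semitones.

```python
from fractions import Fraction

def pythagorean_ratio(fifths: int) -> Fraction:
    """Move `fifths` perfect fifths up (positive) or down (negative) from the
    reference pitch, then fold the result into one octave, [1, 2)."""
    ratio = Fraction(3, 2) ** fifths
    while ratio >= 2:
        ratio /= 2
    while ratio < 1:
        ratio *= 2
    return ratio

def pythagorean_scale(lowest: int = -6, highest: int = 5) -> dict[int, Fraction]:
    """Map semitones above the reference (0-11) to Pythagorean ratios.
    Each fifth spans 7 semitones, so k fifths lands on (7 * k) mod 12."""
    return {(7 * k) % 12: pythagorean_ratio(k) for k in range(lowest, highest + 1)}

scale = pythagorean_scale()
for semitones in sorted(scale):
    r = scale[semitones]
    print(semitones, f"{r.numerator}:{r.denominator}")
```

Going six fifths up instead of down, `pythagorean_ratio(6)`, gives the augmented-fourth version, \(729:512\).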

Of course, this carries its own problems. Notably, going up twelve perfect fifths around the circle of fifths should take you back to the original note, seven octaves higher. However, twelve Pythagorean fifths give a ratio of \(\left(\frac{3}{2}\right)^{12} \approx 129.75\), not \(2^7 = 128\). This discrepancy is known as the Pythagorean comma, which has a value of \(\frac{1.5^{12}}{2^7} \approx 1.014\), about 23.46 cents. The same comma separates enharmonic notes such as the augmented fourth and diminished fifth above, since \(\frac{729/512}{1024/729} = \frac{3^{12}}{2^{19}} = \frac{1.5^{12}}{2^7}\).
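The comma can be computed exactly rather than approximately. This sketch (the `cents` helper is my own) stacks twelve \(3:2\) fifths, compares the result with seven octaves, and confirms that the two Pythagorean tritones from Table 3.3 differ by the same amount.

```python
import math
from fractions import Fraction

def cents(ratio: Fraction) -> float:
    return 1200 * math.log2(float(ratio))

twelve_fifths = Fraction(3, 2) ** 12   # 531441/4096, about 129.75
comma = twelve_fifths / 2 ** 7         # compare against seven octaves
print(comma, float(comma), cents(comma))

# The same comma separates the two Pythagorean tritones.
augmented_fourth = Fraction(729, 512)
diminished_fifth = Fraction(1024, 729)
print(augmented_fourth / diminished_fifth == comma)
```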

What we really need is a single ratio \(r\) for every semitone, so that a semitone is always the same size, with twelve semitones making up exactly one octave, i.e. \(r^{12} = 2\). Because a ratio of \(0.9175+0.5297i\) wouldn’t make much sense here, we instead choose the positive real root, \(r = \sqrt[12]{2}\). And so twelve-tone equal temperament (12 TET) was born. In 12 TET, we begin by defining a semitone to be \(\sqrt[12]{2}\) and A₄ to be \(440\,\mathrm{Hz}\). Of course, A440 is largely arbitrary and some songs use other standards; “Pink Triangle” by Weezer, for instance, uses \(445\,\mathrm{Hz}\). From here, any interval can be defined simply as a power of \(\sqrt[12]{2}\). For example, a major third is \((\sqrt[12]{2})^4:1 = 2^{\frac{1}{3}}:1\) and an augmented sixth is \((\sqrt[12]{2})^{10}:1 = 2^{\frac{5}{6}}:1\).
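In code, 12 TET reduces to a single formula: a pitch \(n\) semitones above A₄ has frequency \(440 \cdot 2^{n/12}\). The sketch below (the function name `tet_frequency` is my own) implements that, and also lists all twelve complex solutions of \(r^{12} = 2\), showing that the \(0.9175+0.5297i\) mentioned above is one of them and that only one solution is a positive real number.

```python
import cmath
import math

SEMITONE = 2 ** (1 / 12)   # the positive real twelfth root of 2

def tet_frequency(semitones_from_a4: int, a4: float = 440.0) -> float:
    """Frequency of the pitch a given number of semitones above (or below) A4."""
    return a4 * 2 ** (semitones_from_a4 / 12)

# All twelve complex solutions of r**12 == 2; only k = 0 is a positive real.
roots = [SEMITONE * cmath.exp(2j * math.pi * k / 12) for k in range(12)]
print(roots[1])                      # about 0.9175+0.5297j

print(tet_frequency(7))              # E5
print(tet_frequency(-12))            # A3, exactly half of A4
print(tet_frequency(0, a4=445.0))    # A4 under a 445 Hz standard
```

Passing a different `a4` is all it takes to move to another pitch standard, since every other pitch is defined relative to it.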

Now, you’ll notice that in 12 TET, a perfect fifth is actually \(2^{\frac{7}{12}}:1\approx 2.997:2\) instead of \(3:2\). This is actually why in Figure 2.2, the third harmonic is labeled as 2 cents sharp with respect to 12 TET, and also why Jacob Collier might tell you that his piano is out of tune. This also means that I lied at the beginning when I claimed that E₅ was \(660\,\mathrm{Hz}\) – it’s actually about \(659.26\,\mathrm{Hz}\). Of course, this is not ideal, but clearly our ears don’t mind anyway, considering all the music you listen to is (probably) in 12 TET.
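The size of these compromises can be tabulated. This sketch (the helper name is hypothetical) compares a few 12 TET intervals with the just ratios used earlier in the essay; a negative value means the 12 TET interval is narrower than the just one. A 12 TET fifth that is about 2 cents narrow is the same fact as the third harmonic sitting about 2 cents sharp of 12 TET in Figure 2.2.

```python
import math
from fractions import Fraction

# Just ratios from earlier in the essay, with their size in semitones.
JUST_INTERVALS = {
    "major third": (4, Fraction(5, 4)),
    "perfect fourth": (5, Fraction(4, 3)),
    "perfect fifth": (7, Fraction(3, 2)),
    "major sixth": (9, Fraction(5, 3)),
}

def tet_minus_just(semitones: int, ratio: Fraction) -> float:
    """Cents by which the 12 TET interval is wider (+) or narrower (-) than the just one."""
    return 100 * semitones - 1200 * math.log2(float(ratio))

for name, (semitones, ratio) in JUST_INTERVALS.items():
    print(f"{name}: {tet_minus_just(semitones, ratio):+.2f} cents")

print(round(440 * 2 ** (7 / 12), 2))   # E5 in 12 TET, in Hz
```

The fifth and fourth are off by only about 2 cents, while the thirds and sixths are off by roughly 14 to 16 cents, which matches the larger offsets of the fifth harmonic (C♯, about 14 cents flat) in Figure 2.2.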

Inspirational Poster 3.4. Feel free to hang on walls or set as desktop wallpapers.

Poster with mountain background showing quote "Indeed it has been said that 12 TET is the worst tuning system except for all those other systems that have been tried from time to time..." by Winston Churchill, who is shown on the right-hand side of the poster with a drop shadow.

All figures and audio were created in MuseScore, except for the bassline played by Matthew. The inspirational poster was created in Inkscape.