Beat Detection Algorithms (Part 1)
You can think of the beat of a song as the rhythm you tap your foot to while
listening to it. The tempo of a song is usually measured in beats per minute (BPM).
A song's tempo and its beats are usually felt by a listener and drive them,
for instance, to dance along with the song's rhythm. As is often the case,
things that a human can feel easily are not easy to detect in a computer
program. This is the first of two posts on beat detection algorithms, in which I introduce
two simple algorithms I implemented in the scala-audio-file Scala library.
Please note that, so far, the library can only process WAV files.
### Sound Energy Algorithm
The first algorithm I will talk about was originally presented
here. I implemented it in the SoundEnergyBPMDetector class.
The algorithm divides the data into blocks of samples and compares the energy
of each block with the energy of the window of blocks preceding it.
If a block's energy is above a certain threshold, the block is considered
to contain a beat. The threshold is derived from the average energy of the
window of blocks preceding the one being analyzed.
If a block \(j\) is made of 1024 samples and the song is stereo, its energy can be computed as:
\[ E _ j = \sum _ {i=0} ^ {1023} left[i]^2 + right[i]^2 \]
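As a sketch of this computation (the names here are illustrative, not the library's actual API), the energy of a stereo block could be computed as:

```scala
// Sketch: energy of one block of stereo samples, as in the equation above.
// `left` and `right` are assumed to hold the block's samples for each channel.
def blockEnergy(left: Array[Double], right: Array[Double]): Double =
  left.indices.map(i => left(i) * left(i) + right(i) * right(i)).sum
```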
The block’s energy is then placed in a circular buffer that stores all the energy values for the current window. Let us assume that the current window is made of 43 blocks (\(43 \cdot 1024 = 44032\) ~ \(1s\) with a sample rate of \(44100\)), the average window energy can be computed as:
\[ avg(E) = \frac{1}{43} \sum _ {j=0} ^ {42} E _ j \]
We detect a beat if the instant energy \(E _ j\) is bigger than \(C \cdot avg(E)\).
In the original article, \(C\) is computed via a linear regression on the
energy variance in the corresponding window. In the SoundEnergyBPMDetector
class I use a slightly modified equation that lowers the impact of the variance:
\[ C = -0.0000015 \cdot var(E) + 1.5142857 \]
In general, however, the bigger the variance, the more likely a block is to be considered a beat. The variance inside a window of blocks is defined as:
\[ var(E) = \frac{1}{43} \sum _ {j=0} ^ {42} (avg(E) - E _ j)^2 \]
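Putting the average, the variance, and the threshold constant together, the beat decision for a single block could be sketched as follows (`isBeat` and its parameters are illustrative names, not the library's API):

```scala
// Sketch of the beat decision described above: a block is a beat when its
// instant energy exceeds C * avg(E), with C derived from the window variance.
// `window` is assumed to hold the energies of the 43 preceding blocks.
def isBeat(instantEnergy: Double, window: Seq[Double]): Boolean = {
  val avg = window.sum / window.size
  val variance = window.map(e => (avg - e) * (avg - e)).sum / window.size
  val c = -0.0000015 * variance + 1.5142857
  instantEnergy > c * avg
}
```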
This first algorithm is very easy to implement and fast to execute, but its
results are rather imprecise. To evaluate it, add the scala-audio-file library
to your project and instantiate the SoundEnergyBPMDetector class as:
```scala
val audioFile = WavFile("filename.wav")
val tempo = SoundEnergyBPMDetector(audioFile).bpm
```
### Low Pass Filter Algorithm
This second algorithm is based on a post by Joe Sullivan, originally hosted
on Beatport's engineering blog. The original article uses the Web Audio API
to implement browser-side beat detection. A similar but more extensible
implementation of the algorithm is provided by the FilterBPMDetector class.
First, the algorithm applies a biquad low-pass filter (implemented by the
BiquadFilter class) to the sampled data. The filter's parameters are computed
according to the "Cookbook formulae for audio EQ biquad filter coefficients"
by Robert Bristow-Johnson.
The filtered audio data is divided into windows of samples whose size equals
the sample rate, i.e. one second of audio. A peak detection algorithm is
applied to each window: it identifies as peaks all values above a certain
threshold, defined as \(C \cdot avg(window)\), where \(C\) is configured
by the user (e.g. \(0.95\)).
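A minimal sketch of this per-window peak detection might look like this (`findPeaks` is a hypothetical helper, not the library's API):

```scala
// Sketch: mark as peaks all indices in the window whose value exceeds
// c * avg(window), where c is the user-configured constant (e.g. 0.95).
def findPeaks(window: Array[Double], c: Double): Seq[Int] = {
  val threshold = c * (window.sum / window.length)
  window.indices.filter(i => window(i) > threshold)
}
```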
For each peak, the distance in samples to each of its neighbouring peaks is
stored (10 neighbours are considered). The algorithm then counts, across all
windows in the song, how many times each distance between peaks has been detected.
Once all windows are processed, the algorithm has built a map `distanceHistogram`
where, for each possible distance, its number of occurrences is stored:

```
distanceHistogram(distance) = count
```
For each distance a theoretical tempo can be computed according to the
following equation:
\[ theoreticalTempo = 60 / (distance / sampleRate) \]
Here the algorithm builds on the assumption that the actual tempo of the song
lies in the interval \([90, 180]\). Any bigger theoretical tempo is halved
until it falls within the interval; similarly, any smaller tempo is doubled.
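The distance-to-tempo conversion, including the folding into the \([90, 180]\) interval, could be sketched as (an illustrative helper, not the library's API):

```scala
// Sketch: convert a peak distance (in samples) to a theoretical tempo,
// then fold it into the assumed [90, 180] BPM interval by doubling/halving.
def theoreticalTempo(distance: Int, sampleRate: Double): Double = {
  var tempo = 60.0 / (distance / sampleRate)
  while (tempo < 90) tempo *= 2
  while (tempo > 180) tempo /= 2
  tempo
}
```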
By converting each distance to its corresponding theoretical tempo, a map
`tempoHistogram` is built where each tempo is associated with the sum of the
occurrences of all distances that lead to it:

```
tempoHistogram(tempo) = count
```
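Aggregating the distance histogram into the tempo histogram could be sketched as follows (`toTempo` stands for the distance-to-tempo conversion described above and is an assumed helper, not the library's API):

```scala
// Sketch: group distance counts by their theoretical tempo and sum the
// occurrences, producing tempoHistogram(tempo) = count.
def buildTempoHistogram(distanceHistogram: Map[Int, Int],
                        toTempo: Int => Double): Map[Double, Int] =
  distanceHistogram.toSeq
    .groupBy { case (distance, _) => toTempo(distance) }
    .map { case (tempo, entries) => tempo -> entries.map(_._2).sum }
```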
Finally, the algorithm selects the tempo with the highest count in the
`tempoHistogram` map as the track's tempo.
This second algorithm is also very fast and provides better results than
the first one. To evaluate it, add the scala-audio-file library to your
project and instantiate the FilterBPMDetector class as:
```scala
val audioFile = WavFile("filename.wav")
val filter = BiquadFilter(
  audioFile.sampleRate,
  audioFile.numChannels,
  FilterType.LowPass)
val detector = FilterBPMDetector(audioFile, filter)
val tempo = detector.bpm
```
### Conclusion
To sum things up, both algorithms are easy to implement and fast to execute. The second one is a bit slower but provides a better approximation of the actual tempo. More accurate, though more time-consuming, algorithms can be developed by exploiting the Fast Fourier Transform or the Discrete Wavelet Transform. I will discuss such algorithms in the next post.