A Computational Approach to Rhythm Description --- Audio Features for the Computation of Rhythm Periodicity Functions and their use in Tempo Induction and Music Content Processing

Fabien Gouyon
University Pompeu Fabra, Barcelona, Spain (November, 2005)


This dissertation is about musical rhythm. More precisely, it is concerned with computer programs that automatically extract rhythmic descriptions from musical audio signals.

New algorithms are presented for tempo induction, tatum estimation, time signature determination, swing estimation, swing transformations and classification of ballroom dance music styles. These algorithms directly process digitized recordings of acoustic musical signals. The backbones of these algorithms are rhythm periodicity functions: functions measuring the salience of a rhythmic pulse as a function of the period (or frequency) of the pulse, calculated from selected instantaneous physical attributes (henceforth features) emphasizing rhythmic aspects of sound. These features are computed at a constant time rate on small chunks (frames) of audio signal waveforms.
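As a rough illustration of the idea, the sketch below computes one possible rhythm periodicity function from a frame-wise feature. It is not the dissertation's implementation. Frame energy as the feature, autocorrelation as the periodicity measure, and all function names and parameter values are assumptions chosen for brevity.

```python
def frame_energies(signal, frame_size, hop):
    """Feature: energy of each frame, one value every `hop` samples (constant rate)."""
    return [
        sum(x * x for x in signal[start:start + frame_size])
        for start in range(0, len(signal) - frame_size + 1, hop)
    ]


def periodicity_function(feature, max_lag):
    """Assumed periodicity measure: autocorrelation of the mean-removed feature.

    Returns a dict mapping each lag (in frames, 1..max_lag) to its salience.
    """
    mean = sum(feature) / len(feature)
    centered = [v - mean for v in feature]
    return {
        lag: sum(centered[i] * centered[i + lag] for i in range(len(centered) - lag))
        for lag in range(1, max_lag + 1)
    }


def most_salient_period(signal, sample_rate, frame_size, hop, max_lag):
    """Period (in seconds) of the pulse with the highest salience."""
    feature = frame_energies(signal, frame_size, hop)
    salience = periodicity_function(feature, max_lag)
    best_lag = max(salience, key=salience.get)
    return best_lag * hop / sample_rate


# Synthetic test signal (an assumption, not real music):
# 8 seconds at 1000 Hz with a short click every 0.5 s, i.e. 120 beats per minute.
sample_rate = 1000
signal = [0.0] * (8 * sample_rate)
for onset in range(0, len(signal), 500):
    for k in range(20):
        signal[onset + k] = 1.0

period = most_salient_period(signal, sample_rate, frame_size=20, hop=10, max_lag=80)
tempo_bpm = 60.0 / period
```

On real recordings, picking the single highest peak would often select a different metrical level than the one a listener taps to, which is one reason the choice of feature and the parsing of the periodicity function matter.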

On music of different genres with almost constant tempo, our algorithms determine tempo and tatum with over 80% accuracy if we do not insist on finding a specific metrical level. They identify time signature with around 90% accuracy, assuming lower metrical levels are known. They classify ballroom dance music into 8 categories with around 80% accuracy when taking nothing but rhythmic aspects of the music into account. Finally, they add swing to (or remove it from) musical audio signals in a fully automatic fashion while preserving very good sound quality.
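To make concrete what not insisting on a specific metrical level can mean when scoring tempo estimates, here is a minimal sketch of a tolerant accuracy measure. The 4% tolerance, the set of accepted tempo ratios, and the function names are assumptions for illustration, not the evaluation protocol used in the dissertation.

```python
def tempo_matches(estimate_bpm, reference_bpm, tolerance=0.04,
                  factors=(1, 2, 3, 1 / 2, 1 / 3)):
    """True if the estimate lies within `tolerance` of the reference tempo
    scaled by any accepted factor (i.e. at any accepted metrical level)."""
    return any(
        abs(estimate_bpm - reference_bpm * f) <= tolerance * reference_bpm * f
        for f in factors
    )


def accuracy(estimates, references, **kwargs):
    """Fraction of estimates that match their reference tempo."""
    hits = sum(tempo_matches(e, r, **kwargs) for e, r in zip(estimates, references))
    return hits / len(references)


# Hypothetical estimates and annotated tempi, in beats per minute.
estimates = [120.0, 60.0, 100.0]
references = [120.0, 120.0, 150.0]

lenient = accuracy(estimates, references)               # any accepted level counts
strict = accuracy(estimates, references, factors=(1,))  # annotated level only
```

In this toy example the second estimate is half the annotated tempo, so it counts as correct only under the lenient measure.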

From a more general standpoint, this dissertation substantially contributes to the field of computational rhythm description
a) by proposing a unifying functional framework;
b) by reviewing the architecture of many existing systems with respect to individual blocks of this framework;
c) by organizing the first public evaluation of tempo induction algorithms; and
d) by identifying promising research directions, particularly with respect to the selection of instantaneous features which are best suited to the computation of useful rhythm periodicity functions and the strategy to combine and parse multiple sources of rhythmic information.
