A Multiresolution Time-Frequency Analysis and Interpretation of Musical Rhythm

Leigh M. Smith
University of Western Australia, Australia (October, 2000)


Computational approaches to music have considerable problems in representing musical time. In particular, in representing structure over time spans longer than short motives. The new approach investigated here is to represent rhythm in terms of frequencies of events, explicitly representing the multiple time scales as spectral components of a rhythmic signal.
Approaches to multiresolution analysis are then reviewed. In comparison to Fourier theory, the theory behind wavelet transform analysis is described. Wavelet analysis can be used to decompose a time dependent signal onto basis functions which represent time-frequency components. The use of Morlet and Grossmann's wavelets produces the best simultaneous localisation in both time and frequency domains. These have the property of making explicit all characteristic frequency changes over time inherent in the signal.
An approach of considering and representing a musical rhythm in signal processing terms is then presented. This casts a musician's performance in terms of a conceived rhythmic signal. The actual rhythm performed is then a sampling of that complex signal, which listeners can reconstruct using temporal predictive strategies which are aided by familarity with the music or musical style by enculturation. The rhythmic signal is seen in terms of amplitude and frequency modulation, which can characterise forms of accents used by a musician.
Once the rhythm is reconsidered in terms of a signal, the application of wavelets in analysing examples of rhythm is then reported. Example rhythms exhibiting duration, agogic and intensity accents, accelerando and rallentando, rubato and grouping are analysed with Morlet wavelets. Wavelet analysis reveals short term periodic components within the rhythms that arise. The use of Morlet wavelets produces a "pure" theoretical decomposition. The degree to which this can be related to a human listener's perception of temporal levels is then considered.
The multiresolution analysis results are then applied to the well-known problem of foot-tapping to a performed rhythm. Using a correlation of frequency modulation ridges extracted using stationary phase, modulus maxima, dilation scale derivatives and local phase congruency, the tactus rate of the performed rhythm is identified, and from that, a new foot-tap rhythm is synthesised. This approach accounts for expressive timing and is demonstrated on rhythms exhibiting asymmetrical rubato and grouping. The accuracy of this approach is presented and assessed.
From these investigations, I argue the value of representing rhythm into time-frequency components. This is the explication of the notion of temporal levels (strata) and the ability to use analytical tools such as wavelets to produce formal measures of performed rhythms which match concepts from musicology and music cognition. This approach then forms the basis for further research in cognitive models of rhythm based on interpretation of the time-frequency components.

[BibTex, Return]