Techniques for the Automated Analysis of Musical Audio

Stephen Webley Hainsworth
University of Cambridge, UK (September, 2004)


This thesis presents work on automated analysis techniques for musical audio. A complete analysis of the content within an audio waveform is termed ‘transcription’ and can be used for intelligent, informed processing of the signal in a variety of applications. However, full transcription is beyond the ability of current methods and hence this thesis concerns subtasks of the whole problem.

A major part of musical analysis is the extraction of signal information from a time-frequency representation, often the short-time Fourier transform or spectrogram. The ‘reassignment’ method is a technique for improving the clarity of these representations. It is comprehensively reviewed and its relationship to a number of classical instantaneous frequency measures is explicitly stated. Performance of reassignment as a sinusoidal frequency estimator is then assessed quantitatively for the first time. This is achieved in the framework of the Cramér-Rao bound. Reassignment is shown to be inferior to some other estimators for simple synthetic signals but achieves comparable results for real musical examples. Several extensions and uses of the reassignment method are then described. These include the derivation of reassignment measures extracted from the amplitude of the spectrogram, rather than the traditional phase-based method, and the use of reassignment measures for signal classification.
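
As an illustration of reassignment used as a sinusoidal frequency estimator, the following minimal sketch applies a standard textbook form of frequency reassignment to a single frame. It is not taken from the thesis: the function name `reassigned_frequency`, the choice of a Hann window, and the use of the peak bin are all assumptions made for the example. The frame is transformed twice, once with the window and once with the window's time derivative, and the ratio of the two transforms gives a correction to the bin frequency.

```python
import numpy as np

def reassigned_frequency(frame):
    """Return (peak-bin frequency, reassigned frequency) in cycles per sample.

    Hypothetical helper for illustration only; assumes one dominant sinusoid.
    """
    n = len(frame)
    t = np.arange(n)
    # Periodic Hann window and its analytical derivative with respect to time.
    window = 0.5 - 0.5 * np.cos(2 * np.pi * t / n)
    dwindow = (np.pi / n) * np.sin(2 * np.pi * t / n)

    x_h = np.fft.rfft(frame * window)    # transform using the window
    x_dh = np.fft.rfft(frame * dwindow)  # transform using the window derivative

    k = int(np.argmax(np.abs(x_h)))      # strongest bin
    bin_freq = k / n

    # Frequency correction in radians per sample: Im(X_dh * conj(X_h)) / |X_h|^2
    correction = np.imag(x_dh[k] * np.conj(x_h[k])) / np.abs(x_h[k]) ** 2
    return bin_freq, bin_freq - correction / (2 * np.pi)

# Usage: a sinusoid whose frequency falls between two bins.
n = 1024
true_freq = 0.1234
frame = np.cos(2 * np.pi * true_freq * np.arange(n))
bin_freq, est_freq = reassigned_frequency(frame)
```

For a stationary sinusoid the reassigned estimate should lie much closer to the true frequency than the centre of the peak bin, which is limited by the bin spacing of 1/n cycles per sample.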

The second major area of musical analysis investigated in this thesis is beat tracking, where the aim is to extract both the tempo implied in the music and its phase, which is analogous to beat location. Three models for the beat in general music examples are described. These are cast in a state-space formulation within the Bayesian paradigm, which allows the use of Monte Carlo methods for inference of the model parameters. The first two models use pre-extracted note onsets and model tempo as either a linear Gaussian process or as Brownian motion. The third model also includes on-line detection of onsets, thus adding an extra layer of complexity. Sequential Monte Carlo algorithms, termed particle filters, are then used for the estimation of the data. The systems are tested on an extensive database, nearly three and a half hours in length and consisting of various styles of music. The results exceed the current published state of the art.
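
To make the idea of particle filtering on pre-extracted onsets concrete, here is a minimal sketch of a generic bootstrap particle filter for tempo. It is not one of the thesis's three models. It assumes, as a strong simplification, that every onset falls on a consecutive beat, that the beat period follows a random walk, and that onset times are observed with Gaussian noise. The function name `track_tempo` and all parameter values are hypothetical.

```python
import numpy as np

def track_tempo(onsets, n_particles=2000, period_step_std=0.002,
                obs_std=0.02, seed=0):
    """Return a tempo estimate (BPM) after each onset except the first.

    Hypothetical sketch: each particle holds a beat period and the time of
    its most recent beat; one onset is assumed per beat.
    """
    rng = np.random.default_rng(seed)
    # Prior: beat period uniform between 0.3 s and 1.0 s (60 to 200 BPM).
    period = rng.uniform(0.3, 1.0, n_particles)
    beat = np.full(n_particles, float(onsets[0]))
    tempos = []
    for onset in onsets[1:]:
        # Propagate: random-walk period, then advance each particle one beat.
        period = np.abs(period + rng.normal(0.0, period_step_std, n_particles))
        beat = beat + period
        # Weight by the Gaussian likelihood of the onset given the predicted beat.
        log_w = -0.5 * ((onset - beat) / obs_std) ** 2
        w = np.exp(log_w - log_w.max())  # subtract the max to avoid underflow
        w /= w.sum()
        tempos.append(60.0 / np.sum(w * period))
        # Resample particles in proportion to their weights.
        idx = rng.choice(n_particles, size=n_particles, p=w)
        period, beat = period[idx], beat[idx]
    return np.array(tempos)

# Usage: synthetic onsets at 120 BPM with a few milliseconds of timing jitter.
rng = np.random.default_rng(1)
onsets = 0.5 * np.arange(40) + rng.normal(0.0, 0.005, 40)
tempos = track_tempo(onsets)
```

A practical system would also have to handle onsets that fall between beats, missing onsets, and metrical ambiguity, none of which this sketch attempts.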

The research presented here could form the early stages of a full transcription system, a proposal for which is also expounded. This would use a flow of contextual information from simpler, more global structures to aid the inference of more complex and detailed processes. The global structures present in the music (such as style, structure, tempo, etc.) still have their own uses, making this scheme instantly applicable to real problems.
