Ten Experiments on the Modelling of Polyphonic Timbre

Jean-Julien Aucouturier
University of Paris 6, Paris, France (May, 2006)


The majority of systems extracting high-level music descriptions from audio signals rely on a common, implicit model of the global sound, or polyphonic timbre, of a musical signal. This model represents the timbre of a texture as the long-term distribution of its local spectral features. The underlying assumption is rarely made explicit: the perception of the timbre of a texture is assumed to result from the most statistically significant feature windows. This thesis questions the validity of this assumption. To do so, we construct an explicit measure of the timbre similarity between polyphonic music textures, along with variants inspired by previous work in Music Information Retrieval. We show that the precision of such measures is bounded, and that the remaining error rate is not incidental. Notably, this class of algorithms tends to create false positives, which we call hubs, that are nearly always the same songs regardless of the query. The study of these hubs shows that the perceptual saliency of feature observations is not necessarily correlated with their statistical significance with respect to the global distribution. In other words, music listeners routinely "hear" things that are not statistically significant in musical signals, but are instead the result of high-level cognitive reasoning, which depends on cultural expectations, a priori knowledge, and context. Much of the music we hear as being "piano music" is really music that we expect to be piano music. Such statistical/perceptual paradoxes are instrumental in the observed discrepancy between human perception of timbre and the models studied here.
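
As an illustration only, the kind of model described above can be sketched as follows. The sketch assumes each track is already available as a matrix of frame-level spectral features. It summarises each track by a single Gaussian (a simplification chosen for brevity, not necessarily the model used in the thesis), compares tracks with a symmetrised Kullback-Leibler divergence, and counts how often each track appears in other tracks' nearest-neighbour lists as a rough indicator of hubs. All function names are hypothetical, and the input data is synthetic.

```python
import numpy as np


def fit_gaussian(frames):
    """Summarise a (n_frames, n_dims) feature matrix by its mean and covariance.

    Assumption: a single Gaussian stands in for the long-term distribution
    of local features; n_dims must be at least 2.
    """
    mean = frames.mean(axis=0)
    # A small ridge keeps the covariance invertible.
    cov = np.cov(frames, rowvar=False) + 1e-6 * np.eye(frames.shape[1])
    return mean, cov


def kl_gaussian(p, q):
    """Closed-form KL divergence KL(p || q) between two Gaussians."""
    mean_p, cov_p = p
    mean_q, cov_q = q
    dims = mean_p.shape[0]
    cov_q_inv = np.linalg.inv(cov_q)
    diff = mean_q - mean_p
    _, logdet_p = np.linalg.slogdet(cov_p)
    _, logdet_q = np.linalg.slogdet(cov_q)
    return 0.5 * (
        np.trace(cov_q_inv @ cov_p)
        + diff @ cov_q_inv @ diff
        - dims
        + logdet_q
        - logdet_p
    )


def timbre_distance(p, q):
    """Symmetrised KL divergence, used here as a timbre dissimilarity."""
    return kl_gaussian(p, q) + kl_gaussian(q, p)


def k_occurrences(models, k):
    """Count how often each track appears among the k nearest neighbours
    of the other tracks. Unusually high counts suggest hub-like tracks."""
    n = len(models)
    dist = np.array([[timbre_distance(a, b) for b in models] for a in models])
    np.fill_diagonal(dist, np.inf)  # a track is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]
    return np.bincount(neighbours.ravel(), minlength=n)


if __name__ == "__main__":
    # Synthetic stand-in for per-track feature frames (not real audio).
    rng = np.random.default_rng(0)
    tracks = [rng.normal(loc=i, size=(200, 4)) for i in range(6)]
    models = [fit_gaussian(t) for t in tracks]
    print(k_occurrences(models, k=2))
```

In this sketch every track contributes exactly k entries to the neighbour lists, so the counts always sum to n times k; a track whose count is far above k is returned for many unrelated queries, which is the behaviour the abstract calls a hub.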
