Expressivity-aware Tempo Transformations of Music Performances Using Case Based Reasoning

Maarten Grachten
University Pompeu Fabra, Barcelona, Spain (November, 2006)


This dissertation is about expressivity-aware tempo transformations of monophonic audio recordings of saxophone jazz performances. It is a contribution to content-based audio processing, a field of technology that has recently emerged in response to the increased need to deal intelligently with the ever-growing amount of digital multimedia information. Content-based audio processing applications may, for example, search a database for music that has a particular instrumentation or musical form, rather than searching only on metadata such as the artist or the title of the piece.

Content-based audio processing also includes making changes to the audio to meet specific musical needs. The work presented here is an example of such content-based transformation. We have investigated how a musical performance played at a particular tempo can be rendered automatically at another tempo while preserving natural-sounding expressivity. Stated differently: how does expressiveness change with global tempo? Changing the tempo of a given melody cannot be reduced to applying a uniform transformation to all the notes of the melody, because the expressive resources for emphasizing the musical structure of the melody and its affective content differ depending on the performance tempo. We present a case-based reasoning system to address this problem. The system automatically performs melodic and expressive analysis, and it contains a set of example tempo transformations. When a new musical performance must be tempo-transformed, the system uses the most similar example tempo transformation to infer the changes of expressivity that are necessary to make the result sound natural.
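For illustration, the uniform transformation mentioned above can be sketched on symbolic note data as follows. This is a minimal sketch, not the system described in this dissertation, which operates on audio recordings. The Note class, its fields, and the function name uniform_time_stretch are hypothetical and introduced only for this example. Every onset and duration is scaled by the same factor, so all expressive timing deviations are stretched proportionally rather than adapted to the new tempo.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Note:
    """Hypothetical symbolic note; not a structure taken from the dissertation."""
    onset: float     # seconds from the start of the performance
    duration: float  # seconds
    pitch: int       # MIDI pitch number


def uniform_time_stretch(notes, source_bpm, target_bpm):
    """Scale every onset and duration by the single factor source_bpm / target_bpm."""
    if source_bpm <= 0 or target_bpm <= 0:
        raise ValueError("tempos must be positive")
    factor = source_bpm / target_bpm
    return [Note(n.onset * factor, n.duration * factor, n.pitch) for n in notes]


# A three-note performance at 120 BPM, rendered at 60 BPM (half speed).
performance = [Note(0.0, 0.5, 60), Note(0.5, 0.25, 62), Note(0.75, 1.0, 64)]
stretched = uniform_time_stretch(performance, source_bpm=120, target_bpm=60)
```

Because the factor is constant, the relative proportions between notes are identical before and after the transformation. An expressivity-aware approach differs in that it may alter these proportions depending on the target tempo.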

We have validated the system experimentally, and show that expressivity-aware tempo transformations are more similar to human performances than tempo transformations obtained by uniform time stretching, the current standard technique for tempo transformation. Apart from this contribution as an intelligent audio processing application prototype, this dissertation makes several other contributions. Firstly, we present a representation scheme of musical expressivity that is substantially more elaborate than existing representations, and we describe a technique to automatically annotate music performances using this representation scheme. This is an important step towards fully automatic case acquisition for musical CBR applications. Secondly, our method for reusing past cases provides an example of solving synthetic tasks with multi-layered sequential data, a kind of task that has not been explored much in case-based reasoning research. Thirdly, we introduce a similarity measure for melodies that computes similarity at a semi-abstract musical level. In a comparison with other state-of-the-art melodic similarity techniques, this similarity measure gave the best results. Lastly, we present a novel evaluation methodology to assess the quality of predictive models of musical expressivity.
