Inferring Score Level Musical Information From Low-Level Musical Data

Jürgen F. Kilian
Darmstadt University of Technology, Darmstadt, Germany (October, 2004)


Human listeners, depending on their training, can infer score-level musical information from low-level musical data almost intuitively, but an algorithmic model for computer-aided transcription is very hard to achieve. Meanwhile, a large number of approaches address different issues related to musical transcription. Especially for the two core issues of transcription, tempo detection and quantisation, there still seems to be no general, adequate, and easy-to-use approach. Many of the approaches described in the literature have been implemented only as prototypes or are restricted to certain styles of music or types of input data.

This thesis gives an introduction to the general issue of computer-aided transcription and describes related approaches from the literature as well as new approaches developed in the context of this thesis. It also describes the implementation and evaluation of these new models in a complete system that infers score-level information from low-level symbolic musical data and serves as a usable transcription tool. The system consists of several modules, each addressing a specific issue in musical transcription. For each module, the thesis discusses related work from the literature and describes the specific approaches implemented. For the two main tasks of transcription, tempo detection and quantisation, new pattern-based approaches have been developed and implemented. Besides these main issues, the thesis also addresses voice separation, segmentation and structural analysis, inference of key and time signature, pitch spelling, and the detection of musical ornaments. Approaches for inferring other secondary score elements, such as slurs, staccato, and intensity markings, are also discussed.

A general intention behind the approaches developed here is adequacy, in the sense that relatively simple input data should be processed automatically, while for more complex input data the system may ask the user for additional information. Whereas other approaches always try to infer the correct transcription, the system described here was built under the assumption that a correct, fully automatic algorithmic transcription is not possible for all input data. The system therefore analyses the input and output data for certain features; it may ask the user for additional information or issue warnings if potential errors are detected.

Because detecting the start and end positions of notes, together with their pitch and intensity, in audio files is a complex and challenging task on its own, the system described here takes as input low-level symbolic data consisting of explicit note objects with pitch, intensity, and absolute timing information. Unlike most other approaches in this area, the system uses the Guido Music Notation format as the file format for its inferred output (i.e., musical scores). Other file formats either cannot represent all types of high-level score information (e.g., MIDI), cannot be converted into graphical scores (e.g., ASCII note lists), or are complex to create (e.g., proprietary binary formats). Guido Music Notation, by contrast, can be created algorithmically in a straightforward way. It also has the advantages of being human-readable, of being convertible into graphical scores with existing tools, and of being able to represent all score-level information required for conventional music notation and beyond.
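To illustrate how directly Guido Music Notation can be generated from note objects with absolute timing, the following is a minimal sketch. It is not the system described in this thesis. It assumes a monophonic input, a known constant tempo (the thesis infers tempo instead), a naive nearest-grid quantiser (the kind of simple approach the pattern-based method is meant to improve on), and sharps-only pitch spelling. The function names are hypothetical. The Guido conventions it relies on are that `c1` denotes middle C (MIDI 60), that durations are fractions of a whole note (`/4`, or `*3/8` for non-unit numerators), and that `_` denotes a rest.

```python
from fractions import Fraction

# Sharps-only spelling (assumption); real pitch spelling is a separate task.
NAMES = ["c", "c#", "d", "d#", "e", "f", "f#", "g", "g#", "a", "a#", "b"]


def gmn_pitch(midi: int) -> str:
    """Map a MIDI note number to a Guido pitch; octave 1 holds middle C (60)."""
    return f"{NAMES[midi % 12]}{midi // 12 - 4}"


def gmn_duration(d: Fraction) -> str:
    """Format a duration (in whole notes) as a Guido duration suffix."""
    if d.numerator == 1:
        return f"/{d.denominator}"
    return f"*{d.numerator}/{d.denominator}"


def quantise(t_sec: float, bpm: float, grid: Fraction) -> Fraction:
    """Naive quantiser: snap an absolute time to the nearest grid point.

    Assumes a constant tempo where one beat is a quarter note,
    so a whole note lasts 240 / bpm seconds.
    """
    whole_notes = t_sec * bpm / 240.0
    return round(whole_notes / float(grid)) * grid


def to_gmn(notes, bpm, grid=Fraction(1, 16)) -> str:
    """Convert (midi_pitch, onset_sec, offset_sec) tuples to a Guido sequence.

    Monophonic only: overlapping notes are pushed back to the end of the
    previous note, and gaps between notes become rests.
    """
    tokens = []
    cursor = Fraction(0)
    for pitch, onset, offset in sorted(notes, key=lambda n: n[1]):
        q_on = max(quantise(onset, bpm, grid), cursor)
        q_off = quantise(offset, bpm, grid)
        if q_on > cursor:
            tokens.append("_" + gmn_duration(q_on - cursor))  # rest
        dur = max(q_off - q_on, grid)  # never emit zero-length notes
        tokens.append(gmn_pitch(pitch) + gmn_duration(dur))
        cursor = q_on + dur
    return "[ " + " ".join(tokens) + " ]"


if __name__ == "__main__":
    # Slightly "human" timings at 120 BPM (one quarter note = 0.5 s).
    performance = [(60, 0.0, 0.48), (62, 0.52, 0.98),
                   (64, 1.0, 1.49), (67, 2.02, 2.77)]
    print(to_gmn(performance, bpm=120))
```

For the sample input this prints `[ c1/4 d1/4 e1/4 _/4 g1*3/8 ]`. A fixed grid handles such mildly inexact timings, but it breaks down with tempo changes or mixed binary and ternary rhythms, which is why tempo detection and quantisation are treated as the core problems above.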
