Corpus Project

This overarching project aims to examine the nature and development of tonality and tonal harmony, as well as counterpoint and musical form, across a broad timespan comprising a wide variety of historical styles (from ca. 1500 to the present). At present, one major drawback affecting digital musicology is the paucity of large corpora annotated with features relevant to both music theory and music cognition (comparable, to some extent, to the Penn Treebank in linguistics). To address this gap, we are compiling a large corpus of music with expert annotations of tonal harmony, counterpoint, and musical form, using a newly developed annotation standard. The symbolic corpora originating from this project can be used to train, evaluate, and improve computational models. The annotation project is complemented by the analysis of jazz and pop recordings from the Montreux Jazz Archive. People interested in contributing annotations to this project are invited to get in touch with Markus Neuwirth (

The first available dataset is the “Annotated Beethoven Corpus” (ABC), which comprises harmonic analyses of all of Beethoven's string quartets, encoded using a novel annotation standard.

The dataset can be accessed via:

The annotation standard is described in: Neuwirth, M., Harasim, D., Moss, F.C., and Rohrmeier, M. (2018). “The Annotated Beethoven Corpus (ABC): A Dataset of Harmonic Analyses of All Beethoven String Quartets.” Frontiers in Digital Humanities 5:16.