Quick ms3 reference

To run this notebook

  • install ms3 (pip install ms3)

  • set DATA_PATH to the directory in which the folder dcml_corpora, which contains the data, should be created

Read about Keys and IDs

DATA_PATH = '~'  # the folder dcml_corpora will be created inside this directory

Setup

import os
import ms3
from git import Repo  # GitPython

# Reuse an existing clone of dcml_corpora, otherwise clone it with all submodules
corpora_path = os.path.join(os.path.expanduser(DATA_PATH), 'dcml_corpora')
if os.path.isdir(corpora_path):
    repo = Repo(corpora_path)
else:
    repo = Repo.clone_from(url='https://github.com/DCMLab/dcml_corpora.git',
                           to_path=corpora_path,
                           multi_options=['--recurse-submodules', '--shallow-submodules'])
# Report the checked-out commit so that results can be traced to a data version
print(f"dcml_corpora @ commit {repo.commit().hexsha}")
dcml_corpora @ commit 3612b3b5a23427c3db8b747bc108978a6a5b70cc

Parsing multiple scores at once

The Corpus object

Scores often come grouped into a corpus, so when we want to parse multiple scores, we create a Corpus object and pass it the directory containing the scores. ms3 scans the directory and discovers all scores and TSV files that can potentially be parsed:

tchaikovsky_path = os.path.join(corpora_path, 'tchaikovsky_seasons')
corpus = ms3.Corpus(tchaikovsky_path)
corpus
[default|all]
Corpus 'tchaikovsky_seasons'
----------------------------
Location: /home/hentsche/dcml_corpora/tchaikovsky_seasons
View: This view is called 'default'. It 
	- excludes fnames that are not contained in the metadata,
	- filters out file extensions requiring conversion (such as .xml), and
	- excludes review files and folders.

All 12 pieces are listed in 'metadata.tsv':

          scores measures    notes expanded
        detected detected detected detected
op37a01        1        1        1        1
op37a02        1        1        1        1
op37a03        1        1        1        1
op37a04        1        1        1        1
op37a05        1        1        1        1
op37a06        1        1        1        1
op37a07        1        1        1        1
op37a08        1        1        1        1
op37a09        1        1        1        1
op37a10        1        1        1        1
op37a11        1        1        1        1
op37a12        1        1        1        1
72/216 files are excluded from this view.

72 files have been excluded based on their subdir.

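When inspecting this object, we see the rules of the current view, which files were detected for each of the 12 pieces listed in metadata.tsv, and how many files the view excludes.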

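To handle several corpora at once, ms3 provides the Parse object, which groups one or more corpora. Here we create one for the Corelli corpus contained in dcml_corpora (level='c' restricts log messages to critical ones):

corelli_path = os.path.join(corpora_path, 'corelli')
corpora = ms3.Parse(corelli_path, level='c')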
corpora
[default|all]
All corpora
-----------
View: This view is called 'default'. It 
	- excludes fnames that are not contained in the metadata,
	- filters out file extensions requiring conversion (such as .xml), and
	- excludes review files and folders.

             has   active   scores measures    notes expanded
        metadata     view detected detected detected detected
corpus                                                       
corelli      yes  default      149      149      149      149

1058/2995 files are excluded from this view.

1043 files have been excluded based on their subdir.
15 files have been excluded based on their file name.


There are 1 orphans that could not be attributed to any of the respective corpus's fnames.
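Both overviews describe the 'default' view through three rules and report how many files it excludes. As an illustration only, the following toy sketch applies similar rules to a list of relative paths and tallies the remaining files per fname, much like the "detected" columns. It is not ms3's implementation: the folder layout, the set of convertible extensions and the "review" naming convention are assumptions, and in_default_view and detected_overview are hypothetical names.

```python
# Toy sketch of the 'default' view rules; NOT ms3's implementation.
# Hypothetical layout: scores as '<folder>/<fname>.mscx', TSV facets as
# '<facet>/<fname>.tsv'; review material has 'review' somewhere in its path.
from collections import Counter, defaultdict
from pathlib import PurePosixPath

# Formats that would need conversion before parsing (assumed selection).
CONVERTIBLE = {".xml", ".musicxml", ".mxl"}


def in_default_view(path, metadata_fnames):
    """Return True if `path` passes the three rules of the view description."""
    p = PurePosixPath(path)
    if p.suffix in CONVERTIBLE:
        return False  # filter out file extensions requiring conversion
    if any("review" in part for part in p.parts):
        return False  # exclude review files and folders
    return p.stem in metadata_fnames  # exclude fnames missing from metadata.tsv


def detected_overview(paths, metadata_fnames):
    """Count the kept files per fname and facet, plus the number excluded."""
    counts = defaultdict(Counter)
    excluded = 0
    for path in paths:
        if not in_default_view(path, metadata_fnames):
            excluded += 1
            continue
        p = PurePosixPath(path)
        facet = "scores" if p.suffix == ".mscx" else p.parent.name
        counts[p.stem][facet] += 1
    return counts, excluded


paths = [
    "MS3/op37a01.mscx", "measures/op37a01.tsv", "notes/op37a01.tsv",
    "MS3/op37a01.xml", "reviewed/op37a01_reviewed.mscx", "MS3/sketch.mscx",
]
counts, excluded = detected_overview(paths, {"op37a01"})
print(dict(counts["op37a01"]), excluded)
# {'scores': 1, 'measures': 1, 'notes': 1} 3
```

The three excluded paths each fail a different rule: one needs conversion, one is review material, and one belongs to an fname that metadata.tsv does not list.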

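From here we can parse the detected files, for example all scores in the current view with parse_scores(). Afterwards, the overview shows an additional "parsed" column: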

corpora.parse_scores()
corpora
[default|all]
All corpora
-----------
View: This view is called 'default'. It 
	- excludes fnames that are not contained in the metadata,
	- filters out file extensions requiring conversion (such as .xml), and
	- excludes review files and folders.

             has   active   scores        measures    notes expanded
        metadata     view detected parsed detected detected detected
corpus                                                              
corelli      yes  default      149    149      149      149      149

1058/2995 files are excluded from this view.

1043 files have been excluded based on their subdir.
15 files have been excluded based on their file name.


There are 1 orphans that could not be attributed to any of the respective corpus's fnames.

Now we can extract the facets we need from the parsed scores, e.g. information on all measures from all scores:

corpora.get_facet('measures')
                       mc   mn  quarterbeats  duration_qb  keysig  timesig  act_dur  mc_offset  numbering_offset  dont_count  barline  breaks       repeats     next
corpus  fname      i
corelli op01n01a   0    1    1             0          4.0      -1      4/4        1          0              <NA>        <NA>      NaN     NaN  firstMeasure     (2,)
                   1    2    2             4          4.0      -1      4/4        1          0              <NA>        <NA>      NaN     NaN          <NA>     (3,)
                   2    3    3             8          4.0      -1      4/4        1          0              <NA>        <NA>      NaN     NaN          <NA>     (4,)
                   3    4    4            12          4.0      -1      4/4        1          0              <NA>        <NA>      NaN     NaN          <NA>     (5,)
                   4    5    5            16          4.0      -1      4/4        1          0              <NA>        <NA>      NaN     NaN          <NA>     (6,)
...     ...      ...  ...  ...           ...          ...     ...      ...      ...        ...               ...         ...      ...     ...           ...      ...
        op04n12c  14   15   15            84          6.0       2     12/8      3/2          0              <NA>        <NA>     <NA>    <NA>           NaN    (16,)
                  15   16   16            90          6.0       2     12/8      3/2          0              <NA>        <NA>     <NA>    <NA>           NaN    (17,)
                  16   17   17            96          6.0       2     12/8      3/2          0              <NA>        <NA>     <NA>    <NA>           NaN    (18,)
                  17   18   18           102          6.0       2     12/8      3/2          0              <NA>        <NA>     <NA>    <NA>           NaN    (19,)
                  18   19   19           108          6.0       2     12/8      3/2          0              <NA>        <NA>     <NA>    <NA>           end  (9, -1)

4790 rows × 14 columns
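In these rows, duration_qb equals act_dur (a fraction of a whole note) times 4, and quarterbeats accumulates the durations of all preceding measures (0, 4, 8, ... in the 4/4 piece; 84, 90, ... in the 12/8 piece). A minimal sketch of that relation, assuming measures that simply follow each other without repeats or voltas (ms3 may treat those cases differently); measure_positions is a hypothetical name:

```python
from fractions import Fraction
from itertools import accumulate


def measure_positions(act_durs):
    """Return (quarterbeats, duration_qb) for consecutive measures.

    Assumes act_dur is a fraction of a whole note (1 whole = 4 quarters) and
    that measures follow each other without repeats or voltas.
    """
    durations_qb = [Fraction(d) * 4 for d in act_durs]
    # Each measure starts where all preceding measures end.
    starts = [Fraction(0)] + list(accumulate(durations_qb))[:-1]
    return list(zip(starts, durations_qb))


# Five complete 4/4 measures (act_dur 1), as at the start of op01n01a
print([int(qb) for qb, _ in measure_positions(["1"] * 5)])  # [0, 4, 8, 12, 16]
```

Using Fraction instead of floats keeps values such as 3/2 exact. For instance, nineteen complete 12/8 measures (act_dur 3/2) would start at 0, 6, ..., 108 quarter notes, which agrees with the last rows shown for op04n12c.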

Alternatively, we can iterate through the corpora and display the first ten notes of each:

from IPython.display import display  # predefined in Jupyter; explicit import for other contexts

for corpus_name, corpus_object in corpora:
    print(f"First ten notes of {corpus_name}:")
    display(corpus_object.get_facet('notes').iloc[:10])
First ten notes of corelli:
                  mc  mn  quarterbeats  duration_qb  mc_onset  mn_onset  timesig  staff  voice  duration  nominal_duration  scalar  tied  tpc  midi  name  octave  chord_id
fname    notes_i
op01n01a       0   1   1             0          1.0         0         0      4/4      3      1       1/4               1/4       1  <NA>   -1    53    F3       3         8
               1   1   1             0          1.0         0         0      4/4      4      1       1/4               1/4       1  <NA>   -1    53    F3       3        14
               2   1   1             0          1.0         0         0      4/4      2      1       1/4               1/4       1  <NA>    3    81    A5       5         4
               3   1   1             0          1.0         0         0      4/4      1      1       1/4               1/4       1  <NA>    0    84    C6       6         0
               4   1   1             1          1.0       1/4       1/4      4/4      3      1       1/4               1/4       1  <NA>    1    55    G3       3         9
               5   1   1             1          1.0       1/4       1/4      4/4      4      1       1/4               1/4       1  <NA>    1    55    G3       3        15
               6   1   1             1          1.0       1/4       1/4      4/4      2      1       1/4               1/4       1  <NA>    1    79    G5       5         5
               7   1   1             1          1.0       1/4       1/4      4/4      1      1       1/4               1/4       1  <NA>   -2    82   Bb5       5         1
               8   1   1             2          0.5       1/2       1/2      4/4      3      1       1/8               1/8       1  <NA>    3    57    A3       3        10
               9   1   1             2          1.5       1/2       1/2      4/4      4      1       3/8               1/4     3/2  <NA>    3    57    A3       3        16
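The notes rows suggest how several columns relate. The tonal pitch class tpc appears to count fifths from C = 0 (F = -1, G = 1, Bb = -2) and, together with midi, determines name and octave. Likewise, duration equals nominal_duration times scalar (row 9: 1/4 * 3/2 = 3/8), duration_qb is that duration in quarter notes, and quarterbeats adds mc_onset (in quarter notes) to the start of the measure. The sketch below encodes these relations under those assumptions; spell and note_timing are hypothetical helpers, and "#" for sharps is a guess since only a flat appears above.

```python
from fractions import Fraction

# Letters along the line of fifths, starting one fifth below C (F = -1).
LETTERS = "FCGDAEB"


def spell(tpc, midi):
    """Combine tonal pitch class and MIDI number into a name such as 'Bb5'.

    Assumes tpc counts fifths from C = 0 (F = -1, G = 1, Bb = -2) and that
    octaves follow the MIDI convention 60 = C4.
    """
    letter = LETTERS[(tpc + 1) % 7]
    alteration = (tpc + 1) // 7  # -1 = one flat, 0 = natural, 1 = one sharp
    accidentals = "#" * alteration if alteration > 0 else "b" * -alteration
    # Remove the alteration first so that e.g. B#3 (MIDI 60) keeps octave 3.
    octave = (midi - alteration) // 12 - 1
    return f"{letter}{accidentals}{octave}"


def note_timing(measure_qb, mc_onset, nominal_duration, scalar):
    """Return (quarterbeats, duration, duration_qb) of a note.

    Assumes onsets and durations are fractions of a whole note and that
    duration = nominal_duration * scalar (a dotted quarter is 1/4 * 3/2).
    """
    duration = Fraction(nominal_duration) * Fraction(scalar)
    quarterbeats = measure_qb + Fraction(mc_onset) * 4
    return quarterbeats, duration, duration * 4


print(spell(-2, 82))  # Bb5, like row 7
print(*note_timing(0, "1/2", "1/4", "3/2"))  # 2 3/8 3/2, like row 9
```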

The available facets are 'measures', 'notes', 'rests', 'notes_and_rests', 'labels', 'expanded', 'form_labels', 'cadences', 'events', 'chords'. We can request several at the same time:

corpora.get_facets(['labels', 'chords'])
mc mn quarterbeats duration_qb mc_onset mn_onset timesig staff voice harmony_layer ... thoroughbass_duration thoroughbass_level_1 thoroughbass_level_2 slur thoroughbass_level_3 articulation staff_text system_text placement dynamics
corpus fname facet i
corelli op01n01a labels 0 1 1 0 1.0 0 0 4/4 4 1 1 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 1 1 1 1.0 1/4 1/4 4/4 4 1 1 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 1 1 2 2.0 1/2 1/2 4/4 4 1 1 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 2 2 4 0.5 0 0 4/4 4 1 1 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4 2 2 9/2 0.5 1/8 1/8 4/4 4 1 1 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
op04n12c chords 421 19 19 110 2.0 1/2 1/2 12/8 3 1 NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
422 19 19 108 1.0 0 0 12/8 4 1 NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
423 19 19 109 0.0 1/4 1/4 12/8 4 1 NaN ... 1/4 # NaN NaN NaN NaN NaN NaN NaN NaN
424 19 19 109 1.0 1/4 1/4 12/8 4 1 NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
425 19 19 110 2.0 1/2 1/2 12/8 4 1 NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

95639 rows × 31 columns
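The combined table adds a facet level to the index and has 31 columns, most of them NaN in any given row. This is consistent with taking the union of the labels and chords columns and filling in missing values wherever a facet lacks a column; in pandas, pandas.concat with the facet names as keys behaves similarly. Whether ms3 builds the table exactly this way is an assumption. The stdlib-only sketch below shows the stacking logic; stack_facets is a hypothetical name, and its index is reduced to (facet, i) for brevity.

```python
import math


def stack_facets(facets):
    """Stack facet tables (name -> list of row dicts) into one table.

    Rows are keyed by (facet, i); the columns are the union of all facets'
    columns in order of first appearance, and missing values become NaN.
    """
    columns = list(dict.fromkeys(
        col for rows in facets.values() for row in rows for col in row
    ))
    stacked = [
        ((facet, i), [row.get(col, math.nan) for col in columns])
        for facet, rows in facets.items()
        for i, row in enumerate(rows)
    ]
    return columns, stacked


# Two made-up one-row tables with partly different columns
labels = [{"mc": 1, "mn": 1, "harmony_layer": 1}]
chords = [{"mc": 19, "mn": 19, "thoroughbass_level_1": "#"}]
columns, rows = stack_facets({"labels": labels, "chords": chords})
print(columns)  # ['mc', 'mn', 'harmony_layer', 'thoroughbass_level_1']
for key, values in rows:
    print(key, values)
# ('labels', 0) [1, 1, 1, nan]
# ('chords', 0) [19, 19, nan, '#']
```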