Developers’ Reference#
The library installs a couple of commands to your system. The documentation for these commands can be found below or by executing `ms3 -h`.
When using ms3 as a module, we are dealing with four main object types:

- `MSCX` objects hold the information of a single parsed MuseScore file;
- `Annotations` objects hold a set of annotation labels which can be either attached to a score (i.e., contained in its XML structure) or detached.

Both types of objects are contained within a `Score` object. For example, a set of `Annotations` read from a TSV file can be attached to the XML of an `MSCX` object, which can then be output as a MuseScore file. To manipulate many `Score` objects at once, for example those of an entire corpus, we use `Parse` objects.

Since `MSCX` and `Annotations` objects are always attached to a `Score`, the documentation starts with this central class.
The Parse class#
- class ms3.parse.Parse(directory: Optional[Union[str, Collection[str]]] = None, recursive: bool = True, only_metadata_fnames: bool = True, include_convertible: bool = False, include_tsv: bool = True, exclude_review: bool = True, file_re: Optional[Union[str, Pattern]] = None, folder_re: Optional[Union[str, Pattern]] = None, exclude_re: Optional[Union[str, Pattern]] = None, file_paths: Optional[Collection[str]] = None, labels_cfg: dict = {}, ms=None, **logger_cfg)[source]#
Class for creating one or several `Corpus` objects and performing actions on all of them.
- __init__(directory: Optional[Union[str, Collection[str]]] = None, recursive: bool = True, only_metadata_fnames: bool = True, include_convertible: bool = False, include_tsv: bool = True, exclude_review: bool = True, file_re: Optional[Union[str, Pattern]] = None, folder_re: Optional[Union[str, Pattern]] = None, exclude_re: Optional[Union[str, Pattern]] = None, file_paths: Optional[Collection[str]] = None, labels_cfg: dict = {}, ms=None, **logger_cfg)[source]#
Initialize a Parse object and try to create corpora if directories and/or file paths are specified.
- Parameters
directory – Path to scan for corpora.
recursive – Pass False if you don't want to scan `directory` for subcorpora, but force making it a corpus instead.
only_metadata_fnames – The default view excludes piece names that are not listed in the corpus's metadata.tsv file (e.g. when none was found). Pass False to include all pieces regardless. This might be needed when setting `recursive` to False.
include_convertible – The default view excludes scores that would need conversion to MuseScore format prior to parsing. Pass True to include convertible scores in .musicxml, .midi, .cap or any other format that MuseScore 3 can open. For on-the-fly conversion, however, the parameter `ms` needs to be set.
include_tsv – The default view includes TSV files. Pass False to disregard them and parse only scores.
exclude_review – The default view excludes files and folders whose name contains 'review'. Pass False to include these as well.
file_re – Pass a regular expression if you want to create a view filtering out all files that do not contain it.
folder_re – Pass a regular expression if you want to create a view filtering out all folders that do not contain it.
exclude_re – Pass a regular expression if you want to create a view filtering out all files or folders that contain it.
file_paths – If `directory` is specified, the file names of these paths are used to create a filtering view excluding all other files. Otherwise, all paths are expected to be part of the same parent corpus, which will be inferred from the first path by looking for the first parent directory that either contains a 'metadata.tsv' file or is a git repository. This parameter is deprecated; `file_re` should be used instead.
labels_cfg – Pass a configuration dict to detect only certain labels or change their output format.
ms – If you pass the path to your local MuseScore 3 installation, ms3 will attempt to parse musicXML, MuseScore 2, and other formats by temporarily converting them. If you're using the standard path, you may try 'auto', or 'win' for Windows, 'mac' for MacOS, or 'mscore' for Linux. In case you do not pass the 'file_re' and the MuseScore executable is detected, all convertible files are automatically selected, otherwise only those that can be parsed without conversion.
**logger_cfg – Keyword arguments for changing the logger configuration. E.g. `level='d'` to see all debug messages.
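The three regex parameters combine into a single view filter. The sketch below is a plain-Python illustration of the documented semantics (`filter_files` is a hypothetical helper, not part of the ms3 API); note that ms3 checks patterns with search(), not match(), so a pattern may match anywhere in the name:

```python
import re

# Hypothetical illustration of file_re/exclude_re view filtering (not ms3 code).
def filter_files(names, file_re=None, exclude_re=None):
    kept = []
    for name in names:
        if exclude_re is not None and re.search(exclude_re, name):
            continue  # exclude_re: drop names that contain the pattern
        if file_re is not None and not re.search(file_re, name):
            continue  # file_re: keep only names that contain the pattern
        kept.append(name)
    return kept

names = ["sonata01.mscx", "sonata01_reviewed.mscx", "sonata01.tsv"]
print(filter_files(names, file_re=r"\.mscx$", exclude_re="review"))
# → ['sonata01.mscx']
```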
- corpus_paths: Dict[str, str]#
{corpus_name -> path} dictionary with each corpus's base directory. Generally speaking, each corpus path is expected to contain a `metadata.tsv` and, maybe, to be a git repository.
- corpus_objects: Dict[str, Corpus]#
{corpus_name -> Corpus} dictionary with one object per `corpus_path`.
- labels_cfg#
dict
Configuration dictionary to determine the output format of `labels` and `expanded` tables. The dictionary is passed to `Score` upon parsing.
- property ms: str#
Path or command of the local MuseScore 3 installation if specified by the user and recognized.
- property n_detected: int#
Number of detected files aggregated from all `Corpus` objects without taking views into account. Excludes metadata files.
- property n_orphans: int#
Number of files that are always disregarded because they could not be attributed to any of the fnames.
- property n_parsed: int#
Number of parsed files aggregated from all `Corpus` objects without taking views into account. Excludes metadata files.
- property n_parsed_scores: int#
Number of parsed scores aggregated from all `Corpus` objects without taking views into account. Excludes metadata files.
- property n_parsed_tsvs: int#
Number of parsed TSV files aggregated from all `Corpus` objects without taking views into account. Excludes metadata files.
- property n_unparsed_scores: int#
Number of all detected but not yet parsed scores, aggregated from all `Corpus` objects without taking views into account. Excludes metadata files.
- property n_unparsed_tsvs: int#
Number of all detected but not yet parsed TSV files, aggregated from all `Corpus` objects without taking views into account. Excludes metadata files.
- property view: View#
Retrieve the current View object. Shorthand for `get_view()`.
- add_corpus(directory: str, corpus_name: Optional[str] = None, only_metadata_fnames: Optional[bool] = None, include_convertible: Optional[bool] = None, include_tsv: Optional[bool] = None, exclude_review: Optional[bool] = None, file_re: Optional[Union[str, Pattern]] = None, folder_re: Optional[Union[str, Pattern]] = None, exclude_re: Optional[Union[str, Pattern]] = None, paths: Optional[Collection[str]] = None, **logger_cfg) None [source]#
This method creates a `Corpus` object which scans the directory `directory` for parseable files. It inherits all `View`s from the Parse object.
- Parameters
directory – Directory to scan for files.
corpus_name – By default, the folder name of `directory` is used as name for this corpus. Pass a string to use a different identifier.
**logger_cfg – Keyword arguments for configuring the logger of the new Corpus object. E.g. `level='d'` to see all debug messages. Note that the logger is a child logger of this Parse object's logger and propagates, so it might filter debug messages. You can use `_.change_logger_cfg(level='d')` to change the level post hoc.
- add_dir(directory: str, recursive: bool = True, only_metadata_fnames: Optional[bool] = None, include_convertible: Optional[bool] = None, include_tsv: Optional[bool] = None, exclude_review: Optional[bool] = None, file_re: Optional[Union[str, Pattern]] = None, folder_re: Optional[Union[str, Pattern]] = None, exclude_re: Optional[Union[str, Pattern]] = None, paths: Optional[Collection[str]] = None, **logger_cfg) None [source]#
This method decides if the directory `directory` contains several corpora or if it is a corpus itself, and calls `add_corpus()` for each corpus.
- Parameters
directory – Directory to scan for corpora.
recursive – By default, if any of the first-level subdirectories contains a 'metadata.tsv' or is a git repository, all first-level subdirectories of `directory` are treated as corpora, i.e. one `Corpus` object per folder is created. Pass False to prevent this, which is equivalent to calling `add_corpus(directory)`.
**logger_cfg – Keyword arguments for configuring the logger of the new Corpus objects. E.g. `level='d'` to see all debug messages. Note that the loggers are child loggers of this Parse object's logger and propagate, so it might filter debug messages. You can use `_.change_logger_cfg(level='d')` to change the level post hoc.
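The corpus-detection heuristic described above can be sketched in plain Python (a minimal illustration of the documented behaviour, not ms3's actual implementation; the helper names are made up):

```python
import os
import tempfile

# Hypothetical sketch: a directory holds several corpora if any first-level
# subdirectory contains a 'metadata.tsv' or is a git repository.
def is_corpus(path: str) -> bool:
    return (os.path.isfile(os.path.join(path, "metadata.tsv"))
            or os.path.isdir(os.path.join(path, ".git")))

def contains_corpora(directory: str) -> bool:
    subdirs = (os.path.join(directory, e) for e in os.listdir(directory))
    return any(is_corpus(d) for d in subdirs if os.path.isdir(d))

root = tempfile.mkdtemp()
corpus = os.path.join(root, "beethoven_sonatas")
os.makedirs(corpus)
open(os.path.join(corpus, "metadata.tsv"), "w").close()
print(contains_corpora(root))  # → True: one Corpus object per subdirectory
```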
- add_files(file_paths: Union[str, Collection[str]], corpus_name: Optional[str] = None) None [source]#
Deprecated: To deal with particular files only, use `add_corpus()` passing the directory containing them and configure the `View` accordingly. This method here does it for you but easily leads to unexpected behaviour. It expects the file paths to point to files located in a shared corpus folder on some higher level or in folders for which `Corpus` objects have already been created.
- Parameters
file_paths – Collection of file paths. Only existing files can be added.
corpus_name –
By default, I will try to attribute the files to existing `Corpus` objects based on their paths. This makes sense only when new files have been created after the directories were scanned. For paths that do not contain an existing corpus_path, I will try to detect the parent directory that is a corpus (based on it being a git repository or containing a `metadata.tsv`). If this is without success for the first path, I will raise an error. Otherwise, all subsequent paths will be considered to be part of that same corpus (watch out for meaningless relative paths!). You can pass a folder name contained in the first path to create a new corpus, assuming that all other paths are contained in it (watch out for meaningless relative paths!).
Pass an existing corpus_name to add the files to a particular corpus. Note that all parseable files under the corpus_path are detected anyway, and if you add files from other directories, it will lead to invalid relative paths that work only on your system. If you're adding files that have been created after the Corpus object has been, you can leave this parameter empty; paths will be attributed to the existing corpora automatically.
- change_labels_cfg(labels_cfg={}, staff=None, voice=None, harmony_layer=None, positioning=None, decode=None, column_name=None, color_format=None)[source]#
Update `Parse.labels_cfg` and retrieve new 'labels' tables accordingly.
- Parameters
labels_cfg (dict) – Using an entire dictionary or, to change only particular options, choose from:
staff – Arguments as they will be passed to `get_labels()`
voice – Arguments as they will be passed to `get_labels()`
harmony_layer – Arguments as they will be passed to `get_labels()`
positioning – Arguments as they will be passed to `get_labels()`
decode – Arguments as they will be passed to `get_labels()`
column_name – Arguments as they will be passed to `get_labels()`
- compare_labels(key: str = 'detached', new_color: str = 'ms3_darkgreen', old_color: str = 'ms3_darkred', detached_is_newer: bool = False, add_to_rna: bool = True, view_name: Optional[str] = None) Tuple[int, int] [source]#
Compare detached labels `key` to the ones attached to the Score to create a diff. By default, the attached labels are considered as the reviewed version, and labels that have changed or been added in comparison to the detached labels are colored in green, whereas the previous versions of changed labels are attached to the Score in red, just like any deleted label.
- Parameters
key – Key of the detached labels you want to compare to the ones in the score.
new_color – The colors by which new and old labels are differentiated. Identical labels remain unchanged. Colors can be CSS colors or MuseScore colors (see `utils.MS3_COLORS`).
old_color – The colors by which new and old labels are differentiated. Identical labels remain unchanged. Colors can be CSS colors or MuseScore colors (see `utils.MS3_COLORS`).
detached_is_newer – Pass True if the detached labels are to be added with `new_color` whereas the attached changed labels will turn `old_color`, as opposed to the default.
add_to_rna – By default, new labels are attached to the Roman Numeral layer. Pass False to attach them to the chord layer instead.
- Returns
- Number of scores in which labels have changed.
Number of scores in which no label has changed.
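The coloring logic can be pictured with a simplified sketch (an assumption-laden illustration, not ms3's implementation: real labels carry staff/voice positions, and `diff_labels` is a made-up helper):

```python
# Simplified sketch of the diff: attached labels are the reviewed version;
# new/changed ones get new_color, replaced or deleted ones get old_color.
def diff_labels(attached, detached,
                new_color="ms3_darkgreen", old_color="ms3_darkred"):
    colored = []
    for pos, label in attached.items():
        if detached.get(pos) != label:
            colored.append((pos, label, new_color))   # new or changed label
    for pos, label in detached.items():
        if attached.get(pos) != label:
            colored.append((pos, label, old_color))   # previous version / deleted
    return colored

attached = {"1/1": "I", "2/1": "V7"}   # reviewed version in the score
detached = {"1/1": "I", "2/1": "V"}    # older detached annotations
print(diff_labels(attached, detached))
# → [('2/1', 'V7', 'ms3_darkgreen'), ('2/1', 'V', 'ms3_darkred')]
```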
- count_extensions(view_name: Optional[str] = None, per_piece: bool = False, include_metadata: bool = False)[source]#
Count file extensions.
- Parameters
keys (str or Collection, optional) – Key(s) for which to count file extensions. By default, all keys are selected.
ids (Collection) – If you pass a collection of IDs, `keys` is ignored and only the selected extensions are counted.
per_key (bool, optional) – If set to True, the results are returned as a dict {key: Counter}, otherwise the counts are summed up in one Counter.
per_subdir (bool, optional) – If set to True, the results are returned as {key: {subdir: Counter} }. `per_key=True` is therefore implied.
- Returns
By default, the function returns a Counter of file extensions (Counters are converted to dicts). If `per_key` is set to True, a dictionary {key: Counter} is returned, separating the counts. If `per_subdir` is set to True, a dictionary {key: {subdir: Counter} } is returned.
- Return type
- disambiguate_facet(facet: Literal['scores', 'measures', 'notes', 'rests', 'notes_and_rests', 'labels', 'expanded', 'form_labels', 'cadences', 'events', 'chords', 'unknown'], view_name: Optional[str] = None, ask_for_input=True) None [source]#
Calls the method on every selected corpus.
- get_dataframes(notes: bool = False, rests: bool = False, notes_and_rests: bool = False, measures: bool = False, events: bool = False, labels: bool = False, chords: bool = False, expanded: bool = False, form_labels: bool = False, cadences: bool = False, view_name: Optional[str] = None, force: bool = False, choose: Literal['all', 'auto', 'ask'] = 'all', unfold: bool = False, interval_index: bool = False, flat=False, include_empty: bool = False) Union[DataFrame, Dict[Tuple[str, str], Union[Dict[str, List[Tuple[File, DataFrame]]], List[Tuple[File, DataFrame]]]]] [source]#
Renamed to `get_facets()`.
- get_facet(facet: Literal['measures', 'notes', 'rests', 'notes_and_rests', 'labels', 'expanded', 'form_labels', 'cadences', 'events', 'chords'], view_name: Optional[str] = None, choose: Literal['auto', 'ask'] = 'auto', unfold: bool = False, interval_index: bool = False, concatenate: bool = True) Union[Dict[str, Tuple[File, DataFrame]], DataFrame] [source]#
Retrieves exactly one DataFrame per piece, if available.
- get_view(view_name: Optional[str] = None, **config) View [source]#
Retrieve an existing or create a new View object, potentially while updating the config.
- insert_detached_labels(view_name: Optional[str] = None, key: str = 'detached', staff: Optional[int] = None, voice: Optional[Literal[1, 2, 3, 4]] = None, harmony_layer: Optional[Literal[0, 1, 2]] = None, check_for_clashes: bool = True)[source]#
Attach all `Annotations` objects that are reachable via `Score.key` to their respective `Score`, altering the XML in memory. Calling `store_scores()` will output MuseScore files where the annotations show in the score.
- Parameters
key – Key under which the `Annotations` objects to be attached are stored in the `Score` objects. Defaults to 'detached'.
staff (int, optional) – If you pass a staff ID, the labels will be attached to that staff where 1 is the upper staff. By default, the staves indicated in the 'staff' column of `ms3.annotations.Annotations.df` will be used.
voice ({1, 2, 3, 4}, optional) – If you pass the ID of a notational layer (where 1 is the upper voice, blue in MuseScore), the labels will be attached to that one. By default, the notational layers indicated in the 'voice' column of `ms3.annotations.Annotations.df` will be used.
harmony_layer (int, optional) – By default, the labels are written to the layer specified as an integer in the column `harmony_layer`. Pass an integer to select a particular layer:
* 0 to attach them as absolute ('guitar') chords, meaning that when opened next time, MuseScore will split and encode those beginning with a note name (resulting in ms3-internal harmony_layer 3).
* 1 to have the labels written into the staff's layer for Roman Numeral Analysis.
* 2 to have MuseScore interpret them as Nashville Numbers.
check_for_clashes (bool, optional) – By default, warnings are thrown when there already exists a label at a position (and in a notational layer) where a new one is attached. Pass False to deactivate these warnings.
- iter_corpora(view_name: Optional[str] = None) Generator[Tuple[str, Corpus], None, None] [source]#
Iterate through corpora under the current or specified view.
- load_ignored_warnings(path: str) None [source]#
Adds filters to all loggers included in an IGNORED_WARNINGS file.
- Parameters
path – Path of the IGNORED_WARNINGS file.
- set_view(active: Optional[View] = None, **views: View)[source]#
Register one or several view_name=View pairs.
- update_metadata_tsv_from_parsed_scores(root_dir: Optional[str] = None, suffix: str = '', markdown_file: Optional[str] = 'README.md', view_name: Optional[str] = None) List[str] [source]#
Gathers the metadata from parsed and currently selected scores and updates ‘metadata.tsv’ with the information.
- Parameters
root_dir – In case you want to output the metadata to a folder different from `corpus_path`.
suffix – Added to the filename: 'metadata{suffix}.tsv'. Defaults to ''. Metadata files with suffix may be used to store views with particular subselections of pieces.
markdown_file – By default, a subset of metadata columns will be written to ‘README.md’ in the same folder as the TSV file. If the file exists, it will be scanned for a line containing the string ‘# Overview’ and overwritten from that line onwards.
view_name – The view under which you want to update metadata from the selected parsed files. Defaults to None, i.e. the active view.
- Returns
The file paths to which metadata was written.
- update_score_metadata_from_tsv(view_name: Optional[str] = None, force: bool = False, choose: Literal['all', 'auto', 'ask'] = 'all', write_empty_values: bool = False, remove_unused_fields: bool = False, write_text_fields: bool = False) List[File] [source]#
Update metadata fields of parsed scores with the values from the corresponding row in metadata.tsv.
- Parameters
view_name –
force –
choose –
write_empty_values – If set to True, existing values are overwritten even if the new value is empty, in which case the field will be set to ‘’.
remove_unused_fields – If set to True, all non-default fields that are not among the columns of metadata.tsv (anymore) are removed.
write_text_fields – If set to True, ms3 will write updated values from the columns `title_text`, `subtitle_text`, `composer_text`, `lyricist_text`, and `part_name_text` into the score headers.
- Returns
List of File objects of those scores of which the XML structure has been modified.
- update_scores(root_dir: Optional[str] = None, folder: str = '.', suffix: str = '', overwrite: bool = False) List[str] [source]#
Update scores created with an older MuseScore version to the latest MuseScore 3 version.
- Parameters
root_dir – In case you want to create output paths for the updated MuseScore files based on a folder different from `corpus_path`.
folder –
The default '.' has the updated scores written to the same directory as the old ones, effectively overwriting them if `root_dir` is None.
If `folder` is None, the files will be written to `{root_dir}/scores/`.
If `folder` is an absolute path, `root_dir` will be ignored.
If `folder` is a relative path starting with a dot `.`, the relative path is appended to the file's subdir. For example, `..\scores` will resolve to a sibling directory of the one where the file is located.
If `folder` is a relative path that does not begin with a dot `.`, it will be appended to the `root_dir`.
suffix – String to append to the file names of the updated files, e.g. ‘_updated’.
overwrite – By default, existing files are not overwritten. Pass True to allow this.
- Returns
A list of all up-to-date paths, whether they had to be converted or were already in the latest version.
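The `folder`/`root_dir` rules can be sketched as follows (a minimal plain-Python illustration of the documented behaviour; `resolve_output_dir` is a hypothetical helper, not part of the ms3 API, and the example paths are made up):

```python
import os

# Hypothetical helper mirroring the documented resolution rules for `folder`.
def resolve_output_dir(file_subdir: str, root_dir: str, folder):
    if folder is None:                     # None: write to {root_dir}/scores/
        return os.path.join(root_dir, "scores")
    if os.path.isabs(folder):              # absolute path: root_dir is ignored
        return folder
    if folder.startswith("."):             # leading dot: relative to file's subdir
        return os.path.normpath(os.path.join(file_subdir, folder))
    return os.path.join(root_dir, folder)  # otherwise: relative to root_dir

print(resolve_output_dir("corpus/MS3", "/output", "."))          # → corpus/MS3
print(resolve_output_dir("corpus/MS3", "/output", "../scores"))  # → corpus/scores
print(resolve_output_dir("corpus/MS3", "/output", "updated"))    # → /output/updated
```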
- update_tsvs_on_disk(facets: Union[Literal['measures', 'notes', 'rests', 'notes_and_rests', 'labels', 'expanded', 'form_labels', 'cadences', 'events', 'chords'], Collection[Literal['measures', 'notes', 'rests', 'notes_and_rests', 'labels', 'expanded', 'form_labels', 'cadences', 'events', 'chords']]] = 'tsv', view_name: Optional[str] = None, force: bool = False, choose: Literal['auto', 'ask'] = 'auto') List[str] [source]#
Update existing TSV files corresponding to one or several facets with information freshly extracted from a parsed score, but only if the contents are identical. Otherwise, the existing TSV file is not overwritten and the differences are displayed in a log warning. The purpose is to safely update the format of existing TSV files (for instance with respect to column order), making sure that the content doesn't change.
- Parameters
facets –
view_name –
force – By default, only TSV files that have already been parsed are updated. Set to True in order to force-parse for each facet one of the TSV files included in the given view, if necessary.
choose –
- Returns
List of paths that have been overwritten.
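The "overwrite only if identical" safeguard can be pictured with a small sketch (an assumption, not ms3's actual check; `same_content` is a made-up helper using only the standard library):

```python
import csv
import io

# Hypothetical sketch: rewrite a TSV only when the content is unchanged,
# e.g. when merely the column order differs.
def same_content(old_tsv: str, new_tsv: str) -> bool:
    def rows(text):
        return list(csv.DictReader(io.StringIO(text), delimiter="\t"))
    return rows(old_tsv) == rows(new_tsv)

old = "mn\tmc\n1\t1\n2\t3\n"
reordered = "mc\tmn\n1\t1\n3\t2\n"
changed = "mc\tmn\n1\t1\n3\t5\n"
print(same_content(old, reordered))  # → True: same rows, new column order
print(same_content(old, changed))    # → False: a value differs, keep old file
```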
- metadata_tsv(view_name: Optional[str] = None) DataFrame [source]#
Concatenates the 'metadata.tsv' files (as they come) for all corpora with a [corpus, fname] MultiIndex. If you need metadata that filters out fnames according to the current view, use `metadata()`.
- store_extracted_facets(view_name: Optional[str] = None, root_dir: Optional[str] = None, measures_folder: Optional[str] = None, measures_suffix: str = '', notes_folder: Optional[str] = None, notes_suffix: str = '', rests_folder: Optional[str] = None, rests_suffix: str = '', notes_and_rests_folder: Optional[str] = None, notes_and_rests_suffix: str = '', labels_folder: Optional[str] = None, labels_suffix: str = '', expanded_folder: Optional[str] = None, expanded_suffix: str = '', form_labels_folder: Optional[str] = None, form_labels_suffix: str = '', cadences_folder: Optional[str] = None, cadences_suffix: str = '', events_folder: Optional[str] = None, events_suffix: str = '', chords_folder: Optional[str] = None, chords_suffix: str = '', metadata_suffix: Optional[str] = None, markdown: bool = True, simulate: bool = False, unfold: bool = False, interval_index: bool = False, silence_label_warnings: bool = False)[source]#
Store facets extracted from parsed scores as TSV files.
- Parameters
view_name –
root_dir – (‘measures’, ‘notes’, ‘rests’, ‘notes_and_rests’, ‘labels’, ‘expanded’, ‘form_labels’, ‘cadences’, ‘events’, ‘chords’)
measures_folder – Specify directory where to store the corresponding TSV files.
notes_folder – Specify directory where to store the corresponding TSV files.
rests_folder – Specify directory where to store the corresponding TSV files.
notes_and_rests_folder – Specify directory where to store the corresponding TSV files.
labels_folder – Specify directory where to store the corresponding TSV files.
expanded_folder – Specify directory where to store the corresponding TSV files.
form_labels_folder – Specify directory where to store the corresponding TSV files.
cadences_folder – Specify directory where to store the corresponding TSV files.
events_folder – Specify directory where to store the corresponding TSV files.
chords_folder – Specify directory where to store the corresponding TSV files.
measures_suffix – Optionally specify suffixes appended to the TSVs' file names. If `unfold=True` the suffixes default to `_unfolded`.
notes_suffix – Optionally specify suffixes appended to the TSVs' file names. If `unfold=True` the suffixes default to `_unfolded`.
rests_suffix – Optionally specify suffixes appended to the TSVs' file names. If `unfold=True` the suffixes default to `_unfolded`.
notes_and_rests_suffix – Optionally specify suffixes appended to the TSVs' file names. If `unfold=True` the suffixes default to `_unfolded`.
labels_suffix – Optionally specify suffixes appended to the TSVs' file names. If `unfold=True` the suffixes default to `_unfolded`.
expanded_suffix – Optionally specify suffixes appended to the TSVs' file names. If `unfold=True` the suffixes default to `_unfolded`.
form_labels_suffix – Optionally specify suffixes appended to the TSVs' file names. If `unfold=True` the suffixes default to `_unfolded`.
cadences_suffix – Optionally specify suffixes appended to the TSVs' file names. If `unfold=True` the suffixes default to `_unfolded`.
events_suffix – Optionally specify suffixes appended to the TSVs' file names. If `unfold=True` the suffixes default to `_unfolded`.
chords_suffix – Optionally specify suffixes appended to the TSVs' file names. If `unfold=True` the suffixes default to `_unfolded`.
metadata_suffix – Specify a suffix to update the 'metadata{suffix}.tsv' file for each corpus. For the main file, pass ''.
markdown – By default, when `metadata_path` is specified, a markdown file called `README.md` containing the columns [file_name, measures, labels, standard, annotators, reviewers] is created. If it exists already, this table will be appended or overwritten after the heading `# Overview`.
simulate –
unfold – By default, repetitions are not unfolded. Pass True to duplicate values so that they correspond to a full playthrough, including correct positioning of first and second endings.
interval_index –
silence_label_warnings –
- Returns
- store_parsed_scores(view_name: Optional[str] = None, only_changed: bool = True, root_dir: Optional[str] = None, folder: str = '.', suffix: str = '', overwrite: bool = False, simulate=False) Dict[str, List[str]] [source]#
Stores all parsed scores under this view as MuseScore 3 files.
- Args:
view_name: Name of another view if another than the current one is to be used.
only_changed: By default, only scores that have been modified since parsing are written. Set to False to store all scores regardless.
root_dir: Directory where to re-build the sub-directory tree of the `Corpus` in question.
folder: Different behaviours are available. Note that only the third option ensures that file paths are distinct for files that have identical fnames but are located in different subdirectories of the same corpus.
If `folder` is None (default), the files' type will be appended to the `root_dir`.
If `folder` is an absolute path, `root_dir` will be ignored.
If `folder` is a relative path that does not begin with a dot `.`, it will be appended to the `root_dir`.
If `folder` is a relative path starting with a dot `.`, the relative path is appended to the file's subdir. For example, `..\notes` will resolve to a sibling directory of the one where the file is located.
suffix: Suffix to append to the original file name.
overwrite: Pass True to overwrite existing files.
simulate: Set to True if no files are to be written.
- Returns:
Paths of the stored files.
- parse(view_name=None, level=None, parallel=True, only_new=True, labels_cfg={}, cols={}, infer_types=None, **kwargs)[source]#
Shorthand for executing `parse_scores()` and `parse_tsv()` at a time.
- parse_scores(level: Optional[str] = None, parallel: bool = True, only_new: bool = True, labels_cfg: dict = {}, view_name: Optional[str] = None, choose: Literal['all', 'auto', 'ask'] = 'all')[source]#
Parse MuseScore 3 files (MSCX or MSCZ) and store the resulting read-only Score objects. If they need to be writeable, e.g. for removing or adding labels, pass `parallel=False`, which takes longer but prevents having to re-parse at a later point.
- Parameters
keys (str or Collection, optional) – For which key(s) to parse all MSCX files.
ids (Collection) – To parse only particular files, pass their IDs. `keys` and `fexts` are ignored in this case.
level ({'W', 'D', 'I', 'E', 'C', 'WARNING', 'DEBUG', 'INFO', 'ERROR', 'CRITICAL'}, optional) – Pass a level name for which (and above which) you want to see log records.
parallel (bool, optional) – Defaults to True, meaning that all CPU cores are used simultaneously to speed up the parsing. It implies that the resulting Score objects are in read-only mode and that you might not be able to use the computer during parsing. Pass False to parse one score after the other, which uses more memory but will allow making changes to the scores.
only_new (bool, optional) – By default, scores which have already been parsed are not parsed again. Pass False to parse them, too.
- Return type
None
- parse_tsv(view_name=None, level=None, cols={}, infer_types=None, only_new=True, choose: Literal['all', 'auto', 'ask'] = 'all', **kwargs)[source]#
Parse TSV files (or other value-separated files such as CSV) to be able to do something with them.
- Parameters
keys (str or Collection, optional) – Key(s) for which to parse all non-MSCX files. By default, all keys are selected.
ids (Collection) – To parse only particular files, pass their IDs. `keys` and `fexts` are ignored in this case.
fexts (str or Collection, optional) – If you want to parse only files with one or several particular file extension(s), pass the extension(s).
cols (dict, optional) – By default, if a column called `'label'` is found, the TSV is treated as an annotation table and turned into an Annotations object. Pass one or several column name(s) to treat them as label columns instead. If you pass `{}` or no label column is found, the TSV is parsed as a "normal" table, i.e. a DataFrame.
infer_types (dict, optional) – To recognize one or several custom label type(s), pass `{name: regEx}`.
level ({'W', 'D', 'I', 'E', 'C', 'WARNING', 'DEBUG', 'INFO', 'ERROR', 'CRITICAL'}, optional) – Pass a level name for which (and above which) you want to see log records.
**kwargs – Arguments for `pandas.DataFrame.to_csv()`. Defaults to `{'sep': '\t', 'index': False}`. In particular, you might want to update the default dictionaries for `dtypes` and `converters` used in `load_tsv()`.
- Returns
None
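The default behaviour of `cols` (treating a TSV with a 'label' column as an annotation table) can be sketched with the standard library alone (an illustrative assumption, not ms3 source; `is_annotation_table` is a made-up helper):

```python
import csv
import io

# Hypothetical sketch: a TSV whose header contains a 'label' column is treated
# as an annotation table; otherwise it is parsed as a plain table.
def is_annotation_table(tsv_text: str, label_cols=("label",)) -> bool:
    header = next(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    return any(col in header for col in label_cols)

labels_tsv = "mc\tmn\tlabel\n1\t1\tI\n"
notes_tsv = "mc\tmn\ttpc\n1\t1\t0\n"
print(is_annotation_table(labels_tsv))  # → True
print(is_annotation_table(notes_tsv))   # → False
```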
- __iter__() Iterator[Tuple[str, Corpus]] [source]#
Iterate through all (corpus_name, Corpus) tuples, regardless of any Views.
Yields: (corpus_name, Corpus) tuples
- property parsed_mscx: DataFrame#
Deprecated property. Replaced by `n_parsed_scores`.
- property parsed_tsv: DataFrame#
Deprecated property. Replaced by `n_parsed_tsvs`.
- add_detached_annotations(*args, **kwargs)[source]#
Deprecated method. Replaced by `insert_detached_labels()`.
- iter(*args, **kwargs)[source]#
Deprecated method. Replaced by `ms3.corpus.Corpus.iter_facets()`.
- parse_mscx(*args, **kwargs)[source]#
Deprecated method. Replaced by `parse_scores()`.
- store_scores(*args, **kwargs)[source]#
Deprecated method. Replaced by `store_parsed_scores()`.
- update_metadata(*args, **kwargs)[source]#
Deprecated method. Replaced by `update_score_metadata_from_tsv()`.
The Corpus class#
- class ms3.corpus.Corpus(directory: str, view: Optional[View] = None, only_metadata_fnames: bool = True, include_convertible: bool = False, include_tsv: bool = True, exclude_review: bool = True, file_re: Optional[Union[str, Pattern]] = None, folder_re: Optional[Union[str, Pattern]] = None, exclude_re: Optional[Union[str, Pattern]] = None, paths: Optional[Collection[str]] = None, labels_cfg={}, ms=None, **logger_cfg)[source]#
Collection of scores and TSV files that can be matched to each other based on their file names.
- name#
Folder name of the corpus.
- files: list#
`[File]` list of `File` data objects containing information on the file location etc. for all detected files.
- labels_cfg#
dict
Configuration dictionary to determine the output format of `labels` and `expanded` tables. The dictionary is passed to `Score` upon parsing.
- metadata_tsv: pd.DataFrame#
The parsed ‘metadata.tsv’ file for the corpus.
- ix2fname: Dict[int, str]#
{ix -> fname} dict for associating files with the piece they have been matched to. None for indices that could not be matched, e.g. metadata.
- property fnames: List[str]#
All fnames including those of scores that are not listed in metadata.tsv
- add_dir(directory: str, filter_other_fnames: bool = False, file_re: str = '.*', folder_re: str = '.*', exclude_re: str = '^(\\.|_)') List[File] [source]#
Add additional files pertaining to the already existing fnames of the corpus. If you want to use a directory with other pieces, create another Corpus object or combine several corpora in a Parse object.
- Parameters
directory – Directory to scan for parseable (score or TSV) files. Only those that begin with one of the corpus's fnames will be matched and registered; the others will be kept under ix2orphan_file.
filter_other_fnames – Set to True if you want to filter out all fnames that were not matched up with one of the added files. This can be useful if you're loading TSV files with labels and want to parse only the scores for which you have added labels.
file_re – Regular expression for filtering file names. The regex is checked with search(), not match(), allowing for fuzzy search.
folder_re – Regular expression for filtering folder names. The regex is checked with search(), not match(), allowing for fuzzy search.
exclude_re – Exclude files and folders matching this regular expression.
- Returns
List of File objects pertaining to the matched, newly added paths.
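Because the filters use re.search() rather than re.match(), a pattern may hit anywhere in a name, not only at its start. A quick stand-alone illustration with plain re (not ms3 itself; the file names are made up):

```python
import re

filenames = ["K279-1.mscx", "K279-1.notes.tsv", "draft_K280.mscx"]

# re.search() finds the pattern anywhere in the string,
# so "K280" also selects "draft_K280.mscx" ...
searched = [f for f in filenames if re.search("K280", f)]
print(searched)  # ['draft_K280.mscx']

# ... whereas re.match() anchors at the beginning and selects nothing here.
matched = [f for f in filenames if re.match("K280", f)]
print(matched)  # []
```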
- add_file_paths(paths: Collection[str]) List[File] [source]#
Iterates through the given paths, converts those that correspond to parseable files to File objects (trying to infer their type from the path), and appends those to files.
- Parameters
paths – File paths that are to be registered with this Corpus object.
- Returns
A list of File objects corresponding to parseable files (based on their extensions).
- collect_fnames_from_scores() None [source]#
Construct sorted list of fnames from all detected scores.
- create_metadata_tsv(suffix='', view_name: Optional[str] = None, overwrite: bool = False, force: bool = True) Optional[str] [source]#
Creates a ‘metadata.tsv’ file for the current view.
- create_pieces(fnames: Optional[Union[Collection[str], str]] = None) None [source]#
Creates and stores one Piece object per fname.
- detect_parseable_files() None [source]#
Walks through the corpus_path and collects information on all parseable files.
- disambiguate_facet(facet: Literal['scores', 'measures', 'notes', 'rests', 'notes_and_rests', 'labels', 'expanded', 'form_labels', 'cadences', 'events', 'chords', 'unknown'], view_name: Optional[str] = None, ask_for_input=True) None [source]#
Make sure that, for a given facet, the current view includes only one or zero files. If at least one piece has more than one file, the user will be asked which ones to use. The others will be excluded from the view.
- Parameters
facet – Which facet to disambiguate.
ask_for_input – By default, if there is anything to disambiguate, the user is asked to select a group of files. Pass False to see only the questions and choices without actually disambiguating.
- extract_facets(facets: Optional[Union[Literal['measures', 'notes', 'rests', 'notes_and_rests', 'labels', 'expanded', 'form_labels', 'cadences', 'events', 'chords'], Collection[Literal['measures', 'notes', 'rests', 'notes_and_rests', 'labels', 'expanded', 'form_labels', 'cadences', 'events', 'chords']]]] = None, view_name: Optional[str] = None, force: bool = False, choose: Literal['auto', 'ask'] = 'auto', unfold: bool = False, interval_index: bool = False, flat=False) Dict[str, Union[Dict[str, List[Tuple[File, DataFrame]]], List[Tuple[File, DataFrame]]]] [source]#
Retrieve a dictionary with the selected feature matrices extracted from the parsed scores. If you want to retrieve parsed TSV files, use get_all_parsed().
- find_and_load_metadata() None [source]#
Checks if a ‘metadata.tsv’ is present at the default path and parses it.
- fnames_in_metadata(metadata_ix: Optional[int] = None) List[str] [source]#
fnames (file names without extension and suffix) serve as IDs for pieces. Retrieve those that are listed in the ‘metadata.tsv’ file for this corpus. The argument defaults to self.metadata_ix and serves to cache the results for multiple metadata.tsv files.
- fnames_not_in_metadata() List[str] [source]#
fnames (file names without extension and suffix) serve as IDs for pieces. Retrieve those that are not listed in the ‘metadata.tsv’ file for this corpus.
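In effect, the two methods above partition the corpus's piece IDs by their presence in metadata.tsv. A trivial stand-alone sketch of that partition (plain Python with made-up fnames, not ms3's implementation):

```python
all_fnames = ["K279-1", "K279-2", "K280-1", "sketch_fragment"]
metadata_fnames = {"K279-1", "K279-2", "K280-1"}  # hypothetical metadata.tsv rows

# Corresponds to fnames_in_metadata() vs. fnames_not_in_metadata()
in_metadata = [f for f in all_fnames if f in metadata_fnames]
not_in_metadata = [f for f in all_fnames if f not in metadata_fnames]

print(in_metadata)      # ['K279-1', 'K279-2', 'K280-1']
print(not_in_metadata)  # ['sketch_fragment']
```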
- get_dataframes(notes: bool = False, rests: bool = False, notes_and_rests: bool = False, measures: bool = False, events: bool = False, labels: bool = False, chords: bool = False, expanded: bool = False, form_labels: bool = False, cadences: bool = False, view_name: Optional[str] = None, force: bool = False, choose: Literal['all', 'auto', 'ask'] = 'all', unfold: bool = False, interval_index: bool = False, flat=False, include_empty: bool = False) Dict[str, Union[Dict[str, Tuple[File, DataFrame]], List[Tuple[File, DataFrame]]]] [source]#
Renamed to get_facets().
- get_facet(facet: Literal['measures', 'notes', 'rests', 'notes_and_rests', 'labels', 'expanded', 'form_labels', 'cadences', 'events', 'chords'], view_name: Optional[str] = None, choose: Literal['auto', 'ask'] = 'auto', unfold: bool = False, interval_index: bool = False, concatenate: bool = True) Union[Dict[str, Tuple[File, DataFrame]], DataFrame] [source]#
Retrieves exactly one DataFrame per piece, if available.
- get_facets(facets: Optional[Union[Literal['measures', 'notes', 'rests', 'notes_and_rests', 'labels', 'expanded', 'form_labels', 'cadences', 'events', 'chords'], Collection[Literal['measures', 'notes', 'rests', 'notes_and_rests', 'labels', 'expanded', 'form_labels', 'cadences', 'events', 'chords']]]] = None, view_name: Optional[str] = None, force: bool = False, choose: Literal['all', 'auto', 'ask'] = 'all', unfold: bool = False, interval_index: bool = False, flat=False, include_empty: bool = False) Dict[str, Union[Dict[str, Tuple[File, DataFrame]], List[Tuple[File, DataFrame]]]] [source]#
- Parameters
facets –
view_name –
force – Only relevant when choose='all'. By default, only scores and TSV files that have already been parsed are taken into account. Set force=True to force-parse all scores and TSV files selected under the given view.
choose –
unfold –
interval_index –
flat –
include_empty –
Returns:
- get_all_fnames(fnames_in_metadata: bool = True, fnames_not_in_metadata: bool = True) List[str] [source]#
fnames (file names without extension and suffix) serve as IDs for pieces. Use this function to retrieve the comprehensive list, ignoring views.
- Parameters
fnames_in_metadata – fnames that are listed in the ‘metadata.tsv’ file for this corpus, if present
fnames_not_in_metadata – fnames that are not listed in the ‘metadata.tsv’ file for this corpus
- Returns
The file names included in ‘metadata.tsv’ and/or those of all other scores.
- get_fnames(view_name: Optional[str] = None) List[str] [source]#
Retrieve fnames included in the current or selected view.
- get_view(view_name: Optional[str] = None, **config) View [source]#
Retrieve an existing or create a new View object, potentially while updating the config.
- iter_facets(facets: Optional[Union[Literal['measures', 'notes', 'rests', 'notes_and_rests', 'labels', 'expanded', 'form_labels', 'cadences', 'events', 'chords'], Collection[Literal['measures', 'notes', 'rests', 'notes_and_rests', 'labels', 'expanded', 'form_labels', 'cadences', 'events', 'chords']]]] = None, view_name: Optional[str] = None, choose: Literal['auto', 'ask'] = 'auto', unfold: bool = False, interval_index: bool = False, include_files: bool = False) Iterator [source]#
Iterate through (fname, *DataFrame) tuples containing exactly one or zero DataFrames per requested facet.
- Parameters
facets –
view_name –
choose –
unfold –
interval_index –
include_files –
- Returns
(fname, *DataFrame) tuples containing exactly one or zero DataFrames per requested facet per piece (fname).
- iter_pieces(view_name: Optional[str] = None) Iterator[Tuple[str, Piece]] [source]#
Iterate through (fname, Piece) tuples under the current or specified view.
- load_facet_into_scores(facet: Literal['expanded', 'labels'], view_name: Optional[str] = None, force: bool = False, choose: Literal['auto', 'ask'] = 'auto', git_revision: Optional[str] = None, key: str = 'detached', infer: bool = True, **cols) int [source]#
Loads annotations from at most one TSV file into at most one score per piece. Each score will contain the annotations as a ‘detached’ annotation object accessible via the indicated key (defaults to ‘detached’).
- look_for_ignored_warnings(directory: Optional[str] = None)[source]#
Looks for a text file called IGNORED_WARNINGS and, if it exists, loads it, configuring loggers as indicated.
- load_ignored_warnings(path: str) Tuple[List[Logger], List[str]] [source]#
Loads in a text file containing warnings that are to be ignored, i.e., wrapped in DEBUG messages. The purpose is to mark certain warnings as OK, warranted by a human, to allow checks to pass regardless.
- load_metadata_file(file: File, allow_prefixed: bool = False) None [source]#
Loads the TSV file at the given path and stores it as metadata. If the file is called ‘metadata.tsv’ it will be treated as the corpus’ main file for determining fnames. Otherwise it is expected to be named ‘metadata{suffix}.tsv’ and the suffix will be used as name for an additionally created view.
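The naming convention above maps each metadata file to a view: ‘metadata.tsv’ is the main file, while ‘metadata{suffix}.tsv’ yields an extra view named after the suffix. A small stand-alone sketch of that convention (plain Python, not the library's parsing code; ‘_reviewed’ is a made-up suffix):

```python
from typing import Optional

def metadata_suffix(filename: str) -> Optional[str]:
    """Return the view suffix encoded in a 'metadata{suffix}.tsv' file name.

    '' denotes the main metadata file; None means the name does not
    follow the convention at all.
    """
    if not (filename.startswith("metadata") and filename.endswith(".tsv")):
        return None
    return filename[len("metadata"):-len(".tsv")]

print(metadata_suffix("metadata.tsv"))           # ''  -> main file
print(metadata_suffix("metadata_reviewed.tsv"))  # '_reviewed' -> extra view
print(metadata_suffix("notes.tsv"))              # None
```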
- parse(view_name=None, level=None, parallel=True, only_new=True, labels_cfg={}, cols={}, infer_types=None, **kwargs)[source]#
Shorthand for executing parse_scores() and parse_tsv() in one go.
- parse_mscx(*args, **kwargs)[source]#
Renamed to parse_scores().
- parse_scores(level: Optional[str] = None, parallel: bool = True, only_new: bool = True, labels_cfg: dict = {}, view_name: Optional[str] = None, choose: Literal['all', 'auto', 'ask'] = 'all')[source]#
Parse MuseScore 3 files (MSCX or MSCZ) and store the resulting read-only Score objects. If they need to be writeable, e.g. for removing or adding labels, pass parallel=False, which takes longer but prevents having to re-parse at a later point.
- Parameters
level ({'W', 'D', 'I', 'E', 'C', 'WARNING', 'DEBUG', 'INFO', 'ERROR', 'CRITICAL'}, optional) – Pass a level name for which (and above which) you want to see log records.
parallel (bool, optional) – Defaults to True, meaning that all CPU cores are used simultaneously to speed up the parsing. It implies that the resulting Score objects are in read-only mode and that you might not be able to use the computer during parsing. Set to False to parse one score after the other, which uses more memory but will allow making changes to the scores.
only_new (bool, optional) – By default, scores that have already been parsed are not parsed again. Pass False to parse them, too.
- Return type
None
- parse_tsv(view_name: Optional[str] = None, cols={}, infer_types=None, level=None, only_new: bool = True, choose: Literal['all', 'auto', 'ask'] = 'all', **kwargs)[source]#
Parse TSV files so that their contents become available as DataFrames or Annotations objects.
- Parameters
keys (str or Collection, optional) – Key(s) for which to parse all non-MSCX files. By default, all keys are selected.
ids (Collection) – To parse only particular files, pass their IDs. keys and fexts are ignored in this case.
fexts (str or Collection, optional) – If you want to parse only files with one or several particular file extension(s), pass the extension(s).
cols (dict, optional) – By default, if a column called 'label' is found, the TSV is treated as an annotation table and turned into an Annotations object. Pass one or several column name(s) to treat them as label columns instead. If you pass {} or no label column is found, the TSV is parsed as a “normal” table, i.e. a DataFrame.
infer_types (dict, optional) – To recognize one or several custom label type(s), pass {name: regEx}.
level ({'W', 'D', 'I', 'E', 'C', 'WARNING', 'DEBUG', 'INFO', 'ERROR', 'CRITICAL'}, optional) – Pass a level name for which (and above which) you want to see log records.
**kwargs – Arguments for pandas.DataFrame.to_csv(). Defaults to {'sep': '\t', 'index': False}. In particular, you might want to update the default dictionaries for dtypes and converters used in load_tsv(). Passing kwargs prevents ms3 from parsing TSVs in parallel, so it will be a bit slower.
- Return type
None
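The dtypes/converters hooks mentioned above are plain pandas.read_csv machinery; the sketch below demonstrates them on a tiny hand-made table (the columns are illustrative, and this is not ms3's actual load_tsv() with its default dictionaries):

```python
from io import StringIO

import pandas as pd

tsv = "mc\tmn\tlabel\n1\t1\t I \n2\t2\tV7\n"

# dtype fixes a column's type up front; a converter runs on every
# cell of its column while the file is being read.
df = pd.read_csv(
    StringIO(tsv),
    sep="\t",
    dtype={"mc": int},
    converters={"label": str.strip},
)

print(df["label"].tolist())  # ['I', 'V7']
print(df["mc"].tolist())     # [1, 2]
```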
- register_files_with_pieces(files: Optional[List[File]] = None, fnames: Optional[Union[str, Collection[str]]] = None) None [source]#
Iterates through the files and tries to match them with the fnames, registering matched File objects with the corresponding Piece objects (unless already registered). By default, the method uses this object's files and fnames. To match with a Piece, the file name (without extension) needs to start with the Piece's fname; otherwise, it will be stored under ix2orphan_file.
- Parameters
files – File objects to register with the corresponding Piece objects based on their file names.
fnames – Fnames of the pieces that the files are to be matched to. Those that don't match any will be stored under ix2orphan_file.
- metadata(view_name: Optional[str] = None, choose: Optional[Literal['auto', 'ask']] = None) DataFrame [source]#
Returns metadata.tsv but only for fnames included in the current or indicated view. If no TSV file is present, get metadata from the current scores.
- set_view(active: Optional[View] = None, **views: View)[source]#
Register one or several view_name=View pairs.
- update_scores(root_dir: Optional[str] = None, folder: Optional[str] = '.', suffix: str = '', overwrite: bool = False) List[str] [source]#
Update scores created with an older MuseScore version to the latest MuseScore 3 version.
- Parameters
root_dir – In case you want to create output paths for the updated MuseScore files based on a folder different from corpus_path.
folder –
The default '.' has the updated scores written to the same directory as the old ones, effectively overwriting them if root_dir is None.
If folder is None, the files will be written to {root_dir}/scores/.
If folder is an absolute path, root_dir will be ignored.
If folder is a relative path starting with a dot '.', the relative path is appended to the file's subdir. For example, ..\scores will resolve to a sibling directory of the one where the file is located.
If folder is a relative path that does not begin with a dot '.', it will be appended to the root_dir.
suffix – String to append to the file names of the updated files, e.g. '_updated'.
overwrite – By default, existing files are not overwritten. Pass True to allow this.
- Returns
A list of all up-to-date paths, whether they had to be converted or were already in the latest version.
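A stand-alone sketch of the four folder cases above, using forward slashes for determinism (plain Python; a simplified reading of the documented rules, not ms3's code):

```python
import posixpath

def resolve_output_dir(file_dir, folder, root_dir=None):
    """Pick the output directory for one file according to the
    documented folder/root_dir rules (simplified sketch)."""
    if folder is None:
        return posixpath.join(root_dir, "scores")
    if posixpath.isabs(folder):
        return folder  # absolute path: root_dir is ignored
    if folder.startswith("."):
        # relative to the file's own subdir, e.g. '../scores' -> sibling dir
        return posixpath.normpath(posixpath.join(file_dir, folder))
    # plain relative path: appended to root_dir (or the file's dir)
    return posixpath.join(root_dir or file_dir, folder)

d = "/corpus/MS3"
print(resolve_output_dir(d, "."))                          # /corpus/MS3
print(resolve_output_dir(d, None, root_dir="/out"))        # /out/scores
print(resolve_output_dir(d, "/abs/dir", root_dir="/out"))  # /abs/dir
print(resolve_output_dir(d, "../scores"))                  # /corpus/scores
print(resolve_output_dir(d, "updated", root_dir="/out"))   # /out/updated
```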
- update_tsvs_on_disk(facets: Union[Literal['measures', 'notes', 'rests', 'notes_and_rests', 'labels', 'expanded', 'form_labels', 'cadences', 'events', 'chords'], Collection[Literal['measures', 'notes', 'rests', 'notes_and_rests', 'labels', 'expanded', 'form_labels', 'cadences', 'events', 'chords']]] = 'tsv', view_name: Optional[str] = None, force: bool = False, choose: Literal['auto', 'ask'] = 'auto') List[str] [source]#
Update existing TSV files corresponding to one or several facets with information freshly extracted from a parsed score, but only if the contents are identical. Otherwise, the existing TSV file is not overwritten and the differences are displayed in a log warning. The purpose is to safely update the format of existing TSV files (for instance with respect to column order), making sure that the content doesn't change.
- Parameters
facets –
view_name –
force – By default, only TSV files that have already been parsed are updated. Set to True in order to force-parse for each facet one of the TSV files included in the given view, if necessary.
choose –
- Returns
List of paths that have been overwritten.
- insert_detached_labels(view_name: Optional[str] = None, key: str = 'detached', staff: Optional[int] = None, voice: Optional[Literal[1, 2, 3, 4]] = None, harmony_layer: Optional[Literal[0, 1, 2]] = None, check_for_clashes: bool = True) Tuple[int, int] [source]#
Attach all Annotations objects that are reachable via Score.key to their respective Score, altering the XML in memory. Calling store_scores() will output MuseScore files where the annotations show in the score.
- Parameters
key – Key under which the Annotations objects to be attached are stored in the Score objects. Defaults to ‘detached’.
staff (int, optional) – If you pass a staff ID, the labels will be attached to that staff, where 1 is the upper staff. By default, the staves indicated in the ‘staff’ column of ms3.annotations.Annotations.df will be used.
voice ({1, 2, 3, 4}, optional) – If you pass the ID of a notational layer (where 1 is the upper voice, blue in MuseScore), the labels will be attached to that one. By default, the notational layers indicated in the ‘voice’ column of ms3.annotations.Annotations.df will be used.
harmony_layer (int, optional) – By default, the labels are written to the layer specified as an integer in the column harmony_layer. Pass an integer to select a particular layer:
* 0 to attach them as absolute (‘guitar’) chords, meaning that when opened next time, MuseScore will split and encode those beginning with a note name (resulting in ms3-internal harmony_layer 3).
* 1 to write the labels into the staff's layer for Roman Numeral Analysis.
* 2 to have MuseScore interpret them as Nashville Numbers.
check_for_clashes (bool, optional) – By default, warnings are thrown when there already exists a label at a position (and in a notational layer) where a new one is attached. Pass False to deactivate these warnings.
- change_labels_cfg(labels_cfg={}, staff=None, voice=None, harmony_layer=None, positioning=None, decode=None, column_name=None, color_format=None)[source]#
Update Corpus.labels_cfg and retrieve new ‘labels’ tables accordingly.
- Parameters
labels_cfg (dict) – Using an entire dictionary or, to change only particular options, choose from:
staff – Arguments as they will be passed to get_labels()
voice – Arguments as they will be passed to get_labels()
harmony_layer – Arguments as they will be passed to get_labels()
positioning – Arguments as they will be passed to get_labels()
decode – Arguments as they will be passed to get_labels()
column_name – Arguments as they will be passed to get_labels()
- compare_labels(key: str = 'detached', new_color: str = 'ms3_darkgreen', old_color: str = 'ms3_darkred', detached_is_newer: bool = False, add_to_rna: bool = True, view_name: Optional[str] = None) Tuple[int, int] [source]#
Compare detached labels key to the ones attached to the Score to create a diff. By default, the attached labels are considered as the reviewed version: labels that have changed or been added in comparison to the detached labels are colored in green, whereas the previous versions of changed labels are attached to the Score in red, just like any deleted label.
- Parameters
key – Key of the detached labels you want to compare to the ones in the score.
new_color – The colors by which new and old labels are differentiated. Identical labels remain unchanged. Colors can be CSS colors or MuseScore colors (see utils.MS3_COLORS).
old_color – The colors by which new and old labels are differentiated. Identical labels remain unchanged. Colors can be CSS colors or MuseScore colors (see utils.MS3_COLORS).
detached_is_newer – Pass True if the detached labels are to be added with new_color whereas the attached changed labels will turn old_color, as opposed to the default.
add_to_rna – By default, new labels are attached to the Roman Numeral layer. Pass False to attach them to the chord layer instead.
- Returns
Number of scores in which labels have changed. Number of scores in which no label has changed.
- count_annotation_layers(keys=None, which='attached', per_key=False)[source]#
Counts the labels for each annotation layer defined as (staff, voice, harmony_layer). By default, only labels attached to a score are counted.
- Parameters
keys (str or Collection, optional) – Key(s) for which to count annotation layers. By default, all keys are selected.
which ({'attached', 'detached', 'tsv'}, optional) – ‘attached’: Counts layers from annotations attached to a score. ‘detached’: Counts layers from annotations that are in a Score object, but detached from the score. ‘tsv’: Counts layers from Annotation objects that have been loaded from or into annotation tables.
per_key (bool, optional) – If set to True, the results are returned as a dict {key: Counter}, otherwise the counts are summed up in one Counter. If which='detached', the keys are keys from Score objects, otherwise they are keys from this Corpus object.
- Returns
By default, the function returns a Counter of labels for every annotation layer (staff, voice, harmony_layer). If per_key is set to True, a dictionary {key: Counter} is returned, separating the counts.
- Return type
- count_labels(keys=None, per_key=False)[source]#
Count label types.
- Parameters
keys (str or Collection, optional) – Key(s) for which to count label types. By default, all keys are selected.
per_key (bool, optional) – If set to True, the results are returned as a dict {key: Counter}, otherwise the counts are summed up in one Counter.
- Returns
By default, the function returns a Counter of label types. If per_key is set to True, a dictionary {key: Counter} is returned, separating the counts.
- Return type
- count_tsv_types(keys=None, per_key=False)[source]#
Count inferred TSV types.
- Parameters
keys (str or Collection, optional) – Key(s) for which to count inferred TSV types. By default, all keys are selected.
per_key (bool, optional) – If set to True, the results are returned as a dict {key: Counter}, otherwise the counts are summed up in one Counter.
- Returns
By default, the function returns a Counter of inferred TSV types. If per_key is set to True, a dictionary {key: Counter} is returned, separating the counts.
- Return type
- detach_labels(view_name: Optional[str] = None, force: bool = False, choose: Literal['auto', 'ask'] = 'auto', key: str = 'removed', staff: Optional[int] = None, voice: Optional[Literal[1, 2, 3, 4]] = None, harmony_layer: Optional[Literal[0, 1, 2, 3]] = None, delete: bool = True)[source]#
Calls Score.detach_labels() on every parsed score under the current or selected view.
- store_extracted_facets(view_name: Optional[str] = None, root_dir: Optional[str] = None, measures_folder: Optional[str] = None, measures_suffix: str = '', notes_folder: Optional[str] = None, notes_suffix: str = '', rests_folder: Optional[str] = None, rests_suffix: str = '', notes_and_rests_folder: Optional[str] = None, notes_and_rests_suffix: str = '', labels_folder: Optional[str] = None, labels_suffix: str = '', expanded_folder: Optional[str] = None, expanded_suffix: str = '', form_labels_folder: Optional[str] = None, form_labels_suffix: str = '', cadences_folder: Optional[str] = None, cadences_suffix: str = '', events_folder: Optional[str] = None, events_suffix: str = '', chords_folder: Optional[str] = None, chords_suffix: str = '', metadata_suffix: Optional[str] = None, markdown: bool = True, simulate: bool = False, unfold: bool = False, interval_index: bool = False, silence_label_warnings: bool = False) List[str] [source]#
Store facets extracted from parsed scores as TSV files.
- Parameters
view_name –
root_dir –
measures_folder – Specify directory where to store the corresponding TSV files.
notes_folder – Specify directory where to store the corresponding TSV files.
rests_folder – Specify directory where to store the corresponding TSV files.
notes_and_rests_folder – Specify directory where to store the corresponding TSV files.
labels_folder – Specify directory where to store the corresponding TSV files.
expanded_folder – Specify directory where to store the corresponding TSV files.
form_labels_folder – Specify directory where to store the corresponding TSV files.
cadences_folder – Specify directory where to store the corresponding TSV files.
events_folder – Specify directory where to store the corresponding TSV files.
chords_folder – Specify directory where to store the corresponding TSV files.
measures_suffix – Optionally specify suffixes appended to the TSVs’ file names. If unfold=True the suffixes default to _unfolded.
notes_suffix – Optionally specify suffixes appended to the TSVs’ file names. If unfold=True the suffixes default to _unfolded.
rests_suffix – Optionally specify suffixes appended to the TSVs’ file names. If unfold=True the suffixes default to _unfolded.
notes_and_rests_suffix – Optionally specify suffixes appended to the TSVs’ file names. If unfold=True the suffixes default to _unfolded.
labels_suffix – Optionally specify suffixes appended to the TSVs’ file names. If unfold=True the suffixes default to _unfolded.
expanded_suffix – Optionally specify suffixes appended to the TSVs’ file names. If unfold=True the suffixes default to _unfolded.
form_labels_suffix – Optionally specify suffixes appended to the TSVs’ file names. If unfold=True the suffixes default to _unfolded.
cadences_suffix – Optionally specify suffixes appended to the TSVs’ file names. If unfold=True the suffixes default to _unfolded.
events_suffix – Optionally specify suffixes appended to the TSVs’ file names. If unfold=True the suffixes default to _unfolded.
chords_suffix – Optionally specify suffixes appended to the TSVs’ file names. If unfold=True the suffixes default to _unfolded.
metadata_suffix – Specify a suffix to update the ‘metadata{suffix}.tsv’ file for this corpus. For the main file, pass ''.
markdown – By default, when metadata_path is specified, a markdown file called README.md containing the columns [file_name, measures, labels, standard, annotators, reviewers] is created. If it exists already, this table will be appended or overwritten after the heading # Overview.
.simulate –
unfold – By default, repetitions are not unfolded. Pass True to duplicate values so that they correspond to a full playthrough, including correct positioning of first and second endings.
interval_index –
silence_label_warnings –
Returns:
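The README behavior described above (keep everything up to a ‘# Overview’ heading and replace what follows) can be sketched stand-alone like this (plain Python, not ms3's implementation):

```python
def replace_overview(readme_text: str, new_table: str) -> str:
    """Keep everything before the '# Overview' heading and replace
    whatever follows it; append the heading if it is missing."""
    marker = "# Overview"
    head, sep, _ = readme_text.partition(marker)
    if not sep:  # heading not present yet: append it at the end
        head = readme_text.rstrip("\n") + "\n\n"
    return head + marker + "\n\n" + new_table + "\n"

old = "# My Corpus\n\nIntro text.\n\n# Overview\n\nstale table\n"
print(replace_overview(old, "| piece | labels |"))
```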
- update_metadata_tsv_from_parsed_scores(root_dir: Optional[str] = None, suffix: str = '', markdown_file: Optional[str] = 'README.md', view_name: Optional[str] = None) List[str] [source]#
Gathers the metadata from parsed and currently selected scores and updates ‘metadata.tsv’ with the information.
- Parameters
root_dir – In case you want to output the metadata to a folder different from corpus_path.
suffix – Added to the filename: ‘metadata{suffix}.tsv’. Defaults to ''. Metadata files with a suffix may be used to store views with particular subselections of pieces.
markdown_file – By default, a subset of metadata columns will be written to ‘README.md’ in the same folder as the TSV file. If the file exists, it will be scanned for a line containing the string ‘# Overview’ and overwritten from that line onwards.
view_name – The view under which you want to update metadata from the selected parsed files. Defaults to None, i.e. the active view.
- Returns
The file paths to which metadata was written.
- update_score_metadata_from_tsv(view_name: Optional[str] = None, force: bool = False, choose: Literal['all', 'auto', 'ask'] = 'all', write_empty_values: bool = False, remove_unused_fields: bool = False, write_text_fields: bool = False) List[File] [source]#
Update metadata fields of parsed scores with the values from the corresponding row in metadata.tsv.
- Parameters
view_name –
force –
choose –
write_empty_values – If set to True, existing values are overwritten even if the new value is empty, in which case the field will be set to ‘’.
remove_unused_fields – If set to True, all non-default fields that are not among the columns of metadata.tsv (anymore) are removed.
write_text_fields – If set to True, ms3 will write updated values from the columns title_text, subtitle_text, composer_text, lyricist_text, and part_name_text into the score headers.
- Returns
List of File objects of those scores of which the XML structure has been modified.
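The interaction of write_empty_values and remove_unused_fields amounts to a conditional dict merge. A simplified stand-alone sketch with made-up fields (plain Python, not ms3's code, which additionally distinguishes default from non-default fields):

```python
def merge_metadata(score_fields: dict, tsv_row: dict,
                   write_empty_values: bool = False,
                   remove_unused_fields: bool = False) -> dict:
    """Update a score's metadata fields from one metadata.tsv row."""
    result = dict(score_fields)
    for field, value in tsv_row.items():
        if value == "" and not write_empty_values:
            continue  # keep the existing value
        result[field] = value
    if remove_unused_fields:
        # drop fields that are not among the TSV columns (anymore)
        result = {f: v for f, v in result.items() if f in tsv_row}
    return result

score = {"composer": "W.A. Mozart", "obsolete_field": "x"}
row = {"composer": "Mozart", "movement": ""}
print(merge_metadata(score, row))
# {'composer': 'Mozart', 'obsolete_field': 'x'}
```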
- store_parsed_scores(view_name: Optional[str] = None, only_changed: bool = True, root_dir: Optional[str] = None, folder: str = '.', suffix: str = '', overwrite: bool = False, simulate=False) List[str] [source]#
Stores all parsed scores under this view as MuseScore 3 files.
- Parameters
view_name –
only_changed – By default, only scores that have been modified since parsing are written. Set to False to store all scores regardless.
root_dir –
folder –
suffix – Suffix to append to the original file name.
overwrite – Pass True to overwrite existing files.
simulate – Set to True if no files are to be written.
- Returns
Paths of the stored files.
- ms3.corpus.parse_musescore_file(file: File, logger: Logger, logger_cfg: dict = {}, read_only: bool = False, ms: Optional[str] = None) Score [source]#
Performs a single parse and returns the resulting Score object or None.
- Parameters
file – File object with path information of a score that can be opened (or converted) with MuseScore 3.
logger – Logger to be used within this function (not for the parsing itself).
logger_cfg – Logger config for the new Score object (and therefore for the parsing itself).
read_only – Pass True to return smaller objects that do not keep a copy of the original XML structure in memory. In order to make changes to the score after parsing, this needs to be False (default).
ms – MuseScore executable in case the file needs to be converted.
- Returns
The parsed score.
The Piece class#
- class ms3.piece.Piece(fname: str, view: Optional[View] = None, logger_cfg={}, ms=None)[source]#
Wrapper around Score for associating it with parsed TSV files.
- facet2files: Dict[str, FileList]#
{typ -> [File]} dict storing file information for associated types.
- ix2file: Dict[int, File]#
{ix -> File} dict storing the registered file information for access via index.
- facet2parsed: Dict[str, Dict[int, ParsedFile]]#
{typ -> {ix -> pandas.DataFrame | Score}} dict storing parsed files for associated types.
- ix2parsed: Dict[int, ParsedFile]#
{ix -> pandas.DataFrame | Score} dict storing the parsed files for access via index.
- ix2annotations: Dict[int, Annotations]#
{ix -> Annotations} dict storing Annotations objects for the parsed labels and expanded labels.
- all_facets_present(view_name: Optional[str] = None, selected_facets: Optional[Union[Literal['scores', 'measures', 'notes', 'rests', 'notes_and_rests', 'labels', 'expanded', 'form_labels', 'cadences', 'events', 'chords', 'unknown'], Collection[Literal['scores', 'measures', 'notes', 'rests', 'notes_and_rests', 'labels', 'expanded', 'form_labels', 'cadences', 'events', 'chords', 'unknown']]]] = None) bool [source]#
Checks if parsed TSV files have been detected for all selected facets under the active or indicated view.
- Parameters
view_name – Name of the view to check.
selected_facets – If passed, needs to be a subset of the facets selected by the view, otherwise the result will be False. If no selected_facets are passed, check for those selected by the active or indicated view.
- Returns
True if for each selected facet at least one file has been registered.
- score_metadata(view_name: Optional[str], choose: Literal['auto', 'ask'], as_dict: Literal[False]) Series [source]#
- score_metadata(view_name: Optional[str], choose: Literal['auto', 'ask'], as_dict: Literal[True]) dict
- Parameters
choose –
as_dict – Set to True to change the return type from pandas.Series to dict.
Returns:
- property tsv_metadata: Optional[Dict[str, str]]#
If the Corpus has metadata_tsv, this field will contain the {column: value} pairs of the row pertaining to this piece.
- metadata(view_name: Optional[str] = None) Optional[Series] [source]#
If a row of ‘metadata.tsv’ has been stored, return that, otherwise extract from a (force-)parsed score.
- set_view(active: Optional[View] = None, **views: View)[source]#
Register one or several view_name=View pairs.
- get_view(view_name: Optional[str] = None, **config) View [source]#
Retrieve an existing or create a new View object, potentially while updating the config.
- compare_labels(key: str = 'detached', new_color: str = 'ms3_darkgreen', old_color: str = 'ms3_darkred', detached_is_newer: bool = False, add_to_rna: bool = True, view_name: Optional[str] = None) Tuple[int, int] [source]#
Compare the detached labels key to the ones attached to the Score to create a diff. By default, the attached labels are considered the reviewed version: labels that have changed or been added in comparison to the detached labels are colored green, whereas the previous versions of changed labels are attached to the Score in red, just like any deleted label.
- Parameters
key – Key of the detached labels you want to compare to the ones in the score.
new_color – Color given to labels that are new or have changed; identical labels remain unchanged. Colors can be CSS colors or MuseScore colors (see utils.MS3_COLORS).
old_color – Color given to the previous versions of changed labels and to deleted labels. Colors can be CSS colors or MuseScore colors (see utils.MS3_COLORS).
detached_is_newer – Pass True if the detached labels are to be added with new_color whereas the attached changed labels will turn old_color, as opposed to the default.
add_to_rna – By default, new labels are attached to the Roman Numeral layer. Pass False to attach them to the chord layer instead.
- Returns
Number of scores in which labels have changed. Number of scores in which no label has changed.
- count_detected(include_empty: bool = False, view_name: Optional[str] = None, prefix: bool = False) Dict[str, int] [source]#
Count how many files per facet have been detected.
- Parameters
include_empty – By default, facets without files are not included in the dict. Pass True to include zero counts.
view_name –
prefix – Pass True if you want the facets prefixed with ‘detected_’.
- Returns
{facet -> count of detected files}
- extract_facets(facets: Optional[Union[Literal['measures', 'notes', 'rests', 'notes_and_rests', 'labels', 'expanded', 'form_labels', 'cadences', 'events', 'chords'], Collection[Literal['measures', 'notes', 'rests', 'notes_and_rests', 'labels', 'expanded', 'form_labels', 'cadences', 'events', 'chords']]]] = None, view_name: Optional[str] = None, force: bool = False, choose: Literal['all', 'auto', 'ask'] = 'all', unfold: bool = False, interval_index: bool = False, flat=False) Union[Dict[str, List[Tuple[File, DataFrame]]], List[Tuple[File, DataFrame]]] [source]#
Retrieve a dictionary with the selected feature matrices extracted from the parsed scores. If you want to retrieve parsed TSV files, use get_all_parsed().
- get_facets(facets: Optional[Union[Literal['measures', 'notes', 'rests', 'notes_and_rests', 'labels', 'expanded', 'form_labels', 'cadences', 'events', 'chords'], Collection[Literal['measures', 'notes', 'rests', 'notes_and_rests', 'labels', 'expanded', 'form_labels', 'cadences', 'events', 'chords']]]] = None, view_name: Optional[str] = None, force: bool = False, choose: Literal['all', 'auto', 'ask'] = 'all', unfold: bool = False, interval_index: bool = False, flat=False) Union[Dict[str, Tuple[File, DataFrame]], List[Tuple[File, DataFrame]]] [source]#
Retrieve score facets both freshly extracted from parsed scores and from parsed TSV files, depending on the parameters and the view in question.
If choose != ‘all’, the goal is to return one DataFrame per facet. Preference is given to a DataFrame freshly extracted from an already parsed score, otherwise to one from an already parsed TSV file. If neither is available, preference is given to a force-parsed TSV, then to a force-parsed score.
- Parameters
facets –
view_name –
force – Only relevant when choose='all'. By default, only scores and TSV files that have already been parsed are taken into account. Set force=True to force-parse all scores and TSV files selected under the given view.
choose –
unfold –
interval_index –
flat –
Returns:
- get_facet(facet: Literal['measures', 'notes', 'rests', 'notes_and_rests', 'labels', 'expanded', 'form_labels', 'cadences', 'events', 'chords'], view_name: Optional[str] = None, force: bool = False, choose: Literal['auto', 'ask'] = 'auto', unfold: bool = False, interval_index: bool = False) Tuple[Optional[File], Optional[DataFrame]] [source]#
Retrieve a DataFrame from a parsed score or, if unavailable, from a parsed TSV. If none have been parsed, first force-parse a TSV and, if not included in the given view, force-parse a score.
- get_file(facet: Literal['scores', 'measures', 'notes', 'rests', 'notes_and_rests', 'labels', 'expanded', 'form_labels', 'cadences', 'events', 'chords', 'unknown'], view_name: Optional[str] = None, parsed: bool = True, unparsed: bool = True, choose: Literal['auto', 'ask'] = 'auto') Optional[File] [source]#
- Parameters
facet –
choose –
- Returns
The selected File or None.
- get_files(facets: Optional[Union[Literal['scores', 'measures', 'notes', 'rests', 'notes_and_rests', 'labels', 'expanded', 'form_labels', 'cadences', 'events', 'chords', 'unknown'], Literal['tsv', 'tsvs'], Collection[Literal['scores', 'measures', 'notes', 'rests', 'notes_and_rests', 'labels', 'expanded', 'form_labels', 'cadences', 'events', 'chords', 'unknown']]]] = None, view_name: Optional[str] = None, parsed: bool = True, unparsed: bool = True, choose: Literal['all', 'auto', 'ask'] = 'all', flat: bool = False, include_empty: bool = False) Union[Dict[str, List[File]], List[File]] [source]#
- Parameters
facets –
- Returns
A {file_type -> [File]} dict containing the selected Files or, if flat=True, just a list.
- get_parsed(facet: Literal['scores', 'measures', 'notes', 'rests', 'notes_and_rests', 'labels', 'expanded', 'form_labels', 'cadences', 'events', 'chords', 'unknown'], view_name: Optional[str] = None, choose: Literal['auto', 'ask'] = 'auto', git_revision: Optional[str] = None, unfold: bool = False, interval_index: bool = False) Tuple[Optional[File], Optional[Union[Score, DataFrame]]] [source]#
Retrieve exactly one parsed score or TSV file. If none has been parsed, parse one automatically.
- Parameters
facet –
view_name –
choose –
git_revision –
Returns:
- get_all_parsed(facets: Optional[Union[Literal['scores', 'measures', 'notes', 'rests', 'notes_and_rests', 'labels', 'expanded', 'form_labels', 'cadences', 'events', 'chords', 'unknown'], Literal['tsv', 'tsvs'], Collection[Literal['scores', 'measures', 'notes', 'rests', 'notes_and_rests', 'labels', 'expanded', 'form_labels', 'cadences', 'events', 'chords', 'unknown']]]] = None, view_name: Optional[str] = None, force: bool = False, choose: Literal['all', 'auto', 'ask'] = 'all', flat: bool = False, include_empty: bool = False, unfold: bool = False, interval_index: bool = False) Union[Dict[Literal['scores', 'measures', 'notes', 'rests', 'notes_and_rests', 'labels', 'expanded', 'form_labels', 'cadences', 'events', 'chords', 'unknown'], List[Tuple[File, Union[Score, DataFrame]]]], List[Tuple[File, Union[Score, DataFrame]]]] [source]#
Return multiple parsed files.
- iter_extracted_facet(facet: Literal['measures', 'notes', 'rests', 'notes_and_rests', 'labels', 'expanded', 'form_labels', 'cadences', 'events', 'chords'], view_name: Optional[str] = None, force: bool = False, unfold: bool = False, interval_index: bool = False) Iterator[Tuple[Optional[File], Optional[DataFrame]]] [source]#
Iterate through the selected facet extracted from all parsed or yet-to-parse scores.
- iter_extracted_facets(facets: Union[Literal['measures', 'notes', 'rests', 'notes_and_rests', 'labels', 'expanded', 'form_labels', 'cadences', 'events', 'chords'], Collection[Literal['measures', 'notes', 'rests', 'notes_and_rests', 'labels', 'expanded', 'form_labels', 'cadences', 'events', 'chords']]], view_name: Optional[str] = None, force: bool = False, unfold: bool = False, interval_index: bool = False) Iterator[Tuple[File, Dict[str, DataFrame]]] [source]#
Iterate through the selected facets extracted from all parsed or yet-to-parse scores.
- iter_facet2files(view_name: Optional[str] = None, include_empty: bool = False) Iterator[Tuple[str, List[File]]] [source]#
Iterate through facet2files under the current or specified view.
- iter_facet2parsed(view_name: Optional[str] = None, include_empty: bool = False) Iterator[Dict[str, List[File]]] [source]#
Iterate through facet2parsed under the current or specified view, selecting only parsed files.
- iter_files(facets: Optional[Union[Literal['scores', 'measures', 'notes', 'rests', 'notes_and_rests', 'labels', 'expanded', 'form_labels', 'cadences', 'events', 'chords', 'unknown'], Literal['tsv', 'tsvs'], Collection[Literal['scores', 'measures', 'notes', 'rests', 'notes_and_rests', 'labels', 'expanded', 'form_labels', 'cadences', 'events', 'chords', 'unknown']]]] = None, view_name: Optional[str] = None, parsed: bool = True, unparsed: bool = True, choose: Literal['all', 'auto', 'ask'] = 'all', flat: bool = False, include_empty: bool = False) Union[Iterator[Dict[str, File]], Iterator[List[File]]] [source]#
Equivalent to iterating through the result of get_files().
- load_annotation_table_into_score(ix: Optional[int] = None, df: Optional[DataFrame] = None, view_name: Optional[str] = None, choose: Literal['auto', 'ask'] = 'auto', key: str = 'detached', infer: bool = True, **cols) None [source]#
Attach an Annotations object to the score and make it available as Score.{key}. It can be an existing object or one newly created from the TSV file tsv_path.
- Parameters
ix – Either pass the index of a TSV file containing annotations, or
df – A DataFrame containing annotations.
key – Specify a new key for accessing the set of annotations. The string needs to be usable as an identifier, e.g., not start with a number and not contain special characters. In return you may use it as a property: for example, passing 'chords' lets you access the Annotations as Score.chords. The key 'annotations' is reserved for all annotations attached to the score.
infer – By default, the label types are inferred in the currently configured order (see name2regex). Pass False to neither add nor change any label types.
**cols – If the columns in the specified TSV file diverge from the standard column names, pass them as standard_name='custom name' keywords.
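The standard_name='custom name' convention can be pictured with a small self-contained sketch (illustrative only, not ms3's implementation): the keywords define a mapping that translates custom TSV headers back to ms3's standard column names.

```python
def map_columns(header, **cols):
    """Translate custom column names back to standard names.

    Keyword arguments follow the documented convention
    standard_name='custom name'. Illustrative sketch only.
    """
    custom2standard = {custom: standard for standard, custom in cols.items()}
    return [custom2standard.get(col, col) for col in header]

# A hypothetical TSV that calls the 'label' column 'chord' and 'mc' 'measure_count':
header = ["measure_count", "mn", "chord"]
print(map_columns(header, label="chord", mc="measure_count"))
# -> ['mc', 'mn', 'label']
```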
- store_extracted_facet(facet: Literal['measures', 'notes', 'rests', 'notes_and_rests', 'labels', 'expanded', 'form_labels', 'cadences', 'events', 'chords'], root_dir: Optional[str] = None, folder: Optional[str] = None, suffix: str = '', view_name: Optional[str] = None, force: bool = False, choose: Literal['all', 'auto', 'ask'] = 'all', unfold: bool = False, interval_index: bool = False)[source]#
Extract a facet from one or several available scores and store the results as TSV files, the paths of which are computed from the respective score’s location.
- Args:
facet:
root_dir: Defaults to None, meaning that the path is constructed based on the corpus_path. Pass a directory to construct the path relative to it instead. If folder is an absolute path, root_dir is ignored.
folder:
If folder is None (default), the file's type will be appended to the root_dir.
If folder is an absolute path, root_dir will be ignored.
If folder is a relative path starting with a dot, the relative path is appended to the file's subdir. For example, ``../notes`` will resolve to a sibling directory of the one where the file is located.
If folder is a relative path that does not begin with a dot, it will be appended to the root_dir.
suffix: String to append to the file's fname.
view_name:
force:
choose:
unfold:
interval_index:
Returns:
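The root_dir/folder rules above amount to a small path-resolution decision. The following is an illustrative, self-contained sketch of those documented rules (not ms3's actual implementation; file_subdir stands for the directory the source file lives in):

```python
import os

def resolve_target(file_subdir, facet, root_dir=None, folder=None):
    """Sketch of the documented root_dir/folder rules for store_extracted_facet()."""
    if folder is None:
        # the file's type (facet) is appended to root_dir
        return os.path.join(root_dir or "", facet)
    if os.path.isabs(folder):
        # absolute folder: root_dir is ignored
        return folder
    if folder.startswith("."):
        # dot-relative: resolved against the file's own subdirectory
        return os.path.normpath(os.path.join(file_subdir, folder))
    # plain relative folder: appended to root_dir
    return os.path.join(root_dir or "", folder)

print(resolve_target("corpus/MS3", "notes", root_dir="corpus"))  # -> corpus/notes (POSIX)
print(resolve_target("corpus/MS3", "notes", folder="../notes"))  # -> corpus/notes (POSIX)
```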
- store_parsed_score_at_ix(ix, root_dir: Optional[str] = None, folder: str = '.', suffix: str = '', overwrite: bool = False, simulate=False) Optional[str] [source]#
Creates a MuseScore 3 file from the Score object at the given index.
- Parameters
ix –
folder –
suffix – Suffix to append to the original file name.
root_dir –
overwrite – Pass True to overwrite existing files.
simulate – Set to True if no files are to be written.
- Returns
Path of the stored file.
- update_score_metadata_from_tsv(view_name: Optional[str] = None, force: bool = False, choose: Literal['all', 'auto', 'ask'] = 'all', write_empty_values: bool = False, remove_unused_fields: bool = False, write_text_fields: bool = False) List[File] [source]#
Update metadata fields of parsed scores with the values from the corresponding row in metadata.tsv.
- Parameters
view_name –
force –
choose –
write_empty_values – If set to True, existing values are overwritten even if the new value is empty, in which case the field will be set to ‘’.
remove_unused_fields – If set to True, all non-default fields that are not among the columns of metadata.tsv (anymore) are removed.
write_text_fields – If set to True, ms3 will write updated values from the columns title_text, subtitle_text, composer_text, lyricist_text, and part_name_text into the score header.
- Returns
List of File objects of those scores of which the XML structure has been modified.
- update_tsvs_on_disk(facets: Union[Literal['measures', 'notes', 'rests', 'notes_and_rests', 'labels', 'expanded', 'form_labels', 'cadences', 'events', 'chords'], Collection[Literal['measures', 'notes', 'rests', 'notes_and_rests', 'labels', 'expanded', 'form_labels', 'cadences', 'events', 'chords']]] = 'tsv', view_name: Optional[str] = None, force: bool = False, choose: Literal['auto', 'ask'] = 'auto') List[str] [source]#
Update existing TSV files corresponding to one or several facets with information freshly extracted from a parsed score, but only if the contents are identical. Otherwise, the existing TSV file is not overwritten and the differences are displayed in a log warning. The purpose is to safely update the format of existing TSV files (for instance with respect to column order), making sure that the content doesn't change.
- Parameters
facets –
view_name –
force – By default, only TSV files that have already been parsed are updated. Set to True in order to force-parse for each facet one of the TSV files included in the given view, if necessary.
choose –
- Returns
List of paths that have been overwritten.
- get_dataframe(*args, **kwargs) None [source]#
Deprecated method. Replaced by get_parsed(), extract_facet(), and get_facet().
The View class#
- class ms3.view.View(view_name: Optional[str] = 'all', only_metadata_fnames: bool = False, include_convertible: bool = True, include_tsv: bool = True, exclude_review: bool = False, **logger_cfg)[source]#
Object storing regular expressions and filter lists, and keeping track of what has been filtered out.
- is_default(relax_for_cli: bool = False) bool [source]#
Checks includes and excludes that may influence the selection of fnames. Returns True if the settings do not filter out any fnames. Only if relax_for_cli is set to True are the filters include_convertible and exclude_review permitted, too.
- copy(new_name: Optional[str] = None) View [source]#
Returns a copy of this view, i.e., a new View object.
- update_config(view_name: Optional[str] = None, only_metadata_fnames: Optional[bool] = None, include_convertible: Optional[bool] = None, include_tsv: Optional[bool] = None, exclude_review: Optional[bool] = None, file_paths: Optional[Union[str, Collection[str]]] = None, file_re: Optional[str] = None, folder_re: Optional[str] = None, exclude_re: Optional[str] = None, folder_paths: Optional[Union[str, Collection[str]]] = None, **logger_cfg)[source]#
Update the configuration of the View. This is a shorthand for issuing several calls to include() and exclude() at once.
- Parameters
view_name – New name of the view.
only_metadata_fnames – Whether fnames that are not included in a metadata.tsv should be excluded.
include_convertible – Whether scores that need conversion via MuseScore before parsing should be included.
include_tsv – Whether TSV files should be included.
exclude_review – Whether files and folders whose names include 'review' should be excluded.
file_paths – The exact file names will be extracted and used as an exclusive filter, that is, all files that do not have one of these file names will be excluded. This applies regardless of any relative or absolute paths included in the argument.
file_re – Include only files whose file name contains a match for this regular expression.
folder_re – Include only files from folders whose name contains a match for this regular expression.
exclude_re – Exclude all files and folders whose name contains a match for this regular expression.
folder_paths – Include only files from these folders.
**logger_cfg –
Returns:
- check_token(category: Literal['corpora', 'folders', 'fnames', 'files', 'suffixes', 'facets', 'paths'], token: str) bool [source]#
Checks if a string pertaining to a certain category should be included in the view or not.
- check_file(file: File) Tuple[bool, str] [source]#
Check if an individual File passes all filters w.r.t. its subdirectories, file name and suffix.
- Parameters
file –
- Returns
False if the file is to be discarded from this view, together with the criterion based on which it is being excluded.
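The documented return contract — a pass/fail flag plus the criterion that excluded the file — can be illustrated with a simplified, self-contained name filter. This is not ms3's actual implementation; it only mimics the file_re/exclude_re checks:

```python
import re

def check_name(fname, file_re=None, exclude_re=None):
    """Simplified (passes, criterion) filter in the spirit of View.check_file().

    Returns (False, reason) when the name is excluded, (True, '') otherwise.
    """
    if exclude_re is not None and re.search(exclude_re, fname):
        return False, "exclude_re"
    if file_re is not None and not re.search(file_re, fname):
        return False, "file_re"
    return True, ""

print(check_name("K279-1.mscx", file_re=r"K\d+"))               # (True, '')
print(check_name("K279-1_reviewed.mscx", exclude_re="review"))  # (False, 'exclude_re')
```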
- filter_by_token(category: Literal['corpora', 'folders', 'fnames', 'files', 'suffixes', 'facets', 'paths'], tuples: Iterable[tuple]) Iterator[tuple] [source]#
Filters out those tuples whose token (first element) does not pass check_token(category, token).
- filtered_tokens(category: Literal['corpora', 'folders', 'fnames', 'files', 'suffixes', 'facets', 'paths'], tokens: Collection[str]) List[str] [source]#
Applies filter_by_token() to a collection of tokens.
- class ms3.view.DefaultView(view_name: Optional[str] = 'default', only_metadata_fnames: bool = True, include_convertible: bool = False, include_tsv: bool = True, exclude_review: bool = True, **logger_cfg)[source]#
- ms3.view.create_view_from_parameters(only_metadata_fnames: bool = True, include_convertible: bool = False, include_tsv: bool = True, exclude_review: bool = True, file_paths=None, file_re=None, folder_re=None, exclude_re=None, level=None) View [source]#
From the arguments of an __init__ method, create either a DefaultView or a custom view.
The Score class#
- class ms3.score.Score(musescore_file=None, match_regex=['dcml', 'form_labels'], read_only=False, labels_cfg={}, parser='bs4', ms=None, **logger_cfg)[source]#
Object representing a score.
- ABS_REGEX = '^\\(?[A-G|a-g](b*|#*).*?(/[A-G|a-g](b*|#*))?$'#
str
Class variable with a regular expression that recognizes absolute chord symbols in their decoded (string) form; they start with a note name.
- NASHVILLE_REGEX = '^(b*|#*)(\\d).*$'#
str
Class variable with a regular expression that recognizes labels representing a Nashville numeral, which MuseScore is able to encode.
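Since these class variables are plain regular expressions, they can be tried out directly with Python's re module. The patterns below are copied verbatim from the attributes documented above:

```python
import re

# Copied from the documented class variables Score.ABS_REGEX and Score.NASHVILLE_REGEX
ABS_REGEX = r"^\(?[A-G|a-g](b*|#*).*?(/[A-G|a-g](b*|#*))?$"
NASHVILLE_REGEX = r"^(b*|#*)(\d).*$"

assert re.match(ABS_REGEX, "Ab7")               # absolute chord: starts with a note name
assert re.match(ABS_REGEX, "C/G")               # slash chord with a bass note
assert re.match(ABS_REGEX, "viio7") is None     # Roman numeral, no note name
assert re.match(NASHVILLE_REGEX, "b3")          # Nashville numeral with flat
assert re.match(NASHVILLE_REGEX, "V7") is None  # letter, not a scale degree
```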
- RN_REGEX = '^$'#
str
Class variable with a regular expression for Roman numerals that momentarily matches nothing because ms3 tries interpreting Roman numerals as DCML harmony annotations.
- convertible_formats = ('cap', 'capx', 'midi', 'mid', 'musicxml', 'mxl', 'xml')#
tuple
Formats that have to be converted before parsing.
- parseable_formats = ('mscx', 'mscz', 'cap', 'capx', 'midi', 'mid', 'musicxml', 'mxl', 'xml')#
tuple
Formats that ms3 can parse.
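Together with convertible_formats, this tuple implies a simple decision rule: parse a file directly if its extension is parseable but not convertible, convert it first if it is convertible, and skip it otherwise. An illustrative, self-contained sketch (the tuples are copied from the class variables above; the helper itself is hypothetical):

```python
# Copied from the documented class variables
convertible_formats = ('cap', 'capx', 'midi', 'mid', 'musicxml', 'mxl', 'xml')
parseable_formats = ('mscx', 'mscz', 'cap', 'capx', 'midi', 'mid', 'musicxml', 'mxl', 'xml')

def handling(filename):
    """Classify a file by its extension: 'parse', 'convert', or 'skip'."""
    ext = filename.rsplit('.', 1)[-1].lower()
    if ext not in parseable_formats:
        return 'skip'
    return 'convert' if ext in convertible_formats else 'parse'

print(handling('sonata.mscx'))  # parse
print(handling('sonata.mxl'))   # convert
print(handling('sonata.pdf'))   # skip
```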
- read_only#
bool
, optional. Defaults to False, meaning that the parsing is slower and uses more memory in order to allow for manipulations of the score, such as adding and deleting labels. Set to True if you're only extracting information.
- full_paths#
dict
{KEY: {i: full_path}}
dictionary holding the full paths of all parsed MuseScore and TSV files, including file names. Handled internally by _handle_path().
- paths#
dict
{KEY: {i: file path}}
dictionary holding the paths of all parsed MuseScore and TSV files, excluding file names. Handled internally by _handle_path().
- files#
dict
{KEY: {i: file name with extension}}
dictionary holding the complete file name of each parsed file, including the extension. Handled internally by _handle_path().
- fnames#
dict
{KEY: {i: file name without extension}}
dictionary holding the file name of each parsed file, without its extension. Handled internally by _handle_path().
- fexts#
dict
{KEY: {i: file extension}}
dictionary holding the file extension of each parsed file. Handled internally by _handle_path().
- _detached_annotations#
dict
{(key, i): Annotations object}
dictionary for accessing all detached Annotations objects.
- _name2regex#
dict
Mapping names to their corresponding regex, e.g. ‘dcml’: utils.DCML_REGEX. Managed via the property name2regex.
- labels_cfg#
dict
Configuration dictionary to determine the output format of the Annotations objects contained in the current object, especially when calling Score.mscx.labels(). The default options correspond to the default parameters of Annotations.get_labels().
- parser#
{‘bs4’} Currently only one XML parser has been implemented which uses BeautifulSoup 4.
- review_report#
pandas.DataFrame
After calling color_non_chord_tones(), this DataFrame contains the expanded chord labels plus the six additional columns [‘n_colored’, ‘n_untouched’, ‘count_ratio’, ‘dur_colored’, ‘dur_untouched’, ‘dur_ratio’] representing the statistics of chord (untouched) vs. non-chord (colored) notes.
- comparison_report#
pandas.DataFrame
DataFrame showing the labels modified (‘new’) and added (‘old’) by compare_labels().
- property name2regex#
list or dict, optional. The order in which label types are to be inferred. Assigning a new value results in a call to infer_types(). Passing a {label type: regex} dictionary is a shortcut to update type regexes or to add new ones. The inference will take place in the order in which they appear in the dictionary. To reuse an existing regex while updating others, you can refer to it as None, e.g. {'dcml': None, 'my_own': r'^(PAC|HC)$'}.
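The ordered inference described above can be pictured with a plain-Python sketch (not ms3's implementation): each label is assigned the first type whose regex matches, in the dictionary's insertion order. The ‘nashville’ pattern is copied from the documented NASHVILLE_REGEX; ‘my_cadences’ is a hypothetical custom type.

```python
import re

# Hypothetical name -> regex mapping in inference order
name2regex = {
    "nashville": r"^(b*|#*)(\d).*$",   # copied from Score.NASHVILLE_REGEX
    "my_cadences": r"^(PAC|HC)$",      # hypothetical custom type
}

def infer_type(label, name2regex):
    """Return the first type whose regex matches, in insertion order."""
    for name, regex in name2regex.items():
        if re.match(regex, label):
            return name
    return None

print(infer_type("b7", name2regex))   # nashville
print(infer_type("PAC", name2regex))  # my_cadences
print(infer_type("xyz", name2regex))  # None
```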
- property has_detached_annotations#
bool
Is True as long as the score contains Annotations objects that are not attached to the MSCX object.
- attach_labels(key, staff=None, voice=None, harmony_layer=None, check_for_clashes=True, remove_detached=True)[source]#
Insert the detached labels key into this score's MSCX object.
- Parameters
key (str) – Key of the detached labels you want to insert into the score.
staff (int, optional) – By default, labels are added to staves as specified in the TSV or to -1 (lowest). Pass an integer to specify a staff.
voice (int, optional) – By default, labels are added to voices (notational layers) as specified in the TSV or to 1 (main voice). Pass an integer to specify a voice.
harmony_layer (int, optional) – By default, the labels are written to the layer specified as an integer in the column harmony_layer. Pass an integer to select a particular layer:
* 0 to attach them as absolute (‘guitar’) chords, meaning that when opened next time, MuseScore will split and encode those beginning with a note name (resulting in ms3-internal harmony_layer 3).
* 1 to write the labels into the staff's layer for Roman Numeral Analysis.
* 2 to have MuseScore interpret them as Nashville Numbers.
check_for_clashes (bool, optional) – Defaults to True, meaning that the positions where the labels will be inserted are checked for existing labels.
remove_detached (bool, optional) – By default, the detached Annotations object is removed after successfully attaching it. Pass False to have it remain in detached state.
- Returns
- change_labels_cfg(labels_cfg={}, staff=None, voice=None, harmony_layer=None, positioning=None, decode=None, column_name=None, color_format=None)[source]#
Update Score.labels_cfg and MSCX.labels_cfg.
- Parameters
labels_cfg (dict) – An entire configuration dictionary or, to change only particular options, choose from:
staff – Arguments as they will be passed to get_labels()
voice – Arguments as they will be passed to get_labels()
harmony_layer – Arguments as they will be passed to get_labels()
positioning – Arguments as they will be passed to get_labels()
decode – Arguments as they will be passed to get_labels()
- check_labels(keys='annotations', regex=None, regex_name='dcml', **kwargs)[source]#
Tries to match the labels keys against the given regex or the one registered under regex_name. Returns the non-matching labels.
- Parameters
keys (str or Collection, optional) – The key(s) of the Annotations objects you want to check. Defaults to ‘annotations’, the attached labels.
regex (str, optional) – Pass a regular expression against which to check the labels if you don't want to use the one of an existing regex_name, or in order to register a new one on the fly by passing the new name as regex_name.
regex_name (str, optional) – To use the regular expression of a registered type, pass its name; defaults to ‘dcml’. Pass a new name and a regex to register a new label type on the fly.
kwargs – Parameters passed to check_labels().
- Returns
Labels not matching the regex.
- Return type
- color_non_chord_tones(color_name: str = 'red') Optional[DataFrame] [source]#
Iterates through the attached labels, tries to interpret them as DCML harmony labels, colors the notes in the parsed score that are not expressed by the respective label for a score segment, and stores a report under review_report.
- Parameters
color_name – Name of the color that the non-chord tones should get; defaults to ‘red’. The name can be a CSS color or a MuseScore color (see utils.MS3_COLORS).
- Returns
A coloring report which is the original df with the appended columns ‘n_colored’, ‘n_untouched’, ‘count_ratio’, ‘dur_colored’, ‘dur_untouched’, ‘dur_ratio’. They contain the counts and durations of the colored vs. untouched notes as well as the ratio of each pair. Note that the report does not take into account notes that reach into a segment, nor does it correct the duration of notes that reach into the subsequent segment.
- compare_labels(key: str = 'detached', new_color: str = 'ms3_darkgreen', old_color: str = 'ms3_darkred', detached_is_newer: bool = False, add_to_rna: bool = True) Tuple[int, int] [source]#
Compare the detached labels key to the ones attached to the Score to create a diff. By default, the attached labels are considered the reviewed version: labels that have changed or been added in comparison to the detached labels are colored green, whereas the previous versions of changed labels are attached to the Score in red, just like any deleted label.
- Parameters
key – Key of the detached labels you want to compare to the ones in the score.
new_color – Color given to labels that are new or have changed; identical labels remain unchanged. Colors can be CSS colors or MuseScore colors (see utils.MS3_COLORS).
old_color – Color given to the previous versions of changed labels and to deleted labels. Colors can be CSS colors or MuseScore colors (see utils.MS3_COLORS).
detached_is_newer – Pass True if the detached labels are to be added with new_color whereas the attached changed labels will turn old_color, as opposed to the default.
add_to_rna – By default, new labels are attached to the Roman Numeral layer. Pass False to attach them to the chord layer instead.
- Returns
Number of attached labels that were not present in the old version and whose color has been changed. Number of added labels that are not present in the current version any more and which have been added as a consequence.
- detach_labels(key, staff=None, voice=None, harmony_layer=None, delete=True, inverse=False, regex=None)[source]#
Detach all annotation labels from this score's MSCX object, or just a selection of them, without taking labels_cfg into account (the labels are not decoded). The extracted labels are stored as a new Annotations object that is accessible via Score.{key}. By default, delete is set to True, meaning that if you call store_scores() afterwards, the created MuseScore file will not contain the detached labels.
- Parameters
key (str) – Specify a new key for accessing the detached set of annotations. The string needs to be usable as an identifier, e.g., not start with a number and not contain special characters. In return you may use it as a property: for example, passing 'chords' lets you access the detached labels as Score.chords. The key ‘annotations’ is reserved for all annotations attached to the score.
staff (int, optional) – Pass a staff ID to select only labels from this staff. The upper staff has ID 1.
voice ({1, 2, 3, 4}, optional) – Can be used to select only labels from one of the four notational layers. Layer 1 is MuseScore's main, ‘upper voice’ layer, coloured in blue.
harmony_layer (int or str, optional) – Select one of the harmony layers {0, 1, 2, 3} to restrict the selection to it.
delete (bool, optional) – By default, the labels are removed from the XML structure in MSCX. Pass False if you want them to remain. This could be useful if you only want to extract a subset of the annotations for storing them separately but without removing the labels from the score.
- get_infer_regex()[source]#
- Returns
Mapping of label types to the corresponding regular expressions in the order in which they are currently set to be inferred.
- Return type
- get_labels(key: Optional[str] = None, interval_index: bool = False, unfold: bool = False) Optional[DataFrame] [source]#
DataFrame representing all Labels, i.e., all <Harmony> tags, of the score or of another set of annotations. Corresponds to calling get_labels() on the selected object (by default, the one representing labels attached to the score) with the current _labels_cfg. Comes with the columns quarterbeats, duration_qb, mc, mn, mc_onset, mn_onset, timesig, staff, voice, volta, harmony_layer, label, offset_x, offset_y, regex_match.
- Parameters
key –
interval_index – Pass True to replace the default RangeIndex by an IntervalIndex.
- Returns
DataFrame representing all Labels, i.e., all <Harmony> tags in the score.
- new_type(name, regex, description='', infer=True)[source]#
Declare a custom label type. A type consists of a name, a regular expression and, optionally, a description.
- Parameters
regex (str) – Regular expression that matches all labels of the custom type.
description (str, optional) – Human-readable description that appears when calling the property Score.types.
infer (bool, optional) – By default, the labels of all Annotations objects are matched against the new type. Pass False to not change any label's type.
- load_annotations(tsv_path: Optional[str] = None, anno_obj: Optional[Annotations] = None, df: Optional[DataFrame] = None, key: str = 'detached', infer: bool = True, **cols) None [source]#
Attach an Annotations object to the score and make it available as Score.{key}. It can be an existing object or one newly created from the TSV file tsv_path.
- Parameters
tsv_path – If you want to create a new Annotations object from a TSV file, pass its path.
anno_obj – Instead, you can pass an existing object.
df – Or you can automatically create one from a given DataFrame.
key – Specify a new key for accessing the set of annotations. The string needs to be usable as an identifier, e.g., not start with a number and not contain special characters. In return you may use it as a property: for example, passing 'chords' lets you access the Annotations as Score.chords. The key ‘annotations’ is reserved for all annotations attached to the score.
infer – By default, the label types are inferred in the currently configured order (see name2regex). Pass False to neither add nor change any label types.
**cols – If the columns in the specified TSV file diverge from the standard column names, pass them as standard_name='custom name' keywords.
- store_annotations(key='annotations', tsv_path=None, **kwargs)[source]#
Save a set of annotations as a TSV file. While store_list stores attached labels only, this method can also store detached labels by passing a key.
- Parameters
key (str, optional) – Key of the Annotations object which you want to output as a TSV file. By default, the annotations attached to the score (key='annotations') are stored.
tsv_path (str, optional) – Path of the newly created TSV file, including the file name. By default, the TSV file is stored next to the parsed score file.
kwargs – Additional keyword arguments will be passed to the function pandas.DataFrame.to_csv() to customise the format of the created file (e.g., to change the separator to commas instead of tabs, you would pass sep=',').
- store_score(filepath)[source]#
Store the current MSCX object attached to this score as an uncompressed MuseScore file. Just a shortcut for Score.mscx.store_score().
- Parameters
filepath (str) – Path of the newly created MuseScore file, including the file name ending on '.mscx'. Compressed files ('.mscz') are not supported.
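As a sketch (not part of ms3), a small guard can validate the target path before calling the method, since only uncompressed '.mscx' targets are accepted:

```python
import os


def check_mscx_path(filepath: str) -> None:
    """Guard sketch: store_score() writes uncompressed '.mscx' files only."""
    ext = os.path.splitext(filepath)[1]
    if ext != ".mscx":
        raise ValueError(f"Expected a path ending on '.mscx', got {ext!r}")


check_mscx_path("exports/my_piece.mscx")  # hypothetical target path; passes
```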
- _handle_path(path, key=None)[source]#
Puts the path into
paths, files, fnames, fexts
dicts with the given key.
- parse_mscx(musescore_file=None, read_only=None, parser=None, labels_cfg={})[source]#
This method is called by __init__() to parse the score. It checks the file extension and, in the case of a compressed MuseScore file (.mscz), generates a temporary uncompressed file which is removed after the parsing process. Essentially, parsing means to initiate an MSCX object and make it available as Score.mscx and, if the score includes annotations, to initiate an Annotations object that can be accessed as Score.annotations. The method doesn't systematically clean up data from a hypothetical previous parse.
- Parameters
musescore_file (str, optional) – Path to the MuseScore file to be parsed.
read_only (bool, optional) – Defaults to False, meaning that the parsing is slower and uses more memory in order to allow for manipulations of the score, such as adding and deleting labels. Set to True if you're only extracting information.
parser ('bs4', optional) – The only XML parser currently implemented is BeautifulSoup 4.
labels_cfg (dict, optional) – Store a configuration dictionary to determine the output format of the Annotations object representing the currently attached annotations. See MSCX.labels_cfg.
- output_mscx(**kwargs) None [source]#
Deprecated method. Replaced by
store_score()
.
The MSCX class#
This class defines the user interface for accessing score information via Score.mscx
.
It consists mainly of shortcuts for interacting with the parser in use, currently BeautifulSoup 4 exclusively.
- class ms3.score.MSCX(mscx_src, read_only=False, parser='bs4', labels_cfg={}, parent_score=None, **logger_cfg)[source]#
Object for interacting with the XML structure of a MuseScore 3 file. Is usually attached to a Score object and exposed as Score.mscx. An object is only created if a score was successfully parsed.
- changed#
bool
Switches to True as soon as the original XML structure is changed. Does not automatically switch back to False.
- read_only#
bool, optional
Shortcut for MSCX.parsed.read_only. Defaults to False, meaning that the parsing is slower and uses more memory in order to allow for manipulations of the score, such as adding and deleting labels. Set to True if you're only extracting information.
- parser#
{‘bs4’} The currently used parser.
- labels_cfg#
dict
Configuration dictionary to determine the output format of the Annotations object representing the labels that are attached to a score (stored as _annotations). The options correspond to the parameters of Annotations.get_labels().
- cadences(interval_index: bool = False, unfold: bool = False) Optional[DataFrame] [source]#
pandas.DataFrame
DataFrame representing all cadence annotations in the score.
- chords(mode: Literal['auto', 'strict'] = 'auto', interval_index: bool = False, unfold: bool = False) Optional[DataFrame] [source]#
DataFrame of Chords representing all <Chord> tags contained in the MuseScore file (all <note> tags come within one) and attached score information and performance marks, e.g. lyrics, dynamics, articulations, slurs (see the explanation for the
mode
parameter for more details). Comes with the columns quarterbeats, duration_qb, mc, mn, mc_onset, mn_onset, timesig, staff, voice, duration, gracenote, tremolo, nominal_duration, scalar, volta, chord_id, dynamics, articulation, staff_text, slur, Ottava:8va, Ottava:8vb, pedal, TextLine, decrescendo_hairpin, diminuendo_line, crescendo_line, crescendo_hairpin, tempo, qpm, lyrics:1, Ottava:15mb- Parameters
mode – Defaults to ‘auto’, meaning that additional performance markers available in the score are to be included, namely lyrics, dynamics, fermatas, articulations, slurs, staff_text, system_text, tempo, and spanners (e.g. slurs, 8va lines, pedal lines). This results in NaN values in the column ‘chord_id’ for those markers that are not part of a <Chord> tag, e.g. <Dynamic>, <StaffText>, or <Tempo>. To prevent that, pass ‘strict’, meaning that only <Chords> are included, i.e. the column ‘chord_id’ will have no empty values.
interval_index – Pass True to replace the default
RangeIndex
by anIntervalIndex
.
- Returns
DataFrame of Chords representing all <Chord> tags contained in the MuseScore file.
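The difference between the two modes can be illustrated with a miniature pandas stand-in for the chords table (the values are made up): in mode='auto', rows stemming from markers outside <Chord> tags carry no chord_id, and dropping them approximates mode='strict'.

```python
import pandas as pd

# Miniature stand-in for a chords table produced with mode='auto':
# the last row stems from a <Dynamic> tag, so it carries no chord_id.
chords = pd.DataFrame({
    "chord_id": [0, 1, pd.NA],
    "dynamics": [pd.NA, pd.NA, "p"],
})

# Dropping rows without a chord_id roughly yields what mode='strict' keeps:
strict_like = chords.dropna(subset=["chord_id"])
```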
- events(interval_index: bool = False, unfold: bool = False) Optional[DataFrame] [source]#
DataFrame representing a raw skeleton of the score's XML structure, containing all score events encoded in it. It is the original tabular representation of the MuseScore file's source code from which all other tables, except
measures
are generated.- Parameters
interval_index – Pass True to replace the default
RangeIndex
by anIntervalIndex
.- Returns
DataFrame containing the original tabular representation of all score events encoded in the MuseScore file.
- expanded(interval_index: bool = False, unfold: bool = False) Optional[DataFrame] [source]#
DataFrame representing Expanded labels, i.e., all annotations encoded in <Harmony> tags which could be matched against one of the registered regular expressions and split into feature columns. Currently this method is hard-coded to return expanded DCML harmony labels only but it takes into account the current
_labels_cfg
. Comes with the columns quarterbeats, duration_qb, mc, mn, mc_onset, mn_onset, timesig, staff, voice, volta, label, alt_label, offset_x, offset_y, regex_match, globalkey, localkey, pedal, chord, numeral, form, figbass, changes, relativeroot, cadence, phraseend, chord_type, globalkey_is_minor, localkey_is_minor, chord_tones, added_tones, root, bass_note- Parameters
interval_index – Pass True to replace the default
RangeIndex
by anIntervalIndex
.- Returns
DataFrame representing all Labels, i.e., all <Harmony> tags in the score.
- property has_annotations#
bool
Shortcut for MSCX.parsed.has_annotations. Is True as long as at least one label is attached to the current XML.
- property n_form_labels#
int
Shortcut for MSCX.parsed.n_form_labels. The number of StaffTexts that seem to constitute form labels.
- form_labels(detection_regex: Optional[str] = None, exclude_harmony_layer: bool = False, interval_index: bool = False, expand: bool = True, unfold: bool = False) Optional[DataFrame] [source]#
DataFrame representing form labels (or other) that have been encoded as <StaffText>s rather than in the <Harmony> layer. This function essentially filters all StaffTexts matching the
detection_regex
and adds the standard position columns.- Parameters
detection_regex – By default, detects all labels starting with one or two digits followed by a colon (see
the regex
). Pass another regex to retrieve only StaffTexts matching this one.exclude_harmony_layer – By default, form labels are detected even if they have been encoded as Harmony labels (rather than as StaffText). Pass True in order to retrieve only StaffText form labels.
interval_index – Pass True to replace the default
RangeIndex
by anIntervalIndex
.
- Returns
DataFrame containing all StaffTexts matching the
detection_regex
- labels(interval_index: bool = False, unfold: bool = False) Optional[DataFrame] [source]#
DataFrame representing all Labels, i.e., all <Harmony> tags in the score, as returned by calling get_labels() on the object at _annotations with the current _labels_cfg. Comes with the columns quarterbeats, duration_qb, mc, mn, mc_onset, mn_onset, timesig, staff, voice, volta, harmony_layer, label, offset_x, offset_y, regex_match
interval_index – Pass True to replace the default
RangeIndex
by anIntervalIndex
.- Returns
DataFrame representing all Labels, i.e., all <Harmony> tags in the score.
- measures(interval_index: bool = False, unfold: bool = False) Optional[DataFrame] [source]#
DataFrame representing the Measures of the MuseScore file (which can be incomplete measures). Comes with the columns mc, mn, quarterbeats, duration_qb, keysig, timesig, act_dur, mc_offset, volta, numbering_offset, dont_count, barline, breaks, repeats, next
- Parameters
interval_index – Pass True to replace the default
RangeIndex
by anIntervalIndex
.- Returns
DataFrame representing the measures of the MuseScore file (which can be incomplete measures).
- offset_dict(all_endings: bool = False, unfold: bool = False, negative_anacrusis: bool = False) dict [source]#
{mc -> offset} dictionary measuring each MC’s distance from the piece’s beginning (0) in quarter notes.
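A sketch of how such a dictionary is typically used (the offsets below are made up, assuming a piece in 4/4): since onsets are expressed as fractions of a whole note, multiplying by 4 converts them to quarterbeats.

```python
from fractions import Fraction

# Made-up offset dict: MC 1 starts at quarterbeat 0, MC 2 at 4, MC 3 at 8.
offsets = {1: Fraction(0), 2: Fraction(4), 3: Fraction(8)}

# An event's quarterbeat position combines the MC offset with its mc_onset,
# which is a fraction of a whole note (hence the factor 4):
mc, mc_onset = 3, Fraction(1, 2)  # halfway through MC 3
quarterbeats = offsets[mc] + mc_onset * 4
```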
- property metadata#
dict
Shortcut for MSCX.parsed.metadata. Metadata from and about the MuseScore file.
- notes(interval_index: bool = False, unfold: bool = False) Optional[DataFrame] [source]#
DataFrame representing the Notes of the MuseScore file. Comes with the columns quarterbeats, duration_qb, mc, mn, mc_onset, mn_onset, timesig, staff, voice, duration, gracenote, tremolo, nominal_duration, scalar, tied, tpc, midi, volta, chord_id
- Parameters
interval_index – Pass True to replace the default
RangeIndex
by anIntervalIndex
.- Returns
DataFrame representing the Notes of the MuseScore file.
- notes_and_rests(interval_index: bool = False, unfold: bool = False) Optional[DataFrame] [source]#
DataFrame representing the Notes and Rests of the MuseScore file. Comes with the columns quarterbeats, duration_qb, mc, mn, mc_onset, mn_onset, timesig, staff, voice, duration, gracenote, tremolo, nominal_duration, scalar, tied, tpc, midi, volta, chord_id
- Parameters
interval_index – Pass True to replace the default
RangeIndex
by anIntervalIndex
.- Returns
DataFrame representing the Notes and Rests of the MuseScore file.
- property parsed: _MSCX_bs4#
_MSCX_bs4
Standard way of accessing the object exposed by the current parser. MSCX uses this object's interface for requesting manipulations of and information from the source XML.
- rests(interval_index: bool = False, unfold: bool = False) Optional[DataFrame] [source]#
DataFrame representing the Rests of the MuseScore file. Comes with the columns quarterbeats, duration_qb, mc, mn, mc_onset, mn_onset, timesig, staff, voice, duration, nominal_duration, scalar, volta
- Parameters
interval_index – Pass True to replace the default
RangeIndex
by anIntervalIndex
.- Returns
DataFrame representing the Rests of the MuseScore file.
- property staff_ids#
list
ofint
The staff IDs contained in the score, usually just a list of increasing numbers starting at 1.
- property style#
Style
Can be used like a dictionary to change the information within the score’s <Style> tag.
- add_labels(annotations_object)[source]#
Receives the labels from an
Annotations
object and adds them to the XML structure representing the MuseScore file that might be written to a file afterwards.- Parameters
annotations_object (
Annotations
) – Object of labels to be added.- Returns
Number of actually added labels.
- Return type
- change_label_color(mc, mc_onset, staff, voice, label, color_name=None, color_html=None, color_r=None, color_g=None, color_b=None, color_a=None)[source]#
Shortcut for MSCX.parsed.change_label_color()
- Parameters
mc (int) – Measure count of the label.
mc_onset (fractions.Fraction) – Onset position to which the label is attached.
staff (int) – Staff to which the label is attached.
voice (int) – Notational layer to which the label is attached.
label (str) – (Decoded) label.
color_name (str, optional) – One of two ways of specifying the color.
color_html (str, optional) – One of two ways of specifying the color.
color_r (int or str, optional) – To specify an RGB color instead, pass at least the first three. color_a (alpha = opacity) defaults to 255.
color_g (int or str, optional) – To specify an RGB color instead, pass at least the first three. color_a (alpha = opacity) defaults to 255.
color_b (int or str, optional) – To specify an RGB color instead, pass at least the first three. color_a (alpha = opacity) defaults to 255.
color_a (int or str, optional) – To specify an RGB color instead, pass at least the first three. color_a (alpha = opacity) defaults to 255.
- change_labels_cfg(labels_cfg={}, staff=None, voice=None, harmony_layer=None, positioning=None, decode=None, column_name=None, color_format=None)[source]#
Update
MSCX.labels_cfg
.- Parameters
labels_cfg (dict) – Using an entire dictionary or, to change only particular options, choose from:
staff – Arguments as they will be passed to get_labels()
voice – Arguments as they will be passed to get_labels()
harmony_layer – Arguments as they will be passed to get_labels()
positioning – Arguments as they will be passed to get_labels()
decode – Arguments as they will be passed to get_labels()
- color_non_chord_tones(df: DataFrame, color_name: str = 'red', chord_tone_cols: Collection[str] = ['chord_tones', 'added_tones'], color_nan: bool = True) DataFrame [source]#
Iterates backwards through the rows of the given DataFrame, interpreting each row as a score segment, and colors all notes that do not correspond to one of the tonal pitch classes (TPC) indicated in one of the tuples contained in the
chord_tone_cols
. The columns ‘mc’ and ‘mc_onset’ are taken to indicate each score segment’s start, which reaches to the subsequent one (the last segment reaching to the end of the score). Only notes whose onsets lie within the respective segment are colored, meaning that those whose durations reach into a segment are not taken into account.- Parameters
df – A DataFrame with the columns [‘mc’, ‘mc_onset’] +
chord_tone_cols
color_name – Name of the color that the non-chord tones should get, defaults to 'red'. The name can be a CSS color or a MuseScore color (see
utils.MS3_COLORS
).chord_tone_cols – Names of the columns containing tuples of chord tones, expressed as TPC.
color_nan – By default, if all of the
chord_tone_cols
contain a NaN value, all notes in the segment will be colored. Pass False to add the segment to the previous one instead.
- Returns
A coloring report which is the original
df
with the appended columns ‘n_colored’, ‘n_untouched’, ‘count_ratio’, ‘dur_colored’, ‘dur_untouched’, ‘dur_ratio’. They contain the counts and durations of the colored vs. untouched notes as well as the ratio of each pair. Note that the report does not take into account notes that reach into a segment, nor does it correct the duration of notes that reach into the subsequent segment.
- delete_labels(df)[source]#
Delete a set of labels from the current XML.
- Parameters
df (
pandas.DataFrame
) – A DataFrame with the columns [‘mc’, ‘mc_onset’, ‘staff’, ‘voice’]
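A sketch of the expected input, with made-up positions:

```python
from fractions import Fraction

import pandas as pd

# Hypothetical positions of labels to be deleted.
to_delete = pd.DataFrame({
    "mc": [3, 7],
    "mc_onset": [Fraction(0), Fraction(1, 2)],
    "staff": [1, 1],
    "voice": [1, 1],
})

# score.mscx.delete_labels(to_delete) would remove the labels found at
# exactly these positions.
```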
- replace_labels(annotations_object)[source]#
- Parameters
annotations_object (
Annotations
) – Object of labels to be added.
- get_chords(staff=None, voice=None, mode='auto', lyrics=False, staff_text=False, dynamics=False, articulation=False, spanners=False, **kwargs)[source]#
Retrieve a customized chord list, e.g. one including fewer of the processed features, or additional unprocessed ones, compared to the standard chord list.
- Parameters
staff (
int
) – Get information from a particular staff only (1 = upper staff)voice (
int
) – Get information from a particular voice only (1 = only the first layer of every staff)mode ({'auto', 'all', 'strict'}, optional) –
‘auto’ (default), meaning that those aspects are automatically included that occur in the score; the resulting DataFrame has no empty columns except for those parameters that are set to True.
’all’: Columns for all aspects are created, even if they don’t occur in the score (e.g. lyrics).
’strict’: Create columns for exactly those parameters that are set to True, regardless of which aspects occur in the score.
lyrics (
bool
, optional) – Include lyrics.staff_text (
bool
, optional) – Include staff text such as tempo markings.dynamics (
bool
, optional) – Include dynamic markings such as f or p.articulation (
bool
, optional) – Include articulation such as arpeggios.spanners (
bool
, optional) – Include spanners such as slurs, 8va lines, pedal lines etc.**kwargs (
bool
, optional) – Set a particular keyword to True in order to include all columns from the _events DataFrame whose names include that keyword. Column names include the tag names from the MSCX source code.
- Returns
DataFrame representing all <Chord> tags in the score with the selected features.
- Return type
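The **kwargs mechanism can be sketched with pandas (the column names below are invented but follow the tag-name pattern): setting a keyword to True roughly selects the _events columns whose names contain it.

```python
import pandas as pd

# Stand-in for the raw _events table; the column names are made up but
# follow the pattern of MSCX source-code tag names.
events = pd.DataFrame(columns=["mc", "Tempo/tempo", "Tempo/text", "Dynamic/velocity"])

# get_chords(tempo=True) roughly amounts to selecting the columns whose
# names include that keyword:
tempo_cols = [col for col in events.columns if "tempo" in col.lower()]
```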
- get_raw_labels()[source]#
Shortcut for
MSCX.parsed.get_raw_labels()
. Retrieve a “raw” list of labels, meaning that label types reflect only those defined within <Harmony> tags which can be 1 (MuseScore’s Roman Numeral display), 2 (Nashville) or undefined (in the case of ‘normal’ chord labels, defaulting to 0).- Returns
DataFrame with raw label features (i.e. as encoded in XML)
- Return type
- infer_mc(mn, mn_onset=0, volta=None)[source]#
Shortcut for MSCX.parsed.infer_mc(). Tries to convert a (mn, mn_onset) into a (mc, mc_onset) tuple on the basis of this MuseScore file. In other words, a human-readable score position such as "measure number 32b (i.e., a second ending), beat 3" needs to be converted to (32, 1/2, 2) if "beat" has length 1/4, or, if the meter is, say, 9/8 and "beat" has a length of 3/8, to (32, 6/8, 2). The resulting (mc, mc_onset) tuples are required for attaching a label to a score. This is only necessary for labels that were not originally extracted by ms3.
- Parameters
mn (
int
orstr
) – Measure number as in a reference print edition.mn_onset (
fractions.Fraction
, optional) – Distance of the requested position from beat 1 of the complete measure (MN), expressed as fraction of a whole note. Defaults to 0, i.e. the position of beat 1.volta (
int
, optional) – In the case of first and second endings, which bear the same measure number, an MN might have to be disambiguated by passing 1 for the first ending, 2 for the second, and so on. Alternatively, the MN can be disambiguated traditionally by passing it as a string with a letter attached. In other words, infer_mc(mn=32, volta=1) is equivalent to infer_mc(mn='32a').
- Returns
int
– Measure count (MC), denoting particular <Measure> tags in the score.
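A sketch of the conversion described above, assuming 4/4 meter (the mn, beat, and volta values are made up):

```python
from fractions import Fraction

# "Measure 32b, beat 3" in 4/4: one beat lasts 1/4 of a whole note,
# so beat 3 lies 2 * 1/4 = 1/2 after beat 1.
beat, beat_length = 3, Fraction(1, 4)
mn_onset = (beat - 1) * beat_length
mn, volta = 32, 2  # the 'b' ending corresponds to volta=2

# score.mscx.infer_mc(mn, mn_onset, volta) would then return the matching MC.
```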
The Annotations class#
- class ms3.annotations.Annotations(tsv_path=None, df=None, cols={}, index_col=None, sep='\t', mscx_obj=None, infer_types=None, read_only=False, **logger_cfg)[source]#
Class for storing, converting and manipulating annotation labels.
- property harmony_layer_counts#
Returns the counts of the harmony_layers as dict.
- get_labels(staff=None, voice=None, harmony_layer=None, positioning=False, decode=True, drop=False, inverse=False, column_name=None, color_format=None, regex=None)[source]#
Returns a DataFrame of annotation labels.
- Parameters
staff (
int
, optional) – Select harmonies from a given staff only. Pass staff=1 for the upper staff.harmony_layer ({0, 1, 2, 3, 'dcml', ...}, optional) –
- If MuseScore’s harmony feature has been used, you can filter harmony types by passing:
0 for unrecognized strings
1 for Roman Numeral Analysis
2 for Nashville Numbers
3 for encoded absolute chords
‘dcml’ for labels from the DCML harmonic annotation standard
… self-defined types that have been added to self.regex_dict through the use of self.infer_types()
positioning (
bool
, optional) – Set to True if you want to include information about how labels have been manually positioned.decode (
bool
, optional) – Set to False if you want to keep harmony_layer 0, 2, and 3 labels in their original form as encoded by MuseScore (e.g., with root and bass as TPC (tonal pitch class) where C = 14 for layer 0).drop (
bool
, optional) – Set to True to delete the returned labels from this object.column_name (
str
, optional) – Can be used to rename the columns holding the labels.color_format ({'html', 'rgb', 'rgba', 'name', None}) – If label colors are encoded, determine how they are displayed.
- expand_dcml(drop_others=True, warn_about_others=True, drop_empty_cols=False, chord_tones=True, relative_to_global=False, absolute=False, all_in_c=False, **kwargs)[source]#
Expands all labels where the regex_match has been inferred as ‘dcml’ and stores the DataFrame in self._expanded.
- Parameters
drop_others (
bool
, optional) – Set to False if you want to keep labels in the expanded DataFrame whose regex_match is not ‘dcml’.
warn_about_others (bool, optional) – Set to False to suppress warnings about labels whose regex_match is not ‘dcml’. Is automatically set to False if drop_others is set to False.
bool
, optional) – Return without unused columnschord_tones (
bool
, optional) – Pass True if you want to add four columns that contain information about each label’s chord, added, root, and bass tones. The pitches are expressed as intervals relative to the respective chord’s local key or, if relative_to_global=True, to the global key. The intervals are represented as integers standing for stacks of fifths over the tonic, such that 0 = tonic, 1 = dominant, -1 = subdominant, 2 = supertonic, etc.
relative_to_global (
bool
, optional) – Pass True if you want all labels expressed with respect to the global key. This levels and eliminates the features localkey and relativeroot.absolute (
bool
, optional) – Pass True if you want to transpose the relative chord_tones to the global key, which makes them absolute so they can be expressed as actual note names. This implies prior conversion of the chord_tones (but not of the labels) to the global tonic.all_in_c (
bool
, optional) – Pass True to transpose chord_tones to C major/minor. This performs the same transposition of chord tones as relative_to_global but without transposing the labels, too. This option clashes with absolute=True.kwargs – Additional arguments are passed to
get_labels()
to define the original representation.
- Returns
Expanded DCML labels
- Return type
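A minimal sketch of reading the fifths notation, assuming the local tonic C (the helper below is illustrative, not part of ms3):

```python
# Chord tones come as stacks of fifths over the tonic: 0 = tonic,
# 1 = dominant, -1 = subdominant, 2 = supertonic, etc. With tonic C,
# the fifths -1..5 map onto the natural notes:
LINE_OF_FIFTHS = ["F", "C", "G", "D", "A", "E", "B"]


def fifth_to_name(fifths: int) -> str:
    """Translate a stack-of-fifths interval to a note name over tonic C."""
    return LINE_OF_FIFTHS[fifths + 1]


# A major triad over the tonic appears as the chord tones (0, 1, 4):
triad = [fifth_to_name(f) for f in (0, 1, 4)]
```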
The BeautifulSoup parser#
- class ms3.bs4_parser._MSCX_bs4(mscx_src, read_only=False, logger_cfg={})[source]#
This sister class implements
MSCX
’s methods for a score parsed with beautifulsoup4.- measure_nodes#
{staff -> {MC -> tag} }
- tags#
{MC -> {staff -> {voice -> tag} } }
- staff2drum_map: Dict[int, pd.DataFrame]#
For each staff that is to be treated as a drumset score, keep a mapping from MIDI pitch (DataFrame index) to note and instrument features. The columns typically include [‘head’, ‘line’, ‘voice’, ‘name’, ‘stem’, ‘shortcut’]. When creating note tables, the ‘name’ column will be populated with the names here rather than note names.
- parse_mscx() None [source]#
Load the XML structure from the score in self.mscx_src and store references to staves and measures.
- parse_measures()[source]#
Converts the score into the three DataFrames self._measures, self._events, and self._notes.
- _make_measure_list(sections=True, secure=True, reset_index=True)[source]#
Regenerate the measure list from the parsed score with advanced options.
- chords(mode: Literal['auto', 'strict'] = 'auto', interval_index: bool = False, unfold: bool = False) Optional[DataFrame] [source]#
DataFrame of Chords representing all <Chord> tags contained in the MuseScore file (all <note> tags come within one) and attached score information and performance marks, e.g. lyrics, dynamics, articulations, slurs (see the explanation for the
mode
parameter for more details). Comes with the columns quarterbeats, duration_qb, mc, mn, mc_onset, mn_onset, timesig, staff, voice, duration, gracenote, tremolo, nominal_duration, scalar, volta, chord_id, dynamics, articulation, staff_text, slur, Ottava:8va, Ottava:8vb, pedal, TextLine, decrescendo_hairpin, diminuendo_line, crescendo_line, crescendo_hairpin, tempo, qpm, lyrics:1, Ottava:15mb- Parameters
mode – Defaults to ‘auto’, meaning that additional performance markers available in the score are to be included, namely lyrics, dynamics, fermatas, articulations, slurs, staff_text, system_text, tempo, and spanners (e.g. slurs, 8va lines, pedal lines). This results in NaN values in the column ‘chord_id’ for those markers that are not part of a <Chord> tag, e.g. <Dynamic>, <StaffText>, or <Tempo>. To prevent that, pass ‘strict’, meaning that only <Chords> are included, i.e. the column ‘chord_id’ will have no empty values.
interval_index – Pass True to replace the default
RangeIndex
by anIntervalIndex
.
- Returns
DataFrame of Chords representing all <Chord> tags contained in the MuseScore file.
- cl(recompute: bool = False) DataFrame [source]#
Get the raw Chords without adding quarterbeat columns.
- events(interval_index: bool = False, unfold: bool = False) Optional[DataFrame] [source]#
DataFrame representing a raw skeleton of the score's XML structure, containing all score events encoded in it. It is the original tabular representation of the MuseScore file's source code from which all other tables, except
measures
are generated.- Parameters
interval_index – Pass True to replace the default
RangeIndex
by anIntervalIndex
.- Returns
DataFrame containing the original tabular representation of all score events encoded in the MuseScore file.
- form_labels(detection_regex: Optional[str] = None, exclude_harmony_layer: bool = False, interval_index: bool = False, unfold: bool = False) Optional[DataFrame] [source]#
DataFrame representing form labels (or other) that have been encoded as <StaffText>s rather than in the <Harmony> layer. This function essentially filters all StaffTexts matching the
detection_regex
and adds the standard position columns.- Parameters
detection_regex – By default, detects all labels starting with one or two digits followed by a colon (see
the regex
). Pass another regex to retrieve only StaffTexts matching this one.exclude_harmony_layer – By default, form labels are detected even if they have been encoded as Harmony labels (rather than as StaffText). Pass True in order to retrieve only StaffText form labels.
interval_index – Pass True to replace the default
RangeIndex
by anIntervalIndex
.
- Returns
DataFrame containing all StaffTexts matching the
detection_regex
- fl(detection_regex: Optional[str] = None, exclude_harmony_layer=False) DataFrame [source]#
Get the raw Form labels (or other) that match the
detection_regex
, but without adding quarterbeat columns.
- Parameters
detection_regex – By default, detects all labels starting with one or two digits followed by a colon (see
the regex
). Pass another regex to retrieve only StaffTexts matching this one.
- Returns
DataFrame containing all StaffTexts matching the
detection_regex
or None
- property has_voltas: bool#
Return True if the score includes first and second endings. Otherwise, no ‘volta’ columns will be added to facets.
- measures(interval_index: bool = False, unfold: bool = False) Optional[DataFrame] [source]#
DataFrame representing the Measures of the MuseScore file (which can be incomplete measures). Comes with the columns mc, mn, quarterbeats, duration_qb, keysig, timesig, act_dur, mc_offset, volta, numbering_offset, dont_count, barline, breaks, repeats, next
- Parameters
interval_index – Pass True to replace the default
RangeIndex
by anIntervalIndex
.- Returns
DataFrame representing the measures of the MuseScore file (which can be incomplete measures).
- ml(recompute: bool = False) DataFrame [source]#
Get the raw Measures without adding quarterbeat columns.
- Parameters
recompute – By default, the measures are cached. Pass True to enforce recomputing anew.
- notes(interval_index: bool = False, unfold: bool = False) Optional[DataFrame] [source]#
DataFrame representing the Notes of the MuseScore file. Comes with the columns quarterbeats, duration_qb, mc, mn, mc_onset, mn_onset, timesig, staff, voice, duration, gracenote, tremolo, nominal_duration, scalar, tied, tpc, midi, volta, chord_id
- Parameters
interval_index – Pass True to replace the default
RangeIndex
by anIntervalIndex
.- Returns
DataFrame representing the Notes of the MuseScore file.
- nl(recompute: bool = False) DataFrame [source]#
Get the raw Notes without adding quarterbeat columns.
- Parameters
recompute – By default, the notes are cached. Pass True to enforce recomputing anew.
- notes_and_rests(interval_index: bool = False, unfold: bool = False) Optional[DataFrame] [source]#
DataFrame representing the Notes and Rests of the MuseScore file. Comes with the columns quarterbeats, duration_qb, mc, mn, mc_onset, mn_onset, timesig, staff, voice, duration, gracenote, tremolo, nominal_duration, scalar, tied, tpc, midi, volta, chord_id
- Parameters
interval_index – Pass True to replace the default
RangeIndex
by anIntervalIndex
.- Returns
DataFrame representing the Notes and Rests of the MuseScore file.
- nrl(recompute: bool = False) DataFrame [source]#
Get the raw Notes and Rests without adding quarterbeat columns.
- Parameters
recompute – By default, the notes and rests are cached. Pass True to enforce recomputing anew.
- offset_dict(all_endings: bool = False, unfold: bool = False, negative_anacrusis: bool = False) dict [source]#
Dictionary mapping MCs (measure counts) to their quarterbeat offset from the piece’s beginning. Used for computing quarterbeats for other facets.
- Parameters
all_endings – Uses the column ‘quarterbeats_all_endings’ of the measures table if it has one, otherwise falls back to the default ‘quarterbeats’.
- Returns
{MC -> quarterbeat_offset}. Offsets are Fractions. If
all_endings
is not set toTrue
, values for MCs that are part of a first ending (or third or larger) are NA.
- rests(interval_index: bool = False, unfold: bool = False) Optional[DataFrame] [source]#
DataFrame representing the Rests of the MuseScore file. Comes with the columns quarterbeats, duration_qb, mc, mn, mc_onset, mn_onset, timesig, staff, voice, duration, nominal_duration, scalar, volta
- Parameters
interval_index – Pass True to replace the default
RangeIndex
by anIntervalIndex
.- Returns
DataFrame representing the Rests of the MuseScore file.
- rl(recompute: bool = False) DataFrame [source]#
Get the raw Rests without adding quarterbeat columns.
- Parameters
recompute – By default, the rests are cached. Pass True to enforce recomputing anew.
- get_chords(staff: Optional[int] = None, voice: Optional[Literal[1, 2, 3, 4]] = None, mode: Literal['auto', 'strict'] = 'auto', lyrics: bool = False, dynamics: bool = False, articulation: bool = False, staff_text: bool = False, system_text: bool = False, tempo: bool = False, spanners: bool = False, thoroughbass: bool = False, **kwargs) DataFrame [source]#
Retrieve a customized chord list, e.g. one including fewer of the processed features, or additional unprocessed ones.
- Parameters
staff – Get information from a particular staff only (1 = upper staff)
voice – Get information from a particular voice only (1 = only the first layer of every staff)
mode –
Defaults to ‘auto’, meaning that the aspects that occur in the score are included automatically; the resulting DataFrame has no empty columns except for those parameters that are set to True.
‘strict’: Create columns for exactly those parameters that are set to True, regardless of whether they occur in the score or not (in which case the column will be empty).
lyrics – Include lyrics.
dynamics – Include dynamic markings such as f or p.
articulation – Include articulation such as arpeggios.
staff_text – Include expression text such as ‘dolce’ and free-hand staff text such as ‘div.’.
system_text – Include system text such as movement titles.
tempo – Include tempo markings.
spanners – Include spanners such as slurs, 8va lines, pedal lines etc.
thoroughbass – Include thoroughbass figures’ levels and durations.
**kwargs –
- Returns
DataFrame representing all <Chord> tags in the score with the selected features.
- get_texts(only_header: bool = True) Dict[str, str] [source]#
Process <Text> nodes (normally attached to <Staff id=”1”>).
- add_standard_cols(df: DataFrame) DataFrame [source]#
Ensures that the DataFrame’s first columns are [‘mc’, ‘mn’, (‘volta’), ‘timesig’, ‘mc_offset’]
- delete_label(mc, staff, voice, mc_onset, empty_only=False)[source]#
Delete a label from a particular position (if there is one).
- Parameters
mc (int) – Measure count.
staff (int) – Staff in which to delete the label.
voice (int) – Notational layer in which to delete the label.
mc_onset (fractions.Fraction) – Onset position of the label.
empty_only (bool, optional) – Set to True if you want to delete only empty harmonies. Since normally all labels at the defined position are deleted, this flag is needed to prevent deleting non-empty <Harmony> tags.
- Returns
Whether a label was deleted or not.
- Return type
bool
- add_label(label, mc, mc_onset, staff=1, voice=1, **kwargs)[source]#
Adds a single label to the current XML in the form of a new <Harmony> (and maybe also <location>) tag.
- Parameters
label –
mc –
mc_onset –
staff –
voice –
kwargs –
- change_label_color(mc, mc_onset, staff, voice, label, color_name=None, color_html=None, color_r=None, color_g=None, color_b=None, color_a=None)[source]#
Change the color of an existing label.
- Parameters
mc (int) – Measure count of the label.
mc_onset (fractions.Fraction) – Onset position to which the label is attached.
staff (int) – Staff to which the label is attached.
voice (int) – Notational layer to which the label is attached.
label (str) – (Decoded) label.
color_name (str, optional) – One way of specifying the color: a CSS or MuseScore color name.
color_html (str, optional) – The other way of specifying the color: an HTML string of length 6.
color_r (int or str, optional) – To specify an RGB color instead, pass at least the first three of color_r, color_g, color_b.
color_g (int or str, optional) – See color_r.
color_b (int or str, optional) – See color_r.
color_a (int or str, optional) – Alpha (opacity) value; defaults to 255.
- color_notes(from_mc: int, from_mc_onset: Fraction, to_mc: Optional[int] = None, to_mc_onset: Optional[Fraction] = None, midi: List[int] = [], tpc: List[int] = [], inverse: bool = False, color_name: Optional[str] = None, color_html: Optional[str] = None, color_r: Optional[int] = None, color_g: Optional[int] = None, color_b: Optional[int] = None, color_a: Optional[int] = None) Tuple[List[Fraction], List[Fraction]] [source]#
Colors all notes occurring in a particular score segment in one particular color, or only those (not) pertaining to a collection of MIDI pitches or Tonal Pitch Classes (TPC).
- Parameters
from_mc – MC in which the score segment starts.
from_mc_onset – mc_onset where the score segment starts.
to_mc – MC in which the score segment ends. If not specified, the segment ends at the end of the score.
to_mc_onset – If to_mc is defined, the mc_onset where the score segment ends.
midi – Collection of MIDI numbers to use as a filter or an inverse filter (depending on inverse).
tpc – Collection of Tonal Pitch Classes (C=0, G=1, F=-1 etc.) to use as a filter or an inverse filter (depending on inverse).
inverse – By default, only notes where all specified filters (midi and/or tpc) apply are colored. Set to True to color only those notes where none of the specified filters match.
color_name – Specify the color either as a name, an HTML color, or RGB(A). A name can be a CSS color or a MuseScore color (see utils.MS3_COLORS).
color_html – Specify the color either as a name, an HTML color, or RGB(A). An HTML color needs to be a string of length 6.
color_r – If you specify the color as RGB(A), you also need to specify color_g and color_b.
color_g – If you specify the color as RGB(A), you also need to specify color_r and color_b.
color_b – If you specify the color as RGB(A), you also need to specify color_r and color_g.
color_a – If you have specified an RGB color, the alpha value defaults to 255 unless specified otherwise.
- Returns
A tuple of two lists: the durations (in fractions) of all notes that have been colored, and the durations (in fractions) of all notes that have not been colored.
- class ms3.bs4_parser.Metatags(soup)[source]#
Easy way to read and write the metadata tags of a parsed MSCX score.
- class ms3.bs4_parser.Style(soup)[source]#
Easy way to read and write any style information in a parsed MSCX score.
- class ms3.bs4_parser.Prelims(soup: BeautifulSoup, **logger_cfg)[source]#
Easy way to read and write the preliminaries of a score, that is Title, Subtitle, Composer, Lyricist, and ‘Instrument Name (Part)’.
- ms3.bs4_parser.get_duration_event(elements)[source]#
Receives a list of dicts representing the events for a given mc_onset and returns the index and name of the first event that has a duration, so either a Chord or a Rest.
- ms3.bs4_parser.get_part_info(part_tag)[source]#
Instrument names come in different forms in different places. This function extracts the information from a <Part> tag and returns it as a dictionary.
- ms3.bs4_parser.make_spanner_cols(df: DataFrame, spanner_types: Optional[Collection[str]] = None) DataFrame [source]#
- From a raw chord list as returned by get_chords(spanners=True), create a DataFrame with Spanner IDs for all chords, for all spanner types they are associated with.
- Parameters
spanner_types (collection) – If this parameter is passed, only the given spanner types ['Slur', 'HairPin', 'Pedal', 'Ottava'] are included.
- ms3.bs4_parser.recurse_node(node, prepend=None, exclude_children=None)[source]#
The heart of the XML -> DataFrame conversion. Changes may have ample repercussions!
- Returns
Keys are combinations of tag (& attribute) names, values are value strings.
- Return type
dict
- ms3.bs4_parser.make_oneliner(node)[source]#
Pass a tag whose layout should not spread over several lines.
- ms3.bs4_parser.format_node(node, indent)[source]#
Recursively format Beautifulsoup tag as in an MSCX file.
- ms3.bs4_parser.bs4_to_mscx(soup)[source]#
Turn the BeautifulSoup object into a string representing an MSCX file.
- ms3.bs4_parser.text_tag2str(tag: Tag) str [source]#
Transforms a <text> tag into a string that potentially includes written-out HTML tags.
- ms3.bs4_parser.tag2text(tag: Tag) Tuple[str, str] [source]#
Takes the <Text> from a MuseScore file’s header and returns its style and string.
- ms3.bs4_parser.get_thoroughbass_symbols(item_tag: Tag) Tuple[str, str] [source]#
Returns the prefix and suffix of a <FiguredBassItem> tag if present, empty strings otherwise.
The expand_dcml module#
This is the same code as in the corpora repo, copied on September 24, 2020, and then adapted.
- class ms3.expand_dcml.SliceMaker[source]#
This class serves for storing slice notation such as :3 as a variable or passing it as a function argument.
Examples
SM = SliceMaker()
some_function(slice_this, SM[3:8])
select_all = SM[:]
df.loc[select_all]
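The class can be implemented in very few lines. The following is a sketch of the idea (the class name is from the source; the body is an assumption about the implementation):

```python
class SliceMaker:
    """Store slice notation such as [3:8] or [:] as a plain Python object."""

    def __getitem__(self, item):
        # Simply hand back whatever slice (or index) was passed in.
        return item

SM = SliceMaker()
select_all = SM[:]    # slice(None, None, None)
first_rows = SM[:3]   # slice(None, 3, None)
```

The returned objects are ordinary `slice` instances, so they can be stored, passed around, and later used for indexing, e.g. `df.loc[select_all]`.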
- ms3.expand_dcml.expand_labels(df, column='label', regex=None, rename={}, dropna=False, propagate=True, volta_structure=None, relative_to_global=False, chord_tones=True, absolute=False, all_in_c=False, skip_checks=False)[source]#
Split harmony labels complying with the DCML syntax into columns holding their various features, and allow for additional computations and transformations.
Uses: compute_chord_tones(), features2type(), labels2global_tonic(), propagate_keys(), propagate_pedal(), replace_special(), roman_numeral2fifths(), split_alternatives(), split_labels(), transform(), transpose()
- Parameters
df (pandas.DataFrame) – Dataframe where one column contains DCML chord labels.
column (str) – Name of the column that holds the harmony labels.
regex (re.Pattern) – Compiled regular expression used to split the labels. It needs to have named groups. The group names are used as column names unless replaced by rename.
rename (dict, optional) – Dictionary to map the regex’s group names to deviating column names of your choice.
dropna (bool, optional) – Pass True if you want to drop rows where column is NaN/<NA>.
propagate (bool, optional) – By default, information about global and local keys and about pedal points is spread throughout the DataFrame. Pass False if you only want to split the labels into their features. This ignores all following parameters because their expansions depend on information about keys.
volta_structure (dict, optional) – {first_mc -> {volta_number -> [mc1, mc2…]}} dictionary as you can get it from Score.mscx.volta_structure. This allows for correct propagation into second and other voltas.
relative_to_global (bool, optional) – Pass True if you want all labels expressed with respect to the global key. This levels and eliminates the features localkey and relativeroot.
chord_tones (bool, optional) – Pass True if you want to add four columns that contain information about each label’s chord, added, root, and bass tones. The pitches are expressed as intervals relative to the respective chord’s local key or, if relative_to_global=True, to the global key. The intervals are represented as integers that represent stacks of fifths over the tonic, such that 0 = tonic, 1 = dominant, -1 = subdominant, 2 = supertonic etc.
absolute (bool, optional) – Pass True if you want to transpose the relative chord_tones to the global key, which makes them absolute so they can be expressed as actual note names. This implies prior conversion of the chord_tones (but not of the labels) to the global tonic.
all_in_c (bool, optional) – Pass True to transpose chord_tones to C major/minor. This performs the same transposition of chord tones as relative_to_global but without transposing the labels, too. This option clashes with absolute=True.
- Returns
Original DataFrame plus additional columns with split features.
- Return type
pandas.DataFrame
- ms3.expand_dcml.extract_features_from_labels(S, regex=None)[source]#
Applies .str.extract(regex) on the Series and returns a DataFrame with all named capturing groups.
- ms3.expand_dcml.split_labels(df, label_column='label', regex=None, rename={}, dropna=False, inplace=False, skip_checks=False, **kwargs)[source]#
Split harmony labels complying with the DCML syntax into columns holding their various features.
- Parameters
df (pandas.DataFrame) – Dataframe where one column contains DCML chord labels.
label_column (str) – Name of the column that holds the harmony labels.
regex (re.Pattern) – Compiled regular expression used to split the labels. It needs to have named groups. The group names are used as column names unless replaced by rename.
rename (dict) – Dictionary to map the regex’s group names to deviating column names.
dropna (bool, optional) – Pass True if you want to drop rows where label_column is NaN/<NA>.
inplace (bool, optional) – Pass True if you want to mutate df.
- ms3.expand_dcml.features2type(numeral, form=None, figbass=None)[source]#
Turns a combination of the three chord features into a chord type.
- Returns
‘M’ (Major triad)
’m’ (Minor triad)
’o’ (Diminished triad)
’+’ (Augmented triad)
’mm7’ (Minor seventh chord)
’Mm7’ (Dominant seventh chord)
’MM7’ (Major seventh chord)
’mM7’ (Minor major seventh chord)
’o7’ (Diminished seventh chord)
’%7’ (Half-diminished seventh chord)
’+7’ (Augmented (minor) seventh chord)
’+M7’ (Augmented major seventh chord)
- ms3.expand_dcml.replace_special(df, regex, merge=False, inplace=False, cols={}, special_map={})[source]#
- Move special symbols in the numeral column to a separate column and replace them by the explicit chords they stand for. In particular, this function replaces the symbols It, Ger, and Fr.
Uses:
merge_changes()
- Parameters
df (pandas.DataFrame) – Dataframe containing DCML chord labels that have been split by split_labels().
regex (re.Pattern) – Compiled regular expression used to split the labels replacing the special symbols. It needs to have named groups. The group names are used as column names unless replaced by cols.
merge (bool, optional) – False: By default, existing values, except figbass, are overwritten. True: Merge existing with new values (for changes and relativeroot).
cols (dict, optional) – The special symbols appear in the column numeral and are moved to the column special. In case the column names for ['numeral', 'form', 'figbass', 'changes', 'relativeroot', 'special'] deviate, pass a dict, such as {'numeral': 'numeral_col_name', 'form': 'form_col_name', 'figbass': 'figbass_col_name', 'changes': 'changes_col_name', 'relativeroot': 'relativeroot_col_name', 'special': 'special_col_name'}.
special_map (dict, optional) – In case you want to add or alter special symbols to be replaced, pass a replacement map, e.g. {'N': 'bII6'}. The column 'figbass' is only altered if it's None, to allow for inversions of special chords.
inplace (bool, optional) – Pass True if you want to mutate df.
- ms3.expand_dcml.merge_changes(left, right, *args)[source]#
Merge two changes into one, e.g. b3 and +#7 to +#7b3.
Uses:
changes2list()
- ms3.expand_dcml.propagate_keys(df, volta_structure=None, globalkey='globalkey', localkey='localkey', add_bool=True)[source]#
- Propagate information about global keys and local keys throughout the dataframe. Pass split harmonies for one piece at a time. For concatenated pieces, use apply().
Uses:
series_is_minor()
- Parameters
df (pandas.DataFrame) – Dataframe containing DCML chord labels that have been split by split_labels().
volta_structure (dict, optional) – {first_mc -> {volta_number -> [mc1, mc2…]}} dictionary as you can get it from Score.mscx.volta_structure. This allows for correct propagation into second and other voltas.
globalkey (str, optional) – In case you renamed the column, pass its name.
localkey (str, optional) – In case you renamed the column, pass its name.
add_bool (bool, optional) – Pass True if you want to add two boolean columns which are True if the respective key is a minor key.
- ms3.expand_dcml.propagate_pedal(df, relative=True, drop_pedalend=True, cols={})[source]#
Propagate the pedal note for all chords within square brackets. By default, the note is expressed in relation to each label’s localkey.
Uses:
rel2abs_key()
,abs2rel_key()
- Parameters
df (pandas.DataFrame) – Dataframe containing DCML chord labels that have been split by split_labels() and where the keys have been propagated using propagate_keys().
relative (bool, optional) – Pass False if you want the pedal note to stay the same even if the localkey changes.
drop_pedalend (bool, optional) – Pass False if you don’t want the column with the ending brackets to be dropped.
cols (dict, optional) – In case the column names for ['pedal', 'pedalend', 'globalkey', 'localkey'] deviate, pass a dict, such as {'pedal': 'pedal_col_name', 'pedalend': 'pedalend_col_name', 'globalkey': 'globalkey_col_name', 'localkey': 'localkey_col_name'}.
Utils#
- ms3.utils.COMPUTED_METADATA_COLUMNS = ['TimeSig', 'KeySig', 'last_mc', 'last_mn', 'length_qb', 'last_mc_unfolded', 'last_mn_unfolded', 'length_qb_unfolded', 'volta_mcs', 'all_notes_qb', 'n_onsets', 'n_onset_positions', 'guitar_chord_count', 'form_label_count', 'label_count', 'annotated_key']#
Automatically computed columns
- ms3.utils.DCML_METADATA_COLUMNS = ['harmony_version', 'annotators', 'reviewers', 'score_integrity', 'composed_start', 'composed_end', 'composed_source']#
Arbitrary column names used in the DCML corpus initiative
- ms3.utils.MUSESCORE_METADATA_FIELDS = ['composer', 'workTitle', 'movementNumber', 'movementTitle', 'workNumber', 'poet', 'lyricist', 'arranger', 'copyright', 'creationDate', 'mscVersion', 'platform', 'source', 'translator']#
Default fields available in the File -> Score Properties… menu.
- ms3.utils.VERSION_COLUMNS = ['musescore', 'ms3_version']#
Software versions
- ms3.utils.MUSESCORE_HEADER_FIELDS = ['title_text', 'subtitle_text', 'lyricist_text', 'composer_text', 'part_name_text']#
Default text fields in MuseScore
- ms3.utils.AUTOMATIC_COLUMNS = ['TimeSig', 'KeySig', 'last_mc', 'last_mn', 'length_qb', 'last_mc_unfolded', 'last_mn_unfolded', 'length_qb_unfolded', 'volta_mcs', 'all_notes_qb', 'n_onsets', 'n_onset_positions', 'guitar_chord_count', 'form_label_count', 'label_count', 'annotated_key', 'musescore', 'ms3_version', 'has_drumset', 'ambitus', 'subdirectory', 'rel_path']#
This combination of column names is excluded when updating metadata fields in MuseScore files via ms3 metadata.
- ms3.utils.METADATA_COLUMN_ORDER = ['fname', 'TimeSig', 'KeySig', 'last_mc', 'last_mn', 'length_qb', 'last_mc_unfolded', 'last_mn_unfolded', 'length_qb_unfolded', 'volta_mcs', 'all_notes_qb', 'n_onsets', 'n_onset_positions', 'guitar_chord_count', 'form_label_count', 'label_count', 'annotated_key', 'harmony_version', 'annotators', 'reviewers', 'score_integrity', 'composed_start', 'composed_end', 'composed_source', 'composer', 'workTitle', 'movementNumber', 'movementTitle', 'workNumber', 'poet', 'lyricist', 'arranger', 'copyright', 'creationDate', 'mscVersion', 'platform', 'source', 'translator', 'title_text', 'subtitle_text', 'lyricist_text', 'composer_text', 'part_name_text', 'musescore', 'ms3_version', 'subdirectory', 'rel_path', 'has_drumset', 'ambitus']#
The default order in which columns of metadata.tsv files are to be sorted.
- ms3.utils.STANDARD_NAMES = ['notes_and_rests', 'rests', 'notes', 'measures', 'events', 'labels', 'chords', 'expanded', 'harmonies', 'cadences', 'form_labels', 'MS3', 'scores']#
list – Indicators for corpora: If a folder contains any file or folder beginning or ending on any of these names, it is considered to be a corpus by the function iterate_corpora().
- ms3.utils.DCML_REGEX = re.compile('\n^(\\.?\n ((?P<globalkey>[a-gA-G](b*|\\#*))\\.)?\n ((?P<localkey>((b*|\\#*)(VII|VI|V|IV|III|II|I|vii|vi|v|iv|iii|ii|i)/?)+)\\.)?\n ((?P<pedal>((b*|\\#*)(VII|VI|V|IV|III|II|I|vii|vi|v|iv|iii, re.VERBOSE)#
str – Constant with a regular expression that recognizes labels conforming to the DCML harmony annotation standard, excluding those consisting of two alternatives.
- ms3.utils.DCML_DOUBLE_REGEX = re.compile('\n ^(?P<first>\n (\\.?\n ((?P<globalkey>[a-gA-G](b*|\\#*))\\.)?\n , re.VERBOSE)#
str – Constant with a regular expression that recognizes complete labels conforming to the DCML harmony annotation standard, including those consisting of two alternatives, without having to split them. It is simply a doubled version of DCML_REGEX.
- ms3.utils.FORM_DETECTION_REGEX = '^\\d{1,2}.*?:'#
str – Following Gotham & Ireland (ISMIR 2019, “Taking Form: A Representation Standard, Conversion Code, and Example Corpus for Recording, Visualizing, and Studying Analyses of Musical Form”), detects form labels as those strings that start with an indication of a hierarchical level (one or two digits) followed by a colon. By extension (Llorens et al., forthcoming), allows one or more ‘i’ characters or any other alphabetic character to further specify the level.
- ms3.utils.rgba#
alias of RGBA
- class ms3.utils.map_dict[source]#
Such a dictionary can be mapped to a Series to replace its values while leaving values absent from the dict’s keys intact.
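The described behavior amounts to a dict with a pass-through fallback. A minimal sketch (the body is an assumption; only the class name and behavior come from the source):

```python
class map_dict(dict):
    """Dict that returns the key itself for missing keys.

    Useful with Series.map: values present as keys are replaced,
    all other values pass through unchanged.
    """

    def __missing__(self, key):
        # Called by dict lookup when the key is absent.
        return key

replace = map_dict({"I": "tonic", "V": "dominant"})
print([replace[x] for x in ["I", "ii", "V"]])  # → ['tonic', 'ii', 'dominant']
```

Because `__missing__` hooks into plain `dict` lookup, `some_series.map(replace)` replaces only the listed values and keeps everything else as-is.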
- ms3.utils.assert_all_lines_equal(before, after, original, tmp_file)[source]#
Compares two multiline strings to test equality.
- ms3.utils.assert_dfs_equal(old, new, exclude=[])[source]#
Compares the common columns of two DataFrames to test equality. Uses: nan_eq()
- ms3.utils.ambitus2oneliner(ambitus)[source]#
Turns a metadata['parts'][staff_id] dictionary into a string.
- ms3.utils.changes2list(changes, sort=True)[source]#
Splits a string of changes into a list of 4-tuples.
Example
>>> changes2list('+#7b5')
[('+#7', '+', '#', '7'), ('b5', '', 'b', '5')]
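The tuple structure in the example can be reproduced with a small regex sketch. This is a simplification (the real DCML changes grammar knows more markers, and the `sort` option is omitted here); the function name is hypothetical:

```python
import re

def changes2list_sketch(changes: str) -> list:
    """Split a changes string into 4-tuples (full, added, accidental, degree).

    Simplified sketch: covers the '+', accidental, and degree markers
    from the documented example only.
    """
    pattern = r"((\+?)(\#+|b+)?(\d{1,2}))"
    # findall returns one tuple per match, with '' for groups that did not participate.
    return re.findall(pattern, changes)

print(changes2list_sketch('+#7b5'))
# → [('+#7', '+', '#', '7'), ('b5', '', 'b', '5')]
```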
- ms3.utils.changes2tpc(changes, numeral, minor=False, root_alterations=False)[source]#
Given a numeral and changes, computes the intervals that the changes represent. Changes do not express absolute intervals but instead depend on the numeral and the mode.
Uses: split_scale_degree(), changes2list()
- Parameters
changes (str) – A string of changes following the DCML harmony standard.
numeral (str) – Roman numeral. If it is preceded by accidentals, the parameter root_alterations determines whether these are taken into account.
minor (bool, optional) – Set to True if the numeral occurs in a minor context.
root_alterations (bool, optional) – Set to True if accidentals of the root should change the result.
- ms3.utils.check_labels(df, regex, column='label', split_regex=None, return_cols=['mc', 'mc_onset', 'staff', 'voice'])[source]#
Checks the labels in column against regex and returns those that don’t match.
- Parameters
df (pandas.DataFrame) – DataFrame containing a column with labels.
regex (str) – Regular expression that correct labels match.
column (str, optional) – Column name where the labels are. Defaults to ‘label’.
split_regex (str, optional) – If you pass a regular expression (or simple string), it will be used to split the labels before checking the resulting columns separately. Instead, pass True to use the default (a ‘-’ that does not precede a scale degree).
return_cols (list, optional) – Pass a list of the DataFrame columns that you want to be displayed for the wrong labels.
- Returns
df – DataFrame with wrong labels.
- Return type
pandas.DataFrame
- ms3.utils.color_name2format(n, format='rgb')[source]#
Converts a single CSS3 name into one of ‘HTML’, ‘rgb’, or ‘rgba’
- ms3.utils.color_params2rgba(color_name=None, color_html=None, color_r=None, color_g=None, color_b=None, color_a=None)[source]#
For functions where the color can be specified in four different ways (HTML string, CSS name, RGB, or RGBA), convert the given parameters to RGBA.
- Parameters
color_name (str, optional) – As a name you can use CSS colors or MuseScore colors (see MS3_COLORS).
color_html (str, optional) – An HTML color needs to be a string of length 6.
color_r (int, optional) – If you specify the color as RGB(A), you also need to specify color_g and color_b.
color_g (int, optional) – If you specify the color as RGB(A), you also need to specify color_r and color_b.
color_b (int, optional) – If you specify the color as RGB(A), you also need to specify color_r and color_g.
color_a (int, optional) – If you have specified an RGB color, the alpha value defaults to 255 unless specified otherwise.
- Returns
namedtuple with four integers.
- Return type
RGBA
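The conversion logic for the HTML and RGB(A) cases can be sketched as follows. This is an assumption about the implementation (function name hypothetical); the real function additionally resolves CSS and MuseScore color names via a lookup table:

```python
from collections import namedtuple

RGBA = namedtuple("RGBA", ["r", "g", "b", "a"])

def params2rgba_sketch(color_html=None, color_r=None, color_g=None,
                       color_b=None, color_a=None):
    """Convert an HTML string or RGB(A) components to an RGBA namedtuple."""
    if color_html is not None:
        h = color_html.lstrip("#")
        # Parse the three two-digit hex components.
        return RGBA(int(h[0:2], 16), int(h[2:4], 16), int(h[4:6], 16),
                    255 if color_a is None else color_a)
    if None not in (color_r, color_g, color_b):
        return RGBA(color_r, color_g, color_b,
                    255 if color_a is None else color_a)
    return None

print(params2rgba_sketch(color_html="00ff00"))  # → RGBA(r=0, g=255, b=0, a=255)
```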
- ms3.utils.commonprefix(paths, sep='/')[source]#
Returns common prefix of a list of paths. Uses: allnamesequal(), itertools.takewhile()
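The combination of allnamesequal() and itertools.takewhile() mentioned above amounts to a component-wise prefix scan. A sketch under the assumption that both helpers work as their names suggest (function names here are hypothetical):

```python
from itertools import takewhile

def allnamesequal(name_tuple):
    # True if all paths share the same component at this position.
    return all(n == name_tuple[0] for n in name_tuple[1:])

def commonprefix_sketch(paths, sep='/'):
    """Return the common prefix of a list of paths, component-wise."""
    by_component = zip(*(p.split(sep) for p in paths))
    common = takewhile(allnamesequal, by_component)
    return sep.join(t[0] for t in common)

print(commonprefix_sketch(['corpus/scores/a.mscx', 'corpus/scores/b.mscx']))
# → corpus/scores
```

Working component-wise (rather than character-wise like os.path.commonprefix) avoids returning partial folder names such as 'corpus/scores/x' for 'xa.mscx' and 'xb.mscx'.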
- ms3.utils.compute_mn(measures: DataFrame) Series [source]#
Compute measure number integers from a measures table.
- Parameters
measures – Measures table with columns [‘mc’, ‘dont_count’, ‘numbering_offset’].
Returns:
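A hedged sketch of the numbering logic described by the column names: a measure whose 'dont_count' flag is set does not advance the measure number (so a pickup measure gets MN 0), and 'numbering_offset' shifts subsequent numbers. The real function operates on a DataFrame; this list-based version (hypothetical name) only illustrates the rule:

```python
def compute_mn_sketch(measures):
    """measures: list of dicts with optional 'dont_count' and 'numbering_offset'."""
    result, mn = [], 0
    for m in measures:
        if not m.get('dont_count'):
            mn += 1                      # regular measures advance the MN
        offset = m.get('numbering_offset')
        if offset:
            mn += offset                 # explicit renumbering in the score
        result.append(mn)
    return result

# Pickup measure followed by two full measures:
print(compute_mn_sketch([{'dont_count': True}, {}, {}]))  # → [0, 1, 2]
```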
- ms3.utils.compute_mn_playthrough(measures: DataFrame) Series [source]#
Compute measure number strings from an unfolded measures table, such that the first occurrence of a measure number ends on ‘a’, the second one on ‘b’ etc.
The function requires the column ‘dont_count’ in order to correctly number the return of a completing MC after an incomplete MC with “endrepeat” sign. For example, if a repeated section begins with an upbeat that at first completes MN 16 it will have mn_playthrough ‘16a’ the first time and ‘32a’ the second time (assuming it completes the incomplete MN 32).
- Parameters
measures – Measures table with columns [‘mc’, ‘mn’, ‘dont_count’]
- Returns
‘mn_playthrough’ Series of disambiguated measure number strings. If no measure repeats, the result will be equivalent to converting column ‘mn’ to strings and appending ‘a’ to all of them.
- ms3.utils.convert_folder(directory=None, file_paths=None, target_dir=None, extensions=[], target_extension='mscx', regex='.*', suffix=None, recursive=True, ms='mscore', overwrite=False, parallel=False)[source]#
Convert all files in directory that have one of the given extensions to .mscx format, using the MuseScore executable passed as ms.
- Parameters
directory (str) – Directory in which to look for files to convert.
file_paths (list of str) – List of file paths to convert. These are not filtered by any means.
target_dir (str) – Directory where to store converted files. Defaults to directory.
extensions (list, optional) – If you want to convert only certain formats, give those, e.g. ['mscz', 'xml'].
recursive (bool, optional) – Traverse subdirectories as well.
ms (str, optional) – Give the path to the MuseScore executable on your system. Needed only if the command ‘mscore’ does not execute MuseScore on your system.
- ms3.utils.decode_harmonies(df, label_col='label', keep_layer=True, return_series=False, alt_cols='alt_label', alt_separator='-')[source]#
MuseScore stores label types 2 (Nashville) and 3 (absolute chords) in several columns. This function returns a copy of the DataFrame Annotations.df where the label column contains the strings corresponding to these columns.
- Parameters
df (pandas.DataFrame) – DataFrame with encoded harmony labels as stored in an Annotations object.
label_col (str, optional) – Column name where the main components (<name> tag) are stored; defaults to ‘label’.
keep_layer (bool, optional) – Defaults to True, retaining the ‘harmony_layer’ column with original layers.
return_series (bool, optional) – If set to True, only the decoded labels column is returned as a Series rather than a copy of df.
alt_cols (str or list, optional) – Column(s) with alternative labels that are joined with the label columns using alt_separator. Defaults to ‘alt_label’. Suppress by passing None.
alt_separator (str, optional) – Separator for joining alt_cols.
- Returns
Decoded harmony labels.
- Return type
pandas.Series or pandas.DataFrame
- ms3.utils.df2md(df: DataFrame, name: str = 'Overview') MarkdownTableWriter [source]#
Turns a DataFrame into a Markdown table. The returned writer can be converted into a string.
- ms3.utils.resolve_form_abbreviations(token: str, abbreviations: dict, mc: Optional[Union[int, str]] = None, fallback_to_lowercase: bool = True) str [source]#
Checks, for each consecutive substring of the token, whether it matches one of the given abbreviations, and replaces it with the corresponding long name. Trailing numbers are separated by a space in this case.
- Parameters
token – Individual token after splitting alternative readings.
abbreviations – {abbreviation -> long name} dict for string replacement.
fallback_to_lowercase – By default, the substrings are checked against the dictionary keys and, if unsuccessful, again in lowercase. Pass False to use only the original string.
Returns:
- ms3.utils.distribute_tokens_over_levels(levels: Collection[str], tokens: Collection[str], mc: Optional[Union[int, str]] = None, abbreviations: dict = {}) Dict[Tuple[str, str], str] [source]#
Takes the regex matches of one label and turns them into as many {layer -> token} pairs as the label contains tokens.
- Parameters
levels – Collection of strings indicating analytical layers.
tokens – Collection of tokens coming along, same size as levels.
mc – Pass the label’s MC to display it in error messages.
abbreviations – {abbreviation -> long name} dict mapping abbreviations to what they are to be replaced with.
- Returns
A {(form_tree, level) -> token} dict where form_tree is either ‘’ or a letter between a-h identifying one of several trees annotated in parallel.
- ms3.utils.expand_single_form_label(label: str, default_abbreviations=True, **kwargs) Dict[Tuple[str, str], str] [source]#
Splits a form label and applies distribute_tokens_over_levels()
- Parameters
label – Complete form label including indications of analytical layer(s).
default_abbreviations – By default, each token component is checked against a mapping from abbreviations to long names. Pass False to prevent that.
**kwargs – Abbreviation=’long name’ mappings to resolve individual abbreviations
- Returns
A {(form_tree, level) -> token} dict with one entry per analytical layer contained in the label.
- ms3.utils.expand_form_labels(fl: DataFrame, fill_mn_until: int = None, default_abbreviations=True, **kwargs) DataFrame [source]#
Expands form labels into a hierarchical view of levels in a table.
- Parameters
fl – A DataFrame containing raw form labels as retrieved from ms3.Score.mscx.form_labels().
fill_mn_until – Pass the last measure number if you want every measure of the piece to have a row in the tree view, even if it doesn’t come with a form label. This may be desired for an increased intuition of proportions, rather than seeing all form labels right below each other. In order to add the empty rows even without knowing the number of measures, pass -1.
default_abbreviations – By default, each token component is checked against a mapping from abbreviations to long names. Pass False to prevent that.
**kwargs – Abbreviation=’long name’ mappings to resolve individual abbreviations
- Returns
A DataFrame with one column added per hierarchical layer of analysis, starting from level 0.
- ms3.utils.add_collections(left: Series, right: Collection, dtype: Dtype) Series [source]#
- ms3.utils.add_collections(left: ndarray[Any, dtype[ScalarType]], right: Collection, dtype: Dtype) ndarray[Any, dtype[ScalarType]]
- ms3.utils.add_collections(left: list, right: Collection, dtype: Dtype) list
- ms3.utils.add_collections(left: tuple, right: Collection, dtype: Dtype) tuple
Zip-adds together the strings (by default) contained in two collections regardless of their types (think of adding two columns together element-wise). Pass another dtype if you want the values to be converted to another datatype before adding them together.
- ms3.utils.fifths2acc(fifths: int) str [source]#
- ms3.utils.fifths2acc(fifths: Series) Series
- ms3.utils.fifths2acc(fifths: ndarray[Any, dtype[int]]) ndarray[Any, dtype[str]]
- ms3.utils.fifths2acc(fifths: List[int]) List[str]
- ms3.utils.fifths2acc(fifths: Tuple[int]) Tuple[str]
Returns accidentals for a stack of fifths that can be combined with a basic representation of the seven steps.
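On the line of fifths, the naturals occupy positions -1 through 5 (F through B), and each further step of 7 fifths adds one sharp (upward) or one flat (downward). A sketch of that rule (hypothetical function name):

```python
def fifths2acc_sketch(fifths: int) -> str:
    """Accidental string ('', '#', '##', 'b', 'bb', ...) for a stack of fifths."""
    acc = (fifths + 1) // 7   # 0 for naturals, +1 per sharp, -1 per flat
    return acc * '#' if acc > 0 else -acc * 'b'

print([fifths2acc_sketch(f) for f in (-2, -1, 0, 6, 13)])  # → ['b', '', '', '#', '##']
```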
- ms3.utils.fifths2iv(fifths: int, smallest: bool = False, perfect: str = 'P', major: str = 'M', minor: str = 'm', augmented: str = 'a', diminished: str = 'd') str [source]#
Return the interval name of a stack of fifths such that 0 = ‘P1’, -1 = ‘P4’, -2 = ‘m7’, 4 = ‘M3’ etc. If you pass smallest=True, intervals of a fifth or greater will be inverted (e.g. ‘m6’ => ‘-M3’ and ‘d5’ => ‘-A4’).
- Parameters
fifths – Number of fifths representing the interval.
smallest – Pass True if you want to map intervals of a fifth and larger to their downward counterparts.
perfect – String representing the perfect interval quality, defaults to ‘P’.
major – String representing the major interval quality, defaults to ‘M’.
minor – String representing the minor interval quality, defaults to ‘m’.
augmented – String representing the augmented interval quality, defaults to ‘a’.
diminished – String representing the diminished interval quality, defaults to ‘d’.
- Returns
Name of the interval as a string.
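The mapping from fifths to interval names follows a regular pattern: the generic interval number cycles with period 7, and the quality depends on how far the stack reaches. A simplified sketch that handles single augmentation/diminution only (hypothetical name; the real function goes further and supports smallest=True):

```python
def fifths2iv_sketch(fifths: int) -> str:
    """Interval name for a stack of fifths: 0='P1', 1='P5', -1='P4', 2='M2', -2='m7', ..."""
    number = (fifths * 4) % 7 + 1        # generic interval number 1..7
    if -1 <= fifths <= 1:
        quality = 'P'                    # P4, P1, P5
    elif 2 <= fifths <= 5:
        quality = 'M'                    # M2, M6, M3, M7
    elif -5 <= fifths <= -2:
        quality = 'm'                    # m7, m3, m6, m2
    elif fifths >= 6:
        quality = 'a'                    # augmented (single level only)
    else:
        quality = 'd'                    # diminished (single level only)
    return f"{quality}{number}"

print([fifths2iv_sketch(f) for f in (0, 1, -2, 4, 6, -6)])
# → ['P1', 'P5', 'm7', 'M3', 'a4', 'd5']
```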
- ms3.utils.tpc2name(tpc: int, ms: bool = False, minor: bool = False) Optional[str] [source]#
- ms3.utils.tpc2name(tpc: Series, ms: bool = False, minor: bool = False) Optional[Series]
- ms3.utils.tpc2name(tpc: ndarray[Any, dtype[int]], ms: bool = False, minor: bool = False) Optional[ndarray[Any, dtype[str]]]
- ms3.utils.tpc2name(tpc: List[int], ms: bool = False, minor: bool = False) Optional[List[str]]
- ms3.utils.tpc2name(tpc: Tuple[int], ms: bool = False, minor: bool = False) Optional[Tuple[str]]
Turn a tonal pitch class (TPC) into a name or perform the operation on a collection of integers.
- Parameters
tpc – Tonal pitch class(es) to turn into a note name.
ms – Pass True if tpc is a MuseScore TPC, i.e. C = 14.
minor – Pass True if the string is to be returned as lowercase.
Returns:
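The TPC-to-name mapping combines a letter on the line of fifths with an accidental count. A sketch for the scalar case (hypothetical name; the real function also handles collections and MuseScore TPCs, where ms=True would first subtract 14):

```python
def tpc2name_sketch(tpc: int, minor: bool = False) -> str:
    """Note name for a tonal pitch class where 0=C, 1=G, -1=F, 6=F#, -2=Bb, ..."""
    step = "FCGDAEB"[(tpc + 1) % 7]   # letter on the line of fifths
    acc = (tpc + 1) // 7              # sharps (>0) or flats (<0)
    name = step + (acc * '#' if acc > 0 else -acc * 'b')
    return name.lower() if minor else name

print([tpc2name_sketch(t) for t in (0, 1, -1, 6, -2)])  # → ['C', 'G', 'F', 'F#', 'Bb']
```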
- ms3.utils.fifths2name(fifths: int, midi: Optional[int], ms: bool, minor: bool) Optional[str] [source]#
- ms3.utils.fifths2name(fifths: pd.Series, midi: Optional[pd.Series], ms: bool, minor: bool) Optional[pd.Series]
- ms3.utils.fifths2name(fifths: NDArray[int], midi: Optional[NDArray[int]], ms: bool, minor: bool) Optional[NDArray[str]]
- ms3.utils.fifths2name(fifths: List[int], midi: Optional[List[int]], ms: bool, minor: bool) Optional[List[str]]
- ms3.utils.fifths2name(fifths: Tuple[int], midi: Optional[Tuple[int]], ms: bool, minor: bool) Optional[Tuple[str]]
- Return note name of a stack of fifths such that
0 = C, -1 = F, -2 = Bb, 1 = G etc. This is a wrapper of
tpc2name()
, that additionally accepts the argumentmidi
which allows for adding octave information.
- Parameters
fifths – Tonal pitch class(es) to turn into a note name.
midi – In order to include the octave into the note name, pass the corresponding MIDI pitch(es).
ms – Pass True if
fifths
is a MuseScore TPC, i.e. C = 14.minor – Pass True if the name is to be returned in lowercase.
- ms3.utils.fifths2pc(fifths)[source]#
Turn a stack of fifths into a chromatic pitch class. Uses: map2elements()
- ms3.utils.fifths2rn(fifths, minor=False, auto_key=False)[source]#
- Return Roman numeral of a stack of fifths such that
0 = I, -1 = IV, 1 = V, -2 = bVII in major, VII in minor, etc. Uses: map2elements(), is_minor_mode()
- Parameters
auto_key (
bool
, optional) – By default, the returned Roman numerals are uppercase. Pass True to return upper- or lowercase according to the degree’s position in the scale.
- ms3.utils.fifths2sd(fifths, minor=False)[source]#
Return scale degree of a stack of fifths such that 0 = ‘1’, -1 = ‘4’, -2 = ‘b7’ in major, ‘7’ in minor etc. Uses: map2elements(), fifths2str()
- ms3.utils.get_musescore(MS: Union[str, Literal['auto', 'win', 'mac']] = 'auto') Optional[str] [source]#
Tests whether a MuseScore executable can be found on the system. Uses: test_binary()
- Parameters
MS – A path to the executable, installed command, or one of the keywords {‘auto’, ‘win’, ‘mac’}
- Returns
Path to the executable if found or None.
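A minimal version of such a lookup can be sketched with shutil.which. The candidate command names below are assumptions for illustration, not the list ms3 actually probes:

```python
import shutil
from typing import Optional


def find_musescore(candidates=("mscore", "musescore", "mscore3", "MuseScore3")) -> Optional[str]:
    """Return the path of the first MuseScore-like executable found on the PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None
```

shutil.which resolves a command name against the PATH environment variable, mirroring what a shell would execute.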
- ms3.utils.get_path_component(path, after)[source]#
Returns only the path’s subfolders below
after
. Ifafter
is the last component, ‘.’ is returned.
- ms3.utils.html2format(df, format='name', html_col='color_html')[source]#
Converts the HTML column of a DataFrame into ‘name’, ‘rgb’, or ‘rgba’.
- ms3.utils.html_color2format(h, format='name')[source]#
Converts a single HTML color into ‘name’, ‘rgb’, or ‘rgba’.
- ms3.utils.html_color2name(h)[source]#
Converts an HTML color into its CSS3 name, or returns it unchanged if there is none.
- ms3.utils.interval_overlap(a, b, closed=None)[source]#
Returns the overlap of two pd.Intervals as a new pd.Interval.
- Parameters
a (
pandas.Interval
) – Intervals for which to compute the overlap.b (
pandas.Interval
) – Intervals for which to compute the overlap.closed ({'left', 'right', 'both', 'neither'}, optional) – If no value is passed, the closure of the returned interval is inferred from
a
andb
.
- Return type
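The core of the computation is taking the maximum of the left and the minimum of the right boundaries. A minimal sketch (the closure inference of the documented closed parameter is omitted here, 'left' is hard-coded):

```python
from typing import Optional

import pandas as pd


def overlap(a: pd.Interval, b: pd.Interval) -> Optional[pd.Interval]:
    """Return the overlapping region of two intervals, or None if they are disjoint."""
    if not a.overlaps(b):
        return None
    return pd.Interval(max(a.left, b.left), min(a.right, b.right), closed='left')
```

For example, the overlap of (0, 2] and (1, 3] is the interval from 1 to 2.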
- ms3.utils.interval_overlap_size(a, b, decimals=3)[source]#
Returns the size of the overlap of two pd.Intervals.
- ms3.utils.is_any_row_equal(df1, df2)[source]#
Returns True if any two rows of the two DataFrames contain the same value tuples.
- ms3.utils.is_minor_mode(fifths, minor=False)[source]#
Returns True if the scale degree, expressed as fifths, naturally comes with a minor third in the scale.
- ms3.utils.iter_nested(nested)[source]#
Iterate through any nested structure of lists and tuples from left to right.
- ms3.utils.iter_selection(collectio, selector=None, opposite=False)[source]#
Returns a generator over the elements of
collectio
.selector
can be a collection of index numbers used to select or unselect elements, depending on
opposite
.
- ms3.utils.first_level_files_and_subdirs(path)[source]#
Returns the directory names and filenames contained in path.
- ms3.utils.get_first_level_corpora(path: str) List[str] [source]#
Checks the first-level subdirectories of path for indicators of being a corpus. If one of them shows an indicator (presence of a ‘metadata.tsv’ file, or of a ‘.git’ folder or any of the default folder names), returns a list of all subdirectories.
- ms3.utils.join_tsvs(dfs, sort_cols=False)[source]#
Performs outer join on the passed DataFrames based on ‘mc’ and ‘mc_onset’, if any. Uses: functools.reduce(), sort_cols(), sort_note_lists()
- Parameters
dfs (
Collection
) – Collection of DataFrames to join.sort_cols (
bool
, optional) – If you pass True, the columns after those defined inSTANDARD_COLUMN_ORDER
will be sorted alphabetically.
- ms3.utils.parse_interval_index_column(df, column=None, closed='left')[source]#
Turns a column of strings in the form ‘[0.0, 1.1)’ into a
pandas.IntervalIndex
.- Parameters
df (
pandas.DataFrame
) –column (
str
, optional) – Name of the column containing strings. If not specified, use the index.closed (
str
, optional) – On what side the intervals should be closed. Defaults to ‘left’.
- Return type
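The string parsing at the heart of this can be sketched as follows; this is an illustrative helper (the name strings_to_intervals is hypothetical), not ms3's implementation:

```python
import pandas as pd


def strings_to_intervals(strings, closed='left') -> pd.IntervalIndex:
    """Parse strings like '[0.0, 1.5)' into a pandas.IntervalIndex."""
    pairs = []
    for s in strings:
        # Strip the bracket characters and split into left/right boundaries
        left, right = s.strip('[]()').split(',')
        pairs.append((float(left), float(right)))
    return pd.IntervalIndex.from_tuples(pairs, closed=closed)


idx = strings_to_intervals(['[0.0, 1.5)', '[1.5, 3.0)'])
```

Note that the bracket characters themselves are discarded; as in the documented function, the closure comes from the closed argument.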
- ms3.utils.load_tsv(path, index_col=None, sep='\t', converters={}, dtype={}, stringtype=False, **kwargs) Optional[DataFrame] [source]#
Loads the TSV file at path while applying correct type conversions and parsing tuples.
- Parameters
path (
str
) – Path to a TSV file as output by format_data().index_col (
list
, optional) – By default, the first two columns are loaded as MultiIndex. The first level distinguishes pieces and the second level the elements within.converters (
dict
, optional) – Enhances or overwrites the mapping from column names to types included in the constants.dtype (
dict
, optional) – Enhances or overwrites the mapping from column names to types included in the constants.stringtype (
bool
, optional) – If you’re using pandas >= 1.0.0, you might want to set this to True in order to use the new string datatype that includes the new null type pd.NA.
- ms3.utils.make_csvw_jsonld(title: str, columns: Collection[str], urls: Union[str, Collection[str]], description: Optional[str] = None) dict [source]#
W3C’s CSV on the Web Primer: https://www.w3.org/TR/tabular-data-primer/
- ms3.utils.make_continuous_offset_series(measures, quarters=True, negative_anacrusis=None)[source]#
Accepts a measure table without ‘quarterbeats’ column and computes each MC’s offset from the piece’s beginning. Deal with voltas before passing the table.
If you need an offset_dict and the measures already come with a ‘quarterbeats’ column, you can call
make_offset_dict_from_measures()
.- Parameters
measures (
pandas.DataFrame
) – A measures table with ‘normal’ RangeIndex containing the column ‘act_durs’ and one of ‘mc’ or ‘mc_playthrough’ (if repeats were unfolded).quarters (
bool
, optional) – By default, the continuous offsets are expressed in quarter notes. Pass False to leave them as fractions of a whole note.
fractions.Fraction
) – By default, the first value is 0. If you pass a fraction here, the first value will be its negative and the second value will be 0.
- Returns
Cumulative sum of the actual durations, shifted down by 1. Compared to the original DataFrame it has length + 2 because it adds the end value twice, once with the next index value, and once with the index ‘end’. Otherwise the end value would be lost due to the shifting.
- Return type
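The core computation — a cumulative sum of actual durations, shifted so that each MC gets the total duration of all preceding MCs — can be sketched without pandas (the example durations are illustrative):

```python
from fractions import Fraction
from itertools import accumulate

# Actual durations ('act_durs') of three MCs: a quarter-note anacrusis and two 4/4 bars
act_durs = [Fraction(1, 4), Fraction(1, 1), Fraction(1, 1)]

ends = list(accumulate(act_durs))        # cumulative duration up to the end of each MC
offsets = [Fraction(0)] + ends[:-1]      # offset of each MC from the piece's beginning
quarterbeats = [o * 4 for o in offsets]  # expressed in quarter notes: [0, 1, 5]
```

This corresponds to a cumulative sum shifted down by one position, with 0 as the first value (or a negative value when negative_anacrusis is passed).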
- ms3.utils.make_offset_dict_from_measures(measures: DataFrame, all_endings: bool = False) dict [source]#
Turn a measure table that comes with a ‘quarterbeats’ column into a dictionary that maps MCs (measure counts) to their quarterbeat offset from the piece’s beginning, used for computing quarterbeats for other facets.
This function is used for the default case. If you need more options, e.g. an offset dict from unfolded measures or expressed in whole notes or with negative anacrusis, use
make_continuous_offset_series()
instead.- Parameters
measures – Measures table containing a ‘quarterbeats’ column.
all_endings – Uses the column ‘quarterbeats_all_endings’ of the measures table if it has one, otherwise falls back to the default ‘quarterbeats’.
- Returns
{MC -> quarterbeat_offset}. Offsets are Fractions. If
all_endings
is not set toTrue
, values for MCs that are part of a first ending (or third or larger) are NA.
- ms3.utils.make_id_tuples(key, n)[source]#
For a given key, this function returns index tuples in the form [(key, 0), …, (key, n)]
- Returns
indices in the form [(key, 0), …, (key, n)]
- Return type
- ms3.utils.make_interval_index_from_breaks(S, end_value=None, closed='left', name='interval')[source]#
Interpret a Series as interval breaks and make an IntervalIndex out of it.
- Parameters
S (
pandas.Series
) – Interval breaks. It is assumed that the breaks are sorted.end_value (numeric, optional) – Often you want to pass the right border of the last interval.
closed (
str
, optional) – Defaults to ‘left’. Argument passed to topandas.IntervalIndex.from_breaks()
.name (
str
, optional) – Name of the created index. Defaults to ‘interval’.
- Return type
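The behavior can be illustrated directly with pandas (the break values are made up for the example):

```python
import pandas as pd

breaks = [0.0, 1.5, 3.0]   # e.g. quarterbeat positions of successive events
end_value = 4.0            # right border of the last interval
idx = pd.IntervalIndex.from_breaks(breaks + [end_value], closed='left', name='interval')
# idx covers [0.0, 1.5), [1.5, 3.0), [3.0, 4.0)
```

Appending end_value before calling from_breaks is what the end_value argument of the documented function amounts to.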
- ms3.utils.make_name_columns(df)[source]#
Relies on the columns
localkey
andglobalkey
to transform the columnsroot
andbass_notes
from scale degrees (expressed as fifths) to absolute note names, e.g. in C major: 0 => ‘C’, 7 => ‘C#’, -5 => ‘Db’ Uses: transform(), scale_degree2name
- ms3.utils.make_playthrough2mc(measures: DataFrame) Optional[Series] [source]#
Turns the column ‘next’ into a mapping of playthrough_mc -> mc.
- ms3.utils.make_playthrough_info(measures: DataFrame) Optional[Union[DataFrame, Series]] [source]#
Turns a measures table into a DataFrame or Series that can be passed as argument to
unfold_repeats()
. The return type is DataFrame if the unfolded measures table contains an ‘mn_playthrough’ column, otherwise it is equal to the result ofmake_playthrough2mc()
. Hence, the purpose of the function is to add an ‘mn_playthrough’ column to unfolded facets whenever possible.
- ms3.utils.map2elements(e, f, *args, **kwargs)[source]#
If e is an iterable, f is applied to all elements.
- ms3.utils.merge_ties(df, return_dropped=False, perform_checks=True)[source]#
- In a note list, merge tied notes to single events with accumulated durations.
Input dataframe needs the columns [‘duration’, ‘tied’, ‘midi’, ‘staff’]. This function does not correctly handle overlapping ties on the same pitch, since it does not take the notational layers (‘voice’) into account.
- Parameters
df –
return_dropped –
- ms3.utils.merge_chords_and_notes(chords_table: DataFrame, notes_table: DataFrame) DataFrame [source]#
Performs an outer join between a chords table and a notes table, based on the column ‘chord_id’. If the chords come with an ‘event’ column, all chord events matched with at least one note are renamed to ‘Note’. Markup displayed in individual rows (‘Dynamic’, ‘Spanner’, ‘StaffText’, ‘SystemText’, ‘Tempo’, ‘FiguredBass’) remains placed before the note(s) with the same onset. Markup showing up in a Chord event’s row (e.g. a Spanner ID) is duplicated for each note pertaining to that chord, i.e., only for notes in the same staff and voice.
- Parameters
chords_table –
notes_table –
- Returns
Merged DataFrame.
- ms3.utils.metadata2series(d: dict) Series [source]#
Turns a metadata dict into a pd.Series() (for storing in a DataFrame) Uses: ambitus2oneliner(), dict2oneliner(), parts_info()
- Returns
A series allowing for storing metadata as a row of a DataFrame.
- Return type
- ms3.utils.midi2octave(midi: int, fifths: Optional[int]) int [source]#
- ms3.utils.midi2octave(midi: Series, fifths: Optional[Series]) Series
- ms3.utils.midi2octave(midi: ndarray[Any, dtype[int]], fifths: Optional[ndarray[Any, dtype[ScalarType]]]) ndarray[Any, dtype[int]]
- ms3.utils.midi2octave(midi: List[int], fifths: Optional[List[int]]) List[int]
- ms3.utils.midi2octave(midi: Tuple[int], fifths: Optional[Tuple[int]]) Tuple[int]
- For a given MIDI pitch, calculate the octave. Middle octave = 4
Uses: midi_and_tpc2octave(), map2elements()
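Without the enharmonic correction provided by the fifths argument, the computation reduces to integer division (middle C, MIDI 60, lies in octave 4). A minimal sketch:

```python
def midi_to_octave(midi: int) -> int:
    """Octave number of a MIDI pitch; MIDI 60 (middle C) -> 4."""
    return midi // 12 - 1
```

The fifths argument of the documented function matters for spellings that cross an octave boundary: Cb4 sounds as MIDI 59, which plain division would assign to octave 3.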
- ms3.utils.mn2int(mn_series)[source]#
Turn a series of measure numbers parsed as strings into two integer columns ‘mn’ and ‘volta’.
- ms3.utils.name2format(df, format='html', name_col='color_name')[source]#
Converts a column with CSS3 names into ‘html’, ‘rgb’, or ‘rgba’.
- ms3.utils.name2fifths(nn)[source]#
Turn a note name such as Ab into a tonal pitch class, such that -1=F, 0=C, 1=G etc. Uses: split_note_name()
- ms3.utils.name2pc(nn)[source]#
Turn a note name such as Ab into a chromatic pitch class (0-11), such that C=0 and Ab=8. Uses: split_note_name()
- ms3.utils.nan_eq(a, b)[source]#
Returns True if a and b are equal or both null. Works on two Series or two elements.
- ms3.utils.next2sequence(next_col: Series) Optional[List[int]] [source]#
Turns a ‘next’ column into the correct sequence of MCs corresponding to unfolded repetitions. Requires that the Series’ index be the MCs as in
measures.set_index('mc').next
.
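The unfolding logic can be sketched as follows: start at MC 1 and, at each MC, take the first still-unused continuation from its ‘next’ tuple (a simplification of how ms3 resolves repeats; the dict-based input and the sentinel -1 for the end of the piece are illustrative):

```python
def unfold_mcs(next_col: dict) -> list:
    """Follow 'next' pointers from MC 1 until -1, taking each MC's continuations in order."""
    visits = {mc: 0 for mc in next_col}   # how often each MC has been left already
    sequence = []
    mc = 1
    while mc != -1:
        sequence.append(mc)
        options = next_col[mc]
        i = min(visits[mc], len(options) - 1)  # pick the next unused continuation
        visits[mc] += 1
        mc = options[i]
    return sequence


# Four measures where mm. 1-2 are repeated once:
unfold_mcs({1: (2,), 2: (1, 3), 3: (4,), 4: (-1,)})  # -> [1, 2, 1, 2, 3, 4]
```

On the second visit to MC 2, the repeat back to MC 1 has already been taken, so the second continuation (MC 3) is chosen.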
- ms3.utils.no_collections_no_booleans(df, coll_columns=None, bool_columns=None)[source]#
Cleans the DataFrame columns [‘next’, ‘chord_tones’, ‘added_tones’, ‘volta_mcs’] from tuples and the columns [‘globalkey_is_minor’, ‘localkey_is_minor’] from booleans, converting them all to integers.
- ms3.utils.parts_info(d)[source]#
Turns a (nested)
metadata['parts']
dict into a flat dict based on staves.Example
>>> d = s.mscx.metadata_from_parsed
>>> parts_info(d['parts'])
{'staff_1_instrument': 'Voice', 'staff_1_ambitus': '66-76 (F#4-E5)', 'staff_2_instrument': 'Voice', 'staff_2_ambitus': '55-69 (G3-A4)', 'staff_3_instrument': 'Voice', 'staff_3_ambitus': '48-67 (C3-G4)', 'staff_4_instrument': 'Voice', 'staff_4_ambitus': '41-60 (F2-C4)'}
- ms3.utils.path2type(path)[source]#
Determine a file’s type by scanning its path for default components in the constant STANDARD_NAMES.
- Parameters
path –
- ms3.utils.pretty_dict(ugly_dict: dict, heading_key: Optional[str] = None, heading_value: Optional[str] = None) str [source]#
Turns a dictionary into a string where the keys are printed in a column, separated by ‘->’.
- ms3.utils.rgb2format(df, format='html', r_col='color_r', g_col='color_g', b_col='color_b')[source]#
Converts three RGB columns into a color_html or color_name column.
- ms3.utils.rgb_tuple2format(t, format='html')[source]#
Converts a single RGB tuple into ‘HTML’ or ‘name’.
- ms3.utils.rgb_tuple2name(t)[source]#
Converts a single RGB tuple into its CSS3 name or to HTML if there is none.
- ms3.utils.roman_numeral2fifths(rn, global_minor=False)[source]#
Turn a Roman numeral into a TPC interval (e.g. for transposition purposes). Uses: split_scale_degree()
- ms3.utils.roman_numeral2semitones(rn, global_minor=False)[source]#
Turn a Roman numeral into a semitone distance from the root (0-11). Uses: split_scale_degree()
- ms3.utils.scale_degree2name(sd, localkey, globalkey)[source]#
For example, scale degree -1 (fifths, i.e. the subdominant) of the localkey of ‘VI’ within ‘E’ minor is ‘F’.
- Parameters
- Returns
The given scale degree, expressed as a note name.
- Return type
- ms3.utils.scan_directory(directory: str, file_re: str = '.*', folder_re: str = '.*', exclude_re: str = '^(\\.|_)', recursive: bool = True, subdirs: bool = False, progress: bool = False, exclude_files_only: bool = False, return_metadata: bool = False) Iterator[Union[str, Tuple[str, str]]] [source]#
Generator of filtered file paths in
directory
.- Parameters
directory – Directory to be scanned for files.
file_re – Regular expression for filtering file names. The regex is checked with search(), not match(), allowing for fuzzy search.
folder_re – Regular expression for filtering folder names. The regex is checked with search(), not match(), allowing for fuzzy search.
exclude_re – Exclude files and folders (unless
exclude_files_only=True
) containing this regular expression.recursive – By default, sub-directories are recursively scanned. Pass False to scan only
directory
.subdirs – By default, full file paths are returned. Pass True to return (path, name) tuples instead.
progress – Pass True to display the progress (useful for large directories).
exclude_files_only – By default,
exclude_re
excludes files and folders. Pass True to exclude only files matching the regex.return_metadata – If set to True, ‘metadata.tsv’ files are always yielded regardless of
file_re
.
- Yields
Full file path or, if
subdirs=True
, (path, file_name) pairs in random order.
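The filtering behavior can be sketched with os.walk; this is a simplified illustration (the name scan_dir and the reduced parameter set are not ms3's actual implementation):

```python
import os
import re


def scan_dir(directory, file_re='.*', exclude_re=r'^(\.|_)', recursive=True):
    """Yield paths of files whose names match file_re (checked with search, i.e. fuzzy)."""
    file_pat = re.compile(file_re)
    excl_pat = re.compile(exclude_re)
    for root, dirs, files in os.walk(directory):
        # Prune excluded sub-directories in place; clear them entirely if not recursive
        dirs[:] = [d for d in dirs if not excl_pat.search(d)] if recursive else []
        for name in files:
            if file_pat.search(name) and not excl_pat.search(name):
                yield os.path.join(root, name)
```

Mutating dirs in place is the idiomatic way of telling os.walk which sub-directories to descend into.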
- ms3.utils.column_order(df, first_cols=None, sort=True)[source]#
Sort DataFrame columns so that they start with the order of
first_cols
, followed by those not included.
- ms3.utils.sort_note_list(df, mc_col='mc', mc_onset_col='mc_onset', midi_col='midi', duration_col='duration')[source]#
Sort every measure (MC) by [‘mc_onset’, ‘midi’, ‘duration’] while leaving the order of grace notes (duration=0) intact.
- Parameters
df –
mc_col –
mc_onset_col –
midi_col –
duration_col –
- ms3.utils.sort_tpcs(tpcs, ascending=True, start=None)[source]#
- Sort tonal pitch classes by order on the piano.
Uses: fifths2pc()
- ms3.utils.split_alternatives(df, column='label', regex='-(?!(\\d|b+\\d|\\#+\\d))', max=2, inplace=False, alternatives_only=False)[source]#
Splits labels that come with an alternative separated by ‘-’ and adds the alternative in a new column. Only one alternative is taken into account. Pass inplace=True if df is to be mutated rather than copied.
- Parameters
df (
pandas.DataFrame
) – Dataframe where one column contains DCML chord labels.column (
str
, optional) – Name of the column that holds the harmony labels.regex (
str
, optional) – The regular expression (or simple string) that detects the character combination used to separate alternative annotations. By default, alternatives are separated by a ‘-’ that does not precede a scale degree such as ‘b6’ or ‘3’.max (
int
, optional) – Maximum number of admitted alternatives, defaults to 2.inplace (
bool
, optional) – Pass True if you want to mutatedf
.alternatives_only (
bool
, optional) – By default the alternatives are added to the original DataFrame (inplace
or not). Pass True if you just need the split alternatives.
Example
>>> import pandas as pd
>>> labels = pd.read_csv('labels.csv')
>>> split_alternatives(labels, inplace=True)
- ms3.utils.split_note_name(nn, count=False)[source]#
Splits a note name such as ‘Ab’ into accidentals and name.
- ms3.utils.split_scale_degree(sd, count=False)[source]#
Splits a scale degree such as ‘bbVI’ or ‘b6’ into accidentals and numeral.
- ms3.utils.transform(df, func, param2col=None, column_wise=False, **kwargs)[source]#
- Compute a function for every row of a DataFrame, using several cols as arguments.
The result is the same as using df.apply(lambda r: func(param1=r.col1, param2=r.col2…), axis=1) but it optimizes the procedure by precomputing func for all occurring parameter combinations. Uses: inspect.getfullargspec()
- Parameters
df (
pandas.DataFrame
orpandas.Series
) – Dataframe containing function parameters.func (
callable
) – The result of this function for every row will be returned.param2col (
dict
orlist
, optional) – Mapping from parameter names of func to column names. If you pass a list of column names, the columns’ values are passed as positional arguments. Pass None if you want to use all columns as positional arguments.column_wise (
bool
, optional) – Pass True if you want to mapfunc
to the elements of every column separately. This is simply an optimized version of df.apply(func) but allows for naming columns to use as function arguments. If param2col is None,func
is mapped to the elements of all columns, otherwise to all columns that are not named as parameters inparam2col
. In the case wherefunc
does not require a positional first element and you want to pass the elements of the various columns as keyword argument, give it as param2col={‘function_argument’: None}inplace (
bool
, optional) – Pass True if you want to mutatedf
rather than getting an altered copy.**kwargs (Other parameters passed to
func
.) –
- ms3.utils.adjacency_groups(S: Series, na_values: Optional[str] = 'group', prevent_merge: bool = False) Tuple[Series, Dict[int, Any]] [source]#
Turns a Series into a Series of ascending integers starting from 1 that reflect groups of successive equal values. There are several options of how to deal with NA values.
- Parameters
S – Series in which to group identical adjacent values with each other.
na_values –
‘group’ creates individual groups for NA values (default).
‘backfill’ or ‘bfill’ groups NA values with the subsequent group.
‘pad’ or ‘ffill’ groups NA values with the preceding group.
Any other string works like ‘group’, except that the groups will be named with this value.
Passing None means NA values and ranges are ignored, i.e. they will also be present in the output and the subsequent value will be based on the preceding value.
prevent_merge – By default, if you use the na_values argument to fill NA values, they might lead to two groups merging. Pass True to prevent this. For example, take the sequence [‘a’, NA, ‘a’] with
na_values='ffill'
: By default, it will be merged to one single group[1, 1, 1], {1: 'a'}
. However, passingprevent_merge=True
will result in[1, 1, 2], {1: 'a', 2: 'a'}
.
- Returns
A series with increasing integers that can be used for grouping. A dictionary mapping the integers to the grouped values.
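The default grouping (without the NA handling options) can be sketched with a standard pandas idiom; the helper name group_adjacent is illustrative:

```python
import pandas as pd


def group_adjacent(S: pd.Series) -> pd.Series:
    """Number runs of identical adjacent values with integers ascending from 1."""
    # A value differing from its predecessor marks the start of a new group
    return (S != S.shift()).cumsum()


groups = group_adjacent(pd.Series(['a', 'a', 'b', 'b', 'a']))
# groups.tolist() -> [1, 1, 2, 2, 3]
```

Since NaN != NaN evaluates to True, every NA value starts its own group here, similar to the na_values='group' default described above.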
- ms3.utils.unfold_measures_table(measures: DataFrame) Optional[DataFrame] [source]#
Returns a copy of a measures table that corresponds to the succession of MCs when playing all repeats. To distinguish between repeated MCs and MNs, it adds the continuous column ‘mc_playthrough’ (starting at 1) and the column ‘mn_playthrough’, which contains the values of ‘mn’ as strings with letters {‘a’, ‘b’, …} appended.
- Parameters
measures – Measures table with columns [‘mc’, ‘next’, ‘dont_count’]
- Returns
The unfolded copy of the measures table.
- ms3.utils.unfold_repeats(df: DataFrame, playthrough_info: Union[Series, DataFrame]) DataFrame [source]#
Uses a given succession of MCs, in which MCs may repeat, to reorder a DataFrame accordingly.
- Parameters
df – DataFrame needs to have the columns ‘mc’. If ‘mn’ is present, the column ‘mn’ will be added, too.
playthrough2mc – A Series of the format
{mc_playthrough: mc}
wheremc_playthrough
corresponds to a continuous MC count.
- Returns
A copy of the dataframe with the columns ‘mc_playthrough’ and ‘mn_playthrough’ (if ‘mn’ is present) inserted.
- ms3.utils.capture_parse_logs(logger_object: Logger, level: Union[str, int] = 'w') LogCapturer [source]#
Within the context, the given logger will have an additional handler that captures all messages with level
level
or higher. At the end of the context, retrieve the message list via LogCapturer.content_list.Example
with capture_parse_logs(logger, level='d') as capturer:
    ...  # do the stuff of which you want to capture the messages
all_messages = capturer.content_list
- ms3.utils.write_metadata(metadata_df: DataFrame, path: str, index=False) bool [source]#
Write the DataFrame
metadata_df
topath
, updating an existing file rather than overwriting it.- Parameters
metadata_df – DataFrame with one row per piece and an index of strings identifying pieces. The index is used for updating a potentially pre-existent file, from which the first column ∈ (‘fname’, ‘fnames’, ‘name’, ‘names’) will be used as index.
path – If a folder path is passed, the filename ‘metadata.tsv’ will be appended; a file path will be used as is, but a warning is thrown if the extension is not .tsv.
index – Pass True if you want the first column of the output to be a RangeIndex starting from 0.
- Returns
True if the metadata were successfully written, False otherwise.
- ms3.utils.enforce_fname_index_for_metadata(metadata_df: DataFrame, append=False) DataFrame [source]#
Returns a copy of the DataFrame that has an index level called ‘fname’.
- ms3.utils.write_markdown(metadata_df: DataFrame, file_path: str) None [source]#
Write a subset of the DataFrame
metadata_df
topath
in markdown format. If the file exists, it will be scanned for a line containing the string ‘# Overview’ and overwritten from that line onwards.- Parameters
metadata_df – DataFrame containing metadata.
file_path – Path of the markdown file.
- ms3.utils.write_tsv(df, file_path, pre_process=True, **kwargs)[source]#
Write a DataFrame to a TSV or CSV file based on the extension of ‘file_path’. Uses:
no_collections_no_booleans()
- Parameters
df (
pandas.DataFrame
) – DataFrame to write.file_path (
str
) – File to create or overwrite. If the extension is .tsv, the argument ‘sep’ will be set to the tab character ‘\t’; otherwise the extension is expected to be .csv and the default separator ‘,’ will be used.pre_process (
bool
, optional) – By default, DataFrame cells containing lists and tuples will be transformed to strings and Booleans will be converted to 0 and 1 (otherwise they will be written out as True and False). Pass False to prevent.kwargs – Additional keyword arguments will be passed on to
pandas.DataFrame.to_csv()
. Default arguments are
index=False
andsep='\t'
(assuming extension ‘.tsv’, see above).
- Return type
None
- ms3.utils.abs2rel_key(absolute: str, localkey: str, global_minor: bool = False) str [source]#
Expresses a Roman numeral as scale degree relative to a given localkey. The result changes depending on whether Roman numeral and localkey are interpreted within a global major or minor key.
Uses:
split_scale_degree()
- Parameters
absolute – Absolute key expressed as Roman scale degree of the local key.
localkey – The local key in terms of which
absolute
will be expressed.global_minor – Has to be set to True if absolute and localkey are scale degrees of a global minor key.
Examples
In a minor context, the key of II would appear within the key of vii as #III.
>>> abs2rel_key('iv', 'VI', global_minor=False)
'bvi'   # F minor expressed with respect to A major
>>> abs2rel_key('iv', 'vi', global_minor=False)
'vi'    # F minor expressed with respect to A minor
>>> abs2rel_key('iv', 'VI', global_minor=True)
'vi'    # F minor expressed with respect to Ab major
>>> abs2rel_key('iv', 'vi', global_minor=True)
'#vi'   # F minor expressed with respect to Ab minor
>>> abs2rel_key('VI', 'IV', global_minor=False)
'III'   # A major expressed with respect to F major
>>> abs2rel_key('VI', 'iv', global_minor=False)
'#III'  # A major expressed with respect to F minor
>>> abs2rel_key('VI', 'IV', global_minor=True)
'bIII'  # Ab major expressed with respect to F major
>>> abs2rel_key('VI', 'iv', global_minor=True)
'III'   # Ab major expressed with respect to F minor
- ms3.utils.rel2abs_key(relative: str, localkey: str, global_minor: bool = False)[source]#
Expresses a Roman numeral that is expressed relative to a localkey as scale degree of the global key. For local keys {III, iii, VI, vi, VII, vii} the result changes depending on whether the global key is major or minor.
Uses:
split_scale_degree()
- Parameters
relative – Relative key or chord expressed as Roman scale degree of the local key.
localkey – The local key to which rel is relative.
global_minor – Has to be set to True if localkey is a scale degree of a global minor key.
Examples
If the label viio6/VI appears in the context of the local key VI or vi, the absolute key to which viio6 applies depends on the global key. The comments express the examples in relation to global C major or C minor.
>>> rel2abs_key('vi', 'VI', global_minor=False)
'#iv'   # vi of A major = F# minor
>>> rel2abs_key('vi', 'vi', global_minor=False)
'iv'    # vi of A minor = F minor
>>> rel2abs_key('vi', 'VI', global_minor=True)
'iv'    # vi of Ab major = F minor
>>> rel2abs_key('vi', 'vi', global_minor=True)
'biv'   # vi of Ab minor = Fb minor
The same examples hold if you’re expressing in terms of the global key the root of a VI-chord within the local keys VI or vi.
- ms3.utils.make_interval_index_from_durations(df, position_col='quarterbeats', duration_col='duration_qb', closed='left', round=None, name='interval')[source]#
Given an annotations table with positions and durations, create an
pandas.IntervalIndex
. Returns None if any row is underspecified.- Parameters
df (
pandas.DataFrame
) – Annotation table containing the columns ofposition_col
(default: ‘quarterbeats’) andduration_col
(default: ‘duration_qb’).position_col (
str
, optional) – Name of the column containing positions, used as left boundaries.duration_col (
str
, optional) – Name of the column containing durations which will be added to the positions to obtain right boundaries.closed (
str
, optional) – ‘left’, ‘right’ or ‘both’ <- defining the interval boundariesround (
int
, optional) – To how many decimal places to round the intervals’ boundary values.name (
str
, optional) – Name of the created index. Defaults to ‘interval’.
- Returns
A copy of
df
with the original index replaced and underspecified rows removed (those where no interval could be computed).- Return type
- ms3.utils.replace_index_by_intervals(df, position_col='quarterbeats', duration_col='duration_qb', closed='left', filter_zero_duration=False, round=None, name='interval')[source]#
Given an annotations table with positions and durations, replaces its index with an
pandas.IntervalIndex
. Underspecified rows are removed.- Parameters
df (
pandas.DataFrame
) – Annotation table containing the columns ofposition_col
(default: ‘quarterbeats’) andduration_col
(default: ‘duration_qb’).position_col (
str
, optional) – Name of the column containing positions.duration_col (
str
, optional) – Name of the column containing durations.closed (
str
, optional) – ‘left’, ‘right’ or ‘both’ <- defining the interval boundariesfilter_zero_duration (
bool
, optional) – Defaults to False, meaning that rows with zero durations are maintained. Pass True to remove them.round (
int
, optional) – To how many decimal places to round the intervals’ boundary values.name (
str
, optional) – Name of the created index. Defaults to ‘interval’.
- Returns
A copy of
df
with the original index replaced and underspecified rows removed (those where no interval could be computed).- Return type
- ms3.utils.boolean_mode_col2strings(S) Series [source]#
Turn the boolean is_minor columns into string columns such that True => ‘minor’, False => ‘major’.
- ms3.utils.replace_boolean_mode_by_strings(df) DataFrame [source]#
Replaces boolean ‘_is_minor’ columns with string columns renamed to ‘_mode’. Example: df[‘some_col’, ‘some_name_is_minor’] => df[‘some_col’, ‘some_name_mode’]
- ms3.utils.resolve_relative_keys(relativeroot, minor=False)[source]#
Resolve nested relative keys, e.g. ‘V/V/V’ => ‘VI’.
Uses:
rel2abs_key()
,str_is_minor()
- ms3.utils.series_is_minor(S, is_name=True)[source]#
Returns boolean Series where every value in
S
representing a minor key/chord is True.
- ms3.utils.str_is_minor(tone, is_name=True)[source]#
Returns True if
tone
represents a minor key or chord.
- ms3.utils.transpose_changes(changes, old_num, new_num, old_minor=False, new_minor=False)[source]#
Since the interval sizes expressed by the changes of the DCML harmony syntax depend on the numeral’s position in the scale, these may change if the numeral is transposed. This function expresses the same changes for the new position. Chord tone alterations (of 3 and 5) stay untouched.
Uses:
changes2tpc()
- Parameters
changes (
str
) – A string of changes following the DCML harmony standard.old_num (
str
) – Old numeral.new_num (
str
) – New numeral.old_minor (
bool
, optional) – For each numeral, pass True if it occurs in a minor context.new_minor (
bool
, optional) – For each numeral, pass True if it occurs in a minor context.
- ms3.utils.features2tpcs(numeral, form=None, figbass=None, changes=None, relativeroot=None, key='C', minor=None, merge_tones=True, bass_only=False, mc=None)[source]#
Given the features of a chord label, this function returns the chord tones in the order of the inversion, starting from the bass note. The tones are expressed as tonal pitch classes, where -1=F, 0=C, 1=G etc.
Uses:
changes2list()
,name2fifths()
,resolve_relative_keys()
,roman_numeral2fifths()
,sort_tpcs()
,str_is_minor()
- Parameters
numeral (
str
) – Roman numeral of the chord’s root.form ({None, 'M', 'o', '+', '%'}, optional) – Indicates the chord type if not a major or minor triad (for which
form
is None). ‘%’ and ‘M’ can only occur as tetrads, not as triads.figbass ({None, '6', '64', '7', '65', '43', '2'}, optional) – Indicates chord’s inversion. Pass None for triad root position.
changes (
str
, optional) – Added steps such as ‘+6’ or suspensions such as ‘4’ or any combination such as (9+64). Numbers need to be in descending order.relativeroot (
str
, optional) – Pass a Roman scale degree if numeral is to be applied to a different scale degree of the local key, as in ‘V65/V’key (
str
orint
, optional) – The local key expressed as the root’s note name or a tonal pitch class. If it is a name and minor is None, uppercase means major and lowercase minor. If it is a tonal pitch class, minor needs to be specified.minor (
bool
, optional) – Pass True for minor and False for major. Can be omitted if key is a note name. This affects calculation of chords related to III, VI and VII.merge_tones (
bool
, optional) – Pass False if you want the function to return two tuples, one with (potentially suspended) chord tones and one with added notes.bass_only (
bool
, optional) – Return only the bass note instead of all chord tones.mc (int or str) – Pass measure count to display it in warnings.
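The tonal pitch classes returned here count steps on the line of fifths. A small helper (not part of ms3's public API, purely an illustration of the encoding) converts such a value back to a note name:

```python
def tpc2name_sketch(tpc: int) -> str:
    # Line of fifths: ... -2=Bb, -1=F, 0=C, 1=G, 2=D ...
    # The seven naturals repeat every 7 steps; each full cycle
    # up or down adds a sharp or a flat, respectively.
    naturals = "FCGDAEB"
    name = naturals[(tpc + 1) % 7]
    accidentals = (tpc + 1) // 7
    return name + ("#" * accidentals if accidentals > 0 else "b" * -accidentals)
```

For example, `tpc2name_sketch(0)` yields 'C', `tpc2name_sketch(-2)` yields 'Bb' and `tpc2name_sketch(6)` yields 'F#'.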
- ms3.utils.path2parent_corpus(path)[source]#
Walk up the path and return the name of the first superdirectory that is a git repository or contains a ‘metadata.tsv’ file.
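The lookup can be pictured as a simple upward walk (a sketch of the documented behaviour, not the actual implementation):

```python
import os

def path2parent_corpus_sketch(path: str):
    # Walk towards the filesystem root until a directory qualifies as a
    # corpus: it is a git repository (contains '.git') or contains a
    # 'metadata.tsv' file. Return that directory's name, or None.
    path = os.path.abspath(path)
    while True:
        if os.path.isdir(os.path.join(path, ".git")) or os.path.isfile(
            os.path.join(path, "metadata.tsv")
        ):
            return os.path.basename(path)
        parent = os.path.dirname(path)
        if parent == path:  # reached the root without finding a corpus
            return None
        path = parent
```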
- ms3.utils.chord2tpcs(chord, regex=None, **kwargs)[source]#
Split a chord label into its features and apply features2tpcs().
Uses: features2tpcs()
- Parameters
chord (
str
) – Chord label that can be split into the features [‘numeral’, ‘form’, ‘figbass’, ‘changes’, ‘relativeroot’].regex (
re.Pattern
, optional) – Compiled regex with named groups for the five features. By default, the current version of the DCML harmony annotation standard is used.**kwargs – arguments for features2tpcs (pass MC to show it in warnings!)
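Splitting a label with a named-group regex can be sketched like this (the pattern below is a heavily simplified stand-in for the DCML standard's actual regex):

```python
import re

# Simplified pattern with the five named groups chord2tpcs expects;
# the real DCML regex is considerably more elaborate.
DCML_SKETCH = re.compile(
    r"^(?P<numeral>[b#]*[IViv]+)"
    r"(?P<form>[%o+M])?"
    r"(?P<figbass>7|65|43|2|64|6)?"
    r"(?:\((?P<changes>[^)]+)\))?"
    r"(?:/(?P<relativeroot>.*))?$"
)

features = DCML_SKETCH.match("viio65(6b3)/V").groupdict()
# {'numeral': 'vii', 'form': 'o', 'figbass': '65',
#  'changes': '6b3', 'relativeroot': 'V'}
```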
- ms3.utils.ignored_warnings2dict(messages: Collection[str]) Dict[str, List[Tuple[int]]] [source]#
- Parameters
messages –
- Returns
{logger_name -> [ignored_warnings]} dict.
- ms3.utils.parse_ignored_warnings_file(path: str) Dict[str, List[Tuple[int, Tuple[int]]]] [source]#
Parse a file of log messages that are to be ignored into a dict. The expected structure of each message: warning_type (warning_type_id, *integers) file. Example of message: INCORRECT_VOLTA_MN_WARNING (2, 94) ms3.Parse.mixed_files.Did03M-Son_regina-1762-Sarti.mscx.MeasureList
- Parameters
path (
str
) – Path to the IGNORED_WARNINGS file.- Returns
{logger_name: [(message_id, label_of_message), (message_id, label_of_message), …]}.
- Return type
obj: dict
- ms3.utils.overlapping_chunk_per_interval(df: DataFrame, intervals: List[Interval], truncate: bool = True) Dict[Interval, DataFrame] [source]#
For each interval, create a chunk of the given DataFrame based on its IntervalIndex. This is an optimized algorithm compared to calling IntervalIndex.overlaps(interval) for each given interval, with the additional advantage that it will not discard rows where the interval is zero, such as [25.0, 25.0).
- Parameters
df (
pandas.DataFrame
) – The DataFrame is expected to come with an IntervalIndex and contain the columns ‘quarterbeats’ and ‘duration_qb’. Those can be obtained throughParse.get_lists(interval_index=True)
orParse.iter_transformed(interval_index=True)
.intervals (
list
ofpd.Interval
) – The intervals defining the chunks’ dimensions. Expected to be non-overlapping and monotonically increasing.truncate (
bool
, optional) – Defaults to True, meaning that the interval index and the ‘duration_qb’ will be adapted for overlapping intervals. Pass False to get chunks with all overlapping intervals as they are.
- Returns
{interval -> chunk}
- Return type
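The naive per-interval approach that this function optimizes away looks roughly like the following (toy data; note that filtering with `overlaps` would also drop zero-size query intervals such as [25.0, 25.0), which the library function keeps):

```python
import pandas as pd

# Toy events table indexed by [start, end) intervals in quarterbeats
iv = pd.IntervalIndex.from_tuples([(0.0, 2.0), (2.0, 4.0), (4.0, 8.0)], closed="left")
df = pd.DataFrame(
    {"quarterbeats": [0.0, 2.0, 4.0], "duration_qb": [2.0, 2.0, 4.0]}, index=iv
)

# One boolean scan over the whole index per query interval --
# exactly the repeated work that the optimized algorithm avoids.
query = pd.Interval(1.0, 3.0, closed="left")
chunk = df[df.index.overlaps(query)]
```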
- ms3.utils.infer_tsv_type(df: DataFrame) Optional[str] [source]#
Infers the contents of a DataFrame from the presence of particular columns.
- ms3.utils.reduce_dataframe_duration_to_first_row(df: DataFrame) DataFrame [source]#
Reduces a DataFrame to its first row and updates the duration_qb column to reflect the reduced duration.
- Parameters
df – Dataframe of which to keep only the first row. If it has an IntervalIndex, the interval is updated to reflect the whole duration.
- Returns
DataFrame with one row.
- class ms3.utils.File(ix: int, type: str, file: str, fname: str, fext: str, subdir: str, corpus_path: str, rel_path: str, full_path: str, directory: str, suffix: str, commit_sha: str = '')[source]#
Storing path and file name information for one file.
- corpus_path: str#
Absolute path of the file’s parent directory that is considered as corpus directory.
- ms3.utils.ask_user_to_choose(query: str, choices: Collection[Any]) Optional[Any] [source]#
Ask user to input an integer and return the nth choice selected by the user.
- ms3.utils.disambiguate_files(files: Collection[File], fname: str, file_type: str, choose: Literal['auto', 'ask'] = 'auto') Optional[File] [source]#
Receives a collection of
File
with the aim to pick one of them. First, a dictionary is created where the keys are disambiguation strings based on the files’ paths and suffixes.- Parameters
files –
choose – If ‘auto’ (default), the file with the shortest disambiguation string is chosen. Pass ‘ask’ if you want to be asked to manually choose a file.
- Returns
The selected file.
- ms3.utils.files2disambiguation_dict(files: Collection[File], include_disambiguator: bool = False) Dict[str, File] [source]#
Takes a list of
File
and returns a dictionary with distinct disambiguating strings as keys, based on path components and suffixes, to distinguish files pertaining to the same type.
- ms3.utils.literal_type2tuple(typ: TypeVar) Tuple[str] [source]#
Turns the first Literal included in the TypeVar into a tuple of its values. The first literal value needs to be a string, otherwise the function may lead to unexpected behaviour.
- ms3.utils.argument_and_literal_type2list(argument: Union[str, Tuple[str], Literal[None]], typ: Optional[Union[TypeVar, Tuple[str]]] = None, none_means_all: bool = True) Optional[List[str]] [source]#
Makes sure that an input value is a list of strings and that all strings are valid w.r.t. to the type’s expected literal values (strings).
- Parameters
argument – If string, wrapped in a list, otherwise expected to be a tuple of strings (passing a list will fail). If None, a list of all possible values according to the type is returned if none_means_all.
typ – A typing.Literal declaration or a TypeVar where the first component is one, or a tuple of allowed values. All allowed values should be strings.
none_means_all – By default, None values are replaced with all allowed values, if specified. Pass False to return None in this case.
- Returns
The list of accepted strings. The list of rejected strings.
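The underlying mechanics rely on typing introspection; a sketch of how literal values can be extracted and an argument validated against them (the function name and the example Literal are hypothetical):

```python
from typing import Literal, get_args

Facet = Literal["scores", "measures", "notes"]

def validate_sketch(argument, typ=Facet, none_means_all=True):
    # None -> all allowed values (if none_means_all); a single string is
    # wrapped in a list; everything else is filtered against the Literal.
    allowed = list(get_args(typ))
    if argument is None:
        return allowed if none_means_all else None
    if isinstance(argument, str):
        argument = [argument]
    return [a for a in argument if a in allowed]
```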
- ms3.utils.resolve_facets_param(facets, facet_type_var: TypeVar = typing.Literal['scores', 'measures', 'notes', 'rests', 'notes_and_rests', 'labels', 'expanded', 'form_labels', 'cadences', 'events', 'chords', 'unknown'], none_means_all=True)[source]#
Like
argument_and_literal_type2list()
, but also resolves ‘tsv’ to all non-score facets.
- ms3.utils.unpack_json_paths(paths: Collection[str]) None [source]#
Mutates the list with paths by replacing .json files with the list (of paths) contained in them.
- ms3.utils.resolve_paths_argument(paths: Union[str, Collection[str]], files: bool = True) List[str] [source]#
Makes sure that the given path(s) exist(s) and filters out those that don’t.
- Parameters
paths – One or several paths given as strings.
files – By default, only file paths are returned. Set to False to return only folders.
Returns:
- ms3.utils.compute_path_from_file(file: File, root_dir: Optional[str] = None, folder: Optional[str] = None) str [source]#
Constructs a path based on the arguments.
- Args:
file: This function uses the fields corpus_path, subdir, and type. root_dir:
Defaults to None, meaning that the path is constructed based on the corpus_path. Pass a directory to construct the path relative to it instead. If
folder
is an absolute path,root_dir
is ignored.- folder:
If
folder
is None (default), the files’ type will be appended to theroot_dir
.If
folder
is an absolute path,root_dir
will be ignored.If
folder
is a relative path starting with a dot, the relative path is appended to the file’s subdir. For example, ``../notes`` will resolve to a sibling directory of the one where the
file
is located.If
folder
is a relative path that does not begin with a dot, it will be appended to the
root_dir
.If
folder
== ‘’ (empty string), the result will be root_dir.
- Returns:
The constructed directory path.
- ms3.utils.make_file_path(file: File, root_dir=None, folder: Optional[str] = None, suffix: str = '', fext: str = '.tsv')[source]#
Constructs a file path based on the arguments.
- Args:
file: This function uses the fields fname, corpus_path, subdir, and type. root_dir:
Defaults to None, meaning that the path is constructed based on the corpus_path. Pass a directory to construct the path relative to it instead. If
folder
is an absolute path,root_dir
is ignored.- folder:
Different behaviours are available. Note that only the third option ensures that file paths are distinct for files that have identical fnames but are located in different subdirectories of the same corpus. * If
folder
is None (default), the files’ type will be appended to the
root_dir
. * If
folder
is an absolute path,
root_dir
will be ignored. * If
folder
is a relative path starting with a dot, the relative path is appended to the file’s subdir. For example, ``../notes`` will resolve to a sibling directory of the one where the
file
is located. * If
folder
is a relative path that does not begin with a dot, it will be appended to the
root_dir
.
suffix: String to append to the file’s fname. fext: File extension to append to the (fname+suffix). Defaults to
.tsv
.- Returns:
The constructed file path.
- ms3.utils.string2identifier(s: str, remove_leading_underscore: bool = True) str [source]#
Transform a string in a way that it can be used as identifier (variable or attribute name). Solution by Kenan Banks on https://stackoverflow.com/a/3303361
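The referenced StackOverflow solution boils down to two regex substitutions; a sketch (the function name mirrors the documented one but is a re-implementation, not ms3's code):

```python
import re

def string2identifier_sketch(s: str, remove_leading_underscore: bool = True) -> str:
    # Replace every character that is invalid in an identifier with '_',
    # then strip leading characters that cannot start one (digits etc.).
    s = re.sub(r"[^0-9a-zA-Z_]", "_", s)
    s = re.sub(r"^[^a-zA-Z_]+", "", s)
    if remove_leading_underscore:
        s = s.lstrip("_")
    return s
```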
- ms3.utils.parse_tsv_file_at_git_revision(file: File, git_revision: str, repo_path: Optional[str] = None) Tuple[Optional[File], Optional[DataFrame]] [source]#
Pass a File object of a TSV file and an identifier for a git revision to retrieve the parsed TSV file at that commit. The file needs to have existed at the revision in question.
- Parameters
file –
git_revision –
repo_path –
Returns:
Transformations#
Functions for transforming DataFrames as output by ms3.
- ms3.transformations.make_note_name_and_octave_columns(notes: DataFrame, staff2drums: Optional[Dict[int, Union[dict, DataFrame, Series]]] = None) Tuple[Series, Series] [source]#
Takes a notelist and optionally a {staff -> {midi_pitch -> ‘instrument_name’}} mapping and returns two columns named ‘name’ and ‘octave’.
- ms3.transformations.add_quarterbeats_col(df: DataFrame, offset_dict: Union[Series, dict], interval_index: bool = False) DataFrame [source]#
- Insert a column measuring the distance of events from MC 1 in quarter notes. If no ‘mc_onset’ column is present,
the column corresponds to the values given in the offset_dict.
- Parameters
df (
pandas.DataFrame
) – DataFrame with anmc_playthrough
and anmc_onset
column.offset_dict (
pandas.Series
ordict
, optional) –If unfolded: {mc_playthrough -> offset}Otherwise: {mc -> offset}You can create the dict using the functionParse.get_continuous_offsets()
It is not required if the column ‘quarterbeats’ exists already.interval_index (
bool
, optional) – Defaults to False. Pass True to replace the index with anpandas.IntervalIndex
(depends on the successful creation of the columnduration_qb
).
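The core computation can be sketched as mapping each measure count to its quarterbeat offset and adding the onset, converted from fractions of a whole note to quarters (toy data, not the library's internals):

```python
import pandas as pd
from fractions import Fraction

# Toy events: measure counts and onsets expressed as fractions of a whole note
df = pd.DataFrame({"mc": [1, 1, 2], "mc_onset": [Fraction(0), Fraction(1, 4), Fraction(0)]})
offset_dict = {1: 0.0, 2: 4.0}  # quarterbeat offset of each measure's downbeat

# whole-note fractions * 4 = quarter notes
df["quarterbeats"] = df["mc"].map(offset_dict) + df["mc_onset"] * 4
```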
- ms3.transformations.add_weighted_grace_durations(notes, weight=0.5)[source]#
For a given notes table, change the ‘duration’ value of all grace notes, weighting it by
weight
.- Parameters
notes (
pandas.DataFrame
) – Notes table containing the columns ‘duration’, ‘nominal_duration’, ‘scalar’weight (
Fraction
orfloat
) – Value by which to weight duration of all grace notes. Defaults to a half.
- Returns
Copy of
notes
with altered duration values.- Return type
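Conceptually the operation is a masked multiplication; a sketch on a toy notes table, assuming grace notes can be recognized by a non-null marker column (the real notes table's layout may differ):

```python
import pandas as pd
from fractions import Fraction

notes = pd.DataFrame({
    "duration": [Fraction(1, 8), Fraction(1, 8), Fraction(1, 4)],
    "gracenote": ["appoggiatura", None, None],  # hypothetical marker column
})

weighted = notes.copy()
is_grace = weighted["gracenote"].notna()
# Weight only the grace notes' durations, leaving all other rows untouched.
weighted.loc[is_grace, "duration"] *= Fraction(1, 2)
```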
- ms3.transformations.compute_chord_tones(df, bass_only=False, expand=False, cols={})[source]#
Compute the chord tones for DCML harmony labels. They are returned as lists of tonal pitch classes in close position, starting with the bass note. The tonal pitch classes represent intervals relative to the local tonic:
-2: second below tonic, -1: fifth below tonic, 0: tonic, 1: fifth above tonic, 2: second above tonic, etc.
The labels need to have undergone
split_labels()
andpropagate_keys()
. Pedal points are not taken into account.Uses:
features2tpcs()
- Parameters
df (
pandas.DataFrame
) – Dataframe containing DCML chord labels that have been split by split_labels() and where the keys have been propagated using propagate_keys(add_bool=True).bass_only (
bool
, optional) – Pass True if you need only the bass note.expand (
bool
, optional) – Pass True if you need chord tones and added tones in separate columns.cols (
dict
, optional) –In case the column names for
['mc', 'numeral', 'form', 'figbass', 'changes', 'relativeroot', 'localkey', 'globalkey']
deviate, pass a dict, such as{'mc': 'mc', 'numeral': 'numeral_col_name', 'form': 'form_col_name', 'figbass': 'figbass_col_name', 'changes': 'changes_col_name', 'relativeroot': 'relativeroot_col_name', 'localkey': 'localkey_col_name', 'globalkey': 'globalkey_col_name'}
You may also deactivate columns by setting them to None, e.g. {‘changes’: None}
- Returns
For every row of df one tuple with chord tones, expressed as tonal pitch classes. If expand is True, the function returns a DataFrame with four columns: Two with tuples for chord tones and added tones, one with the chord root, and one with the bass note.
- Return type
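Since chord tones are expressed as stacks of fifths relative to the local tonic, a triad's tones can be sketched as fixed fifth-offsets from its root (illustrative constants and function, not ms3's internals):

```python
# Fifth-offsets of triad tones relative to the root on the line of fifths
MAJOR_TRIAD = (0, 4, 1)   # root, major third (+4 fifths), perfect fifth (+1)
MINOR_TRIAD = (0, -3, 1)  # root, minor third (-3 fifths), perfect fifth (+1)

def triad_tpcs_sketch(root: int, minor: bool = False):
    intervals = MINOR_TRIAD if minor else MAJOR_TRIAD
    return tuple(root + iv for iv in intervals)
```

For example, `triad_tpcs_sketch(0)` yields the tonic major triad (0, 4, 1), while `triad_tpcs_sketch(1, minor=True)` yields (1, -2, 2), the minor triad on the fifth above the tonic.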
- ms3.transformations.dfs2quarterbeats(dfs: Union[DataFrame, List[DataFrame]], measures: DataFrame, unfold=False, quarterbeats=True, interval_index=True) List[DataFrame] [source]#
Pass one or several DataFrames and one measures table to unfold repeats and/or add quarterbeats columns and/or index.
- Parameters
dfs – DataFrame(s) that are to be unfolded and/or receive quarterbeats.
measures –
unfold –
quarterbeats –
interval_index –
- Returns
Altered copies of dfs.
- ms3.transformations.get_chord_sequences(at, major_minor=True, level=None, column='chord')[source]#
Transforms an annotation table into lists of chord symbols for n-gram analysis. If your table represents several pieces, make sure to pass the groupby parameter
level
to avoid including nonexistent transitions.- Parameters
at (
pandas.DataFrame
) – Annotation table.major_minor (
bool
, optional) –Defaults to True: the length of the chord sequences corresponds to localkey segments. The result comes as dict of dicts.If you pass False, chord sequences are returned as they are, potentially including incorrect transitions, e.g., when the localkey changes. The result comes as list of lists, where the sublists result from the groupby if you specifiedlevel
.level (
int
orlist
) – Argument passed topandas.DataFrame.groupby()
. Defaults to -1, resulting in a GroupBy by all levels except the last. Conversely, you can pass, for instance, 2 to group by the first two levels.column (
str
) – Name of the column containing the chord symbols that compose the sequences.
- Returns
- Return type
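Once the sequences are extracted, n-gram analysis typically proceeds by counting adjacent chord pairs; a minimal follow-up sketch with made-up data:

```python
from collections import Counter

# One chord sequence, e.g. for a single localkey segment
sequence = ["I", "V", "I", "IV", "V", "I"]

# Bigram counts over adjacent chord transitions
bigrams = Counter(zip(sequence, sequence[1:]))
```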
- ms3.transformations.group_annotations_by_features(at, features='numeral')[source]#
Drop exact repetitions of one or several feature columns when occurring under the same localkey (and pedal point). For example, pass
features = ['numeral', 'form', 'figbass']
to drop rows where all three features are identical with the previous row _and_ the localkey stays the same. If the columnduration_qb
is present, it is updated with the new durations, as would be the IntervalIndex if there is one. Uses: nan_eq()- Parameters
at (
pandas.DataFrame
) – Annotation tablefeatures (
str
orlist
) – Feature or feature combination for which to remove immediate repetitionsdropna (
bool
) – Also subsumes rows for which allfeatures
are NaN, rather than treating them as a new value.
- Return type
Example
>>> df
              quarterbeats  duration_qb  localkey  chord          numeral  form  figbass  changes  relativeroot
[37.5, 38.5)  75/2          1.0          I         viio65(6b3)/V  vii      o     65       6b3      V
[38.5, 40.5)  77/2          2.0          I         Ger            vii      o     65       b3       V
[40.5, 41.5)  81/2          1.0          I         V(7v4)         V                       7v4
[41.5, 43.5)  83/2          2.0          I         V(64)          V                       64
[43.5, 44.5)  87/2          1.0          I         V7(9)          V              7        9
[44.5, 46.5)  89/2          2.0          I         V7             V              7
[46.5, 48.0)  93/2          1.5          I         I              I
>>> group_annotations_by_features(df)
              quarterbeats  duration_qb  localkey  relativeroot  numeral  chord
[37.5, 40.5)  75/2          3.0          I         V             vii      vii/V
[40.5, 46.5)  81/2          6.0          I         NaN           V        V
[46.5, 48.0)  93/2          1.5          I         NaN           I        I
- ms3.transformations.labels2global_tonic(df, cols={}, inplace=False)[source]#
Transposes all numerals to their position in the global major or minor scale. This eliminates localkeys and relativeroots. The resulting chords are defined by [numeral, figbass, changes, globalkey_is_minor] (and pedal).
Uses:
transform()
,rel2abs_key()
,resolve_relative_keys()
,str_is_minor()
,transpose_changes()
,series_is_minor()
,- Parameters
df (
pandas.DataFrame
) – Dataframe containing DCML chord labels that have been split by split_labels() and where the keys have been propagated using propagate_keys(add_bool=True).cols (
dict
, optional) –In case the column names for
['numeral', 'form', 'figbass', 'changes', 'relativeroot', 'localkey', 'globalkey']
deviate, pass a dict, such as{'chord': 'chord_col_name', 'pedal': 'pedal_col_name', 'numeral': 'numeral_col_name', 'form': 'form_col_name', 'figbass': 'figbass_col_name', 'changes': 'changes_col_name', 'relativeroot': 'relativeroot_col_name', 'localkey': 'localkey_col_name', 'globalkey': 'globalkey_col_name'}
inplace (
bool
, optional) – Pass True if you want to mutate the input.
- Returns
If inplace=False, the relevant features of the transposed chords are returned. Otherwise, the original DataFrame is mutated.
- Return type
- ms3.transformations.make_gantt_data(at, last_mn=None, relativeroots=True, mode_agnostic_adjacency=True)[source]#
Takes an expanded DCML annotation table and returns a DataFrame with timings of the included key segments, based on the column
localkey
. The column names are suited for the plotly library. Uses: rel2abs_key, resolve_relative_keys, roman_numeral2fifths, roman_numerals2semitones, labels2global_tonic- Parameters
at (
pandas.DataFrame
) – Expanded DCML annotation table.last_mn (
int
, optional) – By default, the columnquarterbeats
is used for computing Start and Finish unless the column is not present, in which case a continuous version of measure numbers (MN) is used. In the latter case you should pass the last measure number of the piece in order to calculate the correct duration of the last key segment; otherwise it will go until the end of the last label’s MN. As soon as you pass a value, the columnquarterbeats
is ignored even if present. If you want to ignore it but don’t know the last MN, pass -1.relativeroots (
bool
, optional) – By default, additional rows are added based on the columnrelativeroot
. Pass False to prevent that.mode_agnostic_adjacency (
bool
, optional) – By default (ifrelativeroots
is True), additional rows are added for labels adjacent to temporarily tonicized roots, no matter if the mode is identical or not. For example, before and after a V/V, all V _and_ v labels will be grouped as adjacent segments. Pass False to group only labels with the same mode (only V labels in the example), or None to include no adjacency at all.
- ms3.transformations.notes2pcvs(notes, pitch_class_format='tpc', normalize=False, long=False, fillna=True, additional_group_cols=None, ensure_columns=None)[source]#
- Parameters
notes (
pandas.DataFrame
) – Note table to be transformed into a wide or long table of Pitch Class Vectors by grouping via the first (or only) index level. The DataFrame needs containing at least the columns ‘duration_qb’ and ‘tpc’ or ‘midi’, depending onpitch_class_format
.pitch_class_format (
str
, optional) –Defines the type of pitch classes to use for the vectors.’tpc’ (default): tonal pitch class, such that -1=F, 0=C, 1=G etc.’name’: tonal pitch class as spelled pitch, e.g. ‘C’, ‘F#’, ‘Abb’ etc.’pc’: chromatic pitch classes where 0=C, 1=C#/Db, … 11=B/Cb.’midi’: original MIDI numbers; the result are pitch vectors, not pitch class vectors.normalize (
bool
, optional) – By default, the PCVs contain absolute durations in quarter notes. Pass True to normalize the PCV for each group.long (
bool
, optional) – By default, the resulting DataFrames have wide format, i.e. each row contains the PCV for one slice. Pass True if you need long format instead, i.e. with a non-uniquepandas.IntervalIndex
and two columns,[('tpc'|'midi'), 'duration_qb']
where the first column’s name depends onpitch_class_format
.fillna (
bool
, optional) – By default, if a Pitch class does not appear in a PCV, its value will be 0. Pass False if you want NA instead.additional_group_cols ((
list
of)str
) – If you would like to maintain some information from other columns ofnotes
in additional index levels, pass their names.ensure_columns (
Iterable
, optional) – By default, pitch classes that don’t appear don’t get a column. Pass a value if you want to ensure the presence of particular columns, even if empty. For example, ifpitch_class_format='pc'
you could passensure_columns=range(12)
.
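The essence of a pitch class vector is summing duration_qb per pitch class; a sketch of the aggregation on toy data (normalization shown as what `normalize=True` amounts to):

```python
import pandas as pd

# Toy notes: tonal pitch classes and durations in quarter notes
notes = pd.DataFrame({"tpc": [0, 1, 0, 4], "duration_qb": [1.0, 0.5, 2.0, 0.5]})

pcv = notes.groupby("tpc")["duration_qb"].sum()  # absolute durations per pitch class
normalized = pcv / pcv.sum()                     # proportions summing to 1
```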
- ms3.transformations.resolve_all_relative_numerals(at, additional_columns=None, inplace=False)[source]#
Resolves Roman numerals that include slash notation such as ‘#vii/ii’ => ‘#i’ or ‘V/V/V’ => ‘VI’ in a major and ‘#VI’ in a minor key. The function expects the columns [‘globalkey_is_minor’, ‘localkey_is_minor’] to be present. The former is necessary only if the column ‘localkey’ is present and needs resolving. Execution will be slightly faster if performed on the entire DataFrame rather than using
transform_multiple()
.- Parameters
at (
pandas.DataFrame
) – Annotation table.additional_columns (
str
orlist
) – By default, the function resolves, if present, the columns [‘relativeroot’, ‘pedal’] but here you can name other columns, too. They will be resolved based on the localkey’s mode.inplace (
bool
, optional) – By default, a manipulated copy ofat
is returned. Pass True to mutate instead.
- ms3.transformations.segment_by_adjacency_groups(df, cols, na_values='group', group_keys=False)[source]#
Drop exact adjacent repetitions within one or a combination of several feature columns and adapt the IntervalIndex and the column ‘duration_qb’ accordingly. Uses:
adjacency_groups()
,reduce_dataframe_duration_to_first_row()
- Parameters
df (
pandas.DataFrame
) – DataFrame to be reduced, expected to contain the columnduration_qb
. In order to use the result as a segmentation, it should have apandas.IntervalIndex
.cols (
list
) – Feature columns which exact, adjacent repetitions should be grouped to a segment, keeping only the first row.na_values ((
list
of)str
orAny
, optional) –Either pass a list of equal length ascols
or a single value that is passed toadjacency_groups()
for each. Not dealing with NA values will lead to wrongly grouped segments. The default option is the safest.’group’ creates individual groups for NA values’backfill’ or ‘bfill’ groups NA values with the subsequent group’pad’, ‘ffill’ groups NA values with the preceding groupAny other value works like ‘group’, with the difference that the created groups will be named with this value.group_keys (
bool
, optional) – By default, the grouped values will be returned as an appended MultiIndex, differentiating groups via ascending integers. If you want to duplicate the columns’ value, e.g. to account for a custom filling value for
, pass True. Beware that this most often results in non-unique index levels.
- Returns
Reduced DataFrame with updated ‘duration_qb’ column and
pandas.IntervalIndex
on the first level (if present).- Return type
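Adjacency grouping itself is what itertools.groupby does; a sketch of collapsing adjacent repetitions while summing their durations (toy data, ignoring the NA handling and index mechanics of the real function):

```python
from itertools import groupby

values = ["I", "I", "V", "V", "V", "I"]
durations = [1.0, 1.0, 2.0, 1.0, 1.0, 2.0]

segments = []
pos = 0
for key, grp in groupby(values):
    n = sum(1 for _ in grp)
    # keep the group's value, aggregate the durations of its rows
    segments.append((key, sum(durations[pos:pos + n])))
    pos += n
```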
- ms3.transformations.segment_by_criterion(df: DataFrame, boolean_mask: Union[Series, array], warn_na: bool = False) DataFrame [source]#
Drop all rows where the boolean mask does not match and adapt the IntervalIndex and the column ‘duration_qb’ accordingly.
- Parameters
df – DataFrame to be reduced, expected to come with the column
duration_qb
and anpandas.IntervalIndex
.boolean_mask – Boolean mask where every True value starts a new segment.
warn_na – If the boolean mask starts with any number of False, this first group will be missing from the result. Set warn_na to True if you want the logger to throw a warning in this case.
- Returns
Reduced DataFrame with updated ‘duration_qb’ column and
pandas.IntervalIndex
on the first level.
- ms3.transformations.segment_by_interval_index(df, idx, truncate=True)[source]#
Segment a DataFrame into chunks based on a given IntervalIndex.
- Parameters
df (
pandas.DataFrame
) – DataFrame that has apandas.IntervalIndex
to allow for its segmentation.idx (
pandas.IntervalIndex
orpandas.MultiIndex
) – Intervals by which to segmentdf
. The index will be prepended to differentiate between segments. Ifidx
is apandas.MultiIndex
, the first level is expected to be apandas.IntervalIndex
.truncate (
bool
, optional) – By default, the intervals of the segmented DataFrame will be cut off at segment boundaries and the event’s ‘duration_qb’ will be adapted accordingly. Pass False to prevent that and duplicate overlapping events without adapting their Intervals and ‘duration_qb’.
- Returns
A copy of
df
where the index levelsidx
have been prepended and only rows ofdf
with overlapping intervals are included.- Return type
- ms3.transformations.slice_df(df: DataFrame, quarters_per_slice: Optional[float] = None) Dict[Interval, DataFrame] [source]#
Returns a sliced version of the DataFrame. Slices appear in the IntervalIndex and the contained event’s durations within the slice are shown in the column ‘duration_qb’. Uses:
- Parameters
df (
pandas.DataFrame
) – The DataFrame is expected to come with an IntervalIndex and contain the columns ‘quarterbeats’ and ‘duration_qb’. Those can be obtained throughParse.get_lists(interval_index=True)
orParse.iter_transformed(interval_index=True)
.quarters_per_slice (
float
, optional) – By default, the slices have variable size, from onset to onset. If you pass a value, the slices will have that constant size, measured in quarter notes. For example, pass 1.0 for all slices to have size 1 quarter.
- Return type
- ms3.transformations.transform_multiple(df, func, level=-1, **kwargs)[source]#
Applies transformation(s) separately to concatenated pieces that can be differentiated by index level(s).
- Parameters
df (
pandas.DataFrame
) – Concatenated tables withpandas.MultiIndex
.func (
Callable
orstr
) – Function to be applied to the individual tables. For convenience, you can pass strings to call the standard transformers for a particular table type. For example, pass ‘annotations’ to calltransform_annotations
.level (
int
orlist
) – Argument passed topandas.DataFrame.groupby()
. Defaults to -1, resulting in a GroupBy by all levels except the last. Conversely, you can pass, for instance, 2 to group by the first two levels.kwargs – Keyword arguments passed to
func
.
- Return type
- ms3.transformations.transform_annotations(at, groupby_features=None, resolve_relative=False)[source]#
Wrapper for applying several transformations to an annotation table.
- Parameters
at (
pandas.DataFrame
) – Annotation table corresponding to a single piece.groupby_features (
str
orlist
) – Argumentfeatures
passed togroup_annotations_by_features()
.resolve_relative (
bool
) – Resolves slash notation (e.g. ‘vii/V’) from Roman numerals in the columns [‘localkey’, ‘relativeroot’, ‘pedal’].
- Return type
- ms3.transformations.transpose_notes_to_localkey(notes)[source]#
Transpose the columns ‘tpc’ and ‘midi’ such that they reflect the local key as if it was C major/minor. This operation is typically required for creating pitch class profiles. Uses:
transform()
,name2fifths()
,roman_numeral2fifths()
- Parameters
notes (
pandas.DataFrame
) – DataFrame that has at least the columns [‘globalkey’, ‘localkey’, ‘tpc’, ‘midi’].- Returns
A copy of
notes
where the columns ‘tpc’ and ‘midi’ are shifted in such a way that tpc=0 and midi=60 match the local tonic (e.g. for the local key A major/minor, each pitch A will have tpc=0 and midi % 12 = 0).- Return type
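The transposition amounts to subtracting the local tonic's position on the line of fifths from each tpc; a sketch (the helper name is hypothetical and covers only simple note names):

```python
def name2fifths_sketch(name: str) -> int:
    # Position of a note name on the line of fifths, C = 0;
    # each '#' adds 7, each 'b' subtracts 7.
    base = {"F": -1, "C": 0, "G": 1, "D": 2, "A": 3, "E": 4, "B": 5}
    return base[name[0].upper()] + 7 * (name.count("#") - name.count("b"))

# Local key A (major or minor): its tonic sits at +3 on the line of fifths,
# so every tpc is shifted down by 3 and the pitch A ends up at tpc=0.
tonic = name2fifths_sketch("A")
transposed_tpc = 3 - tonic  # tpc of the note A after transposition
```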
- ms3.transformations.transform_columns(df, func, columns=None, param2col=None, inplace=False, **kwargs)[source]#
Wrapper function to use transform() on df[columns], leaving the other columns untouched.
- Parameters
df (
pandas.DataFrame
) – DataFrame where columns (or column combinations) work as function arguments.func (
callable
) – Function you want to apply to all elements in columns.columns (
list
) – Columns to which you want to apply func.param2col (
dict
orlist
, optional) – Mapping from parameter names of func to column names. If you pass a list of column names, the columns’ values are passed as positional arguments. Pass None if you want to use all columns as positional arguments.inplace (
bool
, optional) – Pass True if you want to mutate df rather than getting an altered copy.**kwargs (keyword arguments for transform()) –
- ms3.transformations.transform_note_columns(df, to, note_cols=['chord_tones', 'added_tones', 'bass_note', 'root'], minor_col='localkey_is_minor', inplace=False, **kwargs)[source]#
Turns columns with line-of-fifth tonal pitch classes into another representation.
Uses: transform_columns()
- Parameters
  - df (pandas.DataFrame) – DataFrame where columns (or column combinations) work as function arguments.
  - to ({'name', 'iv', 'pc', 'sd', 'rn'}) – The tone representation that you want to get from the note_cols:
    - ‘name’: Note names. Should only be used if the stacked fifths actually represent absolute tonal pitch classes rather than intervals over the local tonic. In other words, make sure to use ‘name’ only if 0 means C rather than I.
    - ‘iv’: Intervals such that 0 = ‘P1’, 1 = ‘P5’, 4 = ‘M3’, -3 = ‘m3’, 6 = ‘A4’, -6 = ‘D5’ etc.
    - ‘pc’: (Relative) chromatic pitch class, or distance from tonic in semitones.
    - ‘sd’: Scale degrees such that 0 = ‘1’, -1 = ‘4’, -2 = ‘b7’ in major, ‘7’ in minor etc. This representation requires a boolean column minor_col which is True in those rows where the stacks of fifths occur in a local minor context and False for the others. Alternatively, if all pitches are in the same mode or you simply want to express them as degrees of a particular mode, you can pass the boolean keyword argument minor.
    - ‘rn’: Roman numerals such that 0 = ‘I’, -2 = ‘bVII’ in major, ‘VII’ in minor etc. Requires boolean ‘minor’ values, see ‘sd’.
  - note_cols (list, optional) – List of columns that hold integers or collections of integers representing stacks of fifths (0 = tonal center, 1 = fifth above, -1 = fourth above, etc.).
  - minor_col (str, optional) – If to is ‘sd’ or ‘rn’, specify a boolean column where the value is True in those rows where the stacks of fifths occur in a local minor context and False for the others.
  - **kwargs – Keyword arguments passed on to transform().
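To make the line-of-fifths encoding concrete, here is a self-contained sketch of the ‘name’ conversion described above, assuming 0 = C (illustration only, not ms3's code):

```python
def fifths_to_name(fifths: int) -> str:
    """Convert a line-of-fifths tonal pitch class into a note name,
    assuming 0 = C: ..., -2 = Bb, -1 = F, 0 = C, 1 = G, 2 = D, ..."""
    steps = ("F", "C", "G", "D", "A", "E", "B")
    letter = steps[(fifths + 1) % 7]
    # every full cycle of 7 fifths adds one sharp (or one flat, if negative)
    accidentals = (fifths + 1) // 7
    return letter + ("#" * accidentals if accidentals >= 0 else "b" * -accidentals)

[fifths_to_name(f) for f in (0, 1, -1, 6, -6)]
# → ['C', 'G', 'F', 'F#', 'Gb']
```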
- ms3.transformations.transpose_chord_tones_by_localkey(df, by_global=False)[source]#
- Returns a copy of the expanded table where the scale degrees in the chord tone columns have been transposed by localkey (i.e., they express all chord tones as scale degrees of the globalkey) or, if by_global is set to True, additionally by globalkey (i.e., chord tones as tonal pitch classes, TPC).
- Parameters
  - df (pandas.DataFrame) – Expanded labels with chord tone columns.
  - by_global (bool) – By default, the transformed chord tone columns express chord tones as scale degrees (or intervals) of the global tonic. If set to True, they correspond to tonal pitch classes and can be further transformed to note names using transform_note_columns().
- Return type
  pandas.DataFrame
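On the line of fifths, this transposition amounts to simple addition: each chord tone, encoded relative to the local tonic, is shifted by the local key's own offset from the global tonic. A minimal sketch of that arithmetic (not ms3's implementation, which operates on whole DataFrames):

```python
def transpose_by_localkey(chord_tones: tuple, localkey_fifths: int) -> tuple:
    """Shift chord tones (stacks of fifths relative to the local tonic)
    by the local key's offset on the line of fifths, making them
    relative to the global tonic. Illustration only."""
    return tuple(ct + localkey_fifths for ct in chord_tones)

# The local tonic triad (0, 1, 4) in a dominant-key passage (localkey = V,
# i.e. one fifth above the global tonic) becomes (1, 2, 5) globally.
transpose_by_localkey((0, 1, 4), 1)
# → (1, 2, 5)
```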
The commandline interface#
The library offers you the following commands. Add the flag -h to one of them to learn about its parameters.
usage: ms3 [-h] [--version] {add,check,compare,convert,empty,extract,metadata,review,transform,update} ...
Positional Arguments#
- action
Possible choices: add, check, compare, convert, empty, extract, metadata, review, transform, update
The action that you want to perform.
options#
- --version
show program’s version number and exit
Sub-commands:#
add#
Add labels from annotation tables to scores.
ms3 add [-h] [--ask] [--use {expanded,labels}] [-d DIR] [-o OUT_DIR] [-n] [-a] [-i REGEX] [-e REGEX] [-f REGEX] [-m [PATH]] [--reviewed] [--files PATHs [PATHs ...]] [--iterative] [-l {c, e, w, i, d}] [--log [LOG]] [-t] [-v] [-s SUFFIX]
[--replace]
options#
- --ask
If several files are available for the selected facet (default: ‘expanded’, see --use), one is picked automatically. Pass --ask if you want to select yourself which ones to compare with the scores.
Default: False
- --use
Possible choices: expanded, labels
Which type of labels you want to compare with the ones in the score. Defaults to ‘expanded’, i.e., DCML labels. Set --use labels to use other labels available as TSV and set --ask if several sets of labels are available that you want to choose from.
Default: “expanded”
- -d, --dir
Folder(s) that will be scanned for input files. Defaults to current working directory if no individual files are passed via -f.
Default: current working directory
- -o, --out
Output directory.
- -n, --nonrecursive
Treat DIR as single corpus even if it contains corpus directories itself.
Default: False
- -a, --all
By default, only files listed in the ‘fname’ column of a ‘metadata.tsv’ file are parsed. With this option, all files will be parsed.
Default: False
- -i, --include
Select only files whose names include this string or regular expression.
- -e, --exclude
Any files or folders (and their subfolders) including this regex will be disregarded. By default, files including ‘_reviewed’ or starting with . or _ or ‘concatenated’ are excluded.
- -f, --folders
Select only folders whose names include this string or regular expression.
- -m, --musescore
Command or path of your MuseScore 3 executable. -m by itself will set ‘auto’ (attempt to use the standard path for your system). Other shortcuts are -m win, -m mac, and -m mscore (for Linux).
- --reviewed
By default, review files and folders are excluded from parsing. With this option, they will be included, too.
Default: False
- --files
(Deprecated) The paths are expected to be within DIR. They will be converted into a view that includes only the indicated files. This is equivalent to specifying the file names as a regex via --include (assuming that file names are unique amongst corpora).
- --iterative
Do not use all available CPU cores in parallel to speed up batch jobs.
Default: False
- -l, --level
Choose how many log messages you want to see: c (none), e, w, i, d (maximum)
Default: “i”
- --log
Can be a file path or directory path. Relative paths are interpreted relative to the current directory.
- -t, --test
No data is written to disk.
Default: False
- -v, --verbose
Show more output such as files discarded from parsing.
Default: False
- -s, --suffix
Suffix of the new scores with inserted labels. Defaults to _annotated.
Default: “_annotated”
- --replace
Remove existing labels from the scores prior to adding. Like calling ms3 empty first.
Default: False
check#
Parse MSCX files and look for errors. In particular, check DCML harmony labels for syntactic correctness.
ms3 check [-h] [--ignore_scores] [--ignore_labels] [--fail] [-d DIR] [-o OUT_DIR] [-n] [-a] [-i REGEX] [-e REGEX] [-f REGEX] [-m [PATH]] [--reviewed] [--files PATHs [PATHs ...]] [--iterative] [-l {c, e, w, i, d}] [--log [LOG]] [-t] [-v]
options#
- --ignore_scores
Don’t check scores for encoding errors.
Default: False
- --ignore_labels
Don’t check DCML labels for syntactic correctness.
Default: False
- --fail
If you pass this argument the process will deliberately fail with an AssertionError when there are any mistakes.
Default: False
- -d, --dir
Folder(s) that will be scanned for input files. Defaults to current working directory if no individual files are passed via -f.
Default: current working directory
- -o, --out
Output directory.
- -n, --nonrecursive
Treat DIR as single corpus even if it contains corpus directories itself.
Default: False
- -a, --all
By default, only files listed in the ‘fname’ column of a ‘metadata.tsv’ file are parsed. With this option, all files will be parsed.
Default: False
- -i, --include
Select only files whose names include this string or regular expression.
- -e, --exclude
Any files or folders (and their subfolders) including this regex will be disregarded. By default, files including ‘_reviewed’ or starting with . or _ or ‘concatenated’ are excluded.
- -f, --folders
Select only folders whose names include this string or regular expression.
- -m, --musescore
Command or path of your MuseScore 3 executable. -m by itself will set ‘auto’ (attempt to use the standard path for your system). Other shortcuts are -m win, -m mac, and -m mscore (for Linux).
- --reviewed
By default, review files and folders are excluded from parsing. With this option, they will be included, too.
Default: False
- --files
(Deprecated) The paths are expected to be within DIR. They will be converted into a view that includes only the indicated files. This is equivalent to specifying the file names as a regex via --include (assuming that file names are unique amongst corpora).
- --iterative
Do not use all available CPU cores in parallel to speed up batch jobs.
Default: False
- -l, --level
Choose how many log messages you want to see: c (none), e, w, i, d (maximum)
Default: “i”
- --log
Can be a file path or directory path. Relative paths are interpreted relative to the current directory.
- -t, --test
No data is written to disk.
Default: False
- -v, --verbose
Show more output such as files discarded from parsing.
Default: False
compare#
For MSCX files for which annotation tables exist, create another MSCX file with a coloured label comparison if differences are found.
ms3 compare [-h] [--ask] [--use {expanded,labels}] [--flip] [--safe] [-d DIR] [-o OUT_DIR] [-n] [-a] [-i REGEX] [-e REGEX] [-f REGEX] [-m [PATH]] [--reviewed] [--files PATHs [PATHs ...]] [--iterative] [-l {c, e, w, i, d}] [--log [LOG]] [-t] [-v]
[-c GIT_REVISION] [-s SUFFIX]
options#
- --ask
If several files are available for the selected facet (default: ‘expanded’, see --use), one is picked automatically. Pass --ask if you want to select yourself which ones to compare with the scores.
Default: False
- --use
Possible choices: expanded, labels
Which type of labels you want to compare with the ones in the score. Defaults to ‘expanded’, i.e., DCML labels. Set --use labels to use other labels available as TSV and set --ask if several sets of labels are available that you want to choose from.
Default: “expanded”
- --flip
Pass this flag to treat the annotation tables as if updating the scores instead of the other way around, effectively resulting in a swap of the colors in the output files.
Default: False
- --safe
Don’t overwrite existing files.
Default: True
- -d, --dir
Folder(s) that will be scanned for input files. Defaults to current working directory if no individual files are passed via -f.
Default: current working directory
- -o, --out
Output directory.
- -n, --nonrecursive
Treat DIR as single corpus even if it contains corpus directories itself.
Default: False
- -a, --all
By default, only files listed in the ‘fname’ column of a ‘metadata.tsv’ file are parsed. With this option, all files will be parsed.
Default: False
- -i, --include
Select only files whose names include this string or regular expression.
- -e, --exclude
Any files or folders (and their subfolders) including this regex will be disregarded. By default, files including ‘_reviewed’ or starting with . or _ or ‘concatenated’ are excluded.
- -f, --folders
Select only folders whose names include this string or regular expression.
- -m, --musescore
Command or path of your MuseScore 3 executable. -m by itself will set ‘auto’ (attempt to use the standard path for your system). Other shortcuts are -m win, -m mac, and -m mscore (for Linux).
- --reviewed
By default, review files and folders are excluded from parsing. With this option, they will be included, too.
Default: False
- --files
(Deprecated) The paths are expected to be within DIR. They will be converted into a view that includes only the indicated files. This is equivalent to specifying the file names as a regex via --include (assuming that file names are unique amongst corpora).
- --iterative
Do not use all available CPU cores in parallel to speed up batch jobs.
Default: False
- -l, --level
Choose how many log messages you want to see: c (none), e, w, i, d (maximum)
Default: “i”
- --log
Can be a file path or directory path. Relative paths are interpreted relative to the current directory.
- -t, --test
No data is written to disk.
Default: False
- -v, --verbose
Show more output such as files discarded from parsing.
Default: False
- -c, --compare
By default, the _reviewed file displays removed labels in red and added labels in green, compared to the version currently represented in the present TSV files, if any. If instead you want a comparison with the TSV files from another Git commit, pass its specifier, e.g. ‘HEAD~3’, <branch-name>, <commit SHA> etc.
Default: “”
- -s, --suffix
Suffix of the newly created comparison files. Defaults to _compared.
Default: “_compared”
convert#
Use your local install of MuseScore to convert MuseScore files.
ms3 convert [-h] [-d DIR] [-o OUT_DIR] [-n] [-a] [-i REGEX] [-e REGEX] [-f REGEX] [-m [PATH]] [--reviewed] [--files PATHs [PATHs ...]] [--iterative] [-l {c, e, w, i, d}] [--log [LOG]] [-t] [-v] [-s SUFFIX] [--format FORMAT]
[--extensions EXTENSIONS [EXTENSIONS ...]] [--safe]
options#
- -d, --dir
Folder(s) that will be scanned for input files. Defaults to current working directory if no individual files are passed via -f.
Default: current working directory
- -o, --out
Output directory.
- -n, --nonrecursive
Treat DIR as single corpus even if it contains corpus directories itself.
Default: False
- -a, --all
By default, only files listed in the ‘fname’ column of a ‘metadata.tsv’ file are parsed. With this option, all files will be parsed.
Default: False
- -i, --include
Select only files whose names include this string or regular expression.
- -e, --exclude
Any files or folders (and their subfolders) including this regex will be disregarded. By default, files including ‘_reviewed’ or starting with . or _ or ‘concatenated’ are excluded.
- -f, --folders
Select only folders whose names include this string or regular expression.
- -m, --musescore
Command or path of your MuseScore 3 executable. -m by itself will set ‘auto’ (attempt to use the standard path for your system). Other shortcuts are -m win, -m mac, and -m mscore (for Linux).
- --reviewed
By default, review files and folders are excluded from parsing. With this option, they will be included, too.
Default: False
- --files
(Deprecated) The paths are expected to be within DIR. They will be converted into a view that includes only the indicated files. This is equivalent to specifying the file names as a regex via --include (assuming that file names are unique amongst corpora).
- --iterative
Do not use all available CPU cores in parallel to speed up batch jobs.
Default: False
- -l, --level
Choose how many log messages you want to see: c (none), e, w, i, d (maximum)
Default: “i”
- --log
Can be a file path or directory path. Relative paths are interpreted relative to the current directory.
- -t, --test
No data is written to disk.
Default: False
- -v, --verbose
Show more output such as files discarded from parsing.
Default: False
- -s, --suffix
Suffix of the converted files. Defaults to no suffix (empty string).
Default: “”
- --format
Output format of converted files. Defaults to mscx. Other options are {png, svg, pdf, mscz, wav, mp3, flac, ogg, xml, mxl, mid}
Default: “mscx”
- --extensions
Those file extensions that you want to be converted, separated by spaces. Defaults to mscx mscz
Default: [‘mscx’, ‘mscz’]
- --safe
Don’t overwrite existing files.
Default: True
empty#
Remove harmony annotations and store the MuseScore files without them.
ms3 empty [-h] [-d DIR] [-o OUT_DIR] [-n] [-a] [-i REGEX] [-e REGEX] [-f REGEX] [-m [PATH]] [--reviewed] [--files PATHs [PATHs ...]] [--iterative] [-l {c, e, w, i, d}] [--log [LOG]] [-t] [-v]
options#
- -d, --dir
Folder(s) that will be scanned for input files. Defaults to current working directory if no individual files are passed via -f.
Default: current working directory
- -o, --out
Output directory.
- -n, --nonrecursive
Treat DIR as single corpus even if it contains corpus directories itself.
Default: False
- -a, --all
By default, only files listed in the ‘fname’ column of a ‘metadata.tsv’ file are parsed. With this option, all files will be parsed.
Default: False
- -i, --include
Select only files whose names include this string or regular expression.
- -e, --exclude
Any files or folders (and their subfolders) including this regex will be disregarded. By default, files including ‘_reviewed’ or starting with . or _ or ‘concatenated’ are excluded.
- -f, --folders
Select only folders whose names include this string or regular expression.
- -m, --musescore
Command or path of your MuseScore 3 executable. -m by itself will set ‘auto’ (attempt to use the standard path for your system). Other shortcuts are -m win, -m mac, and -m mscore (for Linux).
- --reviewed
By default, review files and folders are excluded from parsing. With this option, they will be included, too.
Default: False
- --files
(Deprecated) The paths are expected to be within DIR. They will be converted into a view that includes only the indicated files. This is equivalent to specifying the file names as a regex via --include (assuming that file names are unique amongst corpora).
- --iterative
Do not use all available CPU cores in parallel to speed up batch jobs.
Default: False
- -l, --level
Choose how many log messages you want to see: c (none), e, w, i, d (maximum)
Default: “i”
- --log
Can be a file path or directory path. Relative paths are interpreted relative to the current directory.
- -t, --test
No data is written to disk.
Default: False
- -v, --verbose
Show more output such as files discarded from parsing.
Default: False
extract#
Extract selected information from MuseScore files and store it in TSV files.
ms3 extract [-h] [-M [folder]] [-N [folder]] [-R [folder]] [-L [folder]] [-X [folder]] [-F [folder]] [-E [folder]] [-C [folder]] [-J [folder]] [-D [suffix]] [-s [SUFFIX ...]] [-p] [--raw] [-u] [--interval_index] [-d DIR] [-o OUT_DIR] [-n] [-a]
[-i REGEX] [-e REGEX] [-f REGEX] [-m [PATH]] [--reviewed] [--files PATHs [PATHs ...]] [--iterative] [-l {c, e, w, i, d}] [--log [LOG]] [-t] [-v]
options#
- -M, --measures
Folder where to store TSV files with measure information needed for tasks such as unfolding repetitions.
- -N, --notes
Folder where to store TSV files with information on all notes.
- -R, --rests
Folder where to store TSV files with information on all rests.
- -L, --labels
Folder where to store TSV files with information on all annotation labels.
- -X, --expanded
Folder where to store TSV files with expanded DCML labels.
- -F, --form_labels
Folder where to store TSV files with all form labels.
- -E, --events
Folder where to store TSV files with all events (chords, rests, articulation, etc.) without further processing.
- -C, --chords
Folder where to store TSV files with <chord> tags, i.e. groups of notes in the same voice with identical onset and duration. The tables include lyrics, dynamics, articulation, staff- and system texts, tempo marking, spanners, and thoroughbass figures.
- -J, --joined_chords
Like -C except that all Chords are substituted with the actual Notes they contain. This is useful, for example, for relating slurs to the notes they group, or bass figures to their bass notes.
- -D, --metadata
Set -D to update the ‘metadata.tsv’ files of the respective corpora with the parsed scores. Add a suffix if you want to update ‘metadata{suffix}.tsv’ instead.
- -s, --suffix
Pass -s to use standard suffixes or -s SUFFIX to choose your own. In the latter case they will be assigned to the extracted aspects in the order in which they are listed above (capital letter arguments).
- -p, --positioning
When extracting labels, include manually shifted position coordinates in order to restore them when re-inserting.
Default: False
- --raw
When extracting labels, leave chord symbols encoded instead of turning them into a single column of strings.
Default: True
- -u, --unfold
Unfold the repeats for all stored DataFrames.
Default: False
- --interval_index
Prepend a column with [start, end) intervals to the TSV files.
Default: False
- -d, --dir
Folder(s) that will be scanned for input files. Defaults to current working directory if no individual files are passed via -f.
Default: current working directory
- -o, --out
Output directory.
- -n, --nonrecursive
Treat DIR as single corpus even if it contains corpus directories itself.
Default: False
- -a, --all
By default, only files listed in the ‘fname’ column of a ‘metadata.tsv’ file are parsed. With this option, all files will be parsed.
Default: False
- -i, --include
Select only files whose names include this string or regular expression.
- -e, --exclude
Any files or folders (and their subfolders) including this regex will be disregarded. By default, files including ‘_reviewed’ or starting with . or _ or ‘concatenated’ are excluded.
- -f, --folders
Select only folders whose names include this string or regular expression.
- -m, --musescore
Command or path of your MuseScore 3 executable. -m by itself will set ‘auto’ (attempt to use the standard path for your system). Other shortcuts are -m win, -m mac, and -m mscore (for Linux).
- --reviewed
By default, review files and folders are excluded from parsing. With this option, they will be included, too.
Default: False
- --files
(Deprecated) The paths are expected to be within DIR. They will be converted into a view that includes only the indicated files. This is equivalent to specifying the file names as a regex via --include (assuming that file names are unique amongst corpora).
- --iterative
Do not use all available CPU cores in parallel to speed up batch jobs.
Default: False
- -l, --level
Choose how many log messages you want to see: c (none), e, w, i, d (maximum)
Default: “i”
- --log
Can be a file path or directory path. Relative paths are interpreted relative to the current directory.
- -t, --test
No data is written to disk.
Default: False
- -v, --verbose
Show more output such as files discarded from parsing.
Default: False
metadata#
Update MSCX files with changes made to metadata.tsv (created via ms3 extract -D [-a]).
ms3 metadata [-h] [-d DIR] [-o OUT_DIR] [-n] [-a] [-i REGEX] [-e REGEX] [-f REGEX] [-m [PATH]] [--reviewed] [--files PATHs [PATHs ...]] [--iterative] [-l {c, e, w, i, d}] [--log [LOG]] [-t] [-v] [-s SUFFIX] [-p] [--empty] [--remove]
options#
- -d, --dir
Folder(s) that will be scanned for input files. Defaults to current working directory if no individual files are passed via -f.
Default: current working directory
- -o, --out
Output directory.
- -n, --nonrecursive
Treat DIR as single corpus even if it contains corpus directories itself.
Default: False
- -a, --all
By default, only files listed in the ‘fname’ column of a ‘metadata.tsv’ file are parsed. With this option, all files will be parsed.
Default: False
- -i, --include
Select only files whose names include this string or regular expression.
- -e, --exclude
Any files or folders (and their subfolders) including this regex will be disregarded. By default, files including ‘_reviewed’ or starting with . or _ or ‘concatenated’ are excluded.
- -f, --folders
Select only folders whose names include this string or regular expression.
- -m, --musescore
Command or path of your MuseScore 3 executable. -m by itself will set ‘auto’ (attempt to use the standard path for your system). Other shortcuts are -m win, -m mac, and -m mscore (for Linux).
- --reviewed
By default, review files and folders are excluded from parsing. With this option, they will be included, too.
Default: False
- --files
(Deprecated) The paths are expected to be within DIR. They will be converted into a view that includes only the indicated files. This is equivalent to specifying the file names as a regex via --include (assuming that file names are unique amongst corpora).
- --iterative
Do not use all available CPU cores in parallel to speed up batch jobs.
Default: False
- -l, --level
Choose how many log messages you want to see: c (none), e, w, i, d (maximum)
Default: “i”
- --log
Can be a file path or directory path. Relative paths are interpreted relative to the current directory.
- -t, --test
No data is written to disk.
Default: False
- -v, --verbose
Show more output such as files discarded from parsing.
Default: False
- -s, --suffix
Suffix of the new scores with updated metadata fields.
- -p, --prelims
Pass this flag if, in addition to updating metadata fields, you also want score headers to be updated from the columns title_text, subtitle_text, composer_text, lyricist_text, part_name_text.
Default: False
- --empty
Set this flag to also allow empty values to be used for overwriting existing ones.
Default: False
- --remove
Set this flag to remove non-default metadata fields that are not columns in the metadata.tsv file anymore.
Default: False
review#
Extract facets, check labels, and create _reviewed files.
ms3 review [-h] [--ignore_scores] [--ignore_labels] [--fail] [--ask] [--use {expanded,labels}] [--flip] [--safe] [-M [folder]] [-N [folder]] [-R [folder]] [-L [folder]] [-X [folder]] [-F [folder]] [-E [folder]] [-C [folder]] [-J [folder]]
[-D [suffix]] [-s [SUFFIX ...]] [-p] [--raw] [-u] [--interval_index] [-d DIR] [-o OUT_DIR] [-n] [-a] [-i REGEX] [-e REGEX] [-f REGEX] [-m [PATH]] [--reviewed] [--files PATHs [PATHs ...]] [--iterative] [-l {c, e, w, i, d}] [--log [LOG]]
[-t] [-v] [-c [GIT_REVISION]] [--threshold THRESHOLD]
options#
- --ignore_scores
Don’t check scores for encoding errors.
Default: False
- --ignore_labels
Don’t check DCML labels for syntactic correctness.
Default: False
- --fail
If you pass this argument the process will deliberately fail with an AssertionError when there are any mistakes.
Default: False
- --ask
If several files are available for the selected facet (default: ‘expanded’, see --use), one is picked automatically. Pass --ask if you want to select yourself which ones to compare with the scores.
Default: False
- --use
Possible choices: expanded, labels
Which type of labels you want to compare with the ones in the score. Defaults to ‘expanded’, i.e., DCML labels. Set --use labels to use other labels available as TSV and set --ask if several sets of labels are available that you want to choose from.
Default: “expanded”
- --flip
Pass this flag to treat the annotation tables as if updating the scores instead of the other way around, effectively resulting in a swap of the colors in the output files.
Default: False
- --safe
Don’t overwrite existing files.
Default: True
- -M, --measures
Folder where to store TSV files with measure information needed for tasks such as unfolding repetitions.
- -N, --notes
Folder where to store TSV files with information on all notes.
- -R, --rests
Folder where to store TSV files with information on all rests.
- -L, --labels
Folder where to store TSV files with information on all annotation labels.
- -X, --expanded
Folder where to store TSV files with expanded DCML labels.
- -F, --form_labels
Folder where to store TSV files with all form labels.
- -E, --events
Folder where to store TSV files with all events (chords, rests, articulation, etc.) without further processing.
- -C, --chords
Folder where to store TSV files with <chord> tags, i.e. groups of notes in the same voice with identical onset and duration. The tables include lyrics, dynamics, articulation, staff- and system texts, tempo marking, spanners, and thoroughbass figures.
- -J, --joined_chords
Like -C except that all Chords are substituted with the actual Notes they contain. This is useful, for example, for relating slurs to the notes they group, or bass figures to their bass notes.
- -D, --metadata
Set -D to update the ‘metadata.tsv’ files of the respective corpora with the parsed scores. Add a suffix if you want to update ‘metadata{suffix}.tsv’ instead.
- -s, --suffix
Pass -s to use standard suffixes or -s SUFFIX to choose your own. In the latter case they will be assigned to the extracted aspects in the order in which they are listed above (capital letter arguments).
- -p, --positioning
When extracting labels, include manually shifted position coordinates in order to restore them when re-inserting.
Default: False
- --raw
When extracting labels, leave chord symbols encoded instead of turning them into a single column of strings.
Default: True
- -u, --unfold
Unfold the repeats for all stored DataFrames.
Default: False
- --interval_index
Prepend a column with [start, end) intervals to the TSV files.
Default: False
- -d, --dir
Folder(s) that will be scanned for input files. Defaults to current working directory if no individual files are passed via -f.
Default: current working directory
- -o, --out
Output directory.
- -n, --nonrecursive
Treat DIR as single corpus even if it contains corpus directories itself.
Default: False
- -a, --all
By default, only files listed in the ‘fname’ column of a ‘metadata.tsv’ file are parsed. With this option, all files will be parsed.
Default: False
- -i, --include
Select only files whose names include this string or regular expression.
- -e, --exclude
Any files or folders (and their subfolders) including this regex will be disregarded. By default, files including ‘_reviewed’ or starting with . or _ or ‘concatenated’ are excluded.
- -f, --folders
Select only folders whose names include this string or regular expression.
- -m, --musescore
Command or path of your MuseScore 3 executable. -m by itself will set ‘auto’ (attempt to use the standard path for your system). Other shortcuts are -m win, -m mac, and -m mscore (for Linux).
- --reviewed
By default, review files and folders are excluded from parsing. With this option, they will be included, too.
Default: False
- --files
(Deprecated) The paths are expected to be within DIR. They will be converted into a view that includes only the indicated files. This is equivalent to specifying the file names as a regex via --include (assuming that file names are unique amongst corpora).
- --iterative
Do not use all available CPU cores in parallel to speed up batch jobs.
Default: False
- -l, --level
Choose how many log messages you want to see: c (none), e, w, i, d (maximum)
Default: “i”
- --log
Can be a file path or directory path. Relative paths are interpreted relative to the current directory.
- -t, --test
No data is written to disk.
Default: False
- -v, --verbose
Show more output such as files discarded from parsing.
Default: False
- -c, --compare
Pass -c if you want the _reviewed file to display removed labels in red and added labels in green, compared to the version currently represented in the present TSV files, if any. If instead you want a comparison with the TSV files from another Git commit, additionally pass its specifier, e.g. ‘HEAD~3’, <branch-name>, <commit SHA> etc.
- --threshold
Harmony segments where the ratio of non-chord tones vs. chord tones lies above this threshold will be printed in a warning and will cause the check to fail if the --fail flag is set. Defaults to 0.6 (3:2).
Default: 0.6
transform#
Concatenate and transform TSV data from one or several corpora. Available transformations are unfolding repeats and adding an interval index.
ms3 transform [-h] [-d DIR] [-o OUT_DIR] [-n] [-a] [-i REGEX] [-e REGEX] [-f REGEX] [-m [PATH]] [--reviewed] [--files PATHs [PATHs ...]] [--iterative] [-l {c, e, w, i, d}] [--log [LOG]] [-t] [-v] [-M] [-N] [-R] [-L] [-X] [-F [folder]] [-E] [-C]
[-D] [-s [SUFFIX ...]] [-u] [--interval_index]
options#
- -d, --dir
Folder(s) that will be scanned for input files. Defaults to current working directory if no individual files are passed via -f.
Default: current working directory
- -o, --out
Output directory.
- -n, --nonrecursive
Treat DIR as single corpus even if it contains corpus directories itself.
Default: False
- -a, --all
By default, only files listed in the ‘fname’ column of a ‘metadata.tsv’ file are parsed. With this option, all files will be parsed.
Default: False
- -i, --include
Select only files whose names include this string or regular expression.
- -e, --exclude
Any files or folders (and their subfolders) including this regex will be disregarded. By default, files including ‘_reviewed’ or starting with . or _ or ‘concatenated’ are excluded.
- -f, --folders
Select only folders whose names include this string or regular expression.
- -m, --musescore
Command or path of your MuseScore 3 executable. -m by itself will set ‘auto’ (attempt to use the standard path for your system). Other shortcuts are -m win, -m mac, and -m mscore (for Linux).
- --reviewed
By default, review files and folders are excluded from parsing. With this option, they will be included, too.
Default: False
- --files
(Deprecated) The paths are expected to be within DIR. They will be converted into a view that includes only the indicated files. This is equivalent to specifying the file names as a regex via --include (assuming that file names are unique amongst corpora).
- --iterative
Do not use all available CPU cores in parallel to speed up batch jobs.
Default: False
- -l, --level
Choose how many log messages you want to see: c (none), e, w, i, d (maximum)
Default: “i”
- --log
Can be a file path or directory path. Relative paths are interpreted relative to the current directory.
- -t, --test
No data is written to disk.
Default: False
- -v, --verbose
Show more output such as files discarded from parsing.
Default: False
- -M, --measures
Concatenate measures TSVs for all selected pieces.
Default: False
- -N, --notes
Concatenate notes TSVs for all selected pieces.
Default: False
- -R, --rests
Concatenate rests TSVs for all selected pieces (use ms3 extract -R to create those).
Default: False
- -L, --labels
Concatenate raw harmony label TSVs for all selected pieces (use ms3 extract -L to create those).
Default: False
- -X, --expanded
Concatenate expanded DCML label TSVs for all selected pieces.
Default: False
- -F, --form_labels
Concatenate form label TSVs for all selected pieces.
Default: False
- -E, --events
Concatenate events TSVs (notes, rests, articulation, etc.) for all selected pieces (use ms3 extract -E to create those).
Default: False
- -C, --chords
Concatenate chords TSVs (<chord> tags group notes in the same voice with identical onset and duration) including lyrics, dynamics, articulation, staff- and system texts, tempo marking, spanners, and thoroughbass figures, for all selected pieces (use ms3 extract -C to create those).
Default: False
- -D, --metadata
Output ‘concatenated_metadata.tsv’ with one row per selected piece.
Default: False
- -s, --suffix
Pass -s to use standard suffixes or -s SUFFIX to choose your own. In the latter case they will be assigned to the extracted aspects in the order in which they are listed above (capital letter arguments).
- -u, --unfold
Unfold the repeats for all concatenated DataFrames.
Default: False
- --interval_index
Prepend a column with [start, end) intervals to the TSV files.
Default: False
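The -i/--include, -e/--exclude, and -f/--folders options above take regular expressions. As a rough illustration of the filtering semantics (an assumption for demonstration purposes: substring-style matching as with Python’s re.search; the file names below are hypothetical), the selection behaves like this:

```python
import re

# Hypothetical file names as they might occur in a corpus
file_names = [
    "op01n01a.mscx",
    "op01n01a_reviewed.mscx",
    "concatenated_metadata.tsv",
    "op02n03b.mscx",
]

include = re.compile(r"op01")  # like -i op01
# like the default exclusion of '_reviewed', names starting with . or _,
# and 'concatenated' files
exclude = re.compile(r"_reviewed|^[._]|concatenated")

selected = [
    f for f in file_names
    if include.search(f) and not exclude.search(f)
]
print(selected)  # ['op01n01a.mscx']
```

Note that the patterns are searched anywhere in the name, so a plain string such as op01 works without anchors.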
update#
Convert MSCX files to the latest MuseScore version and move all chord annotations to the Roman Numeral Analysis layer. Warning: this command overwrites existing files!
ms3 update [-h] [-d DIR] [-o OUT_DIR] [-n] [-a] [-i REGEX] [-e REGEX] [-f REGEX] [-m [PATH]] [--reviewed] [--files PATHs [PATHs ...]] [--iterative] [-l {c, e, w, i, d}] [--log [LOG]] [-t] [-v] [-s SUFFIX] [--above] [--safe] [--staff STAFF] [--type TYPE]
options#
- -d, --dir
Folder(s) that will be scanned for input files. Defaults to current working directory if no individual files are passed via -f.
Default: the current working directory
- -o, --out
Output directory.
- -n, --nonrecursive
Treat DIR as single corpus even if it contains corpus directories itself.
Default: False
- -a, --all
By default, only files listed in the ‘fname’ column of a ‘metadata.tsv’ file are parsed. With this option, all files will be parsed.
Default: False
- -i, --include
Select only files whose names include this string or regular expression.
- -e, --exclude
Any files or folders (and their subfolders) matching this regex will be disregarded. By default, files including ‘_reviewed’ or starting with . or _ or ‘concatenated’ are excluded.
- -f, --folders
Select only folders whose names include this string or regular expression.
- -m, --musescore
Command or path of your MuseScore 3 executable. -m by itself will set ‘auto’ (attempt to use the standard path for your system). Other shortcuts are -m win, -m mac, and -m mscore (for Linux).
- --reviewed
By default, review files and folders are excluded from parsing. With this option, they will be included, too.
Default: False
- --files
(Deprecated) The paths are expected to be within DIR. They will be converted into a view that includes only the indicated files. This is equivalent to specifying the file names as a regex via --include (assuming that file names are unique amongst corpora).
- --iterative
Process files one at a time instead of using all available CPU cores in parallel to speed up batch jobs.
Default: False
- -l, --level
Choose how many log messages you want to see: c (critical, i.e., almost none), e (error), w (warning), i (info), d (debug, maximum)
Default: “i”
- --log
Can be a file path or directory path. Relative paths are interpreted relative to the current directory.
- -t, --test
No data is written to disk.
Default: False
- -v, --verbose
Show more output such as files discarded from parsing.
Default: False
- -s, --suffix
Add this suffix to the filename of every new file.
- --above
Display Roman Numerals above the system.
Default: False
- --safe
Only moves labels if their temporal positions stay intact.
Default: False
- --staff
Which staff you want to move the annotations to. 1=upper staff; -1=lowest staff (default)
Default: -1
- --type
Defaults to 1, i.e., moves labels to the Roman Numeral layer. Other types have not been tested!
Default: 1
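As a usage sketch (the corpus and output directory names are hypothetical; a MuseScore 3 installation in a standard location is assumed), the following invocation would update a corpus in place-preserving fashion, writing the converted files with a suffix to a separate directory and moving labels safely to the Roman Numeral layer above the upper staff:

```shell
# Hypothetical paths; -m alone lets ms3 auto-detect the MuseScore 3 executable.
# --safe keeps labels in place unless their temporal positions stay intact;
# --above --staff 1 displays them above the upper staff.
ms3 update -d my_corpus -o my_corpus_updated -s _updated -m --safe --above --staff 1
```

Because ms3 update overwrites existing files, passing -o with a fresh output directory is a prudent default.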
Unittests#
ms3 has a test suite that uses the pytest library.
Install dependencies#
Install the library via pip install ms3[testing].
Configuring the tests#
In order to run the tests you need to:

1. Clone the unittest_metacorpus repository including its submodules (ask for permission).
2. In the configuration file new_tests/conftest.py, change the value of CORPUS_DIR to the path containing your clone of the metacorpus (it defaults to the user’s home directory).
3. In the line below, copy the commit SHA assigned to TEST_COMMIT, e.g. 51e4cb5, and check out your metacorpus clone at that commit (e.g., git checkout 51e4cb5).
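The clone-and-checkout steps above might look like this on the command line (the repository URL is elided here, and the commit SHA is the example from the text; substitute the actual TEST_COMMIT value from new_tests/conftest.py):

```shell
# Clone including submodules (URL elided -- ask for permission first)
git clone --recurse-submodules <unittest_metacorpus-URL> ~/unittest_metacorpus
# Check out the commit recorded as TEST_COMMIT in new_tests/conftest.py
git -C ~/unittest_metacorpus checkout 51e4cb5
```

If CORPUS_DIR points somewhere other than your home directory, clone to that location instead.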
Running the tests#
On the command line, navigate to your ms3 folder and run pytest new_tests. Alternatively, some IDEs allow you to right-click on the folder new_tests and select something like Run pytest in new_tests.