---
title: Clean notebooks
keywords: fastai
sidebar: home_sidebar
summary: "Strip notebooks from superfluous metadata"
description: "Strip notebooks from superfluous metadata"
nb_path: "nbs/07_clean.ipynb"
---

To avoid pointless merge conflicts when working with Jupyter notebooks (caused by differing execution counts or cell metadata), clean the notebooks before committing anything. This is done automatically if you install the git hooks with nbdev_install_git_hooks. The following functions do the cleaning.

Utils

{% raw %}

rm_execution_count[source]

rm_execution_count(o)

Remove execution count in o

{% endraw %} {% raw %}
{% endraw %} {% raw %}

clean_output_data_vnd[source]

clean_output_data_vnd(o)

Remove application/vnd.google.colaboratory.intrinsic+json in data entries
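
As a rough illustration of what these two helpers do to a single output entry, here is a minimal sketch. The `sketch_` functions and the `COLAB_JSON` constant are hypothetical stand-ins written for this example; they are assumptions about the behavior described above, not the library's source.

```python
# Hypothetical stand-ins for the two helpers; names and bodies are assumptions.
COLAB_JSON = "application/vnd.google.colaboratory.intrinsic+json"

def sketch_rm_execution_count(o):
    """Reset the execution count of a cell or output dict, if it has one."""
    if "execution_count" in o:
        o["execution_count"] = None

def sketch_clean_output_data_vnd(o):
    """Drop the Colab-specific MIME entry from an output's data dict."""
    if "data" in o:
        o["data"] = {k: v for k, v in o["data"].items() if k != COLAB_JSON}

output = {
    "execution_count": 2,
    "data": {COLAB_JSON: {"type": "string"}, "text/plain": ["sample output"]},
}
sketch_rm_execution_count(output)
sketch_clean_output_data_vnd(output)
print(output)
```

Both helpers mutate the dict in place and do nothing when the relevant key is absent, so they can be applied to any output entry without checking its type first.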

{% endraw %} {% raw %}
{% endraw %} {% raw %}

clean_cell_output[source]

clean_cell_output(cell)

Remove execution counts, Colab-specific data entries and tags from the outputs of cell

{% endraw %} {% raw %}
{% endraw %} {% raw %}
{% endraw %} {% raw %}

clean_cell[source]

clean_cell(cell, clear_all=False)

Clean cell by removing superfluous metadata or everything except the input if clear_all

{% endraw %} {% raw %}
{% endraw %} {% raw %}
tst = {'cell_type': 'code',
       'execution_count': 26,
       'metadata': {'hide_input': True, 'meta': 23},
       'outputs': [{'execution_count': 2, 
                    'data': {
                        'application/vnd.google.colaboratory.intrinsic+json': {
                            'type': 'string'},
                        'plain/text': ['sample output',]
                    },
                    'output': 'super'}],
       
       'source': 'awesome_code'}
from copy import deepcopy
tst1 = deepcopy(tst)  # deep copy, so cleaning tst cannot touch tst1's nested outputs

clean_cell(tst)
test_eq(tst, {'cell_type': 'code',
              'execution_count': None,
              'metadata': {'hide_input': True},
              'outputs': [{'execution_count': None, 
                           'data': {'plain/text': ['sample output',]},
                           'output': 'super'}],
              'source': 'awesome_code'})

clean_cell(tst1, clear_all=True)
test_eq(tst1, {'cell_type': 'code',
               'execution_count': None,
               'metadata': {},
               'outputs': [],
               'source': 'awesome_code'})
{% endraw %} {% raw %}
tst2 = {'metadata': {'tags': []},
        'outputs': [{'metadata': {'tags': []}}],
        'source': ['']}
clean_cell(tst2, clear_all=False)
test_eq(tst2, {'metadata': {},
               'outputs': [{'metadata': {}}],
               'source': []})
{% endraw %} {% raw %}
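
The behavior exercised by the clean_cell tests can be sketched as follows. This is a minimal sketch under stated assumptions, not the library's implementation: `sketch_clean_cell`, `KEEP_CELL_METADATA` and `COLAB_JSON` are hypothetical names, and the assumption that only `hide_input` survives in cell metadata is inferred from the tests.

```python
# Assumption: only 'hide_input' is kept in cell metadata (inferred from the tests).
KEEP_CELL_METADATA = ("hide_input",)
COLAB_JSON = "application/vnd.google.colaboratory.intrinsic+json"

def sketch_clean_cell(cell, clear_all=False):
    """Hypothetical re-implementation of the cleaning steps, mutating cell in place."""
    if "execution_count" in cell:
        cell["execution_count"] = None
    if "outputs" in cell:
        if clear_all:
            cell["outputs"] = []  # drop every output
        else:
            for o in cell["outputs"]:
                if "execution_count" in o:
                    o["execution_count"] = None
                if "data" in o:
                    o["data"] = {k: v for k, v in o["data"].items() if k != COLAB_JSON}
                # Remove tags stored in the output's own metadata, if any
                o.get("metadata", {}).pop("tags", None)
    # An empty source is normalized from [''] to []
    if cell.get("source") == [""]:
        cell["source"] = []
    meta = cell.get("metadata", {})
    cell["metadata"] = {} if clear_all else {
        k: v for k, v in meta.items() if k in KEEP_CELL_METADATA
    }

cell = {"cell_type": "code", "execution_count": 5,
        "metadata": {"hide_input": True, "scrolled": False},
        "outputs": [{"execution_count": 5, "data": {"text/plain": ["2"]}}],
        "source": "1 + 1"}
sketch_clean_cell(cell)
print(cell)
```

With `clear_all=True` the same function would empty both `outputs` and `metadata`, leaving only the cell type, a reset execution count and the source.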

clean_nb[source]

clean_nb(nb, clear_all=False)

Clean nb from superfluous metadata, passing clear_all to clean_cell

{% endraw %} {% raw %}
{% endraw %} {% raw %}
tst = {'cell_type': 'code',
       'execution_count': 26,
       'metadata': {'hide_input': True, 'meta': 23},
       'outputs': [{'execution_count': 2,
                    'data': {
                        'application/vnd.google.colaboratory.intrinsic+json': {
                            'type': 'string'},
                        'plain/text': ['sample output',]
                    },
                    'output': 'super'}],
       'source': 'awesome_code'}
nb = {'metadata': {'kernelspec': 'some_spec', 'jekyll': 'some_meta', 'meta': 37},
      'cells': [tst]}

clean_nb(nb)
test_eq(nb['cells'][0], {'cell_type': 'code',
              'execution_count': None,
              'metadata': {'hide_input': True},
              'outputs': [{'execution_count': None, 
                           'data': { 'plain/text': ['sample output',]},
                           'output': 'super'}],
              'source': 'awesome_code'})
test_eq(nb['metadata'], {'kernelspec': 'some_spec', 'jekyll': 'some_meta'})
{% endraw %} {% raw %}
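
The notebook-level step can be sketched in the same spirit. The names below are hypothetical, `tiny_clean_cell` is a deliberately simplified stand-in for clean_cell, and the list of notebook metadata keys to keep is an assumption that covers only the two keys the test above shows surviving.

```python
# Assumption: the test above shows 'kernelspec' and 'jekyll' surviving; the real
# list of kept keys may be longer.
KEEP_NB_METADATA = ("kernelspec", "jekyll")

def tiny_clean_cell(cell, clear_all=False):
    """Simplified, hypothetical stand-in for clean_cell."""
    if "execution_count" in cell:
        cell["execution_count"] = None
    if clear_all and "outputs" in cell:
        cell["outputs"] = []

def sketch_clean_nb(nb, clear_all=False, clean_cell=tiny_clean_cell):
    """Clean every cell, forwarding clear_all, then filter notebook metadata."""
    for cell in nb["cells"]:
        clean_cell(cell, clear_all=clear_all)
    nb["metadata"] = {k: v for k, v in nb["metadata"].items() if k in KEEP_NB_METADATA}

nb = {"metadata": {"kernelspec": "some_spec", "meta": 37},
      "cells": [{"cell_type": "code", "execution_count": 3,
                 "outputs": ["x"], "source": "1 + 1"}]}
sketch_clean_nb(nb, clear_all=True)
print(nb)
```

Passing the cell cleaner as a parameter is only a convenience for this sketch; it keeps the example self-contained while showing that `clear_all` is forwarded unchanged to each cell.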
{% endraw %} {% raw %}

clean_cr[source]

clean_cr(s)

Replace escaped carriage returns in s (the literal sequence \r\n, or a lone \r) with \n

{% endraw %} {% raw %}
{% endraw %} {% raw %}
assert clean_cr(fr'a{BSLASH}r\nb{BSLASH}rc\n') == fr'a\nb\nc\n'
{% endraw %}
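
The assertion above works on the two-character escape sequences as they appear in a notebook's JSON text, not on actual control characters. A minimal sketch that satisfies the same assertion is shown below; `sketch_clean_cr` is a hypothetical name and the two-step replacement is an assumption, not necessarily how the library does it.

```python
BSLASH = chr(92)  # a single backslash character

def sketch_clean_cr(s):
    """Turn an escaped CR LF pair, or a lone escaped CR, into an escaped LF."""
    cr, lf = BSLASH + "r", BSLASH + "n"
    # Collapse the pairs first so the second pass only sees lone escaped CRs
    return s.replace(cr + lf, lf).replace(cr, lf)

print(sketch_clean_cr(r"a\r\nb\rc\n"))
```

This sketch does not distinguish an escaped backslash followed by the letter r from an escaped carriage return, so it is only meant to illustrate the transformation the assertion checks.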

Main function

{% raw %}

nbdev_clean_nbs[source]

nbdev_clean_nbs(fname:"A notebook name or glob to convert"=None, clear_all:"Clean all metadata and outputs"=False, disp:"Print the cleaned outputs"=False, read_input_stream:"Read input stream and not nb folder"=False)

Clean all notebooks in fname to avoid merge conflicts

{% endraw %} {% raw %}
{% endraw %}

By default (fname left as None), all the notebooks in lib_folder are cleaned. Pass clear_all=True to fully clean the notebooks, removing every bit of metadata and all cell outputs. disp is only for internal use with git hooks: it prints the cleaned notebook instead of saving it. Likewise, read_input_stream is for internal use and reads the notebook from the input stream instead of from the file names.
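
To make the difference between saving and displaying concrete, here is a minimal sketch of cleaning one notebook file. `sketch_clean_file` is a hypothetical function written for this example; the simplified cleaning and the JSON serialization settings are assumptions, and it does not reproduce the glob handling or the input-stream mode of nbdev_clean_nbs.

```python
import json
import tempfile
from pathlib import Path

def sketch_clean_file(path, clear_all=False, disp=False):
    """Clean one notebook file in place, or print the result when disp is True."""
    path = Path(path)
    nb = json.loads(path.read_text(encoding="utf-8"))
    for cell in nb["cells"]:
        if "execution_count" in cell:
            cell["execution_count"] = None
        if clear_all:
            cell["metadata"] = {}
            if "outputs" in cell:
                cell["outputs"] = []
    # Assumption: a stable key order and indentation keep diffs small
    text = json.dumps(nb, sort_keys=True, indent=1, ensure_ascii=False) + "\n"
    if disp:
        print(text, end="")  # leave the file untouched
    else:
        path.write_text(text, encoding="utf-8")
    return nb

# Usage on a throwaway notebook in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    nb_path = Path(tmp) / "demo.ipynb"
    demo = {"metadata": {}, "cells": [{"cell_type": "code", "execution_count": 7,
                                       "metadata": {}, "outputs": [],
                                       "source": ["1 + 1"]}]}
    nb_path.write_text(json.dumps(demo), encoding="utf-8")
    sketch_clean_file(nb_path)
    cleaned = json.loads(nb_path.read_text(encoding="utf-8"))
```

Printing instead of writing is what lets a git filter consume the cleaned notebook from standard output without modifying the working copy.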