BNF (Backus-Naur form) parser

Pedro Reis dos Santos
University of Lisboa
(C)IST, 2022

bnf-1.0

1. Introduction

This document provides some insight into the eBNF parser developed for the ebnf module. The high-level functions are the ones called externally by the user. The remaining routines are documented to make the code easier to understand and to facilitate further changes.

2. High-level BNF functions

The main routine is the top-level user routine. The bnf routine is the actual processing routine, the parse routine matches the input sequence to a given grammar, the dump routine prints a grammar, and the grammar routines are used to build a grammar from different input formats.

2.1 Debug

The ebnf package includes an extensive debug mode to help developers. A debug variable can be set to values ranging from 0, which produces no debug information, to 5, the highest debug level.
ebnf.debug=0 # no debug
A debug value higher than 0 reports errors found while processing the grammar, reports when no match for the start symbol is found while processing the input, and prints the final tree cost. A debug value higher than 1 also prints the reduced rules and reports a missing goal variable or a failure to produce a grammar from the input arguments. A debug value higher than 2 includes labeling information about tree nodes and rules. A debug value higher than 3 reports costs. A debug value higher than 4 prints reduce state information and closure setup.

2.2 Main

The main(argv) function is the high-level function called when the module is invoked directly. When the list, or tuple, contains two arguments, the first is taken as the grammar filename, whose contents are processed, and the second as an input data file. The routine processes the input, given the grammar, and exits the process with code 0 (zero) if the input is accepted by the grammar, or with code 2 (two) if the input is rejected by the grammar.
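
The exit-code convention can be illustrated with a minimal sketch. This is not the module's code: the name sketch_main and the match callable (a stand-in for the bnf routine) are hypothetical, and the sketch assumes that argv holds only the two filenames, without the program name. The behaviour for any other number of arguments is not described here and is left out.

```python
def sketch_main(argv, match):
    """Return the exit code for argv == [grammar_file, input_file].

    match(grammar_file, data) is a caller-supplied stand-in for the
    module's bnf() routine; it must return a truthy value on acceptance.
    """
    grammar_file, input_file = argv  # only the two-argument form is sketched
    with open(input_file) as fp:
        data = fp.read()
    # 0 when the input is accepted by the grammar, 2 when it is rejected.
    return 0 if match(grammar_file, data) else 2
```

A caller would typically pass the result to sys.exit so that the shell can test whether the input was accepted.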

2.3 BNF

The bnf(filename, data, debug) function matches an input data sequence to a grammar, given the grammar's filename. The optional debug parameter activates a multi-level verbose mode.
def bnf(filename, data, debug=False)

2.4 parse

The parse(data, gram, nterm) function matches an input data sequence to a grammar (gram), given a starting nonterminal symbol (nterm). When invoked externally, the nonterminal (nterm) should be the grammar's start symbol. Internally, however, the routine is invoked recursively for every potential nonterminal, with the input data sequence adjusted accordingly. If gram or nterm is not given, the previous values returned from bnf are used. The routine uses a global variable, recurs, to keep track of unlimited recursion and is therefore not reentrant.
def parse(data, gram=None, nterm=None)
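
The recursive matching described above can be sketched as follows. This is an illustration under stated assumptions, not the module's implementation: the names sketch_parse and accepts are hypothetical, symbols are assumed to be tagged as (kind, text) tuples with "T" for terminals and "N" for nonterminals, and the recursion guard is passed as a depth argument, whereas the module is described as using the global variable recurs.

```python
def sketch_parse(data, gram, nterm, depth=0, limit=200):
    """Yield every remainder of data left after matching nterm as a prefix."""
    if depth > limit:  # crude guard against unbounded recursion
        return
    for rule in gram[nterm]:
        remainders = [data]  # inputs still to be matched by the rest of the rule
        for kind, text in rule:
            following = []
            for rest in remainders:
                if kind == "T":
                    # Terminal: consume it if the input starts with it.
                    if rest.startswith(text):
                        following.append(rest[len(text):])
                else:
                    # Nonterminal: recurse on the adjusted input sequence.
                    following.extend(
                        sketch_parse(rest, gram, text, depth + 1, limit))
            remainders = following
        yield from remainders


def accepts(data, gram, start):
    """True when some derivation of start consumes the whole input."""
    return any(rest == "" for rest in sketch_parse(data, gram, start))


# Hypothetical grammar: s ::= "a" s | "b"
gram = {"s": [[("T", "a"), ("N", "s")], [("T", "b")]]}
print(accepts("aab", gram, "s"))  # True
print(accepts("aa", gram, "s"))   # False
```

Passing the depth explicitly keeps the sketch reentrant, at the cost of a signature that differs from the one documented above.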

2.5 dump

The dump(gram, start) function is a debug routine that prints the parsed grammar and its start symbol. If gram or start is not given, the previous values returned from bnf are used.
def dump(gram=None, start=None)

2.6 grammar

The grammar(data) function builds a grammar structure and determines its start symbol, given its textual description, data, as a character string. A grammar structure is a Python dictionary whose keys are nonterminal symbols, as strings, and whose values are Python lists of rules. Each rule is a Python list of terminal and nonterminal symbols, tagged by type and represented as strings.
def grammar(data)
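
To make the shape of the grammar structure concrete, here is a minimal sketch that builds such a dictionary from a small textual description. Several points are assumptions and not taken from the module: the name sketch_grammar, the input syntax (one rule per line, nonterminals in angle brackets, terminals in double quotes, alternatives separated by |), the (kind, text) tuple used as the type tag, and the choice of the first rule's nonterminal as the start symbol.

```python
import re

# Hypothetical type tags; the module's actual tagging scheme may differ.
TERM, NONTERM = "T", "N"


def sketch_grammar(text):
    """Build ({nonterminal: [rule, ...]}, start) from lines such as
    <s> ::= "a" <s> | "b"
    """
    gram, start = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        head, body = line.split("::=", 1)
        name = head.strip().strip("<>")
        if start is None:
            start = name  # assumption: the first rule defines the start symbol
        # Limitation of this sketch: a "|" inside a quoted terminal is
        # treated as an alternative separator.
        for alternative in body.split("|"):
            rule = []
            for nonterm, term in re.findall(r'<([^>]+)>|"([^"]*)"', alternative):
                rule.append((NONTERM, nonterm) if nonterm else (TERM, term))
            gram.setdefault(name, []).append(rule)
    return gram, start


gram, start = sketch_grammar('<s> ::= "a" <s> | "b"')
print(start)  # s
print(gram)   # {'s': [[('T', 'a'), ('N', 's')], [('T', 'b')]]}
```

Each key maps to a list of alternatives, and each alternative is itself a list of tagged symbols, which mirrors the dictionary of lists of lists described above.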

3. High-level eBNF functions

The eBNF parser uses the same structure as the BNF and the routines have the same names. The grammar internal representation format is the same, only the syntactic sugar is different. The module containing the routines is called ebnf.py.