Language description with BNF (and eBNF)

Pedro Reis dos Santos
Universidade de Lisboa
(C)IST, 2022

1. Introduction

The BNF package provides two modules, bnf and ebnf, for describing grammars in the BNF and eBNF formats. Both formats share the same operators and are very similar; a quoting scheme allows any input sequence to be defined. A grammar description can be parsed to check its integrity, and input sequences can then be tested for conformance to the grammar. Processing an input sequence yields a boolean result (True or False). Matching is performed by a simple LL parser with backtracking. As a consequence, not every grammar is accepted, but an equivalent LL grammar can usually be written for the target language. Users must adapt their initial grammar in order to use this tool.

2. Overview

BNF syntax:
  1. non-terminals between <>
  2. rules end at newline \n
  3. assign with ::=
  4. operators:
    • alternative derivations separated by |
    • group items between ()
    • optional items between {} or with postfix ? operator
    • zero or more repetitions with postfix * operator
    • one or more repetitions with postfix + operator
  5. set of terminal values between []: in set [aeiou], not in set [^aeiou] or ranges [a-z]
The BNF compiler uses an LL parser with backtracking, so grammars must obey:
  1. no left-recursion: <X> ::= <X> ...
  2. no sequences such as a+ a, where a repetition is followed by the same item
  3. longest rule first: rule <X> ::= a | a b must be replaced by <X> ::= a b | a
  4. special chars <>(){}[]|+*?:= must each be quoted with \
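The "longest rule first" restriction can be illustrated with a minimal Python sketch. It assumes the parser commits to the first alternative that matches, which is what the restriction suggests; the helper names match_seq and match_first are hypothetical and are not part of the bnf package.

```python
def match_seq(symbols, text, pos):
    """Match single-character terminals in order; return the new position or None."""
    for ch in symbols:
        if pos >= len(text) or text[pos] != ch:
            return None
        pos += 1
    return pos

def match_first(alternatives, text):
    """Ordered choice: commit to the first alternative that matches a prefix,
    then require that the whole input was consumed."""
    for alt in alternatives:
        end = match_seq(alt, text, 0)
        if end is not None:
            return end == len(text)
    return False

short_first = ["a", "ab"]  # <X> ::= a | a b
long_first = ["ab", "a"]   # <X> ::= a b | a

print(match_first(short_first, "ab"))  # False: 'a' matches first and leaves 'b' unread
print(match_first(long_first, "ab"))   # True
print(match_first(long_first, "a"))    # True
```

With the short alternative first, the input ab is rejected because the choice is never revisited once a has matched; reordering the alternatives accepts both a and ab.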
eBNF syntax:
  1. terminal symbols must be quoted between "": "if"
  2. rules end with ; not a newline

3. Parsing language descriptions

Grammar example in BNF for a Python tuple of integer literals (tuple.bnf):
<tuple> ::= \( <body> \) | \( \)
<body>  ::= <elem> <num> | <elem>
<elem>  ::= <num> , <elem> | <num> ,
<num>   ::= <dig> <num> | <dig>
<dig>   ::= [0-9]
The tuple example in eBNF becomes (tuple.ebnf):
tuple ::= '(' body ')' | '(' ')' ;
body  ::= elem num | elem ;
elem  ::= num ',' elem | num ',' ;
num   ::= dig num | dig ;
dig   ::= [0-9] ;
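To see how the rules above drive the matching, the following hand-written recognizer mirrors the tuple grammar one function per rule. It is only a sketch of recursive descent with ordered alternatives, not the package's implementation; all function names (expect, dig, num, elem, body, tuple_, accepts) are hypothetical. Each function returns the position after the match, or None on failure.

```python
def expect(text, pos, ch):
    """Match one literal character; return the next position or None."""
    if pos is not None and pos < len(text) and text[pos] == ch:
        return pos + 1
    return None

def dig(text, pos):
    # dig ::= [0-9]
    if pos is not None and pos < len(text) and text[pos] in "0123456789":
        return pos + 1
    return None

def num(text, pos):
    # num ::= dig num | dig   (longest alternative tried first)
    first = dig(text, pos)
    if first is None:
        return None
    rest = num(text, first)
    return rest if rest is not None else first

def elem(text, pos):
    # elem ::= num ',' elem | num ','
    after_comma = expect(text, num(text, pos), ",")
    if after_comma is None:
        return None
    rest = elem(text, after_comma)
    return rest if rest is not None else after_comma

def body(text, pos):
    # body ::= elem num | elem
    after_elem = elem(text, pos)
    if after_elem is None:
        return None
    end = num(text, after_elem)
    return end if end is not None else after_elem

def tuple_(text, pos=0):
    # tuple ::= '(' body ')' | '(' ')'
    start = expect(text, pos, "(")
    end = expect(text, body(text, start), ")")
    if end is None:
        end = expect(text, start, ")")  # fall back to the empty tuple
    return end

def accepts(text):
    """True when the whole input is derived from the tuple rule."""
    return tuple_(text) == len(text)

for sample in ["(12,34,)", "(1,2)", "()", "(1)", "(12,34,)\n"]:
    print(repr(sample), accepts(sample))
```

In this sketch (1) is rejected because a single element requires a trailing comma, and the last sample is rejected because the grammar has no rule for the trailing newline.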

4. Matching input sequences

Test if an input sequence matches the above grammar with:
echo -n "(12,34,)" | python3 -m ebnf tuple.ebnf
The printed result is True if the input sequence is accepted by the grammar and False otherwise. If the input sequence is stored in a file (sequence.txt), pass the file as a second argument:
python3 -m ebnf tuple.ebnf sequence.txt
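Text editors often append a newline when saving a file, and that newline would then be part of the input sequence. One way to create sequence.txt without it is a short Python script; write_sequence is a hypothetical helper written for this example, not part of the package.

```python
from pathlib import Path

def write_sequence(path, text):
    """Write text to path with any trailing newline characters removed."""
    Path(path).write_text(text.rstrip("\r\n"))

write_sequence("sequence.txt", "(12,34,)\n")
print(repr(Path("sequence.txt").read_text()))  # '(12,34,)'
```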
Note: the input sequence must not contain a newline (\n) if the grammar does not support it (use echo -n).
When no arguments are given, the grammar is read from the terminal and, after a first EOF (End-of-file: ctrl-D on Unix or ctrl-Z on Windows), so is the input sequence:
prompt$ python3 -m bnf
<x> ::= a b+ c
input sequence: end with EOF (^D) or use ^D^D to end with no EOL
abbc
True
prompt$
Set the environment variable DEBUG=1 for verbose output (DEBUG=2 for even more verbose output):
echo -n "(12,34,)" | DEBUG=1 python3 -m ebnf tuple.ebnf
In Python's interactive mode:
>>> from bnf import grammar, parse
>>> grammar("<x> ::= a b+ c\n")
>>> parse("abbc")