3

BNF Notation: Dive Deeper Into Python's Grammar

 6 months ago
source link: https://realpython.com/python-bnf-notation/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client

Getting to Know Backus-Naur Form Notation (BNF)

The Backus–Naur form or Backus normal form (BNF) is a metasyntax notation for context-free grammars. Computer scientists often use this notation to describe the syntax of programming languages because it allows them to write a detailed description of a language’s grammar.

The BNF notation consists of three core pieces:

Component Description Examples
Terminals Strings that must exactly match specific items in the input. "def", "return", ":"
Nonterminals Symbols that will be replaced by a concrete value. They may also be called simply syntactic variables. <letter>, <digit>
Rules Conventions of terminals and nonterminals that define how these elements relate. <letter> ::= "a"

By combining terminals and nonterminals, you can create BNF rules, which can get as detailed as you need. Nonterminals must have their own defining rules. In a piece of grammar, you’ll have a root rule and potentially many secondary rules that define the required nonterminals. This way, you may end up with a hierarchy of rules.

BNF rules are the core components of a BNF grammar. So, a grammar is a set of BNF rules that are also called production rules.

In practice, you can build a set of BNF rules to specify the grammar of a language. Here, language refers to a set of strings that are valid according to the rules defined in the corresponding grammar. BNF is mainly used for programming languages.

For example, the Python syntax has a grammar that’s defined as a set of BNF rules, and these rules are used to validate the syntax of any piece of Python code. If the code doesn’t fulfill the rules, then you’ll get a SyntaxError.

You’ll find many variations of the original BNF notation out there. Some of the most relevant include the extended Backus–Naur form (EBNF) and augmented Backus–Naur form (ABNF).

In the following sections, you’ll learn the basics of creating BNF rules. Note that you’ll use a variation of BNF that matches the requirements of the BNF Playground site, which you’ll use for testing your rules.

BNF Rules and Their Components

As you already learned, by combining terminals and nonterminals, you can create BNF rules. These rules typically follow the syntax below:

BNF Grammar
<symbol> ::= expression

In the BNF rule syntax, you have the following parts:

  • <symbol> is a nonterminal variable, which is often enclosed in angle brackets (<>).
  • ::= means that the nonterminal on the left will be replaced with the expression on the right.
  • expression consists of a series of terminals, nonterminals, and other symbols that define a specific piece of grammar.

When building BNF rules, you can use a variety of symbols with specific meanings. For example, if you’re going to use the BNF Playground site to compile and test your rules, then you’ll find yourself using some of the following symbols:

Symbol Meaning
"" Encloses a terminal symbol
<> Indicates a nonterminal symbol
() Indicates a group of valid options
+ Specifies one or more of the previous element
* Specifies zero or more of the previous element
? Specifies zero or one occurrence of the previous element
| Indicates that you can select one of the options
[x-z] Indicates letter or digit intervals

Once you know how to write a BNF rule and what symbols to use, you can start creating your own rules. Note that the BNF Playground has several additional symbols and syntactical constructs that you can use in your rules. For a complete reference, click the Grammar Help section at the top of the page.

Now, it’s time to start playing with a couple of custom BNF rules. To kick things off, you’ll start with a generic example.

A Generic Example: Grammar for a Full Name

Say that you need to create a context-free grammar to define how a user should input a person’s full name. In this case, the full name will have three components:

  1. First name
  2. Middle name
  3. Family name

Between each component, you need to place exactly one whitespace. You should also treat the middle name as optional. Here’s how you can define this rule:

BNF Grammar
<full_name> ::= <first_name> " " (<middle_name> " ")? <family_name>

The left-hand part of your BNF rule is a nonterminal variable that identifies the person’s full name. The ::= symbol denotes that <full_name> will be replaced with the right-hand part of the rule.

The right-hand part of the rule has several components. First, you have the first name, which you define using the <first_name> nonterminal. Next, you need a space to separate the first name from the following component. To define this space, you use a terminal, which consists of a space character between quotes.

After the first name, you can accept a middle name, and after that, you need another space. So, you open parentheses to group these two elements. Then you create <middle_name> and the " " terminal. Both are optional, so you use a question mark (?) after to denote that condition.

Finally, you need the family name. To define this component, you use another nonterminal, <family_name>. That’s it! You’ve built your first BNF rule. However, you still don’t have a working grammar. You only have a root rule.

To complete the grammar, you need to define rules for <first_name>, <middle_name>, and <family_name>. To do this, you need to meet some requirements:

  • Each name component will accept only letters.
  • Each name component will start with a capital letter and continue with lowercase letters.

In this case, you can start by defining two rules, one for uppercase letters and one for lowercase letters:

BNF Grammar
<full_name>        ::= <first_name> " " (<middle_name> " ")? <family_name>
<uppercase_letter> ::= [A-Z]
<lowercase_letter> ::= [a-z]

In the highlighted lines of this grammar snippet, you create two pretty similar rules. The first rule accepts all the ASCII letters from uppercase A to Z. The second rule accepts all the lowercase letters. In this example, you don’t support accents or other non-ASCII letters.

With these rules in place, you can build the rest of your rules. To kick things off, go ahead and add the <first_name> rule:

BNF Grammar
<full_name>        ::= <first_name> " " (<middle_name> " ")? <family_name>
<uppercase_letter> ::= [A-Z]
<lowercase_letter> ::= [a-z]
<first_name>       ::= <uppercase_letter> <lowercase_letter>*

To define the <first_name> rule, you start with the <uppercase_letter> nonterminal to express that the first letter of the name must be an uppercase letter. Then, you continue with the <lowercase_letter> nonterminal followed by an asterisk (*). This asterisk means that the first name will accept zero or more lowercase letters after the initial uppercase letter.

You can follow this same pattern to build the <middle_name> and <family_name> rules. Would you like to give it a try? Once you’re done, click the collapsible section below to get the complete grammar so that you can compare it with yours:

You can check if your full name grammar works using the BNF Playground site. Here’s a demo:

Once you navigate to the BNF Playground site, you can paste your grammar rules in the text input area. Then press the COMPILE BNF button. If everything is okay with your BNF rules, then you can enter a full name in the Test a string here! input field. Once you’ve entered a person’s full name, the field will turn green if the input string fulfills the rules.

A Programming-Related Example: Identifiers

In the previous section, you learned how to create a BNF grammar that defines how your users must provide a person’s name. This is a generic example that may or may not relate to programming. In this section, you’ll get more technical by writing a short set of BNF rules to validate an identifier in a hypothetical programming language.

An identifier can be a variable, function, class, or an object’s name. In your example, you’ll write a set of rules to check whether a given string meets the following requirements:

  • The first character is an uppercase or lowercase letter or an underscore.
  • The rest of the characters can be uppercase or lowercase letters, digits, or underscores.

Here’s the root rule for your identifier:

BNF Grammar
<identifier> ::= <char> (<char> | <digit>)*

In this rule, you have the <identifier> nonterminal variable, which defines the root. On the right-hand side, you first have the <char> nonterminal. The rest of the identifier is grouped inside parentheses. The asterisk after the group says that elements from the group can appear zero or more times. Each such element is either a character or a digit.

Now, you need to define the <char> and <digit> nonterminals with their own dedicated rules. They’ll look like in the code below:

BNF Grammar
<identifier> ::= <char> (<char> | <digit>)*
<char>       ::= [A-Z] | [a-z] | "_"
<digit>      ::= [0-9]

The <char> rule accepts one ASCII letter in either lowercase or uppercase. Alternatively, it can accept an underscore. Finally, the <digit> rule accepts a digit from 0 to 9. Now, your set of rules is complete. Go ahead and give it a try on the BNF Playground site.

For you as a programmer, reading BNF rules can be a pretty useful skill. For example, you’ll often find that the official documentation of many programming languages includes the BNF grammar of the languages, in whole or in part. So, being able to read BNF allows you to better understand the language syntax and intricacies.

From this point on, you’ll learn how to read Python’s BNF variation, which you’ll find in several parts of the language documentation.


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK