HOME

TheInfoList



OR:

In
computer science Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to Applied science, practical discipli ...
, an operator precedence parser is a
bottom-up parser Bottom-up may refer to: * Bottom-up analysis, a fundamental analysis technique in accounting and finance * Bottom-up parsing, a computer science strategy * Bottom-up processing, in Pattern recognition (psychology) * Bottom-up theories of galaxy fo ...
that interprets an operator-precedence grammar. For example, most
calculator An electronic calculator is typically a portable electronic device used to perform calculations, ranging from basic arithmetic to complex mathematics. The first solid-state electronic calculator was created in the early 1960s. Pocket-sized ...
s use operator precedence parsers to convert from the human-readable
infix notation Infix notation is the notation commonly used in arithmetical and logical formulae and statements. It is characterized by the placement of operators between operands—" infixed operators"—such as the plus sign in . Usage Binary relations a ...
relying on
order of operations In mathematics and computer programming, the order of operations (or operator precedence) is a collection of rules that reflect conventions about which procedures to perform first in order to evaluate a given mathematical expression. For exampl ...
to a format that is optimized for evaluation such as
Reverse Polish notation Reverse Polish notation (RPN), also known as reverse Łukasiewicz notation, Polish postfix notation or simply postfix notation, is a mathematical notation in which operators ''follow'' their operands, in contrast to Polish notation (PN), in whi ...
(RPN).
Edsger Dijkstra Edsger Wybe Dijkstra ( ; ; 11 May 1930 – 6 August 2002) was a Dutch computer scientist, programmer, software engineer, systems scientist, and science essayist. He received the 1972 Turing Award for fundamental contributions to developing progra ...
's
shunting yard algorithm In computer science, the shunting yard algorithm is a method for parsing arithmetical or logical expressions, or a combination of both, specified in infix notation. It can produce either a postfix notation string, also known as Reverse Polish ...
is commonly used to implement operator precedence parsers.


Relationship to other parsers

An operator-precedence parser is a simple
shift-reduce parser A shift-reduce parser is a class of efficient, table-driven bottom-up parsing methods for computer languages and other notations formally defined by a grammar. The parsing methods most commonly used for parsing programming languages, LR parsing a ...
that is capable of parsing a subset of
LR(1) In computer science, a canonical LR parser or LR(1) parser is an LR(k) parser for ''k=1'', i.e. with a single lookahead terminal. The special attribute of this parser is that any LR(k) grammar with ''k>1'' can be transformed into an LR(1) grammar. ...
grammars. More precisely, the operator-precedence parser can parse all LR(1) grammars where two consecutive
nonterminal In computer science, terminal and nonterminal symbols are the lexical elements used in specifying the production rules constituting a formal grammar. ''Terminal symbols'' are the elementary symbols of the language defined by a formal grammar. ...
s and
epsilon Epsilon (, ; uppercase , lowercase or lunate ; el, έψιλον) is the fifth letter of the Greek alphabet, corresponding phonetically to a mid front unrounded vowel or . In the system of Greek numerals it also has the value five. It was der ...
never appear in the right-hand side of any rule. Operator-precedence parsers are not used often in practice; however they do have some properties that make them useful within a larger design. First, they are simple enough to write by hand, which is not generally the case with more sophisticated right shift-reduce parsers. Second, they can be written to consult an operator table at
run time Run(s) or RUN may refer to: Places * Run (island), one of the Banda Islands in Indonesia * Run (stream), a stream in the Dutch province of North Brabant People * Run (rapper), Joseph Simmons, now known as "Reverend Run", from the hip-hop group ...
, which makes them suitable for languages that can add to or change their operators while parsing. (An example is
Haskell Haskell () is a general-purpose, statically-typed, purely functional programming language with type inference and lazy evaluation. Designed for teaching, research and industrial applications, Haskell has pioneered a number of programming lan ...
, which allows user-defined infix operators with custom associativity and precedence; consequentially, an operator-precedence parser must be run on the program ''after'' parsing of all referenced modules.) Raku sandwiches an operator-precedence parser between two
recursive descent parser In computer science, a recursive descent parser is a kind of top-down parser built from a set of mutually recursive procedures (or a non-recursive equivalent) where each such procedure implements one of the nonterminals of the grammar. Thus t ...
s in order to achieve a balance of speed and dynamism. GCC's C and C++ parsers, which are hand-coded recursive descent parsers, are both sped up by an operator-precedence parser that can quickly examine arithmetic expressions. Operator precedence parsers are also embedded within
compiler-compiler In computer science, a compiler-compiler or compiler generator is a programming tool that creates a parser, interpreter, or compiler from some form of formal description of a programming language and machine. The most common type of compiler- ...
-generated parsers to noticeably speed up the recursive descent approach to expression parsing.


Precedence climbing method

The precedence climbing method is a compact, efficient, and flexible algorithm for parsing expressions that was first described by Martin Richards and Colin Whitby-Strevens. An infix-notation expression grammar in EBNF format will usually look like this: expression ::= equality-expression equality-expression ::= additive-expression ( ( '

' , '!=' ) additive-expression ) * additive-expression ::= multiplicative-expression ( ( '+' , '-' ) multiplicative-expression ) * multiplicative-expression ::= primary ( ( '*' , '/' ) primary ) * primary ::= '(' expression ')' , NUMBER , VARIABLE , '-' primary
With many levels of precedence, implementing this grammar with a predictive recursive-descent parser can become inefficient. Parsing a number, for example, can require five function calls: one for each non-terminal in the grammar until reaching ''primary''. An operator-precedence parser can do the same more efficiently. The idea is that we can left associate the arithmetic operations as long as we find operators with the same precedence, but we have to save a temporary result to evaluate higher precedence operators. The algorithm that is presented here does not need an explicit stack; instead, it uses recursive calls to implement the stack. The algorithm is not a pure operator-precedence parser like the Dijkstra shunting yard algorithm. It assumes that the ''primary'' nonterminal is parsed in a separate subroutine, like in a recursive descent parser.


Pseudocode

The pseudocode for the algorithm is as follows. The parser starts at function ''parse_expression''. Precedence levels are greater than or equal to 0. parse_expression() return parse_expression_1(parse_primary(), 0) parse_expression_1(lhs, min_precedence) ''lookahead'' := peek next token while ''lookahead'' is a binary operator whose precedence is >= ''min_precedence'' ''op'' := ''lookahead'' advance to next token ''rhs'' := ''parse_primary'' () ''lookahead'' := peek next token while ''lookahead'' is a binary operator whose precedence is greater than ''op'''s, or a right-associative operator whose precedence is equal to ''ops ''rhs'' := ''parse_expression_1'' (''rhs'', precedence of ''op'' + (1 if ''lookahead'' precedence is greater, else 0)) ''lookahead'' := peek next token ''lhs'' := the result of applying ''op'' with operands ''lhs'' and ''rhs'' return ''lhs'' Note that in the case of a production rule like this (where the operator can only appear once): equality-expression ::= additive-expression ( '

' , '!=' ) additive-expression
the algorithm must be modified to accept only binary operators whose precedence is > ''min_precedence''.


Example execution of the algorithm

An example execution on the expression 2 + 3 * 4 + 5

19 is as follows. We give precedence 0 to equality expressions, 1 to additive expressions, 2 to multiplicative expressions. ''parse_expression_1'' (''lhs'' = 2, ''min_precedence'' = 0) * the lookahead token is +, with precedence 1. the outer while loop is entered. * ''op'' is + (precedence 1) and the input is advanced * ''rhs'' is 3 * the lookahead token is *, with precedence 2. the inner while loop is entered.
''parse_expression_1'' (''lhs'' = 3, ''min_precedence'' = 2) :* the lookahead token is *, with precedence 2. the outer while loop is entered. ::* ''op'' is * (precedence 2) and the input is advanced ::* ''rhs'' is 4 ::* the next token is +, with precedence 1. the inner while loop is not entered. ::* ''lhs'' is assigned 3*4 = 12 ::* the next token is +, with precedence 1. the outer while loop is left. :* 12 is returned. * the lookahead token is +, with precedence 1. the inner while loop is not entered. * ''lhs'' is assigned 2+12 = 14 * the lookahead token is +, with precedence 1. the outer while loop is not left. * ''op'' is + (precedence 1) and the input is advanced * ''rhs'' is 5 * the next token is

, with precedence 0. the inner while loop is not entered. * ''lhs'' is assigned 14+5 = 19 * the next token is

, with precedence 0. the outer while loop is not left. * ''op'' is

(precedence 0) and the input is advanced * ''rhs'' is 19 * the next token is ''end-of-line'', which is not an operator. the inner while loop is not entered. * ''lhs'' is assigned the result of evaluating 19

19, for example 1 (as in the C standard). * the next token is ''end-of-line'', which is not an operator. the outer while loop is left. 1 is returned.


Pratt parsing

Another precedence parser known as Pratt parsing was first described by
Vaughan Pratt Vaughan Pratt (born April 12, 1944) is a Professor Emeritus at Stanford University, who was an early pioneer in the field of computer science. Since 1969, Pratt has made several contributions to foundational areas such as search algorithms, sorti ...
in the 1973 paper "Top down operator precedence", based on
recursive descent In computer science, a recursive descent parser is a kind of top-down parsing, top-down parser built from a set of mutual recursion, mutually recursive procedures (or a non-recursive equivalent) where each such procedure (computer science), proce ...
. Though it predates precedence climbing, it can be viewed as a generalization of precedence climbing. Pratt designed the parser originally to implement the
CGOL CGOL (pronounced ''"see goll"'') is an alternative syntax featuring an extensible algebraic notation for the Lisp programming language. It was designed for MACLISP by Vaughan Pratt and subsequently ported to Common Lisp. The notation of CGOL ...
programming language, and it was treated in much more depth in a Masters Thesis under his supervision. Tutorials and implementations: *
Douglas Crockford Douglas Crockford is an American computer programmer who is involved in the development of the JavaScript language. He specified the data format JSON (JavaScript Object Notation), and has developed various JavaScript related tools such as the ...
based the JavaScript parser in
JSLint JSLint is a static code analysis tool used in software development for checking if JavaScript source code complies with coding rules. It is provided primarily as a browser-based web application accessible through the domain jslint.com, but there ...
on Pratt parsing. * Comparison between Python implementations of precedence climbing and Pratt parsing
"Pratt Parsing and Precedence Climbing Are the Same Algorithm" (2016) by Andy Chu
* Tutorial using
Rust Rust is an iron oxide, a usually reddish-brown oxide formed by the reaction of iron and oxygen in the catalytic presence of water or air moisture. Rust consists of hydrous iron(III) oxides (Fe2O3·nH2O) and iron(III) oxide-hydroxide (FeO(OH ...

"Simple but Powerful Pratt Parsing" (2020) by Aleksey Kladov
* Tutorial using
Python Python may refer to: Snakes * Pythonidae, a family of nonvenomous snakes found in Africa, Asia, and Australia ** ''Python'' (genus), a genus of Pythonidae found in Africa and Asia * Python (mythology), a mythical serpent Computing * Python (pro ...

"Simple Top-Down Parsing in Python" (2008) by Fredrik Lundh
* Tutorial using
Java Java (; id, Jawa, ; jv, ꦗꦮ; su, ) is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea to the north. With a population of 151.6 million people, Java is the world's List ...

"Pratt Parsers: Expression Parsing Made Easy" (2011) by Bob Nystrom, author of Crafting Interpreters
* Implementation in C#
"Gratt: A Generic Vaughn Pratt's top-down operator precedence parser for .NET Standard"
(a
generic Generic or generics may refer to: In business * Generic term, a common name used for a range or class of similar things not protected by trademark * Generic brand, a brand for a product that does not have an associated brand or trademark, other ...
version inspired by the Java implementation presented by Bob Nystrom i
"Pratt Parsers: Expression Parsing Made Easy"


Alternative methods

There are other ways to apply operator precedence rules. One is to build a tree of the original expression and then apply tree rewrite rules to it. Such trees do not necessarily need to be implemented using data structures conventionally used for trees. Instead, tokens can be stored in flat structures, such as tables, by simultaneously building a priority list which states what elements to process in which order. Another approach is to first fully parenthesize the expression, inserting a number of parentheses around each operator, such that they lead to the correct precedence even when parsed with a linear, left-to-right parser. This algorithm was used in the early FORTRAN I compiler:
The Fortran I compiler would expand each operator with a sequence of parentheses. In a simplified form of the algorithm, it would * replace + and with ))+(( and ))-((, respectively; * replace * and / with )*( and )/(, respectively; * add (( at the beginning of each expression and after each left parenthesis in the original expression; and * add )) at the end of the expression and before each right parenthesis in the original expression. Although not obvious, the algorithm was correct, and, in the words of Knuth, “The resulting formula is properly parenthesized, believe it or not.”
Example code of a simple C application that handles parenthesisation of basic math operators (+, -, *, /, ^, ( and )): #include #include int main(int argc, char *argv[]) For example, when compiled and invoked from the command line with parameters
a * b + c ^ d / e
it produces
((((a))*((b)))+(((c)^(d))/((e))))
as output on the console. A limitation to this strategy is that unary operators must all have higher precedence than infix operators. The "negative" operator in the above code has a higher precedence than exponentiation. Running the program with this input
- a ^ 2
produces this output
((((-a)^(2))))
which is probably not what is intended.


References


External links

*
Example C++ code by Keith Clarke for parsing infix expressions using the precedence climbing method
*
Parser for expression with infix notation using precedence lists
{{DEFAULTSORT:Operator-Precedence Parser Parsing algorithms Articles with example C code