GNU Bison
   HOME

TheInfoList



OR:

GNU Bison, commonly known as Bison, is a
parser generator In computer science, a compiler-compiler or compiler generator is a programming tool that creates a parser, interpreter, or compiler from some form of formal description of a programming language and machine. The most common type of compiler- ...
that is part of the
GNU Project The GNU Project () is a free software, mass collaboration project announced by Richard Stallman on September 27, 1983. Its goal is to give computer users freedom and control in their use of their computers and computing devices by collaborati ...
. Bison reads a specification in the BNF notation (a
context-free language In formal language theory, a context-free language (CFL) is a language generated by a context-free grammar (CFG). Context-free languages have many applications in programming languages, in particular, most arithmetic expressions are generated by ...
), warns about any
parsing Parsing, syntax analysis, or syntactic analysis is the process of analyzing a string of symbols, either in natural language, computer languages or data structures, conforming to the rules of a formal grammar. The term ''parsing'' comes from Lati ...
ambiguities, and generates a parser that reads sequences of tokens and decides whether the sequence conforms to the syntax specified by the grammar. The generated parsers are portable: they do not require any specific compilers. Bison by default generates LALR(1) parsers but it can also generate canonical LR, IELR(1) and GLR parsers. In
POSIX The Portable Operating System Interface (POSIX) is a family of standards specified by the IEEE Computer Society for maintaining compatibility between operating systems. POSIX defines both the system- and user-level application programming interf ...
mode, Bison is compatible with
Yacc Yacc (Yet Another Compiler-Compiler) is a computer program for the Unix operating system developed by Stephen C. Johnson. It is a Look Ahead Left-to-Right Rightmost Derivation (LALR) parser generator, generating a LALR parser (the part of a com ...
, but also has several extensions over this earlier program, including * Generation of counterexamples for conflicts * Location tracking (e.g., file, line, column) * Rich and internationalizable syntax error messages in the generated parsers * Customizable syntax error generation, * Reentrant parsers * Push parsers, with autocompletion * Support for named references * Several types of reports (graphical, XML) on the generated parser * Support for several programming languages ( C,
C++ C++ (pronounced "C plus plus") is a high-level general-purpose programming language created by Danish computer scientist Bjarne Stroustrup as an extension of the C programming language, or "C with Classes". The language has expanded significan ...
, D, or
Java Java (; id, Jawa, ; jv, ꦗꦮ; su, ) is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea to the north. With a population of 151.6 million people, Java is the world's List ...
)
Flex Flex or FLEX may refer to: Computing * Flex (language), developed by Alan Kay * FLEX (operating system), a single-tasking operating system for the Motorola 6800 * FlexOS, an operating system developed by Digital Research * FLEX (protocol), a comm ...
, an automatic lexical analyser, is often used with Bison, to tokenise input data and provide Bison with tokens. Bison was originally written by Robert Corbett in 1985. Later, in 1989, Robert Corbett released another parser generator named
Berkeley Yacc Berkeley Yacc (byacc) is a Unix parser generator designed to be compatible with Yacc. It was originally written by Robert Corbett and released in 1989. Due to its liberal license and because it was faster than the AT&T Yacc, it quickly became the ...
. Bison was made Yacc-compatible by
Richard Stallman Richard Matthew Stallman (; born March 16, 1953), also known by his initials, rms, is an American free software movement activist and programmer. He campaigns for software to be distributed in such a manner that its users have the freedom to ...
. Bison is
free software Free software or libre software is computer software distributed under terms that allow users to run the software for any purpose as well as to study, change, and distribute it and any adapted versions. Free software is a matter of liberty, no ...
and is available under the
GNU General Public License The GNU General Public License (GNU GPL or simply GPL) is a series of widely used free software licenses that guarantee end users the Four Freedoms (Free software), four freedoms to run, study, share, and modify the software. The license was th ...
, with an exception (discussed below) allowing its generated code to be used without triggering the
copyleft Copyleft is the legal technique of granting certain freedoms over copies of copyrighted works with the requirement that the same rights be preserved in derivative works. In this sense, ''freedoms'' refers to the use of the work for any purpose, ...
requirements of the licence.


Features


Counterexample generation

One delicate issue with LR parser generators is the resolution of conflicts (shift/reduce and reduce/reduce conflicts). Resolving conflicts usually requires the analysis of the parser automaton as described in the reports, and some expertise from the user. Counterexamples help understanding quickly some conflicts, and can even actually prove that the problem is that the grammar is actually ambiguous. For instance on a grammar suffering from the infamous
dangling else The dangling else is a problem in programming of parser generators in which an optional else clause in an if–then(–else) statement results in nested conditionals being ambiguous. Formally, the reference context-free grammar of the language i ...
problem, Bison reports


Reentrancy

Reentrancy is a feature which has been added to Bison and does not exist in Yacc. Normally, Bison generates a parser which is not reentrant. In order to achieve reentrancy the declaration %define api.pure must be used. More details on Bison reentrancy can be found in the Bison manual.


Output languages

Bison can generate code for C,
C++ C++ (pronounced "C plus plus") is a high-level general-purpose programming language created by Danish computer scientist Bjarne Stroustrup as an extension of the C programming language, or "C with Classes". The language has expanded significan ...
, D and
Java Java (; id, Jawa, ; jv, ꦗꦮ; su, ) is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea to the north. With a population of 151.6 million people, Java is the world's List ...
. For using the Bison-generated parser from other languages a
language binding In programming and software design, binding is an application programming interface (API) that provides glue code specifically made to allow a programming language to use a foreign library or operating system service (one that is not native to th ...
tool such as SWIG can be used.


License and distribution of generated code

Because Bison generates source code that in turn gets added to the source code of other software projects, it raises some simple but interesting copyright questions.


A GPL-compatible license is not required

The code generated by Bison includes significant amounts of code from the Bison project itself. The Bison package is distributed under the terms of the
GNU General Public License The GNU General Public License (GNU GPL or simply GPL) is a series of widely used free software licenses that guarantee end users the Four Freedoms (Free software), four freedoms to run, study, share, and modify the software. The license was th ...
(GPL) but an exception has been added so that the GPL does not apply to output. Earlier releases of Bison stipulated that parts of its output were also licensed under the GPL, due to the inclusion of the yyparse() function from the original source code in the output.


Distribution of packages using Bison

Free software projects that use Bison may have a choice of whether to distribute the source code which their project feeds into Bison, or the resulting C code made output by Bison. Both are sufficient for a recipient to be able to compile the project source code. However, distributing only the input carries the minor inconvenience that the recipients must have a compatible copy of Bison installed so that they can generate the necessary C code when compiling the project. And distributing only the C code in output, creates the problem of making it very difficult for the recipients to modify the parser since this code was written neither ''by'' a human nor ''for'' humans - its purpose is to be fed directly into a C compiler. These problems can be avoided by distributing both the input files and the generated code. Most people will compile using the generated code, no different from any other software package, but anyone who wants to modify the parser component can modify the input files first and re-generate the generated files before compiling. Projects distributing both usually do not have the generated files in their
revision control In software engineering, version control (also known as revision control, source control, or source code management) is a class of systems responsible for managing changes to computer programs, documents, large web sites, or other collections o ...
systems. The files are only generated when making a release. Some licenses, such as the
GPL The GNU General Public License (GNU GPL or simply GPL) is a series of widely used free software licenses that guarantee end users the four freedoms to run, study, share, and modify the software. The license was the first copyleft for general u ...
, require that the source code be in "''the preferred form of the work for making modifications to it''". GPL'd projects using Bison must thus distribute the files which are the input for Bison. Of course, they can also include the generated files.


Use

Because Bison was written as a replacement for Yacc, and is largely compatible, the code from a lot of projects using Bison could equally be fed into Yacc. This makes it difficult to determine if a project "uses" Bison-specific source code or not. In many cases, the "use" of Bison could be trivially replaced by the equivalent use of Yacc or one of its other derivatives. Bison has features not found in Yacc, so some projects can be truly said to "use" Bison, since Yacc would not suffice. The following list is of projects which are known to "use" Bison in the looser sense, that they use free software development tools and distribute code which is intended to be fed into Bison or a Bison-compatible package. *
Bash Bash or BASH may refer to: Arts and entertainment * ''Bash!'' (Rockapella album), 1992 * ''Bash!'' (Dave Bailey album), 1961 * '' Bash: Latter-Day Plays'', a dramatic triptych * ''BASH!'' (role-playing game), a 2005 superhero game * "Bash" ('' ...
shell uses a yacc grammar for parsing the command input. * Bison's own grammar parser is generated by Bison. *
CMake In software development, CMake is cross-platform free and open-source software for build automation, testing, packaging and installation of software by using a compiler-independent method. CMake is not a build system itself; it generates an ...
uses several Bison grammars. * GCC started out using Bison, but switched to a hand-written
recursive-descent parser In computer science, a recursive descent parser is a kind of top-down parser built from a set of mutually recursive procedures (or a non-recursive equivalent) where each such procedure implements one of the nonterminals of the grammar. Thus t ...
for C++ in 2004 (version 3.4), and for C and Objective-C in 2006 (version 4.1) * The Go programming language (GC) used Bison, but switched to a hand-written scanner and parser in version 1.5. *
LilyPond LilyPond is a computer program and file format for music engraving. One of LilyPond's major goals is to produce scores that are engraved with traditional layout rules, reflecting the era when scores were engraved by hand. LilyPond is cross-pl ...
requires Bison to generate its parser. *
MySQL MySQL () is an open-source relational database management system (RDBMS). Its name is a combination of "My", the name of co-founder Michael Widenius's daughter My, and "SQL", the acronym for Structured Query Language. A relational database o ...
*
GNU Octave GNU Octave is a high-level programming language primarily intended for scientific computing and numerical computation. Octave helps in solving linear and nonlinear problems numerically, and for performing other numerical experiments using a langu ...
uses a Bison-generated parser. *
Perl 5 Perl is a family of two High-level programming language, high-level, General-purpose programming language, general-purpose, Interpreter (computing), interpreted, dynamic programming languages. "Perl" refers to Perl 5, but from 2000 to 2019 it ...
uses a Bison-generated parser starting in 5.10. * The
PHP PHP is a general-purpose scripting language geared toward web development. It was originally created by Danish-Canadian programmer Rasmus Lerdorf in 1993 and released in 1995. The PHP reference implementation is now produced by The PHP Group. ...
programming language (Zend Parser). *
PostgreSQL PostgreSQL (, ), also known as Postgres, is a free and open-source relational database management system (RDBMS) emphasizing extensibility and SQL compliance. It was originally named POSTGRES, referring to its origins as a successor to the In ...
*
Ruby MRI Matz's Ruby Interpreter or Ruby MRI (also called CRuby) was the reference implementation of the Ruby programming language named after Ruby creator Yukihiro Matsumoto ("Matz"). Until the specification of the Ruby language in 2011, the MRI impleme ...
, the reference implementation of the
Ruby A ruby is a pinkish red to blood-red colored gemstone, a variety of the mineral corundum ( aluminium oxide). Ruby is one of the most popular traditional jewelry gems and is very durable. Other varieties of gem-quality corundum are called sa ...
programming language, relies on a Bison grammar. * syslog-ng uses several Bison grammars assembled together.


A complete reentrant parser example

The following example shows how to use Bison and flex to write a simple calculator program (only addition and multiplication) and a program for creating an
abstract syntax tree In computer science, an abstract syntax tree (AST), or just syntax tree, is a tree representation of the abstract syntactic structure of text (often source code) written in a formal language. Each node of the tree denotes a construct occurring ...
. The next two files provide definition and implementation of the syntax tree functions. /* * Expression.h * Definition of the structure used to build the syntax tree. */ #ifndef __EXPRESSION_H__ #define __EXPRESSION_H__ /** * @brief The operation type */ typedef enum tagEOperationType EOperationType; /** * @brief The expression structure */ typedef struct tagSExpression SExpression; /** * @brief It creates an identifier * @param value The number value * @return The expression or NULL in case of no memory */ SExpression *createNumber(int value); /** * @brief It creates an operation * @param type The operation type * @param left The left operand * @param right The right operand * @return The expression or NULL in case of no memory */ SExpression *createOperation(EOperationType type, SExpression *left, SExpression *right); /** * @brief Deletes a expression * @param b The expression */ void deleteExpression(SExpression *b); #endif /* __EXPRESSION_H__ */ /* * Expression.c * Implementation of functions used to build the syntax tree. */ #include "Expression.h" #include /** * @brief Allocates space for expression * @return The expression or NULL if not enough memory */ static SExpression *allocateExpression() SExpression *createNumber(int value) SExpression *createOperation(EOperationType type, SExpression *left, SExpression *right) void deleteExpression(SExpression *b) The tokens needed by the Bison parser will be generated using flex. % %option outfile="Lexer.c" header-file="Lexer.h" %option warn nodefault %option reentrant noyywrap never-interactive nounistd %option bison-bridge %% \r\n\t -9 "*" "+" "(" ")" . %% int yyerror(SExpression **expression, yyscan_t scanner, const char *msg) The names of the tokens are typically neutral: "TOKEN_PLUS" and "TOKEN_STAR", not "TOKEN_ADD" and "TOKEN_MULTIPLY". For instance if we were to support the unary "+" (as in "+1"), it would be wrong to name this "+" "TOKEN_ADD". In a language such as C, "int *ptr" denotes the definition of a pointer, not a product: it would be wrong to name this "*" "TOKEN_MULTIPLY". Since the tokens are provided by flex we must provide the means to communicate between the parser and the lexer.Flex Manual: C Scanners with Bison Parsers
The data type used for communication, ''YYSTYPE'', is set using Bison ''%union'' declaration. Since in this sample we use the reentrant version of both flex and yacc we are forced to provide parameters for the ''yylex'' function, when called from ''yyparse''. This is done through Bison ''%lex-param'' and ''%parse-param'' declarations.
/ref> % %code requires %output "Parser.c" %defines "Parser.h" %define api.pure %lex-param %parse-param %parse-param %union %token TOKEN_LPAREN "(" %token TOKEN_RPAREN ")" %token TOKEN_PLUS "+" %token TOKEN_STAR "*" %token TOKEN_NUMBER "number" %type expr /* Precedence (increasing) and associativity: a+b+c is (a+b)+c: left associativity a+b*c is a+(b*c): the precedence of "*" is higher than that of "+". */ %left "+" %left "*" %% input : expr ; expr : expr "+" expr , expr "*" expr , "(" expr ")" , "number" ; %% The code needed to obtain the syntax tree using the parser generated by Bison and the scanner generated by flex is the following. /* * main.c file */ #include "Expression.h" #include "Parser.h" #include "Lexer.h" #include int yyparse(SExpression **expression, yyscan_t scanner); SExpression *getAST(const char *expr) int evaluate(SExpression *e) int main(void) A simple makefile to build the project is the following. # Makefile FILES = Lexer.c Parser.c Expression.c main.c CC = g++ CFLAGS = -g -ansi test: $(FILES) $(CC) $(CFLAGS) $(FILES) -o test Lexer.c: Lexer.l flex Lexer.l Parser.c: Parser.y Lexer.c bison Parser.y clean: rm -f *.o *~ Lexer.c Lexer.h Parser.c Parser.h test


See also

*
Berkeley Yacc Berkeley Yacc (byacc) is a Unix parser generator designed to be compatible with Yacc. It was originally written by Robert Corbett and released in 1989. Due to its liberal license and because it was faster than the AT&T Yacc, it quickly became the ...
(byacc) another free software Yacc replacement sharing the same author as GNU Bison *
ANTLR In computer-based language recognition, ANTLR (pronounced ''antler''), or ANother Tool for Language Recognition, is a parser generator that uses LL(*) for parsing. ANTLR is the successor to the Purdue Compiler Construction Tool Set (PCCTS), firs ...
ANother Tool for Language Recognition, another open-source parser generator


References


Further reading

*


External links


Website in the GNU Project
*
Manual

Bison
project at
GNU Savannah GNU Savannah is a project of the Free Software Foundation initiated by Loïc Dachary, which serves as a collaborative software development management system for free Software projects. Savannah currently offers CVS, GNU arch, Subversion, Git, ...

Entry in the Free Software Directory



How to download and install Bison (GNU Parser Generator) on Linux


(version 2.4.1) {{DEFAULTSORT:Gnu Bison Compiling tools Cross-platform software
Bison Bison are large bovines in the genus ''Bison'' (Greek: "wild ox" (bison)) within the tribe Bovini. Two extant and numerous extinct species are recognised. Of the two surviving species, the American bison, ''B. bison'', found only in North Ame ...
Bison Bison are large bovines in the genus ''Bison'' (Greek: "wild ox" (bison)) within the tribe Bovini. Two extant and numerous extinct species are recognised. Of the two surviving species, the American bison, ''B. bison'', found only in North Ame ...
Parsing