HOME

TheInfoList



OR:

Yacc (Yet Another Compiler-Compiler) is a
computer program A computer program is a sequence or set of instructions in a programming language for a computer to execute. Computer programs are one component of software, which also includes documentation and other intangible components. A computer program ...
for the
Unix Unix (; trademarked as UNIX) is a family of multitasking, multiuser computer operating systems that derive from the original AT&T Unix, whose development started in 1969 at the Bell Labs research center by Ken Thompson, Dennis Ritchie, and ot ...
operating system developed by Stephen C. Johnson. It is a Look Ahead Left-to-Right Rightmost Derivation (LALR) parser generator, generating a
LALR parser In computer science, an LALR parser or Look-Ahead LR parser is a simplified version of a canonical LR parser, to parse a text according to a set of production rules specified by a formal grammar for a computer language. ("LR" means left-to-rig ...
(the part of a
compiler In computing, a compiler is a computer program that translates computer code written in one programming language (the ''source'' language) into another language (the ''target'' language). The name "compiler" is primarily used for programs that ...
that tries to make syntactic sense of the
source code In computing, source code, or simply code, is any collection of code, with or without comments, written using a human-readable programming language, usually as plain text. The source code of a program is specially designed to facilitate the wo ...
) based on a
formal grammar In formal language theory, a grammar (when the context is not given, often called a formal grammar for clarity) describes how to form strings from a language's alphabet that are valid according to the language's syntax. A grammar does not describe ...
, written in a notation similar to Backus–Naur Form (BNF). Yacc is supplied as a standard utility on BSD and AT&T Unix.
GNU GNU () is an extensive collection of free software (383 packages as of January 2022), which can be used as an operating system or can be used in parts with other operating systems. The use of the completed GNU tools led to the family of operat ...
-based
Linux Linux ( or ) is a family of open-source Unix-like operating systems based on the Linux kernel, an operating system kernel first released on September 17, 1991, by Linus Torvalds. Linux is typically packaged as a Linux distribution, which ...
distributions include Bison, a
forward-compatible Forward compatibility or upward compatibility is a design characteristic that allows a system to accept input intended for a later version of itself. The concept can be applied to entire systems, electrical interfaces, telecommunication signals, d ...
Yacc replacement.


History

In the early 1970s, Stephen C. Johnson, a computer scientist at
Bell Labs Nokia Bell Labs, originally named Bell Telephone Laboratories (1925–1984), then AT&T Bell Laboratories (1984–1996) and Bell Labs Innovations (1996–2007), is an American industrial research and scientific development company owned by mult ...
/
AT&T AT&T Inc. is an American multinational telecommunications holding company headquartered at Whitacre Tower in Downtown Dallas, Texas. It is the world's largest telecommunications company by revenue and the third largest provider of mobile te ...
, developed Yacc because he wanted to insert an
exclusive or Exclusive or or exclusive disjunction is a logical operation that is true if and only if its arguments differ (one is true, the other is false). It is symbolized by the prefix operator J and by the infix operators XOR ( or ), EOR, EXOR, , ...
operator into a B language compiler (developed using McIlroy's TMG compiler-compiler), but it turned out to be a hard task. As a result, he was directed by his colleague at Bell Labs
Al Aho Alfred Vaino Aho (born August 9, 1941) is a Canadian computer scientist best known for his work on programming languages, compilers, and related algorithms, and his textbooks on the art and science of computer programming. Aho was elected into ...
to
Donald Knuth Donald Ervin Knuth ( ; born January 10, 1938) is an American computer scientist, mathematician, and professor emeritus at Stanford University. He is the 1974 recipient of the ACM Turing Award, informally considered the Nobel Prize of computer sc ...
's work on LR parsing, which served as the basis for Yacc. Yacc was influenced by and received its name in reference to TMG compiler-compiler. Yacc was originally written in the B programming language, but was soon rewritten in C by Alan Snyder. It appeared as part of
Version 3 Unix The term "Research Unix" refers to early versions of the Unix operating system for DEC PDP-7, PDP-11, VAX and Interdata 7/32 and 8/32 computers, developed in the Bell Labs Computing Sciences Research Center (CSRC). History The term ''Resear ...
, and a full description of Yacc was published in 1975. Johnson used Yacc to create the
Portable C Compiler The Portable C Compiler (also known as pcc or sometimes pccm - portable C compiler machine) is an early compiler for the C programming language written by Stephen C. Johnson of Bell Labs in the mid-1970s, based in part on ideas proposed by Alan ...
.
Bjarne Stroustrup Bjarne Stroustrup (; ; born 30 December 1950) is a Danish computer scientist, most notable for the invention and development of the C++ programming language. As of July 2022, Stroustrup is a professor of Computer Science at Columbia University ...
also attempted to use Yacc to create a formal specification of
C++ C++ (pronounced "C plus plus") is a high-level general-purpose programming language created by Danish computer scientist Bjarne Stroustrup as an extension of the C programming language, or "C with Classes". The language has expanded significan ...
, but "was defeated by C's syntax". While finding it unsuitable for a formal specification of the language, Stroustrup did proceed to use Yacc to implement
Cfront Cfront was the original compiler for C++ (then known as " C with Classes") from around 1983, which converted C++ to C; developed by Bjarne Stroustrup at AT&T Bell Labs. The preprocessor did not understand all of the language and much of the code wa ...
, the first implementation of C++. In a 2008 interview, Johnson reflected that "the contribution Yacc made to the spread of
Unix Unix (; trademarked as UNIX) is a family of multitasking, multiuser computer operating systems that derive from the original AT&T Unix, whose development started in 1969 at the Bell Labs research center by Ken Thompson, Dennis Ritchie, and ot ...
and C is what I'm proudest of".


Description

The input to Yacc is a grammar with snippets of C code (called "actions") attached to its rules. Its output is a
shift-reduce parser A shift-reduce parser is a class of efficient, table-driven bottom-up parsing methods for computer languages and other notations formally defined by a grammar. The parsing methods most commonly used for parsing programming languages, LR parsing a ...
in C that executes the C snippets associated with each rule as soon as the rule is recognized. Typical actions involve the construction of parse trees. Using an example from Johnson, if the call constructs a binary parse tree node with the specified and children, then the rule expr : expr '+' expr recognizes summation expressions and constructs nodes for them. The special identifiers , and refer to items on the parser's stack. Yacc produces only a parser (phrase analyzer); for full syntactic analysis this requires an external
lexical analyzer In computer science, lexical analysis, lexing or tokenization is the process of converting a sequence of characters (such as in a computer program or web page) into a sequence of ''lexical tokens'' ( strings with an assigned and thus identified ...
to perform the first tokenization stage (word analysis), which is then followed by the parsing stage proper. Lexical analyzer generators, such as
Lex Lex or LEX may refer to: Arts and entertainment * ''Lex'', a daily featured column in the ''Financial Times'' Games * Lex, the mascot of the word-forming puzzle video game ''Bookworm'' * Lex, the protagonist of the word-forming puzzle video ga ...
or
Flex Flex or FLEX may refer to: Computing * Flex (language), developed by Alan Kay * FLEX (operating system), a single-tasking operating system for the Motorola 6800 * FlexOS, an operating system developed by Digital Research * FLEX (protocol), a comm ...
, are widely available. The
IEEE The Institute of Electrical and Electronics Engineers (IEEE) is a 501(c)(3) professional association for electronic engineering and electrical engineering (and associated disciplines) with its corporate office in New York City and its operation ...
POSIX The Portable Operating System Interface (POSIX) is a family of standards specified by the IEEE Computer Society for maintaining compatibility between operating systems. POSIX defines both the system- and user-level application programming interf ...
P1003.2 standard defines the functionality and requirements for both Lex and Yacc. Some versions of AT&T Yacc have become
open source Open source is source code that is made freely available for possible modification and redistribution. Products include permission to use the source code, design documents, or content of the product. The open-source model is a decentralized sof ...
. For example,
source code In computing, source code, or simply code, is any collection of code, with or without comments, written using a human-readable programming language, usually as plain text. The source code of a program is specially designed to facilitate the wo ...
is available with the standard distributions of Plan 9.


Impact

Yacc and similar programs (largely reimplementations) have been very popular. Yacc itself used to be available as the default parser generator on most Unix systems, though it has since been supplanted by more recent, largely compatible, programs such as Berkeley Yacc, GNU Bison, MKS Yacc, and Abraxas PCYACC. An updated version of the original AT&T Yacc is included as part of Sun's
OpenSolaris OpenSolaris () is a discontinued open-source computer operating system based on Solaris and created by Sun Microsystems. It was also, perhaps confusingly, the name of a project initiated by Sun to build a developer and user community around th ...
project. Each offers slight improvements and additional features over the original Yacc, but the concept and basic syntax have remained the same. Among the languages that were first implemented with Yacc are
AWK AWK (''awk'') is a domain-specific language designed for text processing and typically used as a data extraction and reporting tool. Like sed and grep, it is a filter, and is a standard feature of most Unix-like operating systems. The AWK lang ...
,
C++ C++ (pronounced "C plus plus") is a high-level general-purpose programming language created by Danish computer scientist Bjarne Stroustrup as an extension of the C programming language, or "C with Classes". The language has expanded significan ...
, eqn and Pic. Yacc was also used on Unix to implement the
Portable C Compiler The Portable C Compiler (also known as pcc or sometimes pccm - portable C compiler machine) is an early compiler for the C programming language written by Stephen C. Johnson of Bell Labs in the mid-1970s, based in part on ideas proposed by Alan ...
, as well as parsers for such programming languages as FORTRAN 77,
Ratfor Ratfor (short for ''Rational Fortran'') is a programming language implemented as a preprocessor for Fortran 66. It provides modern control structures, unavailable in Fortran 66, to replace GOTOs and statement numbers. Features Ratfor provides ...
, APL, bc, m4, etc. Yacc has also been rewritten for other languages, including
OCaml OCaml ( , formerly Objective Caml) is a general-purpose programming language, general-purpose, multi-paradigm programming language which extends the Caml dialect of ML (programming language), ML with object-oriented programming, object-oriented ...
,
Ratfor Ratfor (short for ''Rational Fortran'') is a programming language implemented as a preprocessor for Fortran 66. It provides modern control structures, unavailable in Fortran 66, to replace GOTOs and statement numbers. Features Ratfor provides ...
, ML,
Ada Ada may refer to: Places Africa * Ada Foah, a town in Ghana * Ada (Ghana parliament constituency) * Ada, Osun, a town in Nigeria Asia * Ada, Urmia, a village in West Azerbaijan Province, Iran * Ada, Karaman, a village in Karaman Province, ...
, Pascal,
Java Java (; id, Jawa, ; jv, ꦗꦮ; su, ) is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea to the north. With a population of 151.6 million people, Java is the world's List ...
,
Python Python may refer to: Snakes * Pythonidae, a family of nonvenomous snakes found in Africa, Asia, and Australia ** ''Python'' (genus), a genus of Pythonidae found in Africa and Asia * Python (mythology), a mythical serpent Computing * Python (pro ...
,
Ruby A ruby is a pinkish red to blood-red colored gemstone, a variety of the mineral corundum ( aluminium oxide). Ruby is one of the most popular traditional jewelry gems and is very durable. Other varieties of gem-quality corundum are called sa ...
, Go,
Common Lisp Common Lisp (CL) is a dialect of the Lisp programming language, published in ANSI standard document ''ANSI INCITS 226-1994 (S20018)'' (formerly ''X3.226-1994 (R1999)''). The Common Lisp HyperSpec, a hyperlinked HTML version, has been derived fro ...
and Erlang. * Berkeley Yacc: The Berkeley implementation of Yacc quickly became more popular than AT&T Yacc itself because of its performance and lack of reuse restrictions. *
LALR parser In computer science, an LALR parser or Look-Ahead LR parser is a simplified version of a canonical LR parser, to parse a text according to a set of production rules specified by a formal grammar for a computer language. ("LR" means left-to-rig ...
: The underlying parsing algorithm in Yacc-generated parsers. * Bison: The GNU version of Yacc. *
Lex Lex or LEX may refer to: Arts and entertainment * ''Lex'', a daily featured column in the ''Financial Times'' Games * Lex, the mascot of the word-forming puzzle video game ''Bookworm'' * Lex, the protagonist of the word-forming puzzle video ga ...
(and
Flex lexical analyser Flex (fast lexical analyzer generator) is a free and open-source software alternative to lex. It is a computer program that generates lexical analyzers (also known as "scanners" or "lexers"). It is frequently used as the lex implementation toget ...
), a token parser commonly used in conjunction with Yacc (and Bison). * BNF is a
metasyntax In logic and computer science, a metasyntax describes the allowable structure and composition of phrases and sentences of a metalanguage, which is used to describe either a natural language or a computer programming language.Sellink, Alex, and Chr ...
used to express context-free grammars: that is a formal way to describe context-free languages. *
PLY (Python Lex-Yacc) PLY is a parsing tool written purely in Python. It is, in essence, a re-implementation of Lex and Yacc originally in C-language. It was written by David M. Beazley. PLY uses the same LALR parsing technique as Lex and Yacc. It also has exten ...
is an alternative implementation of Lex and Yacc in Python.


See also

* Compiler-compiler *
hoc (programming language) hoc, an acronym for High Order Calculator, is an interpreted programming language that was used in the 1984 book The Unix Programming Environment to demonstrate how to build interpreters using Yacc. hoc was developed by Brian Kernighan and Rob ...


References


External links

* * * {{Authority control Compiling tools Parser generators Unix programming tools Unix SUS2008 utilities Plan 9 commands Inferno (operating system) commands