HOME

TheInfoList



OR:

In
computer science Computer science is the study of computation, information, and automation. Computer science spans Theoretical computer science, theoretical disciplines (such as algorithms, theory of computation, and information theory) to Applied science, ...
, a compiler-compiler or compiler generator is a programming tool that creates a
parser Parsing, syntax analysis, or syntactic analysis is a process of analyzing a string of symbols, either in natural language, computer languages or data structures, conforming to the rules of a formal grammar by breaking it into parts. The term '' ...
, interpreter, or
compiler In computing, a compiler is a computer program that Translator (computing), translates computer code written in one programming language (the ''source'' language) into another language (the ''target'' language). The name "compiler" is primaril ...
from some form of formal description of a
programming language A programming language is a system of notation for writing computer programs. Programming languages are described in terms of their Syntax (programming languages), syntax (form) and semantics (computer science), semantics (meaning), usually def ...
and machine. The most common type of compiler-compiler is called a parser generator. It handles only syntactic analysis. A formal description of a language is usually a
grammar In linguistics, grammar is the set of rules for how a natural language is structured, as demonstrated by its speakers or writers. Grammar rules may concern the use of clauses, phrases, and words. The term may also refer to the study of such rul ...
used as an input to a parser generator. It often resembles
Backus–Naur form In computer science, Backus–Naur form (BNF, pronounced ), also known as Backus normal form, is a notation system for defining the Syntax (programming languages), syntax of Programming language, programming languages and other Formal language, for ...
(BNF),
extended Backus–Naur form Extension, extend or extended may refer to: Mathematics Logic or set theory * Axiom of extensionality * Extensible cardinal * Extension (model theory) * Extension (proof theory) * Extension (predicate logic), the set of tuples of values ...
(EBNF), or has its own syntax. Grammar files describe a
syntax In linguistics, syntax ( ) is the study of how words and morphemes combine to form larger units such as phrases and sentences. Central concerns of syntax include word order, grammatical relations, hierarchical sentence structure (constituenc ...
of a generated compiler's target programming language and actions that should be taken against its specific constructs.
Source code In computing, source code, or simply code or source, is a plain text computer program written in a programming language. A programmer writes the human readable source code to control the behavior of a computer. Since a computer, at base, only ...
for a parser of the programming language is returned as the parser generator's output. This source code can then be compiled into a parser, which may be either standalone or embedded. The compiled parser then accepts the source code of the target programming language as an input and performs an action or outputs an
abstract syntax tree An abstract syntax tree (AST) is a data structure used in computer science to represent the structure of a program or code snippet. It is a tree representation of the abstract syntactic structure of text (often source code) written in a formal ...
(AST). Parser generators do not handle the
semantics Semantics is the study of linguistic Meaning (philosophy), meaning. It examines what meaning is, how words get their meaning, and how the meaning of a complex expression depends on its parts. Part of this process involves the distinction betwee ...
of the AST, or the generation of machine code for the target machine."A Syntax Directed Compiler for ALGOL 60" Edgar T. Irons, Communications of the ACM Volume 4 Issue 1, Jan. 1961. A metacompiler is a software development tool used mainly in the construction of
compiler In computing, a compiler is a computer program that Translator (computing), translates computer code written in one programming language (the ''source'' language) into another language (the ''target'' language). The name "compiler" is primaril ...
s,
translators Translation is the communication of the meaning of a source-language text by means of an equivalent target-language text. The English language draws a terminological distinction (which does not exist in every language) between ''transl ...
, and interpreters for other programming languages. The input to a metacompiler is a
computer program A computer program is a sequence or set of instructions in a programming language for a computer to Execution (computing), execute. It is one component of software, which also includes software documentation, documentation and other intangibl ...
written in a specialized programming metalanguage designed mainly for the purpose of constructing compilers. The language of the compiler produced is called the object language. The minimal input producing a compiler is a metaprogram specifying the object language grammar and
semantic Semantics is the study of linguistic Meaning (philosophy), meaning. It examines what meaning is, how words get their meaning, and how the meaning of a complex expression depends on its parts. Part of this process involves the distinction betwee ...
transformations into an object program.


Variants

A typical parser generator associates executable code with each of the rules of the grammar that should be executed when these rules are applied by the parser. These pieces of code are sometimes referred to as semantic action routines since they define the semantics of the syntactic structure that is analyzed by the parser. Depending upon the type of parser that should be generated, these routines may construct a
parse tree A parse tree or parsing tree (also known as a derivation tree or concrete syntax tree) is an ordered, rooted tree that represents the syntactic structure of a string according to some context-free grammar. The term ''parse tree'' itself is use ...
(or
abstract syntax tree An abstract syntax tree (AST) is a data structure used in computer science to represent the structure of a program or code snippet. It is a tree representation of the abstract syntactic structure of text (often source code) written in a formal ...
), or generate executable code directly. One of the earliest (1964), surprisingly powerful, versions of compiler-compilers is
META II META II is a Domain-specific language, domain-specific programming language for writing compilers. It was created in 1963–1964 by Dewey Val Schorre at University of California, Los Angeles (UCLA). META II uses what Schorre called ''Syntax (progr ...
, which accepted an analytical grammar with output facilities that produce stack machine code, and is able to compile its own source code and other languages. Among the earliest programs of the original
Unix Unix (, ; trademarked as UNIX) is a family of multitasking, multi-user computer operating systems that derive from the original AT&T Unix, whose development started in 1969 at the Bell Labs research center by Ken Thompson, Dennis Ritchie, a ...
versions being built at
Bell Labs Nokia Bell Labs, commonly referred to as ''Bell Labs'', is an American industrial research and development company owned by Finnish technology company Nokia. With headquarters located in Murray Hill, New Jersey, Murray Hill, New Jersey, the compa ...
was the two-part lex and
yacc Yacc (Yet Another Compiler-Compiler) is a computer program for the Unix operating system developed by Stephen C. Johnson. It is a lookahead left-to-right rightmost derivation (LALR) parser generator, generating a LALR parser (the part of a co ...
system, which was normally used to output
C programming language C (''pronounced'' '' – like the letter c'') is a general-purpose programming language. It was created in the 1970s by Dennis Ritchie and remains very widely used and influential. By design, C's features cleanly reflect the capabilities of ...
code, but had a flexible output system that could be used for everything from programming languages to text file conversion. Their modern GNU versions are flex and
bison A bison (: bison) is a large bovine in the genus ''Bison'' (from Greek, meaning 'wild ox') within the tribe Bovini. Two extant taxon, extant and numerous extinction, extinct species are recognised. Of the two surviving species, the American ...
. Some experimental compiler-compilers take as input a formal description of programming language semantics, typically using
denotational semantics In computer science, denotational semantics (initially known as mathematical semantics or Scott–Strachey semantics) is an approach of formalizing the meanings of programming languages by constructing mathematical objects (called ''denotations'' ...
. This approach is often called 'semantics-based compiling', and was pioneered by Peter Mosses' Semantic Implementation System (SIS) in 1978. However, both the generated compiler and the code it produced were inefficient in time and space. No production compilers are currently built in this way, but research continues. The Production Quality Compiler-Compiler (
PQCC The Production Quality Compiler-Compiler Project (PQCC) was a long-term project led by William Wulf at Carnegie Mellon University to produce an industrial-strength compiler-compiler. PQCC would produce full, optimizing programming language compile ...
) project at
Carnegie Mellon University Carnegie Mellon University (CMU) is a private research university in Pittsburgh, Pennsylvania, United States. The institution was established in 1900 by Andrew Carnegie as the Carnegie Technical Schools. In 1912, it became the Carnegie Institu ...
does not formalize semantics, but does have a semi-formal framework for machine description. Compiler-compilers exist in many flavors, including bottom-up rewrite machine generators (se
JBurg
used to tile syntax trees according to a rewrite grammar for code generation, and
attribute grammar An attribute grammar is a formal way to supplement a formal grammar with semantic information processing. Semantic information is stored in attributes associated with terminal and nonterminal symbols of the grammar. The values of attributes are t ...
parser generators (e.g. ANTLR can be used for simultaneous type checking, constant propagation, and more during the parsing stage).


Metacompilers

As a metacompiler's metalanguage will usually be a powerful string and symbol processing language, they often have strong applications for general-purpose applications, including generating a wide range of other software engineering and analysis tools. Besides being useful for
domain-specific language A domain-specific language (DSL) is a computer language specialized to a particular application domain. This is in contrast to a general-purpose language (GPL), which is broadly applicable across domains. There are a wide variety of DSLs, ranging ...
development, a metacompiler is a prime example of a domain-specific language, designed for the domain of compiler writing. A metacompiler is a metaprogram ''usually written in its own metalanguage'' or an existing computer programming language. The process of a metacompiler, written in its own metalanguage, compiling itself is equivalent to self-hosting compiler. Most common compilers written today are self-hosting compilers. Self-hosting is a powerful tool, of many metacompilers, allowing the easy extension of their own metaprogramming metalanguage. The feature that separates a metacompiler apart from other compiler compilers is that it takes as input a specialized
metaprogramming Metaprogramming is a computer programming technique in which computer programs have the ability to treat other programs as their data. It means that a program can be designed to read, generate, analyse, or transform other programs, and even modi ...
language that describes all aspects of the compiler's operation. A metaprogram produced by a metacompiler is as complete a program as a program written in C++,
BASIC Basic or BASIC may refer to: Science and technology * BASIC, a computer programming language * Basic (chemistry), having the properties of a base * Basic access authentication, in HTTP Entertainment * Basic (film), ''Basic'' (film), a 2003 film ...
or any other general
programming language A programming language is a system of notation for writing computer programs. Programming languages are described in terms of their Syntax (programming languages), syntax (form) and semantics (computer science), semantics (meaning), usually def ...
. The metaprogramming metalanguage is a powerful attribute allowing easier development of computer programming languages and other computer tools. Command line processors, text string transforming and analysis are easily coded using metaprogramming metalanguages of metacompilers. A full featured development package includes a
linker Linker or linkers may refer to: Computing * Linker (computing), a computer program that takes one or more object files generated by a compiler or generated by an assembler and links them with libraries, generating an executable program or shar ...
and a run time support
library A library is a collection of Book, books, and possibly other Document, materials and Media (communication), media, that is accessible for use by its members and members of allied institutions. Libraries provide physical (hard copies) or electron ...
. Usually, a machine-oriented
system programming language A system programming language is a programming language used for system programming; such languages are designed for writing system software, which usually requires different development approaches when compared with application software. Eds ...
, such as C or C++, is needed to write the support library. A library consisting of support functions needed for the compiling process usually completes the full metacompiler package.


The meaning of metacompiler

In computer science, the prefix '' meta'' is commonly used to mean ''about (its own category)''. For example,
metadata Metadata (or metainformation) is "data that provides information about other data", but not the content of the data itself, such as the text of a message or the image itself. There are many distinct types of metadata, including: * Descriptive ...
are data that describe other data. A language that is used to describe other languages is a metalanguage. Meta may also mean ''on a higher level of abstraction''. A metalanguage operates on a higher level of abstraction in order to describe properties of a language.
Backus–Naur form In computer science, Backus–Naur form (BNF, pronounced ), also known as Backus normal form, is a notation system for defining the Syntax (programming languages), syntax of Programming language, programming languages and other Formal language, for ...
(BNF) is a formal metalanguage originally used to define
ALGOL 60 ALGOL 60 (short for ''Algorithmic Language 1960'') is a member of the ALGOL family of computer programming languages. It followed on from ALGOL 58 which had introduced code blocks and the begin and end pairs for delimiting them, representing a ...
. BNF is a weak metalanguage, for it describes only the
syntax In linguistics, syntax ( ) is the study of how words and morphemes combine to form larger units such as phrases and sentences. Central concerns of syntax include word order, grammatical relations, hierarchical sentence structure (constituenc ...
and says nothing about the
semantics Semantics is the study of linguistic Meaning (philosophy), meaning. It examines what meaning is, how words get their meaning, and how the meaning of a complex expression depends on its parts. Part of this process involves the distinction betwee ...
or meaning. Metaprogramming is the writing of
computer program A computer program is a sequence or set of instructions in a programming language for a computer to Execution (computing), execute. It is one component of software, which also includes software documentation, documentation and other intangibl ...
s with the ability to treat programs as their data. A metacompiler takes as input a metaprogram written in a specialized metalanguages (a higher level abstraction) specifically designed for the purpose of metaprogramming. The output is an executable object program. An analogy can be drawn: That as a ''C++'' compiler takes as input a ''C++'' programming language program, a ''meta''compiler takes as input a ''meta''programming metalanguage program.


Forth metacompiler

Many advocates of the language Forth call the process of creating a new implementation of Forth a meta-compilation and that it constitutes a metacompiler. The Forth definition of metacompiler is: :"A metacompiler is a compiler which processes its own source code, resulting in an executable version of itself." This Forth use of the term metacompiler is disputed in mainstream computer science. See
Forth (programming language) Forth is a stack-oriented programming language and interactive integrated development environment designed by Charles H. "Chuck" Moore and first used by other programmers in 1970. Although not an acronym, the language's name in its early years ...
and History of compiler construction. The actual Forth process of compiling itself is a combination of a Forth being a self-hosting extensible programming language and sometimes cross compilation, long established terminology in computer science. Metacompilers are a general compiler writing system. Besides the Forth metacompiler concept being indistinguishable from self-hosting and extensible language. The actual process acts at a lower level defining a minimum subset of forth ''words'', that can be used to define additional forth words, A full Forth implementation can then be defined from the base set. This sounds like a bootstrap process. The problem is that almost every general purpose language compiler also fits the Forth metacompiler description. : When (self-hosting compiler) X processes its own source code, resulting in an executable version of itself, X is a metacompiler. Just replace X with any common language, C, C++,
Java Java is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea (a part of Pacific Ocean) to the north. With a population of 156.9 million people (including Madura) in mid 2024, proje ...
, Pascal,
COBOL COBOL (; an acronym for "common business-oriented language") is a compiled English-like computer programming language designed for business use. It is an imperative, procedural, and, since 2002, object-oriented language. COBOL is primarily ...
, Fortran, Ada,
Modula-2 Modula-2 is a structured, procedural programming language developed between 1977 and 1985/8 by Niklaus Wirth at ETH Zurich. It was created as the language for the operating system and application software of the Lilith personal workstation. It w ...
, etc. And X would be a metacompiler according to the Forth usage of metacompiler. A metacompiler operates at an abstraction level above the compiler it compiles. It only operates at the same (self-hosting compiler) level when compiling itself. One has to see the problem with this definition of metacompiler. It can be applied to most any language. However, on examining the concept of programming in Forth, adding new words to the dictionary, extending the language in this way is metaprogramming. It is this metaprogramming in Forth that makes it a metacompiler. Programming in Forth is adding new words to the language. Changing the language in this way is
metaprogramming Metaprogramming is a computer programming technique in which computer programs have the ability to treat other programs as their data. It means that a program can be designed to read, generate, analyse, or transform other programs, and even modi ...
. Forth is a metacompiler, because Forth is a language specifically designed for metaprogramming. Programming in Forth is extending Forth adding words to the Forth vocabulary creates a new Forth
dialect A dialect is a Variety (linguistics), variety of language spoken by a particular group of people. This may include dominant and standard language, standardized varieties as well as Vernacular language, vernacular, unwritten, or non-standardize ...
. Forth is a specialized metacompiler for Forth language dialects.


History

Design of the original compiler-compiler was started by Tony Brooker and Derrick Morris in 1959, with initial testing beginning in March 1962. The Brooker Morris Compiler Compiler (BMCC) was used to create compilers for the new
Atlas An atlas is a collection of maps; it is typically a bundle of world map, maps of Earth or of a continent or region of Earth. Advances in astronomy have also resulted in atlases of the celestial sphere or of other planets. Atlases have traditio ...
computer at the
University of Manchester The University of Manchester is a public university, public research university in Manchester, England. The main campus is south of Manchester city centre, Manchester City Centre on Wilmslow Road, Oxford Road. The University of Manchester is c ...
, for several languages: Mercury Autocode, Extended Mercury Autocode, Atlas Autocode,
ALGOL 60 ALGOL 60 (short for ''Algorithmic Language 1960'') is a member of the ALGOL family of computer programming languages. It followed on from ALGOL 58 which had introduced code blocks and the begin and end pairs for delimiting them, representing a ...
and ASA Fortran. At roughly the same time, related work was being done by E. T. (Ned) Irons at Princeton, and Alick Glennie at the Atomic Weapons Research Establishment at Aldermaston whose "Syntax Machine" paper (declassified in 1977) inspired the META series of translator writing systems mentioned below. The early history of metacompilers is closely tied with the history of SIG/PLAN Working group 1 on Syntax Driven Compilers. The group was started primarily through the effort of Howard Metcalfe in the Los Angeles area. In the fall of 1962, Howard Metcalfe designed two compiler-writing interpreters. One used a bottom-to-top analysis technique based on a method described by Ledley and Wilson. The other used a top-to-bottom approach based on work by Glennie to generate random English sentences from a context-free grammar. At the same time, Val Schorre described two "meta machines", one generative and one analytic. The generative machine was implemented and produced random algebraic expressions. Meta I the first metacompiler was implemented by Schorre on an IBM 1401 at UCLA in January 1963. His original interpreters and metamachines were written directly in a pseudo-machine language.
META II META II is a Domain-specific language, domain-specific programming language for writing compilers. It was created in 1963–1964 by Dewey Val Schorre at University of California, Los Angeles (UCLA). META II uses what Schorre called ''Syntax (progr ...
, however, was written in a higher-level metalanguage able to describe its own compilation into the pseudo-machine language. Lee Schmidt at Bolt, Beranek, and Newman wrote a metacompiler in March 1963 that utilized a CRT display on the time-sharing PDP-l. This compiler produced actual machine code rather than interpretive code and was partially bootstrapped from Meta I. Schorre bootstrapped Meta II from Meta I during the spring of 1963. The paper on the refined metacompiler system presented at the 1964 Philadelphia ACM conference is the first paper on a metacompiler available as a general reference. The syntax and implementation technique of Schorre's system laid the foundation for most of the systems that followed. The system was implemented on a small 1401, and was used to implement a small
ALGOL ALGOL (; short for "Algorithmic Language") is a family of imperative computer programming languages originally developed in 1958. ALGOL heavily influenced many other languages and was the standard method for algorithm description used by the ...
-like language. Many similar systems immediately followed. Roger Rutman of AC Delco developed and implemented LOGIK, a language for logical design simulation, on the IBM 7090 in January 1964. This compiler used an algorithm that produced efficient code for Boolean expressions. Another paper in the 1964 ACM proceedings describe
Meta III
developed b
Schneider
and Johnson at UCLA for the IBM 7090. Meta III represents an attempt to produce efficient machine code, for a large class of languages. Meta III was implemented completely in assembly language. Two compilers were written in Meta III, CODOL, a compiler-writing demonstration compiler, and PUREGOL, a dialect of ALGOL 60. (It was pure gall to call it ALGOL). Late in 1964, Lee Schmidt bootstrapped the metacompiler EQGEN, from the PDP-l to the Beckman 420. EQGEN was a logic equation generating language. In 1964, System Development Corporation began a major effort in the development of metacompilers. This effort includes powerful metacompilers, Bookl, and Book2 written in
Lisp Lisp (historically LISP, an abbreviation of "list processing") is a family of programming languages with a long history and a distinctive, fully parenthesized Polish notation#Explanation, prefix notation. Originally specified in the late 1950s, ...
which have extensive tree-searching and backup ability. An outgrowth of one of the Q-32 systems at SDC is Meta 5. The Meta 5 system incorporates backup of the input stream and enough other facilities to parse any context-sensitive language. This system was successfully released to a wide number of users and had many string-manipulation applications other than compiling. It has many elaborate push-down stacks, attribute setting and testing facilities, and output mechanisms. That Meta 5 successfully translates JOVIAL programs to
PL/I PL/I (Programming Language One, pronounced and sometimes written PL/1) is a procedural, imperative computer programming language initially developed by IBM. It is designed for scientific, engineering, business and system programming. It has b ...
programs demonstrates its power and flexibility. Robert McClure at
Texas Instruments Texas Instruments Incorporated (TI) is an American multinational semiconductor company headquartered in Dallas, Texas. It is one of the top 10 semiconductor companies worldwide based on sales volume. The company's focus is on developing analog ...
invented a compiler-compiler called TMG (presented in 1965). TMG was used to create early compilers for programming languages like B,
PL/I PL/I (Programming Language One, pronounced and sometimes written PL/1) is a procedural, imperative computer programming language initially developed by IBM. It is designed for scientific, engineering, business and system programming. It has b ...
and ALTRAN. Together with metacompiler of Val Schorre, it was an early inspiration for the last chapter of
Donald Knuth Donald Ervin Knuth ( ; born January 10, 1938) is an American computer scientist and mathematician. He is a professor emeritus at Stanford University. He is the 1974 recipient of the ACM Turing Award, informally considered the Nobel Prize of comp ...
's '' The Art of Computer Programming''. The LOT system was developed during 1966 at Stanford Research Institute and was modeled very closely after Meta II. It had new special-purpose constructs allowing it to generate a compiler which could in turn, compile a subset of PL/I. This system had extensive statistic-gathering facilities and was used to study the characteristics of top-down analysis. SIMPLE is a specialized translator system designed to aid the writing of pre-processors for PL/I, SIMPLE, written in PL/I, is composed of three components: An executive, a syntax analyzer and a semantic constructor. The
TREE-META The TREE-META (or Tree Meta, TREEMETA) Translator Writing System is a compiler-compiler system for context-free languages originally developed in the 1960s. Parsing statements of the metalanguage resemble augmented Backus–Naur form with embedd ...
compiler was developed at Stanford Research Institute in Menlo Park, California. April 1968. The early metacompiler history is well documented in th
TREE META manual.
TREE META paralleled some of the SDC developments. Unlike earlier metacompilers it separated the semantics processing from the syntax processing. The syntax rules contained
tree In botany, a tree is a perennial plant with an elongated stem, or trunk, usually supporting branches and leaves. In some usages, the definition of a tree may be narrower, e.g., including only woody plants with secondary growth, only ...
building operations that combined recognized language elements with tree nodes. The tree structure representation of the input was then processed by a simple form of unparse rules. The unparse rules used node recognition and attribute testing that when matched resulted in the associated action being performed. In addition like tree element could also be tested in an unparse rule. Unparse rules were also a recursive language being able to call unparse rules passing elements of thee tree before the action of the unparse rule was performed. The concept of the metamachine originally put forth by Glennie is so simple that three hardware versions have been designed and one actually implemented. The latter at Washington University in St. Louis. This machine was built from macro-modular components and has for instructions the codes described by Schorre. CWIC (Compiler for Writing and Implementing Compilers) is the last known Schorre metacompiler. It was developed at Systems Development Corporation by Erwin Book, Dewey Val Schorre and Steven J. Sherman With the full power of (lisp 2) a list processing language optimizing algorithms could operate on syntax generated lists and trees before code generation. CWIC also had a symbol table built into the language. With the resurgence of domain-specific languages and the need for parser generators which are easy to use, easy to understand, and easy to maintain, metacompilers are becoming a valuable tool for advanced software engineering projects. Other examples of parser generators in the yacc vein are ANTLR, Coco/R, CUP, GNU Bison, Eli, FSL, SableCC, SID (Syntax Improving Device), and JavaCC. While useful, pure parser generators only address the parsing part of the problem of building a compiler. Tools with broader scope, such as
PQCC The Production Quality Compiler-Compiler Project (PQCC) was a long-term project led by William Wulf at Carnegie Mellon University to produce an industrial-strength compiler-compiler. PQCC would produce full, optimizing programming language compile ...
, Coco/R and DMS Software Reengineering Toolkit provide considerable support for more difficult post-parsing activities such as semantic analysis, code optimization and generation.


Schorre metalanguages

The earliest Schorre metacompilers, META I and META II, were developed by D. Val Schorre at UCLA. Other Schorre based metacompilers followed. Each adding improvements to language analysis and/or code generation. In programming it is common to use the programming language name to refer to both the compiler and the programming language, the context distinguishing the meaning. A C++ program is compiled using a C++ compiler. That also applies in the following. For example, META II is both the compiler and the language. The metalanguages in the Schorre line of metacompilers are functional programming languages that use top down grammar analyzing syntax equations having embedded output transformation constructs. A syntax equation:
 = ;
is a compiled ''test'' function returning ''success'' or ''failure''. is the function name. is a form of logical expression consisting of tests that may be grouped, have alternates, and output productions. A ''test'' is like a ''bool'' in other languages, ''success'' being ''true'' and ''failure'' being ''false''. Defining a programming language analytically top down is natural. For example, a program could be defined as:
 program = $declaration;
Defining a program as a sequence of zero or more declaration(s). In the Schorre META ''X'' languages there is a driving rule. The program rule above is an example of a driving rule. The program rule is a ''test'' function that calls declaration, a ''test'' rule, that returns ''success'' or ''failure''. The $ loop operator repeatedly calling declaration until ''failure'' is returned. The $ operator is always successful, even when there are zero declaration. Above program would always return success. (In CWIC a long fail can bypass declaration. A long-fail is part of the backtracking system of CWIC) The character sets of these early compilers were limited. The character / was used for the alternant (or) operator. "A or B" is written as A / B. Parentheses ( ) are used for grouping. A (B / C) Describes a construct of A followed by B or C. As a boolean expression it would be A ''and'' (B ''or'' C) A sequence X Y has an implied X and Y meaning. ( ) are grouping and / the or operator. The order of evaluation is always left to right as an input character sequence is being specified by the ordering of the tests. Special operator words whose first character is a "." are used for clarity. .EMPTY is used as the last alternate when no previous alternant need be present. X (A / B / .EMPTY) Indicates that X is optionally followed by A or B. This is a specific characteristic of these metalanguages being programming languages. Backtracking is avoided by the above. Other compiler constructor systems may have declared the three possible sequences and left it up to the parser to figure it out. The characteristics of the metaprogramming metalanguages above are common to all Schorre metacompilers and those derived from them.


META I

META I was a hand compiled metacompiler used to compile META II. Little else is known of META I except that the initial compilation of META II produced nearly identical code to that of the hand coded META I compiler.


META II

Each rule consists optionally of tests, operators, and output productions. A rule attempts to match some part of the input program source character stream returning success or failure. On success the input is advanced over matched characters. On failure the input is not advanced. Output productions produced a form of assembly code directly from a syntax rule.


TREE-META

TREE-META introduced tree building operators :<''node_name''> and ''<''number''>'' moving the output production transforms to unparsed rules. The tree building operators were used in the grammar rules directly transforming the input into an
abstract syntax tree An abstract syntax tree (AST) is a data structure used in computer science to represent the structure of a program or code snippet. It is a tree representation of the abstract syntactic structure of text (often source code) written in a formal ...
. Unparse rules are also test functions that matched tree patterns. Unparse rules are called from a grammar rule when an abstract syntax tree is to be transformed into output code. The building of an abstract syntax tree and unparse rules allowed local optimizations to be performed by analyzing the parse tree. Moving of output productions to the unparse rules made a clear separation of grammar analysis and code production. This made the programming easier to read and understand.


CWIC

In 1968–1970, Erwin Book, Dewey Val Schorre, and Steven J. Sherman developed CWIC. (Compiler for Writing and Implementing Compilers) at System Development Corporationbr>Charles Babbage Institute Center for the History of Information Technology (Box 12, folder 21)
CWIC is a compiler development system composed of three special-purpose, domain specific, languages, each intended to permit the description of certain aspects of translation in a straight forward manner. The syntax language is used to describe the recognition of source text and the construction from it to an intermediate
tree In botany, a tree is a perennial plant with an elongated stem, or trunk, usually supporting branches and leaves. In some usages, the definition of a tree may be narrower, e.g., including only woody plants with secondary growth, only ...
structure. The generator language is used to describe the transformation of the tree into appropriate object language. The syntax language follows Dewey Val Schorre's previous line of metacompilers. It most resembles TREE-META having
tree In botany, a tree is a perennial plant with an elongated stem, or trunk, usually supporting branches and leaves. In some usages, the definition of a tree may be narrower, e.g., including only woody plants with secondary growth, only ...
building operators in the syntax language. The unparse rules of TREE-META are extended to work with the object based generator language based on LISP 2. CWIC includes three languages: * Syntax: Transforms the source program input, into list structures using grammar transformation formula. A parsed expression structure is passed to a generator by placement of a generator call in a rule. A
tree In botany, a tree is a perennial plant with an elongated stem, or trunk, usually supporting branches and leaves. In some usages, the definition of a tree may be narrower, e.g., including only woody plants with secondary growth, only ...
is represented by a list whose first element is a node object. The language has operators, < and >, specifically for making lists. The colon : operator is used to create node objects. :ADD creates an ADD node. The exclamation ! operator combines a number of parsed entries with a node to make a tree. Trees created by syntax rules are passed to generator functions, returning success or failure. The syntax language is very close to TREE-META. Both use a colon to create a node. CWIC's tree building exclamation ! functions the same as TREE-META's number> * Generator: a named series of transforming rules, each consisting of an unparse, pattern matching, rule. and an output production written in a LISP 2 like language. the translation was to IBM 360 binary machine code. Other facilities of the generator language generalized output. * MOL-360: an independent mid level implementation language for the IBM System/360 family of computers developed in 1968 and used for writing the underlying support library.


Generators language

Generators Language had semantics similar to
Lisp Lisp (historically LISP, an abbreviation of "list processing") is a family of programming languages with a long history and a distinctive, fully parenthesized Polish notation#Explanation, prefix notation. Originally specified in the late 1950s, ...
. The parse
tree In botany, a tree is a perennial plant with an elongated stem, or trunk, usually supporting branches and leaves. In some usages, the definition of a tree may be narrower, e.g., including only woody plants with secondary growth, only ...
was thought of as a recursive list. The general form of a Generator Language function is:
  function-name(first-unparse_rule) => first-production_code_generator
           (second-unparse_rule) => second-production_code_generator
           (third-unparse_rule) => third-production_code_generator
               ...
The code to process a given
tree In botany, a tree is a perennial plant with an elongated stem, or trunk, usually supporting branches and leaves. In some usages, the definition of a tree may be narrower, e.g., including only woody plants with secondary growth, only ...
included the features of a general purpose programming language, plus a form: <stuff>, which would emit (stuff) onto the output file. A generator call may be used in the unparse_rule. The generator is passed the element of unparse_rule pattern in which it is placed and its return values are listed in (). For example:
  expr_gen(ADD xpr_gen(x),expr_gen(y) =>
                                <AR + (x*16)+y;>
                                releasereg(y);
                                return x;
               (SUB xpr_gen(x),expr_gen(y)=>
                                <SR + (x*16)+y;>
                                releasereg(y);
                                return x;
               (MUL xpr_gen(x),expr_gen(y)=>
                              .
                              .
                              .
               (x)=>     r1 = getreg();
                            load(r1, x);
                            return r1;
...
That is, if the parse
tree In botany, a tree is a perennial plant with an elongated stem, or trunk, usually supporting branches and leaves. In some usages, the definition of a tree may be narrower, e.g., including only woody plants with secondary growth, only ...
looks like (ADD something1>,, expr_gen(x) would be called with and return x. A variable in the unparse rule is a local variable that can be used in the production_code_generator. expr_gen(y) is called with and returns y. Here is a generator call in an unparse rule is passed the element in the position it occupies. Hopefully in the above x and y will be registers on return. The last transforms is intended to load an atomic into a register and return the register. The first production would be used to generate the 360 "AR" (Add Register) instruction with the appropriate values in general registers. The above example is only a part of a generator. Every generator expression evaluates to a value that con then be further processed. The last transform could just as well have been written as:
               (x)=>     return load(getreg(), x);
In this case load returns its first parameter, the register returned by getreg(). the functions load and getreg are other CWIC generators.


CWIC addressed

domain-specific language A domain-specific language (DSL) is a computer language specialized to a particular application domain. This is in contrast to a general-purpose language (GPL), which is broadly applicable across domains. There are a wide variety of DSLs, ranging ...
s before the term
domain-specific language A domain-specific language (DSL) is a computer language specialized to a particular application domain. This is in contrast to a general-purpose language (GPL), which is broadly applicable across domains. There are a wide variety of DSLs, ranging ...
existed

From the authors of CWIC: "A metacompiler assists the task of compiler-building by automating its non creative aspects, those aspects that are the same regardless of the language which the produced compiler is to translate. This makes possible the design of languages which are appropriate to the specification of a particular problem. It reduces the cost of producing processors for such languages to a point where it becomes economically feasible to begin the solution of a problem with language design."


Examples

* ANTLR * GNU Bison * Coco/R, Coco-2 * Copper * DMS Software Reengineering Toolkit, a program transformation system with parser generators * Epsilon Grammar Studio *
Lemon The lemon (''Citrus'' × ''limon'') is a species of small evergreen tree in the ''Citrus'' genus of the flowering plant family Rutaceae. A true lemon is a hybrid of the citron and the bitter orange. Its origins are uncertain, but some ...
parser generator * LRSTAR: LR(*) parser generator *
META II META II is a Domain-specific language, domain-specific programming language for writing compilers. It was created in 1963–1964 by Dewey Val Schorre at University of California, Los Angeles (UCLA). META II uses what Schorre called ''Syntax (progr ...
* parboiled, a Java library for building parsers. * Packrat parser *
PQCC The Production Quality Compiler-Compiler Project (PQCC) was a long-term project led by William Wulf at Carnegie Mellon University to produce an industrial-strength compiler-compiler. PQCC would produce full, optimizing programming language compile ...
, a compiler-compiler that is more than a parser generator. * Syntax Improving Device (SID) *
SYNTAX In linguistics, syntax ( ) is the study of how words and morphemes combine to form larger units such as phrases and sentences. Central concerns of syntax include word order, grammatical relations, hierarchical sentence structure (constituenc ...
, an integrated toolset for compiler construction. * tacc - The Alternative Compiler Compiler *
TREE-META The TREE-META (or Tree Meta, TREEMETA) Translator Writing System is a compiler-compiler system for context-free languages originally developed in the 1960s. Parsing statements of the metalanguage resemble augmented Backus–Naur form with embedd ...
*
Yacc Yacc (Yet Another Compiler-Compiler) is a computer program for the Unix operating system developed by Stephen C. Johnson. It is a lookahead left-to-right rightmost derivation (LALR) parser generator, generating a LALR parser (the part of a co ...
* Xtext * XPL * JavaCC


See also

*
Parsing expression grammar In computer science, a parsing expression grammar (PEG) is a type of analytic formal grammar, i.e. it describes a formal language in terms of a set of rules for recognizing strings in the language. The formalism was introduced by Bryan Ford in 20 ...
* LL parser *
LR parser In computer science, LR parsers are a type of bottom-up parsing, bottom-up parser that analyse deterministic context-free languages in linear time. There are several variants of LR parsers: SLR parsers, LALR parsers, canonical LR parser, canonica ...
* Simple LR parser * LALR parser * GLR parser * Domain analysis *
Domain-specific language A domain-specific language (DSL) is a computer language specialized to a particular application domain. This is in contrast to a general-purpose language (GPL), which is broadly applicable across domains. There are a wide variety of DSLs, ranging ...
* History of compiler construction ** History of compiler construction#Self-hosting compilers * Metacompilation *
Program transformation A program transformation is any operation that takes a computer program and generates another program. In many cases the transformed program is required to be semantically equivalent to the original, relative to a particular Formal semantics of p ...


References and notes


Further reading

* * * * * *


External links


Tony Brooker and the Atlas Compiler Compiler

An Explanation of the Compiler Compiler listings (1963)

Compiler Compiler source code
(starts around PDF page 182)
Original Compiler Compiler flowcharts


Brooker Autocodes
Catalog.compilertools.net
, The Catalog of Compiler Construction Tools
Labraj.uni-mb.si
Lisa
Skenz.it
Jflex and Cup resources {{Parsers Parsing Compiler construction Metaprogramming Pattern matching programming languages Program transformation tools Extensible syntax programming languages Domain-specific programming languages Program analysis Software design Compiler theory