HOME

TheInfoList



OR:

In computer-based language recognition, ANTLR (pronounced ''
antler Antlers are extensions of an animal's skull found in members of the Cervidae (deer) family. Antlers are a single structure composed of bone, cartilage, fibrous tissue, skin, nerves, and blood vessels. They are generally found only on male ...
''), or ANother Tool for Language Recognition, is a
parser generator In computer science, a compiler-compiler or compiler generator is a programming tool that creates a parser, interpreter, or compiler from some form of formal description of a programming language and machine. The most common type of compiler- ...
that uses LL(*) for parsing. ANTLR is the successor to the Purdue Compiler Construction Tool Set (PCCTS), first developed in 1989, and is under active development. Its maintainer is Professor
Terence Parr Terence John Parr (born 1964 in Los Angeles) is a professor of computer science at the University of San Francisco. He is best known for his ANTLR parser generator and contributions to parsing theory. He also developed the StringTemplate engine ...
of the
University of San Francisco The University of San Francisco (USF) is a private Jesuit university in San Francisco, California. The university's main campus is located on a setting between the Golden Gate Bridge and Golden Gate Park. The main campus is nicknamed "The Hil ...
.


Usage

ANTLR takes as input a
grammar In linguistics, the grammar of a natural language is its set of structure, structural constraints on speakers' or writers' composition of clause (linguistics), clauses, phrases, and words. The term can also refer to the study of such constraint ...
that specifies a language and generates as output
source code In computing, source code, or simply code, is any collection of code, with or without comments, written using a human-readable programming language, usually as plain text. The source code of a program is specially designed to facilitate the wo ...
for a
recognizer A finite-state machine (FSM) or finite-state automaton (FSA, plural: ''automata''), finite automaton, or simply a state machine, is a mathematical model of computation. It is an abstract machine that can be in exactly one of a finite number o ...
of that language. While Version 3 supported generating code in the
programming language A programming language is a system of notation for writing computer programs. Most programming languages are text-based formal languages, but they may also be graphical. They are a kind of computer language. The description of a programming ...
s
Ada95 Ada is a structured programming, structured, statically typed, Imperative programming, imperative, and Object-oriented programming, object-oriented high-level programming language, extended from Pascal (programming language), Pascal and other lan ...
,
ActionScript ActionScript is an object-oriented programming language originally developed by Macromedia Inc. (later acquired by Adobe). It is influenced by HyperTalk, the scripting language for HyperCard. It is now an implementation of ECMAScript (meaning i ...
, C, C#,
Java Java (; id, Jawa, ; jv, ꦗꦮ; su, ) is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea to the north. With a population of 151.6 million people, Java is the world's List ...
,
JavaScript JavaScript (), often abbreviated as JS, is a programming language that is one of the core technologies of the World Wide Web, alongside HTML and CSS. As of 2022, 98% of Website, websites use JavaScript on the Client (computing), client side ...
,
Objective-C Objective-C is a general-purpose, object-oriented programming language that adds Smalltalk-style messaging to the C programming language. Originally developed by Brad Cox and Tom Love in the early 1980s, it was selected by NeXT for its NeXTS ...
,
Perl Perl is a family of two high-level, general-purpose, interpreted, dynamic programming languages. "Perl" refers to Perl 5, but from 2000 to 2019 it also referred to its redesigned "sister language", Perl 6, before the latter's name was offici ...
,
Python Python may refer to: Snakes * Pythonidae, a family of nonvenomous snakes found in Africa, Asia, and Australia ** ''Python'' (genus), a genus of Pythonidae found in Africa and Asia * Python (mythology), a mythical serpent Computing * Python (pro ...
,
Ruby A ruby is a pinkish red to blood-red colored gemstone, a variety of the mineral corundum ( aluminium oxide). Ruby is one of the most popular traditional jewelry gems and is very durable. Other varieties of gem-quality corundum are called sa ...
, and
Standard ML Standard ML (SML) is a general-purpose, modular, functional programming language with compile-time type checking and type inference. It is popular among compiler writers and programming language researchers, as well as in the development of the ...
, Version 4 at present targets C#, C++,
Dart Dart or DART may refer to: * Dart, the equipment in the game of darts Arts, entertainment and media * Dart (comics), an Image Comics superhero * Dart, a character from ''G.I. Joe'' * Dart, a ''Thomas & Friends'' railway engine character * Dar ...
, Java, JavaScript, Go,
PHP PHP is a general-purpose scripting language geared toward web development. It was originally created by Danish-Canadian programmer Rasmus Lerdorf in 1993 and released in 1995. The PHP reference implementation is now produced by The PHP Group. ...
, Python (2 and 3), and
Swift Swift or SWIFT most commonly refers to: * SWIFT, an international organization facilitating transactions between banks ** SWIFT code * Swift (programming language) * Swift (bird), a family of birds It may also refer to: Organizations * SWIFT, ...
. A language is specified using a
context-free grammar In formal language theory, a context-free grammar (CFG) is a formal grammar whose production rules are of the form :A\ \to\ \alpha with A a ''single'' nonterminal symbol, and \alpha a string of terminals and/or nonterminals (\alpha can be empt ...
expressed using
Extended Backus–Naur Form In computer science, extended Backus–Naur form (EBNF) is a family of metasyntax notations, any of which can be used to express a context-free grammar. EBNF is used to make a formal description of a formal language such as a computer programm ...
(EBNF). ANTLR can generate lexers,
parser Parsing, syntax analysis, or syntactic analysis is the process of analyzing a string of symbols, either in natural language, computer languages or data structures, conforming to the rules of a formal grammar. The term ''parsing'' comes from Lati ...
s, tree parsers, and combined lexer-parsers. Parsers can automatically generate
parse tree A parse tree or parsing tree or derivation tree or concrete syntax tree is an ordered, rooted tree that represents the syntactic structure of a string according to some context-free grammar. The term ''parse tree'' itself is used primarily in co ...
s or
abstract syntax tree In computer science, an abstract syntax tree (AST), or just syntax tree, is a tree representation of the abstract syntactic structure of text (often source code) written in a formal language. Each node of the tree denotes a construct occurring ...
s, which can be further processed with tree parsers. ANTLR provides a single consistent notation for specifying lexers, parsers, and tree parsers. By default, ANTLR reads a grammar and generates a recognizer for the language defined by the grammar (i.e., a program that reads an input stream and generates an error if the input stream does not conform to the syntax specified by the grammar). If there are no syntax errors, the default action is to simply exit without printing any message. In order to do something useful with the language, actions can be attached to grammar elements in the grammar. These actions are written in the programming language in which the recognizer is being generated. When the recognizer is being generated, the actions are embedded in the source code of the recognizer at the appropriate points. Actions can be used to build and check symbol tables and to emit instructions in a target language, in the case of a compiler. Other than lexers and parsers, ANTLR can be used to generate tree parsers. These are recognizers that process abstract syntax trees, which can be automatically generated by parsers. These tree parsers are unique to ANTLR and help processing abstract syntax trees.


Licensing

and ANTLR 4 are
free software Free software or libre software is computer software distributed under terms that allow users to run the software for any purpose as well as to study, change, and distribute it and any adapted versions. Free software is a matter of liberty, no ...
, published under a three-clause
BSD License BSD licenses are a family of permissive free software licenses, imposing minimal restrictions on the use and distribution of covered software. This is in contrast to copyleft licenses, which have share-alike requirements. The original BSD lic ...
. Prior versions were released as
public domain software Public-domain software is software that has been placed in the public domain, in other words, software for which there is absolutely no ownership such as copyright, trademark, or patent. Software in the public domain can be modified, distributed, ...
. Documentation, derived from Parr's book ''The Definitive ANTLR 4 Reference'', is included with the BSD-licensed ANTLR 4 source. Various plugins have been developed for the Eclipse development environment to support the ANTLR grammar, including ANTLR Studio, a proprietary product, as well as the "ANTLR 2" and "ANTLR 3" plugins for Eclipse hosted on
SourceForge SourceForge is a web service that offers software consumers a centralized online location to control and manage open-source software projects and research business software. It provides source code repository hosting, bug tracking, mirrorin ...
.


ANTLR 4

ANTLR 4 deals with direct
left recursion In the formal language theory of computer science, left recursion is a special case of recursion where a string is recognized as part of a language by the fact that it decomposes into a string from that same language (on the left) and a suffix (on ...
correctly, but not with left recursion in general, i.e., grammar rules ''x'' that refer to ''y'' that refer to ''x''.


Development

As reported on the tools page of the ANTLR project, plug-ins that enable features like syntax highlighting, syntax error checking and code completion are freely available for the most common IDEs (
Intellij IntelliJ IDEA is an integrated development environment (IDE) written in Java for developing computer software written in Java, Kotlin, Groovy, and other JVM-based languages. It is developed by JetBrains (formerly known as IntelliJ) and is ava ...
IDEA,
NetBeans NetBeans is an integrated development environment (IDE) for Java (programming language), Java. NetBeans allows applications to be developed from a set of modular software components called ''modules''. NetBeans runs on Microsoft Windows, Windows, ...
, Eclipse,
Visual Studio Visual Studio is an integrated development environment (IDE) from Microsoft. It is used to develop computer programs including web site, websites, web apps, web services and mobile apps. Visual Studio uses Microsoft software development platfor ...
and
Visual Studio Code Visual Studio Code, also commonly referred to as VS Code, is a source-code editor made by Microsoft with the Electron Framework, for Windows, Linux and macOS. Features include support for debugging, syntax highlighting, intelligent code complet ...
).


Projects

Software built using ANTLR includes: *
Groovy ''Groovy'' (or, less commonly, ''groovie'' or ''groovey'') is a slang colloquialism popular during the 1950s, '60s and '70s. It is roughly synonymous with words such as "excellent", "fashionable", or "amazing", depending on context. History The ...
. *
Jython Jython is an implementation of the Python programming language designed to run on the Java platform. The implementation was formerly known as JPython until 1999. Overview Jython programs can import and use any Java class. Except for some standa ...
. *
Hibernate Hibernation is a state of minimal activity and metabolic depression undergone by some animal species. Hibernation is a seasonal heterothermy characterized by low body-temperature, slow breathing and heart-rate, and low metabolic rate. It most ...
* OpenJDK Compiler Grammar project experimental version of the
javac javac (pronounced "java-see") is the primary Java compiler included in the Java Development Kit (JDK) from Oracle Corporation. Martin Odersky implemented the GJ compiler, and his implementation became the basis for javac. The compiler accepts s ...
compiler based upon a grammar written in ANTLR. *
Apex The apex is the highest point of something. The word may also refer to: Arts and media Fictional entities * Apex (comics), a teenaged super villainess in the Marvel Universe * Ape-X, a super-intelligent ape in the Squadron Supreme universe *Apex, ...
,
Salesforce.com Salesforce, Inc. is an American cloud-based software company headquartered in San Francisco, California. It provides customer relationship management (CRM) software and applications focused on sales, customer service, marketing automation, a ...
's programming language. * The expression evaluator in
Numbers A number is a mathematical object used to count, measure, and label. The original examples are the natural numbers 1, 2, 3, 4, and so forth. Numbers can be represented in language with number words. More universally, individual numbers can ...
, Apple's spreadsheet. *
Twitter Twitter is an online social media and social networking service owned and operated by American company Twitter, Inc., on which users post and interact with 280-character-long messages known as "tweets". Registered users can post, like, and ...
's search query language. * Weblogic server. *
Apache Cassandra Cassandra is a free and open-source, distributed, wide-column store, NoSQL database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. Cassand ...
. *
Processing Processing is a free graphical library and integrated development environment (IDE) built for the electronic arts, new media art, and visual design communities with the purpose of teaching non-programmers the fundamentals of computer programming ...
. * JabRef. *
Trino (SQL query engine) Trino is an open-source distributed SQL query engine designed to query large data sets distributed over one or more heterogeneous data sources. Trino can query datalakes that contain open column-oriented data file formats like ORC or Parquet ...
* Presto (SQL query engine) *
MySQL Workbench MySQL Workbench is a visual database design tool that integrates SQL development, administration, database design, creation and maintenance into a single integrated development environment for the MySQL database system. It is the successor t ...
Over 200 grammars implemented in ANTLR 4 are available on
GitHub GitHub, Inc. () is an Internet hosting service for software development and version control using Git. It provides the distributed version control of Git plus access control, bug tracking, software feature requests, task management, continuous ...
. They range from grammars for a URL to grammars for entire languages like C, Java and Go.


Example

In the following example, a parser in ANTLR describes the sum of expressions can be seen in the form of "1 + 2 + 3": // Common options, for example, the target language options // Followed by the parser class SumParser extends Parser; options // Definition of an expression statement: INTEGER (PLUS^ INTEGER)*; // Here is the Lexer class SumLexer extends Lexer; options PLUS: '+'; DIGIT: ('0'..'9'); INTEGER: (DIGIT)+; The following listing demonstrates the call of the parser in a program: TextReader reader; // (...) Fill TextReader with character SumLexer lexer = new SumLexer(reader); SumParser parser = new SumParser(lexer); parser.statement();


See also

*
Coco/R Coco/R is a compiler generator that takes wirth syntax notationIn the manual, however, it is referred as L-attributed Extended Backus–Naur Form syntax (EBNF). grammars of a source language and generates a scanner and a parser for that lang ...
*
DMS Software Reengineering Toolkit The DMS Software Reengineering Toolkit is a proprietary set of program transformation tools available for automating custom source program analysis, modification, translation or generation of software systems for arbitrary mixtures of source langu ...
*
JavaCC JavaCC (Java Compiler Compiler) is an open-source software, open-source parser generator and Lexical analysis, lexical analyzer generator written in the Java (programming language), Java programming language. JavaCC is similar to yacc in that it ...
*
Modular Syntax Definition Formalism The Syntax Definition Formalism (SDF) is a metasyntax used to define context-free grammars: that is, a formal way to describe formal languages. It can express the entire range of context-free grammars. Its current version is SDF3. A parser and par ...
*
Parboiled (Java) parboiled is an open-source Java library released under an Apache License. It provides support for defining PEG parsers directly in Java source code. parboiled is commonly used as an alternative for regular expressions or parser generators (like ...
*
Parsing expression grammar In computer science, a parsing expression grammar (PEG) is a type of analytic formal grammar, i.e. it describes a formal language in terms of a set of rules for recognizing strings in the language. The formalism was introduced by Bryan Ford in 200 ...
*
SableCC SableCC is an open-source compiler generator (or interpreter generator) in Java. Stable version is licensed under the GNU Lesser General Public License (LGPL). Rewritten version 4 is licensed under Apache License 2.0. SableCC includes the followi ...


References


Bibliography

* * *


Further reading

*


External links

* {{official website, www.antlr.org
ANTLR (mega) Tutorial

ANTLR Studio
1992 software Free compilers and interpreters Parser generators Software using the BSD license Public-domain software