Lex is a computer program that generates lexical analyzers ("scanners" or "lexers"). Lex is commonly used with the yacc parser generator. Lex, originally written by Mike Lesk and Eric Schmidt and described in 1975, is the standard lexical analyzer generator on many Unix systems, and an equivalent tool is specified as part of the POSIX standard. Lex reads an input stream specifying the lexical analyzer and writes source code which implements the lexical analyzer in the C programming language. In addition to C, some old versions of Lex could generate a lexer in Ratfor.


Open source

Although originally distributed as proprietary software, some versions of Lex are now open-source. Open-source versions of Lex, based on the original proprietary code, are now distributed with open-source operating systems such as OpenSolaris and Plan 9 from Bell Labs. One popular open-source version of Lex, called flex, or the "fast lexical analyzer", is not derived from the proprietary code.


Structure of a Lex file

The structure of a Lex file is intentionally similar to that of a yacc file: files are divided into three sections, separated by lines that contain only two percent signs, as follows:

* The definitions section defines macros and imports header files written in C. It is also possible to write any C code here, which will be copied verbatim into the generated source file.
* The rules section associates regular expression patterns with C statements. When the lexer sees text in the input matching a given pattern, it will execute the associated C code.
* The C code section contains C statements and functions that are copied verbatim to the generated source file. These statements presumably contain code called by the rules in the rules section. In large programs it is more convenient to place this code in a separate file linked in at compile time.
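
A minimal skeleton illustrating this layout (the contents shown here are only placeholders; the next section gives a complete example):

%{
/* Definitions section: C declarations copied verbatim into the scanner */
#include <stdio.h>
%}
DIGIT   [0-9]
%%
    /* Rules section: pattern-action pairs */
{DIGIT}+    printf("saw a number\n");
%%
/* C code section: supporting functions used by the rules */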


Example of a Lex file

The following is an example Lex file for the flex version of Lex. It recognizes strings of numbers (positive integers) in the input, and simply prints them out.

/*** Definition section ***/

%{
/* C code to be copied verbatim into the generated source file */
#include <stdio.h>
%}

/* This tells flex to read only one input file */
%option noyywrap

%%
    /*** Rules section ***/

    /* [0-9]+ matches a string of one or more digits */
[0-9]+  {
            /* yytext is a string containing the matched text */
            printf("Saw an integer: %s\n", yytext);
        }

.|\n    { /* Ignore all other characters */ }

%%
/*** C Code section ***/

int main(void)
{
    /* Call the lexer, then quit */
    yylex();
    return 0;
}

If this input is given to flex, it will be converted into a C file, lex.yy.c. This can be compiled into an executable which matches and outputs strings of integers. For example, given the input:

abc123z.!&*2gj6

the program will print:

Saw an integer: 123
Saw an integer: 2
Saw an integer: 6
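
As a usage sketch (the file name count.l and the program name count are assumed here), the specification above can be built and run with:

flex count.l            # writes the scanner to lex.yy.c
cc lex.yy.c -o count    # no scanner library needed because of %option noyywrap
echo "abc123z.!&*2gj6" | ./count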


Using Lex with other programming tools


Using Lex with parser generators

Lex and parser generators, such as Yacc or Bison, are commonly used together. Parser generators use a formal grammar to parse an input stream, something which Lex cannot do using simple regular expressions, as Lex is limited to simple finite state automata. It is typically preferable to have a parser, one generated by Yacc for instance, accept a stream of tokens (a "token-stream") as input, rather than having to process a stream of characters (a "character-stream") directly. Lex is often used to produce such a token-stream. Scannerless parsing refers to parsing the input character-stream directly, without a distinct lexer.
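
The following flex sketch shows the lexer's side of such a token-stream; the token code NUMBER and the yylval cell are illustrative stand-ins (with yacc they would come from the generated y.tab.h header and the parser), and the main() loop merely plays the parser's role by pulling one token at a time from yylex():

%{
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical token code and value cell; a real yacc grammar would
   define these in y.tab.h and in the parser itself. */
#define NUMBER 258
int yylval;
%}

%option noyywrap

%%
    /* Hand back a NUMBER token for each integer, with its value in yylval. */
[0-9]+      { yylval = atoi(yytext); return NUMBER; }

    /* Skip whitespace between tokens. */
[ \t\n]+    ;

    /* Return anything else as a single-character token. */
.           { return yytext[0]; }
%%

/* Stand-in for the yacc-generated parser: consume the token-stream. */
int main(void)
{
    int tok;
    while ((tok = yylex()) != 0)
        printf("token %d, text \"%s\"\n", tok, yytext);
    return 0;
}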


Lex and make

make is a utility that can be used to maintain programs involving Lex. Make assumes that a file that has an extension of .l is a Lex source file. The make internal macro LFLAGS can be used to specify Lex options to be invoked automatically by make.
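
A minimal makefile sketch, assuming the Lex source file is named scanner.l; make's built-in inference rules run $(LEX) $(LFLAGS) on it and compile the result, so only the link step needs to be written out (recipe lines must begin with a tab):

LFLAGS =            # Lex options, e.g. -n, passed automatically by the built-in rules
LDLIBS = -ll        # Lex library, supplying the default yywrap() if the scanner has none

scanner: scanner.o
	$(CC) $(CFLAGS) -o scanner scanner.o $(LDLIBS)

clean:
	rm -f scanner scanner.o scanner.c lex.yy.c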


See also

* Flex lexical analyser
* Yacc
* Ragel
* PLY (Python Lex-Yacc)
* Comparison of parser generators


External links


* Using Flex and Bison at Macworld.com