Flex (fast
lexical analyzer generator) is a
free and open-source software
Free and open-source software (FOSS) is software available under a license that grants users the right to use, modify, and distribute the software modified or not to everyone free of charge. FOSS is an inclusive umbrella term encompassing free ...
alternative to
lex.
It is a
computer program
A computer program is a sequence or set of instructions in a programming language for a computer to Execution (computing), execute. It is one component of software, which also includes software documentation, documentation and other intangibl ...
that generates
lexical analyzers (also known as "scanners" or "lexers").
It is frequently used as the lex implementation together with
Berkeley Yacc parser generator
In computer science, a compiler-compiler or compiler generator is a programming tool that creates a parser, interpreter, or compiler from some form of formal description of a programming language and machine.
The most common type of compiler- ...
on
BSD
The Berkeley Software Distribution (BSD), also known as Berkeley Unix or BSD Unix, is a discontinued Unix operating system developed and distributed by the Computer Systems Research Group (CSRG) at the University of California, Berkeley, beginni ...
-derived operating systems (as both
lex
and
yacc
are part of
POSIX
The Portable Operating System Interface (POSIX; ) is a family of standards specified by the IEEE Computer Society for maintaining compatibility between operating systems. POSIX defines application programming interfaces (APIs), along with comm ...
), or together with
GNU bison (a version of
yacc
Yacc (Yet Another Compiler-Compiler) is a computer program for the Unix operating system developed by Stephen C. Johnson. It is a lookahead left-to-right rightmost derivation (LALR) parser generator, generating a LALR parser (the part of a co ...
) in
*BSD ports and in Linux distributions. Unlike Bison, flex is not part of the
GNU Project
The GNU Project ( ) is a free software, mass collaboration project announced by Richard Stallman on September 27, 1983. Its goal is to give computer users freedom and control in their use of their computers and Computer hardware, computing dev ...
and is not released under the
GNU General Public License
The GNU General Public Licenses (GNU GPL or simply GPL) are a series of widely used free software licenses, or ''copyleft'' licenses, that guarantee end users the freedom to run, study, share, or modify the software. The GPL was the first ...
, although a manual for Flex was produced and published by the Free Software Foundation.
History
Flex was written in
C around 1987
by
Vern Paxson, with the help of many ideas and much inspiration from
Van Jacobson. Original version by
Jef Poskanzer. The fast table representation is a partial implementation of a design done by Van Jacobson. The implementation was done by Kevin Gong and Vern Paxson.
Example lexical analyzer
This is an example of a Flex scanner for the instructional programming language
PL/0.
The tokens recognized are: '
+
', '
-
', '
*
', '
/
', '
=
', '
(
', '
)
', '
,
', '
;
', '
.
', '
:=
', '
<
', '
<=
', '
<>
', '
>
', '
>=
';
numbers:
0-9
; identifiers:
a-zA-Z
and keywords:
begin
,
call
,
const
,
do
,
end
,
if
,
odd
,
procedure
,
then
,
var
,
while
.
%
digit -9letter -zA-Z
%%
"+"
"-"
"*"
"/"
"("
")"
";"
","
"."
":="
"="
"<>"
"<"
">"
"<="
">="
"begin"
"call"
"const"
"do"
"end"
"if"
"odd"
"procedure"
"then"
"var"
"while"
(, )*
+
\t\n\r /* skip whitespace */
.
%%
int yywrap(void)
Internals
These programs perform character parsing and tokenizing via the use of a
deterministic finite automaton
In the theory of computation, a branch of theoretical computer science, a deterministic finite automaton (DFA)—also known as deterministic finite acceptor (DFA), deterministic finite-state machine (DFSM), or deterministic finite-state auto ...
(DFA). A DFA is a theoretical machine accepting
regular language
In theoretical computer science and formal language theory, a regular language (also called a rational language) is a formal language that can be defined by a regular expression, in the strict sense in theoretical computer science (as opposed to ...
s. These machines are a subset of the collection of
Turing machine
A Turing machine is a mathematical model of computation describing an abstract machine that manipulates symbols on a strip of tape according to a table of rules. Despite the model's simplicity, it is capable of implementing any computer algori ...
s. DFAs are equivalent to
read-only right moving Turing machines. The syntax is based on the use of
regular expressions
A regular expression (shortened as regex or regexp), sometimes referred to as rational expression, is a sequence of character (computing), characters that specifies a pattern matching, match pattern in string (computer science), text. Usually ...
. See also
nondeterministic finite automaton
In automata theory, a finite-state machine is called a deterministic finite automaton (DFA), if
* each of its transitions is ''uniquely'' determined by its source state and input symbol, and
* reading an input symbol is required for each state tr ...
.
Issues
Time complexity
A Flex lexical analyzer usually has time complexity
in the length of the input. That is, it performs a constant number of operations for each input symbol. This constant is quite low:
GCC generates 12 instructions for the DFA match loop. Note that the constant is independent of the length of the token, the length of the regular expression and the size of the DFA.
However, using the REJECT macro in a scanner with the potential to match extremely long tokens can cause Flex to generate a scanner with non-linear performance. This feature is optional. In this case, the programmer has explicitly told Flex to "go back and try again" after it has already matched some input. This will cause the DFA to backtrack to find other accept states. The REJECT feature is not enabled by default, and because of its performance implications its use is discouraged in the Flex manual.
Reentrancy
By default the scanner generated by Flex is not
reentrant. This can cause serious problems for programs that use the generated scanner from different threads. To overcome this issue there are options that Flex provides in order to achieve reentrancy. A detailed description of these options can be found in the Flex manual.
Usage under non-Unix environments
Normally the generated scanner contains references to the ''unistd.h'' header file, which is
Unix
Unix (, ; trademarked as UNIX) is a family of multitasking, multi-user computer operating systems that derive from the original AT&T Unix, whose development started in 1969 at the Bell Labs research center by Ken Thompson, Dennis Ritchie, a ...
specific. To avoid generating code that includes ''
unistd.h
In the C programming language, C and C++ programming languages, unistd.h is the name of the header file that provides access to the POSIX operating system application programming interface, API. It is defined by the POSIX.1 standard, the base ...
'', ''%option nounistd'' should be used. Another issue is the call to ''
isatty'' (a Unix library function), which can be found in the generated code. The ''%option never-interactive'' forces flex to generate code that does not use ''isatty''.
Using flex from other languages
Flex can only generate code for
C and
C++. To use the scanner code generated by flex from other languages a
language binding
In programming and software design, a binding is an application programming interface (API) that provides glue code specifically made to allow a programming language to use a foreign library or operating system service (one that is not native to ...
tool such as
SWIG can be used.
Unicode support
Flex is limited to matching 1-byte (8-bit) binary values and therefore does not support
Unicode
Unicode or ''The Unicode Standard'' or TUS is a character encoding standard maintained by the Unicode Consortium designed to support the use of text in all of the world's writing systems that can be digitized. Version 16.0 defines 154,998 Char ...
.
RE/flex and other alternatives do support Unicode matching.
Flex++
flex++ is a similar lexical scanner for
C++ which is included as part of the flex package. The generated code does not depend on any
runtime or external
library
A library is a collection of Book, books, and possibly other Document, materials and Media (communication), media, that is accessible for use by its members and members of allied institutions. Libraries provide physical (hard copies) or electron ...
except for a memory allocator (
malloc
C dynamic memory allocation refers to performing manual memory management for dynamic memory allocation in the C programming language via a group of functions in the C standard library, namely , , , and .
The C++ programming language includ ...
or a user-supplied alternative) unless the input also depends on it. This can be useful in
embedded and similar situations where traditional
operating system
An operating system (OS) is system software that manages computer hardware and software resources, and provides common daemon (computing), services for computer programs.
Time-sharing operating systems scheduler (computing), schedule tasks for ...
or
C runtime facilities may not be available.
The flex++ generated C++ scanner includes the header file
FlexLexer.h
, which defines the interfaces of the two C++ generated classes.
See also
*
Comparison of parser generators
*
Lex
*
yacc
Yacc (Yet Another Compiler-Compiler) is a computer program for the Unix operating system developed by Stephen C. Johnson. It is a lookahead left-to-right rightmost derivation (LALR) parser generator, generating a LALR parser (the part of a co ...
*
GNU Bison
*
Berkeley Yacc
References
Further reading
*
*M. E. Lesk and E. Schmidt, ''LEX - Lexical Analyzer Generator''
*Alfred Aho, Ravi Sethi and Jeffrey Ullman, ''Compilers: Principles, Techniques and Tools'', Addison-Wesley (1986). Describes the pattern-matching techniques used by flex (deterministic finite automata)
External links
*
ANSI-C Lex SpecificationJFlex: Fast Scanner Generator for JavaBrief description of Lex, Flex, YACC, and Bison
{{DEFAULTSORT:Flex Lexical Analyser
Compiling tools
Free software programmed in C
Finite-state machines
Software using the BSD license
Lexical analysis