In computer science, a preprocessor (or precompiler) is a program that processes its input data to produce output that is used as input to another program. The output is said to be a preprocessed form of the input data, which is often used by subsequent programs such as compilers. The amount and kind of processing done depends on the nature of the preprocessor; some preprocessors are only capable of performing relatively simple textual substitutions and macro expansions, while others have the power of full-fledged programming languages. A common example from computer programming is the processing performed on source code before the next step of compilation. In some computer languages (e.g., C and PL/I) there is a phase of translation known as preprocessing. It can also include macro processing, file inclusion and language extensions.


Lexical preprocessors

Lexical preprocessors are the lowest-level preprocessors, as they only require lexical analysis; that is, they operate on the source text, prior to any parsing, by performing simple substitution of tokenized character sequences for other tokenized character sequences, according to user-defined rules. They typically perform macro substitution, textual inclusion of other files, and conditional compilation or inclusion.
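The token-level substitution and conditional inclusion described above can be illustrated with a minimal sketch in Python. The directive names (@define, @ifdef, @endif) and the function name preprocess are hypothetical, chosen only for this example; they are not the syntax of any real preprocessor, and the sketch expands each token once without rescanning the result.

```python
import re

# Split a line into word tokens and single non-word characters.
TOKEN = re.compile(r"\w+|\W")

def preprocess(text, macros=None):
    """Expand object-like macros and handle @ifdef/@endif regions.

    The @define/@ifdef/@endif directives are invented for this sketch.
    """
    macros = dict(macros or {})
    output = []
    active = [True]  # stack: is the current region being emitted?
    for line in text.splitlines():
        words = line.split(None, 2)
        if words and words[0] == "@define":
            if active[-1]:
                macros[words[1]] = words[2] if len(words) > 2 else ""
        elif words and words[0] == "@ifdef":
            active.append(active[-1] and words[1] in macros)
        elif words and words[0] == "@endif":
            if len(active) > 1:
                active.pop()
        elif active[-1]:
            # Replace whole tokens only, so MAXIMUM is not affected by MAX.
            tokens = TOKEN.findall(line)
            output.append("".join(macros.get(tok, tok) for tok in tokens))
    return "\n".join(output)
```

Because substitution is applied to whole tokens rather than to raw substrings, an identifier that merely contains a macro name is left untouched, which is the main practical difference between a lexical preprocessor and a plain search-and-replace.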


C preprocessor

The most common example of this is the C preprocessor, which takes lines beginning with '#' as directives. The C preprocessor does not expect its input to use the syntax of the C language. Some languages take a different approach and use built-in language features to achieve similar things. For example:
* Instead of macros, some languages use aggressive inlining and templates.
* Instead of includes, some languages use compile-time imports that rely on type information in the object code.
* Some languages use if-then-else and dead code elimination to achieve conditional compilation.


Other lexical preprocessors

Other lexical preprocessors include the general-purpose m4, most commonly used in cross-platform build systems such as autoconf, and GEMA, an open source macro processor which operates on patterns of context.


Syntactic preprocessors

Syntactic preprocessors were introduced with the Lisp family of languages. Their role is to transform syntax trees according to a number of user-defined rules. For some programming languages, the rules are written in the same language as the program (compile-time reflection). This is the case with Lisp and OCaml. Some other languages rely on a fully external language to define the transformations, such as the XSLT preprocessor for XML, or its statically typed counterpart CDuce. Syntactic preprocessors are typically used to customize the syntax of a language, extend a language by adding new primitives, or embed a domain-specific programming language (DSL) inside a general purpose language.
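As a rough illustration of a rule applied to a syntax tree rather than to text, the following sketch uses Python's standard ast module (Python 3.9 or later for ast.unparse). It is not how Lisp or OCaml macros are implemented; the rule itself, rewriting calls to a hypothetical square(e) into e * e, and the names SquareRewriter and expand are assumptions made for the example.

```python
import ast
import copy

class SquareRewriter(ast.NodeTransformer):
    """Rewrite every call square(e) into the expression e * e."""

    def visit_Call(self, node):
        self.generic_visit(node)  # transform nested calls first
        if (isinstance(node.func, ast.Name) and node.func.id == "square"
                and len(node.args) == 1 and not node.keywords):
            arg = node.args[0]
            # Note: the argument is duplicated, so it is evaluated twice.
            return ast.BinOp(left=arg, op=ast.Mult(), right=copy.deepcopy(arg))
        return node

def expand(source):
    """Parse source, apply the rewrite rule, and return the new source text."""
    tree = SquareRewriter().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)
```

Because the rule works on the parsed tree, operator precedence is preserved automatically when the result is turned back into text, whereas a purely textual macro would need explicit parentheses around its argument.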


Customizing syntax

A good example of syntax customization is the existence of two different syntaxes in the OCaml (Objective Caml) programming language. Programs may be written interchangeably using the "normal syntax" or the "revised syntax", and may be pretty-printed with either syntax on demand. Similarly, a number of programs written in OCaml customize the syntax of the language by adding new operators.


Extending a language

The best examples of language extension through macros are found in the Lisp family of languages. While the languages, by themselves, are simple dynamically typed functional cores, the standard distributions of Scheme or Common Lisp permit imperative or object-oriented programming, as well as static typing. Almost all of these features are implemented by syntactic preprocessing, although it bears noting that the "macro expansion" phase of compilation is handled by the compiler in Lisp. This can still be considered a form of preprocessing, since it takes place before other phases of compilation.


Specializing a language

One of the unusual features of the Lisp family of languages is the possibility of using macros to create an internal DSL. Typically, in a large Lisp-based project, a module may be written in a variety of such minilanguages, one perhaps using a SQL-based dialect of Lisp, another written in a dialect specialized for GUIs or pretty-printing, etc. Common Lisp's standard library contains an example of this level of syntactic abstraction in the form of the LOOP macro, which implements an Algol-like minilanguage to describe complex iteration, while still enabling the use of standard Lisp operators. The MetaOCaml preprocessor/language provides similar features for external DSLs. This preprocessor takes the description of the semantics of a language (i.e. an interpreter) and, by combining compile-time interpretation and code generation, turns that definition into a compiler to the OCaml programming language and, from that language, either to bytecode or to native code.


General purpose preprocessor

Most preprocessors are specific to a particular data processing task (e.g., compiling the C language). A preprocessor may be promoted as being general purpose, meaning that it is not aimed at a specific usage or programming language, and is intended to be used for a wide variety of text processing tasks. M4 is probably the best-known example of such a general purpose preprocessor, although the C preprocessor is sometimes used in a non-C-specific role. Examples:
* using the C preprocessor for JavaScript preprocessing.
* using the C preprocessor for devicetree processing within the Linux kernel.
* using M4 (see on-article example) or the C preprocessor as a template engine for HTML generation ("Using a C preprocessor as an HTML authoring tool" by J. Korpela, 2000).
* imake, a make interface using the C preprocessor, written for the X Window System but now deprecated in favour of automake.
* grompp, a preprocessor for simulation input files for GROMACS (a fast, free, open-source code for some problems in computational chemistry) which calls the system C preprocessor (or other preprocessor as determined by the simulation input file) to parse the topology, using mostly the #define and #include mechanisms to determine the effective topology at grompp run time.
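The template-engine use mentioned above can be sketched in Python. This is a stand-in, not M4 or the C preprocessor: it imitates textual inclusion with an '#include "name"' line and uses the standard string.Template class for variable substitution. The fragment names, the in-memory FRAGMENTS table, and the function render are hypothetical and exist only for this example.

```python
from string import Template

# Hypothetical in-memory "files"; a real preprocessor would read them from disk.
FRAGMENTS = {
    "header.html": "<header>$site</header>",
    "footer.html": "<footer>(c) $year $site</footer>",
}

def render(page, variables, fragments=FRAGMENTS):
    """Splice in '#include "name"' lines, then substitute $variables."""
    lines = []
    for line in page.splitlines():
        stripped = line.strip()
        if stripped.startswith('#include "') and stripped.endswith('"'):
            name = stripped[len('#include "'):-1]
            lines.append(fragments[name])  # textual inclusion
        else:
            lines.append(line)
    # Substitution runs after inclusion, so included text is expanded too.
    return Template("\n".join(lines)).substitute(variables)
```

Performing inclusion before substitution mirrors the usual order in a general purpose preprocessor: shared fragments such as headers and footers are written once and pick up whatever values the including page defines.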



External links

* DSL Design in Lisp
* The Generic PreProcessor
* Gema, the General Purpose Macro Processor
* The PIKT piktc preprocessor
* pyexpander, a Python-based general purpose macro processor
* minimac, a minimalist macro processor
* Java Comment Preprocessor