
The C preprocessor is the macro preprocessor for the C, Objective-C and C++ computer programming languages. The preprocessor provides inclusion of header files, macro expansion, conditional compilation, and line control. In many C implementations, it is a separate program invoked by the compiler as the first part of translation. The language of preprocessor directives is only weakly related to the grammar of C, and so is sometimes used to process other kinds of text files.


History

The preprocessor was introduced to C around 1973 at the urging of Alan Snyder and also in recognition of the usefulness of the file-inclusion mechanisms available in BCPL and PL/I. Its original version offered only file inclusion and simple string replacement, using #include and #define for parameterless macros, respectively. It was extended shortly after, first by Mike Lesk and then by John Reiser, to incorporate macros with arguments and conditional compilation. The C preprocessor was part of a long macro-language tradition at Bell Labs, which was started by Douglas Eastwood and Douglas McIlroy in 1959.


Phases

Preprocessing is defined by the first four (of eight) phases of translation specified in the C Standard.

1. Trigraph replacement: The preprocessor replaces trigraph sequences with the characters they represent.
2. Line splicing: Physical source lines that are continued with escaped newline sequences are spliced to form logical lines.
3. Tokenization: The preprocessor breaks the result into preprocessing tokens and whitespace. It replaces comments with whitespace.
4. Macro expansion and directive handling: Preprocessing directive lines, including file inclusion and conditional compilation, are executed. The preprocessor simultaneously expands macros and, since the 1999 version of the C standard, handles _Pragma operators.


Including files

One of the most common uses of the preprocessor is to include another file:

    #include <stdio.h>

    int main(void)
    {
        printf("Hello, World!\n");
        return 0;
    }

The preprocessor replaces the line #include <stdio.h> with the textual content of the file 'stdio.h', which declares the printf() function among other things.

This can also be written using double quotes, e.g. #include "stdio.h". If the filename is enclosed within angle brackets, the file is searched for in the standard compiler include paths. If the filename is enclosed within double quotes, the search path is expanded to include the current source file directory. C compilers and programming environments all have a facility that allows the programmer to define where include files can be found. This can be introduced through a command-line flag, which can be parameterized using a makefile, so that a different set of include files can be swapped in for different operating systems, for instance.

By convention, include files are named with either a .h or .hpp extension. However, there is no requirement that this is observed. Files with a .def extension may denote files designed to be included multiple times, each time expanding the same repetitive content; #include "icon.xbm" is likely to refer to an XBM image file (which is at the same time a C source file).

#include often compels the use of #include guards or #pragma once to prevent double inclusion.


Conditional compilation

The if-else directives #if, #ifdef, #ifndef, #else, #elif and #endif can be used for conditional compilation. #ifdef and #ifndef are simple shorthands for #if defined(...) and #if !defined(...).

    #if VERBOSE >= 2
      printf("trace message");
    #endif

Most compilers targeting Microsoft Windows implicitly define _WIN32. This allows code, including preprocessor commands, to compile only when targeting Windows systems. A few compilers define WIN32 instead. For such compilers that do not implicitly define the _WIN32 macro, it can be specified on the compiler's command line, using -D_WIN32.

    #ifdef __unix__ /* __unix__ is usually defined by compilers targeting Unix systems */
    # include <unistd.h>
    #elif defined _WIN32 /* _WIN32 is usually defined by compilers targeting 32 or 64 bit Windows systems */
    # include <windows.h>
    #endif

The example code tests if a macro __unix__ is defined. If it is, the file <unistd.h> is then included. Otherwise, it tests if a macro _WIN32 is defined instead. If it is, the file <windows.h> is then included.

A more complex #if example can use operators, for example:

    #if !(defined __LP64__ || defined __LLP64__) || defined _WIN32 && !defined _WIN64
        // we are compiling for a 32-bit system
    #else
        // we are compiling for a 64-bit system
    #endif

Translation can also be caused to fail by using the #error directive:

    #if RUBY_VERSION == 190
    #error 1.9.0 not supported
    #endif


Macro definition and expansion

There are two types of macros, object-like and function-like. Object-like macros do not take parameters; function-like macros do (although the list of parameters may be empty). The generic syntax for declaring an identifier as a macro of each type is, respectively:

    #define <identifier> <replacement token list>                    // object-like macro
    #define <identifier>(<parameter list>) <replacement token list>  // function-like macro, note parameters

The function-like macro declaration must not have any whitespace between the identifier and the first, opening, parenthesis. If whitespace is present, the macro will be interpreted as object-like with everything starting from the first parenthesis added to the token list.

A macro definition can be removed with #undef:

    #undef <identifier>  // delete the macro

Whenever the identifier appears in the source code it is replaced with the replacement token list, which can be empty. For an identifier declared to be a function-like macro, it is only replaced when the following token is also a left parenthesis that begins the argument list of the macro invocation. The exact procedure followed for expansion of function-like macros with arguments is subtle.

Object-like macros were conventionally used as part of good programming practice to create symbolic names for constants, e.g.,

    #define PI 3.14159

instead of hard-coding numbers throughout the code. An alternative in both C and C++, especially in situations in which a pointer to the number is required, is to apply the const qualifier to a global variable. This causes the value to be stored in memory, instead of being substituted by the preprocessor.

An example of a function-like macro is:

    #define RADTODEG(x) ((x) * 57.29578)

This defines a radians-to-degrees conversion which can be inserted in the code where required, e.g., RADTODEG(34). This is expanded in-place, so that repeated multiplication by the constant is not shown throughout the code. The macro here is written as all uppercase to emphasize that it is a macro, not a compiled function.

The second x is enclosed in its own pair of parentheses to avoid the possibility of incorrect order of operations when it is an expression instead of a single value. For example, the expression RADTODEG(r + 1) expands correctly as ((r + 1) * 57.29578); without parentheses, (r + 1 * 57.29578) gives precedence to the multiplication.

Similarly, the outer pair of parentheses maintain correct order of operation. For example, 1 / RADTODEG(r) expands to 1 / ((r) * 57.29578); without parentheses, 1 / (r) * 57.29578 gives precedence to the division.


Order of expansion

Function-like macro expansion occurs in the following stages:

1. Stringification operations are replaced with the textual representation of their argument's replacement list (without performing expansion).
2. Parameters are replaced with their replacement list (without performing expansion).
3. Concatenation operations are replaced with the concatenated result of the two operands (without expanding the resulting token).
4. Tokens originating from parameters are expanded.
5. The resulting tokens are expanded as normal.

This may produce surprising results:

    #define HE HI
    #define LLO _THERE
    #define HELLO "HI THERE"
    #define CAT(a,b) a##b
    #define XCAT(a,b) CAT(a,b)
    #define CALL(fn) fn(HE,LLO)

    CAT(HE, LLO)  // "HI THERE", because concatenation occurs before normal expansion
    XCAT(HE, LLO) // HI_THERE, because the tokens originating from parameters ("HE" and "LLO") are expanded first
    CALL(CAT)     // "HI THERE", because this first becomes CAT(HE,LLO), which concatenates before normal expansion


Special macros and directives

Certain symbols are required to be defined by an implementation during preprocessing. These include __FILE__ and __LINE__, predefined by the preprocessor itself, which expand into the current file and line number. For instance, the following:

    // debugging macros so we can pin down message origin at a glance
    // is bad
    #define WHERESTR  "[file %s, line %d]: "
    #define WHEREARG  __FILE__, __LINE__
    #define DEBUGPRINT2(...)       fprintf(stderr, __VA_ARGS__)
    #define DEBUGPRINT(_fmt, ...)  DEBUGPRINT2(WHERESTR _fmt, WHEREARG, __VA_ARGS__)
    // OR
    // is good
    #define DEBUGPRINT(_fmt, ...)  fprintf(stderr, "[file %s, line %d]: " _fmt, __FILE__, __LINE__, __VA_ARGS__)

    DEBUGPRINT("hey, x=%d\n", x);

prints the value of x, preceded by the file and line number, to the error stream, allowing quick access to which line the message was produced on. Note that the WHERESTR argument is concatenated with the string following it. The values of __FILE__ and __LINE__ can be manipulated with the #line directive. The #line directive determines the line number and the file name of the line below. E.g.:

    #line 314 "pi.c"
    printf("line=%d file=%s\n", __LINE__, __FILE__);

generates the printf call:

    printf("line=%d file=%s\n", 314, "pi.c");

Source code debuggers refer also to the source position defined with __FILE__ and __LINE__. This allows source code debugging when C is used as the target language of a compiler for a totally different language.

The first C Standard specified that the macro __STDC__ be defined to 1 if the implementation conforms to the ISO Standard and 0 otherwise, and the macro __STDC_VERSION__ defined as a numeric literal specifying the version of the Standard supported by the implementation. Standard C++ compilers support the __cplusplus macro. Compilers running in non-standard mode must not set these macros or must define others to signal the differences.

Other Standard macros include __DATE__, the current date, and __TIME__, the current time.

The second edition of the C Standard, C99, added support for __func__, which contains the name of the function definition within which it is contained, but because the preprocessor is agnostic to the grammar of C, this must be done in the compiler itself using a variable local to the function.

Macros that can take a varying number of arguments (variadic macros) are not allowed in C89, but were introduced by a number of compilers and standardized in C99. Variadic macros are particularly useful when writing wrappers to functions taking a variable number of parameters, such as printf, for example when logging warnings and errors.

One little-known usage pattern of the C preprocessor is known as X-Macros (Wirzenius, Lars. "C Preprocessor Trick For Implementing Similar Data Types". Retrieved January 9, 2011). An X-Macro is a header file. Commonly these use the extension ".def" instead of the traditional ".h". This file contains a list of similar macro calls, which can be referred to as "component macros". The include file is then referenced repeatedly.

Many compilers define additional, non-standard macros, although these are often poorly documented. A common reference for these macros is the Pre-defined C/C++ Compiler Macros project, which lists "various pre-defined compiler macros that can be used to identify standards, compilers, operating systems, hardware architectures, and even basic run-time libraries at compile-time".


Token stringification

The # operator (known as the "Stringification Operator") converts a token into a C string literal, escaping any quotes or backslashes appropriately.

Example:

    #define str(s) #s

    str(p = "foo\n";) // outputs "p = \"foo\\n\";"
    str(\n)           // outputs "\n"

If you want to stringify the expansion of a macro argument, you have to use two levels of macros:

    #define xstr(s) str(s)
    #define str(s) #s
    #define foo 4

    str (foo)  // outputs "foo"
    xstr (foo) // outputs "4"

You cannot combine a macro argument with additional text and stringify it all together. You can however write a series of adjacent string constants and stringified arguments: the C compiler will then combine all the adjacent string constants into one long string.


Token concatenation

The ## operator (known as the "Token Pasting Operator") concatenates two tokens into one token.

Example:

    #define DECLARE_STRUCT_TYPE(name) typedef struct name##_s name##_t

    DECLARE_STRUCT_TYPE(g_object); // Outputs: typedef struct g_object_s g_object_t;


User-defined compilation errors

The #error directive outputs a message through the error stream and causes translation to fail.

    #error "error message"


Implementations

All C, C++ and Objective-C implementations provide a preprocessor, as preprocessing is a required step for those languages, and its behavior is described by official standards for these languages, such as the ISO C standard. Implementations may provide their own extensions and deviations, and vary in their degree of compliance with written standards. Their exact behavior may depend on command-line flags supplied on invocation. For instance, the GNU C preprocessor can be made more standards compliant by supplying certain flags.


Compiler-specific preprocessor features

The #pragma directive is a compiler-specific directive, which compiler vendors may use for their own purposes. For instance, a #pragma is often used to allow suppression of specific error messages, manage heap and stack debugging and so on. A compiler with support for the OpenMP parallelization library can automatically parallelize a for loop with #pragma omp parallel for.

C99 introduced a few standard #pragma directives, taking the form #pragma STDC ..., which are used to control the floating-point implementation. The alternative, macro-like form _Pragma(...) was also added.

* Many implementations do not support trigraphs or do not replace them by default.
* Many implementations (including, e.g., the C compilers by GNU, Intel, Microsoft and IBM) provide a non-standard directive to print out a warning message in the output, but not stop the compilation process. A typical use is to warn about the usage of some old code, which is now deprecated and only included for compatibility reasons, e.g.:

      // GNU, Intel and IBM
      #warning "Do not use ABC, which is deprecated. Use XYZ instead."

      // Microsoft
      #pragma message("Do not use ABC, which is deprecated. Use XYZ instead.")

* Some Unix preprocessors traditionally provided "assertions", which have little similarity to assertions used in programming.
* GCC provides #include_next for chaining headers of the same name.
* Objective-C preprocessors have #import, which is like #include but only includes the file once. A common vendor pragma with a similar functionality in C is #pragma once.


Other uses

As the C preprocessor can be invoked separately from the compiler with which it is supplied, it can be used separately, on different languages. Notable examples include its use in the now-deprecated imake system and for preprocessing Fortran. However, such use as a general purpose preprocessor is limited: the input language must be sufficiently C-like. The GNU Fortran compiler automatically calls "traditional mode" (see below) cpp before compiling Fortran code if certain file extensions are used. Intel offers a Fortran preprocessor, fpp, for use with the ifort compiler, which has similar capabilities.

CPP also works acceptably with most assembly languages and Algol-like languages. This requires that the language syntax not conflict with CPP syntax, which means no lines starting with # and that double quotes, which cpp interprets as string literals and thus ignores, don't have syntactical meaning other than that. The "traditional mode" (acting like a pre-ISO C preprocessor) is generally more permissive and better suited for such use.

The C preprocessor is not Turing-complete, but it comes very close: recursive computations can be specified, but with a fixed upper bound on the amount of recursion performed. However, the C preprocessor is not designed to be, nor does it perform well as, a general-purpose programming language. As the C preprocessor does not have features of some other preprocessors, such as recursive macros, selective expansion according to quoting, and string evaluation in conditionals, it is very limited in comparison to a more general macro processor such as m4.


See also

* C syntax
* Make
* Preprocessor
* m4 (computer language)
* PL/I preprocessor


External links


* ISO/IEC 9899: The official C standard. As of 2014, the latest publicly available version is a working paper for C11.
* Visual Studio .NET preprocessor reference
* Pre-defined C/C++ Compiler Macros project: lists "various pre-defined compiler macros that can be used to identify standards, compilers, operating systems, hardware architectures, and even basic run-time libraries at compile-time"