HOME

TheInfoList



OR:

The printf format string is a control parameter used by a class of functions in the input/output libraries of C and many other
programming languages A programming language is a system of notation for writing computer programs. Most programming languages are text-based formal languages, but they may also be graphical. They are a kind of computer language. The description of a programming ...
. The string is written in a simple
template language A template processor (also known as a template engine or template parser) is software designed to combine templates with a data model to produce result documents. The language that the templates are written in is known as a template language ...
: characters are usually copied literally into the function's output, but format specifiers, which start with a character, indicate the location and method to translate a piece of data (such as a number) to characters. "printf" is the name of one of the main C output functions, and stands for "''print f''ormatted". printf format strings are complementary to
scanf format string A scanf format string (''scan f''ormatted) is a control parameter used in various functions to specify the layout of an input string. The functions can then divide the string and translate into values of appropriate data types. String scannin ...
s, which provide formatted input (
lexing In computer science, lexical analysis, lexing or tokenization is the process of converting a sequence of characters (such as in a computer program or web page) into a sequence of ''lexical tokens'' ( strings with an assigned and thus identified ...
aka.
parsing Parsing, syntax analysis, or syntactic analysis is the process of analyzing a string of symbols, either in natural language, computer languages or data structures, conforming to the rules of a formal grammar. The term ''parsing'' comes from L ...
). In both cases these provide simple functionality and fixed format compared to more sophisticated and flexible template engines or lexers/parsers, but are sufficient for many purposes. Many languages other than C copy the printf format string syntax closely or exactly in their own I/O functions. Mismatches between the format specifiers and type of the data can cause crashes and other vulnerabilities. The format string itself is very often a
string literal A string literal or anonymous string is a string value in the source code of a computer program. Modern programming languages commonly use a quoted sequence of characters, formally " bracketed delimiters", as in x = "foo", where "foo" is a string ...
, which allows
static analysis Static analysis, static projection, or static scoring is a simplified analysis wherein the effect of an immediate change to a system is calculated without regard to the longer-term response of the system to that change. If the short-term effect i ...
of the function call. However, it can also be the value of a variable, which allows for dynamic formatting but also a security vulnerability known as an
uncontrolled format string Uncontrolled format string is a type of software vulnerability discovered around 1989 that can be used in security exploits. Originally thought harmless, format string exploits can be used to crash a program or to execute harmful code. The proble ...
exploit.


History

Early programming languages such as Fortran used special statements with completely different syntax from other calculations to build formatting descriptions. In this example, the format is specified on line 601, and the WRITE command refers to it by line number: WRITE OUTPUT TAPE 6, 601, IA, IB, IC, AREA 601 FORMAT (4H A= ,I5,5H B= ,I5,5H C= ,I5, & 8H AREA= ,F10.2, 13H SQUARE UNITS)
ALGOL 68 ALGOL 68 (short for ''Algorithmic Language 1968'') is an imperative programming language that was conceived as a successor to the ALGOL 60 programming language, designed with the goal of a much wider scope of application and more rigorously d ...
had more function-like
API An application programming interface (API) is a way for two or more computer programs to communicate with each other. It is a type of software interface, offering a service to other pieces of software. A document or standard that describes how ...
, but still used special syntax (the delimiters surround special formatting syntax): printf(($"Color "g", number1 "6d,", number2 "4zd,", hex "16r2d,", float "-d.2d,", unsigned value"-3d"."l$, "red", 123456, 89, BIN 255, 3.14, 250)); But using the normal function calls and data types simplifies the language and compiler, and allows the implementation of the input/output to be written in the same language. These advantages outweigh the disadvantages (such as a complete lack of type safety in many instances) and in most newer languages I/O is not part of the syntax. C's has its origins in BCPL's function (1966). In comparison to and , is a BCPL ''language'' escape sequence representing a newline character (for which C uses the escape sequence ) and the order of the format specification's field width and type is reversed in : WRITEF("%I2-QUEENS PROBLEM HAS %I5 SOLUTIONS*N", NUMQUEENS, COUNT) Probably the first copying of the syntax outside the C language was the Unix shell command, which first appeared in Version 4, as part of the port to C.


Format placeholder specification

Formatting takes place via placeholders within the format string. For example, if a program wanted to print out a person's age, it could present the output by prefixing it with "Your age is ", and using the signed decimal specifier character to denote that we want the integer for the age to be shown immediately after that message, we may use the format string: printf("Your age is %d", age);


Syntax

The syntax for a format placeholder is


Parameter field

This is a
POSIX The Portable Operating System Interface (POSIX) is a family of standards specified by the IEEE Computer Society for maintaining compatibility between operating systems. POSIX defines both the system- and user-level application programming inter ...
extension and not in
C99 C99 (previously known as C9X) is an informal name for ISO/IEC 9899:1999, a past version of the C programming language standard. It extends the previous version ( C90) with new features for the language and the standard library, and helps impl ...
. The Parameter field can be omitted or can be: This feature mainly sees its use in localization, where the order of occurrence of parameters vary due to the language-dependent convention. On the non-POSIX Microsoft Windows, support for this feature is placed in a separate printf_p function.


Flags field

The Flags field can be zero or more (in any order) of:


Width field

The Width field specifies a ''minimum'' number of characters to output and is typically used to pad fixed-width fields in tabulated output, where the fields would otherwise be smaller, although it does not cause truncation of oversized fields. The width field may be omitted, or a numeric integer value, or a dynamic value when passed as another argument when indicated by an asterisk . For example, will result in 10 being printed, with a total width of 5 characters. Though not part of the width field, a leading zero is interpreted as the zero-padding flag mentioned above, and a negative value is treated as the positive value in conjunction with the left-alignment flag also mentioned above.


Precision field

The Precision field usually specifies a ''maximum'' limit on the output, depending on the particular formatting type. For floating-point numeric types, it specifies the number of digits to the right of the decimal point that the output should be rounded. For the string type, it limits the number of characters that should be output, after which the string is truncated. The precision field may be omitted, or a numeric integer value, or a dynamic value when passed as another argument when indicated by an asterisk . For example, will result in being printed.


Length field

The Length field can be omitted or be any of: Additionally, several platform-specific length options came to exist prior to widespread use of the ISO C99 extensions: ISO C99 includes the inttypes.h header file that includes a number of macros for use in platform-independent coding. These must be outside double-quotes, e.g. Example macros include:


Type field

The Type field can be any of:


Custom format placeholders

There are a few implementations of -like functions that allow extensions to the escape-character-based
mini-language A domain-specific language (DSL) is a computer language specialized to a particular application domain. This is in contrast to a general-purpose language (GPL), which is broadly applicable across domains. There are a wide variety of DSLs, ranging ...
, thus allowing the programmer to have a specific formatting function for non-builtin types. One of the most well-known is the (now deprecated)
glibc The GNU C Library, commonly known as glibc, is the GNU Project's implementation of the C standard library. Despite its name, it now also directly supports C++ (and, indirectly, other programming languages). It was started in the 1980s by ...
'

However, it is rarely used due to the fact that it conflicts with static format string checking. Another i
Vstr custom formatters
which allows adding multi-character format names. Some applications (like the Apache HTTP Server) include their own -like function, and embed extensions into it. However these all tend to have the same problems that has. The Linux kernel
printk printk is a C function from the Linux kernel interface that prints messages to the kernel log. It accepts a string parameter called the format string, which specifies a method for rendering an arbitrary number of varied data type parameter(s) int ...
function supports a number of ways to display kernel structures using the generic specification, by ''appending'' additional format characters. For example, prints an IPv4 address in dotted-decimal form. This allows static format string checking (of the portion) at the expense of full compatibility with normal printf. Most languages that have a -like function work around the lack of this feature by just using the format and converting the object to a string representation.


Vulnerabilities


Invalid conversion specifications

If there are too few
function argument In computer programming, a parameter or a formal argument is a special kind of variable used in a subroutine to refer to one of the pieces of data provided as input to the subroutine. These pieces of data are the values of the arguments (often c ...
s provided to supply values for all the conversion specifications in the template string, or if the arguments are not of the correct types, the results are undefined, may crash. Implementations are inconsistent about whether syntax errors in the string consume an argument and what type of argument they consume. Excess arguments are ignored. In a number of cases, the undefined behavior has led to " Format string attack" security
vulnerabilities Vulnerability refers to "the quality or state of being exposed to the possibility of being attacked or harmed, either physically or emotionally." A window of vulnerability (WOV) is a time frame within which defensive measures are diminished, com ...
. In most C or C++ calling conventions arguments may be passed on the stack, which means in the case of too few arguments printf will read past the end of the current stackframe, thus allowing the attacker to read the stack. Some compilers, like the GNU Compiler Collection, will statically check the format strings of printf-like functions and warn about problems (when using the flags or ). GCC will also warn about user-defined printf-style functions if the non-standard "format" is applied to the function.


Field width versus explicit delimiters in tabular output

Using only field widths to provide for tabulation, as with a format like for three integers in three 8-character columns, will not guarantee that field separation will be retained if large numbers occur in the data: 1234567 1234567 1234567 123 123 123 123 12345678123 Loss of field separation can easily lead to corrupt output. In systems which encourage the use of programs as building blocks in scripts, such corrupt data can often be forwarded into and corrupt further processing, regardless of whether the original programmer expected the output would only be read by human eyes. Such problems can be eliminated by including explicit delimiters, even spaces, in all tabular output formats. Simply changing the dangerous example from before to addresses this, formatting identically until numbers become larger, but then explicitly preventing them from becoming merged on output due to the explicitly included spaces: 1234567 1234567 1234567 123 123 123 123 12345678 123 Similar strategies apply to string data.


Memory write

Although an outputting function on the surface, allows writing to a memory location specified by an argument via . This functionality is occasionally used as a part of more elaborate format-string attacks. The functionality also makes accidentally
Turing-complete In computability theory, a system of data-manipulation rules (such as a computer's instruction set, a programming language, or a cellular automaton) is said to be Turing-complete or computationally universal if it can be used to simulate any ...
even with a well-formed set of arguments. A game of tic-tac-toe written in the format string is a winner of the 27th
IOCCC The International Obfuscated C Code Contest (abbreviated IOCCC) is a computer programming contest for the most creatively obfuscated C code. Held annually, it is described as "celebrating 'ssyntactical opaqueness". The winning code for the 27t ...
.


Programming languages with printf

Not included in this list are languages that use format strings that deviate from the style in this article (such as
AMPL AMPL (A Mathematical Programming Language) is an algebraic modeling language to describe and solve high-complexity problems for large-scale mathematical computing (i.e., large-scale optimization and scheduling-type problems). It was developed ...
and
Elixir ELIXIR (the European life-sciences Infrastructure for biological Information) is an initiative that will allow life science laboratories across Europe to share and store their research data as part of an organised network. Its goal is to bring t ...
), languages that inherit their implementation from the
JVM A Java virtual machine (JVM) is a virtual machine that enables a computer to run Java programs as well as programs written in other languages that are also compiled to Java bytecode. The JVM is detailed by a specification that formally describes ...
or other environment (such as
Clojure Clojure (, like ''closure'') is a dynamic and functional dialect of the Lisp programming language on the Java platform. Like other Lisp dialects, Clojure treats code as data and has a Lisp macro system. The current development process is comm ...
and Scala), and languages that do not have a standard native printf implementation but have external libraries which emulate printf behavior (such as
JavaScript JavaScript (), often abbreviated as JS, is a programming language that is one of the core technologies of the World Wide Web, alongside HTML and CSS. As of 2022, 98% of websites use JavaScript on the client side for webpage behavior, of ...
). *
awk AWK (''awk'') is a domain-specific language designed for text processing and typically used as a data extraction and reporting tool. Like sed and grep, it is a filter, and is a standard feature of most Unix-like operating systems. The AWK lang ...
* C **
C++ C++ (pronounced "C plus plus") is a high-level general-purpose programming language created by Danish computer scientist Bjarne Stroustrup as an extension of the C programming language, or "C with Classes". The language has expanded significan ...
(also provides overloaded shift operators and manipulators as an alternative for formatted output – see iostream and
iomanip #REDIRECT C++ Standard Library The C standard library or libc is the standard library for the C programming language, as specified in the ISO C standard. ISO/IEC (2018). '' ISO/IEC 9899:2018(E): Programming Languages - C §7'' Starting from the or ...
) **
Objective-C Objective-C is a general-purpose, object-oriented programming language that adds Smalltalk-style messaging to the C programming language. Originally developed by Brad Cox and Tom Love in the early 1980s, it was selected by NeXT for its NeXT ...
* D * F# *G (
LabVIEW Laboratory Virtual Instrument Engineering Workbench (LabVIEW) is a system-design platform and development environment for a visual programming language from National Instruments. The graphical language is named "G"; not to be confused with G-c ...
) * GNU MathProg *
GNU Octave GNU Octave is a high-level programming language primarily intended for scientific computing and numerical computation. Octave helps in solving linear and nonlinear problems numerically, and for performing other numerical experiments using a langu ...
* Go *
Haskell Haskell () is a general-purpose, statically-typed, purely functional programming language with type inference and lazy evaluation. Designed for teaching, research and industrial applications, Haskell has pioneered a number of programming lan ...
* J *
Java Java (; id, Jawa, ; jv, ꦗꦮ; su, ) is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea to the north. With a population of 151.6 million people, Java is the world's mos ...
(since version 1.5) and JVM languages *
Julia Julia is usually a feminine given name. It is a Latinate feminine form of the name Julio and Julius. (For further details on etymology, see the Wiktionary entry "Julius".) The given name ''Julia'' had been in use throughout Late Antiquity (e.g ...
(via its Printf standard library
Formatting.jl
library adds
Python Python may refer to: Snakes * Pythonidae, a family of nonvenomous snakes found in Africa, Asia, and Australia ** ''Python'' (genus), a genus of Pythonidae found in Africa and Asia * Python (mythology), a mythical serpent Computing * Python (pro ...
-style general formatting and "c-style part of this package aims to get around the limitation that @sprintf has to take a literal string argument.") *
Lua Lua or LUA may refer to: Science and technology * Lua (programming language) * Latvia University of Agriculture * Last universal ancestor, in evolution Ethnicity and language * Lua people, of Laos * Lawa people, of Thailand sometimes referred t ...
(string.format) *
Maple ''Acer'' () is a genus of trees and shrubs commonly known as maples. The genus is placed in the family Sapindaceae.Stevens, P. F. (2001 onwards). Angiosperm Phylogeny Website. Version 9, June 2008 nd more or less continuously updated since http ...
*
MATLAB MATLAB (an abbreviation of "MATrix LABoratory") is a proprietary multi-paradigm programming language and numeric computing environment developed by MathWorks. MATLAB allows matrix manipulations, plotting of functions and data, implementa ...
* Max (via the sprintf object) * Mythryl *
PARI/GP PARI/GP is a computer algebra system with the main aim of facilitating number theory computations. Versions 2.1.0 and higher are distributed under the GNU General Public License. It runs on most common operating systems. System overview The ...
*
Perl Perl is a family of two high-level, general-purpose, interpreted, dynamic programming languages. "Perl" refers to Perl 5, but from 2000 to 2019 it also referred to its redesigned "sister language", Perl 6, before the latter's name was offic ...
*
PHP PHP is a general-purpose scripting language geared toward web development. It was originally created by Danish-Canadian programmer Rasmus Lerdorf in 1993 and released in 1995. The PHP reference implementation is now produced by The PHP Group. ...
*
Python Python may refer to: Snakes * Pythonidae, a family of nonvenomous snakes found in Africa, Asia, and Australia ** ''Python'' (genus), a genus of Pythonidae found in Africa and Asia * Python (mythology), a mythical serpent Computing * Python (pro ...
(via operator) * R * Raku (via , , and ) * Red/System *
Ruby A ruby is a pinkish red to blood-red colored gemstone, a variety of the mineral corundum ( aluminium oxide). Ruby is one of the most popular traditional jewelry gems and is very durable. Other varieties of gem-quality corundum are called ...
*
Tcl TCL or Tcl or TCLs may refer to: Business * TCL Technology, a Chinese consumer electronics and appliance company **TCL Electronics, a subsidiary of TCL Technology * Texas Collegiate League, a collegiate baseball league * Trade Centre Limited ...
(via format command) *
Transact-SQL Transact-SQL (T-SQL) is Microsoft's and Sybase's proprietary extension to the SQL (Structured Query Language) used to interact with relational databases. T-SQL expands on the SQL standard to include procedural programming, local variables, vari ...
(vi
xp_sprintf
*
Vala Vala or VALA may refer to: Religion and mythology * Vala (Vedic), a demon or a stone cavern in the Hindu scriptures * Völva, also spelled Vala, a priestess in Norse mythology and Norse paganism Fiction * Vala (Middle-earth), an angelic being in ...
(via and ) *The utility command, sometimes built in to the shell, such as with some implementations of the KornShell (ksh),
Bourne again shell Bash is a Unix shell and command language written by Brian Fox for the GNU Project as a free software replacement for the Bourne shell. First released in 1989, it has been used as the default login shell for most Linux distributions. Bash was o ...
(bash), or
Z shell The Z shell (Zsh) is a Unix shell that can be used as an interactive login shell and as a command interpreter for shell scripting. Zsh is an extended Bourne shell with many improvements, including some features of Bash, ksh, and tcsh. History ...
(zsh). These commands usually interpret C escapes in the format string.


See also

* Format (Common Lisp) * C standard library * Format string attack * iostream *
ML (programming language) ML (Meta Language) is a general-purpose functional programming language. It is known for its use of the polymorphic Hindley–Milner type system, which automatically assigns the types of most expressions without requiring explicit type annot ...
*
printf debugging In computer programming and software development, debugging is the process of finding and resolving ''Software bug, bugs'' (defects or problems that prevent correct operation) within computer programs, software, or Software system, systems. Deb ...
* (Unix) *
printk printk is a C function from the Linux kernel interface that prints messages to the kernel log. It accepts a string parameter called the format string, which specifies a method for rendering an arbitrary number of varied data type parameter(s) int ...
(print kernel messages) *
scanf A scanf format string (''scan f''ormatted) is a control parameter used in various functions to specify the layout of an input string. The functions can then divide the string and translate into values of appropriate data types. String scannin ...
* string interpolation


References


External links


C++ reference for
* *Th

in Java 1.5
GNU Bash builtin
{{Unix commands Articles with example C code C standard library Unix software