scanf, short for scan formatted, is a C standard library function that reads and parses text from standard input.
The function accepts a format string parameter that specifies the layout of input text. The function parses input text and loads values into variables based on data type.
Similar functions, with other names, predate C, such as readf in ALGOL 68.
Input format strings are complementary to output format strings (see printf), which provide formatted output (templating).
History
Mike Lesk's portable input/output library, including scanf, officially became part of Unix in Version 7.
Usage
The scanf function reads input for numbers and other data types from standard input.
The following C code reads a variable number of unformatted decimal integers from standard input and prints each of them out on separate lines:
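#include <stdio.h>

int main(void)
{
    int n;

    /* scanf returns the number of items converted; stop at EOF or bad input. */
    while (scanf("%d", &n) == 1)
        printf("%d\n", n);
    return 0;
}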
For input:
456 123 789 456 12
456 1
2378
The output is:
456
123
789
456
12
456
1
2378
To print out a word:
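#include <stdio.h>

int main(void)
{
    char word[20];

    /* The width 19 leaves room for the terminating null character. */
    if (scanf("%19s", word) == 1)
        puts(word);
    return 0;
}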
No matter what data type the programmer wants the program to read, the arguments (such as &n above) must be pointers to memory. Otherwise, the function will not perform correctly, because it will attempt to overwrite the wrong sections of memory rather than the memory location of the variable that is meant to receive the input.
In the last example, the address-of operator (&) is not used for the argument: because word is the name of an array of char, it is (in all contexts in which it evaluates to an address) equivalent to a pointer to the first element of the array. While the expression &word would numerically evaluate to the same value, semantically it has an entirely different meaning, in that it stands for the address of the whole array rather than of an element of it. This fact needs to be kept in mind when storing scanf output in strings.
As scanf is designed to read only from standard input, many programming languages with interfaces, such as PHP, have derivatives such as sscanf and fscanf but not scanf itself.
Format string specifications
The formatting placeholders in scanf are more or less the same as those in printf, its reverse function. As in printf, the POSIX extension for numbered arguments (%n$) is defined.
There are rarely constants (i.e., characters that are not formatting placeholders) in a format string, mainly because a program is usually not designed to read known data, although scanf does accept these if explicitly specified. The exception is one or more whitespace characters, which discard all whitespace characters in the input.
Some of the most commonly used placeholders follow:
* %a: Scan a floating-point number in its hexadecimal notation.
* %d: Scan an integer as a signed decimal number.
* %i: Scan an integer as a signed number. Similar to %d, but interprets the number as hexadecimal when preceded by 0x and octal when preceded by 0. For example, the string 031 would be read as 31 using %d, and 25 using %i. The flag h in %hi indicates conversion to a short and hh conversion to a char.
* %u: Scan for decimal unsigned int. (Note that in the C99 standard the input value minus sign is optional, so if a minus sign is read, no errors will arise and the result will be the two's complement of a negative number, likely a very large value. See strtoul().) Correspondingly, %hu scans for an unsigned short and %hhu for an unsigned char.
* %f: Scan a floating-point number in normal (fixed-point) notation.
* %g, %G: Scan a floating-point number in either normal or exponential notation. On input the case of the specifier makes no difference: the C standard specifies that %G behaves the same as %g.
* %x, %X: Scan an integer as an unsigned hexadecimal number.
* %o: Scan an integer as an octal number.
* %s: Scan a character string. The scan terminates at whitespace. A null character is stored at the end of the string, which means that the buffer supplied must be at least one character longer than the specified input length.
* %c: Scan a character (char). No null character is added.
* whitespace: Any whitespace characters trigger a scan for zero or more whitespace characters. The number and type of whitespace characters do not need to match in either direction.
* %lf: Scan as a double floating-point number. "Float" format with the "long" specifier.
* %Lf: Scan as a long double floating-point number. "Float" format with the L ("long double") specifier.
* %n
: Nothing is expected. The number of characters consumed thus far from the input is stored through the next pointer, which must be a pointer to int. This is not a conversion and does not increase the count returned by the function.
The placeholders above can be combined with numeric modifiers and with the l and L modifiers, which stand for "long" and "long double", placed between the percent symbol and the letter. There can also be a numeric value between the percent symbol and the letter, preceding the long modifier if any, that specifies the maximum number of characters to be scanned. An optional asterisk (*) right after the percent symbol denotes that the datum read by this format specifier is not to be stored in a variable. No argument after the format string should be included for this dropped value.
The ff modifier in printf is not present in scanf, causing differences between modes of input and output. The ll and hh modifiers are not present in the C90 standard, but are present in the C99 standard.
An example of a format string is
    "%7d%s %c%lf"
The above format string scans up to the first seven characters as a decimal integer, then reads the remaining characters as a string until a space, newline, or tab is found, then consumes whitespace until the first non-whitespace character is found, then consumes that character, and finally scans the remaining characters as a double. Input may not match this layout, so a robust program must check whether the scanf call succeeded and take appropriate action. If the input was not in the correct format, the erroneous data will still be on the input stream and must be discarded before new input can be read. An alternative method, which avoids this, is to use fgets and then examine the string read in. The last step can be done by sscanf, for example.
In the case of the several floating-point conversion characters (%a, %e, %f, %g), many implementations choose to collapse most into the same parser. Microsoft MSVCRT does so with some of them, while glibc does so with all four.
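ISO C99 includes the inttypes.h header file, which defines a number of macros for use in platform-independent coding. Because they expand to string literals, these must be placed outside the double quotes of the format string, where the compiler concatenates them with the adjacent literals.
Example macros include SCNd32, SCNu32, SCNx32, and SCNd64, which scan fixed-width integer types.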
Vulnerabilities
scanf is vulnerable to format string attacks. Great care should be taken to ensure that the formatting string includes limitations for string and array sizes. In most cases the input string size from a user is arbitrary and cannot be determined before the scanf function is executed. This means that %s placeholders without length specifiers are inherently insecure and exploitable for buffer overflows. Another potential problem is to allow dynamic formatting strings, for example formatting strings stored in configuration files or other user-controlled files. In this case the allowed input length of string sizes cannot be specified unless the formatting string is checked beforehand and limitations are enforced. Related to this are additional or mismatched formatting placeholders which do not match the actual vararg list. These placeholders might be partially extracted from the stack or contain undesirable or even insecure pointers, depending on the particular implementation of varargs.
See also
* C programming language
* Format string attack
* Printf format string
* String interpolation
External links
* C++ reference for std::scanf