The null character (also null terminator) is a control character with the value zero. It is present in many character sets, including those defined by the Baudot and ITA2 codes, ISO/IEC 646 (or ASCII), the C0 control codes, the Universal Coded Character Set (or Unicode), and EBCDIC. It is available in nearly all mainstream programming languages. It is often abbreviated as NUL (or NULL, though in some contexts that term is used for the null pointer). In 8-bit codes, it is known as a null byte.
Originally the character behaved like a NOP: when sent to a printer or a terminal, it has no effect (some terminals, however, incorrectly display it as a space). When electromechanical teleprinters were used as computer output devices, one or more null characters were sent at the end of each printed line to allow time for the mechanism to return to the first printing position on the next line. On punched tape, the character is represented with no holes at all, so new unpunched tape is initially filled with null characters, and text could often be inserted into a reserved space of null characters by punching the new characters into the tape over the nulls.
Today the character has much more significance in the programming language C and its derivatives and in many data formats, where it serves as a reserved character that marks the end of a string, often called a null-terminated string. This allows a string to be any length with an overhead of only one byte; storing a count instead either limits the string to 255 characters (with a one-byte count) or costs more than one byte of overhead. Other advantages and disadvantages are described in the null-terminated string article.
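As a rough illustration in C, the sketch below shows that a string literal occupies one byte more than its visible text, and that the length is recovered only by scanning for the zero byte. The helper `nul_terminated_length` is a hypothetical name for a hand-written equivalent of the standard `strlen`; the arrays are illustrative.

```c
#include <stddef.h>
#include <string.h>

/* The compiler appends a NUL to every string literal, so this array
   holds 6 bytes: 'h' 'e' 'l' 'l' 'o' '\0'. */
const char greeting[] = "hello";

/* An embedded NUL: the array holds 8 bytes ('a' 'b' 'c' '\0' 'd' 'e' 'f'
   '\0'), but anything that relies on the terminator sees only "abc". */
const char embedded[] = "abc\0def";

/* Hypothetical hand-written strlen: no length is stored anywhere, so the
   only way to find the end is to walk the bytes until the first zero. */
size_t nul_terminated_length(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}
```

Because the scan stops at the first zero byte, `strlen(embedded)` and `nul_terminated_length(embedded)` both report 3, even though `sizeof embedded` is 8: data after an embedded null is invisible to code that relies on the terminator.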
Representation
The null character is often represented as the escape sequence \0 in source code, string literals, or character constants. [Kernighan and Ritchie, C, p. 38] In many languages (such as C, which introduced this notation), this is not a separate escape sequence but an octal escape sequence with the single octal digit 0; as a consequence, \0 must not be followed by any of the digits 0 through 7, otherwise it is interpreted as the start of a longer octal escape sequence. Other escape sequences found in various languages are \000, \x00, \z, and \u0000. A null character can be placed in a URL with the percent code %00.
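The octal pitfall can be seen in a short C sketch. It uses only forms that C itself accepts (`\0`, `\000`, `\x00`); the variable names are illustrative. Each array's `sizeof` includes the terminator the compiler appends.

```c
/* "\0" is an octal escape with one digit: the array holds the NUL
   character plus the compiler-added terminator, 2 bytes in total. */
const char nul_octal[]  = "\0";
const char nul_octal3[] = "\000";   /* three-digit octal form */
const char nul_hex[]    = "\x00";   /* hexadecimal form */

/* Pitfall: digits 0-7 directly after \0 extend the octal escape.
   "\012" is ONE character with value 10 (line feed), not a NUL
   followed by the characters '1' and '2'. */
const char not_nul[] = "\012";      /* 2 bytes: '\n', terminator */

/* To get a NUL followed by digits, end the escape by splitting the
   literal; the compiler concatenates adjacent string literals after
   escape sequences have been converted. */
const char nul_then_digits[] = "\0" "12";  /* 4 bytes: '\0' '1' '2' '\0' */
```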
The ability to represent a null character does not mean the resulting string will always be interpreted correctly, as many programs treat the null as the end of the string. Thus the ability to type it (in the case of unchecked user input) creates a vulnerability known as null byte injection, which can lead to security exploits. [Null Byte Injection, WASC Threat Classification, Null Byte Attack section]
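A hedged sketch in C of how the mismatch arises: a length-aware check inspects the whole decoded buffer, while C string functions stop at the first null. The functions `percent_decode`, `ends_with_jpg` and `contains_nul` and the file names are hypothetical, chosen only to illustrate the pattern; real input validation is more involved.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: decode %XX escapes from `in` into `out`, which must
   have room for strlen(in) + 1 bytes. Returns the number of decoded bytes.
   A "%00" becomes a real zero byte in the middle of the buffer. */
size_t percent_decode(const char *in, char *out)
{
    size_t n = 0;
    for (size_t i = 0; in[i] != '\0'; i++) {
        if (in[i] == '%' && isxdigit((unsigned char)in[i + 1])
                         && isxdigit((unsigned char)in[i + 2])) {
            char hex[3] = { in[i + 1], in[i + 2], '\0' };
            out[n++] = (char)strtol(hex, NULL, 16);
            i += 2;                      /* skip the two hex digits */
        } else {
            out[n++] = in[i];
        }
    }
    out[n] = '\0';                       /* terminator after the real end */
    return n;
}

/* Hypothetical length-aware check, as code using counted strings might
   perform it: it inspects the last four bytes of the whole buffer. */
int ends_with_jpg(const char *buf, size_t len)
{
    return len >= 4 && memcmp(buf + len - 4, ".jpg", 4) == 0;
}

/* Hypothetical defence: report whether any byte of the counted buffer is
   zero, so such input can be rejected before it reaches C string APIs. */
int contains_nul(const char *buf, size_t len)
{
    return memchr(buf, '\0', len) != NULL;
}
```

With the input `secret.txt%00.jpg`, the decoded buffer is 15 bytes long and passes the extension check, yet `strlen` reports 10, so a C API handed the buffer would see only `secret.txt`. Rejecting any input for which `contains_nul` is true is one simple defence.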
In caret notation the null character is ^@. On some keyboards, one can enter a null character by holding down Ctrl and pressing @ (on US layouts just Ctrl+2 will often work, there being no need for Shift to get the @ sign).
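The caret form can be computed by flipping bit 0x40 of the control code, which maps 0x00 to `@` (0x40), 0x01 to `A`, and DEL (0x7F) to `?`. The sketch below assumes ASCII; `caret_letter` and `ctrl_code` are hypothetical names, and `ctrl_code` models a common but terminal-dependent convention in which Ctrl with a key from `@` to `_` (or a lowercase letter) keeps only the low five bits of the key's code.

```c
/* Caret notation, assuming ASCII: a control code c (0x00-0x1F, or DEL
   0x7F) is written as '^' followed by the character c ^ 0x40.
   Flipping bit 0x40 maps 0x00 -> '@', 0x01 -> 'A', 0x1A -> 'Z',
   and 0x7F -> '?'. Returns 0 for bytes that are not control codes. */
char caret_letter(unsigned char c)
{
    if (c < 0x20 || c == 0x7F)
        return (char)(c ^ 0x40);
    return 0;
}

/* Hypothetical model of the Ctrl-key convention: keep only the low five
   bits of the key's code. For '@' (0x40) the result is 0, i.e. NUL. */
unsigned char ctrl_code(char key)
{
    return (unsigned char)(key & 0x1F);
}
```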
In documentation, the null character is sometimes represented as a single-em-width symbol containing the letters "NUL". Unicode has a character whose glyph visually represents the null character, the symbol for null, U+2400 (␀), which is not to be confused with the actual null character, U+0000.
The hexadecimal notation for null is 00, and decoding the Base64 string AA (AA== with padding) also yields the null character.
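To see why `AA==` decodes to a zero byte: each Base64 digit carries 6 bits and `A` is digit 0, so `AA` supplies 12 zero bits, of which the first 8 form one output byte, and the `==` padding marks the rest as unused. The sketch decodes a single four-character group in C; `b64_value` and `b64_decode_group` are hypothetical names, and the sketch ignores whitespace and multi-group input.

```c
#include <stddef.h>
#include <string.h>

/* Value of one digit in the standard Base64 alphabet ('A' is 0),
   or -1 if the character is not a Base64 digit. */
static int b64_value(char c)
{
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    /* strchr would match the terminator for c == '\0', so guard it. */
    const char *p = (c != '\0') ? strchr(alphabet, c) : NULL;
    return p ? (int)(p - alphabet) : -1;
}

/* Decode one 4-character group (with optional '=' padding) into out[0..2].
   Returns the number of bytes produced, or -1 on malformed input. */
int b64_decode_group(const char g[4], unsigned char out[3])
{
    int v0 = b64_value(g[0]), v1 = b64_value(g[1]);
    if (v0 < 0 || v1 < 0)
        return -1;
    /* Pack the 6-bit digits into a 24-bit value. */
    unsigned long bits = ((unsigned long)v0 << 18) | ((unsigned long)v1 << 12);
    int n = 1;                          /* two digits give one full byte */
    if (g[2] != '=') {
        int v2 = b64_value(g[2]);
        if (v2 < 0)
            return -1;
        bits |= (unsigned long)v2 << 6;
        n = 2;
        if (g[3] != '=') {
            int v3 = b64_value(g[3]);
            if (v3 < 0)
                return -1;
            bits |= (unsigned long)v3;
            n = 3;
        }
    } else if (g[3] != '=') {
        return -1;                      /* "xx=y" is not valid padding */
    }
    out[0] = (unsigned char)(bits >> 16);
    out[1] = (unsigned char)(bits >> 8);
    out[2] = (unsigned char)bits;
    return n;
}
```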
In the Windows operating system, the character can be typed in some GUI applications, such as WordPad, by typing 2400 followed immediately by the key combination.
Encoding
In all modern character sets, the null character has a code point value of zero. In most encodings, this is translated to a single code unit with a zero value. For instance, in UTF-8 it is a single zero byte. However, in Modified UTF-8 the null character is encoded as two bytes: 0xC0, 0x80. This allows the byte with the value of zero, which is now not used for any character, to be used as a string terminator.
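A sketch in C of an encoder for code points up to U+FFFF that follows ordinary UTF-8 except for the special case of U+0000. The function name `mutf8_encode_bmp` is hypothetical, and code points above U+FFFF (which Modified UTF-8 writes differently from standard UTF-8) are deliberately rejected rather than handled. Decoding 0xC0 0x80 with the normal two-byte rule gives ((0xC0 & 0x1F) << 6) | (0x80 & 0x3F) = 0, so the pair still means U+0000, although strict standard UTF-8 decoders reject it as an overlong form.

```c
#include <stddef.h>

/* Encode one code point up to U+FFFF as Modified UTF-8 into out[0..2].
   Returns the number of bytes written, or 0 for code points above U+FFFF,
   which this sketch does not handle. No output byte is ever zero. */
size_t mutf8_encode_bmp(unsigned cp, unsigned char out[3])
{
    if (cp == 0) {                      /* the special case: C0 80, not 00 */
        out[0] = 0xC0;
        out[1] = 0x80;
        return 2;
    }
    if (cp < 0x80) {                    /* one byte: 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                   /* two bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp <= 0xFFFF) {                 /* three bytes: 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    return 0;                           /* outside the scope of this sketch */
}
```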
References
External links
* Null Byte Injection: WASC Threat Classification Null Byte Attack section
* Poison Null Byte Introduction
* Introduction to Null Byte Attack
* Apple null byte injection QR code vulnerability