In computing and telecommunication, an escape character is a character that invokes an alternative interpretation on the following characters in a character sequence. An escape character is a particular kind of metacharacter. Whether a given character counts as an escape character generally depends on the context.
In the telecommunications field, escape characters are used to indicate that the following characters are encoded differently. This is used to alter control characters that would otherwise be noticed and acted on by the underlying telecommunications hardware. In this context, the use of escape characters is often referred to as quoting.
Definition
An escape character does not carry a meaning of its own, so every escape sequence consists of two or more characters.
Escape characters are part of the syntax for many programming languages, data formats, and communication protocols. For a given alphabet, an escape character's purpose is to start character sequences (called escape sequences), which have to be interpreted differently from the same characters occurring without the prefixed escape character.
The functions of escape sequences include:
* To encode a syntactic entity, such as device commands or special data, which cannot be directly represented by the alphabet.
* To represent characters, referred to as "character quoting", which cannot be typed in the current context, or would have an undesired interpretation. In this case, an escape sequence is a digraph consisting of an escape character itself and a "quoted" character.
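The two functions listed above can be sketched with a small decoder. This is a minimal illustration, not a reference implementation: the function name `decode_escapes`, its parameters, and the tiny table of named escapes are all hypothetical choices made for this example.

```python
def decode_escapes(text, escape="\\", named=None):
    """Decode two-character escape sequences (hypothetical helper).

    A character after the escape character is looked up in `named`
    (encoding something the alphabet cannot show directly); if it is
    not listed there, it is taken literally (character quoting).
    """
    if named is None:
        named = {"n": "\n", "t": "\t"}  # assumed table, for illustration only
    out = []
    chars = iter(text)
    for ch in chars:
        if ch != escape:
            out.append(ch)  # ordinary character: copy unchanged
            continue
        nxt = next(chars, None)  # the escape character consumes the next one
        if nxt is None:
            raise ValueError("dangling escape character at end of input")
        out.append(named.get(nxt, nxt))
    return "".join(out)


decoded = decode_escapes(r'say \"hi\"\n')
```

Here `decoded` holds the text `say "hi"` followed by a line feed: the quoted double quotes are taken literally, while `\n` is mapped to a character that could not be typed directly.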
Control character
Generally, an escape character is not a particular case of a (device) control character, nor vice versa. If we define control characters as non-graphic, or as having a special meaning for an output device (e.g. a printer or text terminal), then any escape character for this device is a control one. But escape characters used in programming (such as the backslash, "\") are graphic, hence are not control characters. Conversely, most (but not all) of the ASCII "control characters" have some control function in isolation, and therefore are not escape characters.
In many programming languages, an escape character also forms some escape sequences which are referred to as control characters. For example, a line break has the escape sequence \n.
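The distinction between a graphic escape character and the control character it helps to spell can be checked directly. The following is a small sketch in Python (chosen here only for illustration; the variable names are arbitrary):

```python
newline = "\n"    # two characters in the source, one character in the string
backslash = "\\"  # an escaped backslash yields a single backslash

newline_length = len(newline)                    # 1
newline_code = ord(newline)                      # 10, the ASCII line feed
newline_is_graphic = newline.isprintable()       # False: a control character
backslash_is_graphic = backslash.isprintable()   # True: a graphic character
```

The escape sequence is written with two graphic characters, yet the value it denotes is a single non-graphic control character.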
Examples
JavaScript
JavaScript uses the \ (backslash) as an escape character for:
* \' single quote
* \" double quote
* \\ backslash
* \n new line
* \r carriage return
* \t tab
* \b backspace
* \f form feed
* \v vertical tab (Internet Explorer 9 and older treats \v as v instead of a vertical tab (\x0B). If cross-browser compatibility is a concern, use \x0B instead of \v.)
* \0 null character (U+0000 NULL) (only if the next character is not a decimal digit; else it is an octal escape sequence)
* \xFF character represented by the hexadecimal byte "FF"
Note that the \v and \0 escapes are not allowed in JSON strings.
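The JSON restriction noted above (no \v and no \0 escapes) can be observed with any strict JSON parser. The sketch below uses Python's standard `json` module; the helper name `is_valid_json_string` is made up for this example.

```python
import json


def is_valid_json_string(literal):
    """Return True if `literal` parses as JSON (hypothetical helper)."""
    try:
        json.loads(literal)
        return True
    except json.JSONDecodeError:
        return False


accepts_newline = is_valid_json_string(r'"\n"')   # True: \n is a JSON escape
accepts_vtab = is_valid_json_string(r'"\v"')      # False: \v is not
accepts_null = is_valid_json_string(r'"\0"')      # False: \0 is not

# The portable spelling of a vertical tab in JSON is a \u escape.
vertical_tab = json.loads(r'"\u000b"')
```

Raw string literals (`r'...'`) are used so that the backslashes reach the JSON parser unchanged rather than being interpreted by Python first.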
Example code:
console.log("Using \\n \nWill shift the characters after \\n one row down");
console.log("Using \\t \twill shift the characters after \\t one tab length to the right");
// \r moves back to the start of the current row, so on many terminals the text after it overwrites the text before it.
// Windows text files end lines with \r\n instead of \n alone.
console.log("Using \\r \rWill imitate a carriage return, which means shifting to the start of the row");
ASCII escape character
The ASCII "escape" character (octal: \033, hexadecimal: \x1B, or ^[, or, in decimal, 27) is used in many output devices to start a series of characters called a control sequence or escape sequence. Typically, the escape character was sent first in such a sequence to alert the device that the following characters were to be interpreted as a control sequence rather than as plain characters, then one or more characters would follow to specify some detailed action, after which the device would go back to interpreting characters normally. For example, the sequence of ^[, followed by the printable characters [2;10H, would cause a DEC VT102 terminal to move its cursor to the 10th cell of the 2nd line of the screen. This was later developed into the ANSI escape codes covered by the ANSI X3.64 standard. The escape character also starts each command sequence in the Hewlett-Packard Printer Command Language.
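The cursor-positioning sequence described above can be assembled as an ordinary string. This is a minimal sketch; the constant `ESC` and the function `cursor_position` are names chosen for this example, not part of any terminal library.

```python
ESC = "\x1b"  # the ASCII escape character: octal 033, decimal 27


def cursor_position(row, column):
    """Build the sequence ESC [ row ; column H (hypothetical helper)."""
    return f"{ESC}[{row};{column}H"


# Move to the 10th cell of the 2nd line, as in the VT102 example.
sequence = cursor_position(2, 10)
```

Writing `sequence` to a terminal that understands ANSI escape codes, for instance with `print(sequence, end="")`, should move the cursor; a device that does not understand the sequence will typically show the characters after the escape as plain text.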
An early reference to the term "escape character" is found in the IBM technical publications of Bob Bemer, who is credited with inventing this mechanism during his work on the ASCII character set.
The Escape key is usually found on standard PC keyboards. However, it is commonly absent from keyboards for PDAs and other devices not designed primarily for ASCII communications. The DEC VT220 series was one of the few popular keyboards that did not have a dedicated Esc key, instead using one of the keys above the main keypad. In user interfaces of the 1970s–1980s it was not uncommon to use this key as an escape character, but on modern desktop computers this use has been dropped. Sometimes the key was labelled AltMode (for alternative mode). Even with no dedicated key, the escape character code could be generated by typing [ while simultaneously holding down Ctrl.
Programming and data formats
Many modern programming languages specify the double-quote character (") as a delimiter for a string literal. The backslash (\) escape character typically provides two ways to include double-quotes inside a string literal, either by modifying the meaning of the double-quote character embedded in the string (\" becomes "), or by modifying the meaning of a sequence of characters including the hexadecimal value of a double-quote character (\x22 becomes "). C, C++, and Ruby allow both backslash escape styles; Java supports the first and offers octal escapes (\42) in place of \x escapes. The PostScript language and Microsoft Rich Text Format also use backslash escapes. The quoted-printable encoding uses the equals sign as an escape character.
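The two backslash styles for embedding a double quote can be compared side by side. Python is used here purely as an illustration of a language that accepts both; the variable names are arbitrary.

```python
# Style 1: escape the delimiter itself.
quoted = "She said \"hi\""

# Style 2: spell the character by its hexadecimal value (0x22).
hex_quoted = "She said \x22hi\x22"

# Alternative without any escape: pick a different delimiter.
single_delimited = 'She said "hi"'

all_equal = quoted == hex_quoted == single_delimited
```

All three literals denote the same 13-character string, so `all_equal` is `True`; only the source spelling differs.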
URLs and URIs use %-escapes to quote characters with a special meaning, as well as non-ASCII characters. The ampersand (&) character may be considered an escape character in SGML and derived formats such as HTML and XML.
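Both mechanisms are available in Python's standard library, which makes for a compact illustration. The sample strings below are invented for this sketch.

```python
import html
from urllib.parse import quote, unquote

# Percent-encoding: reserved and non-ASCII characters become %XX escapes
# (non-ASCII text is encoded as UTF-8 first).
raw_value = "a b&c=é"
percent_encoded = quote(raw_value)
round_tripped = unquote(percent_encoded)

# SGML-style escaping: the ampersand introduces a character reference.
markup_text = "AT&T <b>"
entity_encoded = html.escape(markup_text)
entity_decoded = html.unescape(entity_encoded)
```

With these inputs, `percent_encoded` is `a%20b%26c%3D%C3%A9` and `entity_encoded` is `AT&amp;T &lt;b&gt;`. In both formats the escape character itself (`%` or `&`) must also be escaped when it is meant literally.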
Some programming languages also provide other ways to represent special characters in literals, without requiring an escape character (see e.g. delimiter collision).
Communication protocols
The Point-to-Point Protocol (PPP) uses the 0x7D octet (\175, or ASCII: }) as an escape character. The octet immediately following should be XORed with 0x20 before being passed to a higher level protocol. This is applied to both 0x7D itself and the control character 0x7E (which is used in PPP to mark the beginning and end of a frame) when those octets need to be transmitted by a higher level protocol encapsulated by PPP, as well as other octets negotiated when the link is established. That is, when a higher level protocol wishes to transmit 0x7D, it is transmitted as the sequence 0x7D 0x5D, and 0x7E is transmitted as 0x7D 0x5E.
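The octet-stuffing rule described above can be sketched in a few lines. This is a simplified illustration only: the function and constant names are hypothetical, and the sketch escapes just 0x7D and 0x7E, ignoring the additional octets negotiated at link setup as well as framing and checksums.

```python
FLAG = 0x7E      # marks the beginning and end of a PPP frame
ESCAPE = 0x7D    # the PPP escape octet
XOR_MASK = 0x20  # applied to the octet that follows the escape


def ppp_escape(payload: bytes) -> bytes:
    """Escape FLAG and ESCAPE octets in a payload (simplified sketch)."""
    out = bytearray()
    for octet in payload:
        if octet in (FLAG, ESCAPE):
            out.append(ESCAPE)
            out.append(octet ^ XOR_MASK)
        else:
            out.append(octet)
    return bytes(out)


def ppp_unescape(data: bytes) -> bytes:
    """Reverse ppp_escape: XOR the octet after each ESCAPE with XOR_MASK."""
    out = bytearray()
    octets = iter(data)
    for octet in octets:
        if octet == ESCAPE:
            following = next(octets, None)
            if following is None:
                raise ValueError("escape octet at end of data")
            out.append(following ^ XOR_MASK)
        else:
            out.append(octet)
    return bytes(out)


stuffed = ppp_escape(bytes([0x01, 0x7D, 0x7E]))
```

Here `stuffed` is the five-octet sequence 0x01 0x7D 0x5D 0x7D 0x5E, and `ppp_unescape(stuffed)` restores the original three octets.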
Bourne shell
In Bourne shell (sh), the asterisk (*) and question mark (?) characters are wildcard characters expanded via globbing. Without a preceding escape character, an * will expand to the names of all files in the working directory that do not start with a period if and only if there are such files, otherwise * remains unexpanded. So to refer to a file literally called "*", the shell must be told not to interpret it in this way, by preceding it with a backslash (\). This modifies the interpretation of the asterisk (*). For example, rm * removes every file matched by the wildcard, whereas rm \* removes only the file named *.
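The difference between a wildcard and its escaped, literal form can also be tried outside a shell. The sketch below uses Python's `fnmatch` and `glob` modules on an invented list of names; note that Python's glob patterns do not use the backslash for this purpose, and `glob.escape` instead wraps the special character in brackets, so this illustrates the idea rather than the Bourne shell syntax itself.

```python
import fnmatch
import glob

# Hypothetical directory listing, including a file literally named "*".
names = ["*", "notes.txt", "a.c"]

# Unescaped: the asterisk is a wildcard and matches every name.
wildcard_matches = fnmatch.filter(names, "*")

# Escaped: the asterisk is taken literally and matches only the name "*".
literal_pattern = glob.escape("*")
literal_matches = fnmatch.filter(names, literal_pattern)
```

With this list, `wildcard_matches` contains all three names, while `literal_matches` contains only `"*"`.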
Windows Command Prompt
The Windows command-line interpreter uses a caret character (^) to escape reserved characters that have special meanings (in particular: &, |, (, ), <, >, ^). The DOS command-line interpreter, though it has similar syntax, does not support this.
For example, on the Windows Command Prompt, this will result in a syntax error:
C:\>echo <hello world>
The syntax of the command is incorrect.
whereas this will output the string <hello world>:
C:\>echo ^<hello world^>
<hello world>
Windows PowerShell
In Windows, the backslash is used as a path separator; therefore, it generally cannot be used as an escape character. PowerShell uses the backtick ( ` ) instead.
For example, the following command prints a tab-indented first line followed by a second line:
PS C:\> echo "`tFirst line`nNew line"
        First line
New line
Others
* Quoted-printable, which encodes 8-bit data into 7-bit data of limited line lengths, uses the equals sign (=) as an escape character.
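Quoted-printable escaping can be observed with Python's standard `quopri` module. The sample bytes below are invented for this sketch; they contain one non-ASCII character (encoded as two UTF-8 bytes) and a literal equals sign.

```python
import quopri

# "café = coffee" as UTF-8: the é occupies the two bytes C3 A9.
original = "café = coffee".encode("utf-8")

# Bytes outside printable ASCII, and the equals sign itself, are written
# as "=" followed by two hexadecimal digits.
encoded = quopri.encodestring(original)
decoded = quopri.decodestring(encoded)
```

In `encoded`, the two bytes of the non-ASCII character appear as `=C3=A9` and the literal equals sign as `=3D`, since the escape character must itself be escaped; `decoded` equals `original`.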
See also
* AltGr key, used to type characters that are unusual for the locale of the keyboard layout
* Escape sequences in C
* Leaning toothpick syndrome
* Nested quotation
* Stropping (syntax): in some conventions a leading character (such as an apostrophe) functions as an escape character
External links
That Powerful ESCAPE Character -- Key and Sequences, by Bob Bemer