
A metacharacter is a character that has a special meaning to a computer program, such as a shell interpreter or a regular expression (regex) engine. In POSIX extended regular expressions, there are 14 metacharacters that must be escaped (preceded by a backslash (\)) in order to drop their special meaning and be treated literally inside an expression: opening and closing square brackets ([ and ]); backslash (\); caret (^); dollar sign ($); period/full stop/dot (.); vertical bar/pipe symbol (|); question mark (?); asterisk (*); plus sign (+); opening and closing curly brackets/braces ({ and }); and opening and closing parentheses (( and )). For example, to match the arithmetic expression (1+1)*3=6 with a regex, the correct regex is \(1\+1\)\*3=6; otherwise, the parentheses, plus sign, and asterisk will have special meanings.
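The arithmetic example can be tried out with a short sketch. It uses Python's re module as a stand-in: its syntax is Perl-style rather than POSIX extended, but it treats the parentheses, plus sign, and asterisk the same way. The variable names are illustrative only.

```python
import re

text = "(1+1)*3=6"

# Escaped by hand: parentheses, plus sign, and asterisk are taken literally.
escaped = r"\(1\+1\)\*3=6"
# Unescaped: ( ) form a group, + repeats the preceding "1", * repeats the group.
unescaped = "(1+1)*3=6"

escaped_matches = re.fullmatch(escaped, text) is not None
unescaped_matches = re.fullmatch(unescaped, text) is not None

# The unescaped pattern instead describes strings such as "113=6".
unescaped_matches_other = re.fullmatch(unescaped, "113=6") is not None

# re.escape() inserts the backslashes automatically.
auto_escaped_matches = re.fullmatch(re.escape(text), text) is not None

print(escaped_matches, unescaped_matches, unescaped_matches_other, auto_escaped_matches)
```

Only the escaped patterns match the literal expression; the unescaped one silently matches a different set of strings, which is why a forgotten escape can go unnoticed.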


Other examples

Some other characters may have special meaning in some environments.
* In some Unix shells, the semicolon (";") is a statement separator.
* In XML and HTML, the ampersand ("&") introduces an HTML entity. It also has special meaning in MS-DOS/Windows Command Prompt.
* In some Unix shells and MS-DOS/Windows Command Prompt, the less-than sign and greater-than sign ("<" and ">") are used for redirection and the backtick/grave accent ("`") is used for command substitution.
* In many programming languages, strings are delimited using quotes (" or '). In some cases, escape characters (and other methods) are used to avoid delimiter collision, e.g. "He said, \"Hello\"".
* In printf format strings, the percent sign ("%") is used to introduce format specifiers and must be escaped as "%%" to be interpreted literally. In SQL, the percent is used as a wildcard character.
* In SQL, the underscore ("_") is used to match any single character.
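Several of the cases listed above can be illustrated in one place. The following is a rough sketch using only Python's standard library, on these assumptions: shlex.quote stands in for shell quoting, html.escape for entity encoding, and an in-memory SQLite database for SQL. The table name and sample values are made up, and the ESCAPE clause is an addition not discussed in the list.

```python
import html
import shlex
import sqlite3

# Shell: quoting stops ";" from acting as a statement separator.
shell_arg = shlex.quote("a; b")

# HTML/XML: "&", "<" and ">" are replaced by character entity references.
html_text = html.escape("R&D <team>")

# String literal: backslash-escaped quotes avoid delimiter collision.
quoted = "He said, \"Hello\""

# printf-style formatting: "%%" produces a literal percent sign.
percent = "%d%%" % 50

# SQL LIKE: "_" matches exactly one character unless it is escaped.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT)")  # hypothetical table
conn.executemany("INSERT INTO items VALUES (?)", [("a_c",), ("abc",)])

wildcard_rows = [row[0] for row in conn.execute(
    "SELECT name FROM items WHERE name LIKE 'a_c' ORDER BY name")]
# ESCAPE names the character that makes the following "_" or "%" literal.
literal_rows = [row[0] for row in conn.execute(
    r"SELECT name FROM items WHERE name LIKE 'a\_c' ESCAPE '\' ORDER BY name")]
conn.close()

print(shell_arg, html_text, quoted, percent, wildcard_rows, literal_rows)
```

Each environment has its own escaping rule, so a value is normally escaped for the one context it is about to enter rather than once for all of them.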


Escaping

The term "to escape a metacharacter" means to make the metacharacter ineffective (to strip it of its special meaning), causing it to have its literal meaning. For example, in PCRE, a dot (".") stands for any single character. The regular expression "A.C" will match "ABC", "A3C", or even "A C". However, if the "." is escaped, it will lose its meaning as a metacharacter and will be interpreted literally as ".", causing the regular expression "A\.C" to only match the string "A.C". The usual way to escape a character in a regex and elsewhere is by prefixing it with a backslash ("\"). Other environments may employ different methods, like MS-DOS/Windows Command Prompt, where a caret ("^") is used instead.
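The dot example can be checked with a small sketch. It assumes Python's re module rather than PCRE itself; the two behave alike for this pattern. The names dot_any, dot_literal, and candidates are illustrative.

```python
import re

# Unescaped dot: matches any single character (except a newline by default).
dot_any = re.compile(r"A.C")
# Escaped dot: matches only a literal ".".
dot_literal = re.compile(r"A\.C")

candidates = ["ABC", "A3C", "A C", "A.C"]

any_hits = [s for s in candidates if dot_any.fullmatch(s)]
literal_hits = [s for s in candidates if dot_literal.fullmatch(s)]

print(any_hits)      # ['ABC', 'A3C', 'A C', 'A.C']
print(literal_hits)  # ['A.C']
```

Note that the unescaped pattern also matches "A.C", since a literal dot is itself a single character.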


See also

* Markup language

