HOME

TheInfoList



OR:

A metacharacter is a character that has a special meaning to a computer program, such as a shell interpreter or a regular expression (regex) engine. In POSIX extended regular expressions, there are 14 metacharacters that must be ''escaped'' (preceded by a backslash (\)) in order to drop their special meaning and be treated literally inside an expression: opening and closing square brackets ( /code> and /code>); backslash (\); caret (^); dollar sign ($); period/full stop/dot (.); vertical bar/pipe symbol (, ); question mark (?); asterisk (*); plus and minus signs (+ and -); opening and closing curly brackets/braces (); and opening and closing parentheses (( and )). For example, to match the arithmetic expression (1+1)*3=6 with a regex, the correct regex is \(1\+1\)\*3=6; otherwise, the parentheses, plus sign, and asterisk will have special meanings.


Other examples

Some other characters may have special meaning in some environments. * In some
Unix shell A Unix shell is a command-line interpreter or shell that provides a command line user interface for Unix-like operating systems. The shell is both an interactive command language and a scripting language, and is used by the operating syste ...
s the
semicolon The semicolon or semi-colon is a symbol commonly used as orthographic punctuation. In the English language, a semicolon is most commonly used to link (in a single sentence) two independent clauses that are closely related in thought. When a ...
(";") is a statement separator. * In
XML Extensible Markup Language (XML) is a markup language and file format for storing, transmitting, and reconstructing arbitrary data. It defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. T ...
and
HTML The HyperText Markup Language or HTML is the standard markup language for documents designed to be displayed in a web browser. It can be assisted by technologies such as Cascading Style Sheets (CSS) and scripting languages such as JavaS ...
, the
ampersand The ampersand, also known as the and sign, is the logogram , representing the conjunction "and". It originated as a ligature of the letters ''et''—Latin for "and". Etymology Traditionally in English, when spelling aloud, any letter tha ...
("&") introduces an HTML entity. It also has special meaning in
MS-DOS MS-DOS ( ; acronym for Microsoft Disk Operating System, also known as Microsoft DOS) is an operating system for x86-based personal computers mostly developed by Microsoft. Collectively, MS-DOS, its rebranding as IBM PC DOS, and a few o ...
/
Windows Command Prompt Command Prompt, also known as cmd.exe or cmd, is the default command-line interpreter for the OS/2, eComStation, ArcaOS, Microsoft Windows (Windows NT family and Windows CE family), and ReactOS operating systems. On Windows CE .NET 4.2, Win ...
. * In some Unix shells and MS-DOS/Windows Command Prompt, the
less-than sign The less-than sign is a mathematical symbol that denotes an inequality between two values. The widely adopted form of two equal-length strokes connecting in an acute angle at the left, , has been found in documents dated as far back as the 1560 ...
and greater-than sign ("<" and ">") are used for redirection and the
backtick The backtick is a typographical mark used mainly in computing. It is also known as backquote, grave, or grave accent. The character was designed for typewriters to add a grave accent to a (lower-case) base letter, by overtyping it atop that le ...
/grave accent ("`") is used for command substitution. * In many
programming language A programming language is a system of notation for writing computer programs. Most programming languages are text-based formal languages, but they may also be graphical. They are a kind of computer language. The description of a programming ...
s, strings are delimited using quotes (" or '). In some cases,
escape character In computing and telecommunication, an escape character is a character (computing), character that invokes an alternative interpretation on the following characters in a character sequence. An escape character is a particular case of metacharac ...
s (and other methods) are used to avoid
delimiter collision A delimiter is a sequence of one or more characters for specifying the boundary between separate, independent regions in plain text, mathematical expressions or other data streams. An example of a delimiter is the comma character, which acts as ...
, e.g. "He said, \"Hello\"". * In printf format strings, the
percent sign The percent sign (sometimes per cent sign in British English) is the symbol used to indicate a percentage, a number or ratio as a fraction of 100. Related signs include the permille (per thousand) sign and the permyriad (per ten thousand) ...
("%") is used to introduce format specifiers and must be escaped as "%%" to be interpreted literally. In SQL, the percent is used as a wildcard character. * In SQL, the
underscore An underscore, ; also called an underline, low line, or low dash; is a line drawn under a segment of text. In proofreading, underscoring is a convention that says "set this text in italic type", traditionally used on manuscript or typescript ...
("_") is used to match any single character.


Escaping

The term "to escape a metacharacter" means to make the metacharacter ineffective (to strip it of its special meaning), causing it to have its literal meaning. For example, in PCRE, a dot (".") stands for any single character. The regular expression "A.C" will match "ABC", "A3C", or even "A C". However, if the "." is escaped, it will lose its meaning as a metacharacter and will be interpreted literally as ".", causing the regular expression "A\.C" to only match the string "A.C". The usual way to escape a character in a regex and elsewhere is by prefixing it with a backslash ("\"). Other environments may employ different methods, like MS-DOS/Windows Command Prompt, where a caret ("^") is used instead.


See also

*
Markup language Markup language refers to a text-encoding system consisting of a set of symbols inserted in a text document to control its structure, formatting, or the relationship between its parts. Markup is often used to control the display of the document ...


References

Formal languages Pattern matching Programming language topics {{Prog-lang-stub