HOME

TheInfoList



OR:

A metacharacter is a character that has a special meaning to a computer program, such as a shell interpreter or a
regular expression A regular expression (shortened as regex or regexp), sometimes referred to as rational expression, is a sequence of characters that specifies a match pattern in text. Usually such patterns are used by string-searching algorithms for "find" ...
(regex) engine. In
POSIX The Portable Operating System Interface (POSIX; ) is a family of standards specified by the IEEE Computer Society for maintaining compatibility between operating systems. POSIX defines application programming interfaces (APIs), along with comm ...
extended regular expressions, there are 14 metacharacters that must be ''escaped'' — preceded by a backslash (\) — in order to drop their special meaning and be treated literally inside an expression: opening and closing square brackets ( /code> and /code>); backslash (\); caret (^); dollar sign ($); period/full stop/dot (.); vertical bar/pipe symbol (, ); question mark (?); asterisk (*); plus and minus signs (+ and -); opening and closing curly brackets/braces (); and opening and closing parentheses (( and )). For example, to match the arithmetic expression (1+1)*3=6 with a regex, the correct regex is \(1\+1\)\*3=6; otherwise, the parentheses, plus sign, and asterisk will have special meanings.


Other examples

Some other characters may have special meaning in some environments. * In some
Unix shell A Unix shell is a Command-line_interface#Command-line_interpreter, command-line interpreter or shell (computing), shell that provides a command line user interface for Unix-like operating systems. The shell is both an interactive command languag ...
s the
semicolon The semicolon (or semi-colon) is a symbol commonly used as orthographic punctuation. In the English language, a semicolon is most commonly used to link (in a single sentence) two independent clauses that are closely related in thought, such as ...
(";") is a statement separator. * In
XML Extensible Markup Language (XML) is a markup language and file format for storing, transmitting, and reconstructing data. It defines a set of rules for encoding electronic document, documents in a format that is both human-readable and Machine-r ...
and
HTML Hypertext Markup Language (HTML) is the standard markup language for documents designed to be displayed in a web browser. It defines the content and structure of web content. It is often assisted by technologies such as Cascading Style Sheets ( ...
, the
ampersand The ampersand, also known as the and sign, is the logogram , representing the grammatical conjunction, conjunction "and". It originated as a typographic ligature, ligature of the letters of the word (Latin for "and"). Etymology Tradi ...
("&") introduces an
HTML entity In SGML, HTML and XML documents, the logical constructs known as ''character data'' and ''attribute values'' consist of sequences of characters, in which each character can manifest directly (representing itself), or can be represented by a series ...
. It also has special meaning in
MS-DOS MS-DOS ( ; acronym for Microsoft Disk Operating System, also known as Microsoft DOS) is an operating system for x86-based personal computers mostly developed by Microsoft. Collectively, MS-DOS, its rebranding as IBM PC DOS, and a few op ...
/
Windows Command Prompt cmd.exe, a.k.a. Command Prompt, is a shell (computing), shell computer program, program on later versions of Windows (Windows NT, NT and Windows CE, CE families), OS/2,, eComStation, ArcaOS, and ReactOS. In some versions of Windows (Windows C ...
. * In some Unix shells and MS-DOS/Windows Command Prompt, the
less-than sign The less-than sign is a mathematical symbol that denotes an inequality between two values. The widely adopted form of two equal-length strokes connecting in an acute angle at the left, , has been found in documents dated as far back as the 1560 ...
and
greater-than sign The greater-than sign is a mathematical symbol that denotes an inequality between two values. The widely adopted form of two equal-length strokes connecting in an acute angle at the right, , has been found in documents dated as far back as 1631 ...
("<" and ">") are used for redirection and the
backtick The backtick is a typographical mark used mainly in computing. It is also known as backquote, grave, or grave accent. The character was designed for typewriters to add a grave accent to a (lower-case) base letter, by overtyping it atop that let ...
/grave accent ("`") is used for command substitution. * In many
programming language A programming language is a system of notation for writing computer programs. Programming languages are described in terms of their Syntax (programming languages), syntax (form) and semantics (computer science), semantics (meaning), usually def ...
s,
string String or strings may refer to: *String (structure), a long flexible structure made from threads twisted together, which is used to tie, bind, or hang other objects Arts, entertainment, and media Films * ''Strings'' (1991 film), a Canadian anim ...
s are delimited using quotes (" or '). In some cases,
escape character In computing and telecommunications, an escape character is a character that invokes an alternative interpretation on the following characters in a character sequence. An escape character is a particular case of metacharacters. Generally, the ...
s (and other methods) are used to avoid
delimiter collision A delimiter is a sequence of one or more characters for specifying the boundary between separate, independent regions in plain text, mathematical expressions or other data streams. An example of a delimiter is the comma character, which acts ...
, e.g. "He said, \"Hello\"". * In printf format strings, the
percent sign The percent sign (sometimes per cent sign in British English) is the symbol used to indicate a percentage, a number or ratio as a fraction (mathematics), fraction of 100. Related signs include the permille (per thousand) sign and the Basis p ...
("%") is used to introduce format specifiers and must be escaped as "%%" to be interpreted literally. In
SQL Structured Query Language (SQL) (pronounced ''S-Q-L''; or alternatively as "sequel") is a domain-specific language used to manage data, especially in a relational database management system (RDBMS). It is particularly useful in handling s ...
, the percent is used as a
wildcard character In software, a wildcard character is a kind of placeholder represented by a single character (computing), character, such as an asterisk (), which can be interpreted as a number of literal characters or an empty string. It is often used in file ...
. * In SQL, the
underscore An underscore or underline is a line drawn under a segment of text. In proofreading, underscoring is a convention that says "set this text in italic type", traditionally used on manuscript or typescript as an instruction to the printer. Its ...
("_") is used to match any single character.


Escaping

The term "to escape a metacharacter" means to make the metacharacter ineffective (to strip it of its special meaning), causing it to have its literal meaning. For example, in PCRE, a dot (".") stands for any single character. The regular expression "A.C" will match "ABC", "A3C", or even "A C". However, if the "." is escaped, it will lose its meaning as a metacharacter and will be interpreted literally as ".", causing the regular expression "A\.C" to only match the string "A.C". The usual way to escape a character in a regex and elsewhere is by prefixing it with a backslash ("\"). Other environments may employ different methods, like MS-DOS/Windows Command Prompt, where a caret ("^") is used instead.


See also

*
Markup language A markup language is a Encoding, text-encoding system which specifies the structure and formatting of a document and potentially the relationships among its parts. Markup can control the display of a document or enrich its content to facilitate au ...


References

Formal languages Pattern matching Programming language topics {{Prog-lang-stub