In computer programming, digraphs and trigraphs are sequences of two and three characters, respectively, that appear in source code and, according to a programming language's specification, should be treated as if they were single characters.

Trigraphs have been removed from the C++ language and from C as of C23, so they are likely little used in practice in C, or in any other mainstream language (their use in the language J is an exception). In the modern world of Unicode and UTF-8 (or even just ASCII), language design has no need for trigraphs, which were considered a burden, and little need for digraphs, which likely have very few users, at least in those languages.

Various reasons exist for using digraphs and trigraphs: keyboards may not have keys to cover the entire character set of the language, input of special characters may be difficult, text editors may reserve some characters for special use, and so on. Trigraphs might also be used for some EBCDIC code pages that lack characters such as { and }.

History

The basic character set of the C programming language is a subset of the ASCII character set that includes nine characters which lie outside the ISO 646 invariant character set. This can pose a problem for writing source code when the encoding (and possibly keyboard) being used does not support any of these nine characters. The ANSI C committee invented trigraphs as a way of entering source code using keyboards that support any version of the ISO 646 character set.

Implementations

Trigraphs are not commonly encountered outside compiler test suites. Some compilers support an option to turn recognition of trigraphs off, or disable trigraphs by default and require an option to turn them on. Some can issue warnings when they encounter trigraphs in source files. Borland supplied a separate program, the trigraph preprocessor (TRIGRAPH.EXE), to be used only when trigraph processing is desired (the rationale was to maximise speed of compilation).

Language support

Different systems define different sets of digraphs and trigraphs, as described below.

ALGOL

Early versions of ALGOL predated the standardized ASCII and EBCDIC character sets, and were typically implemented using a manufacturer-specific six-bit character code. A number of ALGOL operations either lacked codepoints in the available character set or were not supported by peripherals, leading to a number of substitutions including := for ← (assignment) and >= for ≥ (greater than or equal).

Pascal

The Pascal programming language supports the digraphs (., .), (* and *) for [, ], { and } respectively. Unlike all other cases mentioned here, (* and *) were and still are in wide use. However, many compilers treat them as a different type of comment delimiter rather than as actual digraphs; that is, a comment started with (* cannot be closed with } and vice versa.

J

The J programming language is a descendant of APL but uses the ASCII character set rather than APL symbols. Because the printable range of ASCII is smaller than APL's specialized set of symbols, . (dot) and : (colon) characters are used to inflect ASCII symbols, effectively interpreting unigraphs, digraphs or rarely trigraphs as standalone "symbols".

Unlike the use of digraphs and trigraphs in C and C++, there are no single-character equivalents to these in J.

C

The C preprocessor (used for C and with slight differences in C++; see below) replaces all occurrences of the nine trigraph sequences ??=, ??/, ??', ??(, ??), ??!, ??<, ??> and ??- by their single-character equivalents #, \, ^, [, ], |, {, } and ~ before any other processing (until C23).
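The replacement step described above can be sketched as a small C routine. This is only an illustration of the idea, not how any particular compiler implements it; the function name `replace_trigraphs` and the two lookup strings are assumptions introduced here.

```c
#include <stddef.h>
#include <string.h>

/* Third characters of the nine trigraphs and the characters they stand for.
 * Each trigraph is two question marks followed by one character of
 * TRIGRAPH_THIRD; its replacement sits at the same index in TRIGRAPH_REPL.
 * (Illustrative names, not taken from any compiler.) */
static const char TRIGRAPH_THIRD[] = "=/'()!<>-";
static const char TRIGRAPH_REPL[]  = "#\\^[]|{}~";

/* Hypothetical helper: copy src to dst, replacing every trigraph sequence
 * with its single-character equivalent. dst must have room for
 * strlen(src) + 1 bytes, since the output never grows.
 * Returns the length of the output string. */
size_t replace_trigraphs(const char *src, char *dst)
{
    size_t n = 0;
    while (*src != '\0') {
        if (src[0] == '?' && src[1] == '?' && src[2] != '\0') {
            const char *hit = strchr(TRIGRAPH_THIRD, src[2]);
            if (hit != NULL) {
                /* Found a trigraph: emit one character, skip three. */
                dst[n++] = TRIGRAPH_REPL[hit - TRIGRAPH_THIRD];
                src += 3;
                continue;
            }
        }
        /* Not the start of a trigraph: copy the character unchanged. */
        dst[n++] = *src++;
    }
    dst[n] = '\0';
    return n;
}
```

The scan runs strictly left to right and knows nothing about tokens, strings or comments, which is why trigraphs are replaced even inside string literals and comments.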

A programmer may want to place two question marks together yet not have the compiler treat them as introducing a trigraph. The C grammar does not permit two consecutive ? tokens, so the only places in a C file where two question marks in a row may be used are in multi-character constants, string literals, and comments. This is particularly a problem for the classic Mac OS, where the constant '????' may be used as a file type or creator. To safely place two consecutive question marks within a string literal, the programmer can use string concatenation "...?""?..." or an escape sequence "...?\?...".

??? is not itself a trigraph sequence, but when followed by a character such as - it will be interpreted as ? + ??-. The same applies to a line that has 16 ?s before a final /: the last two ?s and the / form the trigraph ??/.
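The two string-literal workarounds mentioned above can be written out as follows. This is a minimal sketch; the variable names are illustrative. Written naively as "What??!", the same text would be read as "What|" by a compiler that performs trigraph replacement.

```c
/* Both arrays hold the text: What, two question marks, exclamation mark.
 * Neither source spelling contains two adjacent question marks, so neither
 * can be mistaken for a trigraph. */

/* Adjacent string literals are joined only after trigraph replacement. */
static const char via_concat[] = "What?" "?!";

/* The escape sequence \? denotes a plain question mark. */
static const char via_escape[] = "What?\?!";
```

Both spellings compile to identical strings whether or not the compiler has trigraph processing enabled.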

The ??/ trigraph can be used to introduce an escaped newline for line splicing; this must be taken into account for correct and efficient handling of trigraphs within the preprocessor. It can also cause surprises, particularly within comments. For example, a // comment (used in C++ and C99) whose line ends in ??/ is spliced with the following source line, so the two lines form a single logical comment line and any statement on the second line is silently commented out. Similarly, a block comment remains correctly formed when ??/ followed by a newline is placed between the / and * of its opening and closing delimiters. The concept can be used to check for trigraphs in C99: when a // comment ending in ??/ is followed by two return statements on separate lines, only one return statement will be executed.
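A check of this kind might look like the sketch below. The function name is an assumption, and the result depends on how the file is compiled (GCC, for instance, only performs the replacement in strict standard modes before C23 or with its -trigraphs option, and may print a warning otherwise).

```c
/* Returns 1 if the compiler performed trigraph replacement on this file,
 * and 0 otherwise. Needs C99 or later because of the line comment. */
int trigraphs_available(void)
{
    // are trigraphs available??/
    return 0;
    return 1;
}
```

With trigraphs enabled, the comment line ends in a backslash after replacement, so the first return statement is spliced into the comment and the function returns 1. Without them, the first return statement runs and the second is unreachable.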





In 1994, a normative amendment to the C standard, included in C99, supplied digraphs as more readable alternatives to five of the trigraphs.

Unlike trigraphs, digraphs are handled during tokenization, and any digraph must always represent a full token by itself, or compose the token %:%: replacing the preprocessor concatenation token ##. If a digraph sequence occurs inside another token, for example a quoted string or a character constant, it will not be replaced.
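As an illustration of the token-level behaviour, the sketch below spells the square brackets, braces and one # with the five digraphs <:, :>, <%, %> and %:. The function and variable names are made up for the example.

```c
/* %: is the digraph spelling of #, so this line is an ordinary #define. */
%:define COUNT 3

/* Sums a small array; every square bracket and brace in this function
 * is spelled as a digraph. */
int sum_with_digraphs(void)
<%
    int values<:COUNT:> = <% 1, 2, 3 %>;
    int total = 0;
    for (int i = 0; i < COUNT; i++) <%
        total += values<:i:>;
    %>
    return total;
%>

/* Inside a string literal the same character pairs are left untouched. */
static const char digraph_text<: :> = "<: and :>";
```

Because digraphs are complete tokens rather than textual substitutions, the string keeps its contents exactly as written, whereas a trigraph in the same position would have been replaced.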

C++

C++ (through C++14, see below) behaves like C, including the C99 additions, but with additional alternative tokens: compl, not, bitand, bitor, and, or, xor, and_eq, or_eq, xor_eq and not_eq, which stand for ~, !, &, |, &&, ||, ^, &=, |=, ^= and != respectively.

As a note, %:%: is treated as a single token, rather than two occurrences of %:.

In the sequence <:: if the subsequent character is neither : nor >, the < is treated as a preprocessing token by itself and not as the first character of the alternative token <:. This is done so certain uses of templates are not broken by the substitution.

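The C++ Standard comments that the term "digraph" (a token consisting of two characters) is not perfectly descriptive, since one of the alternative tokens is %:%: and several primary tokens also contain two characters.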


Trigraphs were proposed for deprecation in C++0x, which was released as C++11. This was opposed by IBM, speaking on behalf of itself and other users of C++, and as a result trigraphs were retained in C++11. Trigraphs were then proposed again for removal (not only deprecation) in C++17. This passed a committee vote, and trigraphs (but not the additional tokens) are removed from C++17 despite the opposition from IBM. Existing code that uses trigraphs can be supported by translating from the source files (parsing trigraphs) to the basic source character set that does not include trigraphs.

RPL

Hewlett-Packard calculators supporting the RPL language and input method provide support for a large number of trigraphs (also called TIO codes) to reliably transcribe non-seven-bit ASCII characters of the calculators' extended character set on foreign platforms, and to ease keyboard input without using the CHARS application. The first character of all TIO codes is a \, followed by two other ASCII characters vaguely resembling the glyph to be substituted. All other characters can be entered using the special \nnn TIO code syntax, with nnn being a three-digit decimal number (with leading zeros if necessary) of the corresponding code point (thereby formally representing a tetragraph).
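The numeric \nnn form can be produced mechanically. The following sketch assumes that the calculator's code points fit in eight bits; the helper name `tio_numeric_code` is hypothetical and not part of any HP tool.

```c
#include <stdio.h>

/* Hypothetical helper: write the numeric TIO code for an 8-bit code point
 * into out (at least 5 bytes): a backslash followed by exactly three
 * decimal digits, padded with leading zeros where needed. */
void tio_numeric_code(unsigned char code_point, char out[5])
{
    snprintf(out, 5, "\\%03u", (unsigned)code_point);
}
```

For instance, code point 7 is written as \007 and code point 171 as \171.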

Application support

Vim

The Vim text editor supports digraphs for actual entry of text characters, following RFC 1345. The entry of digraphs is bound to Ctrl+K by default. The list of all possible digraphs in Vim can be displayed by typing :digraphs.

GNU Screen

GNU Screen has a digraph command, bound to Ctrl+A Ctrl+V by default.

Lotus

Lotus 1-2-3 for DOS uses Alt+F1 as compose key to allow easier input of many special characters of the Lotus International Character Set (LICS) and Lotus Multi-Byte Character Set (LMBCS).

See also

* Compose key
* List of XML and HTML character entity references
* Escape sequence
* Escape sequences in C
* C alternative tokens


External links

* RFC 1345
