The combining grapheme joiner (CGJ), U+034F, is a Unicode character that has no visible glyph and is "default ignorable" by applications. Its name is a misnomer and does not describe its function: the character does not join graphemes.
Its purpose is to semantically ''separate'' characters that should ''not'' be considered digraphs, as well as to block canonical reordering of combining marks during normalization.
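
The character's basic properties can be inspected with Python's standard `unicodedata` module. This is a minimal sketch; the `describe` helper is a hypothetical name introduced for illustration, and the "default ignorable" property itself is not exposed by `unicodedata`, so it is not checked here.

```python
import unicodedata

CGJ = "\u034f"  # COMBINING GRAPHEME JOINER


def describe(ch: str) -> dict:
    """Collect a few Unicode Character Database properties of one character."""
    return {
        "code_point": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch),
        "category": unicodedata.category(ch),           # Mn = nonspacing mark
        "combining_class": unicodedata.combining(ch),   # 0 for the CGJ
    }


info = describe(CGJ)
print(info)
```

The combining class of 0 is what the reordering behavior described in this article rests on: canonical reordering never moves a mark across a character whose class is 0.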
For example, in a Hungarian-language context, the adjoining letters ''c'' and ''s'' would normally be considered equivalent to the cs digraph. If they are separated by the CGJ, they will be considered two separate graphemes. However, in contrast to the zero-width joiner (ZWJ) and similar characters, the CGJ does not affect whether the two letters are ''rendered'' separately, as a ligature, or cursively joined; the default behavior for this is determined by the font.
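
The digraph-separating role can be illustrated with a toy tokenizer. This is only a sketch under stated assumptions: `split_graphemes` and `HUNGARIAN_DIGRAPHS` are hypothetical names, the digraph set is an illustrative subset, and real language-aware processing (for example, tailored collation) would normally be done by a dedicated library rather than code like this.

```python
CGJ = "\u034f"  # COMBINING GRAPHEME JOINER

# Illustrative subset of Hungarian digraphs (assumption, not a complete list).
HUNGARIAN_DIGRAPHS = {"cs", "dz", "gy", "ly", "ny", "sz", "ty", "zs"}


def split_graphemes(text: str) -> list[str]:
    """Split text into letters, keeping known digraphs together
    unless a CGJ sits between their two letters."""
    units = []
    i = 0
    while i < len(text):
        if text[i] == CGJ:
            i += 1  # the CGJ only separates; it is not a unit itself
            continue
        pair = text[i:i + 2]
        if pair.lower() in HUNGARIAN_DIGRAPHS:
            units.append(pair)
            i += 2
        else:
            units.append(text[i])
            i += 1
    return units


joined = split_graphemes("kincs")
separated = split_graphemes("kinc" + CGJ + "s")
print(joined)
print(separated)
```

With the CGJ present, the pair starting at ''c'' no longer matches a digraph, so ''c'' and ''s'' come out as separate units. Nothing in this sketch touches rendering, which remains up to the font.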
The CGJ is also needed for complex scripts. For example, in most cases the Hebrew cantillation mark metheg is supposed to appear to the left of the vowel point, and by default most display systems will render it this way even if it is typed before the vowel. But in some words in Biblical Hebrew the metheg appears to the right of the vowel; to tell the display engine to render it on the right, a CGJ must be typed between the metheg and the vowel.
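
The two encodings can be compared at the code-point level. The sketch below assumes the letter bet and the vowel point patah purely as an example pairing; it shows only the stored order after normalization, not how any particular font or shaping engine draws the marks.

```python
import unicodedata

BET = "\u05d1"    # HEBREW LETTER BET (example base letter, an assumption)
PATAH = "\u05b7"  # HEBREW POINT PATAH, combining class 17
METEG = "\u05bd"  # HEBREW POINT METEG (metheg), combining class 22
CGJ = "\u034f"    # COMBINING GRAPHEME JOINER, combining class 0


def code_points(s: str) -> list[str]:
    """Render a string as a list of U+XXXX labels."""
    return [f"U+{ord(c):04X}" for c in s]


plain = BET + METEG + PATAH          # metheg typed before the vowel
guarded = BET + METEG + CGJ + PATAH  # same, with a CGJ in between

plain_nfd = unicodedata.normalize("NFD", plain)
guarded_nfd = unicodedata.normalize("NFD", guarded)

print(code_points(plain_nfd))    # vowel point now precedes the metheg
print(code_points(guarded_nfd))  # typed order is preserved
```

Without the CGJ, normalization moves the vowel point ahead of the metheg because of their combining classes, so the "metheg first" order cannot survive in normalized text. With the CGJ, the order is kept and a display engine can act on it.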
In the case of several consecutive combining diacritics, an intervening CGJ indicates that they should not be subject to canonical reordering.
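
Why an intervening CGJ has this effect follows from the canonical ordering rule: two adjacent marks are swapped only when the first has a higher combining class than the second and the second's class is not 0. A minimal sketch, assuming the input is already fully decomposed; `needs_reordering` is a hypothetical helper, not a library function.

```python
import unicodedata

CGJ = "\u034f"        # combining class 0
ACUTE = "\u0301"      # COMBINING ACUTE ACCENT, combining class 230
DOT_BELOW = "\u0323"  # COMBINING DOT BELOW, combining class 220


def needs_reordering(s: str) -> bool:
    """True if some adjacent pair (A, B) has ccc(A) > ccc(B) > 0,
    i.e. canonical ordering would swap it. Assumes decomposed input."""
    ccc = [unicodedata.combining(c) for c in s]
    return any(a > b > 0 for a, b in zip(ccc, ccc[1:]))


typed = "a" + ACUTE + DOT_BELOW
blocked = "a" + ACUTE + CGJ + DOT_BELOW

print(needs_reordering(typed))    # the pair (230, 220) would be swapped
print(needs_reordering(blocked))  # the CGJ's class 0 breaks the pair up

# Cross-check against the standard library's normalizer.
reordered = unicodedata.normalize("NFD", typed)
kept = unicodedata.normalize("NFD", blocked)
```

Because the CGJ has combining class 0, neither mark can be moved across it, and the sequence keeps the order in which it was entered.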
In contrast, the zero-width non-joiner (ZWNJ), at U+200C in the General Punctuation range, prevents two adjacent characters from turning into a ligature.