The combining grapheme joiner (CGJ), U+034F, is a
Unicode
character that has no visible glyph and is "default ignorable" by applications. Its name is a misnomer and does not describe its function: the character does not join graphemes. Its purpose is to semantically ''separate'' characters that should ''not'' be considered
digraphs
as well as to block canonical reordering of
combining marks during
normalization. For example, in a
Hungarian language
context, adjoining letters ''c'' and ''s'' would normally be considered equivalent to the cs digraph. If they are separated by the CGJ, they will be considered as two separate graphemes. However, in contrast to the
zero-width joiner
and similar characters, the CGJ does not affect whether the two letters are ''rendered'' separately or as a
ligature
or cursively joined—the default behavior for this is determined by the font. The CGJ is also needed for complex scripts. For example, in most cases the
Hebrew cantillation accent metheg is supposed to appear to the left of the vowel point, and by default most display systems will render it this way even if it is typed before the vowel. But in some words in
Biblical Hebrew
the metheg appears to the right of the vowel, and to tell the display engine to render it properly on the right, CGJ must be typed between the metheg and the vowel. In the case of several consecutive combining diacritics, an intervening CGJ indicates that they should not be subject to canonical reordering. In contrast, the "zero-width non-joiner" at U+200C in the General Punctuation range prevents two adjacent characters from turning into a ligature.
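
The effect on canonical reordering can be observed with the `unicodedata` module of the Python standard library. The following is a minimal sketch, not a reference implementation: the base letter bet and the vowel point patah are illustrative choices that do not come from the text above, and the helper names `nfd` and `code_points` are hypothetical. It relies on the metheg (named METEG in Unicode) having a higher canonical combining class than the vowel point, so that normalization moves it after the vowel unless a CGJ intervenes.

```python
import unicodedata

CGJ = "\u034F"    # COMBINING GRAPHEME JOINER
BET = "\u05D1"    # HEBREW LETTER BET (illustrative base letter, an assumption)
METEG = "\u05BD"  # HEBREW POINT METEG, canonical combining class 22
PATAH = "\u05B7"  # HEBREW POINT PATAH, canonical combining class 17


def nfd(text):
    """Return the canonical decomposition (NFD) of text."""
    return unicodedata.normalize("NFD", text)


def code_points(text):
    """List the code points of text as four-digit hex strings."""
    return [f"{ord(ch):04X}" for ch in text]


# CGJ is a nonspacing mark with combining class 0, so canonical
# reordering never moves another mark across it.
print(unicodedata.category(CGJ), unicodedata.combining(CGJ))

# Metheg typed before the vowel, with and without an intervening CGJ.
without_cgj = BET + METEG + PATAH
with_cgj = BET + METEG + CGJ + PATAH

reordered = nfd(without_cgj)  # the vowel is moved in front of the metheg
preserved = nfd(with_cgj)     # the typed order survives normalization

print(code_points(reordered))
print(code_points(preserved))
```

Without the CGJ, normalization yields bet, patah, metheg regardless of the typed order, so the distinction between the two placements is lost. With the CGJ, the sequence bet, metheg, CGJ, patah is left unchanged, which is what allows a display engine to tell the two cases apart.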



External links


Unicode FAQ - Characters and Combining Marks

