Zero-width non-joiner

The zero-width non-joiner (ZWNJ) is a non-printing character used in the computerization of writing systems that make use of ligatures. When placed between two characters that would otherwise be connected into a ligature, a ZWNJ causes them to be printed in their final and initial forms, respectively. This is also an effect of a space character, but a ZWNJ is used when it is desirable to keep the characters closer together or to connect a word with its morpheme. The ZWNJ is encoded in Unicode as U+200C ZERO WIDTH NON-JOINER.
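As a small illustration of the encoding, the following Python sketch inspects the character with the standard `unicodedata` module. The constant name `ZWNJ` is chosen here only for illustration.

```python
import unicodedata

# U+200C ZERO WIDTH NON-JOINER, written as an escape so it stays visible in source.
ZWNJ = "\u200c"

code_point = f"U+{ord(ZWNJ):04X}"       # hexadecimal code point
name = unicodedata.name(ZWNJ)           # official Unicode character name
category = unicodedata.category(ZWNJ)   # general category (Cf = format character)
utf8_bytes = ZWNJ.encode("utf-8")       # three bytes in UTF-8

print(code_point, name, category, utf8_bytes)
```

Writing the character as an escape sequence rather than pasting it literally keeps it from being lost or overlooked in editors that do not show invisible characters.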


Use of ZWNJ and unit separator for correct typography

In certain languages, the ZWNJ is necessary for unambiguously specifying the correct typographic form of a character sequence. The ASCII control code unit separator was formerly used for this purpose. The picture shows how the code looks when it is rendered correctly, and in every row the correct and incorrect pictures should be different. On a system which is not configured to display Unicode correctly, the correct display and the incorrect one may look the same, or either of them may be significantly different from the corresponding picture.

In this Biblical Hebrew example, which has a shva sign written as two vertical dots to denote a short vowel, the placement of the meteg to the left of the segol is correct. If a meteg were placed to the left of the shva, it would be erroneous. In Modern Hebrew, there is no reason to use the meteg for spoken language, so it is rarely used in Modern Hebrew typesetting.

In German typography, ligatures may not cross the constituent boundaries within compounds. Thus, in the first German example, the prefix is separated from the rest of the word to prohibit the ligature fl. Similarly, in English, some argue that ligatures should not cross morpheme boundaries. For example, 'fly' and 'fish' are morphemes in some words but not in others; by this reasoning, words like 'deaf‌ly' and 'self‌ish' (shown here with the non-joiner) should not have ligatures (of fl and fi, respectively), while 'dayfly' and 'catfish' should have them.

Persian uses this character extensively for certain prefixes, suffixes and compound words. It is necessary for disambiguating compounds from non-compound words, which use a full space.

In the Jawi script of Malay, ZWNJ is used whenever more than one consonant is written at the end of a phrase, as in the Malay word for 'science' (sains in Latin script, pronounced /ˈsa.ɪns/). It signifies that there is no vowel (specifically 'a' or 'ə') between the two consonant letters, which would otherwise be pronounced either /ˈsa.ɪnas/ or /ˈsa.ɪnəs/. A space would separate the phrase into different words, so that it would instead mean 'to sign the Arabic letter sin' (sain sin in Latin script).
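The morpheme-boundary convention described above for German and English can be sketched in Python. This is only an illustration: the helper names `join_morphemes` and `strip_zwnj` are hypothetical, and whether a ligature is actually suppressed depends on the font and the rendering engine.

```python
ZWNJ = "\u200c"  # U+200C ZERO WIDTH NON-JOINER


def join_morphemes(*parts: str) -> str:
    """Join morphemes with a ZWNJ so that a ligature is not formed across the boundary."""
    return ZWNJ.join(parts)


def strip_zwnj(text: str) -> str:
    """Remove every ZWNJ, e.g. before comparing or searching text."""
    return text.replace(ZWNJ, "")


deafly = join_morphemes("deaf", "ly")    # 'f' and 'l' belong to different morphemes
selfish = join_morphemes("self", "ish")  # 'f' and 'i' belong to different morphemes

print(len(deafly))                       # letters plus one invisible character
print(deafly == "deafly")                # the ZWNJ makes the strings differ
print(strip_zwnj(deafly) == "deafly")    # equal again once the ZWNJ is removed
```

Because the ZWNJ is invisible but still part of the string, text that contains it will not match the plain spelling unless it is stripped first, which is why the sketch includes the second helper.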


Use of ZWNJ to display alternative forms

In Indic scripts, insertion of a ZWNJ after a consonant with a halant, or before a dependent vowel, prevents the characters from being joined into their usual combined form.

In Devanagari, the characters क् and ष typically combine to form क्ष, but when a ZWNJ is inserted between them, क्‌ष (code: क्&zwnj;ष) is seen instead.

In Kannada, the characters ನ್ and ನ combine to form ನ್ನ, but when a ZWNJ is inserted between them, ನ್‌ನ is displayed. That style is typically used to write non-Kannada words in Kannada script: "Facebook" is written as ಫೇಸ್‌ಬುಕ್, though it can be written as ಫೇಸ್ಬುಕ್. ರಾಜ್‌ಕುಮಾರ್ and ರಾಮ್‌ಗೊಪಾಲ್ are examples of other proper nouns that need ZWNJ.

In Bengali, when the letter য occurs at the end of a consonant cluster (i.e., য preceded by a ◌্, hôsôntô), it appears in a special shape known as the য-ফলা (ja-phala), as in ক্য (ক ্ য). However, when the letter র occurs at the beginning of a consonant cluster (i.e., র followed by a hôsôntô), it appears in a special shape known as the রেফ (reph). Thus, the sequence র ্ য is rendered by default as র্য. When the য-ফলা shape needs to be retained rather than the রেফ shape, the ZWNJ is inserted right after র, i.e., র‌্য to render র‌্য. র‌্য is commonly used for loanwords from English such as র‍্যান্ডম (random). Words like উদ্‌ঘাটন (code: উদ্&zwnj;ঘাটন), where the hôsôntô needs to be displayed explicitly, also require a ZWNJ inserted after the hôsôntô.
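The Devanagari case above can be made concrete by building both sequences from explicit code points, so that the invisible character is not lost. The following Python sketch is illustrative only; the names `conjunct`, `explicit` and `describe` are hypothetical, and the visual result depends on the font and shaping engine.

```python
import unicodedata

ZWNJ = "\u200c"    # U+200C ZERO WIDTH NON-JOINER
KA = "\u0915"      # DEVANAGARI LETTER KA
VIRAMA = "\u094d"  # DEVANAGARI SIGN VIRAMA (halant)
SSA = "\u0937"     # DEVANAGARI LETTER SSA

conjunct = KA + VIRAMA + SSA          # normally shaped as a single conjunct
explicit = KA + VIRAMA + ZWNJ + SSA   # ZWNJ after the halant keeps the halant visible


def describe(text: str) -> list[str]:
    """List each character as 'U+XXXX NAME' to reveal invisible code points."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


for line in describe(explicit):
    print(line)
```

Listing code points this way is a simple check that a string really contains the ZWNJ, since the two sequences can look identical in an editor or terminal.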


Symbol

The symbol to be used on keyboards which enable the input of the ZWNJ directly is standardized in Amendment 1 (2012) of ISO/IEC 9995-7:2009 "Information technology – Keyboard layouts for text and office systems – Symbols used to represent functions" as symbol number 81, and in IEC 60417 "Graphical Symbols for use on Equipment" as symbol no. IEC 60417-6177-2.


See also

* Zero-width joiner
* Zero-width space
* Word divider


External links


Using the ZWNJ in Persian

