CJK character encodings

In internationalization, ''CJK'' is a collective term for the Chinese, Japanese, and Korean languages, all of which use Chinese characters and their derivatives in their writing systems, sometimes alongside other scripts. Collectively, the CJK characters often include ''Hànzì'' in Chinese, ''Kanji'' and ''Kana'' in Japanese, and ''Hanja'' and ''Hangul'' in Korean. Vietnamese is sometimes included, giving the abbreviation CJKV, because Vietnamese was historically written with Chinese characters, known in Vietnamese as ''Chữ Hán'', and with ''Chữ Nôm'' (''Hán-Nôm'' altogether).


Character repertoire

Standard Mandarin Chinese and Standard Cantonese are written almost exclusively in Chinese characters. Over 3,000 characters are required for general literacy, and up to 40,000 for reasonably complete coverage. Japanese uses fewer characters: general literacy in Japanese can be expected with 2,136 characters. The use of Chinese characters in Korea is increasingly rare, although their idiosyncratic use in proper names requires knowledge (and therefore availability) of many more characters. Even so, South Korean students are still taught 1,800 characters.

Other scripts used for these languages, such as bopomofo and the Latin-based pinyin for Chinese, hiragana and katakana for Japanese, and hangul for Korean, are not strictly "CJK characters", although CJK character sets almost invariably include them because they are needed for full coverage of the target languages. The sinologist Carl Leban (1971) produced an early survey of CJK encoding systems.

Until the early 20th century, Classical Chinese was the written language of government and scholarship in Vietnam. Popular literature in Vietnamese was written in the ''chữ Nôm'' script, which consists of Chinese characters together with many locally created characters. Since the 1920s, literature has been recorded in the Latin-based ''chữ Quốc ngữ''.
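The distinction between Han characters and the phonetic scripts that CJK character sets carry alongside them is visible in Unicode, where each script occupies its own block. The following is a minimal Python sketch, assuming Unicode code points as the common reference; it covers only the basic blocks (extension blocks are omitted), and the names `CJK_BLOCKS` and `cjk_block` are hypothetical.

```python
from typing import Optional

# Standard Unicode block boundaries (inclusive) for a few CJK-related scripts.
# Only the basic blocks are listed; extension blocks such as
# CJK Unified Ideographs Extension A/B are omitted for brevity.
# The table and function names are hypothetical.
CJK_BLOCKS = [
    ("CJK Unified Ideographs", 0x4E00, 0x9FFF),
    ("Hiragana", 0x3040, 0x309F),
    ("Katakana", 0x30A0, 0x30FF),
    ("Bopomofo", 0x3100, 0x312F),
    ("Hangul Syllables", 0xAC00, 0xD7AF),
]


def cjk_block(ch: str) -> Optional[str]:
    """Return the name of the block containing ch, or None if ch lies
    outside the blocks listed above (e.g. Latin letters used in pinyin)."""
    cp = ord(ch)
    for name, lo, hi in CJK_BLOCKS:
        if lo <= cp <= hi:
            return name
    return None


if __name__ == "__main__":
    # Han, hiragana, katakana, bopomofo, hangul, and the pinyin letter a-macron.
    for ch in "\u6f22\u304b\u30ab\u3105\ud55c\u0101":
        print(f"U+{ord(ch):04X} {ch}: {cjk_block(ch)}")
```

Running it should place 漢 in CJK Unified Ideographs, か and カ in Hiragana and Katakana, ㄅ in Bopomofo, 한 in Hangul Syllables, and the pinyin letter ā outside all of them, since pinyin uses the Latin script.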


Encoding

The number of characters required to cover all of these languages cannot fit in the 256-character code space of an 8-bit character encoding, so at least a 16-bit fixed-width encoding or a multi-byte variable-length encoding is required. The 16-bit fixed-width encodings, such as those from Unicode up to and including version 2.0, are now deprecated for two reasons: more characters must be encoded than 16 bits can accommodate (Unicode 5.0 has some 70,000 Han characters), and the Chinese government requires software in China to support the GB 18030 character set. Although CJK encodings share common character sets, the encodings used to represent them were developed separately by different East Asian governments and software companies and are mutually incompatible. Unicode has attempted, with some controversy, to unify the character sets in a process known as Han unification.

CJK character encodings should consist, at minimum, of Han characters plus language-specific phonetic scripts such as pinyin, bopomofo, hiragana, katakana and hangul. CJK character encodings include:

* Big5 (the most prevalent Traditional Chinese encoding before Unicode was implemented)
* CCCII
* CNS 11643 (official standard of the Republic of China, Taiwan)
* EUC-JP
* EUC-KR
* GB 2312 (subset and predecessor of GB 18030)
* GB 18030 (mandated standard in the People's Republic of China)
* Giga Character Set (GCS)
* ISO-2022-JP
* KS C 5861
* Shift JIS
* TRON
* Unicode

The CJK character sets take up the bulk of the assigned Unicode code space. There is much controversy among Japanese experts on Chinese characters about the desirability and technical merit of the Han unification process used to map multiple Chinese and Japanese character sets into a single set of unified characters. All three languages can be written both left-to-right and top-to-bottom (right-to-left and top-to-bottom in ancient documents), but they are usually treated as left-to-right scripts when encoding issues are discussed.
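The incompatibility between national encodings, and the limits of a 16-bit code space, can be seen by encoding the same characters under several of the encodings listed above. This is a minimal Python sketch, assuming the codec names shipped with Python's standard library (`big5`, `gb2312`, `gb18030`, `shift_jis`, `euc_jp`, `euc_kr`, `iso2022_jp`, `utf-16-be`, `utf-8`); the helper `encode_all` and the list `LEGACY_AND_UNICODE` are hypothetical. It uses U+6F22 (漢) and U+20000, the first character of CJK Unified Ideographs Extension B, which lies outside the 16-bit Basic Multilingual Plane.

```python
from typing import Dict, Iterable, Optional

# Codec names as registered in Python's standard library.
# The list and helper names are hypothetical, for illustration only.
LEGACY_AND_UNICODE = [
    "big5", "gb2312", "gb18030", "shift_jis", "euc_jp",
    "euc_kr", "iso2022_jp", "utf-16-be", "utf-8",
]


def encode_all(text: str, codecs: Iterable[str]) -> Dict[str, Optional[bytes]]:
    """Return the encoded bytes per codec, or None if the codec's
    repertoire cannot represent the text."""
    result: Dict[str, Optional[bytes]] = {}
    for name in codecs:
        try:
            result[name] = text.encode(name)
        except UnicodeEncodeError:
            result[name] = None  # character not in this encoding's repertoire
    return result


if __name__ == "__main__":
    # U+6F22: a Han character found in several legacy character sets.
    for name, data in encode_all("\u6f22", LEGACY_AND_UNICODE).items():
        print(f"{name:>10}: {data.hex(' ') if data else 'not representable'}")
    # U+20000 lies outside the Basic Multilingual Plane, so UTF-16 needs a
    # surrogate pair (4 bytes) and BMP-only legacy sets cannot encode it.
    for name, data in encode_all("\U00020000", LEGACY_AND_UNICODE).items():
        print(f"{name:>10}: {data.hex(' ') if data else 'not representable'}")
```

The same Han character comes out as different byte sequences in Shift JIS and EUC-JP, and encodings whose repertoire lacks it report it as not representable. For U+20000, UTF-16 needs a surrogate pair, which is why a fixed 16-bit encoding is no longer sufficient; GB 18030, designed to cover all of Unicode, is expected to encode it with a four-byte sequence.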


Legal status

Libraries cooperated on encoding standards for JACKPHY (Japanese, Arabic, Chinese, Korean, Persian, Hebrew, and Yiddish) characters in the early 1980s. According to Ken Lunde (1996), the abbreviation "CJK" was a registered trademark of the Research Libraries Group (which merged with OCLC in 2006). The trademark, owned by OCLC between 1987 and 2009, has since expired (Justia listing).


See also

* Chinese character description languages
* Chinese character encoding
* Chinese input methods for computers
* CJK Compatibility Ideographs
* CJK strokes
* CJK Unified Ideographs
* Complex Text Layout languages (CTL)
* Input method editor
* Japanese language and computers
* Korean language and computers
* List of CJK fonts
* Sinoxenic
* Variable-width encoding


References




Sources

* DeFrancis, John. ''The Chinese Language: Fact and Fantasy''. Honolulu: University of Hawaii Press, 1990.
* Hannas, William C. ''Asia's Orthographic Dilemma''. Honolulu: University of Hawaii Press, 1997.
* Lemberg, Werner. "The CJK package for LaTeX2ε: Multilingual support beyond babel". ''TUGboat'', Volume 18 (1997), No. 3: Proceedings of the 1997 Annual Meeting.
* Leban, Carl. ''Automated Orthographic Systems for East Asian Languages (Chinese, Japanese, Korean)''. State-of-the-art Report, prepared for the Board of Directors, Association for Asian Studies, 1971.
* Lunde, Ken. ''CJKV Information Processing''. Sebastopol, Calif.: O'Reilly & Associates, 1998.


External links


* CJKV: A Brief Introduction
* Lemberg CJK article from above, TUGboat 18-3
* On "CJK Unified Ideograph" from Wenlin.com
