In internationalization, CJK is a collective term for the Chinese, Japanese, and Korean languages, all of which include Chinese characters and derivatives in their writing systems, sometimes paired with other scripts. Collectively, the CJK characters often include ''hànzì'' in Chinese, ''kanji'' and ''kana'' in Japanese, and ''hanja'' and ''hangul'' in Korean. Rarely, Vietnamese is included, making the abbreviation CJKV, since Vietnamese historically used Chinese characters as well; for details on Sino-Vietnamese characters, see the article Chữ Nôm.
Character repertoire
Standard Mandarin Chinese and Standard Cantonese are written almost exclusively in Chinese characters. General literacy in Chinese requires over 3,000 characters, but reasonably complete coverage requires up to 40,000. Japanese uses fewer characters: general literacy in Japanese can be expected with 2,136 characters. The use of Chinese characters in Korea is becoming increasingly rare, although the idiosyncratic use of Chinese characters in proper names requires knowledge (and therefore availability) of many more characters. Even so, students in South Korea are still taught 1,800 characters.
Other scripts used for these languages, such as bopomofo and the Latin-based pinyin for Chinese, hiragana and katakana for Japanese, and hangul for Korean, are not strictly "CJK characters", although CJK character sets almost invariably include them as necessary for full coverage of the target languages.
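The distinction between Han characters and the phonetic scripts can be made concrete by Unicode block. The following is a minimal Python sketch, assuming that the basic Unicode blocks are enough for illustration; the helper name `classify` and its labels are hypothetical, and the CJK extension blocks, compatibility ideographs, and half-width forms are deliberately ignored.

```python
# Hypothetical helper: classify one character by the basic Unicode block it
# falls in. Extension blocks and compatibility forms are intentionally omitted.
BLOCKS = [
    (0x1100, 0x11FF, "hangul"),    # Hangul Jamo
    (0x3040, 0x309F, "hiragana"),  # Hiragana
    (0x30A0, 0x30FF, "katakana"),  # Katakana
    (0x3100, 0x312F, "bopomofo"),  # Bopomofo
    (0x4E00, 0x9FFF, "han"),       # CJK Unified Ideographs (main BMP block)
    (0xAC00, 0xD7AF, "hangul"),    # Hangul Syllables
]


def classify(ch: str) -> str:
    """Return a coarse script label for a single character."""
    cp = ord(ch)
    for lo, hi, label in BLOCKS:
        if lo <= cp <= hi:
            return label
    return "other"


if __name__ == "__main__":
    for ch in "漢かなカナㄅ한A":
        print(ch, classify(ch))
```

Only "han" corresponds to CJK characters in the strict sense; the other labels are the phonetic scripts that CJK character sets include for full coverage.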
Until the early 20th century, Classical Chinese was the written language of government and scholarship in Vietnam. Popular literature in Vietnamese was written in the ''chữ Nôm'' script, consisting of borrowed Chinese characters together with many characters created locally. From the 1920s onwards, literature has been recorded in the Latin-based Vietnamese alphabet.
The sinologist Carl Leban (1971) produced an early survey of CJK encoding systems.
Encoding
The number of characters required for complete coverage of all these languages cannot fit in the 256-character code space of 8-bit character encodings, so at least a 16-bit fixed-width encoding or a multi-byte variable-length encoding is needed. The 16-bit fixed-width encodings, such as those from Unicode up to and including version 2.0, are now deprecated for two reasons: more characters must be encoded than a 16-bit encoding can accommodate (Unicode 5.0 has some 70,000 Han characters), and the Chinese government requires software in China to support the GB 18030 character set.
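The difference between a fixed 16-bit code unit and the variable-length multi-byte encodings shows up directly in encoded lengths. This is a minimal Python sketch using only the standard codec registry; the helper `byte_length` is hypothetical. U+20000 (in CJK Unified Ideographs Extension B) lies outside the 16-bit range, so UTF-16 must represent it with a surrogate pair, and GB 18030 uses a 4-byte sequence.

```python
# Compare how many bytes a character occupies in several encodings.
# The codec names are standard Python codec names.
SAMPLES = {
    "A": "ASCII letter",
    "漢": "Han character inside the 16-bit range (U+6F22)",
    "\U00020000": "Han character outside the 16-bit range (U+20000)",
}
CODECS = ("shift_jis", "gb18030", "utf-16-be", "utf-8")


def byte_length(text: str, codec: str):
    """Length of text encoded with codec, or None if it cannot be encoded."""
    try:
        return len(text.encode(codec))
    except UnicodeEncodeError:
        return None


if __name__ == "__main__":
    for ch, desc in SAMPLES.items():
        widths = {codec: byte_length(ch, codec) for codec in CODECS}
        print(f"U+{ord(ch):04X} {desc}: {widths}")
```

Shift-JIS mixes 1-byte and 2-byte sequences and has no code for U+20000 at all, which illustrates why a single 16-bit unit per character is not enough for complete coverage.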
Although the CJK character sets overlap considerably, the encodings used to represent them were developed separately by different East Asian governments and software companies, and are mutually incompatible.
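The incompatibility can be seen by encoding the same two characters with several legacy encodings. A minimal Python sketch, assuming only the standard codecs; the helper `encode_or_none` is hypothetical and simply reports characters an encoding cannot represent.

```python
# Encode the same text with several legacy CJK encodings and show that the
# resulting byte sequences differ from one encoding to the next.
TEXT = "漢字"
CODECS = ("shift_jis", "euc_jp", "iso2022_jp", "big5", "euc_kr", "gb18030")


def encode_or_none(text: str, codec: str):
    """Return the encoded bytes, or None if the codec cannot represent text."""
    try:
        return text.encode(codec)
    except UnicodeEncodeError:
        return None


if __name__ == "__main__":
    for codec in CODECS:
        data = encode_or_none(TEXT, codec)
        print(f"{codec:11} {data.hex(' ') if data is not None else 'not encodable'}")

    # Decoding bytes with the wrong codec garbles the text; errors="replace"
    # substitutes U+FFFD for invalid sequences instead of raising.
    sjis = TEXT.encode("shift_jis")
    print(sjis.decode("euc_jp", errors="replace"))
```

Because the byte values carry no indication of which encoding produced them, a receiver has to know the encoding out of band to recover the text.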
Unicode has attempted, with some controversy, to unify the character sets in a process known as Han unification.
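A small illustration of what unification means in practice: the character 漢 has one Unicode code point, while each national encoding assigns it its own bytes. This Python sketch uses the standard `unicodedata` module; the choice of sample character and codecs is illustrative only.

```python
import unicodedata

# Under Han unification this character has a single code point whether it
# appears in Chinese, Japanese, or Korean text; selecting a language-specific
# glyph is left to fonts and language tagging.
CHAR = "漢"

if __name__ == "__main__":
    print(f"U+{ord(CHAR):04X}", unicodedata.name(CHAR))
    # Each legacy national encoding uses different bytes for it, but all of
    # them decode back to the same unified code point.
    for codec in ("shift_jis", "big5", "euc_kr"):
        data = CHAR.encode(codec)
        assert data.decode(codec) == CHAR
        print(codec, data.hex(" "))
```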
CJK character encodings should consist minimally of Han characters plus language-specific phonetic scripts such as pinyin, bopomofo, hiragana, katakana and hangul.
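Whether a given encoding meets this minimum can be probed by trying to encode a sample character from each script. A minimal Python sketch; the sample characters and the helpers `can_encode` and `coverage` are hypothetical choices for illustration, and pinyin is left out because it is written with (accented) Latin letters rather than a separate block.

```python
# Probe whether a legacy encoding can represent one sample character from
# each script. One sample per script is a rough check, not a full audit.
SAMPLES = {
    "han": "漢",
    "hiragana": "あ",
    "katakana": "ア",
    "bopomofo": "ㄅ",
    "hangul": "한",
}


def can_encode(ch: str, codec: str) -> bool:
    """True if codec can represent ch without error."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def coverage(codec: str) -> dict:
    """Map each script name to whether its sample character is encodable."""
    return {script: can_encode(ch, codec) for script, ch in SAMPLES.items()}


if __name__ == "__main__":
    for codec in ("shift_jis", "big5", "euc_kr", "gb18030"):
        print(codec, coverage(codec))
```

A national encoding typically covers its own phonetic script (Shift-JIS has kana but no hangul), whereas GB 18030, which can represent all of Unicode, covers every sample.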
CJK character encodings include:
*Big5 (the most prevalent encoding before Unicode was implemented)
*CCCII
*CNS 11643 (official standard of the Republic of China)
*EUC-JP
*EUC-KR
*GB 2312 (subset and predecessor of GB 18030)
*GB 18030 (mandated standard in the People's Republic of China)
*Giga Character Set (GCS)
*ISO-2022-JP
*KS C 5861
*Shift-JIS
*TRON
*Unicode
The CJK character sets take up the bulk of the assigned Unicode code space. There is much controversy among Japanese experts of Chinese characters about the desirability and technical merit of the Han unification process used to map multiple Chinese and Japanese character sets into a single set of unified characters.
All three languages can be written both horizontally from left to right and vertically from top to bottom (older documents typically run in top-to-bottom columns ordered from right to left), but they are usually treated as left-to-right scripts when discussing encoding issues.
Legal status
Libraries cooperated on encoding standards for JACKPHY characters in the early 1980s. According to Ken Lunde, the abbreviation "CJK" was a registered trademark of the Research Libraries Group (which merged with OCLC in 2006). The trademark, owned by OCLC between 1987 and 2009, has since expired (Justia listing).
See also
*Chinese character description languages
*Chinese character encoding
*Chinese input methods for computers
*CJK Compatibility Ideographs
*CJK strokes
*CJK Unified Ideographs
*Complex Text Layout languages (CTL)
*Input method editor
*Japanese language and computers
*Korean language and computers
*List of CJK fonts
*Sinoxenic
*Variable-width encoding
References
*DeFrancis, John. ''The Chinese Language: Fact and Fantasy''. Honolulu: University of Hawaii Press, 1990.
*Hannas, William C. ''Asia's Orthographic Dilemma''. Honolulu: University of Hawaii Press, 1997.
*Lemberg, Werner. "The CJK package for LaTeX2ε: Multilingual support beyond babel". ''TUGboat'', Volume 18 (1997), No. 3, Proceedings of the 1997 Annual Meeting.
*Leban, Carl. ''Automated Orthographic Systems for East Asian Languages (Chinese, Japanese, Korean)''. State-of-the-art report prepared for the Board of Directors, Association for Asian Studies, 1971.
*Lunde, Ken. ''CJKV Information Processing''. Sebastopol, Calif.: O'Reilly & Associates, 1998. ISBN 1-56592-224-7.
External links
*CJKV: A Brief Introduction
*Lemberg CJK article from above, TUGboat 18-3
*On “CJK Unified Ideograph” from Wenlin.com