In internationalization, CJK characters is a collective term for the Chinese, Japanese, and Korean languages, all of which include Chinese characters and derivatives in their writing systems, sometimes paired with other scripts. Collectively, the CJK characters often include ''Hànzì'' in Chinese, ''Kanji'' and ''Kana'' in Japanese, and ''Hanja'' and ''Hangul'' in Korean. Vietnamese can be included, making the abbreviation CJKV, as Vietnamese was historically written with Chinese characters, known in Vietnamese as ''Chữ Hán'' and ''Chữ Nôm'' (together, ''Hán-Nôm'').
Character repertoire
Standard Mandarin Chinese and Standard Cantonese are written almost exclusively in Chinese characters. Over 3,000 characters are required for general literacy, with up to 40,000 characters for reasonably complete coverage. Japanese uses fewer characters; general literacy in Japanese can be expected with 2,136 characters. The use of Chinese characters in Korea is increasingly rare, although idiosyncratic use of Chinese characters in proper names requires knowledge (and therefore availability) of many more characters. Even today, however, South Korean students are taught 1,800 characters.
Other scripts used for these languages, such as bopomofo and the Latin-based pinyin for Chinese, hiragana and katakana for Japanese, and hangul for Korean, are not strictly "CJK characters", although CJK character sets almost invariably include them as necessary for full coverage of the target languages.
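The distinction between Han characters and the accompanying phonetic scripts can be illustrated with a minimal Python sketch. It assumes that the prefix of a character's Unicode name is a good enough proxy for its script; the helper names `script_of` and `scripts_in` and the prefix table are hypothetical and only serve this illustration.

```python
import unicodedata

# Assumption: the Unicode character name prefix identifies the script.
# The labels on the right are illustrative, not standard identifiers.
_PREFIXES = (
    ("CJK UNIFIED IDEOGRAPH", "han"),
    ("CJK COMPATIBILITY IDEOGRAPH", "han"),
    ("HIRAGANA", "hiragana"),
    ("KATAKANA", "katakana"),
    ("HANGUL", "hangul"),
    ("BOPOMOFO", "bopomofo"),
    ("LATIN", "latin"),
)


def script_of(ch: str) -> str:
    """Return a coarse script label for a single character."""
    name = unicodedata.name(ch, "")  # "" for unnamed code points
    for prefix, label in _PREFIXES:
        if name.startswith(prefix):
            return label
    return "other"


def scripts_in(text: str) -> set:
    """Collect the script labels of all non-space characters in text."""
    return {script_of(ch) for ch in text if not ch.isspace()}


if __name__ == "__main__":
    # Han, hiragana, katakana, hangul, bopomofo and Latin in one string.
    print(sorted(scripts_in("漢字とカタカナ 한글 ㄅㄆ pinyin")))
```

Only the characters labelled `han` are CJK characters in the strict sense; the other labels correspond to the scripts that CJK character sets carry for full language coverage.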
The sinologist Carl Leban (1971) produced an early survey of CJK encoding systems.
Until the early 20th century, Classical Chinese was the written language of government and scholarship in Vietnam. Popular literature in Vietnamese was written in the ''chữ Nôm'' script, consisting of Chinese characters with many characters created locally. From the 1920s onwards, literature has been recorded in the Latin-based ''chữ Quốc ngữ'' script.
Encoding
The number of characters required for complete coverage of all these languages' needs cannot fit in the 256-character code space of 8-bit character encodings, requiring at least a 16-bit fixed width encoding or multi-byte variable-length encodings. The 16-bit fixed width encodings, such as those from Unicode up to and including version 2.0, are now deprecated due to the requirement to encode more characters than a 16-bit encoding can accommodate (Unicode 5.0 has some 70,000 Han characters) and the requirement by the Chinese government that software in China support the GB 18030 character set.
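The code-space argument can be checked with a short Python sketch. It uses Unicode code points as a stand-in for "character number", which is a simplification, and the helper names `fits_fixed_width` and `utf16_code_units` are hypothetical. The sample characters are U+6F22, a common Han character, and U+20000, a Han character outside the 16-bit range.

```python
def fits_fixed_width(ch: str, bits: int) -> bool:
    """True if the character's code point fits a fixed-width code of `bits` bits."""
    return ord(ch) < (1 << bits)


def utf16_code_units(ch: str) -> int:
    """Number of 16-bit code units UTF-16 needs for one character."""
    return len(ch.encode("utf-16-be")) // 2


if __name__ == "__main__":
    for ch in ("A", "\u6f22", "\U00020000"):
        print(
            f"U+{ord(ch):04X}",
            "8-bit:", fits_fixed_width(ch, 8),
            "16-bit:", fits_fixed_width(ch, 16),
            "UTF-16 units:", utf16_code_units(ch),
            "UTF-8 bytes:", len(ch.encode("utf-8")),
        )
```

A character such as U+20000 needs two UTF-16 code units (a surrogate pair), which is why a fixed 16-bit width is no longer sufficient and variable-length forms are used instead.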
Although CJK encodings have common character sets, the encodings often used to represent them have been developed separately by different East Asian governments and software companies, and are mutually incompatible. Unicode has attempted, with some controversy, to unify the character sets in a process known as Han unification.
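The incompatibility can be seen by encoding one shared character under several legacy encodings. The following Python sketch assumes the codec names shipped with CPython's standard library (`big5`, `gb2312`, `shift_jis`, `euc_jp`, `euc_kr`); the helper names `encode_everywhere` and `misread` are hypothetical.

```python
# Assumption: these CPython codec names stand in for the legacy encodings.
LEGACY_CODECS = ("big5", "gb2312", "shift_jis", "euc_jp", "euc_kr")


def encode_everywhere(ch: str, codecs=LEGACY_CODECS) -> dict:
    """Encode the same character under each codec and return the raw bytes."""
    return {name: ch.encode(name) for name in codecs}


def misread(data: bytes, wrong_codec: str) -> str:
    """Decode bytes with a codec other than the one that produced them."""
    return data.decode(wrong_codec, errors="replace")


if __name__ == "__main__":
    encoded = encode_everywhere("中")  # U+4E2D, present in all five sets
    for name, raw in encoded.items():
        print(f"{name:10} {raw.hex()}")
    # Reading GB 2312 bytes as Big5 yields a different character (mojibake).
    print(misread(encoded["gb2312"], "big5"))
```

The same character maps to a different byte sequence in each encoding, so text is only readable when the decoder knows which encoding produced it. In Unicode the character has a single code point regardless of the source character set.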
CJK character encodings should consist minimally of Han characters plus language-specific phonetic scripts such as pinyin, bopomofo, hiragana, katakana and hangul.
CJK character encodings include:
* Big5 (the most prevalent encoding before Unicode was implemented)
* CCCII
* CNS 11643 (official standard of the Republic of China)
* EUC-JP
* EUC-KR
* GB 2312 (subset and predecessor of GB 18030)
* GB 18030 (mandated standard in the People's Republic of China)
* Giga Character Set (GCS)
* ISO 2022-JP
* KS C 5861
* Shift-JIS
* TRON
* Unicode
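Which scripts a given encoding can represent can be probed with a small Python sketch. The mapping from the encodings listed above to CPython codec names (`big5`, `gb2312`, `gb18030`, `shift_jis`, `euc_jp`, `iso2022_jp`, `euc_kr`, `utf_8`) is an assumption of this example. CCCII, CNS 11643, GCS, KS C 5861 and TRON are left out because no standard-library codec is assumed for them. The helper names `can_encode` and `coverage` are hypothetical.

```python
# Assumption: short sample strings stand in for whole scripts.
SAMPLES = {
    "han": "漢字",
    "kana": "ひらがなカタカナ",
    "hangul": "한글",
}

# Assumption: CPython codec names corresponding to entries in the list above.
CODECS = (
    "big5", "gb2312", "gb18030", "shift_jis",
    "euc_jp", "iso2022_jp", "euc_kr", "utf_8",
)


def can_encode(text: str, codec: str) -> bool:
    """True if every character of text is representable in the codec."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


def coverage(codecs=CODECS, samples=SAMPLES) -> dict:
    """Map each codec to a {sample label: encodable?} table."""
    return {
        codec: {label: can_encode(text, codec) for label, text in samples.items()}
        for codec in codecs
    }


if __name__ == "__main__":
    for codec, row in coverage().items():
        print(f"{codec:12}", row)
```

A sample that fails to encode shows only that those particular characters are missing from the codec's repertoire. It is a spot check, not a full comparison of character sets.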
The CJK character sets take up the bulk of the assigned Unicode code space. There is much controversy among Japanese experts of Chinese characters about the desirability and technical merit of the Han unification process used to map multiple Chinese and Japanese character sets into a single set of unified characters.
All three languages can be written both left-to-right and top-to-bottom (right-to-left and top-to-bottom in ancient documents), but are usually considered left-to-right scripts when discussing encoding issues.
Legal status
Libraries cooperated on encoding standards for JACKPHY characters in the early 1980s. According to Ken Lunde (1996), the abbreviation "CJK" was a registered trademark of Research Libraries Group, which merged with OCLC in 2006. The trademark, owned by OCLC between 1987 and 2009, has now expired (Justia listing).
See also
* Chinese character description languages
* Chinese character encoding
* Chinese input methods for computers
* CJK Compatibility Ideographs
* CJK strokes
* CJK Unified Ideographs
* Complex Text Layout languages (CTL)
* Input method editor
* Japanese language and computers
* Korean language and computers
* List of CJK fonts
* Sinoxenic
* Variable-width encoding
References
Sources
* DeFrancis, John. ''The Chinese Language: Fact and Fantasy''. Honolulu: University of Hawaii Press, 1990.
* Hannas, William C. ''Asia's Orthographic Dilemma''. Honolulu: University of Hawaii Press, 1997.
* Lemberg, Werner. The CJK package for LaTeX2ε: Multilingual support beyond babel. ''TUGboat'', Volume 18 (1997), No. 3 (Proceedings of the 1997 Annual Meeting).
* Leban, Carl. ''Automated Orthographic Systems for East Asian Languages (Chinese, Japanese, Korean)''. State-of-the-art Report, Prepared for the Board of Directors, Association for Asian Studies. 1971.
* Lunde, Ken. ''CJKV Information Processing''. Sebastopol, Calif.: O'Reilly & Associates, 1998.
External links
* CJKV: A Brief Introduction
* Lemberg CJK article from above, TUGboat18-3
* On "CJK Unified Ideograph" from Wenlin.com