In internationalization, CJK is a collective term for the Chinese, Japanese, and Korean languages, all of which include Chinese characters and derivatives in their writing systems, sometimes paired with other scripts. Collectively, the CJK characters often include ''Hànzì'' in Chinese, ''Kanji'' and ''Kana'' in Japanese, and ''Hanja'' and ''Hangul'' in Korean.
Vietnamese can be included, making the abbreviation CJKV, as Vietnamese historically used Chinese characters, which were known as ''Chữ Hán'' and ''Chữ Nôm'' in Vietnamese (''Hán-Nôm'' altogether).
Character repertoire
Standard Mandarin Chinese and Standard Cantonese are written almost exclusively in Chinese characters. Over 3,000 characters are required for general literacy, with up to 40,000 characters for reasonably complete coverage. Japanese uses fewer characters: general literacy in Japanese can be expected with 2,136 characters. The use of Chinese characters in Korea is increasingly rare, although idiosyncratic use of Chinese characters in proper names requires knowledge (and therefore availability) of many more characters. Even today, however, South Korean students are taught 1,800 characters.
Other scripts used for these languages, such as bopomofo and the Latin-based pinyin for Chinese, hiragana and katakana for Japanese, and hangul for Korean, are not strictly "CJK characters", although CJK character sets almost invariably include them as necessary for full coverage of the target languages.
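As a rough illustration of the distinction drawn above, the following Python sketch sorts characters into scripts by their Unicode character names. The helper `script_of` and its prefix table are hypothetical names introduced for illustration, not part of any standard API; only `unicodedata.name` from the standard library is relied on.

```python
import unicodedata

# Hypothetical prefix table: maps the start of a Unicode character name
# to a coarse script label. It covers only the scripts mentioned in the text.
NAME_PREFIXES = {
    "CJK UNIFIED IDEOGRAPH": "Han",
    "HIRAGANA": "hiragana",
    "KATAKANA": "katakana",
    "HANGUL": "hangul",
    "BOPOMOFO": "bopomofo",
    "LATIN": "Latin",
}


def script_of(char: str) -> str:
    """Return a coarse script label for one character, or 'other'."""
    name = unicodedata.name(char, "")
    for prefix, label in NAME_PREFIXES.items():
        if name.startswith(prefix):
            return label
    return "other"


for ch in "漢かカ한ㄅa":
    print(ch, script_of(ch))
```

Only the first sample character is a Han character in the strict sense; the others belong to the companion scripts that CJK character sets usually carry alongside it.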
The sinologist Carl Leban (1971) produced an early survey of CJK encoding systems.
Until the early 20th century, Classical Chinese was the written language of government and scholarship in Vietnam. Popular literature in Vietnamese was written in the ''chữ Nôm'' script, consisting of Chinese characters with many characters created locally. From the 1920s onwards, the script used for recording literature has been the Latin-based chữ Quốc ngữ.
Encoding
The number of characters required for complete coverage of all these languages' needs cannot fit in the 256-character code space of 8-bit character encodings, requiring at least a 16-bit fixed-width encoding or multi-byte variable-length encodings. The 16-bit fixed-width encodings, such as those from Unicode up to and including version 2.0, are now deprecated for two reasons: more characters must be encoded than a 16-bit encoding can accommodate (Unicode 5.0 has some 70,000 Han characters), and the Chinese government requires that software in China support the GB 18030 character set.
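To make the size argument concrete, the sketch below uses Python's built-in codecs to count how many bytes two Han characters occupy in several encodings. The sample characters are arbitrary choices for illustration: U+6F22 lies within the 16-bit range, while U+20000 lies beyond it and so cannot be held in a single 16-bit unit. The helper name `byte_lengths` is hypothetical.

```python
# Count encoded bytes for a Han character inside the 16-bit range
# and for one outside it.
SAMPLES = {
    "U+6F22": "\u6f22",       # 漢, inside the 16-bit range
    "U+20000": "\U00020000",  # 𠀀, beyond the 16-bit range
}
CODECS = ["utf-8", "utf-16-be", "gb18030"]


def byte_lengths(text: str) -> dict:
    """Map each codec name to the number of bytes it needs for `text`."""
    return {codec: len(text.encode(codec)) for codec in CODECS}


for label, char in SAMPLES.items():
    print(label, byte_lengths(char))
```

In UTF-16 the second character is stored as a pair of 16-bit units, which is one way to see why a fixed-width 16-bit encoding no longer suffices.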
Although CJK encodings have common character sets, the encodings often used to represent them have been developed separately by different East Asian governments and software companies, and are mutually incompatible. Unicode has attempted, with some controversy, to unify the character sets in a process known as Han unification.
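The incompatibility between regional encodings can be observed with Python's built-in codecs. This is a minimal sketch: the sample character 中 is an arbitrary choice, assumed (and checked by the round trip in the loop) to exist in every listed character set.

```python
# Encode one Han character with several regional codecs and compare the bytes.
CHAR = "中"
CODECS = ["big5", "gb2312", "shift_jis", "euc_jp", "euc_kr"]

encoded = {codec: CHAR.encode(codec) for codec in CODECS}

for codec, raw in encoded.items():
    # Each codec recovers the character from its own bytes.
    assert raw.decode(codec) == CHAR
    print(f"{codec:10} {raw.hex()}")

# Reading Big5 bytes as GB 2312 does not recover the original character.
misread = encoded["big5"].decode("gb2312", errors="replace")
print("Big5 bytes read as GB 2312:", misread)
```

Because the same character maps to different byte sequences, text must be labelled with its encoding or it will be misread, which is the problem a unified character set aims to avoid.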
CJK character encodings should consist minimally of Han characters plus language-specific phonetic scripts such as pinyin, bopomofo, hiragana, katakana and hangul.
CJK character encodings include:
* Big5 (the most prevalent encoding before Unicode was implemented)
* CCCII
* CNS 11643 (official standard of the Republic of China)
* EUC-JP
* EUC-KR
* GB 2312 (subset and predecessor of GB 18030)
* GB 18030 (mandated standard in the People's Republic of China)
* Giga Character Set (GCS)
* ISO 2022-JP
* KS C 5861
* Shift-JIS
* TRON
* Unicode
The CJK character sets take up the bulk of the assigned Unicode code space. There is much controversy among Japanese experts of Chinese characters about the desirability and technical merit of the Han unification process used to map multiple Chinese and Japanese character sets into a single set of unified characters.
All three languages can be written both left-to-right and top-to-bottom (right-to-left and top-to-bottom in ancient documents), but are usually considered left-to-right scripts when discussing encoding issues.
Legal status
Libraries cooperated on encoding standards for JACKPHY characters in the early 1980s. According to Ken Lunde, the abbreviation "CJK" was a registered trademark of the Research Libraries Group (Ken Lunde, 1996), which merged with OCLC in 2006. The trademark, owned by OCLC between 1987 and 2009, has now expired (Justia listing).
See also
* Chinese character description languages
* Chinese character encoding