In internationalization, CJK characters is a collective term for the Chinese, Japanese, and Korean languages, all of which include Chinese characters and derivatives in their writing systems, sometimes paired with other scripts. Collectively, the CJK characters often include ''Hànzì'' in Chinese, '' Kanji'' and '' Kana'' in Japanese, '' Hanja'' and ''

Hangul The Korean alphabet, known as Hangul, . Hangul may also be written as following South Korea's standard Romanization. ( ) in South Korea and Chosŏn'gŭl in North Korea, is the modern official writing system for the Korean language. The ...

'' in Korean. Vietnamese can be included, making the abbreviation CJKV, as Vietnamese historically used Chinese characters in which they were known as ''

Chữ Hán Chữ Hán (𡨸漢, literally "Chinese characters", ), Chữ Nho (𡨸儒, literally "Confucian characters", ) or Hán tự (漢字, ), is the Vietnamese term for Chinese characters, used to write Văn ngôn (which is a form of Classical Chines ...

'' and ''

Chữ Nôm Chữ Nôm (, ; ) is a logographic writing system formerly used to write the Vietnamese language. It uses Chinese characters (''Chữ Hán'') to represent Sino-Vietnamese vocabulary and some native Vietnamese words, with other words represen ...

'' in Vietnamese ('' Hán-Nôm'' altogether).

Character repertoire

Standard Mandarin Chinese and Standard Cantonese are written almost exclusively in Chinese characters. Over 3,000 characters are required for general

literacy Literacy in its broadest sense describes "particular ways of thinking about and doing reading and writing" with the purpose of understanding or expressing thoughts or ideas in written form in some specific context of use. In other words, hum ...

, with up to 40,000 characters for reasonably complete coverage. Japanese uses fewer characters—general literacy in Japanese can be expected with 2,136 characters. The use of Chinese characters in Korea is increasingly rare, although idiosyncratic use of Chinese characters in proper names requires knowledge (and therefore availability) of many more characters. Even today, however, South Korean students are taught 1,800 characters. Other scripts used for these languages, such as bopomofo and the

Latin Latin (, or , ) is a classical language belonging to the Italic branch of the Indo-European languages. Latin was originally a dialect spoken in the lower Tiber area (then known as Latium) around present-day Rome, but through the power ...

-based

pinyin Hanyu Pinyin (), often shortened to just pinyin, is the official romanization system for Standard Mandarin Chinese in China, and to some extent, in Singapore and Malaysia. It is often used to teach Mandarin, normally written in Chinese fo ...

for Chinese,

hiragana is a Japanese language, Japanese syllabary, part of the Japanese writing system, along with ''katakana'' as well as ''kanji''. It is a phonetic lettering system. The word ''hiragana'' literally means "flowing" or "simple" kana ("simple" ori ...

and

katakana is a Japanese syllabary, one component of the Japanese writing system along with hiragana, kanji and in some cases the Latin script (known as rōmaji). The word ''katakana'' means "fragmentary kana", as the katakana characters are derived f ...

for Japanese, and

hangul The Korean alphabet, known as Hangul, . Hangul may also be written as following South Korea's standard Romanization. ( ) in South Korea and Chosŏn'gŭl in North Korea, is the modern official writing system for the Korean language. The ...

for Korean, are not strictly "CJK characters", although CJK character sets almost invariably include them as necessary for full coverage of the target languages. The

sinologist Sinology, or Chinese studies, is an academic discipline that focuses on the study of China primarily through Chinese philosophy, language, literature, culture and history and often refers to Western scholarship. Its origin "may be traced to the e ...

Carl Leban (1971) produced an early survey of CJK encoding systems. Until the early 20th century,

Classical Chinese Classical Chinese, also known as Literary Chinese (古文 ''gǔwén'' "ancient text", or 文言 ''wényán'' "text speak", meaning "literary language/speech"; modern vernacular: 文言文 ''wényánwén'' "text speak text", meaning "literar ...

was the written language of government and scholarship in Vietnam. Popular literature in Vietnamese was written in the ''

chữ Nôm Chữ Nôm (, ; ) is a logographic writing system formerly used to write the Vietnamese language. It uses Chinese characters (''Chữ Hán'') to represent Sino-Vietnamese vocabulary and some native Vietnamese words, with other words represen ...

'' script, consisting of Chinese characters with many characters created locally. From 1920s onwards, the script since then used for recording literature has been the Latin chữ Quốc ngữ.

Encoding

The number of characters required for complete coverage of all these languages' needs cannot fit in the 256-character code space of 8-bit

character encoding Character encoding is the process of assigning numbers to graphical characters, especially the written characters of human language, allowing them to be stored, transmitted, and transformed using digital computers. The numerical values tha ...

s, requiring at least a 16-bit fixed width encoding or multi-byte variable-length encodings. The 16-bit fixed width encodings, such as those from

Unicode Unicode, formally The Unicode Standard,The formal version reference is is an information technology standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems. The standard, ...

up to and including version 2.0, are now deprecated due to the requirement to encode more characters than a 16-bit encoding can accommodate—Unicode 5.0 has some 70,000 Han characters—and the requirement by the Chinese government that software in China support the

GB 18030 GB 18030 is a Chinese government standard, described as ''Information Technology — Chinese coded character set'' and defines the required language and character support necessary for software in China. GB18030 is the registered Internet n ...

character set. Although CJK encodings have common character sets, the encodings often used to represent them have been developed separately by different East Asian governments and software companies, and are mutually incompatible.

has attempted, with some controversy, to unify the character sets in a process known as

Han unification Han unification is an effort by the authors of Unicode and the Universal Character Set to map multiple character sets of the Han characters of the so-called CJK languages into a single set of unified characters. Han characters are a featu ...

. CJK character encodings should consist minimally of Han characters plus language-specific phonetic scripts such as

, bopomofo, hiragana, katakana and hangul. CJK character encodings include: *

Big5 Big-5 or Big5 is a Chinese character encoding method used in Taiwan, Hong Kong, and Macau for traditional Chinese characters. The People's Republic of China (PRC), which uses simplified Chinese characters, uses the GB 18030 character s ...

(the most prevalent encoding before Unicode was implemented) * CCCII *

CNS 11643 The CNS 11643 character set (Chinese National Standard 11643), also officially known as the Chinese Standard Interchange Code or CSIC ( zh, tr=, t=中文標準交換碼), is officially the standard character set of Taiwan (Republic of China). In ...

(official standard of

Republic of China Taiwan, officially the Republic of China (ROC), is a country in East Asia, at the junction of the East and South China Seas in the northwestern Pacific Ocean, with the People's Republic of China (PRC) to the northwest, Japan to the northea ...

) * EUC-JP *

EUC-KR Extended Unix Code (EUC) is a multibyte character encoding system used primarily for Japanese, Korean, and simplified Chinese. The most commonly used EUC codes are variable-length encodings with a character belonging to an compliant coded cha ...

GB 2312 is a key official character set of the People's Republic of China, used for Simplified Chinese characters. GB2312 is the registered internet name for EUC-CN, which is its usual encoded form. ''GB'' refers to the Guobiao standards (国家标� ...

(subset and predecessor of GB 18030) *

(mandated standard in the

People's Republic of China China, officially the People's Republic of China (PRC), is a country in East Asia. It is the world's List of countries and dependencies by population, most populous country, with a Population of China, population exceeding 1.4 billion, sli ...

) * Giga Character Set (GCS) * ISO 2022-JP * KS C 5861 *

Shift-JIS Shift JIS (Shift Japanese Industrial Standards, also SJIS, MIME name Shift_JIS, known as PCK in Solaris contexts) is a character encoding for the Japanese language, originally developed by a Japanese company called ASCII Corporation in conjunct ...

* TRON *

The CJK character sets take up the bulk of the assigned

code space. There is much controversy among Japanese experts of Chinese characters about the desirability and technical merit of the

process used to map multiple Chinese and Japanese character sets into a single set of unified characters. All three languages can be written both left-to-right and top-to-bottom (right-to-left and top-to-bottom in ancient documents), but are usually considered left-to-right scripts when discussing encoding issues.

Legal status

Libraries cooperated on encoding standards for JACKPHY characters in the early 1980s. According to Ken Lunde, the abbreviation "CJK" was a registered

trademark A trademark (also written trade mark or trade-mark) is a type of intellectual property consisting of a recognizable sign, design, or expression that identifies products or services from a particular source and distinguishes them from oth ...

of Research Libraries GroupKen Lunde, 1996
/ref> (which merged with

OCLC OCLC, Inc., doing business as OCLC, See also: is an American nonprofit cooperative organization "that provides shared technology services, original research, and community programs for its membership and the library community at large". It wa ...

in 2006). The trademark owned by OCLC between 1987 and 2009 has now expired.Justia listing
/ref>