In internationalization, CJK characters is a collective term for the characters used to write the Chinese, Japanese, and Korean languages, all of which include Chinese characters and derivatives in their writing systems, sometimes paired with other scripts. Collectively, the CJK characters often include ''Hànzì'' in Chinese, ''Kanji'' and ''Kana'' in Japanese, and ''Hanja'' and ''Hangul'' in Korean. Vietnamese can be included, making the abbreviation CJKV, because Vietnamese was historically written with Chinese characters, known as ''Chữ Hán'' and ''Chữ Nôm'' in Vietnamese (''Hán-Nôm'' altogether).


Character repertoire

Standard Mandarin Chinese and Standard Cantonese are written almost exclusively in Chinese characters. Over 3,000 characters are required for general literacy, with up to 40,000 characters for reasonably complete coverage. Japanese uses fewer characters: general literacy in Japanese can be expected with 2,136 characters. The use of Chinese characters in Korea is increasingly rare, although idiosyncratic use of Chinese characters in proper names requires knowledge (and therefore availability) of many more characters. Even today, however, South Korean students are taught 1,800 characters.

Other scripts used for these languages, such as bopomofo and the Latin-based pinyin for Chinese, hiragana and katakana for Japanese, and hangul for Korean, are not strictly "CJK characters", although CJK character sets almost invariably include them as necessary for full coverage of the target languages.

The sinologist Carl Leban (1971) produced an early survey of CJK encoding systems.

Until the early 20th century, Classical Chinese was the written language of government and scholarship in Vietnam. Popular literature in Vietnamese was written in the ''chữ Nôm'' script, which consists of Chinese characters together with many characters created locally. From the 1920s onwards, literature has been written in the Latin-based ''chữ Quốc ngữ''.
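The distinction between Han characters and the accompanying phonetic scripts can be illustrated with a minimal Python sketch. It assumes that the prefix of a character's Unicode name is a good enough proxy for its script; the helper name `script_of` and the prefix table are illustrative choices, not part of any standard.

```python
import unicodedata

# Hypothetical prefix table: maps the start of a Unicode character name
# to a coarse script label. This is a simplification for illustration.
NAME_PREFIXES = (
    ("CJK UNIFIED IDEOGRAPH", "Han"),
    ("CJK COMPATIBILITY IDEOGRAPH", "Han"),
    ("HIRAGANA", "Hiragana"),
    ("KATAKANA", "Katakana"),
    ("HANGUL", "Hangul"),
    ("BOPOMOFO", "Bopomofo"),
    ("LATIN", "Latin"),
)


def script_of(ch: str) -> str:
    """Return a coarse script label for a single character."""
    name = unicodedata.name(ch, "")  # empty string if the character is unnamed
    for prefix, label in NAME_PREFIXES:
        if name.startswith(prefix):
            return label
    return "Other"


if __name__ == "__main__":
    # One sample character per script mentioned in the text above.
    for ch in "漢ひカ한ㄅa":
        print(ch, script_of(ch))
```

Only the first of the six sample characters is a Han character in the strict sense; the others are the kind of companion script that CJK character sets include for full coverage.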


Encoding

The number of characters required for complete coverage of all these languages' needs cannot fit in the 256-character code space of 8-bit character encodings, so at least a 16-bit fixed-width encoding or a multi-byte variable-length encoding is required. The 16-bit fixed-width encodings, such as those from Unicode up to and including version 2.0, are now deprecated for two reasons: more characters must be encoded than a 16-bit encoding can accommodate (Unicode 5.0 has some 70,000 Han characters), and the Chinese government requires that software in China support the GB 18030 character set.

Although CJK encodings have common character sets, the encodings often used to represent them have been developed separately by different East Asian governments and software companies, and are mutually incompatible. Unicode has attempted, with some controversy, to unify the character sets in a process known as Han unification.

CJK character encodings should consist minimally of Han characters plus language-specific phonetic scripts such as pinyin, bopomofo, hiragana, katakana and hangul.

CJK character encodings include:
* Big5 (the most prevalent encoding before Unicode was implemented)
* CCCII
* CNS 11643 (official standard of the Republic of China)
* EUC-JP
* EUC-KR
* GB 2312 (subset and predecessor of GB 18030)
* GB 18030 (mandated standard in the People's Republic of China)
* Giga Character Set (GCS)
* ISO 2022-JP
* KS C 5861
* Shift-JIS
* TRON
* Unicode

The CJK character sets take up the bulk of the assigned Unicode code space. There is much controversy among Japanese experts of Chinese characters about the desirability and technical merit of the Han unification process used to map multiple Chinese and Japanese character sets into a single set of unified characters.

All three languages can be written both left-to-right and top-to-bottom (right-to-left and top-to-bottom in ancient documents), but are usually considered left-to-right scripts when discussing encoding issues.
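The incompatibility of the legacy encodings and the limit of a 16-bit code space can both be observed with the codecs in Python's standard library. The following is a minimal sketch rather than a survey: the helper names `encoded_forms` and `misdecode` are hypothetical, and the sample characters (U+6F22 and U+20000) are arbitrary choices.

```python
# Codecs from the Python standard library for five legacy encodings
# named in the list above.
LEGACY_CODECS = ("big5", "gb18030", "euc_jp", "shift_jis", "euc_kr")


def encoded_forms(ch, codecs=LEGACY_CODECS + ("utf_8", "utf_16_be")):
    """Encode one character under each codec and return {codec: bytes}."""
    return {codec: ch.encode(codec) for codec in codecs}


def misdecode(ch, written_as, read_as):
    """Encode with one codec, then decode with another.

    Undecodable bytes are replaced instead of raising an error.
    """
    return ch.encode(written_as).decode(read_as, errors="replace")


if __name__ == "__main__":
    han = "漢"  # U+6F22, present in all of the legacy sets used here
    for codec, raw in encoded_forms(han).items():
        print(f"{codec:10} {raw.hex()}")

    # Bytes written as Big5 but read as GB 18030 give a different result.
    print(misdecode(han, "big5", "gb18030"))

    # A Han character outside the first 65,536 code points needs two
    # 16-bit code units (a surrogate pair) in UTF-16, i.e. four bytes.
    rare = "\U00020000"
    print(hex(ord(rare)), len(rare.encode("utf_16_be")))
```

The same character maps to a different byte sequence under each legacy codec, which is why text must be labelled with its encoding, or converted to Unicode, before it is exchanged between systems.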


Legal status

Libraries cooperated on encoding standards for JACKPHY characters in the early 1980s. According to Ken Lunde, the abbreviation "CJK" was a registered trademark of Research Libraries Group (Lunde, 1996), which merged with OCLC in 2006. The trademark, owned by OCLC between 1987 and 2009, has now expired (Justia listing).


See also

* Chinese character description languages
* Chinese character encoding
* Chinese input methods for computers
* CJK Compatibility Ideographs
* CJK strokes
* CJK Unified Ideographs
* Complex Text Layout languages (CTL)
* Input method editor
* Japanese language and computers
* Korean language and computers
* List of CJK fonts
* Sinoxenic
* Variable-width encoding


References



Sources

* DeFrancis, John. ''The Chinese Language: Fact and Fantasy''. Honolulu: University of Hawaii Press, 1990.
* Hannas, William C. ''Asia's Orthographic Dilemma''. Honolulu: University of Hawaii Press, 1997.
* Lemberg, Werner. "The CJK package for LaTeX2ε: Multilingual support beyond babel". TUGboat, Volume 18 (1997), No. 3 (Proceedings of the 1997 Annual Meeting).
* Leban, Carl. ''Automated Orthographic Systems for East Asian Languages (Chinese, Japanese, Korean)'', State-of-the-art Report, Prepared for the Board of Directors, Association for Asian Studies. 1971.
* Lunde, Ken. ''CJKV Information Processing''. Sebastopol, Calif.: O'Reilly & Associates, 1998.


External links


* CJKV: A Brief Introduction
* Lemberg CJK article from above, TUGboat18-3
* On "CJK Unified Ideograph", from Wenlin.com
