The Chinese, Japanese and Korean (CJK) scripts share a common background, collectively known as CJK characters. In the process called Han unification, the common (shared) characters were identified and named CJK Unified Ideographs. As of Unicode 15.0, Unicode defines a total of 97,058 CJK Unified Ideographs. The term ''ideographs'' is a misnomer, as the Chinese script is not ideographic but rather logographic. Historically, Vietnam used Chinese characters too, so sometimes the abbreviation CJKV is used. Vietnamese use was replaced by the Latin-based Vietnamese alphabet in the 1920s.

Sources

The Ideographic Research Group (IRG) is responsible for developing extensions to the encoded repertoires of CJK unified ideographs. IRG processes proposals for new CJK unified ideographs submitted by its member bodies. After the proposals have undergone several rounds of expert review, IRG submits a consolidated set of characters to ISO/IEC JTC 1/SC 2 Working Group 2 (WG2) and the Unicode Technical Committee (UTC) for consideration for inclusion in the ISO/IEC 10646 and Unicode standards.

The following IRG member bodies have been involved in the standardization of CJK unified ideographs:

* China
* Hong Kong
* Japan
* South Korea
* North Korea
* Macau
* Taiwan, liaison member represented by the Taipei Computer Association (TCA)
* Vietnam
* Unicode Technical Committee (liaison member)
* United Kingdom
* SAT (liaison member)

The ideographs submitted by the UTC and the United Kingdom are not specific to any particular region, but are characters which have been suggested for encoding by individual experts. The ideographs submitted by SAT are required for the SAT Daizōkyō text database.

The table below gives the numbers of encoded CJK unified ideographs for each IRG source for Unicode 15.0. The total number of characters (223,653) far exceeds the number of encoded CJK unified ideographs (97,058) as many characters have more than one source.

UTC sources

The majority of characters submitted by the UTC to the IRG are derived from Unicode Technical Committee (UTC) documents. Other sources include:

* ''ABC Chinese-English Dictionary'' by John DeFrancis
* The Adobe-CNS1 glyph collection
* The Adobe-Japan1 glyph collection
* A Complete Checklist of Species and Subspecies of Chinese Birds (中国鸟类系统检索)
* The Great Nom Dictionary (Đại Tự Điển Chữ Nôm)
* Annotations to ''Shuowen Jiezi'' (annotated by Duan Yucai)
* GB18030-2000
* Required Character List Supplied by the Church of Jesus Christ of Latter-day Saints (Hong Kong)
* New Commercial Dictionary (商务新词典), Hong Kong
* Modern Chinese Dictionary (现代汉语词典), by Chinese Academy of Social Sciences, Linguistics Research Institute, Dictionary Editorial Office
* Working Group (WG2) documents
* Wenlin (文林) http://www.wenlin.com/

CJK Unified Ideographs blocks

CJK Unified Ideographs

The basic block named ''CJK Unified Ideographs'' (U+4E00–U+9FFF) contains 20,992 basic Chinese characters. The block includes not only characters used in the Chinese writing system but also kanji used in the Japanese writing system and hanja, whose use is diminishing in Korea. Many characters in this block are used in all three writing systems, while others are in only one or two of the three. Chữ Hán are also used in Vietnam's chữ Nôm (now obsolete).

The first 20,902 characters in the block are arranged according to the Kangxi Dictionary ordering of radicals. In this system the characters written with the fewest strokes are listed first. The remaining characters were added later, and so are not in radical order.

The block is the result of Han unification, which was somewhat controversial within East Asia. Since Chinese, Japanese and Korean characters were coded in the same location, the appearance of a selected glyph could depend on the particular font being used. However, the ''source separation rule'' states that characters encoded separately in an earlier character set would remain separate in the new Unicode encoding.

Using variation selectors, it is possible to specify certain variant CJK ideograms within Unicode. The Adobe-Japan1 character set, which has 14,684 ideographic variation sequences, is an extreme example of the use of variation selectors.
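As a rough illustration of how a variation sequence is put together, the Python sketch below appends a selector from the Variation Selectors Supplement range (U+E0100–U+E01EF, the selectors used for ideographic variation sequences) to a base ideograph. The helper names `is_ideographic_variation_selector` and `make_ivs` are hypothetical, and the base character is chosen only for illustration; whether a particular sequence is registered, and how it is displayed, depends on the variation database and on the font in use.

```python
import unicodedata

# VS17..VS256, the selectors used in ideographic variation sequences.
IVS_FIRST, IVS_LAST = 0xE0100, 0xE01EF


def is_ideographic_variation_selector(ch: str) -> bool:
    """Return True if ch is a single code point in U+E0100..U+E01EF."""
    return len(ch) == 1 and IVS_FIRST <= ord(ch) <= IVS_LAST


def make_ivs(base: str, selector_index: int) -> str:
    """Build base + selector, where index 0 means VS17 (hypothetical helper)."""
    code_point = IVS_FIRST + selector_index
    if not IVS_FIRST <= code_point <= IVS_LAST:
        raise ValueError("selector index must be between 0 and 239")
    return base + chr(code_point)


# Illustrative base ideograph only; registration of this exact sequence
# is not checked here.
seq = make_ivs("\u845B", 0)

print(len(seq))                           # base plus selector: 2 code points
print(unicodedata.name(seq[1]))           # VARIATION SELECTOR-17
print([f"U+{ord(c):04X}" for c in seq])
```

A sequence built this way is still two code points in memory, so code that counts or compares characters has to decide whether to treat the selector as part of the preceding ideograph.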

Charts

4E00-62FF, 6300-77FF, 7800-8CFF, 8D00-9FFF.

Sources

Note: Most characters appear in multiple sources, so the sum of individual character counts (102,794) is far greater than the number of encoded characters (20,992). In Unicode 4.1, 14 HKSCS-2004 characters and 8 GB 18030 characters were assigned to code points between U+9FA6 and U+9FBB. Since then, further characters have been added to this block for various reasons, all summarized in the version history section below.

CJK Unified Ideographs Extension A

The block named ''CJK Unified Ideographs Extension A'' (U+3400–U+4DBF) contains 6,592 additional characters.

Charts

3400-4DBF.

Sources

Note: Most characters appear in more than one source, so the sum of individual character counts (18,832) is far greater than the number of encoded characters (6,592).

CJK Unified Ideographs Extension B

The block named ''

Note: Some characters appear in more than one source, so the sum of individual character counts (7,774) is greater than the number of encoded characters (7,473).

CJK Unified Ideographs Extension G

A block named ''CJK Unified Ideographs Extension G'' was added as part of Unicode 13.0 to the Tertiary Ideographic Plane in the range U+30000 through U+3134F, containing 4,939 characters.

Charts

30000–3134F.

Sources

Note: Some characters appear in more than one source, so the sum of individual character counts (5,081) is greater than the number of encoded characters (4,939).

CJK Unified Ideographs Extension H

A block named ''CJK Unified Ideographs Extension H'' was added as part of Unicode 15.0 to the Tertiary Ideographic Plane in the range U+31350 through U+323AF, containing 4,192 characters.

Charts

31350–323AF.

Sources

Note: Some characters appear in more than one source, so the sum of individual character counts (4,305) is greater than the number of encoded characters (4,192).
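The ranges and character counts quoted for the blocks can be cross-checked with a little arithmetic. The Python sketch below is only an illustration: it lists the four blocks for which this article gives both a range and a count, and the names `BLOCKS`, `block_of` and `unassigned` are made up for the example.

```python
# (block name, first code point, last code point, character count from the text)
# Only blocks whose range and count are both quoted in this article are listed.
BLOCKS = [
    ("CJK Unified Ideographs", 0x4E00, 0x9FFF, 20_992),
    ("CJK Unified Ideographs Extension A", 0x3400, 0x4DBF, 6_592),
    ("CJK Unified Ideographs Extension G", 0x30000, 0x3134F, 4_939),
    ("CJK Unified Ideographs Extension H", 0x31350, 0x323AF, 4_192),
]


def block_of(code_point: int):
    """Return the name of the listed block containing code_point, else None."""
    for name, first, last, _count in BLOCKS:
        if first <= code_point <= last:
            return name
    return None


def unassigned(first: int, last: int, count: int) -> int:
    """Code points in the range that are not covered by the quoted count."""
    return (last - first + 1) - count


for name, first, last, count in BLOCKS:
    size = last - first + 1
    print(f"{name}: {size} code points, {count} characters, "
          f"{unassigned(first, last, count)} unassigned")

print(block_of(0x4E00))
```

With the figures given here, three of the four ranges are completely filled, while the Extension G range is five code points larger than its character count.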

CJK Compatibility Ideographs

The block named ''CJK Compatibility Ideographs'' (F900–FAFF) was created to retain round-trip compatibility with other standards. Only twelve of its characters have the "Unified Ideograph" property: U+FA0E, FA0F, FA11, FA13, FA14, FA1F, FA21, FA23, FA24, FA27, FA28 and FA29. None of the other characters in this and other "Compatibility" blocks relate to CJK Unification.
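One way to see the difference in practice is to look at decomposition mappings. In the Unicode data bundled with Python's `unicodedata` module, the twelve characters listed above carry no decomposition, whereas an ordinary compatibility ideograph such as U+F900 is replaced by a unified ideograph under normalization. The sketch below assumes that behaviour; the helper name `has_decomposition` is hypothetical.

```python
import unicodedata

# The twelve code points in F900-FAFF that are unified ideographs.
UNIFIED_IN_COMPAT_BLOCK = [
    0xFA0E, 0xFA0F, 0xFA11, 0xFA13, 0xFA14, 0xFA1F,
    0xFA21, 0xFA23, 0xFA24, 0xFA27, 0xFA28, 0xFA29,
]


def has_decomposition(code_point: int) -> bool:
    """True if the character has any decomposition mapping."""
    return unicodedata.decomposition(chr(code_point)) != ""


# None of the twelve is altered by normalization.
for cp in UNIFIED_IN_COMPAT_BLOCK:
    ch = chr(cp)
    print(f"U+{cp:04X}", has_decomposition(cp),
          unicodedata.normalize("NFC", ch) == ch)

# An ordinary compatibility ideograph maps to a unified ideograph instead.
normalized = unicodedata.normalize("NFC", "\uF900")
print(f"U+F900 -> U+{ord(normalized):04X}")
```

Text that must survive a round trip through a legacy encoding should therefore avoid normalizing the ordinary compatibility ideographs, since normalization discards the distinction they were added to preserve.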

Charts

F900–FAFF.

Sources

Note: All characters appear in more than one source, so the sum of individual character counts (36) is greater than the number of encoded characters (12).
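The notes in the Sources sections compare the sum of per-source counts with the number of encoded characters. Dividing one by the other gives the average number of sources per character, which the short Python sketch below computes from the figures quoted in this article. The list name `FIGURES` and the function name are made up for the example, and only figures that the text attributes to a named block are included.

```python
# (label, sum of per-source counts, encoded characters), as quoted in the text
FIGURES = [
    ("All CJK unified ideographs (Unicode 15.0)", 223_653, 97_058),
    ("CJK Unified Ideographs", 102_794, 20_992),
    ("Extension A", 18_832, 6_592),
    ("Extension G", 5_081, 4_939),
    ("Extension H", 4_305, 4_192),
    ("Unified ideographs in CJK Compatibility Ideographs", 36, 12),
]


def sources_per_character(source_total: int, encoded: int) -> float:
    """Average number of sources that each encoded character appears in."""
    return source_total / encoded


for label, source_total, encoded in FIGURES:
    ratio = sources_per_character(source_total, encoded)
    print(f"{label}: {ratio:.2f} sources per character")
```

The ratio is highest for the basic block and close to 1 for Extensions G and H, which matches the wording of the notes ("most", "some" or "all" characters appearing in more than one source).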

Known issues

Disunification

U+4039

The character U+4039 (䀹) was a unification of two different characters (one with jiā 夾 phonetic and one with shǎn 㚒 phonetic) until Unicode 5.0. However, they were lexically different characters that should not have been unified; they have different pronunciations and different meanings. The proposal of disunification of U+4039 was accepted and the new character is encoded at U+9FC3 (鿃) in Unicode 5.1.

Other 3 glyphs in Extension B

In CJK Unified Ideographs Extension B, some characters are incorrectly unified with others. These characters include U+2017B (𠅻), U+204AF (𠒯) and U+24CB2 (𤲲). The first two wrongly unify glyphs from Chinese Mainland and Vietnamese sources, while the last wrongly unifies glyphs from Chinese Mainland and Taiwanese sources.

Unifiable variants and exact duplicates in Extension B

Also in CJK Unified Ideographs Extension B, hundreds of glyph variants were encoded. In addition to the deliberate encoding of close glyph variants, six exact duplicates (where the same character has inadvertently been encoded twice) and two semi-duplicates (where the CJK-B character represents a ''de facto'' disunification of two glyph forms unified in the corresponding BMP character) were encoded by mistake:

* U+34A8 㒨 = U+20457 𠑗 : U+20457 is the same as the China-source glyph for U+34A8, but it is significantly different from the Taiwan-source glyph for U+34A8
* U+3DB7 㶷 = U+2420E 𤈎 : same glyph shapes
* U+8641 虁 = U+27144 𧅄 : U+27144 is the same as the Korean-source glyph for U+8641, but it is significantly different from the Chinese Mainland-, Taiwan- and Japan-source glyphs for U+8641
* U+204F2 𠓲 = U+23515 𣔕 : same glyph shapes, but ordered under different radicals
* U+249BC 𤦼 = U+249E9 𤧩 : same glyph shapes
* U+24BD2 𤯒 = U+2A415 𪐕 : same glyph shapes, but ordered under different radicals
* U+26842 𦡂 = U+26866 𦡦 : same glyph shapes
* U+FA23 﨣 = U+27EAF 𧺯 : same glyph shapes (U+FA23 﨣 is a unified CJK ideograph, despite its name "CJK COMPATIBILITY IDEOGRAPH-FA23.")
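Because the duplicates are separate code points, ordinary string comparison treats them as different characters. An application that wants to match them could fold one member of each pair onto the other before comparing. The Python sketch below does this for the six exact duplicates listed above; the table name, the function name and the choice to map the second-listed code point onto the first are all assumptions made for the example, not part of any standard.

```python
import unicodedata

# Second-listed code point -> first-listed code point, for the six exact
# duplicates only. The direction of the mapping is an arbitrary choice.
EXACT_DUPLICATES = {
    0x2420E: 0x3DB7,
    0x23515: 0x204F2,
    0x249E9: 0x249BC,
    0x2A415: 0x24BD2,
    0x26866: 0x26842,
    0x27EAF: 0xFA23,
}


def fold_duplicates(text: str) -> str:
    """Replace each duplicate with its counterpart (hypothetical helper)."""
    return text.translate(EXACT_DUPLICATES)


sample = "\U0002420E"
print(sample == "\u3DB7")                    # False: different code points
print(fold_duplicates(sample) == "\u3DB7")   # True after folding

# Standard normalization leaves the duplicate untouched.
print(unicodedata.normalize("NFKC", sample) == sample)
```

The two semi-duplicates are left out on purpose, since their glyphs match only some of the source glyphs of the corresponding BMP character.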

Other CJK ideographs in Unicode, not Unified

Apart from the nine blocks of "Unified Ideographs," Unicode has about a dozen more blocks with not-unified CJK-characters. These are mainly CJK radicals, strokes, punctuation, marks, symbols and compatibility characters. Although some characters have their (decomposable) counterparts in other blocks, the usages can be different. An example of a not-unified CJK-character is 〇 (U+3007) in the CJK Symbols and Punctuation block. Although it is not covered under "CJK Unified Ideographs", it is treated as a CJK-character for all other intents and purposes.

Four blocks of compatibility characters are included for compatibility with legacy text handling systems and older character sets:

* CJK Compatibility (3300–33FF)
* CJK Compatibility Forms (FE30–FE4F)
* CJK Compatibility Ideographs (F900–FAFF)
* CJK Compatibility Ideographs Supplement (2F800–2FA1F)

They include forms of characters for vertical text layout and rich text characters that Unicode recommends handling through other means. Therefore, their use is discouraged.

Font support

The blocks CJK Unified Ideographs and CJK Unified Ideographs Extension A, being parts of the Basic Multilingual Plane, are supported by the majority of CJK fonts. However, Japanese and Korean fonts usually have fewer characters (about 13,000 and 8,000, respectively) than Chinese fonts. Extensions B, C and D are supported by the additional fonts MingLiU-ExtB, MingLiU_HKSCS-ExtB, PMingLiU-ExtB and SimSun-ExtB, which have been included in Microsoft Windows since Vista.
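Whether a character belongs to the Basic Multilingual Plane can be read directly from its code point: the plane number is the code point divided by 65,536. The Python sketch below shows this for a few code points mentioned in this article, along with the number of UTF-16 code units each one needs. The function names `plane` and `utf16_code_units` are hypothetical.

```python
def plane(code_point: int) -> int:
    """Plane number: 0 is the Basic Multilingual Plane."""
    return code_point >> 16


def utf16_code_units(ch: str) -> int:
    """Number of 16-bit code units needed to encode ch in UTF-16."""
    return len(ch.encode("utf-16-be")) // 2


for cp in (0x4E00, 0x3400, 0x2F800, 0x30000):
    print(f"U+{cp:04X}: plane {plane(cp)}, "
          f"{utf16_code_units(chr(cp))} UTF-16 code unit(s)")
```

Characters in the BMP blocks occupy a single UTF-16 code unit, while those in the supplementary planes need a surrogate pair, so software that indexes text by code unit has to handle the extension blocks separately.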

Unicode version history

Notes

External links

UK-Source Ideographs (Documents IRG N2107R2 and IRG N2232R)
