
CJK Unified Ideographs Extension I is a Unicode block comprising CJK Unified Ideographs included in drafts of an amendment to China's GB 18030 standard circulated in 2022 and 2023, which were fast-tracked into Unicode in 2023.


Background

Unlike most other sets of CJK unified ideographs, Extension I was not prepared and submitted by the Ideographic Research Group (IRG).
GB 18030 is a mandatory national standard of the People's Republic of China (PRC). It defines a Unicode Transformation Format which retains compatibility with existing data in the earlier GBK and EUC-CN character encodings, and specifies particular Unicode characters which devices sold in China must support. Its 2022 edition, GB 18030-2022, changed a number of required characters to map to standard Unicode code points, rather than to private use area code points.

In late 2022, the PRC made a draft of a further amendment to GB 18030 available for public consultation. This draft would have placed 897 new sinographic characters in Plane 10 (hexadecimal: 0A), an as yet untitled astral Unicode plane. This was motivated by a "strong need of citizen real-name certification in China". Since it would impact
ISO/IEC 10646 (the Universal Coded Character Set, the ISO standard synchronised with Unicode), the draft was circulated in ISO/IEC JTC 1/SC 2, the ISO subcommittee responsible for ISO 10646. The Chinese national body maintained that "ISO/IEC 10646 do not specify the purpose of the 0A plane", which ISO 10646 denotes as "reserved for future standardization", and that this use was therefore "not inappropriate". However, since the intent of ISO 10646 was for Plane 10 to be reserved for future allocation by ISO 10646 and Unicode via their usual ballot process, not for it to be allocated unilaterally by national standards bodies, this proposed move was criticised by experts and other national bodies as one which would "destabilize the synchronization" between GB 18030 and ISO/IEC 10646 (and thus Unicode), and which would make it impossible to conform to both with a single implementation, effectively forking Unicode.

At its meeting in March 2023, the IRG emphasised the importance of providing any subsequent GB 18030 amendment drafts to IRG experts in a timely manner, and of not "using the ISO/IEC 10646 standard inappropriately". As an alternative, the
repertoire (eventually reduced to 622 characters after expert review) was fast-tracked into Unicode version 15.1 in September 2023, as the CJK Unified Ideographs Extension I block. The characters constitute the "GIDC23" Unihan source, defined as sourced from the "ID system of the Ministry of Public Security of China, 2023". The CJK Unified Ideographs Extension D block was cited as a precedent, since it comprised a repertoire of urgently needed characters (UNCs) from IRG member bodies, whereas the IRG working set initially slated to become Extension D instead became Extension E. For compactness, the block was allocated to the available space in the
Supplementary Ideographic Plane after CJK Unified Ideographs Extension F, rather than on the Tertiary Ideographic Plane
after CJK Unified Ideographs Extension H; this means that the CJK extension blocks are no longer in alphabetical order by extension letter. Following this, the draft GB 18030 amendment was modified to use the Extension I code points.

At its next meeting in October 2023, the IRG expressed concerns about bypassing the IRG for large collections of CJK characters, and noted that two of the characters in Extension I had, for the purposes of other regions' character sources, previously been unified with existing characters under IRG unification rules. (Note: the referenced document refers to an earlier draft of Extension I whose code points differ from those in the final version accepted into Unicode. U+2ED90 and U+2EDD1 in the referenced document correspond, respectively, to the first and second characters listed below.)

* Allowing for interchangeable forms of the grass radical, U+2ED9D corresponds to the pre-existing T-source (Taiwan) glyph for U+8286 (referenced from CNS 11643), as well as to a proposed J-source (Japan) glyph for the same. A character corresponding to the other (G-source, i.e. Mainland China) glyph of U+8286 does exist elsewhere in more recent editions of CNS 11643, so the addition of U+2ED9D impacts the existing correspondences between CNS 11643 and Unicode. Because neither character is in CNS 11643 planes 1 or 2, however, there are no implications for the Unicode mapping of Big5.
* The second character corresponds to a proposed J-source (Japan) glyph for U+8FF3. It had previously been proposed as a new character twice (once with reference to CNS 11643, and once by Japan), but rejected on the basis that it was unifiable with U+8FF3. The proposed glyph was later moved to the new code point, per a request by the Japanese national body.

In response, the IRG recommended that, in future, submitters of proposed CJK characters be required to provide information about the impact on other CJK character sources of any disunifications proposed by the submission, and that the IRG be given time to review all large submissions of CJK characters. The IRG encouraged the Chinese body to propose solutions to the issues caused by the addition of these two characters at the next IRG meeting.
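
The plane arithmetic behind the allocation described in this section can be sketched in a few lines of Python. This is only an illustration: the helper names `plane_of` and `in_extension_i` are hypothetical, and the block range U+2EBF0 to U+2EE5F is an assumption taken from the Unicode 15.1 block data rather than from the text above.

```python
# Each Unicode plane holds 0x10000 (65,536) code points, so the plane
# number is the code point divided by the plane size.
PLANE_SIZE = 0x10000

# Assumption: Extension I block range as listed in the Unicode 15.1
# Blocks.txt data file; it is not stated in the article text.
EXT_I_FIRST = 0x2EBF0
EXT_I_LAST = 0x2EE5F


def plane_of(code_point: int) -> int:
    """Return the plane number (0-16) that contains code_point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {code_point:#x}")
    return code_point // PLANE_SIZE


def in_extension_i(code_point: int) -> bool:
    """Return True if code_point lies inside the assumed Extension I range."""
    return EXT_I_FIRST <= code_point <= EXT_I_LAST


# U+8286 is in the Basic Multilingual Plane, U+2ED9D is one of the
# Extension I characters discussed above, and U+A0000 is the first code
# point of Plane 10 (hexadecimal 0A), where the first draft placed them.
for cp in (0x8286, 0x2ED9D, 0xA0000):
    print(f"U+{cp:04X}: plane {plane_of(cp)}, "
          f"in Extension I: {in_extension_i(cp)}")
```

Under these assumptions, U+2ED9D falls in plane 2 (the Supplementary Ideographic Plane), whereas any code point of the form U+Axxxx would fall in Plane 10.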


Block


History

The following Unicode-related documents record the purpose and process of defining specific characters in the CJK Unified Ideographs Extension I block:



Further reading

* This article details how the CJK Unified Ideographs Extension I block became standardized, and its relationship with two drafts of the GB 18030-2022 amendment.