The CCITT Chinese Primary Set is a multi-byte graphic character set for Chinese communications, created for the Consultative Committee on International Telephone and Telegraph (CCITT) in 1992. It is defined in ITU T.101, annex C, which codifies Data Syntax 2 Videotex. It is registered with the ISO-IR registry for use with ISO/IEC 2022 as ISO-IR-165, and is encodable in the ISO-2022-CN-EXT code version.
It is an extended modification of GB/T 2312-80, and corresponds to the union of the Mainland Chinese GB standards GB 6345.1-86 and GB 8565.2-88, with some further modifications and extensions. A subset of the GB 6345.1 extensions is incorporated into GB 18030, while GB 8565.2 serves as the Mainland Chinese source reference for certain CJK Unified Ideographs.
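As a rough illustration of what encoding this set under ISO-2022-CN-EXT involves, the Python sketch below wraps row-cell pairs in a designation sequence and shift characters. It assumes the RFC 1922 convention that `ESC $ ) E` designates ISO-IR-165 to G1 and that SO and SI switch between G1 and ASCII. The helper name `wrap_iso_ir_165` is hypothetical, and the bytes are built by hand rather than through a library codec.

```python
ESC, SO, SI = b"\x1b", b"\x0e", b"\x0f"

# Assumed designation per RFC 1922: ESC $ ) E selects ISO-IR-165 as G1.
DESIGNATE_ISO_IR_165 = ESC + b"$)E"


def wrap_iso_ir_165(row_cells):
    """Encode (row, cell) pairs as a 7-bit ISO 2022 byte sequence.

    Hypothetical helper: each row and cell (1..94) is offset by 0x20 to
    land in the 0x21..0x7E graphic range, then framed by the designation
    sequence, Shift Out, and Shift In.
    """
    body = bytearray()
    for row, cell in row_cells:
        if not (1 <= row <= 94 and 1 <= cell <= 94):
            raise ValueError(f"row-cell out of range: {row}-{cell}")
        body += bytes((0x20 + row, 0x20 + cell))
    return DESIGNATE_ISO_IR_165 + SO + bytes(body) + SI


encoded = wrap_iso_ir_165([(79, 81)])
print(encoded.hex(" "))  # 1b 24 29 45 0e 6f 71 0f
```

A real encoder would also need to re-issue the designation where the code version requires it (RFC 1922 resets designations at line ends); that is omitted here for brevity.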
GB 6345.1
GB 6345.1-86 (''32 × 32 Dot Matrix Font Set of Chinese Ideographs for Information Interchange'') includes both a corrigendum and an extension for GB 2312.
The corrigendum alters the following two characters:
Deployed implementations incorporating GB 2312, such as Windows code page 936, generally follow these corrections in mapping 79-81 to U+953A.
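The row-cell notation used here (for example 79-81) can be related to concrete bytes with a small Python sketch. It assumes the EUC-CN style layout in which each of row and cell is offset by 0xA0, and it uses Python's `cp936` codec as a stand-in for Windows code page 936. The function name `row_cell_to_euc` is hypothetical.

```python
def row_cell_to_euc(row, cell):
    """Convert a row-cell pair (each 1..94) to two EUC-CN style bytes.

    Assumption: both bytes are the row/cell number plus 0xA0, as in
    EUC-CN and in the GB 2312 range of code page 936.
    """
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError(f"row-cell out of range: {row}-{cell}")
    return bytes((0xA0 + row, 0xA0 + cell))


raw = row_cell_to_euc(79, 81)
char = raw.decode("cp936")  # Python's codec for Windows code page 936
print(raw.hex(), f"U+{ord(char):04X}")
```

With a codec that follows the corrigendum, the decoded code point is the U+953A mentioned above.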
The extension adds half-width ISO 646-CN characters in row 10 (in addition to the existing full-width characters in row 3) and extends the set of 26 non-ASCII pinyin characters in row 8 with six additional such characters. GB/T 12345, the Traditional Chinese counterpart to GB 2312, also incorporates these GB 6345.1 extensions, and additionally includes 29 vertical presentation forms in row 6.
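To make the full-width versus half-width distinction concrete, the following Python sketch decodes a character from row 3 of GB 2312 and compares it with its ASCII counterpart. It assumes that row 3 cell n holds the full-width form of the ISO 646 character at 0x20 + n, and it relies on Python's `gb2312` codec. The helper name `row3_fullwidth` is hypothetical. Row 10 of GB 6345.1 is not covered by that codec, so only the row 3 side is shown.

```python
import unicodedata


def row3_fullwidth(ascii_char):
    """Return the GB 2312 row 3 character matching an ASCII letter or digit.

    Assumption: row 3 mirrors the ISO 646 layout, so the cell number is
    the 7-bit code minus 0x20; the EUC-CN bytes are 0xA3 and 0xA0 + cell.
    """
    cell = ord(ascii_char) - 0x20
    if not 1 <= cell <= 94:
        raise ValueError(f"not a graphic ISO 646 character: {ascii_char!r}")
    return bytes((0xA3, 0xA0 + cell)).decode("gb2312")


full_a = row3_fullwidth("A")
print(f"U+{ord(full_a):04X}", unicodedata.east_asian_width(full_a))
# Compatibility normalization folds the full-width form back to ASCII.
print(unicodedata.normalize("NFKC", full_a))
```

The helper is only meaningful for letters and digits, because a few row 3 positions follow ISO 646-CN rather than ASCII.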
The later GB/T 6345.1-2010, published in 2011, officially adds half-width forms of the 32 pinyin characters of row 8 (including the six new additions) in row 11.
This addition is not featured in GB 18030.
The six additional pinyin characters from GB 6345.1 and the vertical presentation forms from GB 12345 (but not the half-width forms) are included in the classic Mac OS encoding for Simplified Chinese (a modification of EUC-CN), and also as two-byte codes in GB 18030.
The additional pinyin characters are as follows:
These extensions and modifications to GB 2312 were first introduced in GB 5007.1-85 in 1985.
GB 8565.2
GB 8565.2-88 (''Information Processing - Coded Character Sets for Text Communication - Part 2: Graphic Characters'') defines an extension for GB 2312, adding 705 characters in rows 13–15 and 90–94, of which 69 (all in row 15) are non-hanzi. It includes the GB 2312 corrections from GB 6345.1, but not its extensions.
The Unihan database references GB 8565.2 as the Mainland Chinese source of several hanzi included in Unicode. Its Unihan source abbreviation is .
CCITT changes
ISO-IR-165 incorporates the GB 2312 extensions from both GB 6345.1-86 and GB 8565.2-88.
Additionally, it adds 161 further characters (including 139 hanzi, identified as “general Chinese characters and variants”). These CCITT hanzi extensions have on occasion been mistaken for standard GB 8565.2 characters, including in previous revisions of the Unihan database.
In total the set contains 8446 characters.
A number of patterned semigraphic characters are included in row 6. This collides with the vertical presentation forms included in other extensions such as Mac OS Simplified Chinese and GB 18030.
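The GB 18030 side of this collision can be inspected with a short Python sketch that lists the row 6 cells which Python's `gb18030` codec decodes to vertical presentation forms. The function name `vertical_form_cells` is hypothetical, and the 0xA0 byte offset for row and cell is an assumption carried over from EUC-CN. The sketch does not reproduce the ISO-IR-165 semigraphics themselves, since no codec for that set is assumed to be available.

```python
import unicodedata

VERTICAL_PREFIX = "PRESENTATION FORM FOR VERTICAL"


def vertical_form_cells(codec="gb18030"):
    """Map row 6 cell numbers to vertical presentation forms in `codec`.

    Each cell 1..94 is decoded from the two bytes 0xA6, 0xA0 + cell;
    cells that fail to decode or that yield unnamed (for example
    private-use) characters are skipped.
    """
    found = {}
    for cell in range(1, 95):
        raw = bytes((0xA6, 0xA0 + cell))
        try:
            text = raw.decode(codec)
        except UnicodeDecodeError:
            continue
        if len(text) == 1 and unicodedata.name(text, "").startswith(VERTICAL_PREFIX):
            found[cell] = text
    return found


for cell, char in sorted(vertical_form_cells().items()):
    print(f"06-{cell:02d}", f"U+{ord(char):04X}", unicodedata.name(char))
```

An ISO-IR-165 decoder would be expected to return semigraphic characters for at least some of these positions, which is why text cannot be exchanged between the two without knowing which set is in use.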
The GB 6345.1 corrections to GB 2312 are applied, but two Unicode mappings are reversed compared to other encodings which include GB 2312 with GB 6345.1 extensions. The table below shows the mappings and their corresponding glyphs, including GB 18030:
External links
ISO-IR-165: Code of the Chinese graphic character set for communication (registered 1992, amended 1994)
Unicode mappings for ISO-IR-165