GB/T 2312-1980 is a key official character set of the People's Republic of China, used for Simplified Chinese characters. GB2312 is the registered internet name for EUC-CN, which is its usual encoded form. ''GB'' refers to the Guobiao standards (国家标准), whereas the ''T'' suffix (推荐, tuījiàn, "recommendation") denotes a non-mandatory standard.
GB/T 2312-1980 was originally a mandatory national standard designated GB 2312-1980. However, following a National Standard Bulletin of the People's Republic of China in 2017, GB 2312 is no longer mandatory, and its standard code has been changed to GB/T 2312-1980.
It has been superseded by GBK and GB 18030, which include additional characters, but remains in widespread use as a subset of those encodings.
GB2312 is the second-most popular encoding served from China and territories (after UTF-8), with 5.5% of web servers serving a page declaring it. Globally, GB2312 is declared on 0.1% of all web pages. However, all major web browsers decode GB2312-marked documents as if they were marked with the superset GBK encoding, except for Safari and Edge on the label GB_2312.
There is an analogous character set known as GB/T 12345, closely related to GB/T 2312, but with traditional character forms replacing simplified forms, and with 62 extra supplemental characters.
GB-encoded fonts often come in pairs, one with the GB/T 2312 (simplified) character set and the other with the GB/T 12345 (traditional) character set.
Character range in rows
While GB/T 2312 covers over 99.99% of contemporary Chinese text usage, historical texts and many names remain out of scope. The standard includes 6,763 Chinese characters (on two levels: the first is arranged by reading, the second by radical then number of strokes), along with symbols and punctuation, Japanese kana, the Greek and Cyrillic alphabets, Zhuyin, and a double-byte set of Pinyin letters with tone marks. In total, GB/T 2312-1980 contains 7,445 characters.
Characters in GB/T 2312 are arranged in a 94×94 grid (as in
ISO 2022), and the two-byte code point of each character is expressed in the ''kuten'' (or qūwèi, 区位) form, which specifies a row (''ku'' or qū,区) and the position of the character within the row (cell, ''ten'' or wèi,位). For example, the character "外" (meaning: foreign) is located in row 45 position 66, thus its ''kuten'' code is 45-66.
The rows (numbered from 1 to 94) contain characters as follows:
* 01–09: punctuation and other special characters; also Hiragana, Katakana, Greek, Cyrillic, Pinyin and Bopomofo
* 16–55: the first level of Chinese characters, arranged according to Pinyin (3755 characters)
* 56–87: the second level of Chinese characters, arranged according to radical and strokes (3008 characters)
Rows 10–15 and 88–94 are unassigned.
GB/T 2312-1980 contains 682 signs and 6763 Chinese characters.
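The row layout above can be expressed as a small lookup. The following is a minimal sketch; the function name `classify_row` and its return labels are hypothetical, chosen only for illustration.

```python
def classify_row(row: int) -> str:
    """Return a coarse description of a GB/T 2312 row number (1-94)."""
    if not 1 <= row <= 94:
        raise ValueError(f"row must be in 1..94, got {row}")
    if row <= 9:
        return "symbols"          # punctuation, kana, Greek, Cyrillic, pinyin, bopomofo
    if 16 <= row <= 55:
        return "hanzi level 1"    # arranged according to pinyin
    if 56 <= row <= 87:
        return "hanzi level 2"    # arranged by radical and strokes
    return "unassigned"           # rows 10-15 and 88-94


print(classify_row(45))  # the row holding "外" in the example above
```

As a sanity check on the figures, rows 56–87 span 32 rows of 94 cells, which gives exactly 3008 positions; rows 16–55 span 40 rows (3760 positions), slightly more than the 3755 characters listed.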
Encodings of GB/T 2312
EUC-CN
EUC-CN is often used as the character encoding (i.e. for external storage) in programs that deal with GB/T 2312, thus maintaining compatibility with ASCII. Two bytes are used to represent every character not found in ASCII. The value of the first byte is from 0xA1–0xF7 (161–247), while the value of the second byte is from 0xA1–0xFE (161–254). Since all of these ranges are beyond ASCII, it is possible, as in UTF-8, to check whether a byte is part of a multi-byte construct when using EUC-CN, but not whether a byte is first or last.
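The byte ranges above are enough to split an EUC-CN byte string into ASCII bytes and two-byte pairs when scanning from the start. The sketch below assumes well-formed input beginning on a character boundary; `split_euc_cn` is a hypothetical helper name, and it checks structure only, not whether a pair is actually assigned in GB/T 2312.

```python
def split_euc_cn(data: bytes) -> list[bytes]:
    """Split EUC-CN data into single ASCII bytes and two-byte pairs."""
    units = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            # Plain ASCII byte: stands on its own.
            units.append(data[i:i + 1])
            i += 1
        elif (0xA1 <= lead <= 0xF7 and i + 1 < len(data)
              and 0xA1 <= data[i + 1] <= 0xFE):
            # Lead byte followed by a trail byte in the permitted range.
            units.append(data[i:i + 2])
            i += 2
        else:
            raise ValueError(f"invalid EUC-CN sequence at offset {i}")
    return units


print(split_euc_cn(b"a\xcd\xe2b"))
```

Because lead and trail ranges overlap, the same scan started in the middle of a pair could misalign, which is the limitation noted above.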
Compared to UTF-8, GB/T 2312 (whether native or encoded in EUC-CN) is more storage efficient: while UTF-8 uses three bytes per CJK ideograph, GB/T 2312 only uses two. However, GB/T 2312 does not cover as many ideographs as Unicode does.
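The size difference, and the coverage limitation, can be observed with Python's built-in `gb2312` codec. This is an illustrative sketch; the helper name `fits_gb2312` is hypothetical, and the traditional form "龍" is used as an assumed example of an ideograph outside GB/T 2312.

```python
def fits_gb2312(text: str) -> bool:
    """Return True if every character of text can be encoded as GB 2312."""
    try:
        text.encode("gb2312")
        return True
    except UnicodeEncodeError:
        return False


sample = "外"
utf8_len = len(sample.encode("utf-8"))    # bytes used by UTF-8
gb_len = len(sample.encode("gb2312"))     # bytes used by EUC-CN
print(utf8_len, gb_len)
print(fits_gb2312("龍"))                  # traditional form, not in the simplified set
```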
To map the ''kuten'' code points to EUC bytes, add 160 (0xA0) to both the row number (''ku'' or qū, 区) and the cell/column number (''ten'' or wèi, 位). The sum for the row number forms the high byte, and the sum for the cell number forms the low byte.
For example, to encode the character "外" at ''kuten'' cell 45-66, the high byte comes from the row number 45: 45+160=205=0xCD, and the low byte comes from the cell number 66: 66+160=226=0xE2. So, the full encoding is <CD E2>.
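The addition rule can be written out directly. The following sketch uses hypothetical helper names (`kuten_to_euc_cn`, `euc_cn_to_kuten`) and checks the result against Python's built-in `gb2312` codec.

```python
def kuten_to_euc_cn(row: int, cell: int) -> bytes:
    """Convert a kuten (row, cell) pair to its two EUC-CN bytes."""
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must both be in 1..94")
    return bytes([row + 0xA0, cell + 0xA0])


def euc_cn_to_kuten(pair: bytes) -> tuple[int, int]:
    """Inverse mapping: subtract 0xA0 from each byte."""
    high, low = pair
    return high - 0xA0, low - 0xA0


encoded = kuten_to_euc_cn(45, 66)
print(encoded.hex(), encoded.decode("gb2312"))
```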
ISO-2022-CN
ISO-2022-CN is another encoding form of GB/T 2312, and is the encoding specified in the official documentation. This encoding follows the ISO-2022 standard, which also uses two bytes to encode characters not found in ASCII. However, instead of using the extended region above ASCII, ISO-2022 uses the same byte range as ASCII: the value of the first byte is from 0x21–0x77 (33–119), while the value of the second byte is from 0x21–0x7E (33–126). As this byte range overlaps ASCII, special control characters, namely the Shift Out and Shift In functions, are required to indicate whether a byte is an ASCII character or part of a two-byte sequence. This poses a risk of misencoding, as improper handling of text can result in missing information.
To map the ''kuten'' code points to ISO-2022 bytes, add 32 (0x20) to both the row number (''ku'' or qū, 区) and the cell/column number (''ten'' or wèi, 位). As in the EUC encoding, the sum for the row number forms the high byte, and the sum for the cell number forms the low byte.
For example, to encode the character "外" at ''kuten'' cell 45-66, the high byte comes from the row number 45: 45+32=77=0x4D, and the low byte comes from the cell number 66: 66+32=98=0x62. So, the full encoding is <4D 62>.
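The same arithmetic with an offset of 0x20 gives the seven-bit byte pair. This is a minimal sketch with a hypothetical helper name (`kuten_to_gl`); it produces only the two coding bytes and does not emit the shift functions or any other ISO-2022 framing.

```python
def kuten_to_gl(row: int, cell: int) -> bytes:
    """Convert a kuten (row, cell) pair to its seven-bit (GL) byte pair."""
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must both be in 1..94")
    return bytes([row + 0x20, cell + 0x20])


pair = kuten_to_gl(45, 66)
print(pair.hex(), pair)
```

The resulting bytes are indistinguishable from the ASCII letters "M" and "b" when read without context, which illustrates why the shift functions are needed.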
HZ
HZ is another encoding of GB/T 2312 that is used mostly for Usenet postings; characters are represented with the same byte pairs as in ISO-2022-CN, but the byte sequences denoting the beginning and end of a range of GB 2312 text differ.
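Python ships an `hz` codec, which makes the relationship to the seven-bit byte pairs easy to observe. In the sketch below, the `~{` and `~}` delimiters are the ones defined for HZ in RFC 1843 rather than something stated in the text above, and the helper name `hz_payload` is hypothetical.

```python
def hz_payload(text: str) -> bytes:
    """Encode text as HZ and strip the ~{ ... ~} delimiters around the GB run.

    Assumes text consists only of GB 2312 double-byte characters, so the
    encoded form is a single delimited run.
    """
    encoded = text.encode("hz")
    if not (encoded.startswith(b"~{") and encoded.endswith(b"~}")):
        raise ValueError("expected a single GB 2312 run")
    return encoded[2:-2]


print("外".encode("hz"))
print(hz_payload("外").hex())
```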
Code charts
In the tables below, where a pair of hexadecimal numbers is given for a prefix byte or a coding byte, the smaller (with the eighth bit unset or unavailable) is used when encoded over GL (0x21-0x7E), as in ISO-2022-CN or HZ-GB-2312, and the larger (with the eighth bit set) is used in the more typical case of it being encoded over GR (0xA1-0xFE), as in EUC-CN, GBK or GB 18030.
Qūwèi numbers are given in decimal.
When GB/T 2312 is encoded over GR, both bytes have the eighth bit set (i.e. are greater than 0x7F). GBK and GB 18030 also make use of two-byte codes in which only the first byte has the eighth bit set for extension purposes: such codes are outside of the GB/T 2312 plane, and are not tabulated here.
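The GL and GR forms of a code differ only in the eighth bit of each byte, so converting between them is a bit operation. A minimal sketch, with hypothetical helper names `gr_to_gl` and `gl_to_gr`:

```python
def gr_to_gl(pair: bytes) -> bytes:
    """Clear the eighth bit of each byte (EUC-CN style -> seven-bit style)."""
    return bytes(b & 0x7F for b in pair)


def gl_to_gr(pair: bytes) -> bytes:
    """Set the eighth bit of each byte (seven-bit style -> EUC-CN style)."""
    return bytes(b | 0x80 for b in pair)


euc = "外".encode("gb2312")
print(euc.hex(), gr_to_gl(euc).hex())
```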
Lead byte
This chart details the overall layout of the main plane of the GB/T 2312 character set by lead byte. For lead bytes used for characters other than hanzi, links are provided to charts on this page listing the characters encoded under that lead byte. For lead bytes used for hanzi, links are provided to the appropriate section of Wiktionary's hanzi index.
Non-Hanzi rows
The following charts list the non-hanzi characters available in GB/T 2312, in GB/T 12345, and in double-byte region 1 of GB 18030 (which roughly corresponds to the non-hanzi region of GB/T 2312). Notes are made where these differ, and where GB 6345.1 and ISO-IR-165 differ from these. Cross-references are made to articles on other CJK national character sets for comparison.
Two implementations of GB2312
Unicode mappings of the interpunct and em dash in the subset of GBK and GB 18030 corresponding to GB/T 2312 differ from those listed in GB2312.TXT, a data file previously provided by the Unicode Consortium, although it has been designated as obsolete since August 2011 and is no longer hosted as of September 2016.
As of 2015, Microsoft .NET Framework follows the GB 18030 mappings when mapping those two characters in data labelled GB2312, whereas ICU, iconv-1.14, php-5.6, ActivePerl-5.20, Java 1.7 and Python 3.4 follow GB2312.TXT in response to the label. Ruby 2.2 is compatible with both implementations; it internally converts the conflicting characters to the GB 18030 subset. The W3C/WHATWG technical recommendation for use with HTML5 specifies that a GBK encoding be inferred for streams labelled gb2312, which in turn uses a GB18030 decoder.
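The discrepancy can be inspected by decoding the same bytes under two labels in Python. This is a sketch under stated assumptions: the byte pairs 0xA1A4 and 0xA1AA (row 1, cells 4 and 10) are taken here to be the two affected positions, which the text above does not spell out, and the output depends on the codec tables of the Python version in use. The name `code_points` is hypothetical.

```python
# Assumed positions of the two affected characters, in EUC-CN form.
SAMPLES = {"row 1, cell 4": b"\xa1\xa4", "row 1, cell 10": b"\xa1\xaa"}


def code_points(pair: bytes) -> dict[str, str]:
    """Decode one byte pair under both labels and report the code points."""
    return {
        label: f"U+{ord(pair.decode(label)):04X}"
        for label in ("gb2312", "gbk")
    }


for name, pair in SAMPLES.items():
    print(name, code_points(pair))
```

Software that must interoperate with browsers may prefer to decode `gb2312`-labelled input with a GBK or GB 18030 decoder, in line with the recommendation described above.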
Other differing mappings have been defined and used by individual vendors, including one from Apple.
Character set 0x21/0xA1 (row 1: punctuation and symbols)
This row contains punctuation, mathematical operators, and other symbols. The following table shows the GB 18030 mappings
for these GB/T 2312 characters first, followed by any other documented mappings.
Character set 0x22/0xA2 (row 2: list markers)
This row contains various types of list marker. Lowercase forms of the Roman numerals were not included in the original GB/T 2312 nor in GB/T 12345, but are included in both Windows code page 936 and GB 18030. A euro sign was also added by GB 18030.
Character set 0x23/0xA3 (row 3: ISO 646-CN)
This row contains ISO 646-CN (GB/T 1988-80), a national counterpart to ASCII. Compare row 3 of KS X 1001, which does the same with South Korea's ISO 646 version, and row 3 of JIS X 0208 and of KPS 9566, which include only the alphanumeric subset, but in the same layout. The following chart lists ISO 646-CN.
When used in an encoding allowing combination with ASCII such as EUC-CN (and its superset GB 18030), these characters are usually implemented as fullwidth characters, hence mappings to the Halfwidth and Fullwidth Forms block are used as shown below.
GB 6345.1 also handles this row as fullwidth, and adds the halfwidth forms (as above) as row 10.
Apple mostly maps this row to fullwidth code points as below, but uses non-fullwidth mappings for the overline and yuan sign as above.
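As a rough illustration of the fullwidth treatment of row 3, the sketch below derives the EUC-CN bytes and the fullwidth code point for an ASCII letter or digit. It is deliberately restricted to the alphanumeric subset, since the positions that ISO 646-CN assigns differently from ASCII (such as the yuan sign and overline) do not follow this simple offset. The helper name `row3_fullwidth` is hypothetical.

```python
def row3_fullwidth(ch: str) -> tuple[bytes, str]:
    """Return (EUC-CN bytes, fullwidth character) for an ASCII letter or digit."""
    if len(ch) != 1 or not (ch.isascii() and ch.isalnum()):
        raise ValueError("expected a single ASCII letter or digit")
    cell = ord(ch) - 0x20                  # ASCII 0x21..0x7E -> cells 1..94
    euc = bytes([0xA3, 0xA0 + cell])       # row 3 -> lead byte 0xA3
    fullwidth = chr(ord(ch) + 0xFEE0)      # Halfwidth and Fullwidth Forms block
    return euc, fullwidth


euc, wide = row3_fullwidth("A")
print(euc.hex(), wide, euc.decode("gb2312") == wide)
```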
Character set 0x24/0xA4 (row 4: Hiragana)
This set contains Hiragana for writing the Japanese language.
Compare with row 4 of JIS X 0208, which this row matches, and with row 10 of KS X 1001 and of KPS 9566, which use the same layout, but in a different row.
Character set 0x25/0xA5 (row 5: Katakana)
This set contains Katakana for writing the Japanese language. However, the Japanese long vowel mark, which is used in katakana text and included in row 1 of JIS X 0208, is not included in GB/T 2312, although it is added in GBK and GB 18030 outside of the main GB/T 2312 plane, at 0xA960.
Compare with row 5 of JIS X 0208, which this row matches, and with row 11 of KS X 1001 and of KPS 9566, which use the same layout, but in a different row.
Character set 0x26/0xA6 (row 6: Greek and vertical extensions)
This row contains basic support for the modern Greek alphabet, without diacritics or the final sigma.
The highlighted characters are presentation forms of punctuation marks for vertical writing, and are not included in GB/T 2312 proper, but are included in this row by GB/T 12345, Windows code page 936, Mac OS Simplified Chinese, and GB 18030. They are seen as "standard extensions to GB 2312". Conversely, ISO-IR-165 includes patterned semigraphic characters in this row (mostly without exact counterparts in Unicode), colliding with the code positions used for the vertical extensions.
Compare with row 6 of JIS X 0208, which this row matches when the vertical forms are not included, and with row 6 of KPS 9566, which includes the same Greek letters in the same layout, but adds Roman numerals rather than vertical forms. Contrast row 5 of KS X 1001, which offsets the Greek letters to include the Roman numerals first.
Character set 0x27/0xA7 (row 7: Cyrillic)
This set includes both cases of 33 letters from the Cyrillic script, sufficient to write the modern Russian alphabet and Bulgarian alphabet, although other forms of Cyrillic require additional letters.
Compare with row 7 of JIS X 0208, which this row matches, and with row 12 of KS X 1001 and row 5 of KPS 9566, which use the same layout but in different rows.
Character set 0x28/0xA8 (row 8: zhuyin and non-ASCII pinyin)
This row contains bopomofo and pinyin characters, excluding ASCII letters (which are in row 3). The highlighted characters are those which are not in the base GB 2312 set but are added by GB 6345.1, and also included in GB/T 12345, Windows code page 936, Mac OS Simplified Chinese and GB 18030. They are seen as "standard extensions to GB 2312".
GB 6345.1 treats the pinyin in this row as fullwidth, and includes halfwidth counterparts as row 11; GB 18030 does not do this.
Character set 0x29/0xA9 (row 9: box drawing)
Hanzi rows
Inclusion of non-standard Simplified Chinese characters and Traditional Chinese characters
GB/T 2312 included 2 non-standard Simplified Chinese characters:
* (68–41): Simplified from “”, but the ''Complete List of Simplified Characters'' (简化字总表, Jiǎnhuà Zì Zǒng Biǎo) has merged “” with “”. Old versions of the ''Xinhua Zidian'' (新华字典, Xīnhuá Zìdiǎn) included this character, glossed as juice (汁, zhì); newer versions have dropped it and merged “” with “”.
* (79–64): Simplified from “”, but the ''Complete List of Simplified Characters'' has merged “” with “”.
GB/T 2312 also included 3 Traditional Chinese characters:
* (79–81): The original document used the character “” with a traditional component, but the ''Complete List of Simplified Characters'' has merged “” with “” and simplified it to “”; later templates changed the character to “”. A 1964 note stated that it can be used in names and when citing Classical Chinese texts, and the ''Table of General Standard Chinese Characters'' (通用規範漢字表, Tōngyòng Guīfàn Hànzì Biǎo) of 2013 accepted it (2013:7679) for use in names.
* (65–65): The character has been merged with “” (26-83) in the ''Complete List of Simplified Characters'', which carries no note about unclear usage, but GB/T 2312 nevertheless included this character.
* (84–80): The original document used the character “” with a traditional component, but the ''Complete List of Simplified Characters'' states that “” should be simplified to “”; the corresponding Simplified Chinese character “” was submitted to Unicode by Japan as the Shinjitai “”. Although GB 5007.1–85 replaced “” with “”, the subsequent amendments (GB 5007.1–2001 and GB/T 5007.1–2010) keep the unsimplified form. The ''Table of General Standard Chinese Characters'' included “” at 2013:7748.
Corrections
GB 5007.1-85 ''24x24 Bitmap Font Set of Chinese Characters for Information Exchange'' (信息交换用汉字 24x24 点阵字模集) is the earliest font template based on GB/T 2312 that features corrections and extensions, including:
* changing the glyph shape of the Latin alphabet letter "g"
* adding 6 Hanyu Pinyin characters: ɑ, ḿ, ń, ň, ǹ, ɡ
* changing “” to “”
* including 94 half-width glyphs in row 10 (the half-width form of row 3, equivalent to GB 1988–80)
* including the half-width form of 32 Hanyu Pinyin characters from row 8 in row 11.
GB/T 2312 itself was not corrected, but these corrections are included in font templates based on GB/T 2312, including GB/T 12345; its supersets GBK and GB 18030 also include these corrections. GB/T 2312 is also used in ISO-IR-165.
See also
* Guobiao code
* CJK
* Chinese character encoding
* Unicode
* Big5 standard used in Taiwan and Hong Kong
* GB 18030, which has superseded GB/T 2312-1980
* GB/T 12345-1990, traditional counterpart of GB/T 2312-1980, superseded by GB18030
External links
* Graphical View of GB2312 in ICU's Converter Explorer
* Chinese Character Codes
* Coded Chinese Graphic Character Set for Information Interchange ISO-IR 58
* C code generates 6763 basic characters with output