Chinese Character Encoding
In computing, Chinese character encodings can be used to represent text written in the CJK languages (Chinese, Japanese, and Korean) and, rarely, obsolete Vietnamese, all of which use Chinese characters. Several general-purpose character encodings accommodate Chinese characters, and some of them were developed specifically for Chinese. In addition to Unicode (with the set of CJK Unified Ideographs), local encoding systems exist. The two primary "legacy" local encoding systems are the Chinese Guobiao (or GB, "national standard") system, used in Mainland China and Singapore, and the (mainly) Taiwanese Big5 system, used in Taiwan, Hong Kong and Macau. Guobiao is usually displayed using simplified characters and Big5 using traditional characters. There is, however, no mandated connection between the encoding system and the font used to display the characters; font and encoding are usually tied together for practical reasons. The issue of which encoding to ...
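As a rough illustration of how the same character is represented differently under Guobiao, Big5 and Unicode, the sketch below encodes one character with three codecs from Python's standard library. The sample character and the variable names are choices made for this example, not part of the excerpt.

```python
# One character, three encodings named in the text above.
char = "中"  # U+4E2D, chosen only as a sample

encoded = {name: char.encode(name) for name in ("gb2312", "big5", "utf-8")}

for name, raw in encoded.items():
    # e.g. gb2312 -> d6 d0, big5 -> a4 a4, utf-8 -> e4 b8 ad
    print(f"{name:7s} {raw.hex(' ')}")

# Reading Guobiao bytes with the wrong codec does not give back the
# original character, which is why the encoding must be known.
misread = encoded["gb2312"].decode("big5", errors="replace")
print(misread != char)
```

The byte values differ in every case, so a decoder has to be told (or has to guess) which encoding was used.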


Chinese Language
Chinese is a group of languages spoken natively by the ethnic Han Chinese majority and many minority ethnic groups in Greater China. About 1.3 billion people (or approximately 16% of the world's population) speak a variety of Chinese as their first language. Chinese languages form the Sinitic branch of the Sino-Tibetan language family. The spoken varieties of Chinese are usually considered by native speakers to be variants of a single language. However, their lack of mutual intelligibility means they are sometimes considered separate languages in a family. Investigation of the historical relationships among the varieties of Chinese is ongoing. Currently, most classifications posit 7 to 13 main regional groups based on phonetic developments from Middle Chinese, of which the most spoken by far is Mandarin (with about 800 million speakers, or 66%), followed by Min (75 million, e.g. Southern Min), Wu (74 million, e.g. Shangh ...


China
China, officially the People's Republic of China (PRC), is a country in East Asia. It is the world's most populous country, with a population exceeding 1.4 billion, slightly ahead of India. China spans the equivalent of five time zones and borders fourteen countries by land, the most of any country in the world, tied with Russia. Covering an area of approximately 9.6 million square kilometers, it is the world's third largest country by total land area. The country consists of 22 provinces, five autonomous regions, four municipalities, and two Special Administrative Regions (Hong Kong and Macau). The national capital is Beijing, and the most populous city and financial center is Shanghai. Modern Chinese trace their origins to a cradle of civilization in the fertile basin of the Yellow River in the North China Plain. The semi-legendary Xia dynasty in the 21st century BCE and the well-attested Shang and Zhou dynasties developed a bureaucratic political system to serve hereditary monarchies, or dyna ...


PostScript Fonts
PostScript fonts are font files encoded in outline font specifications developed by Adobe Systems for professional digital typesetting. This system uses the PostScript file format to encode font information. "PostScript fonts" may also separately be used to refer to a basic set of fonts included as standards in the PostScript system, such as Times New Roman, Helvetica, and Avant Garde.
History
Type 1 and Type 3 fonts, though introduced by Adobe in 1984 as part of the PostScript page description language, did not see widespread use until March 1985 when the first laser printer to use the PostScript language, the Apple LaserWriter, was introduced. Even then, in 1985, the outline fonts were resident only in the printer, and the screen used bitmap fonts as substitutes for outline fonts. Although originally part of PostScript, Type 1 fonts used a simplified set of drawing operations compared to ordinary PostScript (programmatic elements such as loops and variables were removed, much l ...


Ethnic Minorities In China
Ethnic minorities in China are the non-Han population in the People's Republic of China (PRC). The PRC officially recognizes 55 ethnic minority groups within China in addition to the Han majority. As of 2010, the combined population of officially recognized minority groups comprised 8.49% of the population of mainland China. In addition to these officially recognized ethnic minority groups, there are Chinese nationals who privately classify themselves as members of unrecognized ethnic groups, such as the very small Jewish, Tuvan, and Ili Turk communities, as well as the much larger Oirat and Japanese communities. In Chinese, 'ethnic minority' is translated as shǎoshù mínzú (少数民族), wherein mínzú (民族) means 'nationality' or 'nation' (as in ethnic group), in line with the Soviet concept of ethnicity, a ...


GB 18030
GB 18030 is a Chinese government standard, titled "Information Technology — Chinese coded character set", that defines the language and character support required of software in China. GB18030 is the registered Internet name for the official character set of the People's Republic of China (PRC), superseding GB2312. As a Unicode Transformation Format (i.e. an encoding of all Unicode code points), GB18030 supports both simplified and traditional Chinese characters. It is also compatible with legacy encodings including GB2312, CP936, and GBK 1.0. In addition to the "GB18030 character encoding", the standard contains requirements about which scripts must be supported, font support, etc. As of 2022, in terms of font implementations, "only the Simplified Chinese fonts of the Noto Sans CJK (Google), Source Han Mono (Adobe), and Source Han Sans (Adobe) typeface families are already compliant with GB 18030-2022 Implementation Level 2." Microsoft ...
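To make the "encoding of all Unicode code points" claim concrete, here is a minimal sketch using the `gb18030` codec that ships with Python. The three sample characters are arbitrary picks for this example, and the codec reflects whichever revision of the standard the Python release implements.

```python
# GB 18030 is variable-width: one, two or four bytes per character.
samples = ["A", "中", "😀"]  # ASCII, a common Han character, an emoji

lengths = {ch: len(ch.encode("gb18030")) for ch in samples}
print(lengths)

# Every sample survives an encode/decode round trip, including the
# emoji, which lies outside the Basic Multilingual Plane.
round_trips = all(ch.encode("gb18030").decode("gb18030") == ch for ch in samples)
print(round_trips)
```

The emoji needs four bytes, which is how the encoding reaches code points that two-byte legacy encodings such as GB 2312 cannot represent.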




Code Page 1386
Windows Code page 936 (abbreviated MS936, Windows-936 or, ambiguously, CP936) is Microsoft's character encoding for simplified Chinese, one of the four double-byte character sets (DBCSs) for East Asian languages. Originally, Windows-936 covered GB 2312 (in its EUC-CN form), but it was expanded to cover most of GBK with the release of Windows 95. IBM's Code page 936 is a different encoding for Simplified Chinese, although International Components for Unicode does not include an IBM-936 codec, and uses the Windows code page for the "cp936" label. IBM's code page for GBK coverage is Code page 1386 (CP1386 or IBM-1386), which is defined as a combination of the single-byte Code page 1114 and the double-byte Code page 1385. It was superseded by code page 54936 (GB 18030), but was still prevalent in use. The Windows command prompt uses CP936 as the default code page for simplified Chinese installations, although support for part of GB 18030 was made mandatory for all software products sold in China. In 2002, the IAN ...


Traditional Chinese
A tradition is a belief or behavior (folk custom) passed down within a group or society with symbolic meaning or special significance with origins in the past. A component of cultural expressions and folklore, common examples include holidays or impractical but socially meaningful clothes (like lawyers' wigs or military officers' spurs), but the idea has also been applied to social norms such as greetings. Traditions can persist and evolve for thousands of years; the word "tradition" itself derives from the Latin tradere, literally meaning to transmit, to hand over, to give for safekeeping. While it is commonly assumed that traditions have an ancient history, many traditions have been invented on purpose, whether that be political or cultural, over short periods of time. Various academic disciplines also use the word in a variety of ways. The phrase "according to tradition", or "by tradition", usually means that whatever information follows is known only by oral tradition, ...


GBK (Character Encoding)
GBK is an extension of the GB 2312 character set for Simplified Chinese characters, used in the People's Republic of China. It includes all unified CJK characters found in ISO/IEC 10646:1993, i.e. Unicode 1.1. Since its initial release in 1993, GBK has been extended by Microsoft in Code page 936/1386, which was then extended into GBK 1.0. GBK is also the IANA-registered internet name for the Microsoft mapping, which differs from other implementations primarily by the single-byte euro sign at 0x80. GB abbreviates Guojia Biaozhun, which means "national standard" in Chinese, while K stands for "Extension" (扩展 kuòzhǎn). GBK extended the old standard not only with Traditional Chinese characters, but also with Chinese characters that were simplified after the establishment of GB 2312 in 1981. With the arrival of GBK, certain names with characters formerly unrepresentable, like the 镕 (róng) character in former Chinese Premier Zhu Rongji's name, are now re ...
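The difference in coverage between GB 2312 and GBK can be checked with Python's built-in codecs. The helper `can_encode` below is a hypothetical name introduced for this sketch; 镕 is the character mentioned in the text, and 漢 is an added example of a traditional form.

```python
def can_encode(text: str, codec: str) -> bool:
    """Return True if every character of text exists in the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

# Characters that GBK added on top of GB 2312.
for ch in ("镕", "漢"):
    print(ch, can_encode(ch, "gb2312"), can_encode(ch, "gbk"))

rong_in_gb2312 = can_encode("镕", "gb2312")
rong_in_gbk = can_encode("镕", "gbk")
```

Both characters are rejected by the `gb2312` codec and accepted by `gbk`, which matches the description of GBK as a superset.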


GB/T 12345
GB 12345, entitled "Code of Chinese ideogram set for information interchange supplementary set" (信息交換用漢字編碼字符集 輔助集), is a Traditional Chinese character set standard established by China, and can be thought of as the traditional counterpart of GB 2312. It is used as an encoding of traditional Chinese characters, although it is not as commonly used as Big5. It has 6,866 characters, and is neither related to nor compatible with Big5 or CNS 11643.
Characters
Characters in GB 12345 are arranged in a 94×94 grid (as in ISO/IEC 2022), and the two-byte code point of each character is expressed in the qu-wei form, which specifies a row (qu 区) and the position of the character within the row (cell, wei 位). The rows (numbered from 1 to 94) contain characters as follows:
* 01–09: identical to GB 2312, except that row 06, positions 57–85, adds 29 vertical punctuation forms, and row 08, positions 27–32, adds 6 pinyin characte ...
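The qu-wei addressing of a 94×94 grid can be sketched as a small conversion function. Two assumptions are made here that the excerpt does not state: the bytes are formed EUC-style by adding 0xA0 to the row and cell numbers, and the demonstration uses Python's `gb2312` codec, since the standard library does not appear to ship a GB 12345 codec and GB 2312 uses the same grid layout. The function name `quwei_to_euc` is hypothetical.

```python
def quwei_to_euc(qu: int, wei: int) -> bytes:
    """Convert a row (qu) and cell (wei), each 1..94, to two EUC-style bytes."""
    if not (1 <= qu <= 94 and 1 <= wei <= 94):
        raise ValueError("qu and wei must both be in the range 1..94")
    # Assumption: EUC packing, i.e. each coordinate offset by 0xA0.
    return bytes([0xA0 + qu, 0xA0 + wei])

raw = quwei_to_euc(16, 1)    # row 16, cell 01
char = raw.decode("gb2312")  # look the position up in the GB 2312 grid
print(raw.hex(), char)
```

Row 16, cell 01 is the first Han character position in GB 2312, so the lookup yields 啊.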


HZ (Character Encoding)
The HZ character encoding is an encoding of GB 2312 that was formerly commonly used in email and USENET postings. It was designed in 1989 by Fung Fung Lee of Stanford University, and subsequently codified in 1995 into RFC 1843. The HZ encoding, short for Hanzi (汉字), was invented to facilitate the use of Chinese characters through e-mail, which at that time only allowed 7-bit characters. Therefore, in lieu of standard ISO 2022 escape sequences (as in the case of ISO-2022-JP) or 8-bit characters (as in the case of EUC), the HZ code uses only printable, 7-bit characters to represent Chinese characters. It was also popular in USENET networks, which in the late 1980s and early 1990s generally did not allow transmission of 8-bit characters or escape characters.
History
HZ superseded the earlier "zW" encoding, which marked entire lines as being GB 2312 text by beginning them with the characters zW.
Structure and use
In the HZ encoding system, the character sequences "~{" and "~}" act as ...
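The 7-bit property described above can be observed with the `hz` codec in Python's standard library. This is only a sketch; the sample string is made up for the example.

```python
text = "Hi 中文"  # mixed ASCII and Chinese, sample text only

hz_bytes = text.encode("hz")
print(hz_bytes)

# Every byte stays below 0x80, so the data passes through 7-bit channels.
is_seven_bit = all(b < 0x80 for b in hz_bytes)

# Decoding with the same codec restores the original text.
round_trip = hz_bytes.decode("hz")
print(is_seven_bit, round_trip == text)
```

The Chinese run is wrapped in the printable markers `~{` and `~}`, and the characters between them are the GB 2312 bytes with the high bit cleared.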


EUC-CN
Extended Unix Code (EUC) is a multibyte character encoding system used primarily for Japanese, Korean, and simplified Chinese. The most commonly used EUC codes are variable-length encodings in which a character belonging to an ISO/IEC 646 compliant coded character set (such as ASCII) takes one byte, and a character belonging to a 94×94 coded character set (such as GB 2312) is represented in two bytes. The EUC-CN form of GB 2312 and EUC-KR are examples of such two-byte EUC codes. EUC-JP includes characters represented by up to three bytes, including an initial shift code, whereas a single character in EUC-TW can take up to four bytes. Modern applications are more likely to use UTF-8, which supports all of the glyphs of the EUC codes, and more, and is generally more portable with fewer vendor deviations and errors. EUC is, however, still very popular, especially EUC-KR for South Korea.
Encoding structure
The structure of EUC is based on the ISO/IEC 2022 standard, which specifies a system of graphical character sets which can be repres ...
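The one-byte/two-byte rule for EUC-CN can be sketched as a splitter that walks a byte string and groups it into characters. The function name `split_euc_cn` is hypothetical, and the sketch assumes well-formed EUC-CN input with no shift codes, so it does not apply to EUC-JP or EUC-TW as written.

```python
def split_euc_cn(data: bytes) -> list[bytes]:
    """Group EUC-CN bytes into per-character units (assumes valid input)."""
    units = []
    i = 0
    while i < len(data):
        # High bit clear: a single ASCII byte. High bit set: a two-byte pair.
        width = 1 if data[i] < 0x80 else 2
        units.append(data[i:i + width])
        i += width
    return units

encoded = "A中b".encode("gb2312")  # Python's gb2312 codec emits the EUC-CN form
units = split_euc_cn(encoded)
print(units)
```

Decoding each unit separately with the `gb2312` codec gives back the individual characters.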




GB 2312
GB/T 2312-1980 is a key official character set of the People's Republic of China, used for Simplified Chinese characters. GB2312 is the registered internet name for EUC-CN, which is its usual encoded form. GB refers to the Guobiao standards (国家标准), whereas the T suffix (推荐, tuījiàn, "recommendation") denotes a non-mandatory standard. GB/T 2312-1980 was originally a mandatory national standard designated GB 2312-1980. However, following a National Standard Bulletin of the People's Republic of China in 2017, GB 2312 is no longer mandatory, and its standard code was changed to GB/T 2312-1980. It has been superseded by GBK and GB 18030, which include additional characters, but remains in widespread use as a subset of those encodings. GB2312 is the second-most popular encoding served from China and territories (after UTF-8), with 5.5% of web servers serving a page declaring it. Globally, GB2312 is declared on 0.1% of all web pages. However, all major web browsers decode GB2312-marked docume ...
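The subset relationship mentioned above means that bytes produced under GB 2312 should decode to the same text under GBK and GB 18030. A minimal check with Python's built-in codecs, using a sample string chosen for this example:

```python
text = "汉字编码"  # sample text made only of GB 2312 characters

raw = text.encode("gb2312")

# Decode the same bytes with GB 2312 and with its two successors.
decoded = {codec: raw.decode(codec) for codec in ("gb2312", "gbk", "gb18030")}
print(decoded)

all_agree = all(value == text for value in decoded.values())
```

Because the results agree, a decoder can treat GB2312-labelled data as one of the larger encodings without changing how valid GB 2312 text is read.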