The Hong Kong Supplementary Character Set (commonly abbreviated to HKSCS) is a set of Chinese characters, 4,702 in total in the initial release, used in Cantonese, as well as when writing the names of some places in Hong Kong (whether in written Cantonese or standard written Chinese sentences). It evolved from the preceding Government Chinese Character Set, or GCCS. GCCS is a set of supplementary Chinese characters coded in the user-defined areas of the Big5 character set. It was originally used within the Hong Kong Government and later by the public. It became the Hong Kong Supplementary Character Set when the characters in the set were submitted to ISO 10646 for coding.


Development history

Due to the inherent differences between standard written Chinese and written Cantonese, the Government of Hong Kong recognised the need for a standardised set of proprietary characters that would streamline electronic communication. At the time, the Big5 Chinese encoding scheme lacked the vast majority of these characters (some were erroneously cross-listed with similar characters). The government therefore developed the Government Chinese Character Set, or GCCS, consisting of Chinese characters commonly used in Hong Kong. Some of its characters are Cantonese-specific, while others are alternative forms of existing characters. The set was not well organised, and its characters had not been closely examined.

Subsequently, HKSCS-1999 (the HKSCS 1999 specification) was developed. Following its acceptance, newer revisions were released in 2001 (adding 116 new characters) and in 2004 (adding 123 new characters), for a total of 4,941 characters. 106 GCCS characters were removed in HKSCS-1999 as a result of unification, and their Big5 code points are reserved for compatibility. Retired "not verifiable" GCCS characters are found in UTC Sources (UTC-00877–UTC-00898), where they are sourced from Adobe-CNS1-1, an Adobe-CNS1 supplement implemented to support GCCS.

The HKSCS is encoded in Big5 (Big5-HKSCS, big5hk) and ISO 10646 (Unicode). Starting from HKSCS-2004, all characters that previously used the Private Use Area (PUA) of Unicode were remapped, many of them to the Extension B block or the Supplementary Ideographic Plane compatibility block. However, to preserve compatibility with programs that generated PUA code points, the previously allocated PUA code points remain reserved, and no new characters will be mapped to the PUA.
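As a rough illustration of the two Big5 flavours mentioned above, the sketch below uses Python's built-in `big5` and `big5hkscs` codecs to find the narrowest variant that can encode a character. This is only an approximation: the codec tables are Python's own and their HKSCS coverage depends on the Python version, and the helper names `big5_flavour` and `roundtrip` are hypothetical, not part of any standard.

```python
def big5_flavour(ch: str) -> str:
    """Return the narrowest Big5 variant (per Python's codecs) that encodes ch."""
    for codec in ("big5", "big5hkscs"):
        try:
            ch.encode(codec)
            return codec
        except UnicodeEncodeError:
            continue
    return "none"


def roundtrip(text: str) -> str:
    """Encode text as Big5-HKSCS bytes and decode it back to Unicode."""
    return text.encode("big5hkscs").decode("big5hkscs")


if __name__ == "__main__":
    # U+4E2D is an ordinary Big5 character; U+5605 is a common
    # written-Cantonese character that plain Big5 lacks.
    for ch in "\u4e2d\u5605":
        print(f"U+{ord(ch):04X}", big5_flavour(ch))
```

A character reported as `big5hkscs` is one that a plain Big5 system cannot represent without the supplementary set.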


Version history

The HKSCS has gone through a few iterations. The last edition of HKSCS to encode all of its characters in Big5 was HKSCS-2008, while the characters added in HKSCS-2016 are mapped to Unicode only (as a CJK Unified Ideographs horizontal glyph extension where appropriate).


Macao Supplementary Character Set

As in Hong Kong, Macao needs characters that are included in neither Big5 nor HKSCS. The Macao Supplementary Character Set (MSCS) was therefore developed, building on HKSCS with additional Unicode-mapped characters. The first batch of 121 MSCS characters was submitted for addition to, or horizontal extension in, Unicode (as appropriate) in 2009, and the first final version of MSCS was established in 2020.


Compatibility


Operating systems


Microsoft Windows

In Microsoft Windows 98, NT 4.0, 2000 and XP, HKSCS support can be enabled using Microsoft's patch. In Microsoft's implementation, applications using code page 950 automatically use a hidden code page 951 table for the Big5 encoding of the HKSCS extensions. The table supports all code points in HKSCS-2001, except for the compatibility code points specified by the standard. In addition, the patch alters the MingLiU font. The patch is known to create conflicts in applications such as Microsoft Office, or in any application using fonts that support simplified Chinese characters (e.g. SimSun). If the target environment contains custom fonts mapped to the code points affected by the patch, those fonts can undo its effect. Furthermore, the patch breaks the EUDC Editor supplied with the affected versions of Windows.

Starting with Windows Vista, HKSCS-2004 characters are supported only as Unicode 4.1 or later. All characters are assigned standard, non-PUA code points. The characters are displayed with the MingLiU font and can be entered via the keyboard. The patch that provides Big5 encoding of HKSCS is unsupported in Windows Vista and later. Microsoft provides a utility to convert HKSCS and Unicode PUA-encoded characters to their Unicode 4.1 equivalents.

In 2010, Microsoft published an HKSCS-2004 patch for Windows XP and Windows Server 2003. It replaces the Windows XP versions of MingLiU, PMingLiU and MingLiU_HKSCS (if the HKSCS-2001 patch was applied) with the Windows 7 versions of those fonts. In addition, the MingLiU-ExtB, MingLiU_HKSCS-ExtB and PMingLiU-ExtB fonts are added to the target system. However, the IME is not updated as it was with the HKSCS-2001 patch, and the fonts come from a pre-release of Windows 7. For earlier versions of the OS, HKSCS support requires either Microsoft's patch or the utilities from the Hong Kong government's Digital 21.
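Before any PUA-to-standard conversion of the kind described above, legacy text has to be scanned for Private Use Area code points. The following is a minimal sketch of such a scan, not a reimplementation of Microsoft's utility; the actual remapping needs the official HKSCS mapping tables, which are not reproduced here. The function names are hypothetical, and only the BMP Private Use Area (U+E000 to U+F8FF) is checked.

```python
def classify(ch: str) -> str:
    """Classify a character as 'pua', 'supplementary' or 'bmp'."""
    cp = ord(ch)
    if 0xE000 <= cp <= 0xF8FF:   # BMP Private Use Area
        return "pua"
    if cp >= 0x10000:            # above the Basic Multilingual Plane
        return "supplementary"
    return "bmp"


def find_pua(text: str) -> list[tuple[int, str]]:
    """Return (index, 'U+XXXX') for every BMP PUA character in text."""
    return [
        (i, f"U+{ord(c):04X}")
        for i, c in enumerate(text)
        if classify(c) == "pua"
    ]


if __name__ == "__main__":
    sample = "a\uf000b\U00020000"
    print(find_pua(sample))                 # positions that would need remapping
    print([classify(c) for c in sample])
```

Characters reported as `supplementary` are the ones that software limited to the Basic Multilingual Plane cannot handle.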


IBM

IBM assigns CCSID 5471 to the HKSCS-2001 Big5 code page (whose double-byte component, CPGID 1374, is CCSID 5470), CCSID 9567 to the HKSCS-2004 code page (double-byte component CPGID 1374 as CCSID 9566), and CCSID 13663 to the HKSCS-2008 code page (double-byte component CPGID 1374 as CCSID 13662). CCSID 1375 (double-byte component CPGID 1374 as CCSID 1374) is assigned to a growing HKSCS code page, currently equivalent to CCSID 13663.


Linux

HKSCS support was added to glibc in 2000, but it has not been updated since then. HKSCS-2004 support is handled as Unicode 4.1 and later. On freedesktop.org setups, the AR PL ShanHeiSun Uni font has fully supported HKSCS-2004 since version 0.1-0.dot.1, with the latest revision of HKSCS-2004 supported in version 0.1.20060903-1. Modern desktop distributions (e.g. Ubuntu) include Arphic Technology's HKSCS-compliant UKai and UMing fonts out of the box when Traditional Chinese language support is selected during installation. The fonts can also be installed manually later.


Mac OS

Mac OS X 10.0–10.2 supports HKSCS-1999, and 10.3–10.4 supports HKSCS-2001. Some of the characters added in HKSCS-2004 are supported via the Unicode PUA in OS X 10.4. Starting with OS X 10.5, all HKSCS-2004 characters are supported via standard Unicode 4.1 code points.


Applications and the Web

Mozilla 1.5 and above supports HKSCS, with HKSCS-2004 support added in the Gecko 1.8.1 code base. Unlike the above-mentioned Microsoft patch, Mozilla uses its own code page table. However, the fix for bug 343129 does not support characters mapped to code points above the Basic Multilingual Plane (BMP).

Qt 3.x-based applications (e.g. KDE) support only characters mapped to code points FFFF or lower. In Qt 4, characters outside the BMP are supported via surrogates. The Big5-HKSCS text codec already supported HKSCS-1999 in Qt 2.3.x, but it arrived too late in the development schedule to be officially included in that series, so official support began with Qt 3.0.1. HKSCS-2001 support was added in Qt 3.0.5.

GNOME supports HKSCS characters in Unicode ranges, except those mapped to the Basic Multilingual Plane compatibility block. Patches to support characters mapped above the BMP were introduced during Pango 1.1.

The WHATWG Encoding Standard (used by HTML5) includes HKSCS in its definition of Big5 (used even with the plain Big5 label). However, only its decoder uses all HKSCS extensions, while its encoder explicitly excludes those with lead bytes below 0xA1 (thus excluding most of the HKSCS extensions but including, for example, those inherited from Big5 ETEN). Newer browsers, including Firefox, follow this standard.
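The encoder restriction in the WHATWG Encoding Standard amounts to a filter on the lead byte of each two-byte sequence. The sketch below approximates that rule using Python's `big5hkscs` codec as a stand-in. This is an assumption: the WHATWG Big5 index is a separate table and may differ from Python's in individual mappings. The names `lead_byte_allowed` and `encodable_like_whatwg` are hypothetical.

```python
MIN_ENCODER_LEAD = 0xA1  # lead bytes below this are excluded by the encoder


def lead_byte_allowed(encoded: bytes) -> bool:
    """True for single-byte output, or double-byte output with lead >= 0xA1."""
    if len(encoded) < 2:
        return True
    return encoded[0] >= MIN_ENCODER_LEAD


def encodable_like_whatwg(ch: str) -> bool:
    """Approximate whether an encoder with the lead-byte rule would emit ch."""
    try:
        encoded = ch.encode("big5hkscs")
    except UnicodeEncodeError:
        return False
    return lead_byte_allowed(encoded)


if __name__ == "__main__":
    for ch in "A\u4e2d\u5605":
        print(f"U+{ord(ch):04X}", encodable_like_whatwg(ch))
```

A character rejected by this check can still be decoded from existing Big5-HKSCS bytes; the asymmetry only affects what gets written out.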


See also

* Cantonese
* Written Cantonese


External links


Hong Kong Government site on the HKSCS
Downloadable HKSCS documents & font
Microsoft HKSCS Support for Windows Platform
Download page of Dynalab's HKSCS font
Graphical View of Big5-HKSCS in ICU's Converter Explorer
A character set that works on Mac OS X
UMing/UKai – A free, open-source font supporting HKSCS
Open Source Hong Kong Fonts Project