
The Hong Kong Supplementary Character Set (commonly abbreviated to HKSCS) is a set of Chinese characters (4,702 in the initial release) used in Cantonese, as well as when writing the names of some places in Hong Kong, whether in written Cantonese or standard written Chinese sentences. It evolved from the preceding Government Chinese Character Set (GCCS), a set of supplementary Chinese characters coded in the user-defined areas of the Big5 character set. GCCS was originally used within the Hong Kong Government and was later adopted by the public. It became the Hong Kong Supplementary Character Set when its characters were submitted to ISO 10646 for coding.


Development history

Because of the inherent differences between standard written Chinese and written Cantonese, the Government of Hong Kong recognised the need for a standardised set of ''proprietary'' characters that would streamline electronic communication. At the time, the Big5 Chinese encoding scheme lacked the vast majority of these characters, and some were erroneously cross-listed with similar characters. The government therefore developed the Government Chinese Character Set (GCCS), which consists of Chinese characters commonly used in Hong Kong. Some characters are Cantonese-specific, while others are alternative forms of existing characters. The set was not well organised and its characters were not closely examined.

Subsequently, HKSCS-1999 (the HKSCS 1999 specification) was developed. Following its acceptance, revisions were released in 2001 (adding 116 characters) and in 2004 (adding 123 characters), bringing the total to 4,941 characters. 106 GCCS characters were removed in HKSCS-1999 as a result of unification, and their Big5 code points are reserved for compatibility. Retired "not verifiable" GCCS characters are found in the UTC Sources (UTC-00877–UTC-00898), where they are sourced from Adobe-CNS1-1, an Adobe-CNS1 supplement implemented to support GCCS.

The HKSCS is encoded in Big5 (Big5-HKSCS, big5hk) and ISO 10646 (Unicode). Starting with HKSCS-2004, all characters that previously used the Private Use Area (PUA) of Unicode were remapped, many of them to the Extension B block or to the compatibility block of the Supplementary Ideographic Plane. However, to preserve compatibility with programs that generated PUA code points, the allocated code points are reserved, and no new characters will be mapped to the PUA.
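As a rough illustration of what the remapping means for stored text, the sketch below flags characters that sit in the BMP Private Use Area (U+E000 to U+F8FF) and characters in the Supplementary Ideographic Plane (U+20000 to U+2FFFF). The two ranges come from the Unicode Standard; the helper names are hypothetical, and the check does not know which PUA code points HKSCS actually used. It only reports that a string still contains some.

```python
# Ranges are taken from the Unicode Standard; helper names are illustrative only.
BMP_PUA = range(0xE000, 0xF8FF + 1)   # Private Use Area inside the BMP
SIP = range(0x20000, 0x2FFFF + 1)     # Supplementary Ideographic Plane (plane 2)


def classify(ch: str) -> str:
    """Return 'pua', 'sip' or 'other' for a single character."""
    cp = ord(ch)
    if cp in BMP_PUA:
        return "pua"
    if cp in SIP:
        return "sip"
    return "other"


def pua_positions(text: str) -> list[int]:
    """Indexes of characters that still sit in the BMP Private Use Area."""
    return [i for i, ch in enumerate(text) if classify(ch) == "pua"]


if __name__ == "__main__":
    sample = "A\ue000\U00020000"
    print([classify(ch) for ch in sample])  # ['other', 'pua', 'sip']
    print(pua_positions(sample))            # [1]
```

Text that reports PUA positions may predate HKSCS-2004 and would need the official mapping tables to be converted; this sketch deliberately stops at detection.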


Version history

The HKSCS has gone through several iterations. The last edition of HKSCS to encode all of its characters in Big5 was HKSCS-2008; the characters added in HKSCS-2016 are mapped to Unicode only (as a CJK Unified Ideographs horizontal glyph extension where appropriate).


Macao Supplementary Character Set

As in Hong Kong, some characters needed in Macao are included in neither Big5 nor HKSCS. The ''Macao Supplementary Character Set'' (MSCS) was therefore developed, building on HKSCS with additional Unicode-mapped characters. The first batch of 121 MSCS characters was submitted for addition to, or horizontal extension in, Unicode (as appropriate) in 2009, and the first final version of MSCS was established in 2020.


Compatibility


Operating systems


Microsoft Windows

In Microsoft Windows 98, NT 4.0, 2000 and XP, HKSCS support can be enabled using Microsoft's patch. In Microsoft's implementation, an application using code page 950 automatically uses a hidden code page 951 table for the Big5 encoding of the HKSCS extensions. The table supports all code points in HKSCS-2001, except for the compatibility code points specified by the standard. The patch also alters the MingLiU font. It is known to create conflicts in applications such as Microsoft Office, and in any application using fonts that support simplified Chinese characters (e.g. SimSun). If the target environment contains custom fonts mapped to the code points affected by the patch, those fonts can undo its effect. Furthermore, the patch breaks the EUDC Editor supplied with the affected versions of Windows.

Starting with Windows Vista, HKSCS-2004 characters are supported only as Unicode 4.1 or later, and all characters are assigned standard, non-PUA code points. The characters are displayed with the MingLiU font and can be entered via the keyboard. The patch that provides Big5 encoding of HKSCS is unsupported in Windows Vista and later. Microsoft provides a utility to convert HKSCS and Unicode PUA-encoded characters to their Unicode 4.1 code points.

In 2010, Microsoft published an HKSCS-2004 patch for Windows XP and Windows Server 2003. It replaces the Windows XP versions of MingLiU, PMingLiU and MingLiU_HKSCS (if the HKSCS-2001 patch was applied) with the Windows 7 versions. In addition, the MingLiU-ExtB, MingLiU_HKSCS-ExtB and PMingLiU-ExtB fonts are added to the target system. However, the IME is not updated as it was with the HKSCS-2001 patch, and the fonts are from a pre-release of Windows 7. For earlier versions of the OS, HKSCS support requires Microsoft's patch or the utilities provided by the Hong Kong government's Digital 21.
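The gap between plain code page 950 and Big5 with the HKSCS extensions can be probed from Python, whose standard library ships codecs named `cp950` and `big5hkscs`. This is only a sketch: Python's tables are its own and are not the hidden code page 951 table described above, the helper names are made up for illustration, and the Cantonese sample characters are reported as found rather than assumed to be in one table or the other.

```python
def can_encode(text: str, codec: str) -> bool:
    """True if every character of `text` has a mapping in `codec`."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


def round_trips(text: str, codec: str) -> bool:
    """True if encoding and then decoding returns the original string."""
    return can_encode(text, codec) and text.encode(codec).decode(codec) == text


if __name__ == "__main__":
    samples = [
        "\u9999\u6e2f",  # "Hong Kong", ordinary Big5 characters
        "\u5497",        # a character common in written Cantonese
        "\u5622",        # another written-Cantonese character
        "\u35ce",        # a CJK Extension A character used in Cantonese
    ]
    for text in samples:
        code_points = " ".join(f"U+{ord(ch):04X}" for ch in text)
        print(code_points,
              "cp950:", can_encode(text, "cp950"),
              "big5hkscs:", can_encode(text, "big5hkscs"))
```

Printing code points rather than the characters themselves keeps the output readable on terminals that lack a suitable font.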


IBM

IBM assigns CCSID 5471 to the HKSCS-2001 Big5 code page (with CPGID 1374 as CCSID 5470 as the double-byte component), CCSID 9567 to the HKSCS-2004 code page (with CPGID 1374 as CCSID 9566 as the double-byte component), and CCSID 13663 to the HKSCS-2008 code page (with CPGID 1374 as CCSID 13662 as the double-byte component). CCSID 1375 (with CPGID 1374 as CCSID 1374 as its double-byte component) is assigned to a growing HKSCS code page, currently equivalent to CCSID 13663.
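The assignments above are easier to scan as a lookup table. The sketch below copies the numbers from the paragraph into a dictionary (the layout and names are illustrative, not an IBM API) and checks one piece of arithmetic: each versioned CCSID differs from the growing one by a multiple of 4096. It makes no claim about IBM's numbering policy beyond what these four rows show.

```python
# CCSID numbers copied from the text above: (mixed code page, double-byte component).
# The dictionary layout and the key names are illustrative only.
HKSCS_CCSID = {
    "HKSCS-2001": (5471, 5470),
    "HKSCS-2004": (9567, 9566),
    "HKSCS-2008": (13663, 13662),
    "growing": (1375, 1374),
}


def offsets_from_growing(table: dict) -> dict:
    """Difference between each entry and the 'growing' entry, per component."""
    base_mixed, base_dbcs = table["growing"]
    return {
        name: (mixed - base_mixed, dbcs - base_dbcs)
        for name, (mixed, dbcs) in table.items()
    }


if __name__ == "__main__":
    for name, (d_mixed, d_dbcs) in offsets_from_growing(HKSCS_CCSID).items():
        print(name, d_mixed, d_dbcs, d_mixed % 4096 == 0)
```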


Linux

HKSCS support was added to glibc in 2000, but it has not been updated since then. HKSCS-2004 support is handled as Unicode 4.1 and later. For freedesktop.org setups, the ''AR PL ShanHeiSun Uni'' font has fully supported HKSCS-2004 since 0.1-0.dot.1, with the latest revision of HKSCS-2004 supported in version 0.1.20060903-1. Modern desktop distributions (e.g. Ubuntu) include Arphic Technology's HKSCS-compliant UKai and UMing fonts out of the box when Traditional Chinese language support is selected during installation. The fonts can also be installed manually at a later time.


Mac OS

Mac OS X 10.0–10.2 supports HKSCS-1999, and 10.3–10.4 supports HKSCS-2001. Some of the characters added in HKSCS-2004 are supported via the Unicode PUA in OS X 10.4. Starting with OS X 10.5, all HKSCS-2004 characters are supported via standard Unicode 4.1 code points.


Applications and the Web

Mozilla 1.5 and above support HKSCS, with HKSCS-2004 support added in the Gecko 1.8.1 code base. Unlike Microsoft's patch, Mozilla uses its own code page table. However, the fix for bug 343129 does not support characters mapped to code points above the Basic Multilingual Plane (BMP). Qt 3.x-based applications (e.g. KDE) support only characters mapped to code points U+FFFF or lower. In Qt 4, characters outside the BMP are supported via surrogates. The Big5-HKSCS text codec supported HKSCS-1999 as early as Qt 2.3.x, but it arrived too late in the Qt development schedule to be officially included in that series, so it was officially supported from Qt 3.0.1. HKSCS-2001 support was added in Qt 3.0.5.
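The surrogate mechanism mentioned for Qt 4 is ordinary UTF-16 arithmetic, sketched below in Python. It does not use Qt's string classes; the function names are hypothetical and the sample code point U+20000 is simply the first code point of the Supplementary Ideographic Plane.

```python
def to_surrogates(cp: int) -> tuple[int, int]:
    """Split a code point above U+FFFF into a UTF-16 (high, low) surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point is inside the BMP or out of range")
    v = cp - 0x10000                     # 20 significant bits remain
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)


def utf16_units(text: str) -> int:
    """Number of 16-bit code units needed to store `text` as UTF-16."""
    return len(text.encode("utf-16-le")) // 2


if __name__ == "__main__":
    high, low = to_surrogates(0x20000)
    print(f"U+20000 -> {high:04X} {low:04X}")   # U+20000 -> D840 DC00
    print(utf16_units("\U00020000"))            # 2
```

Software that assumes one 16-bit unit per character sees such a character as two units, which is why support limited to U+FFFF cannot represent it.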
GNOME supports HKSCS characters in Unicode ranges, except those mapped to the Basic Multilingual Plane compatibility block. Patches to support characters mapped above the Basic Multilingual Plane were introduced during Pango 1.1. The WHATWG Encoding Standard (used by HTML5) includes HKSCS in its definition of Big5 (used even with the plain Big5 label). However, only its decoder uses all HKSCS extensions, while its encoder explicitly excludes those with lead bytes below 0xA1 (thus excluding most of the HKSCS extensions but including, for example, those inherited from Big5 ETEN). Newer browsers, including Firefox, follow this standard.
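The lead-byte cut-off in the WHATWG encoder can be expressed with the pointer arithmetic that the Encoding Standard uses for Big5, where a byte pair maps to the pointer (lead − 0x81) × 157 + (trail − offset). The sketch below assumes that formula and a cut-off of (0xA1 − 0x81) × 157; the function names are hypothetical, and the check says nothing about whether the index actually contains an entry for a given pointer, so it should be verified against the standard before being relied on.

```python
# Lowest pointer the encoder would consider, assuming the cut-off at lead byte 0xA1.
ENCODER_MIN_POINTER = (0xA1 - 0x81) * 157


def big5_pointer(lead: int, trail: int) -> int:
    """Index pointer for a Big5 byte pair, per the assumed decoder arithmetic."""
    if not 0x81 <= lead <= 0xFE:
        raise ValueError("lead byte out of range")
    if 0x40 <= trail <= 0x7E:
        offset = 0x40
    elif 0xA1 <= trail <= 0xFE:
        offset = 0x62
    else:
        raise ValueError("trail byte out of range")
    return (lead - 0x81) * 157 + (trail - offset)


def passes_encoder_cutoff(lead: int, trail: int) -> bool:
    """True if the byte pair lies at or above the encoder's lead-byte cut-off."""
    return big5_pointer(lead, trail) >= ENCODER_MIN_POINTER


if __name__ == "__main__":
    print(ENCODER_MIN_POINTER)                 # 5024
    print(passes_encoder_cutoff(0x88, 0x62))   # False: lead byte below 0xA1
    print(passes_encoder_cutoff(0xA1, 0x40))   # True: first pair at the cut-off
```

Under these assumptions a page can decode a byte pair with a low lead byte yet never have a conforming encoder produce it, which matches the decoder/encoder asymmetry described above.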


See also

* Cantonese
* Written Cantonese


External links


* Hong Kong Government site on the HKSCS
* Downloadable HKSCS documents & font
* Microsoft HKSCS Support for Windows Platform
* Download page of Dynalab's HKSCS font
* Graphical View of Big5-HKSCS in ICU's Converter Explorer
* A character set that works on Mac OS X
* UMing/UKai – A free, open-source font supporting HKSCS
* Open Source Hong Kong Fonts Project