KPS 9566 ("''DPRK Standard Korean Graphic Character Set for Information Interchange''") is a North Korean standard specifying a character encoding for the Chosŏn'gŭl (Hangul) writing system used for the Korean language. The edition of 1997 specified an ISO 2022-compliant 94×94 two-byte coded character set. Subsequent editions have added further encoded characters outside of the 94×94 plane, in a manner comparable to UHC or GBK. KPS 9566 differs in approach from KS X 1001, its South Korean counterpart, in using a different ordering of chosŏn'gŭl, in encoding explicit vertical presentation forms of punctuation, in not encoding duplicate hanja for multiple readings, and in including several characters specific to the North Korean political system, including special encodings for the names of the country's past and present leaders (Kim Il-sung, Kim Jong-il and Kim Jong-un). Although KPS 9566 was the original source of several characters added to Unicode, not all KPS 9566 characters have Unicode equivalents. Those which do not are mapped to similar Unicode characters or to the Private Use Area.

Background and other standards

The ASCII character set originated in the United States in 1963 and was revised in 1967 to the form it has today. ASCII also became accepted as an international standard in 1967, becoming ECMA-6, designated ISO/IEC 646 by the International Organization for Standardization. It is presently designated ANSI X3.4-1986 and ISO 646:1991. ASCII was a 7-bit, single-byte encoding including 94 graphical characters, the space, and 33 control codes, which provided basic support for representing American English text as a series of bytes. The next edition of ISO 646, published in 1972, revised the standard to introduce the concept of national versions of the code, allowing countries to replace a few less commonly used codes with their own required characters. At the same time, work on defining extension mechanisms for ASCII was underway, with the intention of being applicable to both 7-bit and 8-bit environments. This was completed in 1973 and published as JIS X 0202, ECMA-35 and ISO 2022. ISO 2022 specifies mechanisms for using single-byte and multiple-byte character sets with a certain structure in both 7-bit and 8-bit environments, and for declaring and switching between them in a standard fashion using shift codes and escape sequences.

Countries in East Asia, due to using large repertoires of Chinese characters, introduced standardised double-byte character sets (DBCS) for their writing systems, since the number of characters representable in a single-byte code was not sufficient. In an ISO 2022-compliant DBCS, every character can be represented with two ASCII printing character bytes; the location of a character can be referenced by these byte values, or by two numbers from 1 to 94 (a kuten), equal to the respective bytes minus 32. The first registered ISO 2022-compliant DBCS, and the first East Asian DBCS to be established as a national standard, was the first edition of JIS X 0208 (Japan), published in 1978. This was followed by GB 2312 (Mainland China) in 1980, and by Wansung code (South Korea; first designated KS C 5601-1987) in 1987.
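The row-cell (kuten) arithmetic described above can be made concrete. The following is a minimal sketch, assuming rows and cells are numbered 1 to 94: adding 32 to each number gives the 7-bit ISO 2022 bytes, and setting the eighth bit as well gives the 8-bit EUC form discussed in the next paragraph. The function names are hypothetical; Python's standard `euc_kr` and `iso2022_kr` codecs are used only to check the results against KS X 1001 (Wansung) text.

```python
from __future__ import annotations


def kuten_to_iso2022(row: int, cell: int) -> bytes:
    """7-bit form: add 32 to each number, giving bytes in 0x21..0x7E."""
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must both be in 1..94")
    return bytes([row + 32, cell + 32])


def kuten_to_euc(row: int, cell: int) -> bytes:
    """8-bit EUC form: the same two bytes with the eighth bit set."""
    return bytes(b | 0x80 for b in kuten_to_iso2022(row, cell))


def euc_to_kuten(code: bytes) -> tuple[int, int]:
    """Inverse of kuten_to_euc for a single two-byte EUC code."""
    if len(code) != 2 or not all(0xA1 <= b <= 0xFE for b in code):
        raise ValueError("not a two-byte code in the 94x94 EUC plane")
    return code[0] - 0xA0, code[1] - 0xA0


if __name__ == "__main__":
    print(kuten_to_iso2022(16, 1))               # b'0!'
    print(kuten_to_euc(16, 1).decode("euc_kr"))  # 가 (KS X 1001 row 16, cell 1)
    print(euc_to_kuten("한".encode("euc_kr")))    # (39, 49)
    # In ISO-2022-KR output the 7-bit pair sits between SO and SI shift codes.
    print("가".encode("iso2022_kr"))
```

The same packing applies to any ISO 2022 94×94 set, which is why a KPS 9566 row-cell position such as 38-02 can be turned into bytes without a dedicated codec. Whether a given implementation uses exactly this EUC form for KPS 9566 is an assumption here.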

Big5 (Taiwan), defined in 1984, did not follow the ISO 2022 structure. When used in an 8-bit (rather than 7-bit) environment, GB 2312 and Wansung code were usually used with the eighth bit set, with ASCII or a similar SBCS used with the eighth bit unset; these encoding schemes are known as EUC-CN and EUC-KR, respectively.

Although the Korean writing system includes individual symbols (jamo) for consonants and vowels, serving as an alphabet, Korean text is properly typeset with these symbols composed into blocks for each syllable. Wansung code encoded each Korean syllable block as a separate character, treating them as a large set of characters similarly to hanja, and was first defined by the third edition of the South Korean standard KS C 5601. The first edition had defined an encoding of individual jamo, named N-byte Hangul, which allowed syllable blocks to be encoded as sequences; it had not been adopted as widely as intended. Wansung code did not encode all possible modern Korean syllables, only a selection of the 2350 most common, although it allowed the others to be specified using combining sequences, which often were not supported. An alternative South Korean encoding, named Johab, did encode all of them, and served as a competitor to Wansung for some time.
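The gap between Wansung's 2350 syllables and the full modern repertoire can be checked with the arithmetic Unicode uses for its Hangul Syllables block: 19 initial consonants × 21 vowels × 28 final options (including "no final") gives 11172 syllables. The following is a minimal sketch; the helper names are hypothetical, and it assumes that Python's `euc_kr` codec produces a single two-byte code only for syllables in the KS X 1001 (Wansung) set. For other syllables, CPython either rejects them or emits a longer combining sequence, so the length test covers both cases.

```python
import unicodedata

N_INITIAL, N_VOWEL, N_FINAL = 19, 21, 28  # final index 0 means "no final consonant"
SYLLABLE_BASE = 0xAC00                    # U+AC00 가, first precomposed syllable


def compose(initial: int, vowel: int, final: int = 0) -> str:
    """Precomposed syllable from jamo indices, using Unicode's arithmetic."""
    return chr(SYLLABLE_BASE + (initial * N_VOWEL + vowel) * N_FINAL + final)


def in_wansung(syllable: str) -> bool:
    """True if the euc_kr codec encodes the syllable as one two-byte code."""
    try:
        return len(syllable.encode("euc_kr")) == 2
    except UnicodeEncodeError:
        return False


ALL_SYLLABLES = [chr(SYLLABLE_BASE + i) for i in range(N_INITIAL * N_VOWEL * N_FINAL)]
WANSUNG_COUNT = sum(in_wansung(s) for s in ALL_SYLLABLES)

if __name__ == "__main__":
    print(len(ALL_SYLLABLES), WANSUNG_COUNT)  # 11172 2350
    ttom = compose(4, 8, 16)                  # ㄸ + ㅗ + ㅁ
    print(ttom, in_wansung(ttom))             # 똠 False
    # Conjoining jamo compose to the same syllable under NFC normalisation.
    print(unicodedata.normalize("NFC", "\u1104\u1169\u11b7") == ttom)  # True
```

The syllable 똠 used here is the one the Design section cites as missing from Wansung code.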

Unified Hangul Code (UHC), introduced by Microsoft with Windows 95, extended EUC-KR by using double-byte codes that are invalid in EUC to represent all the other syllables available in Johab. A similar approach was taken by the Mainland Chinese GBK encoding, which extended EUC-CN with support for Traditional Chinese and for less common Chinese characters by encoding them to double-byte codes that are invalid in EUC-CN.

South Korea was not the only country developing an ISO 2022 DBCS for Korean: the Mainland Chinese GB 12052 was published in 1989. This was not closely related to Wansung code, although it also included composed syllables. Instead, it corresponded to GB 2312 with Korean syllables (and 94 other characters) replacing the Chinese characters, except for the inclusion of a dollar sign in place of a yuan sign. It may have been developed for use by the Korean minority in north-eastern China.

Likewise, North Korea developed KPS 9566. Although North Korea and South Korea both use Chosŏn'gŭl (Hangul) as their primary writing system, they use different lexicographical orders; hence, character ordering differs between Wansung code and KPS 9566. KPS 9566 has undergone several revisions, including editions of 1997 and 2003, mainly to enhance compatibility with Unicode. These are commonly indicated by specifying the year (e.g. KPS 9566-97, KPS 9566-2003). The current edition as of the release of Red Star OS 3.0 appears to be KPS 9566-2011, which adds Kim Jong-un to the list of leaders. The publicly available code chart for the 1997 edition of KPS 9566 shows an ISO 2022 94×94 plane. The more recent editions, judging from what information is available outside North Korea, appear to define additional allocations outside of the EUC plane (similarly to GBK or UHC).

Due to the interoperability issues arising from the use of multiple national-standard and platform- or font-specific proprietary character encodings, the Unicode standard was developed with the intent of allowing all representable text to be interchanged in a single, universal format. The first edition of Unicode was published in 1991 and 1992, and ISO/IEC 10646 was established in sync with Unicode in 1993. Unicode formats are preferred for international use on the World Wide Web, where legacy character encodings are treated as partial encodings of Unicode by means of mapping files.
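Treating a legacy encoding as a partial encoding of Unicode amounts to a table lookup. The following is a minimal sketch, assuming a mapping file in a simple two-column hexadecimal layout ("legacy code, Unicode code point", with `#` comments). Many published mapping files look roughly like this, but the exact format of any particular KPS 9566 table is not assumed. The sample rows are the real KS X 1001 (EUC-KR) codes for 가 and 각; the function names are hypothetical. Codes missing from the table decode to U+FFFD REPLACEMENT CHARACTER.

```python
from __future__ import annotations

# Illustrative two-column excerpt: legacy EUC code, then Unicode code point.
SAMPLE_TABLE = """\
# legacy   unicode
0xB0A1     0xAC00
0xB0A2     0xAC01
"""


def load_table(text: str) -> dict[int, str]:
    """Parse 'legacy unicode' hex pairs, ignoring '#' comments and blank lines."""
    table: dict[int, str] = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2:
            table[int(fields[0], 16)] = chr(int(fields[1], 16))
    return table


def decode_euc_style(data: bytes, table: dict[int, str]) -> str:
    """Bytes below 0x80 are ASCII; other bytes are read as two-byte codes."""
    out: list[str] = []
    i = 0
    while i < len(data):
        if data[i] < 0x80:
            out.append(chr(data[i]))
            i += 1
        elif i + 1 < len(data):
            code = (data[i] << 8) | data[i + 1]
            out.append(table.get(code, "\ufffd"))  # no mapping: U+FFFD
            i += 2
        else:
            out.append("\ufffd")  # lone lead byte at the end of the input
            i += 1
    return "".join(out)


if __name__ == "__main__":
    table = load_table(SAMPLE_TABLE)
    print(decode_euc_style(b"A\xb0\xa1\xb0\xa2B", table))  # A가각B
```

A real converter would also need to handle characters that the table sends to the Private Use Area rather than to assigned code points, as several KPS 9566 mappings do.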

Design

In principle, KPS 9566 is similar to the Wansung character set defined by the KS X 1001 standard, although the two are not compatible. Both encode a section of punctuation, symbols, jamo, kana and alphabetical characters, followed by a subset of the possible modern chosŏn'gŭl syllables, followed by a section of hanja. However, KPS 9566 uses a different ordering of jamo and syllables to conform with North Korean lexicographical ordering standards. KPS 9566 also includes 28 explicitly rotated punctuation characters for vertical typography, which KS X 1001 lacks, and encodes each hanja only once, whereas KS X 1001 encodes several hanja with multiple readings multiple times.

KPS 9566-97 encodes a total of 2679 chosŏn'gŭl syllables and 4653 hanja. This provides better coverage than the 2350 syllables encoded by Wansung code: for instance, the syllable 똠, used in the title of a noted Korean literary work, has no Wansung code point but has one (38-02) in KPS 9566. The hanja section includes 4652 characters from the Unified Repertoire and Ordering and one from CJK Unified Ideographs Extension A. The entirety of row 15, the latter half of row 44 (after the syllables block) and the latter half of row 94 (after the hanja block) may be used for user-defined purposes.

KPS 9566 is especially distinguished by its inclusion of several special characters from North Korean political life. Specifically, it includes the hammer, sickle and brush emblem of the Workers' Party of Korea, both uncircled and circled (code points 12-01 and 12-02), and two groups of three special-purpose characters which spell out the names of the North Korean leaders ''Kim Il-sung'' (김일성) and ''Kim Jong-il'' (김정일) in a special decorative font (code points 04-72 to 04-74 and 04-75 to 04-77, respectively). The syllables for Kim and Il, which occur in both names, are therefore encoded twice. KPS 9566-2011 additionally includes the name of ''Kim Jong-un'' (김정은) as code points 04-78 to 04-80. Because of these special characters, there is currently no full round-trip compatibility between KPS 9566 and Unicode unless unsupported characters are mapped to the Private Use Area.
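Because Unicode arranges Hangul syllables in the South Korean order, North Korean ordering has to come from a tailored sort key rather than from code point values. The following is a minimal sketch that tailors only the initial consonant. It assumes the commonly cited North Korean order, in which the tense consonants ㄲ ㄸ ㅃ ㅆ ㅉ follow all the plain ones and the silent initial ㅇ comes last. Vowels and final consonants are deliberately left in Unicode order, which is a simplification. `nk_key` and the other names are hypothetical.

```python
from __future__ import annotations

# Initial consonants in Unicode (South Korean) index order.
UNICODE_INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
# Assumed North Korean order: tense consonants after the plain ones, ㅇ last.
NK_INITIALS = "ㄱㄴㄷㄹㅁㅂㅅㅈㅊㅋㅌㅍㅎㄲㄸㅃㅆㅉㅇ"
# Map each Unicode initial index to its North Korean rank.
NK_RANK = {UNICODE_INITIALS.index(j): rank for rank, j in enumerate(NK_INITIALS)}


def nk_key(word: str) -> list[tuple[int, int]]:
    """Sort key that re-ranks only the initial consonant of each syllable."""
    key = []
    for ch in word:
        offset = ord(ch) - 0xAC00
        if 0 <= offset < 11172:
            initial, rest = divmod(offset, 21 * 28)
            # rest = vowel * 28 + final, so vowels and finals keep Unicode order
            key.append((NK_RANK[initial], rest))
        else:
            key.append((-1, ord(ch)))  # non-syllables sort before syllables
    return key


if __name__ == "__main__":
    words = ["아", "까", "하", "가"]
    print(sorted(words))              # ['가', '까', '아', '하'] (code point order)
    print(sorted(words, key=nk_key))  # ['가', '하', '까', '아']
```

A production system would express the full North Korean order as a collation tailoring instead of hand-rolled keys.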

KPS 10721

North Korea also developed a second character set, KPS 10721 "''Code of the supplementary Korean Hanja Set for Information Interchange''", which was published in 2000. KPS 10721 encodes a set of at least 19469 hanja in addition to those included in KPS 9566. These did not all have mappings to Unicode, but they included 10358 from the Unified Repertoire and Ordering, 3187 from CJK Unified Ideographs Extension A and 107 from CJK Compatibility Ideographs (all in the Basic Multilingual Plane), as well as 5767 from CJK Unified Ideographs Extension B and 50 from CJK Compatibility Ideographs Supplement (in the Supplementary Ideographic Plane). Besides the mapping of these hanja to Unicode, little is known about the KPS 10721 standard outside of North Korea. North Korean reference glyphs are not provided for these hanja in the Unicode code charts, due to a lack of suitable font data available to the Unicode Consortium. Unicode hanja characters with KPS 9566 or KPS 10721 sources are nonetheless cross-referenced to their KPS codes in the Unihan database with the key kIRG_KPSource.
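Unihan data files are tab-separated lines of the form `U+XXXX<TAB>field<TAB>value`, so the kIRG_KPSource cross-references can be pulled out with a few lines of code. The following is a minimal sketch. The code values in `SAMPLE` are made up for illustration. The prefix mapping (KP0 for KPS 9566, KP1 for KPS 10721) reflects our understanding of the Unihan documentation (UAX #38) and should be checked there. `kp_sources` is a hypothetical helper.

```python
from __future__ import annotations

# Illustrative lines in the Unihan layout; the KP code values are made up.
SAMPLE = """\
# comment lines start with '#'
U+4E00\tkIRG_GSource\tG0-523B
U+4E00\tkIRG_KPSource\tKP0-FBC1
U+3400\tkIRG_KPSource\tKP1-34A7
"""

# Assumed meaning of the source prefixes (verify against UAX #38).
KP_SOURCES = {"KP0": "KPS 9566", "KP1": "KPS 10721"}


def kp_sources(text: str) -> dict[str, tuple[str, str]]:
    """Map each character to (source standard, code) for kIRG_KPSource lines."""
    result: dict[str, tuple[str, str]] = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        codepoint, field, value = line.split("\t", 2)
        if field != "kIRG_KPSource":
            continue  # other IRG sources (G, J, K, ...) are ignored
        prefix, _, code = value.partition("-")
        result[chr(int(codepoint[2:], 16))] = (KP_SOURCES.get(prefix, prefix), code)
    return result


if __name__ == "__main__":
    for char, (standard, code) in kp_sources(SAMPLE).items():
        print(f"U+{ord(char):04X}", standard, code)
```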

Documentation and relationship to Unicode

Unicode's initial coverage of Korean syllables, added in version 1.0, was based on Wansung code. In Unicode version 2.0, a new block of Korean syllables (the present Hangul Syllables block) was added, based on the syllable repertoire available in Johab, and the previous block was deleted (its range is now occupied by CJK Unified Ideographs Extension A). This was done under the assumption that no Unicode-encoded Korean data existed yet, but the change became known as the "Korean mess", and the responsible committees pledged not to make such an incompatible change in the future, a pledge codified by the Unicode Stability Policy.

The code chart for KPS 9566-97, published in April 1997, was submitted to the ISO International Register of Coded Character Sets for registration for use with ISO/IEC 2022. It was registered in June 1998 with the number ISO-IR-202. This code chart is publicly available from the Information Processing Society of Japan.

In August 1999, the North Korean national body submitted a document to WG2 (ISO/IEC JTC 1/SC 2 Working Group 2), the ISO body responsible for ISO/IEC 10646, the international standard corresponding to Unicode. This document requested the addition of the KPS 9566 codes to the existing cross-references from the CJK Unified Ideographs charts, the addition of 80 symbol characters from KPS 9566 which did not have existing Unicode mappings, a resolution to the difference in collation order between KPS 9566 and Unicode (due to the order of the characters in Unicode following the South Korean encodings) and the addition of 8 combining jamo. It also requested that WG2 edit the existing Unicode character and block names to use the term "Korean character" rather than "Hangul". An expanded version of this proposal, broken into several documents, was submitted as a work item in December 1999.

A detailed response was submitted by the Swedish representative in March 2000, opposing several of the points and elaborating on Sweden's vote against the proposal. This response stated that changing the encoding of the Korean characters again would cause major disruption, even more so than the first time, which was done when comparatively few implementations existed but which in retrospect should not have been done. It explained that few or no languages can be collated correctly by code point value, and that a tailoring of the Unicode Collation Algorithm or of ISO/IEC 14651 (then being drafted) should be used for that purpose, and that normative names of characters already assigned cannot be changed, due to the stability policy, although non-normative translations into other languages can be employed. It suggested that a machine-readable mapping file between Unicode and KPS 9566 could be provided by the North Korean body itself, and would be more useful than a printed cross-reference in the standard document. Regarding the proposed additional characters, the response stated that characters which would have compatibility decompositions in Unicode should not be added, and that logos, including those of political parties, and special characters for the names of particular people should not be added either.

In July 2000, the North Korean body wrote to WG2, accusing it of developing both versions of the Unicode encoding for Korean on the basis of South Korean proposals only, without consulting North Korea, and of putting the commercial interests of companies and fears of international confusion over respect for North Korea's sovereignty. The letter stated that North Korea would regard further refusal to change the name and order of the Korean characters in Unicode as an insult to its sovereign dignity and as compromising ISO's claims to impartiality. It reiterated the demand for WG2 and Unicode to "correct" the order of the Korean characters, and to "correct" the names "Hangul Jamo" and "Hangul Syllable" to "Korean Alphabet" and "Korean Syllable".

In August 2000, the North Korean national body submitted a more detailed version of its requests in a series of five consecutive proposals. These requested the addition of 14 additional jamo characters, the addition of 82 symbol characters, and the use of the term "Korean alphabet" instead of "Hangul"; they also provided supporting evidence for the North Korean collation order and requested the addition of the North Korean hanja repertoire. These proposals were discussed in two meetings between North Korean, South Korean, Swedish and other WG2 representatives in September 2000, in which the North Korean body was asked to provide manuscript evidence for the additional jamo characters, to resubmit its symbols proposal with the symbols already accepted into Unicode removed, and to consider using ISO/IEC 14651, then at final draft stage, for collation purposes.

In September 2001, the North Korean national body submitted a revised series of proposals requesting the addition of several KPS 9566 and KPS 10721 characters, including 70 symbol characters, to Unicode. This version of the proposal included a section of document excerpts demonstrating the use of several characters, with short explanations of their purpose. The WPK symbol was named the "Hammer and Sickle and Brush", renamed from "Mark of the Workers' Party of Korea" in earlier versions of the proposal, and justified as being used as an identifying symbol on maps. As justification for the proposed characters for leaders' names, the North Korean body explained that the leaders' names often appear with a different size and font weight in North Korean publications for the purpose of emphasis. A follow-up by South Korean WG2 representatives requested evidence, names in Korean and justifications for adding certain of these characters, and noted that non-emphasised versions of the characters for the leaders' names already existed. A meeting of North and South Korean representatives from WG2 was convened in October 2001, which recommended 47 of the symbol characters for addition to Unicode, and suggested that the leaders' names and WPK symbols be raised for further discussion by WG2.

A subsequent feedback document from February 2002 regarding the North Korean proposed additions requested that the "tea" symbol for a teahouse be accepted as a more general "hot beverage" symbol, equating it with symbols used in guidebooks to denote hot or non-alcoholic beverages. It also recommended that the reference glyph for the existing code point for an umbrella without rain be modified to harmonise with the proposed reference glyph for the umbrella with rain, equating them to the "keep dry" symbols used on packaging, and raised the question of which lightning bolt and high voltage warning symbols in existing symbol collections could be unified with the proposed "high voltage" character. All three of these characters were accepted into Unicode in version 4.0. It also recommended that the horizontal-barred fractions and the left-up pointing scissors be encoded using a variation selector, since the scissors did not accompany a differently-oriented pair of scissors, and since the existing Unicode fraction code points unified the skewed and horizontal forms. In November 2002, the South Korean body published a set of three-way tables mapping characters between the KPS 9566, KS X 1001 (as EUC-KR) and ISO/IEC 10646 standards as they existed in 2000. These tables had been prepared without input from North Korea.
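A three-way table of the kind published in November 2002 can be derived by joining through the Unicode column. The following is a minimal sketch: given a KPS 9566 to Unicode excerpt, it looks up each character's two-byte KS X 1001 code through Python's `euc_kr` codec and reports `None` where KS X 1001 has no code. The 38-02 to 똠 entry comes from the Design section. The 16-01 to 가 entry is a placeholder that has not been checked against the KPS 9566 chart. The helper names are hypothetical.

```python
from __future__ import annotations

# Hypothetical KPS 9566 (row, cell) -> Unicode excerpt; see the lead-in above.
KPS_TO_UNICODE = {(38, 2): "똠", (16, 1): "가"}


def ks_x_1001_code(ch: str) -> bytes | None:
    """Two-byte EUC-KR code for ch, or None if KS X 1001 has no such code."""
    try:
        encoded = ch.encode("euc_kr")
    except UnicodeEncodeError:
        return None
    return encoded if len(encoded) == 2 else None


def three_way(
    kps_to_unicode: dict[tuple[int, int], str],
) -> list[tuple[str, str, bytes | None]]:
    """Rows of (KPS 9566 row-cell, Unicode code point, EUC-KR bytes or None)."""
    return [
        (f"{row:02d}-{cell:02d}", f"U+{ord(ch):04X}", ks_x_1001_code(ch))
        for (row, cell), ch in sorted(kps_to_unicode.items())
    ]


if __name__ == "__main__":
    for entry in three_way(KPS_TO_UNICODE):
        print(entry)
```

The `None` in the 똠 row illustrates why such tables cannot be complete in both directions when one repertoire is larger than the other.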

In August 2004, a pair of mapping tables between KPS 9566-2003 and Unicode was submitted to the OpenOffice.org project by an individual using the name "ooprojlover", who stated that the tables represented the updated version of the KPS 9566 standard and requested that support be added. These files mapped the characters unavailable in Unicode to the Private Use Area, and included additional encoded forms for other syllable blocks outside of the main ISO-IR-202 plane. A mapping table was later published by the Unicode Consortium in 2011, based on this mapping data but with errors corrected with reference to the ISO-IR chart. Copies of Red Star OS 3.0 include fonts for a more recent edition of KPS 9566, appearing to be KPS 9566-2011, and the mapping table used internally by Red Star OS has been successfully extracted. Besides adding Kim Jong-un to the list of leaders, KPS 9566-2011 amends the mappings of certain vertical forms compared to the 2003 mappings (taking advantage of the Vertical Forms block added in Unicode 4.1), and also includes several additional hanja and symbols encoded outside of the ISO-IR-202 plane. Several of these additional symbols are also mapped to the Private Use Area; however, their identity is not known, since no names or reference glyphs for those characters are known outside of North Korea.
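Since several KPS 9566 symbols can only be represented through the Private Use Area, text converted from KPS 9566 may contain code points whose meaning depends on the converter that produced them. The following is a minimal sketch of how such characters could be flagged after conversion, using the Unicode general category `Co` ("Other, private use"), which covers the BMP Private Use Area and the two supplementary private use planes. The function name is hypothetical.

```python
from __future__ import annotations

import unicodedata


def private_use_chars(text: str) -> list[tuple[int, str]]:
    """(index, 'U+XXXX') for every private use character in text."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Co"  # general category "Other, private use"
    ]


if __name__ == "__main__":
    print(private_use_chars("A\ue000B\U000f0000"))  # [(1, 'U+E000'), (3, 'U+F0000')]
```

Flagging, rather than silently passing such characters through, is a design choice: a private use code point only has meaning relative to the specific mapping table that produced it.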

Impact on Unicode today

Several current Unicode characters were added to Unicode 4.0 as a result of the North Korean proposals, although not always at the originally proposed code points. These include HOT BEVERAGE (☕, proposed as TEA SYMBOL), which was intended as a map symbol for marking a teahouse, and the flag symbols WHITE FLAG (⚐) and BLACK FLAG (⚑), which were proposed as map symbols for sites of battles and military victories. These characters were proposed for the provisional code points U+270A, U+268E and U+268F respectively, but encoded at the final code points U+2615, U+2690 and U+2691 respectively. They also include a series of bold directional arrows in the range U+2B05 through U+2B0D, excluding a rightward arrow, which was mapped to an existing character in the Dingbats block; these were added at the same code points as proposed, except that the north-east and north-west arrows were swapped compared to the proposal. Other pictographic characters in the North Korean proposal include the umbrella with raindrops (☔), the lightning bolt for high voltage (⚡) and the warning triangle (⚠). After some discussion about which other high voltage symbol glyphs in use represented the same character as the one in the North Korean proposal, and which glyph would be best for it in the Unicode code chart, and after modification of the code chart glyph of the existing umbrella character without rain (U+2602, ☂) to harmonise with the new umbrella with raindrops, these characters were also added in Unicode 4.0, at the same time as the flags and the beverage symbol. Although proposed for the provisional code points U+2618, U+267F and U+267E, they were given the final code points U+2614, U+26A1 and U+26A0 respectively. Of these characters, the hot beverage, umbrella with raindrops, lightning bolt and warning triangle, together with the upward, downward and leftward arrows, were subsequently selected as mappings from the Japanese cellular emoji sets, making a total of seven current Unicode emoji which were originally added to Unicode at the request of North Korea.
The umbrella with raindrops and the upward, downward and leftward arrows were also unified with characters from the ARIB extensions used in Japanese broadcasting, which include several characters now classified as emoji and were mapped to Unicode in Unicode 5.2. However, the pair of white and black flags used as emoji, or in emoji regional and identity flag sequences, is a different, "waving" pair added in Unicode 7.0 (U+1F3F3 🏳 and U+1F3F4 🏴), not the North Korean pair.

As of 2018, several KPS 9566 characters remained unmapped to Unicode. These include the WPK symbol, four triangular marks, a leftward-pointing pair of scissors (excluded on the rationale that contrastive use with the rightward scissors in the Dingbats block had not been demonstrated), an upward-pointing manicule in a circle, vertical presentation forms of punctuation marks, variants of closing brackets incorporating full stops, horizontal-barred variants of vulgar fractions encoded separately from their slanted versions, and the leaders' names. A Japanese postal mark in a downward-pointing triangle was included in KPS 9566-97 but removed in KPS 9566-2003, after the North Korean body had withdrawn it from their Unicode proposal for review in response to requests from the South Korean body for evidence of the symbol's use in North Korea. This mark was re-proposed in 2018 on the basis of KPS 9566 compatibility, and identified as an electrical conformity mark used in Japan prior to its replacement by the PSE diamond. It was added to Unicode in version 13.0, published in 2020.

Encoded forms

The 1997 edition of KPS 9566 was registered with the International Register of Coded Character Sets for Use with Escape Sequences as ISO-IR-202, and can therefore be encoded using ISO 2022. It is a 94ⁿ multiple-byte G-set: if it is used in a 7-bit ISO 2022 code (analogous to ISO-2022-JP or ISO-2022-KR), characters are encoded as pairs of bytes between 0x21 and 0x7E when in the appropriate mode.

The documented mappings between KPS 9566 and Unicode for the 2003 and 2011 editions of KPS 9566 use an encoding resembling an adaptation of Unified Hangul Code (UHC) to encode KPS 9566 rather than Wansung code. Their updated versions of the ISO-IR-202 plane are encoded using pairs of bytes between 0xA1 and 0xFE, and other two-byte codes are used for syllables not present in ISO-IR-202; the extended syllables follow the usual KPS 9566 order. As in UHC, these codes use lead bytes 0x81 and above, and trail bytes from the ranges 0x41–0x5A, 0x61–0x7A and 0x81–0xFE, excluding the range 0xA1–0xFE if the lead byte is 0xA1 or above.

The 2011 edition also includes several additional hanja and symbols encoded outside of the ISO-IR-202 plane, after the range used for the extended syllable blocks. This approach is similar to that taken by GBK, but the trail bytes remain in the UHC-style ranges: like the extended syllables with lead bytes 0xA1 and above, these all use the trail byte ranges 0x41–0x5A, 0x61–0x7A and 0x81–0xA0. Extended hanja are encoded with lead bytes between 0xC8 and 0xDC, extended symbols with lead bytes between 0xE0 and 0xEA, and extended codes with lead bytes between 0xEC and 0xFE are mapped, without gaps, to the Unicode Private Use Area (compare the user-defined ranges in GBK). Several of the characters in the extended symbols section and three in the hanja section are also mapped to the Private Use Area; unlike the PUA-mapped symbols in the main ISO-IR-202 plane, the identity of these characters is not documented.
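As a rough illustration of the byte ranges just described, the following Python sketch sorts a two-byte code into the main ISO-IR-202 plane, one of the 2011 extension areas, or an invalid code. It assumes only the ranges stated in this section (lead bytes 0x81 and above, the UHC-style trail ranges, and the 2011 lead-byte sub-ranges); the function name `classify_pair` and its category labels are hypothetical, and the sketch does not know which codes are actually assigned.

```python
# Sketch: classify a (lead, trail) byte pair under the UHC-style KPS 9566
# encoding described above. Ranges are taken from the prose; the category
# labels are illustrative, not terms used by the standard.

UHC_TRAIL_RANGES = ((0x41, 0x5A), (0x61, 0x7A), (0x81, 0xFE))


def _in_ranges(value: int, ranges) -> bool:
    """True if value lies in any inclusive (lo, hi) range."""
    return any(lo <= value <= hi for lo, hi in ranges)


def classify_pair(lead: int, trail: int) -> str:
    """Return a rough category for a two-byte KPS 9566 code."""
    # 0xAEFF (y with diaeresis) is a one-off outside the 94x94 grid in KPS 9566-2003.
    if (lead, trail) == (0xAE, 0xFF):
        return "main"
    # Both bytes in 0xA1-0xFE: the ISO-IR-202-based main plane.
    if 0xA1 <= lead <= 0xFE and 0xA1 <= trail <= 0xFE:
        return "main"
    # Otherwise it must be a UHC-style extension code. Pairs with both bytes
    # >= 0xA1 were handled above, so a trail of 0xA1-0xFE only reaches this
    # point when the lead byte is below 0xA1, as the encoding requires.
    if not (0x81 <= lead <= 0xFE) or not _in_ranges(trail, UHC_TRAIL_RANGES):
        return "invalid"
    # Lead-byte sub-ranges documented for the 2011 edition.
    if 0xC8 <= lead <= 0xDC:
        return "extended hanja"
    if 0xE0 <= lead <= 0xEA:
        return "extended symbols"
    if 0xEC <= lead <= 0xFE:
        return "private use"
    # Extended syllable blocks and any unassigned lead bytes land here.
    return "extended other"
```

For example, `classify_pair(0xB0, 0xA1)` returns `"main"` (the first syllable, at 16-01), and `classify_pair(0xE9, 0x88)` returns `"extended symbols"` (the alternative Kelvin sign of the 2011 edition).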

Lead byte

This chart details the overall layout of the main plane of the KPS 9566 character set by lead byte. For lead bytes used for characters other than composed chosŏn'gŭl syllables or hanja, links are provided to charts on this page listing the characters encoded under that lead byte. For lead bytes used for hanja, links are provided to the appropriate section of Wiktionary's hanja index. Where two hexadecimal numbers are given, the smaller value (between 0x21 and 0x7E) is used in a 7-bit encoding, and the larger value (between 0xA1 and 0xFE) is used in an 8-bit EUC-style encoding. The extended UHC-style 8-bit encodings defined by the 2003 edition onwards likewise use the larger byte values, between 0xA1 and 0xFE inclusive, for the main ISO-IR-202-based plane.
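The relationship between the two hexadecimal values is a fixed offset from the row and cell numbers: 0x20 for the 7-bit form and 0xA0 for the 8-bit form. The Python sketch below converts in both directions; the helper names are hypothetical. For instance, the hanja at 69-09 becomes 0x65 0x29 in 7-bit form and 0xE5 0xA9 in 8-bit form, matching the 0xE5A9 cited for it in the hanja section.

```python
from __future__ import annotations


def row_cell_to_bytes(row: int, cell: int) -> tuple[bytes, bytes]:
    """Return the (7-bit, 8-bit EUC-style) byte pairs for a row-cell position."""
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must be in 1..94")
    # Rows and cells are offset by 0x20 in 7-bit form (0x21..0x7E)
    # and by 0xA0 in 8-bit form (0xA1..0xFE).
    seven_bit = bytes([row + 0x20, cell + 0x20])
    eight_bit = bytes([row + 0xA0, cell + 0xA0])
    return seven_bit, eight_bit


def bytes_to_row_cell(pair: bytes) -> tuple[int, int]:
    """Inverse of row_cell_to_bytes; accepts either the 7-bit or 8-bit form."""
    if len(pair) != 2:
        raise ValueError("expected exactly two bytes")
    lead, trail = pair
    for offset in (0x20, 0xA0):
        # Both bytes must come from the same form (no mixing of 7- and 8-bit).
        if offset < lead <= offset + 94 and offset < trail <= offset + 94:
            return lead - offset, trail - offset
    raise ValueError("not a code in the 94x94 plane")
```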

Non-Hanja, non-composed sets in the main plane

Character set 0x21/0xA1 (row number 1, punctuation and vertical forms)

This set contains common sentence punctuation such as brackets, quotation marks, commas and so forth, as well as presentation forms for use in vertical writing. ASCII punctuation (highlighted) is shown below mapped to Basic Latin codepoints (consistent with articles on other CJK character sets), but is mapped to the Halfwidth and Fullwidth Forms block when used in an encoding which combines KPS 9566 with ASCII (as defined by, for example, the 2003 edition). Compared to the 2003 mapping, the 2011 mapping changes the Unicode mappings of three vertical presentation forms to take advantage of the Vertical Forms block introduced with Unicode 4.1.

Character set 0x22/0xA2 (row number 2, symbols and operators)

This set includes mathematical operators and some other symbols such as the ampersand, pilcrow, musical note and so forth. ASCII punctuation (highlighted) is shown below mapped to Basic Latin codepoints (consistent with articles on other CJK character sets), but is mapped to the Halfwidth and Fullwidth Forms block when used in an encoding which combines KPS 9566 with ASCII. Several triangular "road mark" symbols denoting upcoming mountains or inclines, ahead or to one side, are included in this row but are not presently encoded in Unicode; they are mapped to the Private Use Area.

Character set 0x23/0xA3 (row number 3, digits and Roman)

This set includes a subset of ASCII, minus punctuation and symbols, comprising western Arabic numerals and both cases of the Basic Latin alphabet. Compare row 3 of JIS X 0208, which this row exactly matches. Compare and contrast row 3 of KS X 1001 and GB 2312, which include their entire national variants of ISO 646 in this row, rather than only the alphanumeric subset. The characters in this row are shown below mapped to Basic Latin codepoints (consistent with articles on the other character sets), but are mapped to the Halfwidth and Fullwidth Forms block when used in an encoding which combines KPS 9566 with ASCII.

Character set 0x24/0xA4 (row number 4, Chosŏn'gŭl jamo and leaders' names)

This set contains chosŏn'gŭl jamo, as well as special encodings for the names of (as of 2003) the North Korean leaders Kim Il-sung and Kim Jong-il. The name of Kim Jong-un is also included as of the 2011 edition. Compare with row 4 of KS X 1001. The jamo in this row which exist in the Unicode Hangul Compatibility Jamo block (which contains the position-independent characters mapped from KS X 1001) are mapped to that block. The obsolete jamo distinguishing palatalised sibilants map to the position-specific characters in the Hangul Jamo block. Conversely, not all of the obsolete jamo encoded by KS X 1001 are encoded in the main plane of KPS 9566; in the 2011 edition of KPS 9566, some of the other historic jamo from KS X 1001 are included outside of the main plane, with the lead byte 0xEA. The special encodings of the leaders' names are not present in Unicode and are mapped to the Private Use Area. They are shown below simulated with markup.

Character set 0x25/0xA5 (row number 5, Cyrillic)

This set includes both cases of 33 letters from the Cyrillic script, sufficient to write the modern Russian and Bulgarian alphabets, although other forms of Cyrillic require additional letters. Compare row 12 of KS X 1001 and row 7 of JIS X 0208, which use the same layout (but in a different row).

Character set 0x26/0xA6 (row number 6, Greek letters and Roman numerals)

This set contains Roman numerals and basic support for the Greek alphabet, without diacritics or the final sigma. Compare and contrast row 5 of KS X 1001 (which uses the same characters but in a different layout and a different row) and row 6 of JIS X 0208 (which uses the same layout for the Greek letters, but without the Roman numerals).

Character set 0x27/0xA7 (row number 7, encircled, superscript, subscript, fractions)

Several circled numbers in this row were mapped to Unicode incorrectly in the 2003 edition, due to using non-final proposed code points. They were corrected in the 2011 edition.

Character set 0x28/0xA8 (row number 8, unit, quantity and currency symbols)

This set contains symbols for units of measure and currency. Those present in ASCII (highlighted) are shown below mapped to Basic Latin codepoints (consistent with articles on other CJK character sets), but are mapped to the Halfwidth and Fullwidth Forms block when used in an encoding which combines KPS 9566 with ASCII. The Kelvin sign was replaced with a euro sign in the 2003 edition. The 2011 edition includes an alternative encoding of the Kelvin sign at 0xE988. Compare and contrast with the repertoire of unit symbols included in row 7 of KS X 1001.

Character set 0x29/0xA9 (row number 9, box drawing)

Character set 0x2A/0xAA (row number 10, Hiragana)

This row contains hiragana for use in the Japanese language. Compare row 10 of KS X 1001, which uses the same layout. Compare and contrast row 4 of JIS X 0208, which also uses the same layout, but in a different row.

Character set 0x2B/0xAB (row number 11, Katakana)

This row contains katakana for use in the Japanese language. However, the Japanese long vowel mark, which is used in katakana text and included in row 1 of JIS X 0208, is not included (as is also the case in GB 2312 and KS X 1001), although KPS 9566-2011 includes it outside of the main plane, at 0xEA48. Compare row 11 of KS X 1001, which uses the same layout. Compare and contrast row 5 of JIS X 0208, which also uses the same layout, but in a different row.

Character set 0x2C/0xAC (row number 12, miscellaneous symbols and arrows)

For the purpose of mapping this row to Unicode, the bold rightward arrow was unified with the bold rightward arrow from Zapf Dingbats (U+27A1), although earlier tables (which lacked mappings for the other bold arrows) had instead unified it with U+279E, a slightly different Zapf Dingbats character. Since corresponding arrows in other directions were not included in the Dingbats block, additional arrows were encoded between U+2B05 and U+2B0D for compatibility with KPS 9566. These were incorporated into the Unicode code charts using the reference glyphs proposed by the North Korean national body, while U+27A1 retained its reference glyph based on Zapf Dingbats. These arrows (U+2B05 through U+2B07, plus U+27A1) were chosen in Unicode 6.0 as the mappings for some of the arrow characters in cellular emoji sets. Subsequently, during the addition of the Wingdings 3 repertoire in Unicode 7.0, the Unicode coverage of arrow characters was reviewed, and an additional rightward arrow was added at U+2B95 to harmonise with U+2B05 through U+2B0D (in text presentation), since changing the reference glyph of the Zapf Dingbats character was not considered appropriate.

In earlier editions of KPS 9566, such as the 1997 edition, this row included both the simple Japanese-style postal mark (〒) and a version in a downward-pointing triangle, which was proposed by the North Korean national body for addition to Unicode alongside the other missing KPS 9566 characters. A response by a South Korean representative, amongst other requests, asked for evidence of the symbol's use in North Korea, noting that the Japanese-style postal mark is not used in South Korea, which uses a circled 우 (i.e. ㉾) for a similar purpose, and enquiring whether a Japanese-style postal mark was in use in North Korea. A subsequent meeting was held to discuss this proposal, attended by North and South Korean WG2 representatives; the meeting report notes that the North Korean body had decided to review the character before discussing it further, and therefore did not recommend it for consideration by WG2 as a whole. The postal mark triangle was subsequently removed from KPS 9566 in 2003, leaving only the unenclosed postal mark. It was eventually added to Unicode in version 13.0, both for compatibility with the legacy KPS 9566-97 character and after the mark was identified as a symbol that had been used for certifying electrical appliances in Japan (as a predecessor to the PSE diamond).

Certain KPS 9566 characters in this row, namely two forms of the emblem of the Workers' Party of Korea, a pair of scissors pointing in a different direction to those in the Dingbats block, and a circled upward-pointing manicule, remain mapped to the Private Use Area. The north-east and north-west white arrows used incorrect, swapped Unicode mappings in the 2003 edition; this was corrected in the 2011 edition mappings.

Character set 0x2E/0xAE (row number 14, Latin-1 subset)

The characters in this set were not present in the 1997 version of the character set, but were added in the 2003 version. They constitute a subset of the Latin-1 Supplement block of Unicode (equivalent to the upper half of the ISO 8859-1 (Latin-1) character set), including accented Roman letters and symbols. Some of the symbols which were already included elsewhere are omitted, while some others are duplicated as halfwidth counterparts to the earlier fullwidth forms: for example, the not sign (¬, U+00AC) is represented as 0xAEAC, while its fullwidth form (￢, U+FFE2) is represented as 0xA2D1 (in row 2). The required (no-break) space would fall outside of the 94-character range, colliding with the area used for extended chosŏn'gŭl syllables when a UHC-style encoding is used (specifically, with the syllable 쁲), and is omitted. The y with diaeresis (ÿ) also falls outside the 94-character range, but since the trail byte 0xFF is otherwise unused, KPS 9566-2003 maps it to 0xAEFF. This row is omitted from the mapping for the 2011 edition of the standard, suggesting that it may have been removed at some point after the 2003 edition; the halfwidth yen sign is instead encoded at 0xE98E in the 2011 edition.
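The two examples given for this row, U+00AC at 0xAEAC and U+00FF at 0xAEFF, are consistent with the trail byte simply reusing the Latin-1 code point. Assuming that pattern holds (an inference from those two examples, not something the documented mappings state), the Python sketch below computes the would-be row 14 code for a Latin-1 Supplement character and checks whether its trail byte lands in a UHC-style extension range; both function names are hypothetical. Under this assumption the no-break space would sit at 0xAEA0, and its trail byte 0xA0 is one of the extension trail bytes, which illustrates the collision with the extended syllables.

```python
# ASSUMPTION: row 14 trail byte == Latin-1 code point (inferred from
# 0xAEAC <-> U+00AC and 0xAEFF <-> U+00FF); not confirmed by the mappings.

# Trail-byte ranges available to UHC-style extension codes when the lead
# byte is 0xA1 or above (as 0xAE is).
UHC_EXTENSION_TRAILS = ((0x41, 0x5A), (0x61, 0x7A), (0x81, 0xA0))


def row14_candidate(ch: str) -> int:
    """Hypothetical row 14 code for a Latin-1 Supplement character.

    A returned value does not mean the character is actually encoded there.
    """
    cp = ord(ch)
    if not 0xA0 <= cp <= 0xFF:
        raise ValueError("not a Latin-1 Supplement graphic character")
    return 0xAE00 | cp


def collides_with_uhc_extension(code: int) -> bool:
    """True if the trail byte falls in an extension range (for lead bytes >= 0xA1)."""
    trail = code & 0xFF
    return any(lo <= trail <= hi for lo, hi in UHC_EXTENSION_TRAILS)
```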

Precomposed Chosŏn'gŭl sets (rows number 16 through 44)

Precomposed chosŏn'gŭl syllable clusters are allocated code points in a continuous sorted block between code points 16-01 and 44-47 inclusive. Not all possible clusters are allocated code points; compare the different ordering and availability in KS X 1001. The encoded form documented for KPS 9566-2003 encodes the KPS 9566 plane on GR (0xA1–0xFE) and additionally encodes the remaining syllable clusters using lead bytes in the range 0x80–0xC2 and trail bytes in the ranges 0x41–0x5A, 0x61–0x7A and 0x81–0xFE (where at most one byte is in the range 0xA1–0xFE), similarly to Unified Hangul Code but with the omitted clusters from, and sorting order of, KPS 9566 rather than KS X 1001.

* Row 16: 가 각 간 갇 갈 갉 갊 감 갑 값 갓 강 갖 갗 같 갚 갛 갔 갸 갹 갼 걀 걈 걋 걍 거 걱 건 걷 걸 걹 걺 검 겁 것 겅 겆 겉 겊 겋 겄 겨 격 견 겯 결 겸 겹 겻 경 곁 겪 겼 고 곡 곤 곧 골 곪 곬 곯 곰 곱 곳 공 곶 곺 교 굔 굘 굡 굣 구 국 군 굳 굴 굵 굶 굻 굼 굽 굿 궁 궂 규 균 귤 귬 귱 그 극 근 귿
* Row 17: 글 긁 긇 금 급 긋 긍 기 긱 긴 긷 길 긺 김 깁 깃 깅 깆 깇 깉 깊 개 객 갠 갤 갬 갭 갯 갱 갰 걔 걘 걜 게 겍 겐 겔 겜 겝 겟 겡 겠 계 곈 곌 곕 곗 괴 괵 괸 괼 굄 굅 굇 굉 굈 귀 귁 귄 귈 귐 귑 귓 긔 과 곽 관 괃 괄 괆 괌 괍 괏 광 괐 궈 궉 권 궐 궘 궝 궜 괘 괙 괜 괠 괩 괭 괬 궤 궥 궷 나 낙
* Row 18: 낛 난 낟 날 낡 낢 남 납 낫 낭 낮 낯 낱 낳 낚 났 냐 냑 냔 냘 냠 냡 냥 너 넉 넋 넌 널 넒 넓 넘 넙 넛 넝 넢 넣 넊 넜 녀 녁 년 녈 념 녑 녓 녕 녘 녔 노 녹 논 놀 놂 놈 놉 놋 농 높 놓 뇨 뇩 뇬 뇰 뇸 뇹 뇻 뇽 누 눅 눈 눋 눌 눔 눕 눗 눙 눞 뉴 뉵 뉸 뉼 늄 늅 늉 느 늑 는 늘 늙 늚 늠 늡 늣 능
* Row 19: 늦 늪 니 닉 닌 닐 닒 님 닙 닛 닝 닢 내 낵 낸 낼 냄 냅 냇 냉 냈 냬 네 넥 넨 넬 넴 넵 넷 넹 넸 녜 녠 뇌 뇐 뇔 뇜 뇝 뇟 뉘 뉜 뉠 뉨 뉩 뉭 늬 늰 늴 늼 닁 놔 놘 놜 놧 놨 눠 눨 눳 눴 놰 눼 다 닥 단 닫 달 닭 닮 닯 닲 닳 담 답 닷 당 닺 닻 닾 닿 닦 닸 댜 더 덕 던 덛 덜 덞 덟 덤 덥 덧 덩 덫
* Row 20: 덮 덯 덖 덨 뎌 뎐 뎔 뎡 뎠 도 독 돈 돋 돌 돎 돐 돔 돕 돗 동 돛 돝 됴 두 둑 둔 둘 둠 둡 둣 둥 듀 듄 듈 듐 듕 드 득 든 듣 들 듥 듦 듧 듬 듭 듯 등 디 딕 딘 딛 딜 딤 딥 딧 딩 딪 딮 딨 대 댁 댄 댈 댐 댑 댓 댕 댔 댸 데 덱 덴 덷 델 뎀 뎁 뎃 뎅 뎄 뎨 뎬 되 된 될 됨 됩 됫 됭 됬 뒤 뒥 뒨 뒬
* Row 21: 뒴 뒵 뒷 뒹 듸 듼 딀 딉 딍 돠 돤 돨 둬 둰 둴 둼 둿 뒀 돼 됀 됄 됐 뒈 뒝 라 락 란 랄 람 랍 랏 랑 랒 랖 랗 랐 랴 략 랸 랼 럄 럅 럇 량 러 럭 런 럴 럼 럽 럿 렁 렆 렇 렀 려 력 련 렬 렴 렵 렷 령 렸 로 록 론 롤 롬 롭 롯 롱 롶 료 룐 룔 룜 룝 룟 룡 루 룩 룬 룰 룸 룹 룻 룽 류 륙 륜 률 륨 륩
* Row 22: 륫 륭 르 륵 른 를 름 릅 릇 릉 릊 릍 릎 리 릭 린 릴 림 립 릿 링 맆 래 랙 랜 랠 램 랩 랫 랭 랬 럐 레 렉 렌 렐 렘 렙 렛 렝 렜 례 롄 롈 롑 롓 뢰 뢴 뢸 룀 룁 룃 룅 룄 뤼 뤽 륀 륄 륌 륏 륑 릐 릔 릘 릠 롸 롼 뢉 뢍 뤄 뤘 뢔 뢨 뤠 마 막 만 많 맏 말 맑 맒 맘 맙 맛 망 맞 맟 맡 맣 먀 먁 먄 먈
* Row 23: 먐 먕 머 먹 먼 멀 멁 멂 멈 멉 멋 멍 멎 멓 멌 며 멱 면 멸 몀 몁 몃 명 몇 몄 모 목 몫 몬 몯 몰 몲 몸 몹 못 몽 뫃 묘 묜 묠 묩 묫 무 묵 문 묻 물 묽 묾 뭄 뭅 뭇 뭉 뭍 뭏 묶 뮤 뮥 뮨 뮬 뮴 뮷 뮹 므 믄 믈 믐 믑 믓 믕 미 믹 민 믿 밀 밂 밈 밉 밋 밍 및 밑 밌 매 맥 맨 맬 맴 맵 맷 맹 맺 맸 먜
* Row 24: 메 멕 멘 멜 멤 멥 멧 멩 멨 몌 몐 뫼 묀 묄 묌 묍 묏 묑 뮈 뮌 뮐 믜 믠 믬 뫄 뫈 뫙 뫘 뭐 뭔 뭘 뭠 뭡 뭣 뭤 뫠 뭬 바 박 밗 반 받 발 밝 밞 밟 밤 밥 밧 방 밭 밖 뱌 뱍 뱐 뱜 뱝 버 벅 번 벋 벌 벍 벎 범 법 벗 벙 벚 벜 벘 벼 벽 변 별 볌 볍 볏 병 볓 볕 볐 보 복 본 볼 봄 봅 봇 봉 봏 볶 뵤 뵨
* Row 25: 뵬 부 북 분 붇 불 붉 붊 붐 붑 붓 붕 붙 붚 뷰 뷴 뷸 븀 븁 븃 븅 브 븍 븐 블 븜 븝 븟 븡 비 빅 빈 빌 빎 빔 빕 빗 빙 빚 빛 배 백 밴 밷 밸 뱀 뱁 뱃 뱅 뱉 뱄 뱨 베 벡 벤 벧 벨 벰 벱 벳 벵 벴 볘 볜 뵈 뵉 뵌 뵐 뵘 뵙 뵜 뷔 뷕 뷘 뷜 뷩 븨 븬 븰 븽 봐 봔 봡 봣 봤 붜 붤 붯 붴 붰 봬 봰 뵀 붸
* Row 26: 사 삭 삯 산 삳 살 삵 삶 삼 삽 삿 상 샅 샀 샤 샥 샨 샬 샴 샵 샷 샹 서 석 섟 선 섣 설 섦 섧 섬 섭 섯 성 섶 섞 섰 셔 셕 션 셜 셤 셥 셧 셩 셨 소 속 손 솓 솔 솖 솜 솝 솟 송 솥 솎 쇼 쇽 숀 숄 숌 숍 숏 숑 수 숙 순 숟 술 숨 숩 숫 숭 숯 숱 숲 슈 슉 슌 슐 슘 슙 슛 슝 스 슥 슨 슬 슭 슲 슳 슴
* Row 27: 습 슷 승 시 식 신 싣 실 싫 심 십 싯 싱 싶 새 색 샌 샐 샘 샙 샛 생 샜 섀 섄 섈 섐 섕 세 섹 센 셀 셈 셉 셋 셍 셑 셒 셌 셰 셴 셸 솅 쇠 쇡 쇤 쇨 쇰 쇱 쇳 쇵 쇴 쉬 쉭 쉰 쉴 쉼 쉽 쉿 슁 싀 싄 솨 솩 솬 솰 솻 솽 숴 쉈 쇄 쇈 쇌 쇔 쇗 쇘 쉐 쉑 쉔 쉘 쉠 쉡 쉥 자 작 잔 잖 잗 잘 잚 잠 잡 잣 장
* Row 28: 잦 잤 쟈 쟉 쟌 쟎 쟐 쟘 쟙 쟝 저 적 전 절 젊 점 접 젓 정 젖 젔 져 젹 젼 졀 졈 졉 졋 졍 졌 조 족 존 졸 졺 좀 좁 좃 종 좆 좇 좋 죠 죡 죤 죨 죰 죵 주 죽 준 줄 줅 줆 줌 줍 줏 중 쥬 쥰 쥴 쥼 즁 즈 즉 즌 즐 즘 즙 즛 증 지 직 진 짇 질 짊 짐 집 짓 징 짖 짙 짚 재 잭 잰 잴 잼 잽 잿 쟁 쟀 쟤
* Row 29: 쟨 쟬 제 젝 젠 젤 젬 젭 젯 젱 젶 젰 졔 졘 졜 죄 죈 죌 죔 죕 죗 죙 죘 쥐 쥑 쥔 쥗 쥘 쥠 쥡 쥣 즤 좌 좍 좐 좔 좝 좟 좡 줘 줬 좨 좽 좼 줴 줸 줼 쥄 쥅 쥈 차 착 찬 찮 찰 참 찹 찻 창 찾 찼 챠 챤 챦 챨 챰 챱 챵 처 척 천 철 첨 첩 첫 청 첬 쳐 쳑 쳔 쳘 쳤 초 촉 촌 촐 촘 촙 촛 총 쵸 쵼 춀 춈
* Row 30: 추 축 춘 춛 출 춤 춥 춧 충 츄 츈 츌 츔 츙 츠 측 츤 츨 츰 츱 츳 층 치 칙 친 칟 칠 칡 침 칩 칫 칭 채 책 챈 챌 챔 챕 챗 챙 챘 챼 체 첵 첸 첼 쳄 쳅 쳇 쳉 쳈 쳬 쳰 촁 최 쵠 쵤 쵬 쵭 쵯 쵱 취 췬 췰 췸 췹 췻 췽 츼 촤 촥 촨 촬 촹 춰 췃 췄 쵀 쵄 췌 췐 카 칵 칸 칼 캄 캅 캇 캉 캎 캈 캬 캭 캰
* Row 31: 캼 캽 컁 커 컥 컨 컫 컬 컴 컵 컷 컹 컽 컾 컸 켜 켠 켤 켬 켭 켯 켱 켰 코 콕 콘 콜 콤 콥 콧 콩 쿄 쿠 쿡 쿤 쿨 쿰 쿱 쿳 쿵 큐 큔 큘 큠 크 큭 큰 클 큼 큽 킁 키 킥 킨 킬 킴 킵 킷 킹 킾 캐 캑 캔 캘 캠 캡 캣 캥 캪 캤 컈 케 켁 켄 켈 켐 켑 켓 켕 켸 쾨 쾰 퀴 퀵 퀸 퀼 큄 큅 큇 큉 킈 콰 콱 콴
* Row 32: 콸 쾀 쾅 쿼 퀀 퀄 퀑 쾌 쾐 쾔 쾡 퀘 퀙 퀠 퀭 타 탁 탄 탈 탉 탐 탑 탓 탕 탚 탔 탸 탼 턍 터 턱 턴 털 턺 텀 텁 텃 텅 텄 텨 텬 텼 토 톡 톤 톨 톰 톱 톳 통 톺 툐 투 툭 툰 툴 툼 툽 툿 퉁 튜 튠 튤 튬 튱 트 특 튼 튿 틀 틂 틈 틉 틋 틍 티 틱 틴 틸 팀 팁 팃 팅 태 택 탠 탤 탬 탭 탯 탱 탶 탰 턔
* Row 33: 테 텍 텐 텔 템 텝 텟 텡 텦 톄 톈 퇴 퇸 툇 툉 튀 튁 튄 튈 튐 튑 튕 틔 틘 틜 틤 틥 톼 퇀 퉈 퉜 퇘 퉤 퉨 퉸 파 팍 판 팔 팖 팜 팝 팟 팡 팥 팎 팠 퍄 퍅 퍼 퍽 펀 펄 펌 펍 펏 펑 펐 펴 펵 편 펼 폄 폅 폇 평 폈 포 폭 폰 폴 폼 폽 폿 퐁 표 푠 푤 푭 푯 푸 푹 푼 푿 풀 풂 품 풉 풋 풍 퓨 퓬 퓰 퓸
* Row 34: 퓻 퓽 프 픈 플 픔 픕 픗 픙 피 픽 핀 필 핌 핍 핏 핑 패 팩 팬 팰 팸 팹 팻 팽 팼 퍠 페 펙 펜 펠 펨 펩 펫 펭 펲 폐 폔 폘 폡 폣 푀 푄 퓌 퓐 퓔 퓜 퓟 픠 픤 퐈 퐝 풔 풩 하 학 한 할 핥 함 합 핫 항 햐 향 허 헉 헌 헐 헒 헕 헗 험 헙 헛 헝 혀 혁 현 혈 혐 협 혓 형 혔 호 혹 혼 혿 홀 홅 홈 홉 홋
* Row 35: 홍 홑 효 횬 횰 횹 횻 후 훅 훈 훌 훑 훔 훕 훗 훙 휴 휵 휸 휼 흄 흇 흉 흐 흑 흔 흖 흗 흘 흙 흝 흠 흡 흣 흥 흩 히 힉 힌 힐 힘 힙 힛 힝 해 핵 핸 핼 햄 햅 햇 행 했 햬 헤 헥 헨 헬 헴 헵 헷 헹 헸 혜 혠 혤 혭 회 획 횐 횔 횝 횟 횡 휘 휙 휜 휠 휨 휩 휫 휭 희 흰 흴 흼 흽 힁 화 확 환 활 홤 홥
* Row 36: 홧 황 훠 훡 훤 훨 훰 훵 홰 홱 홴 횃 횅 횄 훼 훽 휀 휄 휑 까 깍 깐 깓 깔 깖 깜 깝 깟 깡 깥 깎 깠 꺄 꺅 꺈 꺌 꺼 꺽 껀 껄 껌 껍 껏 껑 꺾 껐 껴 껸 껼 꼇 꼍 꼈 꼬 꼭 꼰 꼱 꼲 꼴 꼼 꼽 꼿 꽁 꽂 꽃 꾜 꾸 꾹 꾼 꾿 꿀 꿇 꿈 꿉 꿋 꿍 꿎 뀨 끄 끅 끈 끊 끌 끎 끓 끔 끕 끗 끙 끝 끼 끽 낀 낄 낌
* Row 37: 낍 낏 낑 깨 깩 깬 깰 깸 깹 깻 깽 깼 꺠 께 껙 껜 껠 껨 껩 껫 껭 껬 꼐 꾀 꾁 꾄 꾈 꾐 꾑 꾕 뀌 뀐 뀔 뀜 뀝 뀡 꽈 꽉 꽌 꽐 꽛 꽝 꽜 꿔 꿘 꿜 꿥 꿧 꿩 꿨 꽤 꽥 꽨 꽬 꽹 꿰 꿱 꿴 꿸 뀀 뀁 뀅 뀄 따 딱 딴 딸 딿 땀 땁 땃 땅 땋 딲 땄 땨 땰 떠 떡 떤 떨 떪 떫 떰 떱 떳 떵 떻 떴 뗘 뗬 또 똑 똔
* Row 38: 똘 똠 똡 똣 똥 뚀 뚜 뚝 뚠 뚤 뚫 뚬 뚭 뚱 뜌 뜨 뜩 뜬 뜯 뜰 뜸 뜹 뜻 뜽 띠 띡 띤 띨 띰 띱 띳 띵 때 땍 땐 땔 땜 땝 땟 땡 땠 떼 떽 뗀 뗄 뗌 뗍 뗏 뗑 뗐 뙤 뙨 뛰 뛴 뛸 뜀 뜁 뜅 띄 띅 띈 띌 띔 띕 띙 똬 똰 똴 뚸 뛌 뙈 뙉 뛔 빠 빡 빤 빨 빪 빰 빱 빳 빵 빻 빴 뺘 뺙 뺜 뺨 뻐 뻑 뻔 뻗 뻘 뻠
* Row 39: 뻣 뻥 뻤 뼈 뼉 뼘 뼙 뼛 뼝 뼜 뽀 뽁 뽄 뽈 뽐 뽑 뽓 뽕 뾰 뿅 뿌 뿍 뿐 뿔 뿜 뿝 뿟 뿡 쀼 쁑 쁘 쁜 쁠 쁨 쁩 삐 삑 삔 삘 삠 삡 삣 삥 빼 빽 뺀 뺄 뺌 뺍 뺏 뺑 뺐 뺴 뻬 뻭 뻰 뻴 뻼 뼁 뾔 쀠 쁴 뽜 뿨 싸 싹 싻 싼 쌀 쌈 쌉 쌋 쌍 쌓 쌌 쌰 쌴 쌸 썅 써 썩 썬 썰 썲 썸 썹 썻 썽 썪 썼 쎠 쏘 쏙 쏜
* Row 40: 쏟 쏠 쏢 쏨 쏩 쏫 쏭 쑈 쑌 쑐 쑘 쑝 쑤 쑥 쑨 쑬 쑴 쑵 쑹 쓔 쓘 쓧 쓩 쓰 쓱 쓴 쓸 쓺 쓿 씀 씁 씅 씨 씩 씬 씯 씰 씸 씹 씻 씽 씼 쌔 쌕 쌘 쌜 쌤 쌥 쌧 쌩 쌨 썌 쎄 쎅 쎈 쎌 쎔 쎕 쎙 쎼 쏀 쐬 쐭 쐰 쐴 쐼 쐽 쑀 쒸 쒼 씌 씐 씔 씜 쏴 쏵 쏸 쏼 쐇 쐉 쐈 쒀 쒔 쐐 쐑 쐤 쒜 쒠 쒭 짜 짝 짠 짢 짤
* Row 41: 짧 짬 짭 짯 짱 짰 쨔 쨘 쨤 쨩 쩌 쩍 쩐 쩔 쩗 쩜 쩝 쩟 쩡 쩠 쪄 쪘 쪼 쪽 쫀 쫄 쫌 쫍 쫏 쫑 쫒 쫓 쫗 쬬 쬰 쬼 쭁 쭈 쭉 쭌 쭐 쭘 쭙 쭛 쭝 쮸 쯀 쯔 쯕 쯘 쯜 쯤 쯧 쯩 쯪 찌 찍 찐 찔 찜 찝 찟 찡 찢 찦 찧 째 짹 짼 쨀 쨈 쨉 쨋 쨍 쨌 쨰 쨴 쩨 쩩 쩬 쩰 쩸 쩹 쩽 쪠 쬐 쬔 쬘 쬠 쬡 쬤 쮜 쯰 쯴
* Row 42: 쫘 쫙 쫜 쫠 쫭 쫬 쭤 쭹 쭸 쫴 쬈 쮀 아 악 안 앉 않 알 앍 앎 앒 앓 암 압 앗 앙 앝 앞 앟 았 야 약 얀 얃 얄 얇 얌 얍 얏 양 얕 얗 얐 어 억 언 얹 얻 얼 얽 얾 엄 업 없 엇 엉 엊 엌 엎 엏 었 여 역 연 엳 열 엶 엷 염 엽 엾 엿 영 옅 옆 옇 엮 였 오 옥 온 올 옭 옮 옰 옳 옴 옵 옷 옹 옻 옾 요 욕
* Row 43: 욘 욜 욤 욥 욧 용 우 욱 운 울 욹 욺 움 웁 웃 웅 유 육 윤 율 윰 윱 윳 융 윷 으 윽 은 읃 을 읅 읊 음 읍 읏 응 읒 읓 읔 읕 읖 읗 이 익 인 일 읽 읾 잃 임 입 잇 잉 잊 잎 있 애 액 앤 앨 앰 앱 앳 앵 앴 얘 얜 얠 얩 에 엑 엔 엘 엠 엡 엣 엥 엤 예 옌 옐 옘 옙 옛 옝 옜 외 왹 왼 욀 욈 욉 욋 욍
* Row 44: 위 윅 윈 윌 윔 윕 윗 윙 의 읜 읠 읨 읫 와 왁 완 왇 왈 왐 왑 왓 왕 왔 워 웍 원 월 웜 웝 웟 웡 웠 왜 왝 왠 왬 왯 왱 웨 웩 웬 웰 웸 웹 웻 웽 윁
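The sorting order used for this block can be read off the listing: initial consonants run ㄱ ㄴ ㄷ ㄹ ㅁ ㅂ ㅅ ㅈ ㅊ ㅋ ㅌ ㅍ ㅎ, then the doubled consonants ㄲ ㄸ ㅃ ㅆ ㅉ, with the silent initial ㅇ last; the simple vowels ㅏ ㅑ ㅓ ㅕ ㅗ ㅛ ㅜ ㅠ ㅡ ㅣ precede the compound vowels; and the finals ㄲ and ㅆ follow ㅎ. The Python sketch below turns that observation into a sort key by decomposing a precomposed syllable with the standard Unicode Hangul syllable arithmetic and ranking each jamo. The orders are inferred from the rows listed here rather than taken from the text of the standard, and `kps_sort_key` is a hypothetical name.

```python
from __future__ import annotations

# Orders inferred from the syllable listing above, expressed as Unicode
# jamo indices (the L/V/T indices used in Hangul syllable composition).
# Initials: ㄱ ㄴ ㄷ ㄹ ㅁ ㅂ ㅅ ㅈ ㅊ ㅋ ㅌ ㅍ ㅎ ㄲ ㄸ ㅃ ㅆ ㅉ ㅇ
KPS_INITIAL_ORDER = [0, 2, 3, 5, 6, 7, 9, 12, 14, 15, 16, 17, 18, 1, 4, 8, 10, 13, 11]
# Medials: ㅏ ㅑ ㅓ ㅕ ㅗ ㅛ ㅜ ㅠ ㅡ ㅣ ㅐ ㅒ ㅔ ㅖ ㅚ ㅟ ㅢ ㅘ ㅝ ㅙ ㅞ
KPS_MEDIAL_ORDER = [0, 2, 4, 6, 8, 12, 13, 17, 18, 20, 1, 3, 5, 7, 11, 16, 19, 9, 14, 10, 15]
# Finals: Unicode order (none, ㄱ, ㄳ, ...) with ㄲ and ㅆ moved after ㅎ.
KPS_FINAL_ORDER = [0, 1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
                   16, 17, 18, 19, 21, 22, 23, 24, 25, 26, 27, 2, 20]


def _rank(order: list[int]) -> dict[int, int]:
    """Map each Unicode jamo index to its position in the KPS order."""
    return {jamo_index: position for position, jamo_index in enumerate(order)}


INITIAL_RANK = _rank(KPS_INITIAL_ORDER)
MEDIAL_RANK = _rank(KPS_MEDIAL_ORDER)
FINAL_RANK = _rank(KPS_FINAL_ORDER)


def kps_sort_key(syllable: str) -> tuple[int, int, int]:
    """Sort key approximating KPS 9566 syllable order (inferred, not normative)."""
    index = ord(syllable) - 0xAC00
    if not 0 <= index < 11172:
        raise ValueError("not a precomposed Hangul syllable")
    # Standard decomposition: 21 medials x 28 finals per initial consonant.
    initial, rest = divmod(index, 21 * 28)
    medial, final = divmod(rest, 28)
    return INITIAL_RANK[initial], MEDIAL_RANK[medial], FINAL_RANK[final]
```

For example, sorting `["아", "까", "나", "과", "개", "갸", "갔", "각", "가"]` with this key yields 가 각 갔 갸 개 과 나 까 아, the same relative order these syllables have in rows 16 through 42.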

Hanja sets (rows number 45 through 94)

The hanja at 69-09 (0xE5A9) is mapped to U+676E in all documented tables; however, since the hanja are ordered according to their readings, it appears that U+67FF is intended instead.

Extended non-syllable, non-hanja sets in KPS 9566:2011

Following are charts for the non-syllable, non-hanja section of KPS 9566-2011 outside of the main plane.

Extension set 0xE0 (symbols and pictographs)

Extension sets 0xE1, 0xE2, 0xE3 (unknown)

These extension sets map to the Private Use Area. Their purpose is not documented.

Extension set 0xE4 (arrows)

This set includes several arrows, mostly rightward, mapping to the Unicode Dingbats block and elsewhere.

Extension set 0xE5 (Roman superscripts and subscripts)

This row includes several lowercase Roman superscripts with trail bytes corresponding to their uppercase ASCII equivalents, and lowercase Roman subscripts with trail bytes corresponding to their lowercase ASCII equivalents.
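The trail-byte pattern described for this row can be sketched as follows. The Python helper below (a hypothetical name) computes the code that a letter's superscript or subscript would occupy under that pattern; it does not imply that every letter is actually encoded, since the row includes only several of them. The resulting trail bytes, 0x41–0x5A and 0x61–0x7A, fall within the UHC-style trail ranges used for lead bytes 0xA1 and above.

```python
def ext_e5_candidate(letter: str, superscript: bool) -> int:
    """Hypothetical 0xE5xx code for a lowercase Roman superscript or subscript.

    A returned value does not mean the character is assigned in KPS 9566-2011.
    """
    if len(letter) != 1 or not (letter.isascii() and letter.isalpha()):
        raise ValueError("expected a single ASCII letter")
    # Superscripts take the uppercase ASCII value as trail byte (0x41-0x5A);
    # subscripts take the lowercase ASCII value (0x61-0x7A).
    trail = ord(letter.upper()) if superscript else ord(letter.lower())
    return 0xE500 | trail
```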

Extension set 0xE6 (Greek and symbol superscripts and subscripts)

Extension set 0xE7 (further list markers)

Extension set 0xE8

Extension set 0xE9 (additional symbols and punctuation)

This set contains playing card suit symbols, various miscellaneous symbols, and halfwidth counterparts for some of the currency symbols in row 8. The Kelvin sign is also included, having been replaced in row 8 by the euro sign.

Extension set 0xEA (Japanese punctuation and additional jamo)

This set contains several punctuation marks used in Japan, and some characters from the Hangul Compatibility Jamo Unicode block which are not already included in row 4. These comprise some of the jamo characters present in KS X 1001 but previously absent from KPS 9566.


External links

KPS 9566-97 code table from the ISO-IR registry
Three-way mappings between EUC-KP (KPS 9566), EUC-KR and Unicode as of 2000 (file in EUC-KR; note a typographical error in the mapping of 0xA1BA)
KPS 9566-2003 to Unicode mapping

KPS 9566-2011 code table and mapping reverse engineered from Red Star OS