KS X 1001, "''Code for Information Interchange (Hangul and Hanja)''", formerly called KS C 5601, is a South Korean coded character set standard to represent hangul and hanja characters on a computer.
KS X 1001 is encoded by the most common legacy (pre-Unicode) character encodings for Korean, including EUC-KR and Microsoft's Unified Hangul Code (UHC). It contains Korean Hangul syllables, CJK ideographs (Hanja), Greek and Cyrillic letters, Japanese kana (Hiragana and Katakana) and some other characters.
KS X 1001 is arranged as a 94×94 table, following the structure of 2-byte code words in ISO 2022 and EUC. Its code points are therefore pairs of integers, each in the range 1–94 (a row number and a cell number within the row). However, some encodings (UHC and Johab), in addition to providing codes for every code point, provide additional codes for characters that could otherwise be represented only as code point sequences.
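Because each code point is a (row, cell) pair, turning it into bytes is a fixed offset applied to each number: 0x20 for the GL form used by ISO-2022-KR (bytes 0x21-0x7E) and 0xA0 for the GR form used by EUC-KR and UHC (bytes 0xA1-0xFE). The following is a minimal Python sketch of both directions; the function names are hypothetical and only the offsets come from the structure described above.

```python
from __future__ import annotations

GL_OFFSET = 0x20  # ISO-2022-KR after shifting in: bytes 0x21-0x7E
GR_OFFSET = 0xA0  # EUC-KR / UHC: bytes 0xA1-0xFE


def row_cell_to_bytes(row: int, cell: int, gr: bool = True) -> bytes:
    """Return the two-byte code for a KS X 1001 (row, cell) pair."""
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must both be in 1..94")
    offset = GR_OFFSET if gr else GL_OFFSET
    return bytes([row + offset, cell + offset])


def bytes_to_row_cell(code: bytes) -> tuple[int, int]:
    """Inverse of row_cell_to_bytes; accepts either GL or GR bytes."""
    lead, trail = code[0] & 0x7F, code[1] & 0x7F  # clearing bit 7 turns GR into GL
    if not (0x21 <= lead <= 0x7E and 0x21 <= trail <= 0x7E):
        raise ValueError("not a KS X 1001 double-byte code")
    return lead - GL_OFFSET, trail - GL_OFFSET


# Row 16, cell 1 holds the first precomposed syllable, 가.
print(row_cell_to_bytes(16, 1).hex())            # b0a1 (EUC-KR form)
print(row_cell_to_bytes(16, 1, gr=False).hex())  # 3021 (ISO-2022-KR form)
```

The same pair of bytes therefore differs only in bit 7 between the two forms, which is why the code charts can list a GL/GR pair for every position.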
History
This standard was previously known as KS C 5601. It has been revised several times, including in 1987, 1992, 1998 and 2002.
The present double-byte Wansung (Korean: 완성, Wanseong, lit. "precomposing") character set was standardised by the third edition of KS C 5601, which was published in 1986.
It is an ISO 2022-compatible encoding, typically used in EUC form, which assigns double-byte codes to non-Hangul characters, Hangul jamo, and the most common Hangul syllables. This contrasts with Johab (Korean: 조합, Johap, lit. "combining"), which is not compatible with ISO 2022 but assigns double-byte codes to all Hangul syllables composed of modern jamo.
Wansung is technically a variable-length encoding, allowing other syllables to be represented with eight-byte sequences (using the jamo and Hangul Filler character), but this feature is not always implemented.
The earliest edition of KS C 5601, published in 1974, defined a variable-length 7-bit character set which assigned single-byte code points to 51 basic Hangul jamo, somewhat analogously to JIS C 6220, in an encoding known as "N-byte Hangul".
The second edition, published in 1982, retained the main character set from the 1974 edition but defined two supplementary sets, including a version of Johab. Neither edition was adopted as widely as intended.
Wansung was kept unchanged in the 1987 and 1992 editions. The 1992 edition added annex material, including the definition of the Johab encoding in annex 3 and the older N-byte Hangul encoding in annex 4. This material was added in response to industry use of Johab as a competing encoding to Wansung, notably by Hangul Word Processor at the time. Following the introduction of Unified Hangul Code by Microsoft in Windows 95, and Hangul Word Processor's abandonment of Johab in favour of Unicode in 2000, Johab ceased to be commonly used.
Encodings
Encoding schemes of KS X 1001 include EUC-KR (in both ASCII-based and ISO 646-KR-based variants, the latter of which has a won currency sign (₩) rather than a backslash at byte 0x5C) and ISO-2022-KR, as well as ISO-2022-JP-2 (which also encodes JIS X 0208 and JIS X 0212). All of these share the drawback that they assign codes only to the 2350 precomposed Hangul syllables which have their own KS X 1001 code points (out of 11172 in total, not counting syllables using obsolete jamo), and require all other syllables to use eight-byte composition sequences, which some partial implementations of the standard do not support.
The Johab encoding (stipulated in annex 3 of the 1992 version of the standard) and the EUC-KR superset known as Unified Hangul Code (UHC, also called Windows-949) provide single codes for all 11172 Hangul syllables.
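To make the 2350-versus-11172 difference concrete, the Python sketch below asks the standard library's `euc_kr`, `cp949` and `johab` codecs which modern syllables (U+AC00 to U+D7A3) they encode as a single two-byte code. It assumes those codecs follow the KS X 1001, UHC and Johab tables; the helper name `has_two_byte_code` is hypothetical. An EUC-KR encoder may either reject a syllable that is missing from KS X 1001 or emit an eight-byte composition sequence for it, so both outcomes are treated as "no two-byte code".

```python
def has_two_byte_code(ch: str, codec: str) -> bool:
    """True if `codec` encodes the single character `ch` as exactly two bytes."""
    try:
        return len(ch.encode(codec)) == 2
    except UnicodeEncodeError:
        return False  # the codec has no code at all for this character


# All 11172 modern precomposed syllables, U+AC00 to U+D7A3.
SYLLABLES = [chr(cp) for cp in range(0xAC00, 0xD7A4)]

for codec in ("euc_kr", "cp949", "johab"):
    count = sum(has_two_byte_code(s, codec) for s in SYLLABLES)
    print(f"{codec}: {count} of {len(SYLLABLES)} syllables have a two-byte code")
```

With CPython's codecs this is expected to report 2350 for `euc_kr` and all 11172 for `cp949` and `johab`.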
ISO-2022-KR and Johab are rarely used. Some operating systems extend this standard in other non-uniform ways, e.g. the EUC-KR extensions MacKorean on the classic Mac OS and IBM-949 by IBM.
Hangul Filler
The Hangul Filler character is used to introduce eight-byte Hangul composition sequences
and to stand in for an absent element (usually an empty final) in such a sequence.
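A composition sequence for a modern syllable can be sketched in Python as four row-4 characters: the filler, then the initial, medial and final jamo, with a second filler standing in for a missing final. The sketch assumes that filler/initial/medial/final order, the EUC-KR (GR) byte values of row 4, and that the modern jamo occupy cells 1 to 51 in Hangul Compatibility Jamo order with the filler at cell 52; the function names are hypothetical.

```python
FILLER = bytes([0xA4, 0xD4])  # row 4, cell 52 (Hangul Filler) in GR form

# Compatibility jamo listed in the order of the Unicode conjoining-jamo indices.
INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"            # 19 initials
MEDIALS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"          # 21 medials
FINALS = "ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ"  # 27 finals


def jamo_code(jamo: str) -> bytes:
    """EUC-KR code of a modern compatibility jamo (U+3131..U+3163 -> cells 1..51)."""
    cell = ord(jamo) - 0x3130
    if not 1 <= cell <= 51:
        raise ValueError("not a modern compatibility jamo")
    return bytes([0xA4, 0xA0 + cell])


def eight_byte_sequence(syllable: str) -> bytes:
    """Filler + initial + medial + final, reusing the filler when there is no final."""
    index = ord(syllable) - 0xAC00
    if not 0 <= index < 11172:
        raise ValueError("not a modern precomposed Hangul syllable")
    # Standard Unicode decomposition: 588 = 21 medials * 28 final slots.
    initial, medial, final = index // 588, (index % 588) // 28, index % 28
    final_code = jamo_code(FINALS[final - 1]) if final else FILLER
    return FILLER + jamo_code(INITIALS[initial]) + jamo_code(MEDIALS[medial]) + final_code


print(eight_byte_sequence("똠").hex(" "))
```

For example, 똠, which has no KS X 1001 code of its own, becomes `a4 d4 a4 a8 a4 c7 a4 b1`.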
Unicode includes the Wansung code Hangul Filler in the Hangul Compatibility Jamo block for round-trip compatibility, but uses its own system (with its own, differently used, filler characters) for composing Hangul. The KS X 1001 Hangul composition system is not used in Unicode, and the filler renders merely as an empty space; KS X 1001 composition sequences using modern jamo may be mapped to precomposed characters in Unicode. This is not usually done with Unified Hangul Code.
For round-trip compatibility, Unicode also includes the N-byte Hangul code Hangul Filler separately in the Halfwidth and Fullwidth Forms block, named the "Halfwidth Hangul Filler".
N-byte Hangul code
This is the N-byte Hangul code, as specified by KS C 5601-1974 and by annex 4 of KS C 5601-1992. The second half of IBM's Code page 1040 is a superset of this, assigning the characters ¢, ¬, \ and ~ (although not £) to the same locations as in Code page 1041, while the unextended N-byte Hangul (besides C0 control code replacement graphics in some usage contexts, shared with IBM-1040) is Code page 891. Character 0x40/0xC0 is a Hangul Filler (see above), used in combining sequences.
Similarly to its Japanese counterpart
JIS C 6220 (JIS X 0201), N-byte Hangul code could be used as a 7-bit encoding, with character allocations over the range
0x40 through 0x7C.
The chart below shows the code in an 8-bit environment with the high bit set (i.e. over 0xC0 through 0xFC), as it is used in e.g. code page 891 or 1040.
Wansung code charts
Following are the code charts for KS X 1001 in Wansung layout. Where a pair of hexadecimal numbers is given, the smaller is used when encoded over GL (0x21-0x7E), as in ISO-2022-KR when the Korean set has been shifted to, and the larger is used in the more typical case of it being encoded over GR (0xA1-0xFE), as in EUC-KR or UHC.
Johab changes the arrangement to encode all 11172 Hangul clusters separately and in order.
To illustrate vendor differences in implementation, multiple Unicode mappings are shown for some characters. Apple's
HangulTalk extensions to the Wansung plane (i.e. where both bytes are in the 0xA1-0xFE range) are shown, but other HangulTalk extension ranges are not. The additional codes for composed syllables in Unified Hangul Code, and IBM's extensions in
IBM-949, are also not shown, since both fall outside of the Wansung plane.
Lead bytes
Non-Hanja non-precomposed sets
Character set 0x21 / 0xA1 (row number 1, special characters)
This set contains punctuation and other symbols, excluding punctuation present in KS X 1003 (which is included in row 3). Encodings which combine KS X 1001 with single-byte ASCII may use an alternative Unicode mapping to the Halfwidth and Fullwidth Forms block for the backslash. The Unicode mapping of the wave dash (tilde dash) also differs between vendors: it may be U+301C (favoured by IBM and Apple) or U+223C (favoured by Microsoft). Compare the similar but not identical handling of the JIS wave dash, and the handling of the tilde in the next row.
Except for the backslash, if two mappings are shown below, the first is used by Apple and the second is used by Microsoft.
Character set 0x22 / 0xA2 (row number 2, special characters)
This set contains additional punctuation and symbols. As with the tilde character in the previous row, Apple and Microsoft use different mappings for the tilde character in this row (U+02DC by Apple, U+FF5E by Microsoft), which is intended to be shown as a raised tilde, whereas the tilde in the previous row is intended to be shown in-line at dash height. The mapping of the circled dot also differs.
The euro and registered trademark sign were added to the standard in 1998, while the Korean postal mark (㉾) was added in 2002.
These three code points, as with the still-unused code points, have been put to use for other, non-standard, purposes by vendors, e.g. for boxed list markers by Apple.
Microsoft updated its Unified Hangul Code implementation to add the 1998 additions including the euro sign, but did not add the Korean postal mark when it was added to the standard.
Character set 0x23 / 0xA3 (row number 3, basic Latin / ISO 646-KR)
This set corresponds to KS X 1003 (the ISO 646 variant for Korean, a set similar to ASCII), but encoded as two-byte codes with the lead byte 0x23 (or 0xA3 in GR-delegated (EUC) form). It includes the basic Latin (English) alphabet, western Arabic numerals and punctuation.
Compare the Roman set of JIS X 0201, which differs by including a Yen sign rather than a Won sign. Contrast the third rows of KPS 9566 and of JIS X 0208, which follow the ISO 646 layout but only include letters and digits.
Encodings such as EUC-KR and UHC combine KS X 1001 with single-byte ASCII or KS X 1003, and hence use alternative Unicode mappings to the Halfwidth and Fullwidth Forms block for the double-byte representations of these characters.
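A minimal Python sketch of that fullwidth mapping for row 3 follows. It assumes the common convention in which cell n mirrors the single byte 0x20 + n, the won sign position (0x5C) maps to FULLWIDTH WON SIGN (U+FFE6), the 0x7E position (assumed here to be an overline in KS X 1003) maps to FULLWIDTH MACRON (U+FFE3), and every other cell maps to U+FF01 to U+FF5D. Vendor tables may differ; the function name is hypothetical.

```python
def row3_to_unicode(cell: int) -> str:
    """Fullwidth Unicode character for KS X 1001 row 3, cell 1..94 (assumed mapping)."""
    if not 1 <= cell <= 94:
        raise ValueError("cell must be in 1..94")
    single = 0x20 + cell  # the KS X 1003 single-byte code this cell mirrors
    if single == 0x5C:
        return "\uFFE6"  # FULLWIDTH WON SIGN: 0x5C is the won sign in KS X 1003
    if single == 0x7E:
        return "\uFFE3"  # FULLWIDTH MACRON (assumption: 0x7E is an overline)
    return chr(0xFEE0 + single)  # 0x21..0x7D -> U+FF01..U+FF5D


print(row3_to_unicode(ord("A") - 0x20))  # cell 33: FULLWIDTH LATIN CAPITAL LETTER A
```

Keeping the single-byte value as the intermediate makes the two KS X 1003 deviations from ASCII explicit rather than hiding them in a lookup table.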
Character set 0x24 / 0xA4 (row number 4, Hangul jamo)
This set includes modern Hangul consonants, followed by vowels, both ordered by South Korean collation customs, followed by obsolete jamo. When used individually, these characters map to the Unicode Hangul Compatibility Jamo block, and do not have a one-to-one mapping with the position-specific characters in the Hangul Jamo block. Compare with row 4 of the North Korean KPS 9566. Character 04-52 is a Hangul Filler (see above), used in combining sequences.
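The Hangul Compatibility Jamo block (U+3131 to U+318E, exactly 94 code points) follows this row position for position, so the conversion can be sketched as a plain offset. The Python below assumes that one-to-one correspondence across the whole row; the function names are hypothetical.

```python
def row4_to_unicode(cell: int) -> str:
    """Hangul Compatibility Jamo character for KS X 1001 row 4, cell 1..94."""
    if not 1 <= cell <= 94:
        raise ValueError("cell must be in 1..94")
    return chr(0x3130 + cell)  # cell 1 -> U+3131, cell 94 -> U+318E


def unicode_to_row4(ch: str) -> int:
    """Inverse: the row 4 cell number of a compatibility jamo."""
    cell = ord(ch) - 0x3130
    if not 1 <= cell <= 94:
        raise ValueError("not in the Hangul Compatibility Jamo block")
    return cell


print(hex(ord(row4_to_unicode(52))))  # cell 52, the Hangul Filler
```

Under this mapping, cell 52 lands on U+3164 HANGUL FILLER, which is the round-trip code point mentioned in the Hangul Filler section.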
Character set 0x25 / 0xA5 (row number 5, Roman numerals and Greek)
This set contains Roman numerals and basic support for the Greek alphabet, without diacritics or the final sigma. Apple includes some additional punctuation in this row, as well as some black circled list markers continuing from those in row 6.
Contrast row 6 of KPS 9566, which includes the same characters but in a different layout.
Character set 0x26 / 0xA6 (row number 6, box drawing)
This row contains characters for drawing boxes in a semigraphic context. Apple also includes some black circled list markers.
Character set 0x27 / 0xA7 (row number 7, unit symbols)
This row contains unit symbols as single characters, including those which consist of multiple letters. Apple also includes some circled list markers continuing from those in row 8.
Compare and contrast with the repertoire of unit symbols included in
row 8 of KPS 9566.
Character set 0x28 / 0xA8 (row number 8, extended Latin, encircled, fractions)
Character set 0x29 / 0xA9 (row number 9, extended Latin, encircled, superscript and subscript)
Character set 0x2A / 0xAA (row number 10, Hiragana)
This set contains Hiragana for writing the Japanese language. Apple also includes some bracketed list markers continuing from those in row 9.
Compare row 10 of KPS 9566, which uses the same layout. Compare and contrast row 4 of JIS X 0208, which also uses the same layout, but in a different row.
Character set 0x2B / 0xAB (row number 11, Katakana)
This set contains Katakana for writing the Japanese language. However, the Japanese long vowel mark, which is used in katakana text and included in row 1 of JIS X 0208, is not included. Apple also includes some bracketed list markers continuing from those in rows 9 and 10.
Compare row 11 of KPS 9566, which uses the same layout. Compare and contrast row 5 of JIS X 0208, which also uses the same layout, but in a different row.
Character set 0x2C / 0xAC (row number 12, Cyrillic)
This set contains the modern Russian alphabet, and is not necessarily sufficient to represent other forms of the Cyrillic script. Apple also includes some black boxed list markers.
Compare row 5 of KPS 9566 and row 7 of JIS X 0208, which use the same layout (but in a different row).
Extended character set 0x2D / 0xAD (row number 13, Apple additional punctuation)
Precomposed Hangul sets (rows number 16 through 40)
Code points for precomposed Hangul are included in a continuous sorted block between code points 16-01 and 40-94 inclusive. Not all possible syllable clusters are included in this range: the block holds 2350 of the 11172 modern syllables. Compare the different ordering and availability in KPS 9566.
Note that the initial+vowel+final syllables 뢨, 썅, 쏀, 쓩 and 쭁 are included, but their initial+vowel counterparts 뢔, 쌰, 쎼, 쓔 and 쬬 are not. This used to cause problems with text input, because input methods pass through an initial+vowel syllable before reaching an initial+vowel+final syllable (e.g. ㅎ → 하 → 한).
Those which are not listed here may be represented using eight-byte composition sequences. All other modern-jamo clusters are assigned codes elsewhere by UHC. All possible modern-jamo clusters are assigned codes by Johab.
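Because the block is contiguous and sorted, a syllable's position follows from its rank n among the included syllables: row 16 + n // 94, cell 1 + n % 94 (25 rows of 94 cells give the 2350 syllables). The Python sketch below takes the set of included syllables from the standard library's `euc_kr` codec and assumes the collation order equals Unicode code point order; both are assumptions of the sketch rather than statements of the standard, and the function names are hypothetical.

```python
from __future__ import annotations


def ks_x_1001_syllables() -> list[str]:
    """Syllables with a two-byte EUC-KR code, in code point order."""
    found = []
    for cp in range(0xAC00, 0xD7A4):
        ch = chr(cp)
        try:
            if len(ch.encode("euc_kr")) == 2:  # skip eight-byte composition results
                found.append(ch)
        except UnicodeEncodeError:
            pass  # syllable has no KS X 1001 code
    return found


KS_SYLLABLES = ks_x_1001_syllables()  # expected to hold 2350 entries


def syllable_row_cell(syllable: str) -> tuple[int, int]:
    """Row and cell of a precomposed syllable within rows 16 to 40."""
    rank = KS_SYLLABLES.index(syllable)  # raises ValueError if not in KS X 1001
    return 16 + rank // 94, 1 + rank % 94


print(syllable_row_cell("한"))
print("뢨" in KS_SYLLABLES, "뢔" in KS_SYLLABLES)
```

For 한 this gives row 39, cell 49, matching its position in the row 39 listing, and the membership check reflects the 뢨/뢔 asymmetry described above.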
* Row 16: 가 각 간 갇 갈 갉 갊 감 갑 값 갓 갔 강 갖 갗 같 갚 갛 개 객 갠 갤 갬 갭 갯 갰 갱 갸 갹 갼 걀 걋 걍 걔 걘 걜 거 걱 건 걷 걸 걺 검 겁 것 겄 겅 겆 겉 겊 겋 게 겐 겔 겜 겝 겟 겠 겡 겨 격 겪 견 겯 결 겸 겹 겻 겼 경 곁 계 곈 곌 곕 곗 고 곡 곤 곧 골 곪 곬 곯 곰 곱 곳 공 곶 과 곽 관 괄 괆
* Row 17: 괌 괍 괏 광 괘 괜 괠 괩 괬 괭 괴 괵 괸 괼 굄 굅 굇 굉 교 굔 굘 굡 굣 구 국 군 굳 굴 굵 굶 굻 굼 굽 굿 궁 궂 궈 궉 권 궐 궜 궝 궤 궷 귀 귁 귄 귈 귐 귑 귓 규 균 귤 그 극 근 귿 글 긁 금 급 긋 긍 긔 기 긱 긴 긷 길 긺 김 깁 깃 깅 깆 깊 까 깍 깎 깐 깔 깖 깜 깝 깟 깠 깡 깥 깨 깩 깬 깰 깸
* Row 18: 깹 깻 깼 깽 꺄 꺅 꺌 꺼 꺽 꺾 껀 껄 껌 껍 껏 껐 껑 께 껙 껜 껨 껫 껭 껴 껸 껼 꼇 꼈 꼍 꼐 꼬 꼭 꼰 꼲 꼴 꼼 꼽 꼿 꽁 꽂 꽃 꽈 꽉 꽐 꽜 꽝 꽤 꽥 꽹 꾀 꾄 꾈 꾐 꾑 꾕 꾜 꾸 꾹 꾼 꿀 꿇 꿈 꿉 꿋 꿍 꿎 꿔 꿜 꿨 꿩 꿰 꿱 꿴 꿸 뀀 뀁 뀄 뀌 뀐 뀔 뀜 뀝 뀨 끄 끅 끈 끊 끌 끎 끓 끔 끕 끗 끙
* Row 19: 끝 끼 끽 낀 낄 낌 낍 낏 낑 나 낙 낚 난 낟 날 낡 낢 남 납 낫 났 낭 낮 낯 낱 낳 내 낵 낸 낼 냄 냅 냇 냈 냉 냐 냑 냔 냘 냠 냥 너 넉 넋 넌 널 넒 넓 넘 넙 넛 넜 넝 넣 네 넥 넨 넬 넴 넵 넷 넸 넹 녀 녁 년 녈 념 녑 녔 녕 녘 녜 녠 노 녹 논 놀 놂 놈 놉 놋 농 높 놓 놔 놘 놜 놨 뇌 뇐 뇔 뇜 뇝
* Row 20: 뇟 뇨 뇩 뇬 뇰 뇹 뇻 뇽 누 눅 눈 눋 눌 눔 눕 눗 눙 눠 눴 눼 뉘 뉜 뉠 뉨 뉩 뉴 뉵 뉼 늄 늅 늉 느 늑 는 늘 늙 늚 늠 늡 늣 능 늦 늪 늬 늰 늴 니 닉 닌 닐 닒 님 닙 닛 닝 닢 다 닥 닦 단 닫 달 닭 닮 닯 닳 담 답 닷 닸 당 닺 닻 닿 대 댁 댄 댈 댐 댑 댓 댔 댕 댜 더 덕 덖 던 덛 덜 덞 덟 덤 덥
* Row 21: 덧 덩 덫 덮 데 덱 덴 델 뎀 뎁 뎃 뎄 뎅 뎌 뎐 뎔 뎠 뎡 뎨 뎬 도 독 돈 돋 돌 돎 돐 돔 돕 돗 동 돛 돝 돠 돤 돨 돼 됐 되 된 될 됨 됩 됫 됴 두 둑 둔 둘 둠 둡 둣 둥 둬 뒀 뒈 뒝 뒤 뒨 뒬 뒵 뒷 뒹 듀 듄 듈 듐 듕 드 득 든 듣 들 듦 듬 듭 듯 등 듸 디 딕 딘 딛 딜 딤 딥 딧 딨 딩 딪 따 딱 딴 딸
* Row 22: 땀 땁 땃 땄 땅 땋 때 땍 땐 땔 땜 땝 땟 땠 땡 떠 떡 떤 떨 떪 떫 떰 떱 떳 떴 떵 떻 떼 떽 뗀 뗄 뗌 뗍 뗏 뗐 뗑 뗘 뗬 또 똑 똔 똘 똥 똬 똴 뙈 뙤 뙨 뚜 뚝 뚠 뚤 뚫 뚬 뚱 뛔 뛰 뛴 뛸 뜀 뜁 뜅 뜨 뜩 뜬 뜯 뜰 뜸 뜹 뜻 띄 띈 띌 띔 띕 띠 띤 띨 띰 띱 띳 띵 라 락 란 랄 람 랍 랏 랐 랑 랒 랖 랗
* Row 23: 래 랙 랜 랠 램 랩 랫 랬 랭 랴 략 랸 럇 량 러 럭 런 럴 럼 럽 럿 렀 렁 렇 레 렉 렌 렐 렘 렙 렛 렝 려 력 련 렬 렴 렵 렷 렸 령 례 롄 롑 롓 로 록 론 롤 롬 롭 롯 롱 롸 롼 뢍 뢨 뢰 뢴 뢸 룀 룁 룃 룅 료 룐 룔 룝 룟 룡 루 룩 룬 룰 룸 룹 룻 룽 뤄 뤘 뤠 뤼 뤽 륀 륄 륌 륏 륑 류 륙 륜 률 륨 륩
* Row 24: 륫 륭 르 륵 른 를 름 릅 릇 릉 릊 릍 릎 리 릭 린 릴 림 립 릿 링 마 막 만 많 맏 말 맑 맒 맘 맙 맛 망 맞 맡 맣 매 맥 맨 맬 맴 맵 맷 맸 맹 맺 먀 먁 먈 먕 머 먹 먼 멀 멂 멈 멉 멋 멍 멎 멓 메 멕 멘 멜 멤 멥 멧 멨 멩 며 멱 면 멸 몃 몄 명 몇 몌 모 목 몫 몬 몰 몲 몸 몹 못 몽 뫄 뫈 뫘 뫙 뫼
* Row 25: 묀 묄 묍 묏 묑 묘 묜 묠 묩 묫 무 묵 묶 문 묻 물 묽 묾 뭄 뭅 뭇 뭉 뭍 뭏 뭐 뭔 뭘 뭡 뭣 뭬 뮈 뮌 뮐 뮤 뮨 뮬 뮴 뮷 므 믄 믈 믐 믓 미 믹 민 믿 밀 밂 밈 밉 밋 밌 밍 및 밑 바 박 밖 밗 반 받 발 밝 밞 밟 밤 밥 밧 방 밭 배 백 밴 밸 뱀 뱁 뱃 뱄 뱅 뱉 뱌 뱍 뱐 뱝 버 벅 번 벋 벌 벎 범 법 벗
* Row 26: 벙 벚 베 벡 벤 벧 벨 벰 벱 벳 벴 벵 벼 벽 변 별 볍 볏 볐 병 볕 볘 볜 보 복 볶 본 볼 봄 봅 봇 봉 봐 봔 봤 봬 뵀 뵈 뵉 뵌 뵐 뵘 뵙 뵤 뵨 부 북 분 붇 불 붉 붊 붐 붑 붓 붕 붙 붚 붜 붤 붰 붸 뷔 뷕 뷘 뷜 뷩 뷰 뷴 뷸 븀 븃 븅 브 븍 븐 블 븜 븝 븟 비 빅 빈 빌 빎 빔 빕 빗 빙 빚 빛 빠 빡 빤
* Row 27: 빨 빪 빰 빱 빳 빴 빵 빻 빼 빽 뺀 뺄 뺌 뺍 뺏 뺐 뺑 뺘 뺙 뺨 뻐 뻑 뻔 뻗 뻘 뻠 뻣 뻤 뻥 뻬 뼁 뼈 뼉 뼘 뼙 뼛 뼜 뼝 뽀 뽁 뽄 뽈 뽐 뽑 뽕 뾔 뾰 뿅 뿌 뿍 뿐 뿔 뿜 뿟 뿡 쀼 쁑 쁘 쁜 쁠 쁨 쁩 삐 삑 삔 삘 삠 삡 삣 삥 사 삭 삯 산 삳 살 삵 삶 삼 삽 삿 샀 상 샅 새 색 샌 샐 샘 샙 샛 샜 생 샤
* Row 28: 샥 샨 샬 샴 샵 샷 샹 섀 섄 섈 섐 섕 서 석 섞 섟 선 섣 설 섦 섧 섬 섭 섯 섰 성 섶 세 섹 센 셀 셈 셉 셋 셌 셍 셔 셕 션 셜 셤 셥 셧 셨 셩 셰 셴 셸 솅 소 속 솎 손 솔 솖 솜 솝 솟 송 솥 솨 솩 솬 솰 솽 쇄 쇈 쇌 쇔 쇗 쇘 쇠 쇤 쇨 쇰 쇱 쇳 쇼 쇽 숀 숄 숌 숍 숏 숑 수 숙 순 숟 술 숨 숩 숫 숭
* Row 29: 숯 숱 숲 숴 쉈 쉐 쉑 쉔 쉘 쉠 쉥 쉬 쉭 쉰 쉴 쉼 쉽 쉿 슁 슈 슉 슐 슘 슛 슝 스 슥 슨 슬 슭 슴 습 슷 승 시 식 신 싣 실 싫 심 십 싯 싱 싶 싸 싹 싻 싼 쌀 쌈 쌉 쌌 쌍 쌓 쌔 쌕 쌘 쌜 쌤 쌥 쌨 쌩 썅 써 썩 썬 썰 썲 썸 썹 썼 썽 쎄 쎈 쎌 쏀 쏘 쏙 쏜 쏟 쏠 쏢 쏨 쏩 쏭 쏴 쏵 쏸 쐈 쐐 쐤 쐬 쐰
* Row 30: 쐴 쐼 쐽 쑈 쑤 쑥 쑨 쑬 쑴 쑵 쑹 쒀 쒔 쒜 쒸 쒼 쓩 쓰 쓱 쓴 쓸 쓺 쓿 씀 씁 씌 씐 씔 씜 씨 씩 씬 씰 씸 씹 씻 씽 아 악 안 앉 않 알 앍 앎 앓 암 압 앗 았 앙 앝 앞 애 액 앤 앨 앰 앱 앳 앴 앵 야 약 얀 얄 얇 얌 얍 얏 양 얕 얗 얘 얜 얠 얩 어 억 언 얹 얻 얼 얽 얾 엄 업 없 엇 었 엉 엊 엌 엎
* Row 31: 에 엑 엔 엘 엠 엡 엣 엥 여 역 엮 연 열 엶 엷 염 엽 엾 엿 였 영 옅 옆 옇 예 옌 옐 옘 옙 옛 옜 오 옥 온 올 옭 옮 옰 옳 옴 옵 옷 옹 옻 와 왁 완 왈 왐 왑 왓 왔 왕 왜 왝 왠 왬 왯 왱 외 왹 왼 욀 욈 욉 욋 욍 요 욕 욘 욜 욤 욥 욧 용 우 욱 운 울 욹 욺 움 웁 웃 웅 워 웍 원 월 웜 웝 웠 웡 웨
* Row 32: 웩 웬 웰 웸 웹 웽 위 윅 윈 윌 윔 윕 윗 윙 유 육 윤 율 윰 윱 윳 융 윷 으 윽 은 을 읊 음 읍 읏 응 읒 읓 읔 읕 읖 읗 의 읜 읠 읨 읫 이 익 인 일 읽 읾 잃 임 입 잇 있 잉 잊 잎 자 작 잔 잖 잗 잘 잚 잠 잡 잣 잤 장 잦 재 잭 잰 잴 잼 잽 잿 쟀 쟁 쟈 쟉 쟌 쟎 쟐 쟘 쟝 쟤 쟨 쟬 저 적 전 절 젊
* Row 33: 점 접 젓 정 젖 제 젝 젠 젤 젬 젭 젯 젱 져 젼 졀 졈 졉 졌 졍 졔 조 족 존 졸 졺 좀 좁 좃 종 좆 좇 좋 좌 좍 좔 좝 좟 좡 좨 좼 좽 죄 죈 죌 죔 죕 죗 죙 죠 죡 죤 죵 주 죽 준 줄 줅 줆 줌 줍 줏 중 줘 줬 줴 쥐 쥑 쥔 쥘 쥠 쥡 쥣 쥬 쥰 쥴 쥼 즈 즉 즌 즐 즘 즙 즛 증 지 직 진 짇 질 짊 짐 집 짓
* Row 34: 징 짖 짙 짚 짜 짝 짠 짢 짤 짧 짬 짭 짯 짰 짱 째 짹 짼 쨀 쨈 쨉 쨋 쨌 쨍 쨔 쨘 쨩 쩌 쩍 쩐 쩔 쩜 쩝 쩟 쩠 쩡 쩨 쩽 쪄 쪘 쪼 쪽 쫀 쫄 쫌 쫍 쫏 쫑 쫓 쫘 쫙 쫠 쫬 쫴 쬈 쬐 쬔 쬘 쬠 쬡 쭁 쭈 쭉 쭌 쭐 쭘 쭙 쭝 쭤 쭸 쭹 쮜 쮸 쯔 쯤 쯧 쯩 찌 찍 찐 찔 찜 찝 찡 찢 찧 차 착 찬 찮 찰 참 찹 찻
* Row 35: 찼 창 찾 채 책 챈 챌 챔 챕 챗 챘 챙 챠 챤 챦 챨 챰 챵 처 척 천 철 첨 첩 첫 첬 청 체 첵 첸 첼 쳄 쳅 쳇 쳉 쳐 쳔 쳤 쳬 쳰 촁 초 촉 촌 촐 촘 촙 촛 총 촤 촨 촬 촹 최 쵠 쵤 쵬 쵭 쵯 쵱 쵸 춈 추 축 춘 출 춤 춥 춧 충 춰 췄 췌 췐 취 췬 췰 췸 췹 췻 췽 츄 츈 츌 츔 츙 츠 측 츤 츨 츰 츱 츳 층
* Row 36: 치 칙 친 칟 칠 칡 침 칩 칫 칭 카 칵 칸 칼 캄 캅 캇 캉 캐 캑 캔 캘 캠 캡 캣 캤 캥 캬 캭 컁 커 컥 컨 컫 컬 컴 컵 컷 컸 컹 케 켁 켄 켈 켐 켑 켓 켕 켜 켠 켤 켬 켭 켯 켰 켱 켸 코 콕 콘 콜 콤 콥 콧 콩 콰 콱 콴 콸 쾀 쾅 쾌 쾡 쾨 쾰 쿄 쿠 쿡 쿤 쿨 쿰 쿱 쿳 쿵 쿼 퀀 퀄 퀑 퀘 퀭 퀴 퀵 퀸 퀼
* Row 37: 큄 큅 큇 큉 큐 큔 큘 큠 크 큭 큰 클 큼 큽 킁 키 킥 킨 킬 킴 킵 킷 킹 타 탁 탄 탈 탉 탐 탑 탓 탔 탕 태 택 탠 탤 탬 탭 탯 탰 탱 탸 턍 터 턱 턴 털 턺 텀 텁 텃 텄 텅 테 텍 텐 텔 템 텝 텟 텡 텨 텬 텼 톄 톈 토 톡 톤 톨 톰 톱 톳 통 톺 톼 퇀 퇘 퇴 퇸 툇 툉 툐 투 툭 툰 툴 툼 툽 툿 퉁 퉈 퉜
* Row 38: 퉤 튀 튁 튄 튈 튐 튑 튕 튜 튠 튤 튬 튱 트 특 튼 튿 틀 틂 틈 틉 틋 틔 틘 틜 틤 틥 티 틱 틴 틸 팀 팁 팃 팅 파 팍 팎 판 팔 팖 팜 팝 팟 팠 팡 팥 패 팩 팬 팰 팸 팹 팻 팼 팽 퍄 퍅 퍼 퍽 펀 펄 펌 펍 펏 펐 펑 페 펙 펜 펠 펨 펩 펫 펭 펴 편 펼 폄 폅 폈 평 폐 폘 폡 폣 포 폭 폰 폴 폼 폽 폿 퐁
* Row 39: 퐈 퐝 푀 푄 표 푠 푤 푭 푯 푸 푹 푼 푿 풀 풂 품 풉 풋 풍 풔 풩 퓌 퓐 퓔 퓜 퓟 퓨 퓬 퓰 퓸 퓻 퓽 프 픈 플 픔 픕 픗 피 픽 핀 필 핌 핍 핏 핑 하 학 한 할 핥 함 합 핫 항 해 핵 핸 핼 햄 햅 햇 했 행 햐 향 허 헉 헌 헐 헒 험 헙 헛 헝 헤 헥 헨 헬 헴 헵 헷 헹 혀 혁 현 혈 혐 협 혓 혔 형 혜 혠
* Row 40: 혤 혭 호 혹 혼 홀 홅 홈 홉 홋 홍 홑 화 확 환 활 홧 황 홰 홱 홴 횃 횅 회 획 횐 횔 횝 횟 횡 효 횬 횰 횹 횻 후 훅 훈 훌 훑 훔 훗 훙 훠 훤 훨 훰 훵 훼 훽 휀 휄 휑 휘 휙 휜 휠 휨 휩 휫 휭 휴 휵 휸 휼 흄 흇 흉 흐 흑 흔 흖 흗 흘 흙 흠 흡 흣 흥 흩 희 흰 흴 흼 흽 힁 히 힉 힌 힐 힘 힙 힛 힝
Hanja sets
Johab encoding
Since 1992, KS X 1001 has also defined an alternative encoding known as Johab. Johab represents a hangul syllable as a sequence of three five-bit values (for the initial consonant, the medial vowel and the final consonant), split across two 8-bit bytes, most significant bit first. The most significant bit of the lead byte is always set, which allows combination with single-byte ASCII or KS X 1003. This encoding is also used for the modern jamo from row 4 of KS X 1001, by using the filler values for the other components. The Johab encoding for hangul is shown in the table below.
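The bit layout described above can be illustrated with a short Python sketch that converts between precomposed Unicode hangul syllables and Johab. This article does not list the five-bit component values, so the values below are assumptions about the 1992 standard layout:

* initial consonants are 2–20;
* medial vowels are 3–7, 10–15, 18–23 and 26–29;
* final consonants are 2–17 and 19–29;
* the fill values are 1, 2 and 1 respectively.

The function names `syllable_to_johab` and `johab_to_syllable` are hypothetical.

```python
# Assumed Johab five-bit component codes, listed in Unicode jamo order.
# Initial consonants ㄱ..ㅎ use 2..20 (1 is the initial fill value).
INITIAL_CODES = list(range(2, 21))
# Medial vowels ㅏ..ㅣ skip 8-9, 16-17 and 24-25 (2 is the medial fill value).
MEDIAL_CODES = [3, 4, 5, 6, 7, 10, 11, 12, 13, 14, 15,
                18, 19, 20, 21, 22, 23, 26, 27, 28, 29]
# Index 0 means "no final consonant" (fill value 1); code 18 is unused.
FINAL_CODES = [1] + list(range(2, 18)) + list(range(19, 30))

SBASE, VCOUNT, TCOUNT = 0xAC00, 21, 28   # Unicode hangul syllable arithmetic
NSYLLABLES = 19 * VCOUNT * TCOUNT        # 11,172 modern syllables


def syllable_to_johab(ch: str) -> bytes:
    """Encode one precomposed modern hangul syllable as two Johab bytes."""
    s = ord(ch) - SBASE
    if not 0 <= s < NSYLLABLES:
        raise ValueError(f"{ch!r} is not a precomposed modern hangul syllable")
    l, rem = divmod(s, VCOUNT * TCOUNT)  # initial index 0..18
    v, t = divmod(rem, TCOUNT)           # medial 0..20, final 0..27
    code = ((1 << 15)                    # lead-byte high bit is always set
            | (INITIAL_CODES[l] << 10)
            | (MEDIAL_CODES[v] << 5)
            | FINAL_CODES[t])
    return code.to_bytes(2, "big")


def johab_to_syllable(b: bytes) -> str:
    """Decode two Johab bytes back to a precomposed hangul syllable."""
    code = int.from_bytes(b, "big")
    if len(b) != 2 or not code & 0x8000:
        raise ValueError("not a Johab double-byte hangul code")
    i, m, f = (code >> 10) & 0x1F, (code >> 5) & 0x1F, code & 0x1F
    try:
        l = INITIAL_CODES.index(i)
        v = MEDIAL_CODES.index(m)
        t = FINAL_CODES.index(f)
    except ValueError:
        # Filler or unused component: e.g. a lone jamo, not a full syllable.
        raise ValueError("not a complete hangul syllable") from None
    return chr(SBASE + (l * VCOUNT + v) * TCOUNT + t)
```

For example, 가 (U+AC00) comes out as the bytes 0x88 0x61. Python's standard library also ships a `johab` codec (aliases `cp1361` and `ms1361`), which can serve as an independent cross-check.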
Johab encodes the remainder of KS X 1001 using lead bytes that do not correspond to an initial jamo (0xE0–0xF9 for hanja and 0xD9–0xDE for non-hanja, excluding hangul syllables and modern jamo), with trail bytes in the ranges 0x31–0x7E and 0x91–0xFE. These codes are algorithmically mapped from the characters' KS X 1001 code points, with two KS X 1001 rows per lead byte (compare and contrast Shift JIS).
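The row-pairing rule can be sketched as arithmetic on KS X 1001 row and cell numbers, both in the range 1–94. Two rows per lead byte means the six non-hanja lead bytes cover rows 1–12, and the 26 hanja lead bytes cover rows 42–93. The exact split within each pair is an assumption consistent with the trail-byte ranges above:

* the first row's cells 1–78 use trail bytes 0x31–0x7E;
* its cells 79–94 use 0x91–0xA0;
* the second row uses 0xA1–0xFE.

The function names are hypothetical, and the user-defined area is not handled.

```python
def ksx1001_to_johab(row: int, cell: int) -> bytes:
    """Map a non-hangul KS X 1001 code point (row, cell) to Johab bytes."""
    if not 1 <= cell <= 94:
        raise ValueError("cell must be in 1..94")
    if row == 4 and cell <= 51:
        # Modern jamo use the five-bit hangul scheme instead.
        raise ValueError("modern jamo are not encoded through the row mapping")
    if 1 <= row <= 12:            # non-hanja rows pair onto 0xD9-0xDE
        pair, second = divmod(row - 1, 2)
        lead = 0xD9 + pair
    elif 42 <= row <= 93:         # hanja rows pair onto 0xE0-0xF9
        pair, second = divmod(row - 42, 2)
        lead = 0xE0 + pair
    else:
        raise ValueError("row is hangul, unassigned, or out of range")
    if second:                    # second row of the pair -> 0xA1..0xFE
        trail = 0xA0 + cell
    elif cell <= 78:              # first row, cells 1-78 -> 0x31..0x7E
        trail = 0x30 + cell
    else:                         # first row, cells 79-94 -> 0x91..0xA0
        trail = 0x42 + cell
    return bytes([lead, trail])


def euc_kr_to_johab(pair: bytes) -> bytes:
    """Convert a two-byte EUC-KR code (row and cell offset by 0xA0)."""
    return ksx1001_to_johab(pair[0] - 0xA0, pair[1] - 0xA0)
```

For example, the ideographic space at EUC-KR 0xA1A1 (row 1, cell 1) maps to 0xD9 0x31. In both this scheme and Shift JIS, one lead byte carries two rows. The trail-byte ranges, however, differ.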
The ASCII-based Johab encoding is numbered Code page 1361 by Microsoft.
Other vendor-defined Johab variants also exist. For example, IBM defines one for use as a shift-out set with EBCDIC. That variant uses shift in and shift out to switch between a single-byte EBCDIC page and Johab. It uses a different encoding for the non-hangul characters (lead bytes 0x40–0x6C, with a different layout), and it uses lead bytes 0xD4–0xDD as a user-defined region. When in shift-out state, however, it uses the same Johab layout as the 1992 standard for the hangul characters.
IBM number the EBCDIC-based, stateful Johab encoding Code page 1364,
and also define a subset of that encoding, including fewer hangul characters but in the same layout, as Code page 933.
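The stateful switching in IBM's EBCDIC-based variant can be illustrated by a helper that only splits a byte stream into single-byte and double-byte runs. It does not decode either page. The sketch makes three assumptions:

* shift out (SO) and shift in (SI) are the bytes 0x0E and 0x0F, as in IBM's EBCDIC code pages;
* the stream starts in single-byte state;
* every shift-out run must hold whole two-byte codes.

The function name `split_stateful` is hypothetical.

```python
SO, SI = 0x0E, 0x0F  # assumed shift-out / shift-in control bytes


def split_stateful(data: bytes) -> list[tuple[str, bytes]]:
    """Split an SO/SI stream into ("sbcs", ...) and ("dbcs", ...) runs.

    The shift bytes themselves are dropped; decoding the payloads
    (single-byte EBCDIC page, Johab-layout double bytes) is out of scope.
    """
    runs: list[tuple[str, bytes]] = []
    mode, buf = "sbcs", bytearray()

    def flush() -> None:
        # A double-byte run must contain whole two-byte codes.
        if mode == "dbcs" and len(buf) % 2:
            raise ValueError("double-byte run has an odd number of bytes")
        if buf:
            runs.append((mode, bytes(buf)))

    for byte in data:
        if byte == SO or byte == SI:
            new_mode = "dbcs" if byte == SO else "sbcs"
            if new_mode != mode:  # a redundant SO/SI does not end a run
                flush()
                mode, buf = new_mode, bytearray()
        else:
            buf.append(byte)
    flush()
    return runs
```

In shift-out state, the double-byte runs would then be read as Johab-layout codes for hangul. For example, 0x88 0x61 is the 1992-standard Johab code for 가. Leaving decoding out keeps the sketch independent of any particular EBCDIC single-byte page.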
Some other vendors, such as Samsung or GoldStar (now LG), used other "Johab" encodings. In these, the mappings of five-bit codes to jamo differ from those in the table below, so they are not compatible with the 1992 standard Johab. The table below corresponds to the 1992 standard and also to IBM usage.
External links
* What are KS X 1001 (KS C 5601) and other Hangul codes?
* Implementing Cross-Locale CJKV Code Conversion, by Ken Lunde
* Unicode mapping tables for Wansung and Johab encodings:
** IBM code page 970 (Wansung, EUC-KR format)
** Windows code page 949 (Unified Hangul Code / Extended Wansung)
** Windows code page 1361 (Johab, ASCII-based version)
** IBM code page 1364 (Johab, EBCDIC-based version)