
JIS X 0208 is a 2-byte character set specified as a Japanese Industrial Standard, containing 6879 graphic characters suitable for writing text, place names, personal names, and so forth in the Japanese language. The official title of the current standard is "7-bit and 8-bit double byte coded KANJI sets for information interchange". It was originally established as JIS C 6226 in 1978 and was revised in 1983, 1990, and 1997. IBM calls it Code page 952, and calls the 1978 version Code page 955.


Scope of use and compatibility

The character set that JIS X 0208 establishes is intended primarily for information interchange between data processing systems and the devices connected to them, or mutually between data communication systems. This character set can be used for data processing and text processing. Partial implementations of the character set are not considered compatible. The drafting committee of the first standard took care to separate characters between level 1 and level 2, and the second standard then shuffled some variant characters (異体字, ''itaiji'') between the levels; from this it is conjectured that, at least at the time of the first and second standards, Japanese computer systems implementing only the non-kanji and level 1 characters were once considered for development. However, such implementations have never been specified as compatible, though examples such as the early NEC PC-9801 did exist. Although the JIS X 0208:1997 standard contains provisions concerning compatibility, it is currently generally considered that the standard neither certifies compatibility nor serves as an official manufacturing standard amounting to a declaration of self-compatibility. Consequently, ''de facto'', JIS X 0208-"compatible" products are not considered to exist. Terminology such as and is included in JIS X 0208, but the semantics of these terms vary from person to person.


Code charts


Lead byte

The first encoding byte (the lead byte) corresponds to the row number plus 0x20, or 32 in decimal; the continuation byte corresponds to the cell number plus 0x20 in the same way (see below). Hence, the code set starting with 0x21 has a row number of 1, and its cell 1 has a continuation byte of 0x21 (or 33), and so forth. For lead bytes used for characters other than kanji, links are provided to charts on this page listing the characters encoded under that lead byte. For lead bytes used for kanji, links are provided to the appropriate section of Wiktionary's kanji index.
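
The relationship between lead bytes and row numbers can be sketched in a few lines of Python. This is only an illustration of the "plus 0x20" rule stated above for the 7-bit form; the function names `row_from_lead_byte` and `lead_byte_from_row` are made up for this example and do not come from the standard.

```python
OFFSET = 0x20  # value added to a row (or cell) number to obtain a 7-bit byte


def row_from_lead_byte(lead: int) -> int:
    """Return the row number (1-94) for a 7-bit lead byte (0x21-0x7E)."""
    if not 0x21 <= lead <= 0x7E:
        raise ValueError(f"not a 7-bit JIS X 0208 lead byte: {lead:#04x}")
    return lead - OFFSET


def lead_byte_from_row(row: int) -> int:
    """Return the 7-bit lead byte for a row number (1-94)."""
    if not 1 <= row <= 94:
        raise ValueError(f"row out of range: {row}")
    return row + OFFSET


# "Character set 0x24" is therefore row 4 (hiragana), and row 13 has lead byte 0x2D.
print(row_from_lead_byte(0x24), hex(lead_byte_from_row(13)))
```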


Non-Kanji rows


Character set 0x21 (row number 1, special characters)

Some vendors use slightly different Unicode mappings for this set than the one below. For example, Microsoft maps kuten 1-29 (JIS 0x213D) to U+2015 (Horizontal Bar), whereas Apple maps it to U+2014 (Em Dash). Similarly, Microsoft maps kuten 1-61 (JIS 0x215D) to U+FF0D (the fullwidth form of U+002D Hyphen-Minus), and Apple maps it to U+2212 (Minus Sign). (In Shift_JIS format, kuten 1-29 = JIS 0x213D is SJIS 0x815C, and kuten 1-61 = JIS 0x215D is SJIS 0x817C.) Unicode mapping of the wave dash also differs between vendors. See the cells with footnotes below. ASCII and JISCII punctuation (shown here with a yellow background) may use alternative mappings to the Halfwidth and Fullwidth Forms block if used in an encoding which combines JIS X 0208 with ASCII or with JIS X 0201, such as Shift JIS, EUC-JP or ISO 2022-JP.


Character set 0x22 (row number 2, special characters)

Most of the characters in this set were added in 1983, except for characters 0x2221–0x222E (kuten 2-1 through 2-14, or the first line of the chart below), which were included in the original 1978 version of the standard.


Character set 0x23 (row number 3, digits and Roman)

This set includes a subset of the ISO 646 invariant set (and therefore also a subset of both ASCII and the JIS X 0201 Roman set), minus punctuation and symbols, comprising western Arabic numerals and both cases of the Basic Latin alphabet. Characters in this set may use alternative Unicode mappings to the Halfwidth and Fullwidth Forms block if used in an encoding which combines JIS X 0208 with ASCII or with JIS X 0201, such as EUC-JP, Shift JIS or ISO 2022-JP. Compare row 3 of KPS 9566, which this row exactly matches. Compare and contrast row 3 of KS X 1001 and of GB 2312, which include their entire national variants of ISO 646 in this row, rather than only the alphanumeric subset.


Character set 0x24 (row number 4, Hiragana)

This row contains Japanese hiragana. Compare row 4 of GB 2312, which matches this row. Compare and contrast row 10 of KPS 9566 and of KS X 1001, which use the same layout, but in a different row.


Character set 0x25 (row number 5, Katakana)

This row contains Japanese katakana. Compare row 5 of GB 2312, which matches this row. Compare and contrast row 11 of KPS 9566 and of KS X 1001, which use the same layout, but in a different row. Contrast the considerably different katakana layout used by JIS X 0201.


Character set 0x26 (row number 6, Greek)

This row contains basic support for the modern Greek alphabet, without diacritics or the final sigma. Compare row 6 of GB 2312 and GB 12345 and row 6 of KPS 9566, which include the same Greek letters in the same layout, although GB 12345 adds vertical presentation forms and KPS 9566 adds Roman numerals. Compare and contrast row 5 of KS X 1001, which offsets the Greek letters to include the Roman numerals first.


Character set 0x27 (row number 7, Cyrillic)

This row contains the modern Russian alphabet and is not necessarily sufficient for representing other forms of the Cyrillic script. Compare row 7 of GB 2312, which matches this row. Compare and contrast row 12 of KS X 1001 and row 5 of KPS 9566, which use the same layout (but in a different row).


Character set 0x28 (row number 8, box drawing)

All characters in this set were added in 1983, and were not present in the original 1978 revision of the standard.


Extension character set 0x2D (row number 13, NEC special characters)

Rows 9 through 15 of the JIS X 0208 standard are left empty. However, the following layout for row 13, first introduced by NEC, is a common extension. It is used (with minor variations, noted in footnotes) by Windows-932 (which is matched by the WHATWG Encoding Standard used by HTML5), by the PostScript variant of MacJapanese (but, since KanjiTalk version 7, not by the regular variant), and by JIS X 0213 (the successor to JIS X 0208). Unlike the other extensions made by Windows-932/WHATWG and JIS X 0213, the two match rather than colliding, so decoding of most of this row is better supported than the other extensions made by JIS X 0213.


Kanji rows


Code structure

To represent code points, column/line numbers are used for one-byte codes and ''kuten'' numbers are used for two-byte codes. Character names provide a way to identify a character without depending on a code.


Single byte codes

Almost all JIS X 0208 graphic character codes are represented with two bytes of at least seven bits each. However, every control character, as well as the plain space (although not the ideographic space), is represented with a one-byte code. To represent the bit combination of a one-byte code, two decimal numbers are used: a column number and a line number. The three high-order bits out of seven, or the four high-order bits out of eight, form the column number, which counts from zero to seven or from zero to fifteen respectively. The four low-order bits form the line number, which counts from zero to fifteen. Each decimal number corresponds to one hexadecimal digit. For example, the bit combination corresponding to the graphic character "space" is 010 0000 as a 7-bit number, and 0010 0000 as an 8-bit number. In column/line notation, this is represented as 2/0. Other representations of the same single-byte code include 0x20 as hexadecimal, or 32 as a single decimal number.
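
The column/line notation is simply the two hexadecimal digits of the byte written in decimal. A minimal sketch, assuming hypothetical helper names `column_line` and `from_column_line`:

```python
def column_line(byte: int) -> str:
    """Format a one-byte code in column/line notation, e.g. 0x20 -> '2/0'."""
    if not 0 <= byte <= 0xFF:
        raise ValueError(f"not a single byte: {byte}")
    column = byte >> 4   # high-order bits
    line = byte & 0x0F   # four low-order bits
    return f"{column}/{line}"


def from_column_line(text: str) -> int:
    """Parse column/line notation back into the byte value."""
    column, line = (int(part) for part in text.split("/"))
    if not (0 <= column <= 15 and 0 <= line <= 15):
        raise ValueError(f"column or line out of range: {text!r}")
    return (column << 4) | line


print(column_line(0x20))               # 2/0
print(hex(from_column_line("7/14")))   # 0x7e
```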


Code points and code numbers

The double-byte codes are laid out in 94 numbered groups, each called a row (''ku''). Every row contains 94 numbered codes, each called a cell (''ten''). This makes a total of 8836 (94 × 94) possible code points (although not all are assigned, see below); these are laid out in the standard in a 94-line, 94-column code table. A row number and a cell number (each numbered from 1 to 94, for a standard JIS X 0208 code) form a ''kuten'' point, which is used to represent double-byte code points. A code number is expressed in the form "row-cell", the row and cell numbers being separated by a hyphen. For example, the character "亜" has a code point at row 16, cell 1, so its code number is represented as "16-01".

In 7-bit JIS X 0208 (as might be switched to in JIS X 0202 / ISO-2022-JP), both bytes must be from the 94-byte range of 0x21 (used for row or cell number 1) through 0x7E (used for row or cell number 94), which corresponds exactly to the range used for 7-bit ASCII printing characters, not counting the space. Accordingly, the encoded bytes are obtained by adding 0x20 (32) to each number. For instance, the above example of 16-01 ("亜") would be represented by the bytes 0x30 0x21. The 8-bit EUC-JP instead uses the range 0xA1 through 0xFE (setting the high bit to 1), whereas other encodings such as Shift JIS use more complicated transforms. Shift JIS includes more encoding space than is needed for JIS X 0208 itself; some Shift JIS specific extensions to JIS X 0208 make use of row numbers above 94.

This structure is also used in the Mainland Chinese GB 2312, where it is natively known as 区位 (''qūwèi''), and the South Korean KS C 5601 (currently KS X 1001), where the ''ku'' and ''ten'' are respectively known as ''hang'' and ''yol''. The later JIS X 0213 extends this structure by having more than one plane of rows, which is also the structure used by CNS 11643, and related to the structure used by CCCII.
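
The transforms described above can be sketched as follows. The 7-bit and EUC-JP cases follow directly from the text; the Shift JIS arithmetic is not given in this article and is included here as the commonly used packing of two rows per lead byte, so treat it as an assumption of this example. All function names are hypothetical.

```python
def _check_kuten(row: int, cell: int) -> None:
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError(f"kuten out of range: {row}-{cell}")


def kuten_to_jis(row: int, cell: int) -> bytes:
    """7-bit form (as used in ISO-2022-JP): add 0x20 to each number."""
    _check_kuten(row, cell)
    return bytes([row + 0x20, cell + 0x20])


def kuten_to_euc_jp(row: int, cell: int) -> bytes:
    """EUC-JP form: the 7-bit bytes with the high bit set (0xA1-0xFE)."""
    return bytes(b | 0x80 for b in kuten_to_jis(row, cell))


def kuten_to_shift_jis(row: int, cell: int) -> bytes:
    """Shift JIS form: two rows share one lead byte (assumed transform)."""
    _check_kuten(row, cell)
    # Rows 1-62 use lead bytes 0x81-0x9F; rows 63-94 use 0xE0-0xEF.
    lead = (row + 1) // 2 + (0x80 if row <= 62 else 0xC0)
    if row % 2:
        # Odd row: trail bytes 0x40-0x9E, skipping 0x7F.
        trail = cell + 0x3F + (1 if cell >= 64 else 0)
    else:
        # Even row: trail bytes 0x9F-0xFC.
        trail = cell + 0x9E
    return bytes([lead, trail])


# Row 16, cell 1 is the character U+4E9C used as the example in the text.
print(kuten_to_jis(16, 1).hex(), kuten_to_euc_jp(16, 1).hex(),
      kuten_to_shift_jis(16, 1).hex())
```

For 16-01 the three results are 0x3021, 0xB0A1 and 0x889F; the last two agree with what Python's built-in `euc_jp` and `shift_jis` codecs produce for the same character.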


Unassigned code points

Among the 2-byte codes, rows 9 to 15 and 85 to 94 are unassigned; that is, they are code points with no characters assigned to them. Some cells in other rows are also essentially unassigned code points. These empty areas contain code points that should basically not be used. Except when there is prior agreement among the relevant parties, characters (''gaiji'') for information interchange should not be assigned to the unassigned code points. Even when assigning characters to unassigned code points, graphic characters defined in the standard should not be assigned to them, and the same character should not be assigned to multiple unassigned code points; characters should not be duplicated in the set.

Furthermore, when assigning characters to unassigned code points, it is necessary to be cautious of unification in regards to kanji glyphs. For example, row 25 cell 66 corresponds to the kanji meaning "high" or "expensive"; both the form with a component resembling the "mouth" character (口) in the middle (高) and the less common form with a ladder-like construction in the same location (髙) are subsumed into the same code point. Consequently, limiting point 25-66 to the "mouth" form and assigning the latter "ladder" form to an unassigned code point would technically be in violation of the standard.

In practice, however, several vendor-specific Shift JIS variants, including Windows-932 and MacJapanese, encode vendor extensions in unallocated rows of the encoding space for JIS X 0208. Also, most of the codes unassigned in JIS X 0208 are assigned by the newer JIS X 0213 standard.
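
A decoder that wants to flag code points from the wholly unassigned rows could use a check like the one below. This sketch covers only the row ranges named above (9 to 15 and 85 to 94); the individual unassigned cells in other rows are not listed in this section, so they are deliberately left out. The names are illustrative.

```python
# Rows with no characters assigned in JIS X 0208, per the text above.
UNASSIGNED_ROWS = frozenset(range(9, 16)) | frozenset(range(85, 95))


def row_is_unassigned(row: int) -> bool:
    """Return True if the whole row is unassigned in plain JIS X 0208."""
    if not 1 <= row <= 94:
        raise ValueError(f"row out of range: {row}")
    return row in UNASSIGNED_ROWS


# Row 13 is empty in the standard itself, even though the NEC extension uses it.
print(row_is_unassigned(13), row_is_unassigned(16))
```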


Character names

Each JIS X 0208 character is given a name. By using a character's name, it is possible to identify characters without relying on their codes. The names of characters are coordinated with other character set standards, notably the Universal Coded Character Set (UCS/Unicode), so this is one possible source of character mappings to character sets such as Unicode. For example, both the character at ISO/IEC 646 International Reference Version (US-ASCII) column 4 line 1 and the one at JIS X 0208 row 3 cell 33 have the name "LATIN CAPITAL LETTER A". Therefore, the character at 4/1 in ASCII and the character at 3-33 in JIS X 0208 can be regarded as the same character (although, in practice, alternative mapping is used for the JIS X 0208 character due to encodings providing ASCII separately). Conversely, ASCII characters 2/2 (quotation mark), 2/7 (apostrophe), 2/13 (hyphen-minus), and 7/14 (tilde) can be determined to be characters that do not exist in this standard.

Character names of non-kanji characters use uppercase Roman letters, spaces, and hyphens. Non-kanji characters are given a Japanese common name, but some provisions for these names do not exist. The names of kanji, on the other hand, are mechanically set according to the corresponding hexadecimal representation of their code in UCS/Unicode. The name of a kanji can be arrived at by prepending "CJK UNIFIED IDEOGRAPH-" to the Unicode code point. For example, row 16 cell 1 (亜) corresponds to U+4E9C in UCS, so its name is "CJK UNIFIED IDEOGRAPH-4E9C". Kanji are not given Japanese common names.
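
The mechanical naming rule for kanji is easy to reproduce. The sketch below builds the name from a code point and, as a cross-check, compares it with the name reported by Python's `unicodedata` module; the helper name `kanji_name` is an assumption of this example.

```python
import unicodedata


def kanji_name(code_point: int) -> str:
    """Build a kanji's character name from its UCS/Unicode code point."""
    return f"CJK UNIFIED IDEOGRAPH-{code_point:04X}"


# Row 16 cell 1 corresponds to U+4E9C.
name = kanji_name(0x4E9C)
print(name)
print(name == unicodedata.name(chr(0x4E9C)))

# ASCII 4/1 carries the name shared with JIS X 0208 row 3 cell 33.
print(unicodedata.name("A"))
```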


Kanji set


Overview

JIS X 0208 prescribes a set of 6879 graphic characters that correspond to two-byte codes with either seven or eight bits to the byte; in JIS X 0208, this is called the kanji set, which includes 6355 kanji as well as 524 non-kanji, including characters such as Latin letters, kana, and so forth.

Special characters: Occupy rows 1 and 2. There are 18 such as the "ideographic space" (　), and the Japanese comma and period; eight diacritical marks such as dakuten and handakuten; 10 characters for such as the iteration mark; 22 ; 45 ; and 32 unit symbols, which include the currency sign and the postal mark, for a total of 147 characters.

Numerals: Occupy part of row 3. The ten digits from "0" to "9".

Latin letters: Occupy part of row 3. The 26 letters of the English alphabet in uppercase and lowercase form for a total of 52.

Hiragana: Occupies row 4. Contains 48 unvoiced kana (including the obsolete ''wi'' and ''we''), 20 voiced kana (dakuten), 5 semi-voiced kana (handakuten), and 10 small kana for palatalized and assimilated sounds, for a total of 83 characters.

Katakana: Occupies row 5. There are 86 characters: the katakana equivalents of the hiragana characters, plus the small ''ka''/''ke'' kana (ヵ/ヶ) and the ''vu'' kana (ヴ).

Greek letters: Occupy row 6. The 24 letters of the Greek alphabet in uppercase and lowercase form (minus the final sigma) for a total of 48.

Cyrillic letters: Occupy row 7. The 33 letters of the Russian alphabet in uppercase and lowercase form for a total of 66.

Box-drawing characters: Occupy row 8. Thin segments, thick segments, and mixed thin and thick segments, 32 total.

Kanji: The 2965 characters of level 1 from row 16 to row 47, and the 3390 characters of level 2 from row 48 to row 84, for a total of 6355.
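
The category totals quoted in this overview can be checked against the headline figures with a little arithmetic. The dictionaries below just restate the per-category counts from the text (the key names are this example's own labels), and the sums reproduce 524 non-kanji, 6355 kanji, and 6879 characters overall.

```python
# Per-category counts as stated in the overview.
NON_KANJI = {
    "special characters (rows 1-2)": 147,
    "numerals (row 3)": 10,
    "Latin letters (row 3)": 52,
    "hiragana (row 4)": 83,
    "katakana (row 5)": 86,
    "Greek letters (row 6)": 48,
    "Cyrillic letters (row 7)": 66,
    "box-drawing characters (row 8)": 32,
}
KANJI = {"rows 16-47": 2965, "rows 48-84": 3390}

non_kanji_total = sum(NON_KANJI.values())
kanji_total = sum(KANJI.values())
grand_total = non_kanji_total + kanji_total

print(non_kanji_total, kanji_total, grand_total)  # 524 6355 6879
```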


Special characters, numerals, and Latin characters

As for the special characters in the kanji set, some characters from the graphic character set of the International Reference Version (IRV) of
ISO/IEC 646 ISO/IEC 646 is a set of ISO/IEC standards, described as ''Information technology — ISO 7-bit coded character set for information interchange'' and developed in cooperation with ASCII at least since 1964. Since its first edition in ...
:1991 (equivalent to
ASCII) are absent from JIS X 0208. These are the aforementioned four characters "QUOTATION MARK", "APOSTROPHE", "HYPHEN-MINUS", and "TILDE". The former three are split into different code points in the kanji set (Nishimura, 1978; JIS X 0221-1:2001 standard, Section 3.8.7). The "TILDE" of IRV has no corresponding character in the kanji set. In the following table, the ISO/IEC 646:1991 IRV characters in question are compared with their multiple equivalents in JIS X 0208, except for the IRV character "TILDE", which is compared with the "WAVE DASH" of JIS X 0208. The entries under the "Symbol" columns use UCS/Unicode code points, so the specifics of display may differ. The ASCII/IRV characters without exact JIS X 0208 equivalents were later assigned code points by JIS X 0213; these are also listed below, as are Microsoft's mappings of the four characters. This means that the kanji set is the most widespread non-upward-compatible character set in the world, which is counted as one of the weak points of this standard. Even for the 90 special characters, numerals, and Latin letters that the kanji set and the IRV set have in common, this standard does not follow the arrangement of ISO/IEC 646. These 90 characters are split between rows 1 (punctuation) and 3 (letters and numbers), although row 3 does follow the ISO 646 arrangement for the 62 letters and numbers alone (e.g. 4/1 ("A") in ISO 646 becomes 2/3 4/1 (i.e. 3-33) in JIS X 0208). These incompatibilities are thought to be the reason why the numerals, Latin letters, and so forth in the kanji set came to be interpreted differently from their IRV counterparts in the original implementations. Ever since the first standard, it has been possible to represent such things as encircled numbers, ligatures for measurement unit names, and Roman numerals by composition; they were not given independent ''kuten'' code points. Although individual companies that manufacture information systems can make an effort to represent these characters by composition as their customers require, none has requested that they be added to the standard, choosing instead to offer them proprietarily as gaiji. In the fourth standard (1997), all these characters were explicitly defined as characters that accompany an advancement of the current position; that is to say, they are spacing characters. Furthermore, it was ruled that they should not be made by the composition of characters. For this reason, representing Latin characters with diacritics was disallowed altogether, with possibly the sole exception of the ångström symbol (Å) at row 2 cell 82.
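
The row 3 correspondence mentioned above (4/1 in ISO 646 becoming 2/3 4/1, i.e. 3-33) can be sketched as simple arithmetic. This is a minimal illustration, not text from the standard; the function names `iso646_to_row3` and `column_line` are hypothetical, and the sketch assumes only the 62 letters and digits, since the punctuation in row 1 does not follow the ISO 646 arrangement.

```python
def iso646_to_row3(ch: str) -> tuple[int, int]:
    """Map an ISO 646 letter or digit to its (row, cell) in row 3."""
    if len(ch) != 1 or not (ch.isascii() and ch.isalnum()):
        raise ValueError("only the 62 ASCII letters and digits are covered")
    # Row 3 keeps the ISO 646 position: cell = byte value - 0x20.
    return 3, ord(ch) - 0x20


def column_line(byte: int) -> str:
    """Format a byte in column/line notation, e.g. 0x41 -> '4/1'."""
    return f"{byte >> 4}/{byte & 0x0F}"


row, cell = iso646_to_row3("A")
# In the 7-bit code each kuten component is offset by 0x20.
gl_bytes = bytes([0x20 + row, 0x20 + cell])
print(row, cell, [column_line(b) for b in gl_bytes])  # 3 33 ['2/3', '4/1']
```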


Hiragana and katakana

The hiragana and katakana in JIS X 0208, unlike those in JIS X 0201, include dakuten and handakuten markings as part of a character. The katakana ''wi'' (ヰ) and ''we'' (ヱ) (both obsolete in modern Japanese) as well as the small ''wa'' (ヮ), none of which are in JIS X 0201, are also included. The arrangement of kana in JIS X 0208 is different from the arrangement of katakana in JIS X 0201. In JIS X 0201, the syllabary starts with ''wo'' (ヲ), followed by the small kana sorted in ''gojūon'' order, followed by the full-size kana, also in ''gojūon'' order. In JIS X 0208, on the other hand, the kana are sorted first by ''gojūon'' order, then in the order of "small kana, full-size kana, kana with dakuten, and kana with handakuten", such that the same fundamental kana is grouped with its derivatives. This ordering was chosen to simplify the sorting of kana-based dictionary look-ups (Yasuoka, 2006). As mentioned above, the katakana order previously defined in JIS X 0201 was not followed in JIS X 0208. It is thought that the treatment of the JIS X 0201 katakana as "half-width kana" arose from this incompatibility with the katakana of this standard. This point is also one of the weaknesses of this standard.
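
The difference between the two arrangements can be observed by sorting kana on their code values. The sketch below assumes Python's built-in `euc_jp` and `shift_jis` codecs as a convenient way to obtain JIS X 0208 row/cell numbers and JIS X 0201 single-byte values; the helper names `kuten_0208` and `byte_0201` are hypothetical.

```python
def kuten_0208(ch: str) -> tuple[int, int]:
    """Return (row, cell) of a JIS X 0208 character via its EUC-JP bytes."""
    raw = ch.encode("euc_jp")
    if len(raw) != 2 or raw[0] < 0xA1:
        raise ValueError(f"{ch!r} is not a JIS X 0208 character")
    return raw[0] - 0xA0, raw[1] - 0xA0


def byte_0201(ch: str) -> int:
    """Return the single-byte JIS X 0201 value of a half-width katakana."""
    raw = ch.encode("shift_jis")
    if len(raw) != 1:
        raise ValueError(f"{ch!r} is not a single-byte character")
    return raw[0]


# JIS X 0208: a base kana is followed directly by its dakuten/handakuten forms.
print(sorted(["ぱ", "は", "ば"], key=kuten_0208))   # ['は', 'ば', 'ぱ']
# JIS X 0208: small kana precede the full-size kana of the same sound.
print(sorted(["ヲ", "ア", "ァ"], key=kuten_0208))   # ['ァ', 'ア', 'ヲ']
# JIS X 0201: wo first, then the small kana, then the full-size kana.
print(sorted(["ｱ", "ｧ", "ｦ"], key=byte_0201))      # ['ｦ', 'ｧ', 'ｱ']
```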


Kanji

How the kanji in this standard were chosen and from what sources, why they are split into level 1 and level 2, and how they are arranged are all explained in detail in the fourth standard (1997). Per that explanation, the kanji included in the following four kanji listings were reflected in the 6349 characters of the first standard (1978).
* The "Kanji Listing for Standard Code (Tentative)": compiled by the Information Processing Society of Japan kanji code committee in 1971. In the "Correspondence Analysis Results" described in the next item, it appears to contain 6086 characters.
* A listing selected by the Administrative Management Agency of Japan in 1975, consisting of 2817 characters. As data for the selection, the Agency produced a report which, starting with the "Kanji Listing for Standard Code (Tentative)", contrasted several kanji listings; this report is called the "Correspondence Analysis Results" for short.
* The "Japanese Personality Registration Name Kanji": one of the kanji listings that compose the "Correspondence Analysis Results", consisting of 3044 characters. The list itself no longer exists, and it was not available to the original drafting committee either; its kanji were reflected in the standard by following the "Correspondence Analysis Results".
* The "Kanji for National Administrative District Listing": one of the kanji listings that compose the "Correspondence Analysis Results", consisting of 3251 characters. These are the kanji used in the list of all administrative place names compiled by the Japan Geographic Data Center. The original drafting committee did not investigate the listing itself; the kanji taken from this list followed the "Correspondence Analysis Results".
In the second and third standards, four and two characters, respectively, were added to level 2, bringing the total to 6355 kanji. In the second standard, character forms were also changed and some characters were transposed between the levels; in the third standard, character forms were changed as well. These changes are described further below.


Level partitioning

The 2,965 Level 1 kanji occupy rows 16 to 47. The 3,390 Level 2 kanji occupy rows 48 to 84. For level 1, characters common to multiple kanji listings were chosen, using the tōyō kanji, the tōyō kanji correction draft, and the jinmeiyō kanji as a basis. Also, JIS C 6260 ("To-Do-Fu-Ken (Prefecture) Identification Code"; currently JIS X 0401) and JIS C 6261 ("Identification code for cities, towns and villages"; currently JIS X 0402) were consulted; kanji for nearly all Japanese prefectures, cities, districts, wards, towns, villages, and so forth were intentionally placed in level 1. Furthermore, amendments by experts were added. Level 2 was dedicated to kanji that appeared in the aforementioned four major listings but were not selected for level 1. As noted below, the kanji of level 1 were ordered by their pronunciation, so some kanji whose pronunciation was difficult to determine were transferred from level 1 to level 2 on that basis (Nishimura, 1978). As a result of these decisions, level 1 for the most part contains the more frequently used kanji and level 2 the less frequently used ones, but frequency was judged by the standards of the day; over time, some level 2 kanji have become more frequently used, such as one meaning "to soar" and one meaning "to glitter", and conversely, some level 1 kanji have become infrequent, notably the ones meaning "centimeter" and "millimeter". Of the current jōyō kanji, 30 fall into level 2, while three are missing altogether (塡󠄀, 剝󠄀 and 頰󠄀). Of the current jinmeiyō kanji, 192 are in level 2, while 105 are not part of the standard.
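
The row ranges given above translate directly into a level check on a ''kuten'' pair. The following is a minimal sketch; the function name `kanji_level` is hypothetical, and the sketch classifies by row range only, without checking whether a particular cell is actually assigned.

```python
from typing import Optional


def kanji_level(row: int, cell: int) -> Optional[int]:
    """Return 1 or 2 for a kuten pair in the kanji rows, else None."""
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must each be in 1..94")
    if 16 <= row <= 47:
        return 1  # level 1: rows 16 to 47
    if 48 <= row <= 84:
        return 2  # level 2: rows 48 to 84
    return None   # non-kanji rows and unassigned rows


print(kanji_level(16, 1))   # 1
print(kanji_level(84, 1))   # 2
print(kanji_level(3, 33))   # None
```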


Arrangement

The kanji in level 1 are sorted in order of each one's "representative reading" (i.e. a canonical reading chosen for the purposes of this standard only); this reading may be an ''on'' or a ''kun'' reading, and readings are sorted in gojūon order. As a general rule, the ''on'' (Chinese-sound) reading is considered the representative reading; where a kanji has multiple ''on'' readings, the reading judged to be predominant in use frequency is used for the representative reading (JIS C 6226-1978 standard, Section 3.4). For the small percentage of kanji that either do not have an ''on'' reading or have an ''on'' reading which is little known and not in common use, the ''kun'' reading was employed as the representative reading. Where a verb ''kun'' reading must be used as the representative reading, the ''ren'yōkei'' (rather than the ''shūshikei'') form is used. For example, cells 1 to 41 on row 16 are 41 characters sorted as starting with a reading of ''a''. Within these, 22 characters, including 16-10 (葵: ''on'' reading "''ki''"; ''kun'' reading "''aoi''") and 16-32 (粟: ''on'' readings "''zoku''" and "''shoku''"; ''kun'' reading "''awa''"), are there on the basis of their ''kun'' readings. 16-09 (逢: ''on'' reading "''hō''", ''kun'' reading "''a(i)''") and 16-23 (扱: ''on'' readings "''sō''" and "''kyū''", ''kun'' reading "''atsuka(i)''") are just two examples of ''ren'yōkei''-form verbs used for the representative reading. Where the representative reading is the same between different kanji, a kanji that uses an ''on'' reading is placed ahead of one that uses a ''kun'' reading. Where the ''on'' or ''kun'' readings are the same between more than one kanji, they are then ordered by their primary radical and stroke count. Whether on level 1 or level 2, itaiji are arranged to directly follow their exemplar form. For example, in level 2, right after row 49 cell 88 (劍), the immediately following characters deviate from the general rule (stroke count in this case) to include three variants of 49-88 (劔, 劒, and 剱). The kanji in level 2 are arranged in order of primary radical and stroke count. Where these two properties are the same for different kanji, they are then sorted by reading.
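
The level 1 ordering described above can be sketched as a sort key. This is an illustration under stated assumptions rather than the standard's own procedure: the `Entry` type and `level1_key` function are hypothetical, readings are written in hiragana and compared by code point (which only approximates gojūon order), the kanji 愛 with the ''on'' reading ''ai'' is added here for the tie-break and is not one of the examples above, and the final radical and stroke-count tie-break is omitted.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    kanji: str
    reading: str   # representative reading, written in hiragana
    is_kun: bool   # True if the representative reading is a kun reading


def level1_key(entry: Entry) -> tuple[str, bool]:
    """Sort by reading first; for equal readings, on (False) precedes kun (True)."""
    return (entry.reading, entry.is_kun)


entries = [
    Entry("粟", "あわ", True),
    Entry("逢", "あい", True),
    Entry("愛", "あい", False),
    Entry("扱", "あつかい", True),
    Entry("葵", "あおい", True),
]

ordered = sorted(entries, key=level1_key)
print("".join(e.kanji for e in ordered))  # 愛逢葵扱粟
```

The resulting order agrees with the relative positions of 16-09, 16-10, 16-23, and 16-32 given in the text.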


Kanji from unknown sources

It has been pointed out that there are kanji in the kanji set that are not found in comprehensive, unabridged kanji dictionaries, and that the sources thereof are unknown. For example, only one year after the first standard was established, Tajima (1979) reported that he had confirmed 63 kanji that were found neither in ''Shinjigen'' (a large kanji dictionary published by Kadokawa Shoten) nor in ''Dai Kan-Wa jiten'', and that did not make sense as ryakuji of any sort; he noted that it would be preferable for kanji not available in kanji dictionaries to be selected from definite sources. These kanji came to be known under several names. The drafting committee for the fourth version of the standard also saw the existence of kanji of unknown source as a problem, and so inquired into what sources the drafting committee of the first version had referenced. As a result, it was discovered that the original drafting committee had relied heavily on the "Correspondence Analysis Results" to collect kanji. When the drafting committee investigated the "Correspondence Analysis Results", it became clear that many of the kanji included in the kanji set but not found in exhaustive kanji dictionaries supposedly came from the "Japanese Personality Registration Name Kanji" and "Kanji for National Administrative District Listing" lists mentioned in the "Correspondence Analysis Results". It was confirmed that no original text exists for the "Japanese Personality Registration Name Kanji" referenced in the "Correspondence Analysis Results". For the "Kanji for National Administrative District Listing", Sasahara Hiroyuki of the fourth version's drafting committee examined the kanji that appeared on the in-progress development pages for the first standard. The committee also consulted many ancient writings, as well as many examples of personal names in a database of NTT phone books. Thanks to this thorough investigation, the committee was able to pare down the number of kanji whose source cannot be confidently explained to twelve, shown in the adjacent table. Of these, it is conjectured that several glyphs came about through copying errors. In particular, 妛 was probably created when printers tried to create 𡚴 by cutting and pasting 山 and 女 together. A shadow from that process was misinterpreted as a line, resulting in 妛 (a picture of this can be found in the ''Jōyō kanji jiten'').


Unification of kanji variants

According to the specifications in the fourth standard (1997), unification is the action of giving the same code point to a character without regard to its different character forms. In the fourth standard, the glyphs allowed are limited; the extent to which particular allographic glyphs are unified into a graphemic code point is clearly defined. Furthermore, according to the specifications in the standard, a glyph is an abstract notion of the graphical representation of a graphic character; a character form is the graphical shape that a glyph actually takes (e.g. when it is handwritten, printed, or displayed on a screen). For a single glyph, there exists an endless range of concretely and/or visibly different character forms, and the standard has a term for such variation between the character forms of one glyph. The extent to which a glyph is unified to one code point is determined according to that code point's example glyph and the unification criteria that can be applied to that example glyph; that is, the example glyph for a code point applies to that code point, and any glyphs for which the parts that compose the example glyph are replaced in accordance with the unification criteria ''also'' apply to that code point. For example, the example glyph at 33-46 is composed of radical 9 and the kanji that eventually spawned the ''so'' kana. Also, in unification criterion 101, there are three kanji displayed: the first takes the form most often seen in Japanese; the second is a more traditional form in which the first two strokes form radical 12 (the kanji numeral for the number 8); and the third is like the second, except that radical 12 is inverted. Consequently, all three permutations apply to the code point at row 33 cell 46. In the fourth standard, including one of the errata for the first printing, there are 186 unification criteria. When a code point's example glyph is composed of more than one part glyph, unification criteria can be applied to each part. After a unification criterion is applied to one part glyph, that part cannot have any more unification criteria applied to it. Also, a unification criterion is not allowed to apply if the resulting glyph would coincide with that of another code point entirely. An example glyph is no more than an example for that code point; it is not a glyph "endorsed" by the standard. Also, the unification criteria need only be used for generally used kanji and for the purpose of assigning things to the code points of this standard. The standard requests that generally unused kanji not be created based on the example glyphs and unification criteria. The kanji of the kanji set are not chosen completely consistently according to the unification criteria. For example, although 41-7 corresponds, according to unification criterion 72, both to the form where the third and fourth strokes cross and to the form where they do not, 20-73 only corresponds to the form where they do not cross, and 80-90 only corresponds to the form where they do. The terms "unification", "unification criteria", and "example glyph" were adopted in the fourth standard. From the first to the third version, kanji and relations between kanji were grouped into three types, one of which was "equivalence"; it was explained that the characters recognized as equivalent "consolidate to just one point". "Equivalence" covered, besides kanji with exactly the same shape, kanji with differences due to style and kanji where the difference in character form is small. In the first standard, it was stipulated that "this standard ... does not establish the particulars of character forms" (Section 3.1); it also states that "the aim of this standard is to establish the general idea of characters and their codes; the design of their character forms and such lie outside its scope." The second and third standards likewise note that specific designs of character forms lie outside their scope (the note on item 1). The fourth standard also stipulates that "This standard regulates graphic characters as well as their bit patterns, and the use, specific designs of individual characters, and so forth are not within the scope of this standard" (JIS X 0208:1997, item 1).


Unification criteria for compatibility

In the fourth standard, unification criteria for compatibility are defined. Their application is limited to 29 code points whose glyphs differ greatly between JIS C 6226-1978 and the standards from JIS C 6226-1983 onward. For those 29 code points, the glyphs from JIS C 6226-1983 onward are displayed as "A", and the glyphs from JIS C 6226-1978 as "B". For each of them, either the "A" or the "B" glyph may be used. However, in order to claim compatibility with the standard, whether the "A" or "B" form has been used for each code point must be explicitly noted.


Character encodings


Encoding schemes stipulated by JIS X 0208

In JIS X 0208:1997, article 7 combined with appendices 1 and 2 defines a total of eight encoding schemes. In the descriptions below, the "CL" (control left), "GL" (graphic left), "CR" (control right), and "GR" (graphic right) regions are respectively, in column/line notation, from 0/0 to 1/15, from 2/1 to 7/14, from 8/0 to 9/15, and from 10/1 to 15/14. For each code, 2/0 is assigned the graphic character "SPACE" and 7/15 the control character "DELETE". The C0 control characters (defined in JIS X 0211 and matching ISO/IEC 6429) are assigned to the CL region.
;7-bit encoding for kanji
:Stipulated in the standard itself. The JIS X 0208 double-byte set is assigned to the GL region.
;8-bit encoding for kanji
:Stipulated in the standard itself. Same as the 7-bit encoding, but defined in terms of 8-bit bytes. The CR region may be unused, or encode the C1 control characters from JIS X 0211. The GR region is unused.
;International Reference Version + 7-bit encoding for kanji
:Stipulated in the standard itself. The shift in control character designates the ISO/IEC 646:1991 IRV (International Reference Version, equivalent to US-ASCII) to the GL region. Shift out designates the JIS X 0208 double-byte set to the same region.
;Latin characters + 7-bit encoding for kanji
:Stipulated in the standard itself. As with IRV+7-bit, but with ISO/IEC 646:IRV replaced with ISO/IEC 646:JP (the Roman set of JIS X 0201).
;International Reference Version + 8-bit encoding for kanji
:Stipulated in the standard itself. ISO/IEC 646:IRV is assigned to the GL region, JIS X 0208 to the GR region. This is effectively a subset of EUC-JP, excluding the half-width katakana from JIS X 0201 and the supplemental kanji from JIS X 0212.
;Latin characters + 8-bit encoding for kanji
:Stipulated in the standard itself. As with IRV+8-bit, but with ISO/IEC 646:IRV replaced with ISO/IEC 646:JP.
;Shift-coded character set
:Stipulated in Appendix 1. The authoritative definition of Shift JIS.
;RFC 1468-coded character set
:Stipulated in Appendix 2. Resembles ISO-2022-JP (which is authoritatively defined in RFC 1468) but is defined in terms of eight-bit bytes, whereas ISO-2022-JP is defined in terms of seven-bit bytes.

Among the encodings stipulated in the fourth standard, only the "Shift" coded character set is registered by the IANA. However, certain others are closely related to IANA-registered encodings defined elsewhere (EUC-JP and ISO-2022-JP).
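
To make the region assignments above concrete, the sketch below converts a ''kuten'' (row, cell) pair into bytes for the GL region (7-bit code), the GR region (8-bit code, as in EUC-JP), and the Shift-coded set. The function names are hypothetical, and the Shift arithmetic is the commonly described conversion rather than a quotation of Appendix 1, so it should be read as an assumption to verify against the standard.

```python
def _check(row: int, cell: int) -> None:
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must each be in 1..94")


def kuten_to_gl(row: int, cell: int) -> bytes:
    """Bytes in the GL region (2/1 to 7/14): each component plus 0x20."""
    _check(row, cell)
    return bytes([0x20 + row, 0x20 + cell])


def kuten_to_gr(row: int, cell: int) -> bytes:
    """Bytes in the GR region (10/1 to 15/14): each component plus 0xA0."""
    _check(row, cell)
    return bytes([0xA0 + row, 0xA0 + cell])


def kuten_to_shift(row: int, cell: int) -> bytes:
    """Bytes in the Shift-coded set: two rows share one lead byte."""
    _check(row, cell)
    lead = (row - 1) // 2 + (0x81 if row <= 62 else 0xC1)
    if row % 2:  # odd row: trail bytes 0x40..0x9E, skipping 0x7F
        trail = cell + 0x3F + (1 if cell >= 64 else 0)
    else:        # even row: trail bytes 0x9F..0xFC
        trail = cell + 0x9E
    return bytes([lead, trail])


# Row 3 cell 33 is "LATIN CAPITAL LETTER A" in the kanji set.
print(kuten_to_gl(3, 33).hex())     # 2341 -> 2/3 4/1
print(kuten_to_gr(3, 33).hex())     # a3c1 -> 10/3 12/1
print(kuten_to_shift(3, 33).hex())  # 8260
```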


Escape sequences for JIS X 0202 / ISO 2022

JIS X 0208 may be used within ISO 2022/JIS X 0202 (of which ISO-2022-JP is a subset). The escape sequences to designate JIS X 0208 to each of the four ISO 2022 code sets are listed below. Here, "ESC" refers to the control character "Escape" (0x1B, or 1/11). The escape sequence starting ESC 2/4 selects a multi-byte character set. The escape sequence starting ESC 2/6 specifies a revision of the upcoming character set selection. JIS C 6226:1978 is identified by the multibyte-94-set identifier byte 4/0 (corresponding to ASCII @). JIS C 6226:1983 / JIS X 0208:1983 is identified by the multibyte-94-set identifier byte 4/2 (B). JIS X 0208:1990 is also identified by the 94-set identifier byte 4/2, but can be distinguished with the revision identifier 4/0 (@).


Duplicate encodings of ASCII and JIS X 0201

When using the kanji set of this standard with either the ISO/IEC 646:1991 IRV graphic character set (ASCII) or JIS X 0201's graphic character set for Latin characters (JIS-Roman), the treatment of the characters common to both sets becomes problematic. Unless special measures are taken, the characters included in both sets do not all map to each other one-to-one, and a single character may be given more than one code point; that is, a duplicate encoding may arise. When a character is common to both sets, JIS X 0208:1997 basically forbids the use of the code point in the kanji set (which is one of the two code points), eliminating duplicate encodings. Characters that have the same name are judged to be the same character. For example, both the name of the character corresponding to the bit pattern 4/1 in ASCII and the name of the character corresponding to row 3 cell 33 of the kanji set are "LATIN CAPITAL LETTER A". In International Reference Version + 8-bit code for kanji, the letter "A" (i.e. "LATIN CAPITAL LETTER A") is represented both by the bit pattern 4/1 and by the bit pattern corresponding to the kanji set's row 3 cell 33 (10/3 12/1). The standard forbids the use of the "10/3 12/1" bit pattern, in an attempt to eliminate the duplicate encoding. In consideration of implementations that treat the characters at the kanji set code points as "full-width characters", distinct from those of ASCII or JIS-Roman, the use of the kanji set code points is permitted only for the sake of backwards compatibility. For example, for the purpose of backwards compatibility, it is permitted to consider 10/3 12/1 in International Reference Version + 8-bit code for kanji to correspond to a full-width "A". If the kanji set is used along with ASCII or JIS-Roman, then even if the standard is abided by strictly, the unique encoding of a character is not guaranteed. For example, in the International Reference Version + 8-bit code for kanji, it is valid to represent a hyphen with the bit pattern 2/13 for the character "HYPHEN-MINUS", as well as with the kanji set's row 1 cell 30 (bit pattern 10/1 11/14) for the character "HYPHEN". In addition, the standard does not define which of the two to use for what, and so the hyphen is not given one unique encoding. The same problem affects the minus sign, the quotation marks, and so forth. Moreover, even when the kanji set is used on its own as a separate code, unique encoding of characters is not guaranteed in practice: in many cases, the full-width "IDEOGRAPHIC SPACE" at row 1 cell 1 and the half-width space (2/0) coexist. How the two should differ is not self-explanatory, and is not specified in the standard.
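
The "LATIN CAPITAL LETTER A" example above can be turned into a small normalizer for the International Reference Version + 8-bit code. This is a sketch under stated assumptions, not a rule from the standard: the function name `fold_row3_duplicates` is hypothetical, the input is assumed to be well formed, and only the letters and digits of row 3 are folded, because their GR trail byte is the ISO 646 byte plus 0x80. Row 1 punctuation such as "HYPHEN" is deliberately left alone, since the text notes that the standard does not settle which code to use there.

```python
def fold_row3_duplicates(data: bytes) -> bytes:
    """Replace two-byte row 3 letters and digits with their single-byte form."""
    out = bytearray()
    i = 0
    while i < len(data):
        lead = data[i]
        if lead >= 0xA1 and i + 1 < len(data):
            trail = data[i + 1]
            single = trail & 0x7F  # the ISO 646 byte in the same position
            if lead == 0xA3 and chr(single).isalnum():
                out.append(single)           # 10/3 12/1 becomes 4/1
            else:
                out += bytes([lead, trail])  # any other two-byte code is kept
            i += 2
        else:
            out.append(lead)                 # single-byte (GL or control) code
            i += 1
    return bytes(out)


print(fold_row3_duplicates(b"\xa3\xc1"))          # b'A'
print(fold_row3_duplicates(b"A\xa3\xc1\xa1\xbe"))  # b'AA\xa1\xbe'
```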


Comparison of encoding schemes used in practice


History

Within five years of a Japanese Industrial Standard being established, reaffirmed, or revised, the standard undergoes a process of reaffirmation, revision, or withdrawal. Since its establishment, this standard has been revised three times, and at present the fourth standard is in force.


First standard

The first standard is JIS C 6226-1978, established by the Japanese Minister of International Trade and Industry on 1 January 1978. It is also called 78JIS for short. Entrusted by the Agency of Industrial Science and Technology, a JIPDEC kanji code standardization research and study committee produced the draft. The committee chairman was Moriguchi Shigeichi. The code included 453 non-kanji (including hiragana, katakana, the Roman, Greek and Cyrillic alphabets and punctuation) and 6349 kanji (2965 level 1 kanji and 3384 level 2 kanji) for a total of 6802 characters. It did not yet include box-drawing characters. The standard itself was set in Shaken Co., Ltd's Ishii Mincho typeface.


Second standard

The second standard, JIS C 6226-1983, revised the first standard on 1 September 1983. It is also called 83JIS. Entrusted by the AIST, a JIPDEC kanji code-related JIS committee produced the draft. The committee chairman was Motooka Tōru. The draft of the second standard took into account factors such as the promulgation of the jōyō kanji, the enforcement of the jinmeiyō kanji, and the standardization of Japanese-language Teletex by the Ministry of Posts and Telecommunications. In addition, the following modifications were made to keep pace with JIS C 6234-1983 (24-pixel matrix printer character forms; presently JIS X 9052).

Addition of special characters: 39 characters were added to the special characters. Following JICST recommendations, and drawing on standards such as JIS Z 8201-1981 (mathematical symbols) and JIS Z 8202-1982 (quantity, unit, and chemical symbols), characters that could not be represented by composition were chosen.

Newly added box-drawing characters: 32 box-drawing characters were added.

Swapping of ''itaiji'' code points: Code points for 22 variant pairs of kanji were swapped, such that the variant in level 2 was moved to level 1 and vice versa. For example, the character at row 36 cell 59 (level 1) in the first standard was moved to row 52 cell 68 (level 2), and the character originally at row 52 cell 68 was in turn moved to row 36 cell 59.

Additions to the level 2 kanji: Three characters from level 1 and one character from level 2 were given new code points at previously unassigned positions in row 84 as level 2 kanji. ''Itaiji'' of those characters were newly assigned to their original locations. For example, the character at row 84 cell 1 in the second standard was moved there from row 22 cell 38, where a different form not included in the first standard was assigned as a level 1 kanji.

Modification of character forms: The character forms of approximately 300 kanji were amended. Many level 1 glyphs that had been in the style of the Kangxi Dictionary were changed into variants, especially more simplified forms (e.g. ryakuji and extended shinjitai). Two code points often criticized for having changed greatly are row 18 cell 10 and row 38 cell 34. There were many smaller changes away from the Kangxi-style variants; for example, the character at row 25 cell 84 lost part of a stroke. Conversely, some glyphs that were not Kangxi-style forms were changed into their Kangxi-style forms; for example, the character at row 80 cell 49 gained part of a stroke (the same part that 25-84 lost). In order to elucidate the original intent of the first standard, these differences were brought within the unification criteria of the fourth standard. The difference in form in the examples noted above falls under unification criterion 42.

The bulk of the changes to character forms created differences between level 1 and level 2 kanji. Specifically, simplification was applied more often to level 1 kanji than to level 2 kanji; simplifications applied to level 1 kanji were generally not applied to kanji in level 2. The aforementioned 25-84 and 80-49 were likewise treated differently, as the former is in level 1 and the latter is in level 2. Even so, some changes were made regardless of level; for instance, characters containing the "door" and "winter" components were changed with no difference in treatment between level 1 and level 2 kanji. However, for 29 code points (such as the problematic 18-10 and 38-34 mentioned above), the forms inherited by the fourth standard contradict the original intent of the first. For these, there are special unification criteria to maintain compatibility with the previous standards at these code points.

When the new "X" category for Japanese Industrial Standards (for information-related fields) was introduced, the second standard was renamed JIS X 0208-1983 on 1 March 1987.


Third standard

The third standard, JIS X 0208-1990, revised the second standard on 1 September 1990. It is also called 90JIS for short. Entrusted by the AIST, a committee at the Japanese Standards Association (JSA) for the revision of JIS X 0208 created the draft. The committee chairman was Tajima Kazuo. 225 kanji glyphs were changed, and two characters were added to level 2 (at 84-05 and 84-06). This was a disunification of ''itaiji'' for two characters already included (at 49-59 and 63-70). Some of the changes and the two additions corresponded to the 118 jinmeiyō kanji added in March 1990. The standard itself was set in Heisei Mincho.


Fourth standard

The fourth standard, JIS X 0208:1997, revised the third standard on 20 January 1997. It is also called 97JIS for short. Entrusted by the AIST, a JSA committee for research and study of coded character sets produced the draft. The committee chairman was Shibano Kōji. The basic policies of this revision were to make no changes to the character set, to clarify ambiguous provisions, and to make the standard easier to use. No characters were added or removed, no code points were rearranged, and, without exception, the example glyphs were left unchanged. However, the stipulations of the standard were completely rewritten or supplemented. Whereas the third standard was 65 pages long without the explanations, the fourth standard was 374 pages without the explanations. The main points of the revision are:

Definition of encoding methods: Until the third standard, only the encoding method based on JIS X 0202 code extension was defined. This is unusual as far as coded character sets go. In the fourth standard, encoding methods that do not use escape sequences for the purpose of code extension were defined.

Definition of the general prohibition of the use of unassigned code points and methods of usage for unassigned code points: The third standard, in an explanation that was not part of the standard proper, suggested that assigning gaiji to some unassigned code points was acceptable. The fourth standard clarified that use of unassigned code points is generally prohibited, and specified the conditions under which unassigned code points may be used.

General elimination of duplicate encodings: Each character was given a "character name" that maps to those of other standards. Also, encoding methods for using the set together with ISO/IEC 646's International Reference Version or JIS X 0201 were specified. When JIS X 0208 is used together with either, only one of two assigned code points for characters with the same name is permitted; thus, duplicate encodings were generally eliminated.

Investigation into sources of kanji: Characters included in the standard so far that are found in neither the Kangxi Dictionary nor the Dai Kanwa Jiten were identified. Accordingly, the purpose for which these kanji were included, and the sources from which they came during compilation of the first standard, were investigated.

Definition of kanji unification criteria: Based on materials such as those used for the drafting of the first standard, an attempt was made to restore the intent of the first standard regarding the scope of glyphs each code point represents. Moreover, the criteria for unifying kanji glyphs were clearly defined.

Inclusion of de facto standards: By the time of the fourth standard, the encoding methods Shift JIS and ISO-2022-JP had become de facto standards for personal computing and e-mail, respectively. These encoding methods were included as "Shift-Coded Representation" and "RFC 1468-Coded Representation" (described above).
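
The relationship between a row-cell position and its bytes under these encoding methods can be sketched in a few lines of Python. This is a minimal illustration, not text from the standard: the function names are hypothetical, and the byte arithmetic (adding 0x20 for the 7-bit code used inside ISO-2022-JP, setting the high bit for EUC-JP, and the usual row-folding rule for Shift JIS) reflects the commonly documented conventions rather than anything stated in this article.

```python
def kuten_to_jis(row: int, cell: int) -> bytes:
    """7-bit two-byte code: row and cell (1..94) are each offset by 0x20."""
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must be in 1..94")
    return bytes([0x20 + row, 0x20 + cell])


def kuten_to_euc_jp(row: int, cell: int) -> bytes:
    """EUC-JP: the 7-bit bytes with the high bit set on both."""
    first, second = kuten_to_jis(row, cell)
    return bytes([first | 0x80, second | 0x80])


def kuten_to_shift_jis(row: int, cell: int) -> bytes:
    """Shift JIS: two rows share one lead byte; the trail byte tells them apart."""
    kuten_to_jis(row, cell)  # reuse the range check
    lead = (row - 1) // 2 + (0x81 if row <= 62 else 0xC1)
    if row % 2 == 1:
        trail = cell + 0x3F
        if trail >= 0x7F:  # 0x7F is skipped in the trail-byte range
            trail += 1
    else:
        trail = cell + 0x9E
    return bytes([lead, trail])


def wrap_iso2022_jp(jis: bytes) -> bytes:
    """Surround 7-bit bytes with the RFC 1468 escape sequences."""
    return b"\x1b$B" + jis + b"\x1b(B"


if __name__ == "__main__":
    for row, cell in [(1, 1), (4, 2), (16, 1)]:
        print(
            f"{row:02d}-{cell:02d}",
            kuten_to_jis(row, cell).hex(),
            kuten_to_euc_jp(row, cell).hex(),
            kuten_to_shift_jis(row, cell).hex(),
        )
```

The results can be cross-checked against Python's built-in `euc_jp`, `shift_jis`, and `iso2022_jp` codecs, which is a convenient way to confirm that the arithmetic matches an independent implementation.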


Successors

JIS X 0213 (extended kanji) was designed "with the goal being to offer a sufficient character set for the purposes of encoding the modern Japanese language that JIS X 0208 intended to be from the start"; it defines a character set that expands upon the kanji set of JIS X 0208. The drafters of JIS X 0213 recommend migration from JIS X 0208 to JIS X 0213, citing among the advantages JIS X 0213's compatibility with the Hyōgai Kanji Glyph List and with newer jinmeiyō kanji.

Contrary to the expectations of the drafters, adoption of JIS X 0213 has been slow since its enactment in 2000. The drafting committee of JIS X 0213:2004 wrote in 2004, "The status where 'what the majority of information systems can use in common is JIS X 0208 only' still continues." (JIS X 0213:2000, Appendix 1:2004, section 2.9.7)

For Microsoft Windows, the predominant operating system (and hence the supplier of the predominant desktop environment) in the personal computing sector, the JIS X 0213 repertoire has been included since Windows Vista, released in November 2006. Mac OS X has been compatible with JIS X 0213 since version 10.1 (released in 2001). Many Unix-like systems such as Linux can optionally support JIS X 0213. Therefore, it is thought that, with time, JIS X 0213 support on personal computers will not be an impediment to its eventual adoption. Among the drafters of JIS X 0213, there are those who expect to see a mix of JIS X 0208 and JIS X 0213 before any adoption of JIS X 0213 (Satō, 2004). However, JIS X 0208 continues to be used for the present, and many predict that it will endure as a standard. There are barriers that need to be overcome if JIS X 0213 is to supplant JIS X 0208 in common usage:

* The character repertoires used in Japanese mobile phones at the present time are based on JIS X 0208. There are no officially announced plans to migrate these to JIS X 0213 compatibility. As mobile phones are now a pervasive part of Japanese textual communication (see Japanese mobile phone culture), being a widespread, commonly used medium for sending e-mail and accessing the World Wide Web, a lack of adoption on mobile phones deters usage elsewhere.
* JIS X 0213 is not strictly upward-compatible with JIS X 0208 in terms of unification criteria (see below). For large-scale archives (e.g. bibliographic databases and Aozora Bunko) that use JIS X 0208 and follow its unification criteria strictly, it is thought that converting all the data to JIS X 0213 while preserving the same standard of textual integrity would be extremely difficult.
* In practice, many systems define and use unassigned code points in JIS X 0208. For example, Windows assigns IBM and NEC extended characters and user-defined character areas (see Windows-932), and mobile phones assign emoji in some such places. The code points of these ''gaiji'' conflict with the code points that JIS X 0213 uses, so there would be some difficulty in migrating these systems from JIS X 0208 to JIS X 0213. There are also plans to migrate to UCS/Unicode and use the JIS X 0213 repertoire from there, but until system administrators can judge that implementations of UCS/Unicode surrogate pairs and character composition are sufficiently stable, they are likely to hesitate to use the parts of the JIS X 0213 repertoire that require those implementations.
* The improvements provided by JIS X 0213 mostly concern characters that are used less often than those already present in JIS X 0208. Because nearly twice as many glyphs must be implemented, and the extra glyphs see less use, the return on investment can be low in many cases, especially where resources are constrained.


Implementations

Because JIS X 0208 / JIS C 6226 is primarily a character set and not a strictly defined character encoding, several companies have implemented their own encodings of the character set.

* Apple Computer Inc.: MacJapanese (Shift_JIS based)
* Fujitsu: JEF kanji code (EBCDIC based)
* Hitachi Ltd.: KEIS (EBCDIC based)
* IBM: various, including IBM-932 and IBM-942 (both Shift_JIS based)
* Microsoft: Windows-932 (Shift_JIS based)
* NEC: JIPS

Several of these incorporate vendor-specific character assignments in place of unallocated regions of the standard. These include Windows-932 and MacJapanese, as well as NEC's PC98 character encoding. While IBM-932 and IBM-942 also include vendor assignments, they place them outside the region used for JIS X 0208.
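
The effect of vendor assignments in unallocated regions can be observed with Python's built-in codecs, where `shift_jis` follows the JIS X 0208 repertoire and `cp932` is Python's name for its Windows-932 implementation. The sketch below is illustrative only: the helper name is hypothetical, and the sample byte pair 0x87 0x40 (row 13 cell 1, a position left unassigned by JIS X 0208) is an example chosen here, not one taken from the text above.

```python
def decode_or_none(data: bytes, codec: str):
    """Return the decoded text, or None if the codec rejects the bytes."""
    try:
        return data.decode(codec)
    except UnicodeDecodeError:
        return None


# Row 13 cell 1: unassigned in JIS X 0208, used by Windows-932 for a vendor character.
VENDOR_BYTES = b"\x87\x40"
# Row 16 cell 1: the first level 1 kanji, assigned by the standard itself.
STANDARD_BYTES = b"\x88\x9f"

if __name__ == "__main__":
    for label, data in [("vendor", VENDOR_BYTES), ("standard", STANDARD_BYTES)]:
        print(
            label,
            data.hex(),
            repr(decode_or_none(data, "shift_jis")),
            repr(decode_or_none(data, "cp932")),
        )
```

Text that round-trips under one vendor encoding may therefore fail to decode, or decode differently, under another, which is one reason the choice of codec label matters when exchanging data.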


Relation to other standards


ISO/IEC 646 IRV and ASCII

As noted above, the kanji set is not upwardly compatible with the ISO/IEC 646:1991 IRV (ASCII) graphic character set. The kanji set and the IRV graphic character set can be used together as specified in JIS X 0208 (IRV + 7-bit code for kanji and IRV + 8-bit code for kanji). They can also be used together in EUC-JP.


JIS X 0201

The kanji set lacks three characters included in JIS X 0201's graphic character set for Latin characters: 2/2 (QUOTATION MARK), 2/7 (APOSTROPHE), and 2/13 (HYPHEN-MINUS). The kanji set contains all characters included in JIS X 0201's graphic character set for katakana. The kanji set and the graphic character set for Latin characters can be used together as specified in JIS X 0208 (Latin characters + 7-bit code for kanji and Latin characters + 8-bit code for kanji). The kanji set, the graphic character set for Latin characters, and JIS X 0201's graphic character set for katakana can be used together as specified in JIS X 0208 (the shift-coded character set; i.e. Shift JIS). The kanji set and the graphic character set for katakana can be used together in EUC-JP.
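
How the three sets share one byte stream in the shift-coded representation can be sketched as a small scanner. This is a simplified illustration under stated assumptions: the function name and the labels are hypothetical, and the byte ranges used (0x00 to 0x7F for single-byte Latin, 0xA1 to 0xDF for single-byte katakana, 0x81 to 0x9F and 0xE0 to 0xEF for the lead byte of a two-byte character) are the commonly documented Shift JIS ranges, not figures quoted from this article.

```python
def split_shift_jis(data: bytes):
    """Split Shift JIS bytes into (kind, unit) pairs without decoding them."""
    units = []
    i = 0
    while i < len(data):
        b = data[i]
        if b <= 0x7F:
            # Single byte from the JIS X 0201 Latin set.
            units.append(("latin", data[i:i + 1]))
            i += 1
        elif 0xA1 <= b <= 0xDF:
            # Single byte from the JIS X 0201 katakana set.
            units.append(("katakana", data[i:i + 1]))
            i += 1
        elif 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xEF:
            # Lead byte of a two-byte character from the JIS X 0208 set.
            if i + 1 >= len(data):
                raise ValueError("truncated two-byte character")
            units.append(("kanji-set", data[i:i + 2]))
            i += 2
        else:
            raise ValueError(f"unexpected byte 0x{b:02X}")
    return units


if __name__ == "__main__":
    # Latin "A", half-width katakana U+FF71, and the kanji U+4E9C.
    sample = "A\uff71\u4e9c".encode("shift_jis")
    for kind, unit in split_shift_jis(sample):
        print(kind, unit.hex())
```

The label "kanji-set" is used rather than "kanji" because the two-byte set also contains non-kanji such as kana and symbols.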


JIS X 0212

JIS X 0212 (supplementary kanji) defines additional characters with code points for the purposes of information processing that requires characters not found in JIS X 0208. Rather than allocating characters within the main JIS X 0208 kanji set, it defines a second 94-by-94 kanji set containing supplementary characters. JIS X 0212 can be used with JIS X 0208 in EUC-JP. Also, JIS X 0208 and JIS X 0212 are both source standards for UCS/Unicode's Han unification, meaning that kanji from both sets can be included in one Unicode-format document. Among the code points that the second version of JIS X 0208 changed, 28 code points in JIS X 0212 reflect the character forms from before the changes. Also, JIS X 0212 reassigns the "closure mark" that JIS X 0208 had assigned as a non-kanji (at row 1 cell 26) as a kanji (at row 16 cell 17). JIS X 0212 has no characters in common with JIS X 0208 other than these. Hence, it is not suited for general use on its own. However, in the fourth version of JIS X 0208, the connection to JIS X 0212 was not defined at all. It is believed that this is because the drafting committee of the fourth JIS X 0208 standard was critical of the selection and identification methods of JIS X 0212. The character meanings and selection rationales were not properly documented, making it difficult to determine whether a desired kanji corresponded to one in its repertoire. The text of the fourth standard, as well as pointing out the problematic points of the character selection of JIS X 0212, states that "it is thought that not only is character selection impossible, it is also impossible to use together; the connection to JIS X 0212 is not defined at all." (section 3.3.1)


JIS X 0213

JIS X 0213 (extension kanji) defines a kanji set that expands upon the kanji set of JIS X 0208. According to this standard, it is "designed with the goal being to offer a sufficient character set for the purposes of encoding the modern Japanese language that JIS X 0208 intended to be from the start." The kanji set of JIS X 0213 incorporates all characters that can be represented in the kanji set of JIS X 0208, with many additions. In total, JIS X 0213 defines 1183 non-kanji and 10,050 kanji (for a total of 11,233 characters), within two 94-by-94 planes. The first plane (non-kanji and level 1–3 kanji) is based on JIS X 0208, whereas the second plane (level 4 kanji) is designed to fit within the unallocated rows of JIS X 0212, allowing use in EUC-JP. JIS X 0213 also defines Shift_JISx0213, a variant of Shift_JIS capable of encoding the entirety of JIS X 0213.

For most intents and purposes, JIS X 0213 plane 1 is a superset of JIS X 0208. However, different unification criteria are applied to some code points in JIS X 0213 compared to JIS X 0208. Consequently, some pairs of kanji glyphs that were represented by one JIS X 0208 code point, due to being unified, are given separate code points in JIS X 0213. For example, the glyph at row 33 cell 46 of JIS X 0208 (described above) unifies a few variants that differ in the right-hand component. In JIS X 0213, two of these forms are unified on plane 1 row 33 cell 46, and the other is located at plane 1 row 14 cell 41. Therefore, whether JIS X 0208 row 33 cell 46 should be mapped to JIS X 0213 plane 1 row 33 cell 46 or plane 1 row 14 cell 41 cannot be determined automatically. This limits the extent to which JIS X 0213 can be considered upwardly compatible with JIS X 0208, as admitted by the JIS X 0213 drafting committee (JIS X 0213:2000 section 5.3.2; JIS X 0213:2000 Appendix 1:2004 section 3.2.2). However, for the most part, row ''m'' cell ''n'' in JIS X 0208 corresponds to plane 1 row ''m'' cell ''n'' in JIS X 0213; therefore, not much confusion arises in practice. This is because most typefaces have come to use the glyphs exemplified in JIS X 0208, and most users are not consciously aware of the unification criteria.
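
The "row ''m'' cell ''n'' maps to plane 1 row ''m'' cell ''n''" correspondence can be spot-checked with Python's built-in `euc_jp` and `euc_jisx0213` codecs. The following is a rough sketch, not a conformance test: the helper names are hypothetical, the EUC byte arithmetic (0xA0 added to row and cell) is the conventional one rather than something stated above, and how a particular codec treats positions that exist only in JIS X 0213, such as plane 1 row 14 cell 41, depends on that codec's implementation.

```python
def euc_bytes(row: int, cell: int) -> bytes:
    """EUC form of a plane 1 position: row and cell each offset by 0xA0."""
    return bytes([0xA0 + row, 0xA0 + cell])


def try_decode(data: bytes, codec: str):
    """Return the decoded text, or None if the codec rejects the bytes."""
    try:
        return data.decode(codec)
    except UnicodeDecodeError:
        return None


def same_in_both(row: int, cell: int) -> bool:
    """True if the position decodes to the same character under both codecs."""
    data = euc_bytes(row, cell)
    old = try_decode(data, "euc_jp")
    new = try_decode(data, "euc_jisx0213")
    return old is not None and old == new


if __name__ == "__main__":
    # Positions taken from JIS X 0208, plus 14-41, which only JIS X 0213 assigns.
    for row, cell in [(4, 2), (16, 1), (33, 46), (14, 41)]:
        data = euc_bytes(row, cell)
        print(
            f"{row:02d}-{cell:02d}",
            repr(try_decode(data, "euc_jp")),
            repr(try_decode(data, "euc_jisx0213")),
        )
```

Note that identical decoding results say nothing about which glyph variant a given document intended, so a check like this cannot resolve the 33-46 versus 14-41 ambiguity described above.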


ISO/IEC 10646 and Unicode

The kanji set of JIS X 0208 is among the original source standards for the Han unification in ISO/IEC 10646 (UCS) and Unicode. Every kanji in JIS X 0208 corresponds to its own code point in UCS/Unicode's Basic Multilingual Plane (BMP). The non-kanji in JIS X 0208 also correspond to their own code points in the BMP. However, for some special characters, some systems implement correspondences that differ from those of UCS/Unicode (which are based on the character names given in JIS X 0208:1997).
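
Both points can be explored with Python's built-in codecs. The sketch below is an illustration under stated assumptions: the function name is hypothetical, the table it builds reflects Python's `euc_jp` codec rather than the normative mapping, and the divergent special character shown (the wave dash at row 1 cell 33, Shift JIS bytes 0x81 0x60) is a commonly cited example chosen here, not one named in the text above.

```python
def jis_x_0208_to_unicode():
    """Map every (row, cell) that Python's euc_jp codec accepts to its character."""
    table = {}
    for row in range(1, 95):
        for cell in range(1, 95):
            data = bytes([0xA0 + row, 0xA0 + cell])
            try:
                table[(row, cell)] = data.decode("euc_jp")
            except UnicodeDecodeError:
                continue  # unassigned position
    return table


# Row 1 cell 33 as Shift JIS bytes; the two codecs below disagree on it.
WAVE_DASH_BYTES = b"\x81\x60"

if __name__ == "__main__":
    table = jis_x_0208_to_unicode()
    outside_bmp = [pos for pos, ch in table.items() if ord(ch) > 0xFFFF]
    print("positions decoded:", len(table))
    print("positions outside the BMP:", len(outside_bmp))
    for codec in ("shift_jis", "cp932"):
        ch = WAVE_DASH_BYTES.decode(codec)
        print(codec, f"U+{ord(ch):04X}")
```

A mismatch of this kind means that text converted to Unicode by one system and back to a JIS X 0208 encoding by another may not round-trip, so conversions are best done with the same codec on both sides.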


See also

* JIS coded character sets
** JIS X 0201 "7-bit and 8-bit coded character sets for information interchange"
** JIS X 0202 "Information technology – Character code structure and extension techniques" (ISO/IEC 2022)
** JIS X 0208 "7-bit and 8-bit double byte coded KANJI sets for information interchange"
** JIS X 0211 "Control functions for coded character sets" (ISO/IEC 6429)
** JIS X 0212 "Code of the supplementary Japanese graphic character set for information interchange"
** JIS X 0213 "7-bit and 8-bit double byte coded extended KANJI sets for information interchange"
** JIS X 0221 "Universal Multiple-Octet Coded Character Set (UCS)" (ISO/IEC 10646)
* Extended shinjitai


References

''For the purposes of citation, these Japanese names are presented as if they were in Western order where Romanized, and retain Eastern order where not.''

* Nishimura, Hirohiko, 1978. The Kanji JIS. ''Standardization Journal'', 171: 3–8.
* Nomura, Masaaki, 1984. Revision of JIS C 6226: Kanji codes for information interchange. ''Standardization Journal'', 14 (3): 4–9.
* Ogata, Katsuhiro, 2006a. Things that were not unified in 97JIS among the example glyphs changed in JIS C 6226-1983 (83JIS) (accessed 29 January 2007).
* Ogata, Katsuhiro, 2006b. Things that fell within the scope of unification among the example glyphs changed in JIS C 6226-1983 (83JIS) (accessed 29 January 2007).
* Satō, Takayuki, 2004. Concerning the revision of JIS X 0213 (7-bit and 8-bit double byte coded extended Kanji sets for information interchange). ''Standardization Journal'', 34 (4): 8–12.
* Shibano, Kōji, 1997a. Concerning the revision of JIS X 0208 (7-bit and 8-bit double byte coded Kanji sets for information interchange). ''Standardization Journal'', 27 (3): 8–12.
* Shibano, Kōji, 1997b. Plan for the extension of the JIS kanji. ''Standardization Journal'', 27 (7): 5–11.
* Shibano, Kōji, 2000. Establishment of JIS X 0213 (7-bit and 8-bit double byte coded extended Kanji sets for information interchange). ''Standardization Journal'', 30 (3): 3–7.
* Shibano, Kōji, 2001. Concerning JIS kanji. ''Standardization and Quality Control'', 54 (8): 44–50.
* Shibano, Kōji (editor), 2002. ''JIS Kanji Dictionary, enlarged and revised edition''. Tokyo: Japanese Standards Association.
* Shibano, Kōji, 2002. The development of kanji and Japanese language processing technologies: the standardization of kanji codes. ''IPSJ Magazine'', 43 (12): 1362–1367.
* Tajima, Kazuo, 1979. Problems concerning the use of the JIS kanji listing: design and handling of kanji in kanji processing systems. ''Journal of Information Processing Society of Japan'', 21 (10): 753–761.
* Uchida, Tomio, 1990. Establishment of JIS X 0212 (Kanji Codes for Information Interchange – Supplemental Kanji). ''Standardization Journal'', 20 (11): 6–11.
* Yasuoka, Kōichi, 2001a. Situation of the Newest Character Codes in Japan (former part). ''Systems, Control and Information'', 45 (9): 528–535.
* Yasuoka, Kōichi, 2001b. Situation of the Newest Character Codes in Japan (latter part). ''Systems, Control and Information'', 45 (12): 687–694.
* Yasuoka, Kōichi, 200. "Differences between the JIS kanji plan (1976) and JIS C 6226-1978" at the 17th "Computer Usage for Oriental Studies" research seminar. 3–51.
* Yasuoka, Kōichi & Motoko Yasuoka, 2006. ''The History of Character Codes: Europe, America, and Japan''. Tokyo: Kyōritsu Shuppan.


External links


* The International Register that the IPSJ/ITSCJ supervises.
** Japanese Character Set JIS C 6226-1978
** Japanese Character Set JIS C 6226-1983
** Update Registration 87 Japanese Graphic Character Set for Information Interchange
* (the latest standard may be read here).
* Japanese Standards Association database search (a copy of the latest standard may be purchased here).
* Unification-related provisions in the JIS X 0208 and 0213 standards
