
JIS X 0201, a Japanese Industrial Standard developed in 1969, was the first Japanese electronic character set to become widely used. The character set was initially known as JIS C 6220 before the JIS category reform. It has two forms, a 7-bit encoding and an 8-bit encoding; the 8-bit form was dominant until Unicode (specifically UTF-8) replaced it. The full name of this standard is ''7-bit and 8-bit coded character sets for information interchange''. The first 96 codes comprise an ISO 646 variant, mostly following ASCII with some differences, while the second 96 character codes represent the phonetic Japanese katakana signs. Since the encoding does not provide any way to express hiragana or kanji, it is only capable of expressing simplified written Japanese. Nevertheless, this simplification can represent the full range of sounds in the language. In the 1970s, this was acceptable for media such as text mode computer terminals, telegrams, receipts, or other electronically handled data. JIS X 0201 was supplanted by subsequent encodings such as Shift JIS, which combines this standard and JIS X 0208, and later by Unicode.


History

The Comité Consultatif International Téléphonique et Télégraphique (CCITT) introduced the International Telegraph Alphabet No. 2 (ITA2) code, a 5-bit Latin encoding, as an international standard. Most countries had their own national standards based on it. In Japan, the Agency of Industrial Science and Technology (AIST) standardized it as the 6-bit character codes of JIS C 0803-1961 (''Keyboard layout and codes for teleprinters''), which added katakana characters. However, it did not meet industry requirements because the character map was small and the code layout was impractical. The AIST looked for a practical character encoding to replace the various codes used in Japan.

In 1963, ISO introduced a draft of ISO R 646 (''6 and 7-bit coded character sets for information processing interchange''). AIST entrusted the task of combining ISO R 646 with a katakana mapping to the Information Processing Society of Japan (IPSJ), which formed a code standardization committee. The committee did not adopt the 6-bit form of ISO's draft because the katakana set could not fit into its character map.

The early JIS draft mapped small katakana characters next to each of their normal katakana characters. This was considered convenient for sorting by Gojūon order (JIS X 0208:1978 chose this ordering). Some committee members objected that it would complicate the mechanism of keyboards that only handled normal katakana characters. The later draft mapped small katakana characters to positions .

The 1964 ISO draft reserved the positions and for the first and second currency symbols, to be assigned by each country, but using currency symbols that could be localized was considered too dangerous in international communications. The ISO committee had two options: to use a generic currency symbol (¤), or to give the dollar ($) and pound (£) signs permanent assignments. It was agreed that the dollar sign would be assigned to position and the pound sign to position . The latter was not required in countries that did not need the pound sign. The JIS committee decided to put the yen sign (¥) in (one of the national use positions).

JIS C 6220 (''Codes for information interchange'', 情報交換用符号) was published in 1969. Its number was changed to JIS X 0201 due to the JIS category reform in 1987, and the name was changed to ''7-bit and 8-bit coded character sets for information interchange'' (7ビット及び8ビットの情報交換用符号化文字集合) in the 1990 edition.

The character set of JIS X 0201 was widely used in Japan. The Nationwide Banking Data Communication System (全国銀行データ通信システム), the largest funds transfer system in Japan, was established in 1973. Transaction messages between banks used a subset of JIS X 0201. The system was used until 2018, when it was replaced by the ZEDI (The Nationwide Banking Electronic Data Interchange System, 全銀EDIシステム), which can handle hiragana and kanji characters.

In 1978, the JIS C 6226 (JIS X 0208) 2-byte character set was developed to express hiragana and kanji characters. It includes katakana characters, but their codes and layout are different from JIS X 0201. Computer manufacturers developed their own extensions of JIS X 0208 to retain compatibility with JIS X 0201. In 1982, the Microsoft Kanji encoding scheme (Codepage 932 of MS-DOS) and Digital Research's SJC26 (for Japanese CP/M-86) were developed to combine JIS X 0201 single-byte encoding and JIS X 0208 double-byte encoding without shift out and shift in characters. They were called Shift JIS, which became the industrial standard for personal computers.


Implementation details

The first half (Roman set) of JIS X 0201 constitutes a Japanese variant of ISO 646, amounting to ASCII with backslash (\) and tilde (~) replaced by yen (¥) and overline (‾), while the second half (Kana set) consists mainly of katakana. Control characters are specified in JIS X 0211.

In the 7-bit format, the shift out control character (0x0E) switches to the Kana set and shift in (0x0F) switches to the Roman set. In the 8-bit format, given in the chart below, bytes with the most significant bit set are used for the Kana set and bytes with it unset are used otherwise.

Names used specifically for the 7-bit Roman set include "JISCII", "JIS Roman", "ISO646-JP", "JIS C6220-1969-ro", "Japanese-Roman", "Japan 7-Bit Latin", and "ISO-IR-14", whereas names used specifically for the 7-bit Kana set include "ISO-IR-13", "JIS C6220-1969-jp" and "x0201-7".

The substitution of the yen symbol for backslash can make paths on DOS and Windows-based computers with Japanese support display strangely, for example as "C:¥Program Files¥". A similar problem affects escape sequences in C programming language string literals, which appear as, for example, printf("Hello, world.¥n");.
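The two formats described above can be illustrated with a minimal Python sketch. It is not taken from the standard: the function names are hypothetical, and the choice of Unicode halfwidth katakana (U+FF61 to U+FF9F) as the target for kana bytes 0xA1 to 0xDF is an assumption borrowed from the Shift JIS mapping.

```python
SO, SI = 0x0E, 0x0F  # shift out / shift in control characters

# Roman set: ASCII, except that 0x5C is the yen sign and 0x7E the overline.
ROMAN_OVERRIDES = {0x5C: "\u00a5", 0x7E: "\u203e"}

# Assumption: kana bytes 0xA1-0xDF map onto Unicode halfwidth forms
# U+FF61-U+FF9F, in the same order.
KANA_FIRST, KANA_LAST = 0xA1, 0xDF
KANA_OFFSET = 0xFF61 - KANA_FIRST


def decode_roman(byte: int) -> str:
    """Decode one byte of the Roman set (high bit unset)."""
    return ROMAN_OVERRIDES.get(byte, chr(byte))


def decode_kana(byte: int) -> str:
    """Decode one byte of the Kana set, given with the high bit set."""
    if not KANA_FIRST <= byte <= KANA_LAST:
        raise ValueError(f"byte 0x{byte:02X} is not assigned in the Kana set")
    return chr(byte + KANA_OFFSET)


def decode_8bit(data: bytes) -> str:
    """8-bit form: the most significant bit selects the Kana set."""
    return "".join(
        decode_kana(b) if b & 0x80 else decode_roman(b) for b in data
    )


def decode_7bit(data: bytes) -> str:
    """7-bit form: SO switches to the Kana set, SI back to the Roman set."""
    out = []
    kana = False
    for b in data:
        if b == SO:
            kana = True
        elif b == SI:
            kana = False
        elif b > 0x7F:
            raise ValueError(f"byte 0x{b:02X} is not valid in the 7-bit form")
        elif kana and 0x21 <= b <= 0x7E:
            # Same layout as the 8-bit Kana set, with the high bit cleared.
            out.append(decode_kana(b | 0x80))
        else:
            # Controls, space and DEL are passed through unchanged.
            out.append(decode_roman(b))
    return "".join(out)
```

With this sketch, `decode_8bit(b"C:\\Program Files\\")` returns `C:¥Program Files¥`, which reproduces the path display problem: the bytes are unchanged, only the glyph assigned to 0x5C differs.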


Codepage layout

The following table is the original 8-bit coded character set of JIS X 0201 (with the kana set indicated by bytes with the high bit set).


As part of Shift JIS

The following is the mapping used for JIS X 0201 as part of Shift JIS, i.e. showing the 8-bit form of JIS X 0201, and mapping the katakana characters to the Halfwidth and Fullwidth Forms block (which in turn derives its half-width kana layout from JIS X 0201).
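As a rough illustration of how the single-byte JIS X 0201 range coexists with double-byte characters in Shift JIS, the sketch below splits a byte string into one- and two-byte units and maps kana bytes into the Halfwidth and Fullwidth Forms block. The function names are hypothetical, and the lead-byte ranges (0x81 to 0x9F and 0xE0 to 0xFC) are an assumption based on common Shift JIS implementations rather than on this article.

```python
def split_shift_jis(data: bytes) -> list[bytes]:
    """Split Shift JIS data into single-byte and double-byte units."""
    units = []
    i = 0
    while i < len(data):
        b = data[i]
        # Assumed lead-byte ranges; everything else is a single byte
        # (Roman set below 0x80, Kana set in 0xA1-0xDF).
        if 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC:
            if i + 1 >= len(data):
                raise ValueError("truncated double-byte character")
            units.append(data[i:i + 2])
            i += 2
        else:
            units.append(data[i:i + 1])
            i += 1
    return units


def kana_byte_to_unicode(byte: int) -> str:
    """Map a JIS X 0201 kana byte (0xA1-0xDF) to its halfwidth form."""
    if not 0xA1 <= byte <= 0xDF:
        raise ValueError(f"byte 0x{byte:02X} is not a JIS X 0201 kana byte")
    return chr(0xFF61 + (byte - 0xA1))


# Cross-check the arithmetic mapping against Python's built-in codec.
for b in range(0xA1, 0xE0):
    assert bytes([b]).decode("shift_jis") == kana_byte_to_unicode(b)
```

Because the kana bytes never overlap the assumed lead-byte ranges, a decoder can classify each byte without any shift state, which is the property that made the shift out and shift in characters unnecessary.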


Alternative mapping of katakana

The basic ISO-2022-JP profile does not permit the Kana set of JIS X 0201, only the Roman set and JIS X 0208 (although ISO 2022 / JIS X 0202 itself permits it). Accordingly, when converting JIS X 0201 katakana (or Unicode half-width kana, which use the same layout) to ISO-2022-JP, the following mapping or transformation is often used. This allows the kana to be converted to JIS X 0208. In theory, this mapping is equally correct, as JIS X 0201 itself does not specify display width, although in practice (and especially in duospaced environments) JIS X 0201 is used for half-width katakana. For ease of comparison with the chart above, the mapping is shown below over the JIS X 0201 katakana encoding and with the high bit set.
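One way to approximate such a transformation on Unicode text is sketched below. It does not use the mapping table from this article; instead it substitutes Unicode NFKC normalization, applied only to runs of halfwidth kana, which replaces each halfwidth form with its regular-width counterpart and combines a kana with a following voiced sound mark. The function name is hypothetical.

```python
import re
import unicodedata

# Runs of Unicode halfwidth kana and punctuation (U+FF61-U+FF9F).
_HALFWIDTH_RUN = re.compile("[\uff61-\uff9f]+")


def halfwidth_to_fullwidth(text: str) -> str:
    """Replace halfwidth kana with regular-width kana, leaving other text alone.

    NFKC is applied per run so that a kana followed by a halfwidth voiced or
    semi-voiced sound mark is composed into a single character.
    """
    return _HALFWIDTH_RUN.sub(
        lambda match: unicodedata.normalize("NFKC", match.group()), text
    )
```

Restricting normalization to the halfwidth kana range is a deliberate choice here: applying NFKC to the whole string would also rewrite unrelated characters such as fullwidth Latin letters. The result contains only characters that are expected to be representable in JIS X 0208, so it should be encodable with an ISO-2022-JP codec.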


Variants and extensions


Shift JIS


IBM's implementations

Code page 897 is IBM's implementation of the 8-bit form of JIS X 0201. It includes several additional graphical characters in the C0 control characters area, and the code points in question may be used as control characters or graphical characters depending on the context, similarly in concept to OEM-US, but with different graphical characters. The C0 rows are shown below.

IBM also designates pure 8-bit JIS X 0201 without these control code replacements as Code page 1139. Another variant, including a smaller subset of these C0 replacement graphics (including only the box drawing characters in and the line/arrow characters in ), but using a different style of up-arrow () at , is designated Code page 1086.

IBM also implements the 7-bit Roman set of JIS X 0201 as Code page 895 and the 7-bit Kana set as Code page 896 for use as ISO 2022 or EUC-JP code-sets. Code page 896, in addition to standard JIS X 0201 assignments, defines five additional assignments, shown below. Although use of these extended characters is not permitted by the associated CCSID 896, they are permitted by the alternative CCSID 4992.

IBM's Code page 1041 is an extended version of Code page 897, encoding these five IBM extended characters in alternative locations which are compatible with Shift JIS (respectively ). Code page 911, another extended 8-bit JIS X 0201 implementation (which uses the same C0 replacement graphics as Code page 1086), encodes the pound (sterling) sign (£) at , similarly to Code page 896 with the eighth bit set, but differs by encoding the cent sign (¢) at and the not-sign (¬) at .

IBM's Code page 903 is encoded for use as the single byte component of certain simplified Chinese character encodings, accompanying the ASCII-based Code page 904 used with traditional Chinese encodings. Despite this, Code page 903 follows ISO 646-JP / the Roman half of JIS X 0201, in that it replaces the ASCII backslash (rather than the ASCII dollar sign as in GB 1988 / ISO 646-CN) with the yen/yuan sign. It also uses the same C0 replacement graphics as code page 897. Code page 1042 extends code page 903 with the pound (sterling) sign at , and the not-sign, backslash and tilde at their Code page 1041 locations.


Others

NEC PC-8001 (1979) character set as rendered in the 8×8 pixel font.
NEC variant used on the PC98 series.
Hitachi variant used on the HD44780.


External links


Diagram of JIS X 0201 (as 7-bit code sets)