Half-width kana are katakana characters displayed compressed at half their normal width (a 1:2 aspect ratio), instead of the usual square (1:1) aspect ratio. For example, the usual (full-width) form of the katakana ''ka'' is カ, while the half-width form is ｶ. Half-width hiragana is not included in Unicode, although it is usable on the Web or in e-books via the CSS declaration font-feature-settings: "hwid" 1 with Adobe-Japan1-6 based OpenType fonts. Half-width kanji is not usable on modern computers, but is used in some receipt printers, electric bulletin boards and old computers.

Half-width kana were used in the early days of Japanese computing, to allow Japanese characters to be displayed on the same grid as monospaced fonts of Latin characters. Half-width kanji were not used. Half-width kana characters are not generally used today, but find some use in specific settings, such as cash register displays, shop receipts, Japanese digital television and DVD subtitles, and mailing address labels. Their usage is sometimes also a stylistic choice, particularly frequent in certain Internet slang.

The term "half-width kana" strictly refers only to how kana are ''displayed'', not how they are ''stored'', but it is also used loosely to refer to the A0–DF (hexadecimal) block where katakana are stored in some character encodings, such as JIS X 0201 (1969); see Encoding, below. This is formally incorrect, however: this JIS standard simply specifies that katakana can be stored in these locations, without specifying how they should be displayed. The confusion arises because in early computing the characters stored here were in fact displayed as half-width kana; see Confusion, below.
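As a rough illustration of the distinction between the two forms of ''ka'', the sketch below looks up the East Asian Width property of each character with Python's standard unicodedata module. The helper name describe is an assumption made for this example only.

```python
import unicodedata

def describe(ch: str) -> str:
    """Return code point, East Asian Width class and Unicode name of one character."""
    width = unicodedata.east_asian_width(ch)  # 'W' = wide, 'H' = halfwidth
    return f"U+{ord(ch):04X} {width} {unicodedata.name(ch)}"

print(describe("\u30AB"))  # full-width katakana ka
print(describe("\uFF76"))  # half-width katakana ka
```

The width class is only a property recorded by Unicode; how wide a glyph is actually drawn still depends on the font and the renderer.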


History

Half-width kana and 2/3-width kana were in use from the pre-computer era. In the early computer era, ASCII was defined as a 7-bit character set with room for 128 characters. However, since this standard was designed for the United States, it does not contain symbols such as the yen (¥) sign needed to represent Japanese currency, nor does it include space for characters from other writing systems, such as kana or kanji; thus Japanese characters could not be ''encoded''. Further, Japanese characters, both kana and kanji, are drawn on a square grid, while Latin characters are generally written more narrowly; thus Japanese characters could not be ''displayed'' either.

JIS X 0201 was developed in 1969, a time when computers were generally incapable, both by software design and hardware resources, of representing the thousands of Chinese-based kanji characters used in Japanese. As a compromise, this standard encoded katakana only (not hiragana or kanji) as a small set of characters, assigned to byte values 0xA1–0xDF in the upper half (0x80–0xFF) of the 8-bit range. This allowed 8-bit processors to encode and process Japanese text phonetically (as katakana), though without being able to process hiragana or kanji. These katakana characters were in turn ''displayed'' as "half-width kana", a new, unorthodox, narrower form made to fit the same width as the monospaced Latin letters that machines were capable of printing and displaying. Encoding-wise, JIS X 0201 is a variant extension of ASCII: it includes additional characters, and does not exactly agree with ASCII on the overlapping part (the Latin character section).

Half-width kana were developed as "... the first Japanese characters encoded on computers because they are used for Japanese telegrams." The Zengin System, the largest funds transfer system in Japan, was established in 1973. Transaction messages between banks could only use Latin letters, numbers, and half-width katakana, within 20 characters. The system was superseded in 2018 by ZEDI (The Nationwide Banking Electronic Data Interchange System), which can handle hiragana and kanji with variable-length characters.

To make katakana fit into the narrower cell area allowed, some compromises were made. For example, the diacritical marks ''dakuten'' and ''handakuten'' are treated as separate characters instead of being part of the preceding character. This compromise led many to consider "half-width kana" visually unattractive, and it causes problems for many computer programs today.

Another use of half-width kana is to save space. The Japanese version of Windows 95 used the half-width katakana of MS P Gothic in its user interface. It was later replaced by the full-width kana of MS UI Gothic, which is a little narrower than MS P Gothic.
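The separate treatment of the ''dakuten'' can be observed directly. The following is a minimal sketch, assuming Python's built-in shift_jis codec and NFKC normalization as convenient stand-ins for the behaviour described above; the variable names are illustrative.

```python
import unicodedata

half = "\uFF76\uFF9E"  # half-width ka followed by a separate half-width dakuten
full = "\u30AC"        # full-width ga, a single precomposed character

# In Shift JIS the half-width pair is stored as two single-byte JIS X 0201 codes.
half_bytes = half.encode("shift_jis")
full_bytes = full.encode("shift_jis")

# NFKC normalization folds the half-width pair into the one full-width character.
folded = unicodedata.normalize("NFKC", half)

print(len(half), half_bytes.hex())
print(len(full), full_bytes.hex())
print(folded == full)
```

Because the half-width spelling is two characters long, naive length checks, truncation and string comparison treat it differently from the full-width spelling of the same syllable, which is one source of the program problems mentioned above.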


Encoding

In the JIS X 0201 specification (1969), katakana are encoded in the A0–DF (hexadecimal) block. How they are displayed is not specified, and there is no separate encoding of full-width and half-width kana. In JIS X 0208, katakana, hiragana, and kanji are all encoded (and displayed as full-width characters; there are no half-width characters), though the ordering of the kana is different; see the "Hiragana and katakana" section of JIS X 0208.

In Shift JIS, which combines JIS X 0201 and JIS X 0208, these encodings (both of which can encode Latin characters and katakana) are stored separately, with JIS X 0201 characters all being displayed as half-width (thus the JIS X 0201 katakana are displayed as half-width kana), while JIS X 0208 characters are all displayed as full-width (thus the JIS X 0208 Latin characters are all displayed as full-width Latin characters). Thus in Shift JIS, Latin characters and katakana have two encodings with two separate display forms, half-width and full-width.

In Unicode, katakana and hiragana are primarily used as normal, full-width characters (the Katakana and Hiragana blocks are displayed as full-width characters); a separate block, the Halfwidth and Fullwidth Forms block, is used to store variant characters, including half-width kana and full-width Latin characters. Thus, the katakana in JIS X 0201 and the corresponding part of derived encodings (the JIS X 0201 part of Shift JIS) are displayed as half-width, while in Unicode half-width forms are specified separately.
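The double encoding in Shift JIS and the separate Unicode block can be checked with a short sketch. It assumes Python's built-in shift_jis codec; the sample labels and helper names are illustrative, and the block range U+FF00–U+FFEF is that of Halfwidth and Fullwidth Forms.

```python
samples = {
    "half-width katakana ka": "\uFF76",
    "full-width katakana ka": "\u30AB",
    "half-width (ASCII) Latin A": "A",
    "full-width Latin A": "\uFF21",
}

def shift_jis_lengths(chars: dict) -> dict:
    """Bytes used per character: 1 for the JIS X 0201 part, 2 for JIS X 0208."""
    return {label: len(ch.encode("shift_jis")) for label, ch in chars.items()}

def in_forms_block(ch: str) -> bool:
    """True if the character lies in the Halfwidth and Fullwidth Forms block."""
    return 0xFF00 <= ord(ch) <= 0xFFEF

for label, ch in samples.items():
    print(label, shift_jis_lengths(samples)[label], in_forms_block(ch))
```

In this sketch the variant forms (half-width kana, full-width Latin) fall inside the Forms block, while the ordinary forms live in the Katakana block and in ASCII respectively.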


Half-width table

"J" indicates the first four bits (the high nibble) in JIS X 0201 (though see below, these do not ''necessarily'' indicate half-width) and in other sets such as Shift JIS; "U" indicates the row in Unicode in the Halfwidth and Fullwidth Forms block. Note that the blank first cell represents a non-existent character in JIS, A0, but a fullwidth right white parenthesis ｠ in Unicode, U+FF60.
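The layout described above follows from a fixed offset between the two code ranges, so the table can be regenerated programmatically. The sketch below is one way to do so; the function names and the text layout of the rows are assumptions made for illustration.

```python
def jis_to_unicode(byte: int) -> int:
    """Map a JIS X 0201 katakana byte (0xA1-0xDF) to its halfwidth code point."""
    if not 0xA1 <= byte <= 0xDF:
        raise ValueError(f"0x{byte:02X} is outside the JIS X 0201 katakana range")
    return 0xFF60 + (byte - 0xA0)

def halfwidth_table() -> list:
    """Build one text row per high nibble (A-D), labelled with J and U columns."""
    rows = ["J   U      " + " ".join(f"{low:X}" for low in range(16))]
    for high in range(0xA, 0xE):
        cells = []
        for low in range(16):
            byte = (high << 4) | low
            # 0xA0 has no JIS X 0201 character, so that cell is left blank.
            cells.append(" " if byte == 0xA0 else chr(jis_to_unicode(byte)))
        unicode_row = (0xFF60 + ((high - 0xA) << 4)) >> 4
        rows.append(f"{high:X}x  U+{unicode_row:03X}x " + " ".join(cells))
    return rows

print("\n".join(halfwidth_table()))
```

Each data row pairs a JIS high nibble (Ax to Dx) with the matching Unicode row (U+FF6x to U+FF9x).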


Half-width kana on the Internet


E-mail

Since the SMTP and NNTP protocols (used to deliver e-mail and Usenet, respectively) were formerly only able to transmit 7-bit bytes, it was then the convention to use ISO-2022-JP for sending e-mail in Japanese. Half-width kana is not contained in ISO-2022-JP: it includes the Roman set of JIS X 0201, and all of JIS X 0208, but not the katakana set of JIS X 0201 (which is used for half-width kana in Shift JIS, for instance). Both sets of JIS X 0201 have ISO 2022 escape codes, but the ISO-2022-JP profile only includes the Roman set: this means that the way to include half-width katakana is well-defined by ISO 2022, yet doing so violates the ISO-2022-JP format. For this reason, if half-width kana were accidentally included in a message, it could become garbled during transmission (see mojibake).

The WHATWG encoding standard used by HTML5 permits decoding, but not encoding, of JIS X 0201 katakana in ISO-2022-JP as an extension to the format, and converts half-width katakana to their JIS X 0208 equivalents upon encoding. This is no longer such a problem, since most e-mail servers today support the 8BITMIME extension and hence understand 8-bit characters. Alternatively, an encoding system such as Base64 can be used and specified in the message using MIME.
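The restriction can be reproduced with Python's built-in iso2022_jp codec, which in this sketch is assumed to reject half-width katakana. NFKC normalization is used here as a stand-in for converting to JIS X 0208 equivalents; it is not the WHATWG algorithm, and it also folds other compatibility characters (for example full-width Latin letters). The helper names are illustrative.

```python
import unicodedata

def can_encode(text: str, codec: str) -> bool:
    """Report whether a Python codec accepts the text without error."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

def to_iso2022jp(text: str) -> bytes:
    """Fold half-width kana to full-width via NFKC, then encode for 7-bit transport."""
    return unicodedata.normalize("NFKC", text).encode("iso2022_jp")

message = "\uFF76\uFF80\uFF76\uFF85"  # "katakana" written in half-width kana
print(can_encode(message, "iso2022_jp"))
wire = to_iso2022jp(message)
print(wire, all(b < 0x80 for b in wire))
```

After folding, every byte of the encoded message fits in 7 bits, which is the property the older mail transports required.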


Web pages

The problem that exists in e-mail does not exist with Web pages, since HTTP accepts 8-bit characters. However, one problem that does exist is that computer programs have difficulty determining whether a byte sequence should be interpreted as Shift JIS, EUC-JP, or UTF-8; hence the character encoding should be specified with an HTTP response header or a meta tag.
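The ambiguity is easy to demonstrate: the same two bytes are valid in more than one encoding. The sketch below assumes Python's built-in shift_jis and euc_jp codecs; the byte values and the header string are chosen purely for illustration.

```python
raw = b"\xb6\xc0"  # two bytes with no charset label attached

as_shift_jis = raw.decode("shift_jis")  # two single-byte half-width kana
as_euc_jp = raw.decode("euc_jp")        # one double-byte JIS X 0208 character

print(len(as_shift_jis), len(as_euc_jp), as_shift_jis == as_euc_jp)

# A hypothetical response header that removes the ambiguity:
header = "Content-Type: text/html; charset=Shift_JIS"
print(header)
```

Both decodings succeed without error, so a program cannot tell from the bytes alone which reading was intended.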


Confusion

Strictly speaking, describing the JIS X 0201 katakana as "half-width katakana" is incorrect, as the standard does not define character widths; it defines only the code representation of katakana characters. In the JIS X 0201 standard document itself, katakana characters are printed in normal (full) width, not half-width. Half-width characters were only used for display during the period when characters were displayed at half-width (and single-byte encodings were used), before full-width character displays (and associated double-byte encodings such as JIS X 0208) became widespread.

However, the Shift JIS standard combines the JIS X 0201 standard (whose characters, Latin and katakana, were displayed as half-width) and the JIS X 0208 standard (whose characters, katakana, hiragana, kanji, and Latin, were displayed as full-width). In Shift JIS, katakana and Latin characters are therefore encoded twice, in both JIS X 0201 and JIS X 0208, and are displayed as half-width or full-width according to which section they are in (0201 or 0208). Thus the 0201 katakana block can be thought of as corresponding to "half-width kana", and the misunderstanding that the 0201 standard defines "half-width" characters is widespread.

Further, though JIS X 0201 is a single-byte encoding (displayed at half-width) and JIS X 0208 is a double-byte encoding (displayed at full-width), there is no inherent connection between the number of bytes and the width, other than the correspondence in Shift JIS described above. For example, Unicode text can be encoded with four bytes per character (UTF-32) whether the characters are full-width or half-width.
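The lack of any fixed relation between byte count and display width can be seen by encoding the two forms of ''ka'' in several encodings. This is a small sketch using Python's built-in codecs; the helper name byte_lengths and the choice of codecs are assumptions for the example.

```python
import unicodedata

def byte_lengths(ch: str) -> dict:
    """Number of bytes one character occupies in several encodings."""
    codecs = ("shift_jis", "euc_jp", "utf-8", "utf-32-be")
    return {name: len(ch.encode(name)) for name in codecs}

for ch in ("\uFF76", "\u30AB"):  # half-width ka, full-width ka
    print(unicodedata.east_asian_width(ch), byte_lengths(ch))
```

Only in Shift JIS do the two forms differ in length; in the other encodings tried here the half-width and full-width characters occupy the same number of bytes.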


See also

* Halfwidth and fullwidth forms


References

* Lunde, Ken. ''CJKV Information Processing''. O'Reilly, 2nd ed., 2009, pp. 224–226 (also 1st ed., 1999, pp. 144–145).