In CJK (Chinese, Japanese, and Korean) computing, graphic characters are traditionally classed into fullwidth and halfwidth characters. Whereas in an ordinary monospaced font every character occupies the same width, here a halfwidth character occupies half the width of a fullwidth character, hence the name. "Halfwidth and Fullwidth Forms" is also the name of a Unicode block, U+FF00–FFEF, provided so that older encodings containing both halfwidth and fullwidth characters can have lossless translation to and from Unicode.
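As a minimal sketch of what "lossless translation" means in practice, the snippet below assumes Python 3 and its built-in shift_jis codec, used here only as one example of a legacy encoding that contains both widths. It checks that the single-byte and double-byte forms of katakana "ka" decode to distinct code points and re-encode to the original bytes, and then shows NFKC normalization folding the halfwidth compatibility form into the ordinary letter. The names LEGACY, decoded and folded are illustrative only.

```python
import unicodedata

# Katakana "ka" as it appears in a legacy Shift_JIS byte stream:
# 0xB6 is the single-byte JIS X 0201 (halfwidth) form,
# 0x83 0x4A is the double-byte JIS X 0208 form.
LEGACY = {"JIS X 0201": b"\xb6", "JIS X 0208": b"\x83\x4a"}

decoded = {source: raw.decode("shift_jis") for source, raw in LEGACY.items()}

for source, text in decoded.items():
    # Each legacy form maps to its own code point ...
    print(f"{source}: U+{ord(text):04X} {unicodedata.name(text)}")
    # ... so converting back reproduces the original bytes exactly.
    assert text.encode("shift_jis") == LEGACY[source]

# The halfwidth form carries a <narrow> compatibility decomposition,
# so NFKC normalization folds it into the ordinary katakana letter.
folded = unicodedata.normalize("NFKC", decoded["JIS X 0201"])
print(folded == decoded["JIS X 0208"])  # True
```

Folding is one-way: once text has been normalized with NFKC, it no longer records which width the legacy data used, which is why the separate code points are needed for round trips.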


Rationale

In the days of text mode computing, Western characters were normally laid out in a grid on the screen, often 80 columns by 24 or 25 lines. Each character was displayed as a small dot matrix, often about 8 pixels wide, and an SBCS (single-byte character set) was generally used to encode characters of Western languages.

For aesthetic reasons and readability, it is preferable for Chinese characters to be approximately square, and therefore twice as wide as these fixed-width SBCS characters. As these were typically encoded in a DBCS (double-byte character set), this also meant that their width on screen in a duospaced font was proportional to their byte length. Some terminals and editing programs could not handle double-byte characters starting at odd columns, only at even ones, and some could not even put double-byte and single-byte characters on the same line. For this reason, DBCS sets generally also included Roman letters and digits, for use alongside CJK characters on the same line.

On the other hand, early Japanese computing used a single-byte code page called JIS X 0201 for katakana. These characters were rendered at the same width as the other single-byte characters, making them half-width kana rather than normally proportioned kana. Although the JIS X 0201 standard itself did not specify half-width display for katakana, this became the visually distinguishing feature in Shift JIS between the single-byte JIS X 0201 katakana and the double-byte JIS X 0208 katakana. Some IBM code pages used a similar treatment for Korean jamo, based on the N-byte Hangul code and its EBCDIC translation.
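The correspondence between byte length and on-screen width can be illustrated with a small sketch. It assumes Python 3's built-in shift_jis codec and an idealized text-mode screen in which each encoded byte occupies exactly one character cell; sjis_columns is a hypothetical helper, not a standard function.

```python
# Assumption: an idealized text-mode display where each Shift_JIS byte
# occupies one character cell, so width in cells equals encoded length.
SAMPLES = [
    ("A", "ASCII letter"),
    ("\uff76", "JIS X 0201 katakana ka (halfwidth)"),
    ("\u30ab", "JIS X 0208 katakana ka"),
    ("\uff21", "JIS X 0208 Latin A (fullwidth)"),
    ("\u6f22", "JIS X 0208 kanji"),
]


def sjis_columns(text: str) -> int:
    """Cells occupied by text, assuming one cell per Shift_JIS byte."""
    return len(text.encode("shift_jis"))


for ch, label in SAMPLES:
    raw = ch.encode("shift_jis")
    print(f"{ch}  {raw.hex(' '):5}  {sjis_columns(ch)} cell(s)  {label}")

# A mixed line: 1 + 1 + 2 + 2 cells.
print(sjis_columns("A\uff76\u30ab\u6f22"))  # 6
```

Under this model the halfwidth katakana form takes one cell only because it is a single byte, which is the effect described above for JIS X 0201.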


In Unicode

For compatibility with existing character sets that contained both halfwidth and fullwidth versions of the same character, Unicode allocated a single block at U+FF00–FFEF containing the necessary "alternative width" characters. This includes fullwidth versions of all printable ASCII characters except the space (whose fullwidth counterpart, U+3000 IDEOGRAPHIC SPACE, lies outside this block), some non-ASCII punctuation and symbols such as the Yen sign, halfwidth versions of katakana and hangul, and halfwidth versions of some other symbols such as circles. Only the characters needed for a lossless round trip to existing character sets were allocated, rather than (for instance) a fullwidth version of every accented Latin letter.

Unicode assigns every code point an "East Asian Width" property. This may be:

* W (Wide): characters that are always wide, such as CJK ideographs and kana
* F (Fullwidth): the fullwidth compatibility forms, such as U+FF21 FULLWIDTH LATIN CAPITAL LETTER A
* Na (Narrow): characters that are always narrow, such as the printable ASCII characters
* H (Halfwidth): the halfwidth compatibility forms, such as U+FF76 HALFWIDTH KATAKANA LETTER KA
* A (Ambiguous): characters that are wide in some legacy East Asian encodings and narrow elsewhere, such as Greek and Cyrillic letters
* N (Neutral): all other characters, which do not occur in legacy East Asian character sets
Terminal emulators can use this property to decide whether a character should occupy one or two columns when computing tab stops and cursor positions.
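A minimal sketch of such a column calculation, assuming Python's standard unicodedata.east_asian_width: W and F characters count as two columns, everything else as one, and a tab advances to the next multiple of tabsize (8 here). The function name columns and its ambiguous_wide and tabsize parameters are hypothetical; ambiguous (A) characters are treated as narrow unless ambiguous_wide is set. Real terminal emulators also special-case combining marks, control characters and emoji, which this sketch ignores.

```python
import unicodedata


def columns(text: str, ambiguous_wide: bool = False, tabsize: int = 8) -> int:
    """Approximate column reached after printing text from column 0.

    Sketch only: combining marks, control characters other than tab,
    and emoji sequences are not handled.
    """
    col = 0
    for ch in text:
        if ch == "\t":
            # Advance to the next tab stop.
            col += tabsize - col % tabsize
            continue
        eaw = unicodedata.east_asian_width(ch)
        if eaw in ("W", "F") or (ambiguous_wide and eaw == "A"):
            col += 2  # wide or fullwidth: two cells
        else:
            col += 1  # Na, H, N (and A by default): one cell
    return col


for ch in ["A", "\uff21", "\uff76", "\u30ab", "\u03b1"]:
    print(f"U+{ord(ch):04X} {unicodedata.east_asian_width(ch):2} {unicodedata.name(ch)}")

print(columns("\uff21\u6f22\uff76A"))          # 6
print(columns("\u6f22\tx"))                    # 9
print(columns("\u03b1", ambiguous_wide=True))  # 2
```

Making the treatment of ambiguous characters a parameter mirrors the fact that their width depends on context: the same Greek letter may take one cell in a Western setting and two in an East Asian one.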


In OpenType

OpenType has the fwid (Full Widths), hwid (Half Widths), halt (Alternate Half Widths) and vhal (Alternate Vertical Half Metrics) feature tags, which can be used to reproduce the fullwidth or halfwidth form of a character. CSS provides control over these features through the font-variant-east-asian property (whose full-width value enables fwid) and the general-purpose font-feature-settings property.


See also

* East Asian punctuation
* Em size – full width forms
* Enclosed Alphanumerics – bullet point sequences; some appear as fullwidth (e.g. ⒈, ⓵, ⑴, ⒜, ⓐ)
* Han unification
* Hangul Jamo (Unicode block)
* Katakana (Unicode block)
* Latin script in Unicode


External links


* East Asian Width, Unicode Standard Annex #11