In
CJK (Chinese, Japanese and Korean) computing,
graphic character
In ISO/IEC 646 (commonly known as ASCII) and related standards including ISO 8859 and Unicode, a graphic character is any character intended to be written, printed, or otherwise displayed in a form that can be read by humans. In other words, it i ...
s are traditionally classed into fullwidth (in
Taiwan
Taiwan, officially the Republic of China (ROC), is a country in East Asia, at the junction of the East and South China Seas in the northwestern Pacific Ocean, with the People's Republic of China (PRC) to the northwest, Japan to the nort ...
and
Hong Kong
Hong Kong ( (US) or (UK); , ), officially the Hong Kong Special Administrative Region of the People's Republic of China ( abbr. Hong Kong SAR or HKSAR), is a city and special administrative region of China on the eastern Pearl River Delt ...
:
全形; in CJK:
全角) and halfwidth (in
Taiwan
Taiwan, officially the Republic of China (ROC), is a country in East Asia, at the junction of the East and South China Seas in the northwestern Pacific Ocean, with the People's Republic of China (PRC) to the northwest, Japan to the nort ...
and
Hong Kong
Hong Kong ( (US) or (UK); , ), officially the Hong Kong Special Administrative Region of the People's Republic of China ( abbr. Hong Kong SAR or HKSAR), is a city and special administrative region of China on the eastern Pearl River Delt ...
:
半形; in CJK:
半角) characters. Unlike
monospaced fonts
This is a list of typefaces, which are separated into groups by distinct artistic differences. The list includes typefaces that have articles or that are referenced. Superfamilies that fall under more than one category have an asterisk (*) after t ...
, a halfwidth character occupies half the width of a fullwidth character, hence the name.
''
Halfwidth and Fullwidth Forms
In CJK (Chinese, Japanese and Korean) computing, graphic characters are traditionally classed into fullwidth (in Taiwan and Hong Kong: 全形; in CJK: 全角) and halfwidth (in Taiwan and Hong Kong: 半形; in CJK: 半角) characters. Unlik ...
'' is also the name of a
Unicode block
A Unicode block is one of several contiguous ranges of numeric character codes (code points) of the Unicode character set that are defined by the Unicode Consortium for administrative and documentation purposes. Typically, proposals such as the ad ...
U+FF00–FFEF, provided so that older encodings containing both halfwidth and fullwidth characters can have lossless translation to/from Unicode.
Rationale
In the days of
text mode
Text mode is a computer display mode in which content is internally represented on a computer screen in terms of characters rather than individual pixels. Typically, the screen consists of a uniform rectangular grid of ''character cells'', each ...
computing, Western characters were normally laid out in a grid on the screen, often 80 columns by 24 or 25 lines. Each character was displayed as a small
dot matrix
A dot matrix is a 2-dimensional patterned array, used to represent characters, symbols and images. Most types of modern technology use dot matrices for display of information, including mobile phones, televisions, and printers. The system is al ...
, often about 8
pixel
In digital imaging, a pixel (abbreviated px), pel, or picture element is the smallest addressable element in a raster image, or the smallest point in an all points addressable display device.
In most digital display devices, pixels are the smal ...
s wide, and a
SBCS
SBCS, or Single Byte Character Set, is used to refer to character encodings that use exactly one byte for each graphic character. An SBCS can accommodate a maximum of 256 symbols, and is useful for scripts that do not have many symbols or accented ...
(single-byte character set) was generally used to encode characters of Western languages.
For aesthetic reasons and readability, it is preferable for
Han character
Chinese characters () are logograms developed for the writing of Chinese. In addition, they have been adapted to write other East Asian languages, and remain a key component of the Japanese writing system where they are known as ''kanji' ...
s to be approximately square-shaped, thusly twice as wide as these fixed-width SBCS characters. As these were typically encoded in a
DBCS
A double-byte character set (DBCS) is a character encoding in which either all characters (including control characters) are encoded in two bytes, or merely every graphic character not representable by an accompanying single-byte character set ( ...
(double-byte character set) this also meant that their width on screen in a
duospaced font
A duospaced font (also called a duospace font) is a fixed-width font whose letters and characters occupy either of two integer multiples of a specified, fixed horizontal space. Traditionally, this means either a single or double character width, al ...
was proportional to their byte length. Some terminals and editing programs could not deal with double-byte characters starting at odd columns, only even ones (some could not even put double-byte and single-byte characters in the same line). So the DBCS sets generally included Roman characters and digits also, for use alongside the CJK characters in the same line.
On the other hand, early Japanese computing used a single-byte code page called
JIS X 0201
JIS X 0201, a Japanese Industrial Standard developed in 1969 (then called JIS C 6220 until the JIS category reform), was the first Japanese electronic character set to become widely used. It is either a 7-bit encoding or an 8-bit encoding, althou ...
for
katakana
is a Japanese syllabary, one component of the Japanese writing system along with hiragana, kanji and in some cases the Latin script (known as rōmaji). The word ''katakana'' means "fragmentary kana", as the katakana characters are derived fr ...
. These would be rendered at the same width as the other single-byte characters, making them
half-width kana are katakana characters displayed compressed at half their normal width (a 1:2 aspect ratio), instead of the usual square (1:1) aspect ratio. For example, the usual (full-width) form of the katakana ''ka'' is カ while the half-width form is カ. ...
characters rather than normally proportioned kana. Although the JIS X 0201 standard itself did not specify half-width display for katakana, this became the visually distinguishing feature in
Shift JIS
Shift JIS (Shift Japanese Industrial Standards, also SJIS, MIME name Shift_JIS, known as PCK in Solaris contexts) is a character encoding for the Japanese language, originally developed by a Japanese company called ASCII Corporation in conjunctio ...
between the single-byte JIS X 0201 and double-byte
JIS X 0208
JIS X 0208 is a 2-byte character set specified as a Japanese Industrial Standard, containing 6879 graphic characters suitable for writing text, place names, personal names, and so forth in the Japanese language. The official title of the current ...
katakana. Some IBM code pages used a similar treatment for
Korean jamo,
based on the
N-byte Hangul code and its
EBCDIC
Extended Binary Coded Decimal Interchange Code (EBCDIC; ) is an eight-bit character encoding used mainly on IBM mainframe and IBM midrange computer operating systems. It descended from the code used with punched cards and the corresponding six- ...
translation.
In Unicode
For compatibility with existing character sets that contained both half- and fullwidth versions of the same character,
Unicode
Unicode, formally The Unicode Standard,The formal version reference is is an information technology Technical standard, standard for the consistent character encoding, encoding, representation, and handling of Character (computing), text expre ...
allocated a single block at U+FF00–FFEF containing the necessary "alternative width" characters. This includes a fullwidth version of all the
ASCII
ASCII ( ), abbreviated from American Standard Code for Information Interchange, is a character encoding standard for electronic communication. ASCII codes represent text in computers, telecommunications equipment, and other devices. Because of ...
characters and some non-ASCII punctuation such as the Yen sign, halfwidth versions of katakana and
hangul
The Korean alphabet, known as Hangul, . Hangul may also be written as following South Korea's standard Romanization. ( ) in South Korea and Chosŏn'gŭl in North Korea, is the modern official writing system for the Korean language. The let ...
, and halfwidth versions of some other symbols such as circles. Only characters needed for lossless round trip to existing character sets were allocated, rather than (for instance) making a fullwidth version of every Latin accented character.
Unicode assigns ''every'' code point an "East Asian width"
property
Property is a system of rights that gives people legal control of valuable things, and also refers to the valuable things themselves. Depending on the nature of the property, an owner of property may have the right to consume, alter, share, r ...
. This may be:
Terminal emulator
A terminal emulator, or terminal application, is a computer program that emulates a video terminal within some other display architecture. Though typically synonymous with a shell or text terminal, the term ''terminal'' covers all remote termin ...
s can use this property to decide whether a character should consume one or two "columns" when figuring out tabs and cursor position.
In OpenType
OpenType
OpenType is a format for scalable computer fonts. It was built on its predecessor TrueType, retaining TrueType's basic structure and adding many intricate data structures for prescribing typographic behavior. OpenType is a registered trademark o ...
has the "fwid", "halt", "hwid" and "vhal" feature tags to be used for providing fullwidth or halfwidth form of a character.
See also
*
Han unification
Han unification is an effort by the authors of Unicode and the Universal Character Set to map multiple character sets of the Han characters of the so-called CJK languages into a single set of unified characters. Han characters are a feature s ...
*
East Asian punctuation
*
Em size
An em (from English '' em quadrat'') is a unit in the field of typography, equal to the currently specified point size. For example, one em in a 16-point typeface is 16 points. Therefore, this unit is the same for all typefaces at a given point ...
– full width forms
*
Hangul Jamo (Unicode block)
Hangul Jamo ( ko, 한글 자모, ) is a Unicode block containing positional (''choseong'', ''jungseong'', and ''jongseong'') forms of the Hangul consonant and vowel clusters. They can be used to dynamically compose syllables that are not availa ...
*
Katakana (Unicode block)
Katakana is a Unicode block containing katakana characters for the Japanese and Ainu languages.
Block
History
The following Unicode-related documents record the purpose and process of defining specific characters in the Katakana block:
See ...
*
Latin script in Unicode
Over a thousand characters from the Latin script are encoded in the Unicode Standard, grouped in several basic and extended Latin blocks. The extended ranges contain mainly precomposed letters plus diacritics that are equivalently encoded with co ...
*
Enclosed Alphanumerics
Enclosed Alphanumerics is a Unicode block of Typography, typographical symbols of an alphanumeric within a circle, a bracket or other not-closed enclosure, or ending in a full stop.
It is currently fully allocated. Within the Basic Multili ...
– bullet point sequences, some appear as full width (e.g. ⒈,⓵,⑴,⒜,ⓐ)
References
External links
East Asian WidthUnicode Standard Annex #11
{{Unicode navigation
Kana
*Halfwidth