Z-variant
In Unicode, two glyphs are said to be Z-variants (often spelled zVariants) if they share the same etymology but have slightly different appearances and different Unicode code points. For example, the Unicode characters U+8AAA 說 and U+8AAC 説 are Z-variants. The notion of Z-variance is only applicable to the "CJKV scripts" (Chinese, Japanese, Korean and Vietnamese) and is a subtopic of Han unification. Differences on the Z-axis: The Unicode philosophy of code point allocation for CJK languages is organized along three "axes." The X-axis represents differences in semantics; for example, the Latin capital A (U+0041 A) and the Greek capital alpha (U+0391 Α) are represented by two distinct code points in Unicode, and might be termed "X-variants" (though this term is not common). The Y-axis represents significant differences in appearance though not in semantics; for example, the traditional Chinese character māo "cat" (U+8C93 貓) and the simplified Chinese character (U+732B ...
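As a rough illustration of what "different code points" means for the Z-variant pair cited above, the following Python sketch prints the code point of each character and checks whether standard Unicode normalization folds them together. The helper names `describe` and `same_after_normalization` are hypothetical, introduced only for this example.

```python
import unicodedata

# The Z-variant pair cited in the text.
SHUO_A = "\u8aaa"  # 說
SHUO_B = "\u8aac"  # 説


def describe(ch: str) -> str:
    """Format a single character as 'U+XXXX <char>'."""
    return f"U+{ord(ch):04X} {ch}"


def same_after_normalization(a: str, b: str, form: str = "NFKC") -> bool:
    """Report whether two strings become identical under a normalization form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


print(describe(SHUO_A))
print(describe(SHUO_B))
# Normalization does not merge the two: they stay separate code points.
print(same_after_normalization(SHUO_A, SHUO_B))
```

Because normalization leaves both characters untouched, software that wants to treat such pairs as equivalent (for search, say) would need its own variant mapping.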

Unihan
Han unification is an effort by the authors of Unicode and the Universal Character Set to map multiple character sets of the Han characters of the so-called CJK languages into a single set of unified characters. Han characters are a feature shared in common by written Chinese (hanzi), Japanese (kanji), Korean (hanja) and Vietnamese (chữ Hán). Modern Chinese, Japanese and Korean typefaces typically use regional or historical variants of a given Han character. In the formulation of Unicode, an attempt was made to unify these variants by considering them different glyphs representing the same "grapheme", or orthographic unit, hence, "Han unification", with the resulting character repertoire sometimes contracted to Unihan. Nevertheless, many characters have regional variants assigned to different code points, such as Traditional (U+500B) versus Simplified (U+4E2A). Unihan can also refer to the Unihan Database maintained by the Unicode Consortium, which provides informatio ...


Unicode
Unicode, formally The Unicode Standard, is an information technology standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems. The standard, which is maintained by the Unicode Consortium, defines, as of version 15.0, 149,186 characters covering 161 modern and historic scripts, as well as symbols, emoji (including in colors), and non-visual control and formatting codes. Unicode's success at unifying character sets has led to its widespread and predominant use in the internationalization and localization of computer software. The standard has been implemented in many recent technologies, including modern operating systems, XML, and most modern programming languages. The Unicode character repertoire is synchronized with ISO/IEC 10646, each being code-for-code id ...

Glyph
A glyph is any kind of purposeful mark. In typography, a glyph is "the specific shape, design, or representation of a character". It is a particular graphical representation, in a particular typeface, of an element of written language. A grapheme, or part of a grapheme (such as a diacritic), or sometimes several graphemes in combination (a composed glyph) can be represented by a glyph. Glyphs, graphemes and characters: In most languages written in any variety of the Latin alphabet except English, the use of diacritics to signify a sound mutation is common. For example, the grapheme ⟨à⟩ requires two glyphs: the basic a and the grave accent `. In general, a diacritic is regarded as a glyph, even if it is contiguous with the rest of the character like a cedilla in French, Catalan or Portuguese, the ogonek in several languages, or the stroke on a Polish "Ł". Although these marks originally had no independent meaning, they have since acquired meaning in the field of mathematic ...

Code Point
In character encoding terminology, a code point, codepoint or code position is a numerical value that maps to a specific character. Code points usually represent a single grapheme (usually a letter, digit, punctuation mark, or whitespace) but sometimes represent symbols, control characters, or formatting. The set of all possible code points within a given encoding/character set makes up that encoding's codespace. For example, the character encoding scheme ASCII comprises 128 code points in the range 0 to 7F (hexadecimal), Extended ASCII comprises 256 code points in the range 0 to FF (hexadecimal), and Unicode comprises 1,114,112 code points in the range 0 to 10FFFF (hexadecimal). The Unicode code space is divided into seventeen planes (the basic multilingual plane, and 16 supplementary planes), each with 65,536 (= 2^16) code points. Thus the total size of the Unicode code space is 17 × 65,536 = 1,114,112. Definition: The notion of a code point is used for abstraction, to distinguish both: * the num ...
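The plane arithmetic described above can be checked with a short Python sketch. The constant and function names (`PLANE_SIZE`, `PLANE_COUNT`, `MAX_CODE_POINT`, `plane_of`) are hypothetical and chosen only for illustration.

```python
PLANE_SIZE = 2 ** 16          # code points per plane
PLANE_COUNT = 17              # the BMP plus 16 supplementary planes
MAX_CODE_POINT = PLANE_COUNT * PLANE_SIZE - 1


def plane_of(code_point: int) -> int:
    """Return the plane number (0-16) that contains a Unicode code point."""
    if not 0 <= code_point <= MAX_CODE_POINT:
        raise ValueError(f"not a Unicode code point: {code_point:#x}")
    # Each plane spans 2^16 values, so the plane is the value above the low 16 bits.
    return code_point >> 16


print(PLANE_COUNT * PLANE_SIZE)   # total size of the code space
print(hex(MAX_CODE_POINT))        # highest code point
print(plane_of(ord("A")))         # a Basic Multilingual Plane character
print(plane_of(0x1F600))          # a supplementary-plane code point
```

Shifting right by 16 bits is equivalent to integer division by 65,536, which is why the plane number falls out directly.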

CJKV
In internationalization, CJK characters is a collective term for the Chinese, Japanese, and Korean languages, all of which include Chinese characters and derivatives in their writing systems, sometimes paired with other scripts. Collectively, the CJK characters often include Hànzì in Chinese, Kanji and Kana in Japanese, Hanja and Hangul in Korean. Vietnamese can be included, making the abbreviation CJKV, as Vietnamese historically used Chinese characters, which were known as Chữ Hán and Chữ Nôm in Vietnamese (Hán-Nôm altogether). Character repertoire: Standard Mandarin Chinese and Standard Cantonese are written almost exclusively in Chinese characters. Over 3,000 characters are required for general literacy, with up to 40,000 characters for reasonably complete coverage. Japanese uses fewer characters; general literacy in Japanese can be expected with 2,136 characters. The use of Chinese characters in Korea is increasingly rare, a ...

Cartesian Coordinate System
A Cartesian coordinate system in a plane is a coordinate system that specifies each point uniquely by a pair of numerical coordinates, which are the signed distances to the point from two fixed perpendicular oriented lines, measured in the same unit of length. Each reference coordinate line is called a coordinate axis or just axis (plural axes) of the system, and the point where they meet is its origin, at ordered pair (0, 0). The coordinates can also be defined as the positions of the perpendicular projections of the point onto the two axes, expressed as signed distances from the origin. One can use the same principle to specify the position of any point in three-dimensional space by three Cartesian coordinates, its signed distances to three mutually perpendicular planes (or, equivalently, by its perpendicular projection onto three mutually perpendicular lines). In general, n Cartesian coordinates (an element of real n-space) specify the point in an ...

Big5
Big-5 or Big5 is a Chinese character encoding method used in Taiwan, Hong Kong, and Macau for traditional Chinese characters. The People's Republic of China (PRC), which uses simplified Chinese characters, uses the GB 18030 character set instead. Big5 gets its name from the consortium of five companies in Taiwan that developed it. Organization: The original Big5 character set is sorted first by usage frequency, second by stroke count, lastly by Kangxi radical. The original Big5 character set lacked many commonly used characters. To solve this problem, each vendor developed its own extension. The ETen extension became part of the current Big5 standard through popularity. The structure of Big5 does not conform to the ISO 2022 standard, but rather bears a certain similarity to the Shift JIS encoding. It is a double-byte character set (DBCS) with the following structure: (the prefix 0x signifying hexadecimal numbers). Standard assignments (excluding vendor or user-defined extensions) ...
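To make the "double-byte character set" description concrete, here is a minimal Python sketch using the `big5` codec that ships with CPython. It only checks byte counts and round-tripping; it does not reproduce the byte-range table that the text refers to, and the helper name `big5_bytes` is hypothetical.

```python
def big5_bytes(text: str) -> bytes:
    """Encode text with Python's built-in 'big5' codec."""
    return text.encode("big5")


hanzi = "\u4e2d"              # 中, a common traditional Chinese character
encoded = big5_bytes(hanzi)

print(len(encoded))                        # a Han character occupies two bytes
print(encoded.hex(" "))                    # the raw lead and trail bytes
print(encoded.decode("big5") == hanzi)     # decoding restores the character
print(len(big5_bytes("A")))                # ASCII text stays single-byte
```

The lead byte of a two-byte sequence has its high bit set, which is how a decoder tells it apart from a single-byte ASCII character.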

Chinese Character Code For Information Interchange
The Chinese Character Code for Information Interchange, or CCCII, is a character set developed by the Chinese Character Analysis Group in Taiwan. It was first published in 1980, and significantly expanded in 1982 and 1987. It is used mostly by library systems. It is one of the earliest established and most sophisticated encodings for traditional Chinese (predating the establishment of Big5 in 1984 and CNS 11643 in 1986). It is distinguished by its unique system for encoding simplified versions and other variants of its main set of hanzi characters. A variant of an earlier version of CCCII is used by the Library of Congress as part of MARC-8, under the name East Asian Character Code (EACC, ANSI/NISO Z39.64), where it comprises part of MARC 21's JACKPHY support. However, EACC contains fewer characters than the most recent versions of CCCII. Design (byte ranges): CCCII is designed as a 94^n set, as defined by ISO/IEC 2022. Each Chinese character is represented by a 3-byte code in ...

Lossless
Lossless compression is a class of data compression that allows the original data to be perfectly reconstructed from the compressed data with no loss of information. Lossless compression is possible because most real-world data exhibits statistical redundancy. By contrast, lossy compression permits reconstruction only of an approximation of the original data, though usually with greatly improved compression rates (and therefore reduced media sizes). By operation of the pigeonhole principle, no lossless compression algorithm can efficiently compress all possible data. For this reason, many different algorithms exist that are designed either with a specific type of input data in mind or with specific assumptions about what kinds of redundancy the uncompressed data are likely to contain. Therefore, compression ratios tend to be stronger on human- and machine-readable documents and code in comparison to entropic binary data (random bytes). Lossless data compression is used in many ...
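The two claims above (perfect reconstruction, and better ratios on redundant data than on random bytes) can be tried out with Python's standard `zlib` module. This is a small sketch under the assumption that DEFLATE via `zlib` is a fair stand-in for "a lossless compressor"; the helper name `compressed_size` is hypothetical.

```python
import os
import zlib


def compressed_size(data: bytes) -> int:
    """Compress data, verify it decompresses to the original, return the compressed length."""
    packed = zlib.compress(data)
    # Lossless: decompression must reproduce the input exactly.
    if zlib.decompress(packed) != data:
        raise AssertionError("round trip failed")
    return len(packed)


redundant = b"abc" * 1000          # highly repetitive input
random_bytes = os.urandom(3000)    # high-entropy input of the same length

print(len(redundant), compressed_size(redundant))
print(len(random_bytes), compressed_size(random_bytes))
```

On a typical run the repetitive input shrinks to a small fraction of its size, while the random input does not shrink, in line with the pigeonhole argument in the text.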

Internet Draft
An Internet Draft (I-D) is a document published by the Internet Engineering Task Force (IETF) containing preliminary technical specifications, results of networking-related research, or other technical information. Often, Internet Drafts are intended to be work-in-progress documents for work that is eventually to be published as a Request for Comments (RFC), potentially leading to an Internet Standard. It is considered inappropriate to rely on Internet Drafts for reference purposes. I-D citations should indicate the I-D is a work in progress. An Internet Draft is expected to adhere to the basic requirements imposed on any RFC. An Internet Draft is only valid for six months unless it is replaced by an updated version. An otherwise expired draft remains valid while it is under official review by the Internet Engineering Steering Group. The Internet Engineering Task Force (IETF) is a standards organization for the Internet and is responsible for the technical stand ...

Mandarin Chinese
Mandarin is a group of Chinese (Sinitic) dialects that are natively spoken across most of northern and southwestern China. The group includes the Beijing dialect, the basis of the phonology of Standard Chinese, the official language of China. Because Mandarin originated in North China and most Mandarin dialects are found in the north, the group is sometimes referred to as Northern Chinese. Many varieties of Mandarin, such as those of the Southwest (including Sichuanese) and the Lower Yangtze, are not mutually intelligible with the standard language (or are only partially intelligible). Nevertheless, Mandarin as a group is often placed first in lists of languages by number of native speakers (with nearly one billion). Mandarin is by far the largest of the seven or ten Chinese dialect groups; it is spoken by 70 percent of all Chinese speakers over a large geographical area that stretches from Yunnan in the southwest to Xinjiang in the northwest and Heilongjiang in ...