Unihan
Han unification is an effort by the authors of Unicode and the Universal Character Set to map multiple character sets of the Han characters of the so-called CJK languages into a single set of unified characters. Han characters are a feature shared in common by written Chinese (hanzi), Japanese (kanji), Korean (hanja) and Vietnamese (chữ Hán). Modern Chinese, Japanese and Korean typefaces typically use regional or historical variants of a given Han character. In the formulation of Unicode, an attempt was made to unify these variants by treating them as allographs, that is, different glyphs representing the same "grapheme" or orthographic unit (hence "Han unification"), with the resulting character repertoire sometimes contracted to Unihan. Nevertheless, many characters have regional variants assigned to different code points, such as Traditional 個 (U+500B) versus Simplified 个 (U+4E2A). Rationale and controversy: The Unicode Standard details the principles of Han unification. ...
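
A minimal Python sketch, assuming only the standard-library `unicodedata` module (whose Unicode version depends on the interpreter), showing that the Traditional and Simplified variants at U+500B and U+4E2A are separate code points and that Unicode normalization does not fold one into the other:

```python
import unicodedata

# The two regional variants, built directly from their code points.
traditional = chr(0x500B)
simplified = chr(0x4E2A)

for ch in (traditional, simplified):
    # Unified ideographs have derived names of the form "CJK UNIFIED IDEOGRAPH-XXXX".
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

# Neither NFC nor NFKC maps one variant onto the other: they are distinct
# characters in the repertoire, not glyph variants of a single character.
for form in ("NFC", "NFKC"):
    assert unicodedata.normalize(form, traditional) != simplified
```

Mapping between such variants is therefore left to higher-level conversion tables rather than to the encoding itself.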

CJK Unified Ideographs
The Chinese, Japanese and Korean (CJK) scripts share a common background, collectively known as CJK characters. During the process called Han unification, the common (shared) characters were identified and named CJK Unified Ideographs. As of Unicode , Unicode defines a total of 97,680 of these characters. The term "ideographs" is a misnomer, as the Chinese script is not ideographic but rather logographic. Until the early 20th century, Vietnam also used Chinese characters (Chữ Nôm), so the abbreviation CJKV is sometimes used. Sources: The Ideographic Research Group (IRG) is responsible for developing extensions to the encoded repertoires of CJK unified ideographs. IRG processes proposals for new CJK unified ideographs submitted by its member bodies and, after several rounds of expert review, submits a consolidated set of characters to ISO/IEC JTC 1/SC 2 Working Group 2 (WG2) and the Unicode Technical Committee (UTC) for consideration for inclusion in the ISO/IEC 10 ...
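
One rough way to test whether a character is a CJK unified ideograph, sketched in Python under the assumption that the derived character names in the interpreter's bundled Unicode database are good enough: every ideograph in the unified blocks and extensions is named "CJK UNIFIED IDEOGRAPH-XXXX". Caveat: a handful of unified ideographs that live in the CJK Compatibility Ideographs block (for example U+FA0E) carry "CJK COMPATIBILITY IDEOGRAPH-" names and are missed here; an exact test would use the `Unified_Ideograph` property from the Unicode Character Database, which the standard-library module does not expose. The function name is hypothetical.

```python
import unicodedata

def is_cjk_unified(ch: str) -> bool:
    """Return True if `ch` is named as a CJK unified ideograph.

    Depends on the Unicode version bundled with this Python build, so
    newly added extensions are only recognised by newer interpreters.
    """
    # Unassigned code points have no name; fall back to "" instead of raising.
    return unicodedata.name(ch, "").startswith("CJK UNIFIED IDEOGRAPH-")

# Main block, Extension A, Extension B, a compatibility ideograph, a Latin letter.
samples = ["\u4e00", "\u3400", "\U00020000", "\uf900", "A"]
for ch in samples:
    print(f"U+{ord(ch):04X}", is_cjk_unified(ch))
```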

Simplified Chinese Characters
Simplified Chinese characters are one of two standardized character sets widely used to write the Chinese language, the other being traditional characters. Their mass standardization during the 20th century was part of an initiative by the People's Republic of China (PRC) to promote literacy, and their use in ordinary circumstances on the mainland has been encouraged by the Chinese government since the 1950s. They are the official forms used in mainland China, Malaysia, and Singapore, while traditional characters are officially used in Hong Kong, Macau, and Taiwan. Simplification of a component (either a character or a sub-component called a radical) usually involves either a reduction in its total number of strokes, or an apparent streamlining of which strokes are chosen in what places; for example, the radical used in the traditional character is simplified to to form the simplified charac ...

Unicode
Unicode, or The Unicode Standard (TUS), is a character encoding standard maintained by the Unicode Consortium, designed to support the use of text in all of the world's writing systems that can be digitized. Version 16.0 defines 154,998 characters and 168 scripts used in various ordinary, literary, academic, and technical contexts. Unicode has largely supplanted the previous environment of a myriad of incompatible character sets used within different locales and on different computer architectures. The entire repertoire of these sets, plus many additional characters, was merged into the single Unicode set. Unicode is used to encode the vast majority of text on the Internet, including most web pages, and Unicode support has become a common consideration in contemporary software development. Unicode is ultimately capable of encoding more than 1.1 million characters. The Unicode character repertoire is synchronized with Univers ...
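
The "more than 1.1 million" figure follows from the size of the code space: 17 planes of 65,536 code points each, U+0000 through U+10FFFF. The short Python arithmetic below checks this, subtracts the surrogate range reserved for UTF-16, and shows that UTF-8 spends between one and four bytes per character depending on the code point.

```python
# Unicode code space: 17 planes of 2**16 code points each.
PLANES = 17
PLANE_SIZE = 2 ** 16
code_points = PLANES * PLANE_SIZE
assert code_points == 0x10FFFF + 1  # U+0000 through U+10FFFF
print(code_points)  # 1114112

# The 2,048 surrogate code points (U+D800..U+DFFF) are reserved for
# UTF-16 and are never assigned to characters.
surrogates = 0xDFFF - 0xD800 + 1
print(code_points - surrogates)  # 1112064

# UTF-8 length grows with the code point: Latin letter, accented letter,
# a Han character in the main CJK block, a Han character in Extension B.
for ch in ("A", "\u00e9", "\u6f22", "\U00020000"):
    print(f"U+{ord(ch):04X}", len(ch.encode("utf-8")), "bytes")
```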

CEDICT
The CEDICT project was started by Paul Denisowski in 1997 and is maintained by a team on mdbg.net under the name CC-CEDICT, with the aim of providing a complete Chinese-to-English dictionary giving the pinyin pronunciation of the Chinese characters. Content: CEDICT is a text file; other programs (or simply Notepad, egrep or an equivalent) are needed to search and display it. This project is used by several other Chinese-English projects. The Unihan Database uses CEDICT data for most of its information about character compounds, but this is auxiliary and is explicitly not part of the main Unicode database. Features:
* Traditional Chinese and Simplified Chinese
* Pinyin (several pronunciations)
* American English (several equivalents)
As of , it had 122,444 entries in UTF-8. The basic format of a CEDICT entry is:
    Traditional Simplified [pin1 yin1] /American English equivalent 1/equivalent 2/
For example:
    漢字 汉字 [han4 zi4] /Chinese character/CL:個|个[ge4]/
Example of a simple egrep search:
    $ egrep -i 有 ...
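
Because CEDICT is plain text, each entry can be split with a regular expression. The Python sketch below assumes the CC-CEDICT convention of enclosing the pinyin in square brackets and delimiting the English equivalents with slashes, with lines starting with `#` treated as comments; `Entry`, `parse_line` and `search` are hypothetical names, and the sample line is the 漢字/汉字 example entry in that bracketed form. The `search` function is a rough analogue of `egrep -i`, restricted to the English definitions.

```python
import re
from typing import Iterable, Iterator, NamedTuple, Optional

class Entry(NamedTuple):
    traditional: str
    simplified: str
    pinyin: str
    definitions: list[str]

# Assumed line shape: Traditional Simplified [pin1 yin1] /def 1/def 2/
LINE_RE = re.compile(r"^(\S+) (\S+) \[([^\]]*)\] /(.*)/$")

def parse_line(line: str) -> Optional[Entry]:
    """Parse one dictionary line; return None for comments or malformed lines."""
    line = line.rstrip("\r\n")
    if not line or line.startswith("#"):
        return None  # header and comment lines start with '#'
    m = LINE_RE.match(line)
    if m is None:
        return None
    trad, simp, pinyin, defs = m.groups()
    return Entry(trad, simp, pinyin, defs.split("/"))

def search(lines: Iterable[str], term: str) -> Iterator[Entry]:
    """Case-insensitive substring search over the English definitions."""
    term = term.lower()
    for line in lines:
        entry = parse_line(line)
        if entry and any(term in d.lower() for d in entry.definitions):
            yield entry

sample = [
    "# hypothetical header comment",
    "漢字 汉字 [han4 zi4] /Chinese character/CL:個|个[ge4]/",
]
for e in search(sample, "character"):
    print(e.simplified, e.pinyin, e.definitions)
```

A real dictionary file would be opened with `encoding="utf-8"` and passed to `search` line by line.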

Hanzi
Chinese characters are logographs used to write the Chinese languages and others from regions historically influenced by Chinese culture. Of the four independently invented writing systems accepted by scholars, they represent the only one that has remained in continuous use. Over a documented history spanning more than three millennia, the function, style, and means of writing characters have changed greatly. Unlike letters in alphabets, which reflect the sounds of speech, Chinese characters generally represent morphemes, the units of meaning in a language. Writing all of the frequently used vocabulary in a language requires roughly 2000–3000 characters; , nearly have been identified and included in The Unicode Standard. Characters are created according to several principles, where aspects of shape and pronunciation may be used to indicate the character's meaning. The first attested characters are oracle bone inscriptions made during the 13th century BCE in what ...

Ideograms
An ideogram or ideograph (from Greek 'idea' + 'to write') is a symbol that is used within a given writing system to represent an idea or concept in a given language. (Ideograms are contrasted with phonograms, which indicate sounds of speech and thus are independent of any particular language.) Some ideograms are more arbitrary than others: some are only meaningful assuming preexisting familiarity with some convention; others more directly resemble their signifieds. Ideograms that represent physical objects by visually illustrating them are called pictograms.
* Numerals and mathematical symbols are ideograms, for example ⟨1⟩ 'one', ⟨2⟩ 'two', ⟨+⟩ 'plus', and ⟨=⟩ 'equals'.
* The ampersand ⟨&⟩ is used in many languages to represent the word "and"; it was originally a stylized ligature of the Latin word et.
* Other typographical examples include ⟨§⟩ 'section', ⟨€⟩ 'euro', ⟨£⟩ 'pound sterling', and ⟨©⟩ 'copyright'.
Terminology: Logograms ...

Ideographic Research Group
The Ideographic Research Group (IRG), formerly called the Ideographic Rapporteur Group, is a subgroup of Working Group 2 (WG2) of ISO/IEC JTC 1 Subcommittee 2 (SC2), the committee responsible for developing the Universal Coded Character Set (ISO/IEC 10646). IRG is tasked with preparing and reviewing sets of CJK unified ideographs for eventual inclusion in both ISO/IEC 10646 and The Unicode Standard. The IRG is composed of representatives from the national standards bodies of China, Japan, South Korea, Vietnam, and other regions that have historically used Chinese characters, as well as experts from liaison organizations such as the SAT Daizōkyō Text Database Committee (SAT), the Taipei Computer Association (TCA), and the Unicode Technical Committee (UTC). The group holds two meetings every year, each lasting 4 to 5 days, and subsequently reports its activities to its parent committee, ISO/IEC JTC 1/SC 2/WG 2. History: The precursor to the IRG was the CJK Joint Research Grou ...

Glyphs
A glyph is any kind of purposeful mark. In typography, a glyph is "the specific shape, design, or representation of a character". It is a particular graphical representation, in a particular typeface, of an element of written language. A grapheme, part of a grapheme (such as a diacritic), or sometimes several graphemes in combination (a composed glyph) can be represented by a glyph. Glyphs, graphemes and characters: In modern English, symbols like letters and numerical digits are each both single graphemes and single glyphs. In most languages written in any variety of the Latin alphabet except English, the use of diacritics to signify a sound mutation is common. For example, the grapheme requires two glyphs: the basic and the grave accent . In general, a diacritic is regarded as a glyph, even if it is contiguous with the rest of the character, like a cedilla in French, Catalan or Portuguese, the ogonek in several languages, or the stroke on a Polish . Although th ...
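
Unicode models the base-letter-plus-diacritic case with combining characters. The Python sketch below uses à (U+00E0) purely as an illustrative example: the same grapheme can be stored either as one precomposed code point or, after NFD normalization, as a base letter followed by a combining grave accent. How many glyphs a font actually draws for either spelling is decided by the font and rendering engine, not by the encoding.

```python
import unicodedata

precomposed = "\u00e0"  # LATIN SMALL LETTER A WITH GRAVE: one code point
decomposed = unicodedata.normalize("NFD", precomposed)

# NFD splits the character into a base letter plus a combining mark.
print([unicodedata.name(c) for c in decomposed])
# ['LATIN SMALL LETTER A', 'COMBINING GRAVE ACCENT']

# NFC recombines them; both spellings represent the same grapheme.
assert unicodedata.normalize("NFC", decomposed) == precomposed
print(len(precomposed), len(decomposed))  # 1 2
```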

Japan Electronic Industries Development Association
The Japan Electronic Industries Development Association (JEIDA) was an industry research, development, and standards body for electronics in Japan. It was merged with EIAJ to form JEITA on November 1, 2000. JEIDA was similar to SEMATECH in the US and ECMA in Europe. JEIDA developed a number of standards, including the JEIDA memory card and the Exif graphical file format. History: The association was established as Ryoko Communications Association Co., Ltd. in 1967. In 1989, Ryoko Communications Association Co., Ltd. was renamed the Japan Electronic Industries Development Association. In 2000, JEIDA merged with EIAJ and was reorganized into JEITA.

Traditional Chinese Characters
Traditional Chinese characters are a standard set of Chinese character forms used to write Chinese languages. In Taiwan, the set of traditional characters is regulated by the Ministry of Education and standardized in the Standard Form of National Characters. These forms were predominant in written Chinese until the middle of the 20th century, when various countries that use Chinese characters began standardizing simplified sets of characters, often adopting characters that already existed as well-known variants of the predominant forms. Simplified characters as codified by the People's Republic of China are predominantly used in mainland China, Malaysia, and Singapore. "Traditional" as such is a retronym applied to non-simplified character sets in the wake of the widespread use of simplified characters. Traditional characters are commonly used in Taiwan, Hong Kong, and Macau, as ...

Code Point
A code point, codepoint or code position is a particular position in a table, where the position has been assigned a meaning. The table may be one-dimensional (a column), two-dimensional (like cells in a spreadsheet), three-dimensional (sheets in a workbook), and so on, in any number of dimensions. Technically, a code point is a unique position in a quantized n-dimensional space, where the position has been assigned a semantic meaning. The table has discrete (whole) and positive positions (1, 2, 3, 4, but not fractions). Code points are used in a multitude of formal information processing and telecommunication standards (ETSI TS 101 773, section 4, https://www.etsi.org/deliver/etsi_ts/101700_101799/101773/01.02.01_60/ts_101773v010201p.pdf). For example, ITU-T Recommendation T.35 contains a set of country codes for telecommunications equipment (originally fax machines) which allow equipment to indicate its country of manufacture or operation. In T.35, Argentina is repre ...
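
The Universal Coded Character Set is one concrete instance of such a multi-dimensional table: ISO/IEC 10646 describes its code space in terms of groups, planes, rows and cells. The Python sketch below considers only the 17 planes that Unicode uses (all in group 00), splits a code point into (plane, row, cell) coordinates, and formats it in the familiar U+ notation; `plane_row_cell` and `u_notation` are illustrative names, not part of any standard API.

```python
def plane_row_cell(cp: int) -> tuple[int, int, int]:
    """Split a Unicode code point into (plane, row, cell) coordinates."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"{cp:#x} is outside the Unicode code space")
    plane = cp >> 16          # which 65,536-position plane
    row = (cp >> 8) & 0xFF    # which 256-position row within that plane
    cell = cp & 0xFF          # position within the row
    return plane, row, cell

def u_notation(cp: int) -> str:
    """Format a code point as 'U+' followed by at least four hex digits."""
    return f"U+{cp:04X}"

for cp in (0x4E2A, 0x20000):
    print(u_notation(cp), plane_row_cell(cp))
```

Viewing the same integer as coordinates in several dimensions is what lets standards describe blocks and planes as regions of the table rather than as arbitrary number ranges.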