Several systems have been proposed for describing the internal structure of Chinese characters, including their strokes, components, and stroke order, and the location of each in the character's ideal square. This information is useful for identifying variants of characters that are unified into one code point by Unicode and ISO/IEC 10646, and for providing an alternative representation for rare characters that do not yet have a standardized encoding in Unicode. Many of these systems aim to work for regular script and to expose a character's internal structure, which makes look-up easier by indexing the character's internal make-up and cross-referencing similar characters.
CDL
Character Description Language (CDL) is an XML-based declarative language co-created by Tom Bishop and Richard Cook for the Wenlin Institute. It defines characters by the arrangement of components, which are not required to reflect the semantic or etymological history of the character. Each component must fit into its allotted portion of the whole character's square. A set of fewer than 50 strokes allows one to construct approximately 1,000 components, which may in turn describe tens of thousands of characters.
Ideographic Description Sequences
Chapter 18 of ''The Unicode Standard'' (version 15.0) defines the "Ideographic Description Sequences" (IDS) syntax used to describe characters in featural terms, by arrangements of components with code points. Sixteen special characters in the range U+2FF0..U+2FFF act as prefix operators to combine other characters or sequences to form larger characters.
Two additional ideographic description characters are scattered in other Unicode blocks. One further character is not officially an ideographic description character, but is sometimes used in ideographic description sequences.
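The prefix-operator syntax can be illustrated with a small recursive parser. This is only a sketch under stated assumptions: the arity table below (U+2FF2 and U+2FF3 taking three operands, U+2FFE and U+2FFF taking one, the rest of the block taking two) is the sketch's own reading of the block and not quoted from the text above, the operators encoded outside U+2FF0..U+2FFF are ignored, and the name `parse_ids` and the sample sequences are chosen purely for illustration.

```python
# Assumed operand counts for the operators in U+2FF0..U+2FFF.
# Default is two operands; the exceptions are set explicitly below.
ARITY = {chr(cp): 2 for cp in range(0x2FF0, 0x3000)}
ARITY["\u2ff2"] = 3  # left / middle / right
ARITY["\u2ff3"] = 3  # top / middle / bottom
ARITY["\u2ffe"] = 1  # assumed unary
ARITY["\u2fff"] = 1  # assumed unary


def parse_ids(seq):
    """Parse a description sequence into a nested tuple (operator, operand, ...).

    Plain characters are returned unchanged as leaves.
    """

    def parse(i):
        if i >= len(seq):
            raise ValueError("sequence ended before all operands were read")
        ch = seq[i]
        n = ARITY.get(ch)
        if n is None:
            # Not an operator: a component character stands for itself.
            return ch, i + 1
        i += 1
        operands = []
        for _ in range(n):
            operand, i = parse(i)
            operands.append(operand)
        return (ch, *operands), i

    tree, end = parse(0)
    if end != len(seq):
        raise ValueError("trailing characters after a complete sequence")
    return tree


if __name__ == "__main__":
    print(parse_ids("⿰氵每"))
    print(parse_ids("⿱艹⿰亻匕"))
```

Because every operator precedes its operands, the parser never needs lookahead or brackets: reading one operator tells it exactly how many sub-sequences follow.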
These sequences are useful for describing to the reader a character that is not directly printable, either because it is absent from a given font or because it is absent from the Unicode standard altogether. For example, the sawndip character encoded in CJK Unified Ideographs Extension F as U+2DA21 can be described by such a sequence. Another use is dictionary lookup, where a sequence serves as a rough input method for queries.
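The lookup use can be sketched as an inverted index from components to the characters whose sequences contain them. Everything in this sketch is hypothetical: the three decompositions in `SAMPLE_IDS` are illustrative and not taken from any particular database, and `build_index` and `lookup` are names invented here.

```python
from collections import defaultdict

# Operators in U+2FF0..U+2FFF are skipped when indexing components.
OPERATORS = {chr(cp) for cp in range(0x2FF0, 0x3000)}

# Hypothetical sample data: character -> description sequence.
SAMPLE_IDS = {
    "海": "⿰氵每",
    "河": "⿰氵可",
    "花": "⿱艹⿰亻匕",
}


def build_index(table):
    """Map each component to the set of characters that contain it."""
    index = defaultdict(set)
    for char, ids in table.items():
        for part in ids:
            if part not in OPERATORS:
                index[part].add(char)
    return index


def lookup(index, *components):
    """Return the characters whose sequences contain every given component."""
    sets = [index.get(c, set()) for c in components]
    return set.intersection(*sets) if sets else set()


if __name__ == "__main__":
    index = build_index(SAMPLE_IDS)
    print(lookup(index, "氵"))
    print(lookup(index, "氵", "可"))
```

This index ignores the layout operators entirely, so it only narrows the candidates by component; a stricter search would also compare the structure of the sequences.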
These sequences can be rendered either by displaying the individual characters separately or by parsing the Ideographic Description Sequence and drawing the ideograph so described. They do not, by themselves, provide unambiguous rendering for all characters. For instance, a single sequence can represent both a character whose middle bar is narrower and one whose middle bar is wider.
Unicode's specification for these sequences is based on the characters and syntax of the earlier GBK encoding. Additional symbols were later encoded to fill in the missing combinations.
The IDSgrep free software package by Matthew Skala extends Unicode's IDS syntax to include additional features for dictionary lookup; it is capable of converting KanjiVG's database to its own extended IDS (EIDS) format, or of searching EIDS files generated by the related Tsukurimashou font family.
See also
* List of Shuowen Jiezi radicals
* List of Kangxi radicals
* List of Unicode radicals
* Cangjie input method
* Radical
* Stroke