
Chinese character description languages are several proposed languages that aim to describe Chinese (or CJK) characters accurately and completely, including information such as their list of components, their list of strokes (basic and complex), the order of those strokes, and the location of each on an empty background square. They are designed to overcome the inherent lack of information within a bitmap description. This enriched information can be used to identify variants of characters that are unified into one code point by Unicode and ISO/IEC 10646, as well as to provide an alternative form of representation for rare characters that do not yet have a standardized encoding in Unicode or ISO/IEC 10646. Many aim to work for both the Kaishu and Song styles, and to expose a character's internal structure, which allows easier look-up by indexing a character's internal make-up and cross-referencing similar characters.


CDL

Character Description Language (CDL) is a font technology, based on XML, co-created by Tom Bishop and Richard Cook for Wenlin Institute, Inc. It was designed for describing any CJK character but is suitable for describing any glyph. This XML-based declarative language defines the stroke order of each component (a subunit of the glyph similar to a radical, but not necessarily bearing the semantic significance of a true radical), as well as the assembly of previously defined components into ever more complex characters. Many of these components are characters in their own right, in addition to serving as building blocks. The background is a square of 128 pixels on each side. On this background:

1. Each of about 50 strokes can be drawn in SVG.
2. A basic component is composed by calling several strokes. Within the component, each stroke is positioned by its bottom-left and top-right corners. Transformations (reduction, enlargement, etc.) are possible. There are more than 1,000 basic components.
3. A character is composed by calling several components. Within the character, each component is positioned by its bottom-left and top-right corners.

So that a component fits into its proper portion of the character's rectangular block, it may be transformed (e.g., by horizontal or vertical reduction or enlargement) when embedded as a building block within a more complex character. Accordingly, a set of fewer than 50 strokes (Bishop & Cook 2013-12-31, p. 2) allows one to construct a set of about 1,000 components (Bishop & Cook 2013-12-31, p. 9), which may in turn be embedded within the descriptions of tens of thousands of characters. A change in the shape of one of the basic strokes is implicitly applied within each character that embeds that stroke. Likewise, a change to a component is implicitly applied within every character whose assembly uses that component. Nearly 100,000 Chinese characters have been described via CDL (Wenlin Institute webpage for CDL).
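The nesting of strokes inside components and components inside characters can be sketched in a few lines of Python. This is only an illustration of the bounding-box idea described above, not CDL's actual XML schema: the class names, the stroke labels "h" and "v", and the sample boxes are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union

# (x0, y0, x1, y1): bottom-left and top-right corners of a box
Box = Tuple[float, float, float, float]

GRID = 128  # side of the background square mentioned in the text


@dataclass
class Stroke:
    name: str  # hypothetical stroke label, not a real CDL stroke name


@dataclass
class Component:
    name: str
    # Each part is (child, box); the box is given in this component's own
    # 0..GRID coordinate space.
    parts: List[Tuple[Union["Component", Stroke], Box]] = field(default_factory=list)


def place(inner: Box, outer: Box) -> Box:
    """Scale and shift a box given in 0..GRID coordinates into `outer`."""
    ox0, oy0, ox1, oy1 = outer
    sx = (ox1 - ox0) / GRID
    sy = (oy1 - oy0) / GRID
    x0, y0, x1, y1 = inner
    return (ox0 + x0 * sx, oy0 + y0 * sy, ox0 + x1 * sx, oy0 + y1 * sy)


def flatten(node, box=(0, 0, GRID, GRID)):
    """Return [(stroke_name, absolute_box), ...] in drawing order."""
    if isinstance(node, Stroke):
        return [(node.name, box)]
    out = []
    for child, child_box in node.parts:
        out.extend(flatten(child, place(child_box, box)))
    return out


# A made-up component built from two strokes (illustrative boxes only).
cross = Component("cross", [
    (Stroke("h"), (0, 48, 128, 80)),
    (Stroke("v"), (48, 0, 80, 128)),
])

# A made-up character that embeds the same component twice, each copy
# horizontally reduced to half of the square.
double = Component("double", [
    (cross, (0, 0, 64, 128)),
    (cross, (64, 0, 128, 128)),
])

for name, abs_box in flatten(double):
    print(name, abs_box)
```

Because `double` only refers to `cross`, editing the boxes inside `cross` changes every character that embeds it, which mirrors the implicit propagation of changes described above.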


HanGlyph

HanGlyph is a character description language intended for supplying missing rare characters in documents (addressing the Chinese equivalent of the gaiji problem). Documents can contain markup for missing characters, which automatically triggers the generation of small fonts that provide those characters. The language itself is a simple postfix notation describing strokes and ways to combine them. The prototype software uses MetaPost to render the characters and embed them in LaTeX documents. The language was presented by Wai Wong in 1997, and papers about its implementation in MetaPost and LaTeX appeared at TeX user group conferences in 2003.
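To illustrate what a postfix notation for combining strokes looks like in general, here is a minimal stack-based evaluator in Python. It does not reproduce HanGlyph's real syntax: the operator names `above` and `beside` and the stroke tokens `h`, `v`, and `d` are invented for this sketch.

```python
def eval_postfix(tokens, operators):
    """Reduce a postfix token list to a single nested tuple (a glyph tree)."""
    stack = []
    for tok in tokens:
        if tok in operators:
            arity, build = operators[tok]
            if len(stack) < arity:
                raise ValueError(f"operator {tok!r} needs {arity} operands")
            # Pop the operands in the order they were pushed.
            args = stack[-arity:]
            del stack[-arity:]
            stack.append(build(*args))
        else:
            # Anything that is not an operator is treated as a stroke name.
            stack.append(tok)
    if len(stack) != 1:
        raise ValueError("expression did not reduce to a single glyph")
    return stack[0]


# Hypothetical combining operators (not HanGlyph's own).
OPS = {
    "above": (2, lambda a, b: ("above", a, b)),
    "beside": (2, lambda a, b: ("beside", a, b)),
}

tree = eval_postfix("h v above d beside".split(), OPS)
print(tree)
```

In postfix form the operands come first and the operator last, so no parentheses are needed: `h v above` is combined before `d beside` is applied to the result.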


Ideographic Description Sequences

Chapter 12 of the Unicode specification (https://www.unicode.org/versions/Unicode6.0.0/ch12.pdf) defines a syntax for "Ideographic Description Sequences" (IDSes), intended for describing characters not included in the standard in terms of combinations of components that do have code points. Twelve special characters in the range U+2FF0 to U+2FFB act as prefix operators that combine other characters or sequences to form larger characters. These sequences are useful for describing to the reader a character that is not directly printable, either because it is absent from a given font or because it is absent from the Unicode standard altogether. For example, the Sawndip character "𭨡" (encoded in CJK Unified Ideographs Extension F as U+2DA21) can be described as "⿰書史". Another use is dictionary lookup, as a sort of rough input method for queries.

These sequences can be rendered either by displaying the individual characters one after another or by parsing the Ideographic Description Sequence and drawing the ideograph it describes. They do not, by themselves, provide unambiguous rendering for all characters. For instance, the sequence ⿱十一 represents both 土 ("soil", with the upper bar shorter than the lower) and 士 ("bachelor", with the upper bar longer than the lower). Unicode's specification for these sequences is based on the characters and syntax of the earlier GBK standard. The IDSgrep free software package by Matthew Skala extends Unicode's IDS syntax with additional features for dictionary lookup; it can convert KanjiVG's database to its own extended IDS (EIDS) format, or search EIDS files generated by the related Tsukurimashou font family.
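Since the twelve operators are prefix operators, an IDS can be parsed with a short recursive routine. The sketch below is a minimal Python illustration that covers only the U+2FF0 to U+2FFB range described above; the function names are made up for the example, and it performs no validation beyond operand counting. Of the twelve operators, U+2FF2 (⿲) and U+2FF3 (⿳) take three operands and the others take two.

```python
from typing import Tuple, Union

# Operand counts for the twelve operators U+2FF0..U+2FFB.
ARITY = {
    chr(cp): (3 if cp in (0x2FF2, 0x2FF3) else 2)
    for cp in range(0x2FF0, 0x2FFC)
}

# A node is a single character, or a tuple (operator, operand, operand[, operand]).
Node = Union[str, Tuple]


def _parse(seq: str, pos: int):
    """Parse one component starting at `pos`; return (node, next_pos)."""
    if pos >= len(seq):
        raise ValueError("sequence ended while an operand was expected")
    ch = seq[pos]
    pos += 1
    if ch not in ARITY:
        return ch, pos  # an ordinary character is a leaf
    children = []
    for _ in range(ARITY[ch]):
        child, pos = _parse(seq, pos)
        children.append(child)
    return (ch, *children), pos


def parse_ids(seq: str) -> Node:
    """Parse a whole Ideographic Description Sequence into a nested tuple."""
    node, end = _parse(seq, 0)
    if end != len(seq):
        raise ValueError("trailing characters after a complete sequence")
    return node


print(parse_ids("⿰書史"))
print(parse_ids("⿰氵⿱十一"))
```

The parse tree records only the layout operator and its operands, so 土 and 士 would both yield the same tree for ⿱十一; the ambiguity noted above is inherent in the notation rather than in any particular parser.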


KanjiVG

KanjiVG (Kanji Vector Graphics) is a free, Creative Commons-licensed Japanese character description language (intended to eventually expand to Chinese as well) based on the SVG vector graphics format.


SCML

In 2007, Structural Character Modeling Language (SCML) was proposed as a different kind of XML-based Chinese character description language: unlike CDL and HanGlyph, its positioning is not based on a numerical grid. The known database of characters whose strokes and components are encoded in SCML is a demonstration of principle only; no effort is known to encode, say, all of Unicode's CJK characters in SCML.


See also

* Unicode
* List of Shuowen Jiezi radicals, a system of 540 components used by Xu Shen (d. ≈147 AD) in his Shuowen Jiezi
* List of Kangxi radicals, a system of 214 components used by the Kangxi dictionary (1716), made under the leadership of the Kangxi Emperor
* List of Unicode radicals, a modern, computer-based, ongoing attempt to create a complete and accurate list of CJK components, led by Unicode
* Cangjie input method
* Radical
* Stroke
* Stroke order



External links

;CDL language from Wenlin Institute
* Digital Humanities Start-up Grant from the U.S. National Endowment for the Humanities
;HanGlyph
* HanGlyph – a Chinese Character Description Language - Reference Manual (13 September 2003), pp. 31. http://www.hanglyph.com/en/hanglyph/reference.pdf (archived 4 March 2016 at https://web.archive.org/web/20160304185736/http://www.hanglyph.com/en/hanglyph/reference.pdf; accessed 11 December 2007)