Chinese Computational Linguistics
Chinese computational linguistics is a subset of computational linguistics; it is the scientific study and information processing of the Chinese language by means of computers. The purpose is to obtain a better understanding of how the language works and to make language applications more convenient. The term ''Chinese computational linguistics'' is often used interchangeably with Chinese information processing, though the former may sound more theoretical and the latter more technical. Rather than introducing computational linguistics in a general sense, this article focuses on the issues unique to processing the Chinese language as compared with other languages. The contents include Chinese character information processing, word segmentation, proper noun recognition, natural language understanding and generation, corpus linguistics, and machine translation. Chinese character information processing ''Chinese character Information Technology (IT)'' is the t ...


Ethnologue
''Ethnologue: Languages of the World'' is an annual reference publication in print and online that provides statistics and other information on the living languages of the world. It is the world's most comprehensive catalogue of languages. It was first issued in 1951 and is now published by SIL International, an American evangelical Christian non-profit organization. Overview and content ''Ethnologue'' has been published by SIL Global (formerly known as the Summer Institute of Linguistics), a Christian linguistic service organization with an international office in Dallas, Texas. The organization studies numerous minority languages to facilitate language development, and works with speakers in those language communities to translate portions of the Bible into their languages. Despite the Christian orientation of its publisher, ''Ethnologue'' is not ideologically or theologically biased. ''Ethnologue'' includes alternative names and Exo ...


Cangjie Input Method
The Cangjie input method (Tsang-chieh input method, sometimes called Changjie, Cang Jie, Changjei or Chongkit) is a system for entering Chinese characters into a computer using a standard computer keyboard. In filenames and elsewhere, the name Cangjie is sometimes abbreviated as cj. The input method was invented in 1976 by Chu Bong-Foo, and named after Cangjie (Tsang-chieh), the mythological inventor of the Chinese writing system, at the suggestion of Chiang Wei-kuo, the former Defense Minister of Taiwan. Chu Bong-Foo released the patent for Cangjie in 1982, as he thought that the method should belong to Chinese cultural heritage. Therefore, Cangjie has become open-source software and is on every computer system that supports traditional Chinese characters, and it has been extended so that Cangjie is compatible with the simplified Chinese character set. Cangjie is the first Chinese input method to use the QWERTY keyboard. Chu saw ...


Natural Language Processing
Natural language processing (NLP) is a subfield of computer science and especially artificial intelligence. It is primarily concerned with providing computers with the ability to process data encoded in natural language and is thus closely related to information retrieval, knowledge representation and computational linguistics, a subfield of linguistics. Major tasks in natural language processing are speech recognition, text classification, natural language understanding, and natural language generation. History Natural language processing has its roots in the 1950s. Already in 1950, Alan Turing published an article titled "Computing Machinery and Intelligence" which proposed what is now called the Turing test as a criterion of intelligence, though at the time that was not articulated as a problem separate from artificial intelligence. The proposed test includes a task that involves the automated interpretation and generation of natural language ...


Computational Linguistics
Computational linguistics is an interdisciplinary field concerned with the computational modelling of natural language, as well as the study of appropriate computational approaches to linguistic questions. In general, computational linguistics draws upon linguistics, computer science, artificial intelligence, mathematics, logic, philosophy, cognitive science, cognitive psychology, psycholinguistics, anthropology and neuroscience, among others. Computational linguistics is closely related to mathematical linguistics. Origins The field has overlapped with artificial intelligence since the efforts in the United States in the 1950s to use computers to automatically translate texts from foreign languages, particularly Russian scientific journals, into English. Since rule-based approaches were able to make arithmetic (systematic) calculations much faster and more accurately than humans, it was expected that lexicon, morphology, syntax and semantics could be learned using explicit rules, a ...




Journal of Chinese Information Processing
''Journal of Chinese Information Processing'' is the journal of the Chinese Information Processing Society of China. It was founded in 1986 and focuses on publishing academic papers on the basic theory and applied technology of Chinese information processing, as well as related overviews, research results, technical reports, book reviews, special discussions, domestic and foreign academic trends, etc. It aims to reflect the development and academic trends in the field of Chinese information processing in a timely manner. ''Journal of Chinese Information Processing'' has long been included in many important domestic and foreign databases such as the Chinese Science Citation Database (CSCD), Chinese Core Journals, and Chinese Science and Technology Core Journals. Its contents represent the advanced level of Chinese information processing in China. History * In 1986, ''Journal of Chinese Information Processing'' was founded. * In 1987, the publication period was changed f ...


Point (typography)
In typography, the point is the smallest unit of measure. It is used for measuring font size, leading, and other items on a printed page. The size of the point has varied throughout printing's history. Since the 18th century, the size of a point has been between 0.18 and 0.4 millimeters. Following the advent of desktop publishing in the 1980s and 1990s, digital printing has largely supplanted letterpress printing and has established the desktop publishing (DTP) point as the ''de facto'' standard. The DTP point is defined as 1/72 of an inch (about 0.3528 mm) and, as with earlier American point sizes, is considered to be 1/12 of a pica. In metal type, the point size of a font describes the height of the metal body on which that font's characters were cast. In digital type, letters of a computer font are designed around an imaginary space called an ''em square''. When a point size of a font is specified, the font is scaled so that its em square has a side length ...
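The DTP point relationships can be checked with a little arithmetic. The following is a minimal Python sketch, assuming the standard definitions of 1 inch = 25.4 mm, 72 points per inch and 12 points per pica; the function names are hypothetical and chosen only for illustration.

```python
from fractions import Fraction

MM_PER_INCH = Fraction(254, 10)   # 1 inch is exactly 25.4 mm
POINTS_PER_INCH = 72              # DTP point: 1/72 of an inch
POINTS_PER_PICA = 12              # 1 pica = 12 points


def points_to_mm(points):
    """Convert DTP points to millimetres as an exact fraction."""
    return Fraction(points) * MM_PER_INCH / POINTS_PER_INCH


def points_to_picas(points):
    """Convert DTP points to picas as an exact fraction."""
    return Fraction(points) / POINTS_PER_PICA


if __name__ == "__main__":
    for size in (1, 12, 72):
        mm = float(points_to_mm(size))
        picas = float(points_to_picas(size))
        print(f"{size} pt = {mm:.4f} mm = {picas:.4f} pica")
```

Exact fractions are used so that the repeating decimal in the millimetre value (127/360 mm per point) is not lost to rounding until it is printed.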


Fonts
In metal typesetting, a font is a particular size, weight and style of a ''typeface'', defined as the set of fonts that share an overall design. For instance, the typeface Bauer Bodoni (shown in the figure) includes fonts "Roman" (or "regular"), "bold" and "italic"; each of these exists in a variety of sizes. In the digital description of fonts (computer fonts), the terms "font" and "typeface" are often used interchangeably. For example, when used in computers, each style is stored in a separate digital font file. In both traditional typesetting and computing, the word "font" refers to the delivery mechanism of an instance of the typeface. In traditional typesetting, the font would be made from metal or wood type: to compose a page may require multiple fonts from the typeface or even multiple typefaces. Spelling and etymology The word ''font'' (US) or ''fount'' (traditional UK, CAN) derives from Mid ...


Kangxi Radicals
The ''Kangxi'' radicals, also known as ''Zihui'' radicals, are a set of 214 radicals that were collated in the 18th-century ''Kangxi Dictionary'' to aid categorization of Chinese characters. They are primarily sorted by stroke count. They are the most popular system of radicals for dictionaries that order characters by radical and stroke count. They are encoded in Unicode alongside other CJK characters, under the block "Kangxi radicals", while graphical variants are included in the block "CJK Radicals Supplement". Originally introduced in the ''Zihui'' dictionary of 1615, they are more commonly referred to in relation to the 1716 ''Kangxi Dictionary'', ''Kangxi'' being the commissioning emperor's era name. The 1915 encyclopedic word dictionary ''Ciyuan'' also uses this system. In modern times, many dictionaries that list Traditional Chinese head characters continue to use this system, for example the ''Wang Li Character Dictionary of ...
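The Unicode encoding mentioned above can be explored with Python's standard `unicodedata` module. This is a rough sketch, assuming the "Kangxi radicals" block assigns the 214 radicals to code points U+2F00 through U+2FD5 and that compatibility normalization (NFKC) maps each radical to its ordinary CJK ideograph; the helper names are hypothetical.

```python
import unicodedata

# Assumed range of the assigned code points in the "Kangxi radicals" block.
KANGXI_FIRST = 0x2F00
KANGXI_LAST = 0x2FD5


def kangxi_radicals():
    """Return the radical characters of the block, in code point order."""
    return [chr(cp) for cp in range(KANGXI_FIRST, KANGXI_LAST + 1)]


def to_unified(radical):
    """Map a radical code point to the equivalent CJK unified ideograph."""
    return unicodedata.normalize("NFKC", radical)


if __name__ == "__main__":
    radicals = kangxi_radicals()
    print(len(radicals), "radicals")
    for ch in radicals[:3]:
        print(f"U+{ord(ch):04X}", unicodedata.name(ch), "->", to_unified(ch))
```

Normalizing is useful in practice because a radical code point and the visually identical ideograph compare as different strings unless one is mapped to the other first.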


Big5
Big-5 or Big5 (Chinese: 大五碼) is a Chinese character encoding method used in Taiwan, Hong Kong, and Macau for traditional Chinese characters. The People's Republic of China (PRC), which uses simplified Chinese characters, uses the GB 18030 character set instead (though it can also substitute Big-5 or UTF-8). Big5 gets its name from the consortium of five companies in Taiwan that developed it. Encoding The original Big5 character set is sorted first by usage frequency, second by stroke count, lastly by Kangxi radical. The original Big5 character set lacked many commonly used characters. To solve this problem, each vendor developed its own extension. The ETen extension became part of the current Big5 standard through popularity. The structure of Big5 does not conform to the ISO 2022 standard, but rather bears a certain similarity to the Shift JIS encoding. It is a double-byte character set (DBCS) with the following structure: (the prefix 0x signifying hexadecimal numbers). Sta ...
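The double-byte structure can be illustrated with Python's built-in `big5` codec. The sketch below is illustrative only: it assumes the commonly documented lead-byte range 0x81 to 0xFE (the structure table is not reproduced in this excerpt), and `split_big5` is a hypothetical helper, not part of any library.

```python
def split_big5(data):
    """Split Big5-encoded bytes into one-byte and two-byte units.

    Assumption: a byte in 0x81..0xFE starts a two-byte character;
    any other byte (for example ASCII) stands alone.
    """
    units = []
    i = 0
    while i < len(data):
        width = 2 if 0x81 <= data[i] <= 0xFE else 1
        units.append(data[i:i + width])
        i += width
    return units


if __name__ == "__main__":
    encoded = "A中文".encode("big5")
    for unit in split_big5(encoded):
        print(unit.hex(), "->", unit.decode("big5"))
```

Because the second byte of a character can fall in the ASCII range, a Big5 stream has to be scanned from a known character boundary, which is what the loop above does.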


GB18030
GB 18030 is a Chinese government standard, described as ''Information Technology — Chinese coded character set'' and defines the required language and character support necessary for software in China. GB18030 is the registered Internet name for the official character set of the People's Republic of China (PRC) superseding GB2312. As a Unicode Transformation Format (i.e. an encoding of all Unicode code points), GB18030 supports both simplified and traditional Chinese characters. It is also compatible with legacy encodings including GB/T 2312, CP936, and GBK 1.0. The Unicode Consortium has warned implementers that the latest version of this Chinese standard, GB 18030-2022, introduces what they describe as "disruptive changes" from the previous version GB 18030-2005 "involving 33 different characters and 55 code positions". GB 18030-2022 was enforced from 1 August 2023. It has been implemented in ICU 73.2; and in Java 21, and backported to older Jav ...
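Because GB18030 acts as a Unicode Transformation Format, any Unicode string should survive a round trip through it. The following minimal sketch uses Python's built-in `gb18030` codec to show the variable byte widths; the sample characters and function names are chosen here for illustration and are not taken from the standard.

```python
def gb18030_width(char):
    """Return how many bytes one character occupies in GB18030."""
    return len(char.encode("gb18030"))


def round_trips(text):
    """Check that text survives encoding to GB18030 and decoding back."""
    return text.encode("gb18030").decode("gb18030") == text


if __name__ == "__main__":
    # ASCII, a character shared by both scripts, a traditional form,
    # and a character outside the Basic Multilingual Plane.
    for ch in ["A", "中", "體", "😀"]:
        print(f"U+{ord(ch):04X}", gb18030_width(ch), "byte(s)", round_trips(ch))
```

ASCII stays at one byte, characters inherited from the legacy GBK range take two, and code points outside that range, including all supplementary-plane characters, take four.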


Traditional Characters
Traditional Chinese characters are a standard set of Chinese character forms used to write Chinese languages. In Taiwan, the set of traditional characters is regulated by the Ministry of Education and standardized in the ''Standard Form of National Characters''. These forms were predominant in written Chinese until the middle of the 20th century, when various countries that use Chinese characters began standardizing simplified sets of characters, often with characters that existed before as well-known variants of the predominant forms. Simplified characters as codified by the People's Republic of China are predominantly used in mainland China, Malaysia, and Singapore. "Traditional" as such is a retronym applied to non-simplified character sets in the wake of widespread use of simplified characters. Traditional characters are commonly used in Taiwan, Hong Kong, and Macau, as well as in most overseas Chinese communities outside of Southeast Asia. As for non-Chinese languages ...


Simplified Chinese Characters
Simplified Chinese characters are one of two standardized character sets widely used to write the Chinese language, with the other being traditional characters. Their mass standardization during the 20th century was part of an initiative by the People's Republic of China (PRC) to promote literacy, and their use in ordinary circumstances on the mainland has been encouraged by the Chinese government since the 1950s. They are the official forms used in mainland China, Malaysia, and Singapore, while traditional characters are officially used in Hong Kong, Macau, and Taiwan. Simplification of a component (either a character or a sub-component called a radical) usually involves either a reduction in its total number of strokes, or an apparent streamlining of which strokes are chosen in what places; for example, the radical used in the traditional character is simplified to to form the simplified charac ...