Chinese Language Technology

Chinese Language Technology
Chinese computational linguistics is a subfield of computational linguistics: the scientific study and information processing of the Chinese language by means of computers. Its purpose is to gain a better understanding of how the language works and to make language applications more convenient. The term ''Chinese computational linguistics'' is often used interchangeably with ''Chinese information processing'', though the former sounds more theoretical and the latter more technical. Rather than introducing computational linguistics in general, this article focuses on the issues that are specific to processing Chinese as opposed to other languages. The topics include Chinese character information processing, word segmentation, proper noun recognition, natural language understanding and generation, corpus linguistics, and machine translation. Chinese character information processing ''Chinese character Information Technology (IT)'' is ...
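Word segmentation is needed because written Chinese does not mark word boundaries with spaces. The sketch below illustrates the problem with forward maximum matching, a classic dictionary-based baseline chosen here for illustration. It is not presented as the method used in this article. At each position the algorithm takes the longest lexicon entry that matches, and falls back to a single character when nothing matches. The function name `forward_max_match` and the toy lexicon are hypothetical.

```python
from typing import Iterable, List


def forward_max_match(text: str, lexicon: Iterable[str]) -> List[str]:
    """Greedy left-to-right segmentation against a word list (illustrative sketch)."""
    words = set(lexicon)
    max_len = max((len(w) for w in words), default=1)
    tokens: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until it is a lexicon hit;
        # a single character is always accepted as a fallback.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in words:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical toy lexicon: "Chinese", "information", "processing", "information processing".
print(forward_max_match("中文信息处理", ["中文", "信息", "处理", "信息处理"]))  # ['中文', '信息处理']
```

Greedy matching is fast but can mis-segment ambiguous strings. Practical segmenters typically combine lexicons with statistical or learned models.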


Ethnologue
''Ethnologue: Languages of the World'' (stylized as ''Ethnoloɠue'') is an annual reference publication in print and online that provides statistics and other information on the living languages of the world. It is the world's most comprehensive catalogue of languages. It was first issued in 1951 and is now published by SIL International, an American Christian non-profit organization. Overview and content ''Ethnologue'' has been published by SIL International (formerly known as the Summer Institute of Linguistics), a Christian linguistic service organization with an international office in Dallas, Texas. The organization studies numerous minority languages to facilitate language development, and works with speakers in those language communities to translate portions of the Bible into their languages. Despite the Christian orientation of its publisher, ''Ethnologue'' is not ideologically or theologically biased. ''Ethnologue'' includes alternative names and autonyms, the ...


Cantonese
Cantonese (traditional Chinese: 廣東話; simplified Chinese: 广东话; Cantonese Yale: Gwóngdūng wá) is a language of the Chinese (Sinitic) branch of the Sino-Tibetan languages, originating from the city of Guangzhou (historically known as Canton) and its surrounding area in Southeastern China. It is the traditional prestige variety of the Yue Chinese dialect group, which has over 80 million native speakers. While the term ''Cantonese'' specifically refers to the prestige variety, it is often used to refer to the entire Yue subgroup of Chinese, including related but largely mutually unintelligible languages and dialects such as Taishanese. Cantonese is viewed as a vital and inseparable part of the cultural identity of its native speakers across large swaths of Southeastern China, Hong Kong and Macau, as well as in overseas communities. In mainland China, it is the ''lingua franca'' of the province of Guangdong (where it is the majority language of the Pearl River Delta) and neighbouring areas such as Guang ...


Natural Language Processing
Natural language processing (NLP) is an interdisciplinary subfield of linguistics, computer science, and artificial intelligence concerned with the interactions between computers and human language, in particular how to program computers to process and analyze large amounts of natural language data. The goal is a computer capable of "understanding" the contents of documents, including the contextual nuances of the language within them. The technology can then accurately extract information and insights contained in the documents as well as categorize and organize the documents themselves. Challenges in natural language processing frequently involve speech recognition, natural-language understanding, and natural-language generation. History Natural language processing has its roots in the 1950s. In 1950, Alan Turing published an article titled "Computing Machinery and Intelligence", which proposed what is now called the Turing test as a criterion of intelligence, t ...


Computational Linguistics
Computational linguistics is an interdisciplinary field concerned with the computational modelling of natural language, as well as the study of appropriate computational approaches to linguistic questions. In general, computational linguistics draws upon linguistics, computer science, artificial intelligence, mathematics, logic, philosophy, cognitive science, cognitive psychology, psycholinguistics, anthropology and neuroscience, among others. Sub-fields and related areas Traditionally, computational linguistics emerged as an area of artificial intelligence pursued by computer scientists who had specialized in the application of computers to the processing of natural language. With the formation of the Association for Computational Linguistics (ACL) and the establishment of independent conference series, the field consolidated during the 1970s and 1980s. The Association for Computational Linguistics defines computational linguistics as: The term "comp ...


Point (typography)
In typography, the point is the smallest unit of measure. It is used for measuring font size, leading, and other items on a printed page. The size of the point has varied throughout printing's history. Since the 18th century, the size of a point has been between 0.18 and 0.4 millimeters. Following the advent of desktop publishing in the 1980s and 1990s, digital printing has largely supplanted letterpress printing and has established the DTP point (desktop publishing point) as the ''de facto'' standard. The DTP point is defined as 1/72 of an international inch (about 0.353 mm) and, as with earlier American point sizes, is considered to be 1/12 of a pica. In metal type, the point size of the font describes the height of the metal body on which the typeface's characters were cast. In digital type, letters of a font are designed around an imaginary space called an ''em square''. When a point size of a font is specified, the font is scaled so that its em square has a side length of that parti ...
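The DTP point is 1/72 of an international inch (exactly 25.4 mm) and 1/12 of a pica. Digital fonts are scaled so that the side of the em square equals the point size. The sketch below works through the resulting arithmetic with exact fractions. The helper names are hypothetical. The design grid of 2048 units per em used in the usage line is only an example assumption, because each font defines its own grid.

```python
from fractions import Fraction

MM_PER_INCH = Fraction(254, 10)  # the international inch is exactly 25.4 mm
POINTS_PER_INCH = 72             # one DTP point = 1/72 inch
POINTS_PER_PICA = 12             # one pica = 12 DTP points


def points_to_mm(points) -> Fraction:
    """Convert DTP points to millimetres."""
    return Fraction(points) * MM_PER_INCH / POINTS_PER_INCH


def points_to_picas(points) -> Fraction:
    """Convert DTP points to picas."""
    return Fraction(points) / POINTS_PER_PICA


def design_units_to_points(units: int, units_per_em: int, point_size) -> Fraction:
    """Scale a length from font design units to points.

    The em square (units_per_em on a side) is scaled to the point size, so a
    length of `units` becomes units / units_per_em * point_size.
    `units_per_em` is font-specific; values passed in are assumptions.
    """
    return Fraction(units, units_per_em) * Fraction(point_size)


print(float(points_to_mm(1)))  # roughly 0.3528 mm per point
# Hypothetical 2048-unit grid: half an em at 12 pt is 6 pt.
print(float(design_units_to_points(1024, 2048, 12)))
```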


Fonts
In metal typesetting, a font is a particular size, weight and style of a typeface. Each font is a matched set of type, with a piece (a "sort") for each glyph. A typeface consists of a range of such fonts that share an overall design. In modern usage, with the advent of computer fonts, the term "font" has come to be used as a synonym for "typeface", although a typical typeface (or "font family") consists of a number of fonts. For instance, the typeface "Bauer Bodoni" includes the fonts "Roman" (or "Regular"), "Bold" and ''"Italic"''; each of these exists in a variety of sizes. The term "font" is correctly applied to any one of these alone but may be seen used loosely to refer to the whole typeface. On computers, each style is stored in a separate digital "font file". In both traditional typesetting and modern usage, the word "font" refers to the delivery mechanism of the typeface. In traditional typesetting, the font would be made from metal or wood type: t ...


Kangxi Radicals
The 214 Kangxi radicals (康熙部首), also known as the Zihui radicals, form a system of radicals (部首) of Chinese characters. The radicals are numbered in stroke count order. They are the most popular system of radicals for dictionaries that order traditional Chinese characters (''hanzi'', ''hanja'', ''kanji'', ''chữ hán'') by radical and stroke count. They are officially part of the Unicode encoding system for CJKV characters, in their standard order, in the "Kangxi Radicals" block (U+2F00–U+2FDF), while their graphic variants are contained in the "CJK Radicals Supplement" block. Thus, a reference to "radical 61", for example, without additional context, refers to the 61st radical of the ''Kangxi Dictionary'', 心; ''xīn'' "heart". Originally introduced in the 1615 ''Zihui'' (字彙), they are more commonly named in relation to the ''Kangxi Dictionary'' of 1716 (''Kāngxī'' being the era name for 1662–1723). The 1915 encyclopedic word dictionary ''Ciyuan'' (辭源) also uses this syste ...
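Unicode encodes the 214 radicals in their standard order, starting at U+2F00. This means a radical's number maps directly to a code point. The minimal sketch below makes that mapping explicit using only Python's standard `unicodedata` module. It then uses NFKC normalization to turn the radical symbol into the ordinary CJK ideograph that shares its shape. The helper names are hypothetical.

```python
import unicodedata

KANGXI_BLOCK_START = 0x2F00  # first code point of the "Kangxi Radicals" block
KANGXI_RADICAL_COUNT = 214


def kangxi_radical(number: int) -> str:
    """Return the Kangxi Radicals block character for a 1-based radical number."""
    if not 1 <= number <= KANGXI_RADICAL_COUNT:
        raise ValueError(f"radical number must be 1..214, got {number}")
    # Radicals are stored in standard order, so the offset is number - 1.
    return chr(KANGXI_BLOCK_START + number - 1)


def as_ideograph(radical_char: str) -> str:
    """Map a radical symbol to its unified CJK ideograph via NFKC normalization."""
    return unicodedata.normalize("NFKC", radical_char)


r61 = kangxi_radical(61)
print(f"U+{ord(r61):04X}", unicodedata.name(r61), as_ideograph(r61))
# prints: U+2F3C KANGXI RADICAL HEART 心
```

The radical code points are meant for referring to radicals as symbols. Ordinary text uses the unified ideograph, which is why the normalization step is useful when matching against running text.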


Big5
Big-5 or Big5 is a Chinese character encoding method used in Taiwan, Hong Kong, and Macau for traditional Chinese characters. The People's Republic of China (PRC), which uses simplified Chinese characters, uses the GB 18030 character set instead. Big5 gets its name from the consortium of five companies in Taiwan that developed it. Organization The original Big5 character set is sorted first by usage frequency, second by stroke count, and lastly by Kangxi radical. The original Big5 character set lacked many commonly used characters. To solve this problem, each vendor developed its own extension. The ETen extension became part of the current Big5 standard through popularity. The structure of Big5 does not conform to the ISO 2022 standard, but rather bears a certain similarity to the Shift JIS encoding. It is a double-byte character set (DBCS) with the following structure: the first ("lead") byte is in the range 0x81 to 0xFE (0xA1 to 0xF9 for non-user-defined characters), and the second ("trail") byte is in the range 0x40 to 0x7E or 0xA1 to 0xFE (the prefix 0x signifying hexadecimal numbers). Standard assignments (excluding vendor or user-defined extensions) ...
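A double-byte character set can be split into characters by inspecting lead bytes. The minimal sketch below assumes the commonly documented Big5 layout. Under that layout, bytes 0x00 to 0x7F are single-byte ASCII, a lead byte from 0x81 to 0xFE starts a two-byte character, and the trail byte falls in 0x40 to 0x7E or 0xA1 to 0xFE. The function `split_big5` is hypothetical. It checks byte ranges only, not whether a code is actually assigned. The round trip in the usage line relies on Python's built-in `big5` codec.

```python
from typing import List


def split_big5(data: bytes) -> List[bytes]:
    """Split a Big5 byte string into per-character chunks (range check only)."""
    chunks: List[bytes] = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            # Single-byte ASCII.
            chunks.append(data[i:i + 1])
            i += 1
        elif 0x81 <= lead <= 0xFE:
            # Lead byte of a double-byte character: validate the trail byte.
            if i + 1 >= len(data):
                raise ValueError("truncated double-byte character")
            trail = data[i + 1]
            if not (0x40 <= trail <= 0x7E or 0xA1 <= trail <= 0xFE):
                raise ValueError(f"invalid trail byte 0x{trail:02X}")
            chunks.append(data[i:i + 2])
            i += 2
        else:
            # 0x80 and 0xFF cannot start a character in this layout.
            raise ValueError(f"invalid lead byte 0x{lead:02X}")
    return chunks


raw = "A中文".encode("big5")
print([chunk.decode("big5") for chunk in split_big5(raw)])  # ['A', '中', '文']
```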


Traditional Characters
Traditional Chinese characters are one type of standard Chinese character set of contemporary written Chinese. Traditional characters took shape with the clerical change (libian) and have mostly retained the structure they took on when regular script was introduced in the 2nd century. Over the following centuries, traditional characters were regarded as the standard form of printed Chinese characters or literary Chinese throughout the Sinosphere until the middle of the 20th century, before various script reforms were initiated by countries using Chinese characters as a writing system. Traditional Chinese characters remain in common use in Taiwan, Hong Kong and Macau, as well as in most overseas Chinese communities outside Southeast Asia; in addition, Hanja in the Korean language remains virtually identical to traditional charac ...


Simplified Chinese Characters
Simplified Chinese characters are standardized Chinese characters used in mainland China, Malaysia and Singapore, as prescribed by the ''Table of General Standard Chinese Characters''. Along with traditional Chinese characters, they are one of the two standard character sets of the contemporary Chinese written language. The government of the People's Republic of China has promoted them for use in printing since the 1950s and 1960s to encourage literacy. They are officially used in the People's Republic of China, Malaysia and Singapore, while traditional Chinese characters remain in common use in Hong Kong, Macau and Taiwan, and to a certain extent in Japan. Simplified Chinese characters may be referred to by their official name, 简化字 (''jiǎnhuàzì''), or colloquially as 简体字 (''jiǎntǐzì''). In its broadest sense, the latter term refers to all characters that have undergone simplifications of character "structure" or "body", some of which have existed for mille ...


Radical (Chinese Characters)
A Chinese radical (部首; ''bùshǒu'') or indexing component is a graphical component of a Chinese character under which the character is traditionally listed in a Chinese dictionary. This component is often a semantic indicator similar to a morpheme, though sometimes it may be a phonetic component or even an artificially extracted portion of the character. In some cases the original semantic or phonological connection has become obscure, owing to changes in character meaning or pronunciation over time. The English term "radical" is based on an analogy between the structure of characters and the inflection of words in European languages. Radicals are also sometimes called "classifiers", but this name is more commonly applied to grammatical classifiers (measure words). History In the earliest Chinese dictionaries, such as the ''Erya'' (3rd century BC), characters were grouped together in broad semantic categories. Because the vast majority of characters are phono-semantic compounds (形聲字), comb ...
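Dictionaries that list characters under radicals usually order them first by radical and then by the number of strokes remaining after the radical. The minimal sketch below shows that ordering, assuming Kangxi radical numbering. The sample index is hand-entered for illustration only. A real system would load such data from a source like the radical-stroke fields of the Unihan database. The helper `radical_stroke_order` is hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# Hand-entered sample data (illustrative only):
# character -> (Kangxi radical number, residual stroke count after the radical).
SAMPLE_INDEX: Dict[str, Tuple[int, int]] = {
    "河": (85, 5),  # radical 85 水 "water" + 可 (5 strokes)
    "想": (61, 9),  # radical 61 心 "heart" + 相 (9 strokes)
    "你": (9, 5),   # radical 9 人 "man" + 尔 (5 strokes)
    "水": (85, 0),
    "忙": (61, 3),  # radical 61 心 (as 忄) + 亡 (3 strokes)
    "人": (9, 0),
    "心": (61, 0),
}


def radical_stroke_order(chars: Iterable[str],
                         index: Dict[str, Tuple[int, int]]) -> List[str]:
    """Sort characters by radical number, then residual strokes.

    The code point breaks any remaining ties so the order is deterministic.
    """
    return sorted(chars, key=lambda c: (*index[c], ord(c)))


print("".join(radical_stroke_order(SAMPLE_INDEX, SAMPLE_INDEX)))  # 人你心忙想水河
```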