History
Chinese character word-segmented writing
Chinese is usually written in Chinese characters, so Chinese word-segmented writing mainly refers to the segmentation of Chinese character text. The following are some methods and techniques.
Textual context
The most important purpose of word-segmented writing is to express the writer's intended meaning accurately and clearly. For example, the traditional unsegmented text "乒乓球拍卖完了。" has two possible meanings, which word-segmented writing distinguishes as "乒乓 球拍 卖完了。" (The ping-pong bats are sold out) and "乒乓球 拍卖 完了。" (The ping-pong balls have been auctioned). The author chooses the segmentation that expresses the intended meaning without ambiguity.
Dictionaries
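The ambiguity of "乒乓球拍卖完了" can be reproduced with a short sketch that lists every way of splitting the string into words found in a word list. The function name `segmentations` and the six-entry `LEXICON` are assumptions made for this illustration only; a real dictionary contains far more entries and would yield more candidate readings.

```python
def segmentations(text, lexicon):
    """Return every way to split `text` into words that all occur in `lexicon`."""
    if not text:
        return [[]]  # one way to segment the empty string: no words
    results = []
    for end in range(1, len(text) + 1):
        head = text[:end]
        if head in lexicon:
            # Keep this word and segment the remainder recursively.
            for tail in segmentations(text[end:], lexicon):
                results.append([head] + tail)
    return results


# Hypothetical mini-lexicon, chosen only to cover the example sentence.
LEXICON = {"乒乓", "乒乓球", "球拍", "拍卖", "卖完了", "完了"}

for words in segmentations("乒乓球拍卖完了", LEXICON):
    print(" ".join(words))
# prints:
# 乒乓 球拍 卖完了
# 乒乓球 拍卖 完了
```

Both readings are valid according to the word list, so only the writer can decide which one is meant.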
If unsure whether a character string is a legitimate word, the writer can look it up in a reliable dictionary, such as Xiandai Hanyu Cidian or CEDICT, or judge whether it qualifies as a word on lexical, morphological and syntactic grounds.
Prosody
In spoken language, there is usually a pause between two words (and no pause is allowed within a word), so it is natural to put a pause (represented by a space) between the words in written language. Methods for identifying word boundaries are also described in the article Word, under Word boundaries.
Whitespace
The space between two words should be half the width of a Chinese character, narrower than the distance between two lines. Because the average Chinese word is about two characters long, a space as wide as a full Chinese character, wider than the inter-line distance, would make the lines of words look scattered rather than compact.
Proper noun marker
To further help the reader, proper nouns should be marked as well, for example by underlining. This is already done in the Holy Bible (Union Version with modern punctuation).
Pinyin segmentation
Pinyin orthography
The general rules are:
# Use words as the basic writing units for Pinyin expressions. For example: rén (人, person), pǎo (跑, run), māmɑ (妈妈, mother), yuèdú (阅读, read), túshūɡuǎn (图书馆, library).
# Two-syllable and three-syllable expressions of a single concept are written consecutively (without spaces). For example: huánbǎo (环保, environmental protection), ɡōnɡɡuān (公关, public relations), chánɡyònɡcí (常用词, commonly-used words), duìbuqǐ (对不起, sorry).
# Names of four or more syllables that represent a single concept are written in segments, divided by words or by the speech pauses inside the phrase. Those that cannot be divided this way are written consecutively. For example: wúfènɡ ɡānɡɡuǎn (无缝钢管, seamless steel pipe), huánjìnɡ bǎohù guīhuà (环境保护规划, environmental protection planning), Zhōnɡɡuó Shèhuì Kēxuéyuàn (中国社会科学院, Chinese Academy of Social Sciences), yánjiūshēnɡyuàn (研究生院, graduate school), hónɡshízìhuì (红十字会, Red Cross Society).
# Single-syllable repeated words are written consecutively; double-syllable repeated words are written separately. For example: rénrén (人人, everyone), kànkɑn (看看, look), hónɡhónɡ de (红红的, very red), yánjiū yánjiū (研究研究, research research), xuěbái xuěbái (雪白雪白, snow white snow white). Repeated words in AABB structure are written consecutively. For example: láiláiwǎnɡwǎnɡ (来来往往, coming and going), qīnɡqīnɡchǔchǔ (清清楚楚, crystal clear), fānɡfānɡmiànmiàn (方方面面, all aspects).
# Monosyllabic prefixes (副 vice, 总 general/chief, 非 non, 反 anti, 超 super, 老 old, 阿 A, 可 able, 无 non, 半 semi, etc.) and monosyllabic suffixes (子 zi, 儿 er, 头 tou, 性 -ity, 者 person, 员 member, 家 expert, 手 specialist, 化 -ize, 们 plural, etc.) are written consecutively with the main word. For example: fùbùzhǎnɡ (副部长, vice minister), zǒnɡɡōnɡchénɡshī (总工程师, chief engineer), fùzǒnɡɡōnɡchénɡshī (副总工程师, vice chief engineer), fēijīnshǔ (非金属, non-metallic), kēxuéxìnɡ (科学性, scientific / scientificity), chénɡwùyuán (乘务员, flight attendant), xiàndàihuà (现代化, modernization), háizimen (孩子们, children).
# For ease of reading and understanding, a hyphen can be used between some parallel words or morphemes, or in some abbreviations. For example: bā-jiǔ tiān (八九天, eight or nine days), rén-jī duìhuà (人机对话, human-computer dialogue), Jīnɡ-Zànɡ Gāosù Gōnɡlù (京藏高速公路, Beijing-Tibet Expressway).
In addition to the general rules, there are specific rules for nouns, verbs, adjectives, pronouns, numerals, quantifiers, adverbs, prepositions, conjunctions, auxiliary words, interjections, onomatopoeias, idioms, sayings, as well as names of people and places.
Example
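As a small illustration of the general rule on repeated words, the sketch below joins Pinyin syllables according to the reduplication pattern. The function name `write_reduplication` and the pattern labels "AA" and "ABAB" are assumptions introduced here (only "AABB" is named in the rules), and the caller is expected to know the pattern in advance; the sketch does not detect it.

```python
def write_reduplication(syllables, pattern):
    """Join the Pinyin syllables of a repeated expression.

    pattern (labels assumed for this sketch):
      "AA"   - a single-syllable word repeated: written consecutively
      "ABAB" - a double-syllable word repeated: written as two separate words
      "AABB" - each syllable doubled: written consecutively
    """
    if pattern in ("AA", "AABB"):
        return "".join(syllables)
    if pattern == "ABAB":
        half = len(syllables) // 2
        return "".join(syllables[:half]) + " " + "".join(syllables[half:])
    raise ValueError(f"unknown pattern: {pattern}")


print(write_reduplication(["rén", "rén"], "AA"))                  # rénrén
print(write_reduplication(["yán", "jiū", "yán", "jiū"], "ABAB"))  # yánjiū yánjiū
print(write_reduplication(["lái", "lái", "wǎnɡ", "wǎnɡ"], "AABB"))  # láiláiwǎnɡwǎnɡ
```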
A longer example is Article 1 of the Chinese version of the United Nations Universal Declaration of Human Rights.
Computer-based word segmentation
Until word-segmented writing becomes widespread, computer-based word segmentation is often used in language information processing. Its quality keeps improving, but the output still needs post-editing by humans, and it cannot match the reliability of segmentation done by the author personally.
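To make the limitation concrete, here is a minimal sketch of forward maximum matching, one classic dictionary-based segmentation method. The text above does not name any particular algorithm, so this choice is an assumption made for illustration, as are the function name `forward_max_match`, the `max_len` parameter and the six-entry `LEXICON`.

```python
def forward_max_match(text, lexicon, max_len=4):
    """Greedy left-to-right segmentation: always take the longest lexicon word."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a lexicon word is found.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is accepted as a fallback so the loop always advances.
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical mini-lexicon, chosen only to cover the example sentence.
LEXICON = {"乒乓", "乒乓球", "球拍", "拍卖", "卖完了", "完了"}

print(" ".join(forward_max_match("乒乓球拍卖完了", LEXICON)))
# prints: 乒乓球 拍卖 完了
```

The greedy method always returns the "auctioned" reading of this sentence, even when the writer meant that the bats are sold out. Cases like this are where human post-editing, or segmentation by the author, is needed.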