Text Simplification
Text simplification is an operation used in natural language processing to change, enhance, classify, or otherwise process an existing body of human-readable text so that its grammar and structure are greatly simplified while the underlying meaning and information remain the same. Text simplification is an important area of research because of communication needs in an increasingly complex and interconnected world dominated by science, technology, and new media. Natural human languages, however, pose huge problems because they ordinarily contain large vocabularies and complex constructions that machines, no matter how fast and well-programmed, cannot easily process. Researchers have found that, to reduce this linguistic diversity, they can use methods of semantic compression to limit and simplify the set of words used in a given text.
Example
Text simplification is illustrated with an example used by Siddharthan (2006). The first sentence contains two relative clauses and one con ...
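Siddharthan's example turns on splitting relative clauses and conjunctions into separate sentences. As a minimal sketch of that idea (not Siddharthan's actual system, which operates on syntactic parses rather than string patterns), the Python rule below detaches a non-restrictive "who" clause; the input sentence is a made-up illustration:

    import re

    def split_relative_clause(sentence):
        """Split a non-restrictive 'who' clause into its own sentence.

        A single toy rule in the spirit of rule-based syntactic
        simplification; real systems work over parse trees.
        """
        m = re.match(r"^(\w+), who ([^,]+), (.+)$", sentence)
        if m is None:
            return [sentence]  # rule does not apply; leave unchanged
        subject, clause, rest = m.groups()
        return [f"{subject} {rest}", f"{subject} {clause}."]

    print(split_relative_clause("John, who was tired, went home."))
    # -> ['John went home.', 'John was tired.']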


Natural Language Processing
Natural language processing (NLP) is an interdisciplinary subfield of linguistics, computer science, and artificial intelligence concerned with the interactions between computers and human language, in particular how to program computers to process and analyze large amounts of natural language data. The goal is a computer capable of "understanding" the contents of documents, including the contextual nuances of the language within them. The technology can then accurately extract information and insights contained in the documents, as well as categorize and organize the documents themselves. Challenges in natural language processing frequently involve speech recognition, natural-language understanding, and natural-language generation.
History
Natural language processing has its roots in the 1950s. As early as 1950, Alan Turing published an article titled "Computing Machinery and Intelligence" which proposed what is now called the Turing test as a criterion of intelligence, t ...


Lexical Simplification
Lexical simplification is a sub-task of text simplification. It can be defined as any lexical substitution task that reduces text complexity.
See also
* Lexical substitution
* Text simplification
References
* Advaith Siddharthan. Syntactic Simplification and Text Cohesion. In Research on Language and Computation, Volume 4, Issue 1, June 2006, pages 77–109. Springer Science, the Netherlands.
* Siddhartha Jonnalagadda, Luis Tari, Joerg Hakenberg, Chitta Baral and Graciela Gonzalez. Towards Effective Sentence Simplification for Automatic Processing of Biomedical Text. In Proceedings of NAACL-HLT 2009, Boulder, USA, June 2009.
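As a concrete illustration, the sketch below substitutes a word with a synonym only when the synonym is more frequent, on the common assumption that frequent words are simpler. Both lookup tables are toy stand-ins invented for this example; real systems draw on resources such as WordNet and large corpus frequency counts:

    # Toy tables; real systems use WordNet and corpus-derived counts.
    SYNONYMS = {
        "commence": ["start", "begin"],
        "utilize": ["use"],
        "terminate": ["end", "stop"],
    }
    FREQUENCY = {"start": 900, "begin": 700, "use": 1500, "end": 1100,
                 "stop": 800, "commence": 40, "utilize": 30, "terminate": 25}

    def simplify_word(word):
        """Pick the most frequent synonym, but only if it beats the original."""
        best = max(SYNONYMS.get(word, []),
                   key=lambda w: FREQUENCY.get(w, 0), default=word)
        return best if FREQUENCY.get(best, 0) > FREQUENCY.get(word, 0) else word

    def simplify(text):
        return " ".join(simplify_word(w) for w in text.split())

    print(simplify("utilize the tool then terminate the process"))
    # -> "use the tool then end the process"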


Speech Recognition
Speech recognition is an interdisciplinary subfield of computer science and computational linguistics that develops methodologies and technologies enabling the recognition and translation of spoken language into text by computers, with searchability as the main benefit. It is also known as automatic speech recognition (ASR), computer speech recognition or speech to text (STT). It incorporates knowledge and research in the fields of computer science, linguistics and computer engineering. The reverse process is speech synthesis. Some speech recognition systems require "training" (also called "enrollment"), where an individual speaker reads text or isolated vocabulary into the system. The system analyzes the person's specific voice and uses it to fine-tune the recognition of that person's speech, resulting in increased accuracy. Systems that do not use training are called "speaker-independent" systems; systems that use training are called "speaker-dependent". Speech recognition ...


Computational Linguistics
Computational linguistics is an interdisciplinary field concerned with the computational modelling of natural language, as well as the study of appropriate computational approaches to linguistic questions. In general, computational linguistics draws upon linguistics, computer science, artificial intelligence, mathematics, logic, philosophy, cognitive science, cognitive psychology, psycholinguistics, anthropology and neuroscience, among others.
Sub-fields and related areas
Traditionally, computational linguistics emerged as an area of artificial intelligence performed by computer scientists who had specialized in the application of computers to the processing of a natural language. With the formation of the Association for Computational Linguistics (ACL) and the establishment of independent conference series, the field consolidated during the 1970s and 1980s. The Association for Computational Linguistics defines computational linguistics as: The term "comp ...


Basic English
Basic English (British American Scientific International and Commercial English) is an English-based controlled language created by the linguist and philosopher Charles Kay Ogden as an international auxiliary language and as an aid for teaching English as a second language. Basic English is, in essence, a simplified subset of regular English. It was presented in Ogden's book ''Basic English: A General Introduction with Rules and Grammar''. The first work on Basic English was written by two Englishmen: Ivor Richards of Harvard University and Charles Kay Ogden of the University of Cambridge in England. The design of Basic English drew heavily on the semiotic theory put forward by Ogden and Richards in their book ''The Meaning of Meaning''. Ogden's Basic, and the concept of a simplified English, gained its greatest publicity just after the Allied victory in World War II, as a means for world peace. Ogden was convinced that the world needed to gradually eradicate minority languages ...




Simplified Technical English
ASD-STE100 Simplified Technical English (STE) is an international specification for the preparation of technical documentation in a controlled language. STE was developed as a controlled language in the early 1980s (as AECMA Simplified English) to help second-language speakers of English understand technical manuals written in English unambiguously. It was initially applicable to civil aircraft maintenance documentation; it then became a requirement for defense projects, including land and sea vehicles. Today, maintenance and technical manuals in a wide range of other industries are written in STE.
History
The first attempts towards a form of controlled English were made in the 1930s and 1970s with Basic English and Caterpillar Fundamental English, respectively. In 1979, aerospace documentation was written in American English (Boeing, Douglas, Lockheed, etc.), in British English (Hawker Siddeley, British Aircraft Corporation, etc.) and by companies whose native language was no ...


Text Normalization
Text normalization is the process of transforming text into a single canonical form that it might not have had before. Normalizing text before storing or processing it allows for separation of concerns, since input is guaranteed to be consistent before operations are performed on it. Text normalization requires being aware of what type of text is to be normalized and how it is to be processed afterwards; there is no all-purpose normalization procedure.
Applications
Text normalization is frequently used when converting text to speech. Numbers, dates, acronyms, and abbreviations are non-standard "words" that need to be pronounced differently depending on context (Sproat, R.; Black, A.; Chen, S.; Kumar, S.; Ostendorf, M.; Richards, C. (2001). "Normalization of non-standard words." ''Computer Speech and Language'' 15: 287–333. doi:10.1006/csla.2001.0169). For example:
* "$200" would be pronounced as "two hundred dollars" in English, but as "lua selau tālā" in Samoan.
* "vi" c ...
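To make the text-to-speech case concrete, the snippet below verbalizes round dollar amounts in English. It is a narrow sketch: the function names are invented here, only multiples of one hundred are handled, and a production front end would cover many more categories (dates, ordinals, abbreviations) and would be language-specific, as the Samoan example above shows:

    import re

    UNITS = ["zero", "one", "two", "three", "four", "five", "six", "seven",
             "eight", "nine"]

    def number_to_words(n):
        """Spell out a round number of hundreds (100-900); a sketch only."""
        if n % 100 == 0 and 100 <= n <= 900:
            return f"{UNITS[n // 100]} hundred"
        raise ValueError("this sketch only handles round hundreds")

    def normalize_currency(text):
        """Rewrite '$N' tokens as spoken English words."""
        return re.sub(r"\$(\d+)",
                      lambda m: f"{number_to_words(int(m.group(1)))} dollars",
                      text)

    print(normalize_currency("The ticket costs $200 at the gate."))
    # -> "The ticket costs two hundred dollars at the gate."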


Semantic Compression
In natural language processing, semantic compression is a process of compacting a lexicon used to build a textual document (or a set of documents) by reducing language heterogeneity, while maintaining text semantics. As a result, the same ideas can be represented using a smaller set of words. In most applications, semantic compression is a lossy compression: increased prolixity does not compensate for the lexical compression, and an original document cannot be reconstructed in a reverse process.
By generalization
Semantic compression is basically achieved in two steps, using frequency dictionaries and a semantic network:
# determining cumulated term frequencies to identify the target lexicon,
# replacing less frequent terms with their hypernyms (generalizations) from the target lexicon.
Step 1 requires assembling word frequencies and information on semantic relationships, specifically hyponymy. Moving upwards in the word hierarchy, a cumulative concept frequency is calculated by a ...
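The hypernym-replacement step can be sketched with WordNet standing in as the semantic network. The sketch below makes strong simplifying assumptions (it needs the NLTK WordNet corpus, takes only the first noun sense of each word, and climbs only one hypernym level), so it illustrates the idea rather than the full procedure described above:

    from collections import Counter
    from nltk.corpus import wordnet as wn  # requires nltk.download("wordnet")

    def generalize(tokens, min_count=2):
        """Replace infrequent tokens with a WordNet hypernym lemma."""
        counts = Counter(tokens)
        out = []
        for tok in tokens:
            if counts[tok] >= min_count:
                out.append(tok)          # frequent enough: keep as-is
                continue
            synsets = wn.synsets(tok, pos=wn.NOUN)
            hypernyms = synsets[0].hypernyms() if synsets else []
            if hypernyms:
                # naive choice: first lemma of the first hypernym
                out.append(hypernyms[0].lemmas()[0].name())
            else:
                out.append(tok)          # no hypernym found: keep as-is
        return out

    print(generalize(["dog", "dog", "beagle", "cat"]))
    # e.g. ['dog', 'dog', 'hound', 'feline'] (exact output depends on WordNet)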


Lexical Substitution
Lexical substitution is the task of identifying a substitute for a word in the context of a clause. For instance, given the following text: "After the ''match'', replace any remaining fluid deficit to prevent chronic dehydration throughout the tournament", a substitute of ''game'' might be given. Lexical substitution is closely related to word sense disambiguation (WSD), in that both aim to determine the meaning of a word. However, while WSD consists of automatically assigning the appropriate sense from a fixed sense inventory, lexical substitution does not impose any constraint on which substitute to choose as the best representative for the word in context. By not prescribing the inventory, lexical substitution overcomes the issue of the granularity of sense distinctions and provides a level playing field for systems that automatically acquire word senses (a task referred to as Word Sense Induction).
Evaluation
In order to evaluate automatic systems on lexical subs ...
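A bare-bones candidate generator can be built by pooling WordNet synonym sets, as in the sketch below (again assuming the NLTK WordNet corpus is installed). Two caveats: WordNet synonymy is narrower than the substitutes human annotators accept (it may well miss "game" for "match"), and this sketch does no in-context ranking at all, which is the hard part of the task:

    from nltk.corpus import wordnet as wn  # requires nltk.download("wordnet")

    def substitution_candidates(word, pos=wn.NOUN):
        """Pool synonym lemmas of `word` across all its WordNet senses."""
        candidates = set()
        for synset in wn.synsets(word, pos=pos):
            for lemma in synset.lemmas():
                name = lemma.name().replace("_", " ")
                if name.lower() != word.lower():
                    candidates.add(name)
        return sorted(candidates)

    # Candidates pooled across every noun sense of "match"; a real system
    # would next rank these against the surrounding context.
    print(substitution_candidates("match"))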




Language Reform
Language reform is a kind of language planning that involves widespread change to a language. The typical methods of language reform are simplification and linguistic purism. Simplification regularises vocabulary, grammar, or spelling. Purism aligns the language with a form that is deemed 'purer'. Language reforms are intentional changes to language; this article does not cover natural language change, such as the Great Vowel Shift.
Simplification
By far the most common language reform is simplification. The most common simplification is spelling reform, but inflection, syntax, vocabulary and word formation can also be targets for simplification. For example, in English there are many prefixes that mean "the opposite of", e.g. ''un-'', ''in-'', ''a(n)-'', ''dis-'', and ''de-''. A language reform might propose to replace these redundant prefixes with one, such as ''un-''.
Purification
Linguistic purism or linguistic protectionism is the prescriptive practice of recognising one form of a lan ...


Meaning (linguistic)
Semantics (from Ancient Greek σημαντικός ''sēmantikós'', "significant") is the study of reference, meaning, or truth. The term can be used to refer to subfields of several distinct disciplines, including philosophy, linguistics and computer science.
History
In English, the study of meaning in language has been known by many names that involve the Ancient Greek word (''sema'', "sign, mark, token"). In 1690, a Greek rendering of the term ''semiotics'', the interpretation of signs and symbols, finds an early allusion in John Locke's ''An Essay Concerning Human Understanding'': The third Branch may be called σημειωτική [''simeiotikí'', "semiotics"], or the Doctrine of Signs, the most usual whereof being words, it is aptly enough termed also λογική, Logick. In 1831, the term is suggested for the third branch of division of knowledge akin to Locke; the "signs of our knowledge". In 1857, the term ''semasiology'' (borrowed from German ''Semasiologie'') is attested in Josiah W. Gibbs' '' ...


Controlled Natural Language
Controlled natural languages (CNLs) are subsets of natural languages that are obtained by restricting the grammar and vocabulary in order to reduce or eliminate ambiguity and complexity. Traditionally, controlled languages fall into two major types: those that improve readability for human readers (e.g. non-native speakers), and those that enable reliable automatic semantic analysis of the language. Languages of the first type (often called "simplified" or "technical" languages), for example ASD Simplified Technical English, Caterpillar Technical English and IBM's Easy English, are used in industry to increase the quality of technical documentation, and possibly to simplify the semi-automatic translation of the documentation. These languages restrict the writer with general rules such as "Keep sentences short", "Avoid the use of pronouns", "Only use dictionary-approved words", and "Use only the active voice". Languages of the second type have a formal syntax and semantics, and can ...
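Rules of the first kind are mechanical enough to check automatically. The toy checker below flags two of the rules quoted above; the 20-word limit and the pronoun list are illustrative values chosen here, not the actual thresholds of any published CNL:

    import re

    PRONOUNS = {"it", "they", "them", "this", "these", "that", "those"}
    MAX_WORDS = 20  # illustrative limit, not an official STE value

    def check_sentence(sentence):
        """Return a list of rule violations for one sentence."""
        words = re.findall(r"[A-Za-z']+", sentence)
        problems = []
        if len(words) > MAX_WORDS:
            problems.append(f"too long ({len(words)} words, limit {MAX_WORDS})")
        hits = sorted({w.lower() for w in words} & PRONOUNS)
        if hits:
            problems.append("pronouns found: " + ", ".join(hits))
        return problems

    print(check_sentence("Remove the bolts and put them in the tray."))
    # -> ['pronouns found: them']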