XCES

XCES is an XML-based standard for encoding text corpora of the kind used by linguists and natural language researchers. It is closely based on the earlier Corpus Encoding Standard (CES) of the Expert Advisory Group on Language Engineering Standards (EAGLES), but uses XML as its markup language. It supports simple corpora as well as annotated corpora, parallel corpora, and other types.

See also
* Text Encoding Initiative

External links
* Corpus Encoding Standard

Text Corpus
In linguistics, a corpus (plural ''corpora''), or text corpus, is a language resource consisting of a large, structured set of texts, nowadays usually stored and processed electronically. In corpus linguistics, corpora are used for statistical analysis and hypothesis testing, for checking occurrences, and for validating linguistic rules within a specific language territory. In search technology, a corpus is the collection of documents being searched.

Overview
A corpus may contain texts in a single language (''monolingual corpus'') or in multiple languages (''multilingual corpus''). To make corpora more useful for linguistic research, they are often annotated. One example of annotation is part-of-speech tagging, or ''POS-tagging'', in which information about each word's part of speech (verb, noun, adjective, etc.) is added to the corpus in the form o ...
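
As a rough illustration of what POS annotation adds to a corpus, the following Python sketch tags a tiny two-sentence corpus using a hand-written lookup table. The lexicon, the tag names, the `annotate` and `to_tagged_text` helpers, and the `word/TAG` output layout are all assumptions made for this example; real taggers are statistical or rule-based systems, and real corpora follow whatever tagset and encoding their project defines.

```python
# Toy lexicon mapping lowercase words to part-of-speech tags.
# Hypothetical and deliberately tiny; not taken from any real tagset.
TOY_LEXICON = {
    "the": "DET",
    "cat": "NOUN",
    "sat": "VERB",
    "on": "ADP",
    "mat": "NOUN",
}


def annotate(sentence, lexicon=TOY_LEXICON, unknown_tag="X"):
    """Return (token, tag) pairs for a whitespace-tokenized sentence.

    Words missing from the lexicon receive `unknown_tag`.
    """
    return [(tok, lexicon.get(tok.lower(), unknown_tag)) for tok in sentence.split()]


def to_tagged_text(pairs):
    """Render (token, tag) pairs as inline word/TAG text."""
    return " ".join(f"{tok}/{tag}" for tok, tag in pairs)


# A miniature monolingual corpus: one sentence per entry.
corpus = ["The cat sat on the mat", "The dog sat"]

annotated = [annotate(sentence) for sentence in corpus]
for pairs in annotated:
    print(to_tagged_text(pairs))
```

The annotated form keeps the original tokens and attaches one tag to each, so the raw text can still be recovered by dropping the tags. The word "dog" is tagged `X` here only because it is absent from the toy lexicon.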