machine learning Machine learning (ML) is a field of inquiry devoted to understanding and building methods that 'learn', that is, methods that leverage data to improve performance on some set of tasks. It is seen as a part of artificial intelligence. Machine ...

, semantic analysis of a

corpus Corpus is Latin for "body". It may refer to: Linguistics * Text corpus, in linguistics, a large and structured set of texts * Speech corpus, in linguistics, a large set of speech audio files * Corpus linguistics, a branch of linguistics Music * ...

is the task of building structures that approximate concepts from a large set of documents. It generally does not involve prior semantic understanding of the documents. A

metalanguage In logic and linguistics, a metalanguage is a language used to describe another language, often called the ''object language''. Expressions in a metalanguage are often distinguished from those in the object language by the use of italics, quota ...

based on predicate logic can analyze the speech of humans. Another strategy to understand the semantics of a text is

symbol grounding In cognitive science and semantics, the symbol grounding problem concerns how it is that words ( symbols in general) get their meanings, and hence is closely related to the problem of what meaning itself really is. The problem of meaning is i ...

. If language is grounded, it is equal to recognizing a machine readable meaning. For the restricted domain of spatial analysis, a computer based language understanding system was demonstrated. Latent semantic analysis (sometimes latent semantic indexing), is a class of techniques where documents are represented as vectors in term space. A prominent example is PLSI. Latent Dirichlet allocation involves attributing document terms to topics.

n-gram In the fields of computational linguistics and probability, an ''n''-gram (sometimes also called Q-gram) is a contiguous sequence of ''n'' items from a given sample of text or speech. The items can be phonemes, syllables, letters, words or ...

s and

hidden Markov models A hidden Markov model (HMM) is a statistical Markov model in which the system being modeled is assumed to be a Markov process — call it X — with unobservable ("''hidden''") states. As part of the definition, HMM requires that there be an ...

work by representing the term stream as a

Markov chain A Markov chain or Markov process is a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event. Informally, this may be thought of as, "What happen ...

where each term is derived from the few terms before it.

References

Machine learning {{Compsci-stub

See also

References