Co-occurrence

In linguistics, co-occurrence or cooccurrence (also known as coincidence or concurrence) is the above-chance frequency with which two terms from a text corpus occur alongside each other in a certain order. Co-occurrence in this linguistic sense can be interpreted as an indicator of semantic proximity or of an idiomatic expression. Corpus linguistics and its statistical analyses reveal patterns of co-occurrence within a language and make it possible to work out typical collocations for its lexical items. A ''co-occurrence restriction'' is identified when linguistic elements never occur together. Analysis of these restrictions can lead to discoveries about the structure and development of a language. Co-occurrence can be seen as an extension of word counting in higher dimensions. Co-occurrence can be quantitatively described using measures like correlation or mutual information.
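As a rough illustration of counting co-occurrences and scoring them, the sketch below counts ordered word pairs inside a small window and computes pointwise mutual information (PMI), a per-pair relative of mutual information. The function names, the `window` parameter, the toy sentence, and the simple probability estimates (pair counts normalized by the total number of pairs, word counts by the number of tokens) are all assumptions made for this example, not a standard taken from the article.

```python
import math
from collections import Counter


def cooccurrence_counts(tokens, window=1):
    """Count ordered pairs (left, right) where `right` follows `left`
    within `window` tokens. Hypothetical helper for illustration."""
    pairs = Counter()
    for i, left in enumerate(tokens):
        for right in tokens[i + 1 : i + 1 + window]:
            pairs[(left, right)] += 1
    return pairs


def pmi(pair, pairs, tokens):
    """Pointwise mutual information log2(p(x, y) / (p(x) * p(y))).

    Probabilities are crude relative frequencies (an assumption of this
    sketch). Returns -inf for a pair that never co-occurs.
    """
    if pairs[pair] == 0:
        return float("-inf")
    unigrams = Counter(tokens)
    x, y = pair
    p_xy = pairs[pair] / sum(pairs.values())
    p_x = unigrams[x] / len(tokens)
    p_y = unigrams[y] / len(tokens)
    return math.log2(p_xy / (p_x * p_y))


# Toy corpus, invented for this example.
tokens = "strong tea and strong coffee but weak tea".split()
pairs = cooccurrence_counts(tokens, window=1)
score = pmi(("strong", "tea"), pairs, tokens)
```

A positive score suggests the pair occurs together more often than independence would predict, while a pair that never occurs in the given order, such as `("tea", "strong")` here, scores negative infinity, which is one way a co-occurrence restriction could show up in the data. On a corpus this small the numbers are not meaningful; real analyses need large corpora and usually some smoothing.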


See also

* Distributional hypothesis
* Statistical semantics
* Co-occurrence matrix
* Co-occurrence networks
* Similarity measure
** Dice coefficient

