Lexical similarity

In linguistics, lexical similarity is a measure of the degree to which the word sets of two given languages are similar. A lexical similarity of 1 (or 100%) means a total overlap between vocabularies, whereas 0 means there are no common words.

There are different ways to define lexical similarity, and the results vary accordingly. For example, ''Ethnologue'''s method of calculation consists of comparing a regionally standardized wordlist (comparable to the Swadesh list) and counting those forms that show similarity in both form and meaning. Using this method, English was evaluated to have a lexical similarity of 60% with German and 27% with French.

Lexical similarity can be used to evaluate the degree of genetic relationship between two languages. Percentages higher than 85% usually indicate that the two languages being compared are likely to be related dialects.

Lexical similarity is only one indication of the mutual intelligibility of two languages, since the latter also depends on the degree of phonetic, morphological, and syntactic similarity. Results also vary with the wordlist used. For example, lexical similarity between French and English is considerable in lexical fields relating to culture, whereas their similarity is smaller as far as basic (function) words are concerned. Unlike mutual intelligibility, lexical similarity can only be symmetrical.
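
The wordlist-comparison idea described above can be illustrated with a minimal sketch. This is not ''Ethnologue'''s actual procedure, which relies on linguists' judgments of similarity in form and meaning. The sketch assumes that two forms count as similar when their normalized edit distance clears a hypothetical threshold of 0.5. The function names and the four-word toy wordlists are likewise assumptions made for illustration only.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        prev = curr
    return prev[-1]


def forms_similar(a: str, b: str, threshold: float = 0.5) -> bool:
    """Hypothetical stand-in for a linguist's judgment of similar form."""
    longest = max(len(a), len(b))
    if longest == 0:
        return True
    return 1 - edit_distance(a, b) / longest >= threshold


def lexical_similarity(list_a: dict, list_b: dict, threshold: float = 0.5) -> float:
    """Share of shared meanings whose forms are judged similar (0.0 to 1.0).

    Each wordlist maps a meaning (the wordlist slot) to a word form.
    """
    meanings = list_a.keys() & list_b.keys()
    if not meanings:
        return 0.0
    hits = sum(forms_similar(list_a[m], list_b[m], threshold) for m in meanings)
    return hits / len(meanings)


# Toy wordlists, far smaller than any real survey list (illustration only).
english = {"water": "water", "hand": "hand", "night": "night", "dog": "dog"}
german = {"water": "wasser", "hand": "hand", "night": "nacht", "dog": "hund"}

print(lexical_similarity(english, german))  # 0.75
```

Because edit distance does not depend on the order of its arguments, the score is the same whichever language is listed first, which matches the symmetry noted above. The result on four words says nothing about the real English-German figure; it only shows how the count is turned into a ratio.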


Indo-European languages

The table below shows some lexical similarity values for pairs of selected Romance, Germanic, and Slavic languages, as collected and published by ''Ethnologue''.

''Notes:''
* Language codes are from the ISO 639-3 standard.
* Roberto Bolognesi and Wilbert Heeringa found the average divergence between Sardinian and Italian to be around 48.7%, ranging from 46.6% for the least divergent dialect to 51.1% for the most divergent. That would make the various dialects of Sardinian slightly more divergent from Italian than Spanish is (average divergence from Italian of around 46.0%).
* "-" denotes that comparison data are not available.
* In the case of English-French lexical similarity, at least two other studies estimate the share of English words directly inherited from French at 28.3% and 41% respectively, with a further 28.24% and 15% respectively derived from Latin, putting English-French lexical similarity at around 0.56, with correspondingly lower English-German lexical similarities. Another study estimates the share of English words of Italic origin at 51%, consistent with the two previous analyses (Nation, I.S.P. 2001. ''Learning Vocabulary in Another Language''. Cambridge University Press, p. 477. ISBN 0-521-80498-1. https://books.google.com/books?id=sKqx8k8gYTkC).
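
The "around 0.56" figure in the last note appears to come from adding, for each study, the French-derived and Latin-derived shares. The short check below reproduces that arithmetic. The labels "study A" and "study B" are placeholders, since the note does not name the studies, and reading the figure as a simple sum is an assumption.

```python
# (French-derived %, Latin-derived %) as quoted in the note above.
# "study A" / "study B" are placeholder labels, not the studies' names.
studies = {
    "study A": (28.3, 28.24),
    "study B": (41.0, 15.0),
}

# Assumption: the combined share is the plain sum of the two percentages.
combined = {
    name: round((french + latin) / 100, 4)
    for name, (french, latin) in studies.items()
}

for name, share in combined.items():
    print(name, share)
```

Both sums come out at roughly 0.56 (0.5654 and 0.56), which is how two quite different breakdowns lead to the same overall estimate.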


See also

* Lexis (linguistics)
* Language family
* Dialect
* Linguistic distance


References


* ''Ethnologue.com'' (lexical similarity values available at some of the individual language entries)
* Definition of lexical similarity at ''Ethnologue.com''
* Rensch, Calvin R. 1992. "Calculating lexical similarity." In Eugene H. Casad (ed.), ''Windows on bilingualism'', 13-15. (Summer Institute of Linguistics and the University of Texas at Arlington Publications in Linguistics, 110). Dallas: Summer Institute of Linguistics and the University of Texas at Arlington.



External links


* Most similar languages
* A Similarity Database of Modern Lexicons
* Lexical similarity of 331 languages