Below are two estimates of the most common words in Modern Spanish. Each estimate comes from an analysis of a different text corpus. A ''text corpus'' is a large collection of samples of written and/or spoken language that has been carefully prepared for linguistic analysis. To determine which words are the most common, researchers create a database of all the words found in the corpus, and categorise them based on the context in which they are used.
The first table lists the 100 most common word forms from the Corpus de Referencia del Español Actual (CREA), a text corpus compiled by the Real Academia Española (RAE). The RAE is Spain's official institution for documenting, planning, and standardising the Spanish language. A ''word form'' is any of the grammatical variations of a word.
The second table is a list of the 100 most common lemmas found in a text corpus compiled by Mark Davies and other language researchers at Brigham Young University in the United States. A ''lemma'' is the primary form of a word, the one that would appear in a dictionary. The Spanish infinitive ''tener'' ("to have") is a lemma, while ''tiene'' ("has"), which is a conjugation of ''tener'', is a word form.
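The difference between counting word forms and counting lemmas can be sketched in a few lines of Python. This is only an illustration: the `LEMMA_OF` table and the sample sentence are hypothetical, and neither the RAE nor Davies is known to have used this code. A real corpus would rely on a full morphological lexicon or tagger rather than a hand-written dictionary.

```python
from collections import Counter

# Hypothetical toy lemma table (an assumption for illustration only).
LEMMA_OF = {
    "tiene": "tener",
    "tengo": "tener",
    "tener": "tener",
    "casas": "casa",
    "casa": "casa",
}

def count_word_forms(tokens):
    """Count every distinct spelling separately."""
    return Counter(tokens)

def count_lemmas(tokens, lemma_of=LEMMA_OF):
    """Map each token to its dictionary form before counting.

    Tokens missing from the table are counted under their own spelling.
    """
    return Counter(lemma_of.get(token, token) for token in tokens)

tokens = "tengo casa y ella tiene casas".split()
form_counts = count_word_forms(tokens)
lemma_counts = count_lemmas(tokens)
```

Here ''tengo'' and ''tiene'' are two separate word forms with one occurrence each, but both contribute to the single lemma ''tener''.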
Real Academia Española
The list below comes from "1000 formas más frecuentes" ("1000 most frequent forms"), a list published by the Real Academia Española (RAE) from analysis of more than 160 million word forms found in the Corpus de Referencia del Español Actual ("Reference Corpus of Current Spanish"), or CREA. CREA is a computerised corpus of texts written in Spanish, and of transcripts of spoken Spanish. It includes books, magazines, and newspapers with a wide variety of content, as well as transcripts of spoken language from radio and television broadcasts and other sources. All the works in the collection are from 1975 to 2004. CREA includes samples from all Spanish-speaking countries.
The list of "1000 most frequent word forms" comes from an analysis of CREA version 3.2. Plurals, verb conjugations, and other inflections are ranked separately. Homonyms, however, are not distinguished from one another. CREA 3.2 was published in June 2008.
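A ranking of this kind, where each inflected spelling is its own entry and identically spelled words are merged, can be approximated as follows. This is a minimal sketch and not the RAE's actual procedure: the function name `rank_word_forms`, the tokenising regular expression, and the sample text are all assumptions made for the example.

```python
import re
from collections import Counter

def rank_word_forms(text, top_n=None):
    """Rank surface spellings by frequency.

    Inflected forms (e.g. "casa" and "casas") stay separate, while
    words that are spelled the same (e.g. "vino" as "wine" and "vino"
    as "came") fall under one entry because only spelling is compared.
    """
    # Lower-case the text, then take runs of Spanish letters as tokens.
    tokens = re.findall(r"[a-záéíóúüñ]+", text.lower())
    return Counter(tokens).most_common(top_n)

sample = "El vino es bueno. Él vino tarde a la casa; las casas son grandes."
ranking = rank_word_forms(sample)
counts = dict(ranking)
```

In the sample, both senses of ''vino'' are counted together, whereas ''el'' and ''él'' remain distinct because the accent changes the spelling.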
Mark Davies
In 2006, Mark Davies, an associate professor of linguistics at Brigham Young University, published his estimate of the 5000 most common words in Modern Spanish. To make this list, he compiled samples only from 20th-century sources, especially from the years 1970 to 2000. Most of the sources are from the 1990s. Of the 20 million words in the corpus, about one-third (~6,750,000 words) come from transcripts of spoken Spanish: conversations, interviews, lectures, sermons, press conferences, sports broadcasts, and so on. Among the written sources are novels, plays, short stories, letters, essays, newspapers, and the encyclopedia ''Encarta''. The samples, written and spoken, come from Spain and at least 10 Latin American countries. Most of the samples were previously compiled for the Corpus del Español (2001), a 100 million-word corpus that includes works from the 13th century through the 20th.
The 5000 words in Davies' list are lemmas. A ''lemma'' is the form of the word as it would appear in a dictionary. Singular nouns and plurals, for example, are treated as the same word, as are infinitives and verb conjugations. The table below includes the top 100 words from Davies' list of 5000. This list distinguishes between the definite articles ''lo'' and ''la'' and the pronouns ''lo'' and ''la''; all are ranked individually. The adjectives ''ese'' and ''esa'' are ranked together (as are ''este'' and ''esta''), but the pronoun ''eso'' is separate. All conjugations of a verb are ranked together.
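One way to obtain rankings like these is to count pairs of lemma and part of speech instead of bare spellings. The sketch below assumes a small hand-tagged token list; the tag names and lemma assignments are hypothetical and are not taken from Davies' corpus or tagset.

```python
from collections import Counter

# Hypothetical hand-tagged tokens: (word form, lemma, part of speech).
tagged_tokens = [
    ("la", "la", "article"),
    ("casa", "casa", "noun"),
    ("la", "la", "pronoun"),
    ("veo", "ver", "verb"),
    ("esa", "ese", "adjective"),
    ("ese", "ese", "adjective"),
    ("eso", "eso", "pronoun"),
    ("vi", "ver", "verb"),
]

# Keying on (lemma, part of speech) keeps the article "la" apart from
# the pronoun "la", while merging "ese"/"esa" and all forms of "ver".
lemma_counts = Counter((lemma, pos) for _form, lemma, pos in tagged_tokens)
```

With this key, the article ''la'' and the pronoun ''la'' each get their own count, ''ese'' and ''esa'' share one entry, and ''eso'' stays separate.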
A highlighted row indicates that the word was found to occur especially frequently in samples of spoken Spanish.
[Davies (2006), p. 9]
External links
* Cardellino, Cristian (March 2016). "Spanish Billion Words Corpus and Embeddings". https://crscardellino.github.io/SBWCE/