Word Frequencies
A word list (or ''lexicon'') is a list of a language's words within some given text corpus, generally sorted by frequency of occurrence (either by levels or as a ranked list), serving the purpose of vocabulary acquisition. A lexicon sorted by frequency "provides a rational basis for making sure that learners get the best return for their vocabulary learning effort", but is mainly intended for course writers, not directly for learners. Frequency lists are also made for lexicographical purposes, serving as a sort of checklist to ensure that common words are not left out. Some major pitfalls are the corpus content, the corpus register, and the definition of "word". While word counting is a thousand years old, with gigantic analyses still done by hand in the mid-20th century, natural language electronic processing of large corpora such as movie subtitles (SUBTLEX megastudy) has accelerated the research field. In computatio ...
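As a rough illustration of a ranked frequency list, the following minimal sketch counts words in a small text. The function name `frequency_list`, the sample sentence, and the tokenization rule (lowercase runs of letters and apostrophes) are assumptions made for this example; as noted above, the definition of "word" is itself a pitfall, and a real study would choose it carefully.

```python
import re
from collections import Counter

def frequency_list(text):
    """Return (word, count) pairs ranked from most to least frequent."""
    # Assumption: a "word" is a lowercase run of letters or apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    # Sort by descending count, then alphabetically so ties are ordered predictably.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical sample corpus, for illustration only.
sample = "The cat sat on the mat. The mat was flat."
ranked = frequency_list(sample)
for word, count in ranked:
    print(f"{word}\t{count}")
```

Changing the tokenization rule (for example, grouping inflected forms under one headword) changes the resulting ranks, which is one reason lists built from the same corpus can differ.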




Lexicon
A lexicon is the vocabulary of a language or branch of knowledge (such as nautical or medical). In linguistics, a lexicon is a language's inventory of lexemes. The word ''lexicon'' derives from a Koine Greek word, the neuter form of an adjective meaning 'of or for words'. Linguistic theories generally regard human languages as consisting of two parts: a lexicon, essentially a catalogue of a language's words (its wordstock); and a grammar, a system of rules which allow for the combination of those words into meaningful sentences. The lexicon is also thought to include bound morphemes, which cannot stand alone as words (such as most affixes). In some analyses, compound words and certain classes of idiomatic expressions, collocations and other phrases are also considered to be part of the lexicon. Dictionaries are lists of the lexicon, in alphabetical order, of a given language; usually, however, bound morphemes are not included.
Size and organization
Items in the le ...


Semantic Compression
In natural language processing, semantic compression is a process of compacting a lexicon used to build a textual document (or a set of documents) by reducing language heterogeneity, while maintaining text semantics. As a result, the same ideas can be represented using a smaller set of words. In most applications, semantic compression is a lossy compression, that is, increased prolixity does not compensate for the lexical compression, and an original document cannot be reconstructed in a reverse process.
By generalization
Semantic compression is basically achieved in two steps, using frequency dictionaries and a semantic network:
# determining cumulated term frequencies to identify the target lexicon,
# replacing less frequent terms with their hypernyms (generalization) from the target lexicon.
Step 1 requires assembling word frequencies and information on semantic relationships, specifically hyponymy. Moving upwards in the word hierarchy, a cumulative concept frequency is calculated by a ...
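The two steps of compression by generalization can be sketched as follows. This is a toy illustration under stated assumptions, not the published method: the `HYPERNYM` map, the function names, the choice to add each term's count to all of its ancestors, and the fixed lexicon size are all hypothetical choices made for the example. A real system would draw hypernyms from a semantic network and frequencies from a frequency dictionary.

```python
from collections import Counter

# Hypothetical toy hierarchy (term -> hypernym); stands in for a semantic network.
HYPERNYM = {
    "poodle": "dog",
    "beagle": "dog",
    "siamese": "cat",
    "dog": "animal",
    "cat": "animal",
}

def cumulative_frequencies(term_counts, hypernym):
    """Step 1: add each term's count to the term itself and to every ancestor."""
    cumulated = Counter()
    for term, count in term_counts.items():
        node = term
        while node is not None:
            cumulated[node] += count
            node = hypernym.get(node)
    return cumulated

def target_lexicon(cumulated, size):
    """Keep the `size` concepts with the highest cumulated frequency."""
    ranked = sorted(cumulated.items(), key=lambda item: (-item[1], item[0]))
    return {term for term, _ in ranked[:size]}

def generalize(token, lexicon, hypernym):
    """Step 2: climb the hierarchy until a term from the target lexicon is found."""
    node = token
    while node is not None and node not in lexicon:
        node = hypernym.get(node)
    # Assumption: a token with no ancestor in the lexicon is left unchanged.
    return node if node is not None else token

tokens = ["poodle", "beagle", "dog", "siamese", "cat", "poodle"]
cumulated = cumulative_frequencies(Counter(tokens), HYPERNYM)
lexicon = target_lexicon(cumulated, size=3)
compressed = [generalize(t, lexicon, HYPERNYM) for t in tokens]
print(compressed)
```

The output uses fewer distinct words than the input, and the original tokens cannot be recovered from it, which matches the lossy character described above.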



Phonetic
Phonetics is a branch of linguistics that studies how humans produce and perceive sounds, or in the case of sign languages, the equivalent aspects of sign. Linguists who specialize in studying the physical properties of speech are phoneticians. The field of phonetics is traditionally divided into three sub-disciplines based on the research questions involved such as how humans plan and execute movements to produce speech (articulatory phonetics), how various movements affect the properties of the resulting sound (acoustic phonetics), or how humans convert sound waves to linguistic information (auditory phonetics). Traditionally, the minimal linguistic unit of phonetics is the phone—a speech sound in a language which differs from the phonological unit of phoneme; the phoneme is an abstract categorization of phones. Phonetics deals with two aspects of human speech: production—the ways humans make sounds—and perception—the way speech is understood. The communicative modali ...


Orthography
An orthography is a set of conventions for writing a language, including norms of spelling, hyphenation, capitalization, word breaks, emphasis, and punctuation. Most transnational languages in the modern period have a writing system, and most of these systems have undergone substantial standardization, thus exhibiting less dialect variation than the spoken language. These processes can fossilize pronunciation patterns that are no longer routinely observed in speech (e.g., "would" and "should"); they can also reflect deliberate efforts to introduce variability for the sake of national identity, as seen in Noah Webster's efforts to introduce easily noticeable differences between American and British spelling (e.g., "honor" and "honour"). Some nations (e.g. France and Spain) have established language academies in an attempt to regulate orthography officially. For most languages (including English) however, there are no such authorities and a sense of 'correct' orthography evol ...


Étienne Brunet
Étienne, a French analog of Stephen or Steven, is a masculine given name. An archaic variant of the name, prevalent up to the mid-17th century, is Estienne. Étienne, Etienne, Ettiene or Ettienne may refer to:
People
Scientists and inventors
*Étienne Bézout (1730–1783), French mathematician
*Étienne Louis Geoffroy (1725–1810), French entomologist and pharmacist
*Étienne Laspeyres (1834–1913), German professor of economics and statistics
*Étienne Lenoir (1822–1900), Belgian engineer who invented the first internal combustion engine to be produced in numbers
*Étienne Lenoir (instrument maker) (1744–1832), French scientific instrument maker and inventor of the repeating circle surveying instrument
*Étienne Mulsant (1797–1880), French entomologist and ornithologist
*Étienne Pascal (1588–1651), French lawyer, scientist and mathematician best known as the father of Blaise Pascal
*Étienne Geoffroy Saint-Hilaire (1772–1844), French naturalist
*Étienne Pierre V ...


New General Service List
The New General Service List (NGSL) is a list of 2,818 words (lemmas) claimed to be the core vocabulary of the English language, published by Dr. Charles Browne, Dr. Brent Culligan and Joseph Phillips in March 2013. The words in the NGSL represent the most important high-frequency words of the English language for second language learners of English, and the list is a major update of Michael West's 1953 GSL. Although there are more than 600,000 word families in the English language, the 2,800 words in the NGSL give more than 90% coverage for learners when trying to read most general texts of English. The main goals of the NGSL project were (1) to modernize and greatly increase the size of the corpus used by the original GSL, and (2) to create a list of words that provided a higher degree of coverage with fewer words than the original GSL. The 273-million-word subsection of the more than two-billion-word Cambridge English Corpus is about 100 times larger than the 2.5 million word corpus developed in the 1 ...
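Coverage, in the sense used above, is the share of running words in a text that belong to a given word list. The sketch below shows one way to compute it; the function names `coverage` and `words_needed` and the sample tokens are assumptions for illustration, and the example counts surface forms rather than lemmas, unlike the NGSL.

```python
from collections import Counter

def coverage(tokens, word_list):
    """Fraction of running tokens that appear in word_list."""
    if not tokens:
        return 0.0
    known = sum(1 for token in tokens if token in word_list)
    return known / len(tokens)

def words_needed(tokens, target):
    """Smallest number of top-frequency words whose coverage reaches target."""
    counts = Counter(tokens)
    total = len(tokens)
    covered = 0
    for rank, (_, count) in enumerate(counts.most_common(), start=1):
        covered += count
        if covered / total >= target:
            return rank
    return len(counts)

# Hypothetical sample text, already tokenized and lowercased.
tokens = "the cat sat on the mat the mat was flat".split()
print(coverage(tokens, {"the", "mat"}))
print(words_needed(tokens, 0.5))
```

Because a few very frequent words account for a large share of running text, a short list can reach high coverage, which is the rationale for lists of a few thousand items.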




General Service List
The General Service List (GSL) is a list of roughly 2,000 words published by Michael West in 1953. The words were selected to represent the most frequent words of English and were taken from a corpus of written English. The target audience was English language learners and ESL teachers. To maximize the utility of the list, some frequent words that overlapped broadly in meaning with words already on the list were omitted. In the original publication the relative frequencies of various senses of the words were also included.
Details
The list is important because a person who knows all the words on the list and their related families would understand approximately 90–95 percent of colloquial speech and 80–85 percent of common written texts. The list consists only of headwords, which means that the word "be" is high on the list, but assumes that the person is fluent in all forms of the word, e.g. am, is, are, was, were, being, and been. Researchers have expressed doubts about th ...



Internet Archive
The Internet Archive is an American digital library with the stated mission of "universal access to all knowledge". It provides free public access to collections of digitized materials, including websites, software applications/games, music, movies/videos, moving images, and millions of books. In addition to its archiving function, the Archive is an activist organization, advocating a free and open Internet. The Internet Archive holds over 35 million books and texts, 8.5 million movies, videos and TV shows, 894 thousand software programs, 14 million audio files, 4.4 million images, 2.4 million TV clips, 241 thousand concerts, and over 734 billion web pages in the Wayback Machine. The Internet Archive allows the public to upload and download digital material to its data cluster, but the bulk of its data is collected automatically by its web crawlers, which work to preserve as much of the public web as possible. Its web archive, the Wayback Machine, contains hu ...



Hellenistic
In Classical antiquity, the Hellenistic period covers the time in Mediterranean history after Classical Greece, between the death of Alexander the Great in 323 BC and the emergence of the Roman Empire, as signified by the Battle of Actium in 31 BC and the conquest of Ptolemaic Egypt the following year. The Ancient Greek word ''Hellas'' (''Hellás'') was gradually recognized as the name for Greece, from which the word ''Hellenistic'' was derived. "Hellenistic" is distinguished from "Hellenic" in that the latter refers to Greece itself, while the former encompasses all ancient territories under Greek influence, in particular the East after the conquests of Alexander the Great. After the Macedonian invasion of the Achaemenid Empire in 330 BC and its disintegration shortly after, the Hellenistic kingdoms were established throughout south-west Asia (Seleucid Empire, Kingdom of Pergamon), north-east Africa (Ptolemaic Kingdom) and South Asia (Greco-Bactrian Kingdom, Indo-Gree ...