The International Computer Archive of Modern and Medieval English (ICAME) is an international group of linguists and data scientists working in corpus linguistics to digitise English texts. The organisation was founded in Oslo, Norway, in 1977 as the International Computer Archive of Modern English, before being renamed to its current title. The portal to their materials is hosted at the University of Bergen, where they have set out the aim of the organisation to "collect and distribute information on English language material available for computer processing and on linguistic research to compile an archive of English text corpora in machine-readable form, and to make material available to research institutions."

Creating computer corpora, i.e. collections of texts in machine-readable form, is the most accessible way for modern scholars, including both "descriptive and more theoretically-minded linguists", to study both transcribed spoken language and various genres of written texts. The ICAME group hosts academic conferences that focus on corpus linguistic studies of historical changes and contemporary grammatical descriptions of English, and makes corpora of different varieties of English available to scholars, starting with editions of the 1960s
Brown Corpus.

ICAME's first academic conference was held in Bergen, Norway, in 1979, and scholars interested in corpus linguistics continued to meet each spring in different European and English-speaking countries. The compilation and distribution of corpora that these meetings enabled played a key role in the creation of the field of corpus linguistics in the 20th century, a precursor to current big data
analytics. In summarizing the field, Kennedy's ''Introduction to Corpus Linguistics'' notes that "for corpus linguists with an interest in the description of English, the International Computer Archive of Modern and Medieval English has been the major resource". The influence of ICAME on the field has also been laid out in Facchinetti's history, ''Corpus Linguistics Twenty-five Years On''.

One influential resource that ICAME made available was a CD of 20 different corpora, including those covering different regional Englishes (such as the Australian Corpus of English, the Wellington Corpus of Spoken New Zealand English, the Kolhapur Corpus of Indian English, the Bergen Corpus of London Teenage Language (COLT), the Helsinki Corpus of Older Scots, and the East African component of the International Corpus of English), as well as versions of the Brown Corpus and the Lancaster-Oslo/Bergen (LOB) corpus tagged for part of speech.

ICAME also publishes an annual journal, the ''ICAME Journal'', formerly ''ICAME News'', that contains articles, conference reports, reviews and notices related to corpus linguistics. The current editors of the ''ICAME Journal'' are Merja Kytö and Anna-Brita Stenström.
