Cambridge English Corpus

The Cambridge English Corpus (CEC), formerly the Cambridge International Corpus (CIC), is a multi-billion word corpus of the English language, containing both text corpus and spoken corpus data. It draws on a number of sources, covering written and spoken, British and American English. The CEC also contains the Cambridge Learner Corpus, a 40-million-word corpus made up of English exam responses written by English language learners.

The Cambridge English Corpus is used to inform Cambridge University Press English Language Teaching publications as well as for research in corpus linguistics. Access is currently restricted to authors and researchers working on projects and publications for Cambridge University Press, and researchers at Cambridge English Language Assessment.

The corpus contains instances of modern written English, taken from newspapers, magazines, novels, letters, emails, textbooks, websites, and many other sources. Its spoken data is taken from many sources, including everyday conversations, telephone calls, radio broadcasts, presentations, speeches, meetings, TV programmes and lectures.


Cambridge Learner Corpus

The Cambridge Learner Corpus (CLC) is a collection of exam scripts written by students learning English, built in collaboration with Cambridge English Language Assessment. The CLC contains scripts from over 180,000 students, from around 200 countries, speaking 138 different first languages, and is growing all the time. The exams currently included are:

* KET Key English Test (and KET for schools)
* PET Preliminary English Test (and PET for schools)
* FCE First Certificate in English
* CAE Certificate in Advanced English
* CPE Certificate of Proficiency in English
* BEC Business English Certificate (all levels)
* IELTS International English Language Testing System (academic and general training)
* CELS Certificates in English Language Skills
* ILEC International Legal English Certificate
* ICFE International Certificate in Financial English
* Skills for Life

A unique feature of the Cambridge Learner Corpus is its error coding system. Language specialists identify and annotate errors in the exam scripts. This means that the corpus can be used to find out about the frequency of different types of errors, the contexts in which the errors are made and the student groups that find particular language areas difficult. Authors of Cambridge English Language Teaching resources can use this information to target common errors – for example, the Cambridge Advanced Learner’s Dictionary contains ‘Common mistake’ features which highlight frequent learner errors.

Conversely, the error coding system also reveals what students can achieve at each level. This is central to the work of English Profile, a collaborative programme to enhance the learning, teaching and assessment of English worldwide. The founding partners are Cambridge University Press, Cambridge English Language Assessment, the University of Cambridge, the University of Bedfordshire, the British Council and English UK. The project’s aim is to describe what learners know and can do in English at each level of the Common European Framework of Reference (CEFR).


Specialized corpora

The Cambridge English Corpus contains a number of specialized corpora:


Cambridge Business English Corpus

The Cambridge Business English Corpus is a large collection of British and American business language, including reports and documents, books relating to different aspects of business, and the business sections from many national newspapers. The Cambridge Business English Corpus also includes the Cambridge and Nottingham Spoken Business English Corpus (CANBEC), the result of a joint project between Cambridge University Press and the University of Nottingham. This is a collection of recordings of English from companies of all sizes, ranging from big multinational companies to small partnerships. It contains formal and informal meetings, presentations, telephone conversations, lunchtime conversations, and spoken language from other business situations.


Cambridge Legal English Corpus

The Cambridge Legal English Corpus contains books, journals and newspaper articles relating to the law and legal processes.


Cambridge Financial English Corpus

The Cambridge Financial English Corpus contains texts relating to economics and finance, including leading financial magazines and newspapers.


Cambridge Academic English Corpus

The Cambridge Academic English Corpus contains written and spoken academic language at undergraduate and postgraduate level from a range of US and UK institutions, including lectures, seminars, student presentations, journals, essays and textbooks.


CANCODE

The Cambridge and Nottingham Corpus of Discourse in English (CANCODE) is a collection of spoken English recorded at hundreds of locations across the British Isles in a wide variety of situations (e.g. casual conversation, socialising, finding out information, and discussions). The CANCODE corpus is the result of a joint project between Cambridge University Press and the University of Nottingham. There are about five million words in the CANCODE corpus, and it is a very rich resource for researchers of spoken English. However, the data does have some limitations. Most speakers knew they were being recorded and were chatting in informal situations, such as relaxing at home, with others of fairly equal social status. This means the interactions are generally consensual and collaborative, so the corpus has minimal evidence of conflict or adversarial exchanges (Carter (2004) Language and Creativity: The Art of Common Talk. London: Routledge).


Cambridge-Cornell Corpus of Spoken North American English

The Cambridge University Press/Cornell Corpus is a large collection of informal, highly interactive, multiparty conversations between family and friends in North America. The Cambridge-Cornell corpus is the result of a joint project between Cambridge University Press and Cornell University.


CAMSNAE

The Cambridge Corpus of Spoken North American English (CAMSNAE) is a large collection of spoken American English. It includes recordings of people going about their everyday life – at work, at home with their families, going shopping, having meals, etc.


See also

* Cambridge English Language Assessment
* Corpus linguistics
* English Profile


References


External links


cambridge.org/corpus