The Lancaster-Oslo/Bergen (LOB) Corpus is a million-word collection of British English texts which was compiled in the 1970s in collaboration between the University of Lancaster, the University of Oslo, and the Norwegian Computing Centre for the Humanities, Bergen, to provide a British counterpart to the Brown Corpus compiled by Henry Kučera and W. Nelson Francis
for American English in the 1960s.
Its composition was designed to match the original Brown Corpus as closely as possible in size and genres, using documents published in the UK by British authors. Both corpora consist of 500 samples, each comprising about 2,000 words, in the following genres:
The corpus has also been tagged, i.e. part-of-speech categories have been assigned to every word.
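As a rough illustration of what a tagged corpus allows, the sketch below parses a line in which every word carries a part-of-speech label and counts how often each label occurs. The `word_TAG` layout, the sample sentence, the tag labels, and the helper name `parse_tagged` are all assumptions made for this example. They are not taken from the LOB documentation or its tagset.

```python
from collections import Counter

# Hypothetical tagged sentence in an assumed "word_TAG" layout; the tag
# labels are illustrative and not taken from the LOB documentation.
tagged_sentence = "the_DET corpus_NOUN contains_VERB British_ADJ texts_NOUN"


def parse_tagged(line):
    """Split a line of word_TAG tokens into (word, tag) pairs."""
    pairs = []
    for token in line.split():
        # rpartition splits on the last underscore, so a word that itself
        # contains an underscore is kept intact.
        word, _, tag = token.rpartition("_")
        pairs.append((word, tag))
    return pairs


pairs = parse_tagged(tagged_sentence)

# Count how many words were assigned each part-of-speech label.
tag_counts = Counter(tag for _, tag in pairs)
```

Because every word has exactly one label, frequency counts by category of this kind can be computed directly from the tagged text.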
External links
LOB Corpus Manual
LOB Corpus from the Oxford Text Archive