LOB Corpus

The Lancaster-Oslo/Bergen (LOB) Corpus is a million-word collection of British English texts compiled in the 1970s in collaboration between the University of Lancaster, the University of Oslo, and the Norwegian Computing Centre for the Humanities, Bergen, to provide a British counterpart to the Brown Corpus compiled by Henry Kučera and W. Nelson Francis for American English in the 1960s. Its composition was designed to match the original Brown Corpus as closely as possible in size and genres, using documents published in the UK by British authors. Both corpora consist of 500 samples, each comprising about 2,000 words, drawn from the same set of genres. The corpus has also been tagged, i.e. part-of-speech categories have been assigned to every word.


External links


LOB Corpus Manual

LOB Corpus from the Oxford Text Archive