Scottish Corpus of Texts and Speech

The Scottish Corpus of Texts & Speech (SCOTS) is an ongoing project to build a corpus of modern-day (post-1940) written and spoken texts in Scottish English and varieties of Scots. SCOTS has been available online since November 2004, and can be freely searched and browsed. It reached 4.7 million words by 2015. The project is a venture by the Department of English Language and the STELLA project at the University of Glasgow. SCOTS is grant-funded by the Arts and Humanities Research Council.


Language variety

SCOTS contains texts in Scottish English and varieties of broad Scots, including Doric, Lallans, Insular Scots, and urban varieties such as Glaswegian. The texts are spread both geographically and demographically. Each text is accompanied by extensive metadata, including such information as the author's decade of birth, gender, occupation, birthplace and place of residence, and details about the text such as publication information, audience, date and genre.


Genre and mode

SCOTS is a multimedia corpus containing both written and spoken texts. The spoken texts are available as orthographic transcriptions, accompanied by the source audio or video files. SCOTS includes a large number of genres and text types, including prose fiction, poetry, business and personal correspondence, religious texts, parliamentary and administrative documents, emails, conversations and interviews.


Search and analysis

SCOTS can be investigated in various ways, depending on the user's interest. The corpus can be browsed, for example by the author's name or date of the text, and all texts can be downloaded in plain text format. Transcriptions are synchronised with audio/video files, which are streamed and may also be downloaded. An Advanced Search facility allows the user to build up more complex queries, choosing from all the fields available in the metadata. Geographical results are plotted on an interactive map, so regional variation may be investigated. Advanced Search results can also be viewed as a KWIC concordance, which can be reordered to highlight collocational patterns.
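
To illustrate what a KWIC (keyword in context) concordance is and why reordering it helps, the following is a minimal Python sketch. It is not the SCOTS implementation: the function names (kwic, sort_by_right), the tokenisation rule and the sample sentence are all assumptions made up for illustration. Each hit is stored with a few words of left and right context, and sorting on the right context places lines with the same following word next to each other, which is how recurring collocations become visible.

```python
import re

# Invented sample text for illustration only; it is not taken from SCOTS.
SAMPLE = "The wee dog ran hame. A wee lass sang. The wee dog barked."


def kwic(text, keyword, width=2):
    """Return one (left, node, right) tuple per occurrence of keyword.

    left and right are lists holding up to `width` context tokens.
    Tokenisation here is a simple lowercase word split (an assumption).
    """
    tokens = re.findall(r"[\w']+", text.lower())
    node = keyword.lower()
    rows = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            rows.append((left, tok, right))
    return rows


def sort_by_right(rows):
    """Reorder concordance lines alphabetically by their right context."""
    return sorted(rows, key=lambda row: row[2])


def format_row(row, width=2):
    """Render one concordance line with the node word in a fixed column."""
    left, node, right = row
    left_text = " ".join(left).rjust(width * 8)
    return f"{left_text}  [{node}]  {' '.join(right)}"


if __name__ == "__main__":
    for row in sort_by_right(kwic(SAMPLE, "wee")):
        print(format_row(row))
```

After sorting, the two lines in which "wee" is followed by "dog" sit next to each other, ahead of the line with "lass". Sorting on the left context instead (for example with a key of the reversed left list) would group lines by the preceding word.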

