A corpus language is a language that has no living speakers but for which numerous records produced by its native speakers survive.
[Langslow, D.R. 2002. "Approaching bilingualism in corpus languages", in James Noel Adams, Mark Janse, Simon Swain (edd.), ''Bilingualism in Ancient Society: Language Contact and the Written Text''. Oxford: OUP.] Examples of corpus languages are Ancient Greek, Latin, the Egyptian language, Old English and Elamite.
Some corpus languages, such as Ancient Greek and Latin, left very large corpora and can therefore be fully reconstructed, although some details of pronunciation may remain unclear. Such languages can be used even today, as is the case with Sanskrit and Latin. Others have such limited corpora that some important words (e.g., some pronouns) are not attested at all. Examples of these are Ugaritic and Gothic. Languages attested only by a few words, often names, and a few phrases (called ''Trümmersprachen'' in German linguistics, literally "rubble languages") can be reconstructed only in a very limited way, and their genetic relationship to other languages often remains unclear. Examples are the Lombardic language and Dadanitic, a Semitic language that may be close to Classical Arabic.
Corpus languages are studied using the methods of corpus linguistics, which are also commonly applied to the writings and other records of living languages.
Not all extinct languages are corpus languages, since there are many extinct languages for which few or no writings or other records survive.
See also
* Endangered language
* Language death