
In punctuation, a word divider is a form of glyph which separates written words. In languages which use the Latin, Cyrillic, and Arabic alphabets, as well as other scripts of Europe and West Asia, the word divider is a blank space, or ''whitespace''. This convention is spreading, along with other aspects of European punctuation, to Asia and Africa, where words are usually written without word separation. In character encoding, word segmentation depends on which characters are defined as word dividers.
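The dependence of segmentation on the chosen divider set can be sketched in Python. This is a minimal illustration under stated assumptions, not a standard algorithm: the function name `split_words` and the sample string are hypothetical and were chosen only for this example.

```python
def split_words(text, dividers):
    """Split text into words, treating every character in `dividers` as a word divider."""
    words = []
    current = []
    for ch in text:
        if ch in dividers:
            # A divider ends the current word; runs of dividers produce no empty words.
            if current:
                words.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        words.append("".join(current))
    return words


# Hypothetical sample: Latin capitals separated by interpuncts (U+00B7).
sample = "ARMA\u00b7VIRVMQVE\u00b7CANO"

# With only the space defined as a divider, the whole string is one "word".
print(split_words(sample, {" "}))

# With the interpunct added to the divider set, three words are found.
print(split_words(sample, {" ", "\u00b7"}))
```

The same input yields different segmentations purely because the set of divider characters differs, which is the point made in the paragraph above.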


History

In Ancient Egyptian, determinatives may have been used as much to demarcate word boundaries as to disambiguate the semantics of words. Rarely in Assyrian cuneiform, but commonly in the later cuneiform Ugaritic alphabet, a vertical stroke 𒑰 was used to separate words. In Old Persian cuneiform, a diagonally sloping wedge 𐏐 was used. As the alphabet spread throughout the ancient world, words were often run together without division, and this practice remains or remained until recently in much of South and Southeast Asia. However, inscriptions not infrequently used a vertical line, and manuscripts a single (·), double (:), or triple (⁝) interpunct (dot), to divide words. This practice was found in Phoenician, Aramaic, Hebrew, Greek, and Latin, and continues today with Ethiopic, though there whitespace is gaining ground.


Scriptio continua

The early alphabetic writing systems, such as the Phoenician alphabet, had only signs for consonants (although some signs for consonants could also stand for a vowel, so-called ''matres lectionis''). Without some form of visible word dividers, parsing a text into its separate words would have been a puzzle. With the introduction of letters representing vowels in the Greek alphabet, the need for inter-word separation lessened. The earliest Greek inscriptions used interpuncts, as was common in the writing systems which preceded it, but soon the practice of ''scriptio continua'', continuous writing in which all words ran together without separation, became common.
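The puzzle of reading text without dividers can be illustrated with a small Python sketch. It is only an analogy under stated assumptions: it uses a modern English string and a tiny hypothetical vocabulary (`VOCAB`), not an ancient text, and the function name `segmentations` is invented for this example. The sketch enumerates every way to split an undivided string into known words.

```python
def segmentations(text, vocabulary):
    """Return every way to split `text` into words drawn from `vocabulary`."""
    if not text:
        return [[]]  # one way to segment the empty string: no words
    results = []
    for end in range(1, len(text) + 1):
        head = text[:end]
        if head in vocabulary:
            # Keep this word and segment the remainder recursively.
            for rest in segmentations(text[end:], vocabulary):
                results.append([head] + rest)
    return results


# Hypothetical vocabulary, chosen so that the input is ambiguous.
VOCAB = {"GOD", "IS", "NOW", "HERE", "NOWHERE"}

for words in segmentations("GODISNOWHERE", VOCAB):
    print(" ".join(words))
```

With this vocabulary the undivided string has two valid readings, which a written divider would have disambiguated at once.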


Types


None

Alphabetic writing without inter-word separation, known as ''scriptio continua'', was used in Ancient Egyptian. It appeared in Post-classical Latin after several centuries of the use of the interpunct. Traditionally, ''scriptio continua'' was used for the Indic alphabets of South and Southeast Asia and hangul of Korea, but spacing is now used with hangul and increasingly with the Indic alphabets. Today Chinese and Japanese are the most widely used scripts consistently written without punctuation to separate words, though other scripts such as Thai and Lao also follow this writing convention. In Classical Chinese, a word and a character were almost the same thing, so that word dividers would have been superfluous. Although Modern Mandarin has numerous polysyllabic words, and each syllable is written with a distinct character, the conceptual link between character and word or at least morpheme remains strong, and no need is felt for word separation apart from what characters already provide. This link is also found in the Vietnamese language; however, in the Vietnamese alphabet, virtually all syllables are separated by spaces, whether or not they form word boundaries.


Space

Space is the most common word divider, especially in Latin script.


Vertical lines

Ancient inscribed and cuneiform scripts such as Anatolian hieroglyphs frequently used short vertical lines to separate words, as did Linear B. In manuscripts, vertical lines were more commonly used for larger breaks, equivalent to the Latin comma and period. This continues with many Indic scripts today (the danda).


Interpunct, multiple dots, and hypodiastole

As noted above, the single and double interpunct were used in manuscripts (on paper) throughout the ancient world. For example, Ethiopic inscriptions used a vertical line, whereas manuscripts used double dots (፡) resembling a colon. The latter practice continues today, though the space is making inroads. Classical Latin used the interpunct in both paper manuscripts and stone inscriptions (Wingo 1972:16). Ancient Greek orthography used between two and five dots as word separators, as well as the hypodiastole.


Different letter forms

In the modern Hebrew and Arabic alphabets, some letters have distinct forms at the ends and/or beginnings of words. This demarcation is used in addition to spacing.


Vertical arrangement

The Nastaʿlīq form of Islamic calligraphy uses vertical arrangement to separate words. The beginning of each word is written higher than the end of the preceding word, so that a line of text takes on a sawtooth appearance. Nastaliq spread from Persia and today is used for Persian, Uyghur, Pashto, and Urdu.


Pause

In finger spelling and in Morse code, words are separated by a pause.


Unicode

For use with computers, these marks have codepoints in Unicode, including separate codepoints for the word separators of the Linear B script.
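As a hedged illustration, the Python sketch below looks up the code point and official Unicode name of a few word-divider characters using the standard `unicodedata` module. The selection in `DIVIDERS` is an assumption made for this example (the space, the interpunct, the Ethiopic double dot, and the Old Persian wedge, all of which are mentioned in the text); it is not a reconstruction of the full list for this section, and the helper name `describe` is hypothetical.

```python
import unicodedata

# Illustrative selection only; escapes are used so the code points are explicit.
DIVIDERS = [
    "\u0020",      # blank space
    "\u00b7",      # single interpunct
    "\u1361",      # Ethiopic double dot
    "\U000103D0",  # Old Persian diagonal wedge
]


def describe(ch):
    """Format a character as 'U+XXXX NAME' using the Unicode Character Database."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


for ch in DIVIDERS:
    print(describe(ch))
```

The names come from the Unicode data bundled with the Python interpreter, so very old interpreters may lack entries for recently added characters.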


See also

* Whitespace
* Sentence spacing
* Speech segmentation
* Zero-width non-joiner
* Zero-width space
* Substitute blank
* Underscore

