In punctuation, a word divider is a form of glyph which separates written words. In languages which use the Latin, Cyrillic, and Arabic alphabets, as well as other scripts of Europe and West Asia, the word divider is a blank space, or ''whitespace''. This convention is spreading, along with other aspects of European punctuation, to Asia and Africa, where words are usually written without word separation.
In character encoding, word segmentation depends on which characters are defined as word dividers.
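To make that dependence concrete, here is a minimal Python sketch that splits text on a caller-supplied set of divider characters. The two divider sets, `SPACE_ONLY` and `EXTENDED`, and the `segment` function are hypothetical names chosen for illustration; they are not an official Unicode or library definition of "word divider" (Unicode's own rules for word boundaries are considerably more involved).

```python
import re
from typing import Iterable, List

# Hypothetical divider sets chosen for illustration; neither is an official
# Unicode or library definition of "word divider".
SPACE_ONLY = {" "}
EXTENDED = {
    " ",       # U+0020 SPACE
    "\u00b7",  # U+00B7 MIDDLE DOT, used here as a Latin-style interpunct
    "\u1361",  # U+1361 ETHIOPIC WORDSPACE, the Ethiopic double dot
    "\u200b",  # U+200B ZERO WIDTH SPACE, an invisible boundary marker
}


def segment(text: str, dividers: Iterable[str]) -> List[str]:
    """Split text wherever a divider occurs; runs of dividers count as one break."""
    chars = sorted(set(dividers))
    if not chars:
        # With no dividers defined, the whole text is a single "word".
        return [text] if text else []
    # re.escape keeps characters such as "]" or "-" from acting as regex syntax.
    pattern = "[" + "".join(re.escape(c) for c in chars) + "]+"
    return [word for word in re.split(pattern, text) if word]


inscription = "SENATVS\u00b7POPVLVSQVE\u00b7ROMANVS"
print(segment(inscription, SPACE_ONLY))  # one token: the interpunct is not a divider
print(segment(inscription, EXTENDED))    # three tokens
```

The same string yields one "word" or three depending only on whether the interpunct is in the divider set, which is the point the paragraph makes.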
History
In Ancient Egyptian, determinatives may have been used as much to demarcate word boundaries as to disambiguate the semantics of words. Rarely in Assyrian cuneiform, but commonly in the later cuneiform Ugaritic alphabet, a vertical stroke 𒑰 was used to separate words. In Old Persian cuneiform, a diagonally sloping wedge 𐏐 was used.
As the alphabet spread throughout the ancient world, words were often run together without division, a practice that remains, or remained until recently, in much of South and Southeast Asia. However, inscriptions not infrequently used a vertical line, and manuscripts a single (·), double (:), or triple (⁝) interpunct (dot), to divide words. This practice was found in Phoenician, Aramaic, Hebrew, Greek, and Latin, and continues today with Ethiopic, though whitespace is gaining ground there.
Scriptio continua
The early alphabetic writing systems, such as the Phoenician alphabet, had only signs for consonants (although some signs for consonants could also stand for a vowel, so-called ''matres lectionis''). Without some form of visible word divider, parsing a text into its separate words would have been a puzzle. With the introduction of letters representing vowels in the Greek alphabet, the need for inter-word separation lessened. The earliest Greek inscriptions used interpuncts, as was common in the writing systems that preceded it, but soon the practice of ''scriptio continua'', continuous writing in which all words ran together without separation, became common.
Types
None
Alphabetic writing without inter-word separation, known as ''scriptio continua'', was used in Ancient Egyptian. It appeared in post-classical Latin after several centuries of interpunct use.
Traditionally, ''scriptio continua'' was used for the Indic alphabets of South and Southeast Asia and for Hangul in Korea, but spacing is now used with Hangul and increasingly with the Indic alphabets.
Today, Chinese and Japanese are the most widely used scripts consistently written without punctuation to separate words, though other scripts, such as Thai and Lao, also follow this convention. In Classical Chinese, a word and a character were almost the same thing, so word dividers would have been superfluous. Although Modern Mandarin has numerous polysyllabic words, and each syllable is written with a distinct character, the conceptual link between character and word, or at least morpheme, remains strong, and no need is felt for word separation beyond what the characters already provide. This link is also found in the Vietnamese language; however, in the Vietnamese alphabet, virtually all syllables are separated by spaces, whether or not they form word boundaries.
Space
Space is the most common word divider, especially in Latin script.
Vertical lines
Ancient inscribed and cuneiform scripts such as Anatolian hieroglyphs frequently used short vertical lines to separate words, as did Linear B. In manuscripts, vertical lines were more commonly used for larger breaks, equivalent to the Latin comma and period. This continues with many Indic scripts today (the danda).
Interpunct, multiple dots, and hypodiastole
As noted above, the single and double interpunct were used in manuscripts (on paper) throughout the ancient world. For example, Ethiopic inscriptions used a vertical line, whereas manuscripts used double dots (፡) resembling a colon. The latter practice continues today, though the space is making inroads. Classical Latin used the interpunct in both paper manuscripts and stone inscriptions.[(Wingo 1972:16)] Ancient Greek orthography used between two and five dots as word separators, as well as the hypodiastole.
Different letter forms
In the modern Hebrew and Arabic alphabets, some letters have distinct forms at the ends and/or beginnings of words. This demarcation is used in addition to spacing.
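In Unicode, the five Hebrew word-final letters (kaf, mem, nun, pe, tsadi) are encoded as separate characters, so a program can see the boundary information that these shapes carry. The following is a minimal Python sketch of that idea using the standard `unicodedata` module; `is_final_form` and `split_after_final_forms` are hypothetical helpers written for illustration, not a real segmentation method. Arabic positional forms, by contrast, are normally selected by the text-shaping engine from a single stored code point, so this check would not carry over to Arabic text.

```python
import unicodedata
from typing import List


def is_final_form(ch: str) -> bool:
    # The Hebrew final letters have their own code points, and their Unicode
    # names all start with "HEBREW LETTER FINAL " (KAF, MEM, NUN, PE, TSADI).
    return unicodedata.name(ch, "").startswith("HEBREW LETTER FINAL ")


def split_after_final_forms(text: str) -> List[str]:
    """Hypothetical heuristic: start a new chunk after every final-form letter.

    Words ending in a letter that has no final form are not separated, so this
    only shows that letter shapes carry boundary information; it is not a
    Hebrew word segmenter.
    """
    chunks, current = [], []
    for ch in text:
        current.append(ch)
        if is_final_form(ch):
            chunks.append("".join(current))
            current = []
    if current:
        chunks.append("".join(current))
    return chunks


shalom = "\u05e9\u05dc\u05d5\u05dd"  # shalom, ending in final mem (U+05DD)
olam = "\u05e2\u05d5\u05dc\u05dd"    # olam, also ending in final mem
print(split_after_final_forms(shalom + olam))  # the two words, recovered without a space
```

Because final mem differs from the ordinary mem (U+05DE), the run of letters can be cut after it even though no space was written. The limitation noted in the docstring is why the article calls this demarcation an addition to spacing rather than a replacement for it.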
Vertical arrangement

The Nastaʿlīq form of Islamic calligraphy uses vertical arrangement to separate words. The beginning of each word is written higher than the end of the preceding word, so that a line of text takes on a sawtooth appearance. Nastaliq spread from Persia and today is used for Persian, Uyghur, Pashto, and Urdu.
Pause
In finger spelling and in Morse code, words are separated by a pause.
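In Morse code the pause is defined by length. Under the conventional International Morse timing, a dot lasts one unit, a dash three, the gap inside a letter one, the gap between letters three, and the gap between words seven. The minimal Python sketch below renders text as an on/off signal and then cuts it back into words at the long silences. The four-letter code table and the function names `key_signal` and `split_words` are illustrative assumptions, not part of any standard API.

```python
import re

# A deliberately tiny code table, only enough letters for the examples below;
# a real encoder would need the full International Morse alphabet.
MORSE = {"E": ".", "T": "-", "S": "...", "O": "---"}

# Conventional International Morse timing, in units of one dot length.
DOT, DASH = 1, 3
MARK_GAP = 1     # silence between dots/dashes inside one letter
LETTER_GAP = 3   # silence between letters of the same word
WORD_GAP = 7     # the longer pause that acts as the word divider


def key_signal(text: str) -> str:
    """Encode text as one character per time unit: '1' = tone on, '0' = off."""
    words = []
    for word in text.upper().split():
        letters = []
        for letter in word:
            marks = ["1" * (DOT if m == "." else DASH) for m in MORSE[letter]]
            letters.append(("0" * MARK_GAP).join(marks))
        words.append(("0" * LETTER_GAP).join(letters))
    return ("0" * WORD_GAP).join(words)


def split_words(signal: str) -> list:
    """Recover word-sized chunks by cutting at silences of WORD_GAP units or more."""
    return [w for w in re.split("0{%d,}" % WORD_GAP, signal) if w]


print(key_signal("E T"))               # 10000000111
print(split_words(key_signal("E T")))  # ['1', '111']
```

Only the seven-unit silence is treated as a divider; the three-unit gaps inside "SOS" separate letters, so `split_words(key_signal("SOS"))` returns the whole signal as a single word.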
Unicode
For use with computers, these marks have code points in Unicode:
*
*
*
*
*
*
[''Punctuation'' § 5. Papyrological Punctuation]
*
*
In Linear B script:
*
*
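A code point and its character name can be looked up with Python's standard `unicodedata` module. The sketch below lists several dividers that appear in this article (the interpunct, the triple dot, the Ethiopic double dot, the cuneiform vertical stroke, and the Old Persian wedge). The choice of characters and the `describe` helper are illustrative. The names printed come from whatever Unicode database version the local Python ships with.

```python
import unicodedata

# Word-divider characters that appear in the text of this article.
DIVIDERS = [
    "\u00b7",      # interpunct (single dot)
    "\u205d",      # triple-dot interpunct
    "\u1361",      # Ethiopic double dot
    "\U00012470",  # cuneiform vertical stroke (Old Assyrian word divider)
    "\U000103d0",  # Old Persian diagonal wedge
]


def describe(ch: str) -> str:
    """Format a character as 'U+XXXX NAME' using Python's Unicode database."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"


for ch in DIVIDERS:
    print(describe(ch))
```

The `'<unnamed>'` default keeps the loop from raising `ValueError` if a character is missing from an older Unicode database.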
See also
* Whitespace
* Sentence spacing
* Speech segmentation
* Zero-width non-joiner
* Zero-width space
* Substitute blank
* Underscore
References
Further reading
*
*
*
*
*