Syllabification or syllabication, also known as hyphenation, is the separation of a word into syllables, whether spoken, written or signed.
Overview
The written separation into syllables is usually marked by a hyphen when using English orthography (e.g., syl-la-ble) and with a period when transcribing the actually spoken syllables in the International Phonetic Alphabet. For presentation purposes, typographers may use an interpunct (Unicode character U+00B7, e.g., syl·la·ble), a special-purpose "hyphenation point" (U+2027, e.g., syl‧la‧ble), or a space (e.g., syl la ble).
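As a small illustration of these marking conventions, the following Python sketch joins a list of syllables with each of the separators mentioned above. The function name `mark_syllables` and the style labels are made up for this example; only the code points U+00B7 and U+2027 come from the text.

```python
# Separator characters named in the text; the dictionary keys are
# illustrative labels, not standard terminology.
SEPARATORS = {
    "hyphen": "-",                  # ordinary orthographic hyphen
    "interpunct": "\u00b7",         # U+00B7 MIDDLE DOT
    "hyphenation point": "\u2027",  # U+2027 HYPHENATION POINT
    "space": " ",
}


def mark_syllables(parts, style="hyphen"):
    """Join already-separated syllables with the chosen separator."""
    return SEPARATORS[style].join(parts)


if __name__ == "__main__":
    for style in SEPARATORS:
        print(style, "->", mark_syllables(["syl", "la", "ble"], style))
```

The sketch only formats a division that is already known; deciding where the syllable boundaries fall is a separate problem.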
At the end of a line, a word is separated in writing into parts, conventionally called "syllables", if it does not fit the line and if moving it to the next line would make the first line much shorter than the others. This can be a particular problem with very long words, and with narrow columns in newspapers.
Word processing has automated the process of justification, making syllabification of shorter words often unnecessary.
In some languages, the spoken syllables are also the basis of syllabification in writing. However, possibly due to the weak correspondence between sounds and letters in the spelling of modern English, written syllabification in English is based mostly on etymological or morphological, instead of phonetic, principles. For example, "learning" cannot be divided in writing as ''lear-ning'', even though that division matches the syllabification of the spoken language; the accepted written division follows the morphology (''learn-ing''). Seeing only ''lear-'' at the end of a line might mislead the reader into pronouncing the word incorrectly, as the digraph ''ea'' can hold many different values. The history of English orthography accounts for such phenomena.
English written syllabification therefore deals with a concept of "syllable" that does not correspond to the linguistic concept of a phonological (as opposed to morphological) unit.
As a result, even most native English speakers are unable to syllabify words according to established rules without consulting a dictionary or using a word processor. Schools usually do not provide much more advice on the topic than to consult a dictionary. In addition, there are differences between British and US syllabification and even between dictionaries of the same English variety.
In Finnish, Italian, Portuguese, Japanese (Romaji), Korean (romanized) and other nearly phonemically spelled languages, writers can in principle correctly syllabify any existing or newly created word using only general rules. In Finland, children are first taught to hyphenate every word until they produce the correct syllabification reliably, after which the hyphens can be omitted.
Algorithm
A hyphenation algorithm is a set of rules, especially one codified for implementation in a computer program, that decides at which points a word can be broken over two lines with a hyphen. For example, a hyphenation algorithm might decide that ''impeachment'' can be broken as ''impeach-ment'' or ''im-peachment'' but not ''impe-achment''.
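To make the line-breaking side of this concrete, here is a minimal Python sketch that assumes the permitted break points are already known (for instance from a dictionary). The helper `break_at_line_end` is hypothetical; it only chooses among the given break points and does not decide where they are.

```python
def break_at_line_end(pieces, room):
    """Split a word at the last permitted break that fits on the line.

    `pieces` are the fragments between permitted break points, e.g.
    ["im", "peach", "ment"]; `room` is the number of columns left on
    the current line. Returns (head, tail), where head goes on the
    current line (with a trailing hyphen if the word was split) and
    tail goes on the next line.
    """
    word = "".join(pieces)
    if len(word) <= room:
        return word, ""          # the whole word fits, no break needed

    best = 0
    prefix_len = 0
    for piece in pieces[:-1]:    # a break after the last piece is no break
        prefix_len += len(piece)
        if prefix_len + 1 <= room:   # +1 for the hyphen itself
            best = prefix_len

    if best == 0:
        return "", word          # nothing fits; move the whole word down
    return word[:best] + "-", word[best:]


if __name__ == "__main__":
    for room in (11, 8, 5, 2):
        print(room, break_at_line_end(["im", "peach", "ment"], room))
```

With eight columns left, the sketch yields ''impeach-'' and ''ment''; with five, ''im-'' and ''peachment''. It never produces ''impe-achment'', because that point is not among the permitted breaks.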
One of the reasons for the complexity of the rules of word-breaking is that different dialects of English tend to differ on hyphenation: American English tends to work on sound, but British English tends to look to the origins of the word and then to sound. There are also a large number of exceptions, which further complicates matters.
Some rules of thumb can be found in Major Keary's "On Hyphenation – Anarchy of Pedantry." Among the algorithmic approaches to hyphenation, the one implemented in the TeX typesetting system is widely used. It is thoroughly documented in the first two volumes of ''Computers and Typesetting'' by Donald Knuth and in Franklin Mark Liang's dissertation. The aim of Liang's work was to get the algorithm as accurate as possible and to keep exceptions to a minimum.
In TeX's original hyphenation patterns for American English, the exception list contains only 14 words.
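The core idea of Liang's pattern method can be sketched in a few lines of Python. This is a simplified illustration, not TeX's implementation: the pattern list is a tiny hand-picked set that covers a single word rather than a real pattern file, the function names are hypothetical, and the minimum fragment lengths (two letters before a break, three after) are assumed defaults. Each pattern carries digits between its letters; every pattern that matches a substring of the word is overlaid on it, the highest digit at each position wins, and an odd value permits a break.

```python
import re

# Illustrative patterns only: just enough to hyphenate one example word.
PATTERNS = ["hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io", "o2n"]


def parse_pattern(pattern):
    """Split a pattern such as 'hy3ph' into its letters and the
    weights sitting before, between, and after those letters."""
    letters = re.sub(r"\d", "", pattern)
    weights = [0] * (len(letters) + 1)
    pos = 0
    for ch in pattern:
        if ch.isdigit():
            weights[pos] = int(ch)
        else:
            pos += 1
    return letters, weights


def hyphenate(word, patterns, left_min=2, right_min=3):
    """Return the fragments of `word` between permitted break points."""
    table = dict(parse_pattern(p) for p in patterns)
    wrapped = "." + word.lower() + "."      # dots mark the word edges
    points = [0] * (len(wrapped) + 1)       # points[i] sits before wrapped[i]

    # Overlay every matching pattern, keeping the maximum weight per position.
    for start in range(len(wrapped)):
        for end in range(start + 1, len(wrapped) + 1):
            weights = table.get(wrapped[start:end])
            if weights:
                for offset, weight in enumerate(weights):
                    idx = start + offset
                    points[idx] = max(points[idx], weight)

    # An odd weight before word[k] (wrapped[k + 1]) permits a break there.
    pieces, last = [], 0
    for k in range(left_min, len(word) - right_min + 1):
        if points[k + 1] % 2 == 1:
            pieces.append(word[last:k])
            last = k
    pieces.append(word[last:])
    return pieces


if __name__ == "__main__":
    print("-".join(hyphenate("hyphenation", PATTERNS)))  # hy-phen-ation
```

Even digits act as inhibitors: in the example, the pattern `hena4` places a 4 between ''a'' and ''t'', which overrides the 1 contributed by `1tio` and so suppresses a break at that position.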
In TeX
Ports of the TeX hyphenation algorithm are available as libraries for several programming languages, including Haskell, JavaScript, Perl, PostScript, Python, Ruby, and C#. TeX itself can be made to show the hyphenation points of words in the log with the \showhyphens command.
In LaTeX, users can add hyphenation corrections with:
\hyphenation{words}
The \hyphenation command declares allowed hyphenation points; ''words'' is a list of words, separated by spaces, in which each hyphenation point is indicated by a - character. For example, listing "fortran" without hyphens and "ergonomic" with hyphens at the permitted break points specifies that in the current job "fortran" should not be hyphenated and that "ergonomic", if it must be hyphenated, will be broken at one of the indicated points.
There are several limits, however. For example, the stock \hyphenation command accepts only ASCII letters by default, so it cannot be used to correct hyphenation for words with non-ASCII characters (like ''ä'', ''é'', ''ç''), which are very common in many languages. Simple workarounds exist.
See also
* Phonotactics
* Tautosyllabic, heterosyllabic and ambisyllabic phones
* Syllable structure in English phonology
External links
* Online Lyric Hyphenator: hyphenates English text into syllables
* Hyphenation tool for the French Language: hyphenates French words with explanation