
Syllabification or syllabication, also known as hyphenation, is the separation of a word into syllables, whether spoken, written or signed.


Overview

The written separation into syllables is usually marked by a hyphen when using English orthography (e.g., syl-la-ble) and with a period when transcribing the actually spoken syllables in the International Phonetic Alphabet. For presentation purposes, typographers may use an interpunct (Unicode character U+00B7, e.g., syl·la·ble), a special-purpose "hyphenation point" (U+2027, e.g., syl‧la‧ble), or a space (e.g., syl la ble).

At the end of a line, a word is separated in writing into parts, conventionally called "syllables", if it does not fit the line and if moving it to the next line would make the first line much shorter than the others. This can be a particular problem with very long words, and with narrow columns in newspapers. Word processing has automated the process of justification, often making syllabification of shorter words unnecessary.

In some languages, the spoken syllables are also the basis of syllabification in writing. However, possibly due to the weak correspondence between sounds and letters in the spelling of modern English, written syllabification in English is based mostly on etymological or morphological, instead of phonetic, principles. For example, it is not possible to syllabify "learning" as ''lear-ning'' according to the correct syllabification of the living language. Seeing only ''lear-'' at the end of a line might mislead the reader into pronouncing the word incorrectly, as the digraph ''ea'' can hold many different values. The history of English orthography accounts for such phenomena.

English written syllabification therefore deals with a concept of "syllable" that does not correspond to the linguistic concept of a phonological (as opposed to morphological) unit. As a result, even most native English speakers are unable to syllabify words according to established rules without consulting a dictionary or using a word processor. Schools usually do not provide much more advice on the topic than to consult a dictionary. In addition, there are differences between British and US syllabification and even between dictionaries of the same English variety.

In Finnish, Italian, Portuguese and other nearly phonemically spelled languages, writers can in principle correctly syllabify any existing or newly created word using only general rules. In Finland, children are first taught to hyphenate every word until they produce the correct syllabification reliably, after which the hyphens can be omitted.
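
The written separator conventions mentioned in this overview (hyphen, interpunct U+00B7, hyphenation point U+2027, and space) can be illustrated with a small Python sketch. The function name `mark_syllables` and the style labels are hypothetical, chosen only for this example; the syllable split itself is supplied by the caller rather than computed.

```python
# Separator characters named in the overview; the dictionary keys are
# illustrative labels, not standard terminology from any library.
SEPARATORS = {
    "hyphen": "-",
    "interpunct": "\u00b7",         # U+00B7 MIDDLE DOT
    "hyphenation point": "\u2027",  # U+2027 HYPHENATION POINT
    "space": " ",
}


def mark_syllables(syllables, style="hyphen"):
    """Join already-split syllables with the separator for the given style."""
    return SEPARATORS[style].join(syllables)


if __name__ == "__main__":
    for style in SEPARATORS:
        print(f"{style}: {mark_syllables(['syl', 'la', 'ble'], style)}")
```

Keeping the separator apart from the split makes it easy to switch presentation styles without touching whatever rule set or dictionary produced the syllables.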


Algorithm

A hyphenation algorithm is a set of rules, especially one codified for implementation in a computer program, that decides at which points a word can be broken over two lines with a hyphen. For example, a hyphenation algorithm might decide that ''impeachment'' can be broken as ''impeach-ment'' or ''im-peachment'' but not ''impe-achment''.

One of the reasons for the complexity of the rules of word-breaking is that different dialects of English tend to differ on hyphenation: American English tends to work on sound, but British English tends to look to the origins of the word and then to sound. There are also a large number of exceptions, which further complicates matters. Some rules of thumb can be found in Major Keary's "On Hyphenation – Anarchy of Pedantry."

Among the algorithmic approaches to hyphenation, the one implemented in the TeX typesetting system is widely used. It is thoroughly documented in the first two volumes of ''Computers and Typesetting'' by Donald Knuth and in Franklin Mark Liang's dissertation. The aim of Liang's work was to get the algorithm as accurate as he practically could and to keep any exception dictionary small. In TeX's original hyphenation patterns for American English, the exception list contains only 14 words.
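
The pattern-based approach used in TeX can be sketched in a few lines of Python. This is a minimal illustration, not TeX's implementation: the nine patterns in `PATTERNS` are a tiny hand-picked set assumed here for demonstration (real pattern files contain thousands of entries), and the names `parse_pattern` and `hyphenate` are hypothetical. In each pattern, a digit between letters scores that position; the highest score from any matching pattern wins, odd scores permit a break, and even scores forbid one. The default minimum fragment lengths of 2 and 3 letters mirror TeX's usual settings for English.

```python
import re

# Illustrative miniature pattern set (an assumption for this sketch).
PATTERNS = ["hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io", "o2n"]


def parse_pattern(pattern):
    """Split a pattern such as 'hen5at' into its letters and inter-letter scores."""
    letters = re.sub(r"\d", "", pattern)
    values = [0] * (len(letters) + 1)
    pos = 0
    for ch in pattern:
        if ch.isdigit():
            values[pos] = int(ch)  # score for the gap before letter `pos`
        else:
            pos += 1
    return letters, values


def hyphenate(word, patterns=PATTERNS, left_min=2, right_min=3):
    """Return the fragments of `word` between permitted break points."""
    dotted = "." + word.lower() + "."  # dots mark the word boundaries
    points = [0] * (len(dotted) + 1)
    for pattern in patterns:
        letters, values = parse_pattern(pattern)
        start = dotted.find(letters)
        while start != -1:
            # Keep the maximum score seen at each inter-letter position.
            for offset, value in enumerate(values):
                points[start + offset] = max(points[start + offset], value)
            start = dotted.find(letters, start + 1)
    pieces, last = [], 0
    # Only consider breaks that leave enough letters on both sides.
    for i in range(left_min, len(word) - right_min + 1):
        if points[i + 1] % 2 == 1:  # odd score: break allowed before word[i]
            pieces.append(word[last:i])
            last = i
    pieces.append(word[last:])
    return pieces


if __name__ == "__main__":
    print("-".join(hyphenate("hyphenation")))
```

With this pattern set, `hyphenate("hyphenation")` yields the fragments `hy`, `phen` and `ation`. Words matched by no pattern come back unbroken, which is where a small exception list would take over in a full system.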


In TeX

Ports of the TeX hyphenation algorithm are available as libraries for several programming languages, including Haskell, JavaScript, Perl, PostScript, Python, Ruby, and C#. TeX can be made to show hyphens in the log by the command \showhyphens. In LaTeX, users can add hyphenation corrections with:
\hyphenation{words}
The \hyphenation command declares allowed hyphenation points, where words is a list of words, separated by spaces, in which each hyphenation point is indicated by a - character. For example,
\hyphenation{FORTRAN Er-go-no-mic}
declares that in the current job "fortran" should not be hyphenated and that if "ergonomic" must be hyphenated, it will be at one of the indicated points. However, there are several limits. For example, the stock \hyphenation command accepts only ASCII letters by default, so it cannot be used to correct hyphenation for words with non-ASCII characters (like ''ä'', ''é'', ''ç''), which are very common in almost all languages except English. Simple workarounds exist, however.
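
The exception-list format described above (space-separated words, with a - at each permitted break) is simple enough to model in Python. The sketch below only illustrates the data format; it is not how TeX stores or applies exceptions, and the function names `parse_exceptions` and `break_word` as well as the sample input string are assumptions made for this example.

```python
def parse_exceptions(spec):
    """Map each lowercased word to the offsets where a break is allowed."""
    table = {}
    for entry in spec.split():
        parts = entry.lower().split("-")
        offsets, pos = [], 0
        for part in parts[:-1]:
            pos += len(part)
            offsets.append(pos)
        # A word listed without any '-' gets an empty list: never break it.
        table["".join(parts)] = offsets
    return table


def break_word(word, table):
    """Split `word` at its listed break points, or return None if unlisted."""
    offsets = table.get(word.lower())
    if offsets is None:
        return None  # not an exception; a pattern-based pass would decide
    pieces, last = [], 0
    for pos in offsets:
        pieces.append(word[last:pos])
        last = pos
    pieces.append(word[last:])
    return pieces


if __name__ == "__main__":
    table = parse_exceptions("fortran er-go-no-mic")
    print(break_word("Ergonomic", table))
    print(break_word("fortran", table))
```

Lookup is case-insensitive in this sketch, so a capitalised word still matches its lowercase entry while the returned fragments keep the original casing.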


See also

* Phonotactics
* Tautosyllabic, heterosyllabic and ambisyllabic phones
* Syllable structure in English phonology


External links


* Online Lyric Hyphenator: hyphenates English text into syllables
* Online hyphenation tool: hyphenation algorithms for several languages
* Hyphenation tool for the French Language: hyphenates French words with explanation