A diacritic (also diacritical mark, diacritical point, diacritical sign, or accent) is a glyph added to a letter or to a basic glyph. The term derives from the Ancient Greek διακριτικός (diakritikós, "distinguishing"), from διακρίνω (diakrínō, "to distinguish"). The word ''diacritic'' is a noun, though it is sometimes used in an attributive sense, whereas ''diacritical'' is only an adjective. Some diacritics, such as the acute ⟨ó⟩, grave ⟨ò⟩, and circumflex ⟨ô⟩ (all shown above an 'o'), are often called ''accents''. Diacritics may appear above or below a letter or in some other position such as within the letter or between two letters.
The main use of diacritics in Latin script is to change the sound-values of the letters to which they are added. Historically, English has used the diaeresis diacritic to indicate the correct pronunciation of ambiguous words, such as "coöperate", without which the ⟨oo⟩ letter sequence could be misinterpreted to be pronounced /ˈkuːpəreɪt/. Other examples are the acute and grave accents, which can indicate that a vowel is to be pronounced differently than is normal in that position, for example not reduced to /ə/ or silent, as with the two uses of the letter e in the noun ''résumé'' (as opposed to the verb ''resume''). The grave accent sometimes provides similar help in words such as ''doggèd'', ''learnèd'', ''blessèd'', and especially in words pronounced differently than normal in poetry (for example ''movèd'', ''breathèd'').
Most other words with diacritics in English are borrowings from languages such as French, spelled so as to better preserve the original, such as the diaeresis on ''naïve'' and ''Noël'', the acute from ''café'', the circumflex in the word ''crêpe'', and the cedille in ''façade''. All these diacritics, however, are frequently omitted in writing, and English is the only major modern European language that does not have diacritics in common usage.
In Latin-script alphabets in other languages diacritics may distinguish between homonyms, such as the French ''là'' ("there") versus ''la'' ("the"), which are both pronounced /la/. In Gaelic type, a dot over a consonant indicates lenition of the consonant in question. In other writing systems, diacritics may perform other functions. Vowel pointing systems, namely the Arabic harakat and the Hebrew niqqud systems, indicate vowels that are not conveyed by the basic alphabet. The Indic virama ( ् etc.) and the Arabic sukūn ( ــْـ ) mark the absence of vowels. Cantillation marks indicate prosody. Other uses include the Early Cyrillic titlo stroke ( ◌҃ ) and the Hebrew gershayim ( ״ ), which, respectively, mark abbreviations or acronyms, and Greek diacritical marks, which showed that letters of the alphabet were being used as numerals. In Vietnamese and the Hanyu Pinyin official romanization system for Mandarin in China, diacritics are used to mark the tones of the syllables in which the marked vowels occur.
In orthography and collation, a letter modified by a diacritic may be treated either as a new, distinct letter or as a letter–diacritic combination. This varies from language to language and may vary from case to case within a language.
In some cases, letters are used as "in-line diacritics", with the same function as ancillary glyphs, in that they modify the sound of the letter preceding them, as in the case of the "h" in the English pronunciation of "sh" and "th". Such letter combinations are sometimes even collated as a single distinct letter. For example, the spelling sch was traditionally often treated as a separate letter in German. Words with that spelling were listed after all other words spelled with s in card catalogs in the Vienna public libraries, for example (before digitization).
Types
Among the types of diacritic used in alphabets based on the Latin script are:
* accents (so called because the acute, grave, and circumflex were originally used to indicate different types of pitch accents in the polytonic transcription of Greek)
** ◌́ – acute; for example ó
** ◌̀ – grave; for example ò
** ◌̂ – circumflex; for example ô
** ◌̌ – caron, wedge; for example ǒ
** ◌̋ – double acute; for example ő
** ◌̏ – double grave; for example ȍ
* one dot
** ◌̇ – an overdot is used in many orthographies and transcriptions; for example ȯ
** ◌̣ – an underdot is also used in many orthographies and transcriptions; for example ọ
** ◌·◌ – an interpunct is used in the Catalan ela geminada (l·l)
** ◌͘ – a dot above right is used in Pe̍h-ōe-jī
** tittle, the superscript dot of the modern lowercase Latin ⟨i⟩ and ⟨j⟩
* two dots:
** two overdots (◌̈) are used for umlaut, diaeresis and others (for example ö)
** two underdots (◌̤) are used in the International Phonetic Alphabet (IPA) and the ALA-LC romanization system
** ◌ː – triangular colon, used in the IPA to mark long vowels (the "dots" are triangular, not circular).
* curves
** ◌̆ – breve; for example ŏ
** ◌̑ – inverted breve; for example ȏ
** ◌͗ – sicilicus, a palaeographic diacritic similar to a caron or breve
** ◌̃ – tilde; for example õ
** ◌҃ – titlo
* vertical stroke
** ◌̩ – a subscript vertical stroke is used in IPA to mark syllabicity and in Rheinische Dokumenta to mark a schwa
** ◌̍ – a superscript vertical stroke is used in Pe̍h-ōe-jī
* macron or horizontal line
** ◌̄ – macron; for example ō
** ◌̱ – underbar
* overlays
** ◌⃓ – vertical bar through the character
** ◌̷ – slash through the character; for example ø
** ◌̵ – crossbar through the character
* ring
** ◌̊ – overring; for example å
* superscript curls
** ◌̓ – apostrophe
** ◌̒ – inverted apostrophe
** ◌̔ – reversed apostrophe
** ◌̉ – hook above (Vietnamese: dấu hỏi)
** ◌̛ – horn (Vietnamese: dấu móc); for example ơ
* subscript curls
** ◌̦ – undercomma; for example ș
** ◌̧ – cedilla; for example ç
** ◌̡ ◌̢ – hook, left or right, sometimes superscript
** ◌̨ – ogonek; for example ǫ
* double marks (over or under two base characters)
** ◌͝◌ – double breve
** ◌͡◌ – tie bar or top ligature
** ◌᷍◌ – double circumflex
** ◌͞◌ – longum
** ◌͠◌ – double tilde
* double sub/superscript diacritics
** ◌̧̧ – double cedilla
** ◌̨̨ – double ogonek
** ◌̈̈ – double diaeresis
** ◌ͅͅ – double ypogegrammeni
The tilde, dot, comma, titlo, apostrophe, bar, and colon are sometimes diacritical marks, but also have other uses.
Not all diacritics occur adjacent to the letter they modify. In the Wali language of Ghana, for example, an apostrophe indicates a change of vowel quality, but occurs at the beginning of the word, as in the dialects ''’Bulengee'' and ''’Dolimi''. Because of vowel harmony, all vowels in a word are affected, so the scope of the diacritic is the entire word. In abugida scripts, like those used to write Hindi and Thai, diacritics indicate vowels, and may occur above, below, before, after, or around the consonant letter they modify.
The tittle (dot) on the letter ⟨i⟩ or the letter ⟨j⟩ of the Latin alphabet originated as a diacritic to clearly distinguish ⟨i⟩ from the minims (downstrokes) of adjacent letters. It first appeared in the 11th century in the sequence ''ii'', then spread to ''i'' adjacent to ''m, n, u'', and finally to all lowercase ''i''s. The ⟨j⟩, originally a variant of ''i'', inherited the tittle. The shape of the diacritic developed from initially resembling today's acute accent to a long flourish by the 15th century. With the advent of Roman type it was reduced to the round dot we have today.
Several languages of eastern Europe use diacritics on both consonants and vowels, whereas in western Europe digraphs are more often used to change consonant sounds. Most languages in Europe use diacritics on vowels, aside from English where there are typically none (with some exceptions).
Diacritics specific to non-Latin alphabets
Arabic
* (ئ ؤ إ أ and stand alone ء) hamza: indicates a glottal stop.
* (ــًــٍــٌـ) tanwīn (تنوين) symbols: Serve a grammatical role in Arabic. The sign ـً is most commonly written in combination with alif.
* (ــّـ) shadda: Gemination (doubling) of consonants.
* (ٱ) waṣla: Comes most commonly at the beginning of a word. Indicates a type of hamza that is pronounced only when the letter is read at the beginning of an utterance.
* (آ) madda: A written replacement for a hamza that is followed by an alif, as in قرآن (''qurʾān''). It is read as a glottal stop followed by a long /aː/. This writing rule does not apply when the alif that follows a hamza is not a part of the stem of the word.
* (ــٰـ) superscript alif (also "short" or "dagger" alif): A replacement for an original alif that is dropped in the writing of some rare words; the alif heard in pronunciation is not written as a full letter but is marked by this sign instead.
* ḥarakāt (in Arabic: حركات, also called تشكيل tashkīl):
** (ــَـ) fatḥa (a)
** (ــِـ) kasra (i)
** (ــُـ) ḍamma (u)
** (ــْـ) sukūn (no vowel)
* The ḥarakāt or vowel points serve two purposes:
** They serve as a phonetic guide. They indicate the presence of short vowels (fatḥa, kasra, or ḍamma) or their absence (sukūn).
** At the last letter of a word, the vowel point reflects the inflection case or conjugation mood.
*** For nouns, the ḍamma is for the nominative, fatḥa for the accusative, and kasra for the genitive.
*** For verbs, the ḍamma is for the imperfective, fatḥa for the perfective, and the sukūn is for verbs in the imperative or jussive moods.
* Vowel points or tashkīl should not be confused with consonant points or iʿjam (إعجام) – one, two or three dots written above or below a consonant to distinguish between letters of the same or similar form.
Greek
These diacritics are used in addition to the acute, grave, and circumflex accents and the diaeresis:
* ◌ͅ – iota subscript (for example ᾳ, ῃ, ῳ)
* ◌̔ – rough breathing (Ancient Greek: δασὺ πνεῦμα, dasỳ pneûma; Latin: spīritus asper): aspiration
* ◌̓ – smooth (or soft) breathing (Ancient Greek: ψιλὸν πνεῦμα, psilòn pneûma; Latin: spīritus lēnis): lack of aspiration
Hebrew
* Niqqud
** Dagesh
** Mappiq
** Rafe
** Shin dot (at top right corner)
** Sin dot (at top left corner)
** Shva
** Kubutz
** Holam
** Kamatz
** Patakh
** Segol
** Tzeire
** Hiriq
(Cantillation marks do not generally render correctly; see "Names and shapes of the ta'amim" in the article on Hebrew cantillation for a complete table, together with instructions for maximizing the chance of viewing them in a web browser.)
* Other
** Geresh (׳)
** Gershayim (״)
Korean
The diacritics 〮 and 〯, known as Bangjeom (방점; 傍點), were used to mark pitch accents in Hangul for Middle Korean. They were written to the left of a syllable in vertical writing and above a syllable in horizontal writing.
Sanskrit and Indic
Syriac
* A dot above and a dot below a letter represent [a], transliterated as ''a'' or ''ă'',
* Two diagonally-placed dots above a letter represent [ɑ], transliterated as ''ā'' or ''â'' or ''å'',
* Two horizontally-placed dots below a letter represent [ɛ], transliterated as ''e'' or ''ĕ''; often pronounced and transliterated as ''i'' in the East Syriac dialect,
* Two diagonally-placed dots below a letter represent [e], transliterated as ''ē'',
* A dot underneath the ''Beth'' represents a soft [v] sound, transliterated as ''v'',
* A tilde (~) placed under ''Gamel'' represents a [d͡ʒ] sound, transliterated as ''j'',
* The letter ''Waw'' with a dot below it represents [u], transliterated as ''ū'' or ''u'',
* The letter ''Waw'' with a dot above it represents [o], transliterated as ''ō'' or ''o'',
* The letter ''Yōḏ'' with a dot beneath it represents [i], transliterated as ''ī'' or ''i'',
* A tilde (~) under ''Kaph'' represents a [t͡ʃ] sound, transliterated as ''ch'' or ''č'',
* A semicircle under ''Peh'' represents an [f] sound, transliterated as ''f'' or ''ph''.
In addition to the above vowel marks, transliteration of Syriac sometimes includes ''ə'', ''e̊'' or superscript ''e'' (or often nothing at all) to represent an original Aramaic schwa that became lost later on at some point in the development of Syriac. Some transliteration schemes find its inclusion necessary for showing spirantization or for historical reasons.
Non-alphabetic scripts
Some non-alphabetic scripts also employ symbols that function essentially as diacritics.
* Non-pure abjads (such as Hebrew and Arabic script) and abugidas use diacritics for denoting vowels. Hebrew and Arabic also indicate consonant doubling and change with diacritics; Hebrew and Devanagari use them for foreign sounds. Devanagari and related abugidas also use a diacritical mark called a ''virama'' to mark the absence of a vowel. In addition, Devanagari uses the moon-dot ''chandrabindu'' ( ँ ) for vowel nasalization.
* Unified Canadian Aboriginal Syllabics use several types of diacritics, including the diacritics with alphabetic properties known as Medials and Finals. Although long vowels originally were indicated with a negative line through the Syllabic glyphs, making the glyph appear broken, in the modern forms a dot above is used to indicate vowel length. In some of the styles, a ring above indicates a long vowel with a [j] off-glide. Another diacritic, the "inner ring", is placed at the glyph's head to modify [p] to [f] and [t] to [θ]. Medials such as the "w-dot" placed next to the Syllabics glyph indicate a [w] being placed between the syllable onset consonant and the nucleus vowel. Finals indicate the syllable coda consonant; some of the syllable coda consonants in word medial positions, such as with the "h-tick", indicate the fortification of the consonant in the syllable following it.
* The Japanese ''hiragana'' and ''katakana'' syllabaries use the ''dakuten'' (◌゛) and ''handakuten'' (◌゜) (in Japanese: 濁点 and 半濁点) symbols, also known as ''nigori'' (濁 "muddying") or ''ten-ten'' (点々 "dot dot") and ''maru'' (丸 "circle"), to indicate voiced consonants or other phonetic changes.
* Emoticons are commonly created with diacritic symbols, especially Japanese emoticons on popular imageboards.
Alphabetization or collation
Different languages use different rules to put diacritic characters in alphabetical order. For example, French and Portuguese treat letters with diacritical marks the same as the underlying letter for purposes of ordering and dictionaries. The Scandinavian languages and the Finnish language, by contrast, treat the characters with diacritics ⟨å⟩, ⟨ä⟩, and ⟨ö⟩ as distinct letters of the alphabet, and sort them after ⟨z⟩. Usually ⟨ä⟩ (a-umlaut) and ⟨ö⟩ (o-umlaut), used in Swedish and Finnish, are sorted as equivalent to ⟨æ⟩ (ash) and ⟨ø⟩ (o-slash), used in Danish and Norwegian. Also, ''aa'', when used as an alternative spelling to ⟨å⟩, is sorted as such. Other letters modified by diacritics are treated as variants of the underlying letter, with the exception that ⟨ü⟩ is frequently sorted as ⟨y⟩.
Languages that treat accented letters as variants of the underlying letter usually alphabetize words with such symbols immediately after similar unmarked words. For instance, in German where two words differ only by an umlaut, the word without it is sorted first in German dictionaries (e.g. ''schon'' and then ''schön'', or ''fallen'' and then ''fällen''). However, when names are concerned (e.g. in phone books or in author catalogues in libraries), umlauts are often treated as combinations of the vowel with a suffixed ⟨e⟩; Austrian phone books now treat characters with umlauts as separate letters (immediately following the underlying vowel).
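The two German conventions described above can be sketched as sort keys. This is only an illustrative sketch in Python using the standard `unicodedata` module; the helper names `strip_marks`, `dictionary_key` and `phonebook_key` are hypothetical, and real collation (for example handling of ⟨ß⟩ or locale tailoring) is more involved than shown here.

```python
import unicodedata

def strip_marks(text):
    """Drop combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def dictionary_key(word):
    """Dictionary style: compare base letters first; on a tie the
    unmarked word sorts before the word with an umlaut."""
    composed = unicodedata.normalize("NFC", word).lower()
    return (strip_marks(composed), composed)

# Assumed mapping for the name-list convention: umlaut = vowel + "e".
UMLAUT_AS_E = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue"})

def phonebook_key(name):
    """Name-list style: treat each umlaut as the vowel followed by 'e'."""
    return unicodedata.normalize("NFC", name).lower().translate(UMLAUT_AS_E)

print(sorted(["schön", "fällen", "schon", "fallen"], key=dictionary_key))
# ['fallen', 'fällen', 'schon', 'schön']
print(sorted(["Muller", "Müller", "Mueller"], key=phonebook_key))
# ['Müller', 'Mueller', 'Muller']
```

In the second call ''Müller'' and ''Mueller'' receive the same key, so their relative order simply follows the input order.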
In Spanish, the grapheme ⟨ñ⟩ is considered a distinct letter, different from ⟨n⟩ and collated between ⟨n⟩ and ⟨o⟩, as it denotes a different sound from that of a plain ⟨n⟩. But the accented vowels ⟨á⟩, ⟨é⟩, ⟨í⟩, ⟨ó⟩, ⟨ú⟩ are not separated from the unaccented vowels ⟨a⟩, ⟨e⟩, ⟨i⟩, ⟨o⟩, ⟨u⟩, as the acute accent in Spanish only modifies stress within the word or denotes a distinction between homonyms, and does not modify the sound of a letter.
For a comprehensive list of the collating orders in various languages, see Collating sequence.
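As a rough illustration of the Spanish rule described above, the following Python sketch ranks letters by their position in an alphabet string in which ⟨ñ⟩ sits between ⟨n⟩ and ⟨o⟩, while accented vowels are folded onto their base letters. The names `SPANISH_ALPHABET`, `fold_vowel_accents` and `spanish_key` are hypothetical, and the sketch ignores punctuation, digits and tie-breaking between accented and unaccented spellings.

```python
import unicodedata

# Assumed alphabet order for this sketch: ñ is a letter of its own after n.
SPANISH_ALPHABET = "abcdefghijklmnñopqrstuvwxyz"
RANK = {letter: index for index, letter in enumerate(SPANISH_ALPHABET)}

def fold_vowel_accents(word):
    """Lowercase the word and remove diacritics from every letter except ñ."""
    folded = []
    for ch in unicodedata.normalize("NFC", word.lower()):
        if ch == "ñ":
            folded.append(ch)  # distinct letter: keep it
        else:
            decomposed = unicodedata.normalize("NFD", ch)
            folded.append("".join(c for c in decomposed
                                  if not unicodedata.combining(c)))
    return "".join(folded)

def spanish_key(word):
    """Sort key: alphabet positions of the folded letters."""
    return [RANK.get(ch, len(RANK)) for ch in fold_vowel_accents(word)]

words = ["nube", "ñu", "ola", "nácar", "nada"]
print(sorted(words, key=spanish_key))
# ['nácar', 'nada', 'nube', 'ñu', 'ola']
print(sorted(words))  # plain code-point order, for comparison
# ['nada', 'nube', 'nácar', 'ola', 'ñu']
```

The comparison with plain code-point sorting shows why a language-specific key is needed: by code point alone, ''ñu'' lands after ''ola'' and ''nácar'' after ''nube''.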
Generation with computers
Modern computer technology was developed mostly in countries that speak Western European languages (particularly English), and many early binary encodings were developed with a bias favoring English, a language written without diacritical marks. With computer memory and computer storage at a premium, early character sets were limited to the Latin alphabet, the ten digits and a few punctuation marks and conventional symbols. The American Standard Code for Information Interchange (ASCII), first published in 1963, encoded just 95 printable characters. It included just four free-standing diacritics (acute, grave, circumflex and tilde), which were to be used by backspacing and overprinting the base letter. The ISO/IEC 646 standard (1967) defined national variations that replace some American graphemes with precomposed characters, according to language, but remained limited to 95 printable characters.
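The 95-character limit and the backspace-and-overprint convention can be illustrated with a minimal Python sketch. The helper name `overstrike` is hypothetical, and using `^` as the free-standing circumflex is an assumption made for illustration; modern displays generally do not render such sequences as accented letters.

```python
# Printable ASCII runs from 0x20 (space) through 0x7E (tilde).
printable = [chr(code) for code in range(0x20, 0x7F)]

BACKSPACE = "\b"  # ASCII control character 0x08


def overstrike(base: str, mark: str) -> str:
    """Return base letter, backspace, then the free-standing mark.

    On a printing terminal the mark would be struck over the letter.
    """
    return base + BACKSPACE + mark


# "e", backspace, "^": a teletype-era approximation of e with circumflex.
sequence = overstrike("e", "^")
```

Every character in `sequence` is still plain 7-bit ASCII, which is why the technique needed no additional code positions.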
Unicode was conceived to solve this problem by assigning every known character its own code; if this code is known, most modern computer systems provide a method to input it. For historical reasons, almost all the letter-with-accent combinations used in European languages were given unique code points and these are called precomposed characters. For other languages, it is usually necessary to use a combining character diacritic together with the desired base letter. Unfortunately, even as of 2024, many applications and web browsers remain unable to handle combining diacritics properly.
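The difference between a precomposed character and a base letter followed by a combining character can be seen with Python's standard `unicodedata` module. This is a minimal sketch; the helper name `same_text` is hypothetical.

```python
import unicodedata

precomposed = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE, one code point
combining = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT, two code points


def same_text(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to composed form (NFC)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# The raw strings differ even though they normally render identically.
raw_equal = precomposed == combining
normalized_equal = same_text(precomposed, combining)
```

Software that compares or searches text without normalizing first may treat the two spellings as different, which is one plausible source of the handling problems mentioned above.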
Depending on the keyboard layout and keyboard mapping, it is more or less easy to enter letters with diacritics on computers and typewriters. Keyboards used in countries where letters with diacritics are the norm have keys engraved with the relevant symbols. In other cases, such as when the US international or UK extended mappings are used, the accented letter is created by first pressing the key with the diacritic mark, followed by the letter to place it on. This method is known as the dead key technique, as it produces no output of its own but modifies the output of the key pressed after it.
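The dead key behaviour described above can be sketched as a small state machine. The key-to-mark table `DEAD_KEYS` and the function `type_keys` are hypothetical and do not reproduce any real keyboard mapping; the fallback rules (dead key followed by a space, or by a letter with no precomposed form) are assumptions chosen for illustration.

```python
import unicodedata

# Hypothetical dead-key table: key pressed -> combining mark it attaches.
DEAD_KEYS = {
    "'": "\u0301",  # acute
    "`": "\u0300",  # grave
    "^": "\u0302",  # circumflex
    "~": "\u0303",  # tilde
    '"': "\u0308",  # diaeresis / umlaut
}


def type_keys(keys: str) -> str:
    """Simulate typing a key sequence on a layout with dead keys."""
    output = []
    pending = None  # dead key waiting for its base letter
    for key in keys:
        if pending is None:
            if key in DEAD_KEYS:
                pending = key  # produce no output yet
            else:
                output.append(key)
            continue
        if key == " ":
            # Dead key followed by space: emit the mark on its own.
            output.append(pending)
        else:
            composed = unicodedata.normalize("NFC", key + DEAD_KEYS[pending])
            # If no precomposed letter exists, fall back to both keys as typed.
            output.append(composed if len(composed) == 1 else pending + key)
        pending = None
    if pending is not None:
        output.append(pending)
    return "".join(output)
```

For example, `type_keys("caf'e")` yields the word with a precomposed e-acute, because the apostrophe is held back until the following letter arrives.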
Languages with letters containing diacritics
The following languages have letters with diacritics that are orthographically distinct from those without diacritics.
Latin script
Baltic
:* Latvian has the following letters: ā, č, ē, ģ, ī, ķ, ļ, ņ, š, ū, ž
:* Lithuanian. In general usage, where letters appear with the caron (č, š and ž), they are considered as separate letters from c, s or z and collated separately; letters with the ogonek (ą, ę, į and ų), the macron (ū) and the overdot (ė) are considered as separate letters as well, but not given a unique collation order.
Celtic
:* Welsh uses the circumflex, diaeresis, acute, and grave accents on its seven vowels a, e, i, o, u, w, y (hence the composites â, ê, î, ô, û, ŵ, ŷ, ä, ë, ï, ö, ü, ẅ, ÿ, á, é, í, ó, ú, ẃ, ý, à, è, ì, ò, ù, ẁ, ỳ). However, all except the circumflex (which is used as a macron) are fairly rare.
:* Following spelling reforms since the 1970s, Scottish Gaelic
uses graves only, which can be used on any vowel (, , , , ). Formerly acute accents could be used on , and , which were used to indicate a specific vowel quality. With the elimination of these accents, the new orthography relies on the reader having prior knowledge of pronunciation of a given word.
:* Manx uses the cedilla diacritic combined with h to give the digraph (pronounced ) to mark the distinction between it and the digraph (pronounced or ). Other diacritics used in Manx included the circumflex and diaeresis, as in , , , etc. to mark the distinction between two similarly spelled words but with slightly differing pronunciation.
:* Irish uses only acute accents to mark long vowels, following the 1948 spelling reform. Lenition is indicated using an overdot in Gaelic type (,,, , , , , ); in Roman type, a suffixed is used. Thus, is equivalent to .
:* Breton does not have a single orthography (spelling system), but uses diacritics for a number of purposes. The diaeresis is used to mark that two vowels are pronounced separately and not as a diphthong/digraph. The circumflex is used to mark long vowels, but usually only when the vowel length is not predictable by phonology. Nasalization of vowels may be marked with a tilde, or following the vowel with the letter . The plural suffix -où is used as a unified spelling to represent a suffix with a number of pronunciations in different dialects, and to distinguish this suffix from the digraph which is pronounced as . An apostrophe is used to distinguish , pronounced as the digraph is used in other Celtic languages, from the French-influenced digraph ch, pronounced .
Finno-Ugric
:* Estonian has a distinct letter , which contains a tilde. Estonian vowels with double-dot diacritics , , are similar to German, but these are also distinct letters, unlike German umlauted letters. All four have their own place in the alphabet, between and . Caron
s in or appear only in foreign proper names and loanwords. Also these are distinct letters, placed in the alphabet between ''s'' and ''t''.
:* Finnish uses double-dotted vowels ( and ). As in Swedish and Estonian, these are regarded as individual letters, rather than 'vowel + diacritic' combinations (as happens in German). It also uses the characters , and in foreign names and loanwords. In the Finnish and Swedish alphabets, , and collate as separate letters after , the others as variants of their base letter.
:* Hungarian uses the double-dot, the acute and double acute diacritics (the last is unique to Hungarian): (, ), (, , , , ) and (, ). The acute accent indicates the long form of a vowel (in case of /, /, /) while the double acute performs the same function for and . The acute accent can also indicate a different sound (more open, as in case of /, /). Both long and short forms of the vowels are listed separately in the Hungarian alphabet, but members of the pairs /, /, /, /, /, / and / are collated in dictionaries as the same letter.
:* Livonian has the following letters: , , , , , , , , , , , , , , , , , .
Germanic
:* German uses the two-dots diacritic (umlaut) on the letters ä, ö and ü to indicate the fronting of back vowels (see umlaut (linguistics)).
:* Dutch uses acute, circumflex, grave and two-dots diacritics with most vowels and cedilla with c, as in French. This results in , , , , , , , , , , , , , , , and . This is mostly on words (and names) originating from French (like ''crème, café, gêne, façade''). The acute accent is also used to stress the vowel (like ''één''). The two-dots diacritic is used as a linguistic diaeresis (a vowel hiatus) that splits the two vowels, e.g., ''reële, reünie, coördinatie''), rather than to indicate a linguistic as used in German.
:* Afrikaans
uses 16 additional vowel forms, both uppercase and lowercase: , , , , , , , , , , , , , , , .
:* Faroese uses acutes and some additional letters. All are considered separate letters and have their own place in the alphabet: , , , , and .
:* Icelandic uses acutes and other additional letters. All are considered separate letters, and have their own place in the alphabet: , , , , , and .
:* Danish and Norwegian use additional characters like the o-slash ø and the a-overring å. These letters come after z and æ in the order ø, å. Historically, the å has developed from a ligature by writing a small superscript a over a lowercase a; if an å character is unavailable, some Scandinavian languages allow the substitution of a doubled ''a'', thus aa. The Scandinavian languages collate these letters after z, but have different national collation standards.
:* Swedish uses a-diaeresis (ä) and o-diaeresis (ö) in the place of æ and slashed o (ø) in addition to the a-overring (å). Historically, the two-dots diacritic for the Swedish letters ä and ö developed from a small Gothic e written above the letters. These letters are collated after z, in the order å, ä, ö.
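Language-specific collation of this kind can be sketched with a custom sort key. The example below assumes the Swedish order (å, ä, ö after z) and precomposed input; the names `SWEDISH_ALPHABET` and `swedish_key` are hypothetical, and a production system would more likely rely on a full collation library than on a hand-built table.

```python
import unicodedata

# Assumed alphabet order for this sketch: a to z, then å, ä, ö.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"
RANK = {letter: index for index, letter in enumerate(SWEDISH_ALPHABET)}


def swedish_key(word: str) -> list:
    """Map each letter to its alphabet position; unknown characters sort last."""
    normalized = unicodedata.normalize("NFC", word.lower())
    return [RANK.get(ch, len(RANK)) for ch in normalized]


words = ["öl", "zebra", "ärm", "apa", "år"]
by_alphabet = sorted(words, key=swedish_key)
by_code_point = sorted(words)
```

Sorting by raw code point places ä before å, because U+00E4 precedes U+00E5, whereas the key above follows the alphabet order given in the text.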
Romance
:* In Asturian, Galician and Spanish, the character ñ is a letter and collated between ''n'' and ''o''.
:* Asturian uses an underdot: (lower case
, ), and (lower case
)
:* Catalan uses the acute accent , , , , the grave accent , , , the diaeresis , , the cedilla , and the interpunct
.
::* In Valencian, the circumflex may also be used.
:* Corsican uses the following in its alphabet: /, /, /, /, /.
:* French uses four diacritics, appearing on vowels (circumflex, acute, grave, diaeresis) and the cedilla appearing in .
:* Italian uses two diacritics, appearing on vowels (acute, grave)
:* Leonese: could use or .
:* Portuguese uses a tilde with the vowels and and a cedilla with c.
:* Romanian uses a breve
on the letter ''a'' () to indicate the sound schwa , as well as a circumflex over the letters ''a'' () and ''i'' () for the sound . Romanian also writes a comma below the letters ''s'' () and ''t'' () to represent the sounds and , respectively. These characters are collated after their non-diacritic equivalent.
:* Spanish uses acute accents (á, é, í, ó, ú) to indicate stress falling on a different syllable than the one it would fall on based on default rules, and to distinguish certain one-syllable homonyms (e.g. ''el'', the masculine singular definite article, and ''él'', "he"). The acute accent is also used to break up sequences of vowels that would normally be pronounced as a diphthong into two syllables. Diaeresis is used on u only, to distinguish the combinations ''güe'', ''güi'' from ''gue'', ''gui''. The tilde on ñ is not considered a diacritic, as ñ is considered a distinct letter from n, not a mutated form of it.
Slavic
:* Gaj's Latin alphabet
, used in Croatian and latinized Serbian, has the symbols , , , and , which are considered separate letters and are listed as such in dictionaries and other contexts in which words are listed according to alphabetical order. It also has one digraph including a diacritic, '' dž'', which is also alphabetized independently, and follows and precedes in the alphabetical order.
:* The Czech alphabet uses the acute (lowercase á é í ó ú ý, uppercase Á É Í Ó Ú Ý), caron (lowercase č ď ě ň ř š ť ž, uppercase Č Ď Ě Ň Ř Š Ť Ž), and for one letter (lowercase ů, uppercase Ů) the ring. (In ď and ť the caron is modified to look rather like an apostrophe.) Letters with a caron are considered separate letters, whereas accented vowels are considered only longer variants of the unaccented letters. The acute does not affect alphabetical order; letters with a caron are ordered after their unaccented counterparts.
:* Polish has the following letters: ą ć ę ł ń ó ś ź ż. These are considered to be separate letters: each of them is placed in the alphabet immediately after its Latin counterpart (e.g. between and ), and are placed after in that order.
:* The Serbian Cyrillic alphabet has no diacritics; instead, it has a grapheme (glyph) for every letter of its Latin counterpart (including Latin letters with diacritics and the digraphs ''dž'', ''lj'' and ''nj'').
:* The Slovak alphabet uses the acute (lowercase á é í ó ú ý ĺ ŕ, uppercase Á É Í Ó Ú Ý Ĺ Ŕ), caron (lowercase č ď ľ ň š ť ž dž, uppercase Č Ď Ľ Ň Š Ť Ž DŽ), umlaut ( ä Ä) and circumflex accent ( ô Ô). All of those are considered separate letters and are placed directly after the original counterpart in the alphabet
.[http://www.juls.savba.sk/ediela/psp2000/psp.pdf page 12, section I.2]
:* The basic Slovenian alphabet has the symbols , , and , which are considered separate letters and are listed as such in dictionaries and other contexts in which words are listed according to alphabetical order. Letters with a caron
are placed right after the letters as written without the diacritic. The letter ('d with bar') may be used in non-transliterated foreign words, particularly names, and is placed after and before .
Turkic
:* Azerbaijani includes the distinct Turkish alphabet letters Ç, Ğ, I, İ, Ö, Ş and Ü.
:* Crimean Tatar includes the distinct Turkish alphabet letters Ç, Ğ, I, İ, Ö, Ş and Ü. Unlike Turkish, Crimean Tatar also has the letter Ñ.
:* Gagauz includes the distinct Turkish alphabet letters Ç, Ğ, I, İ, Ö and Ü. Unlike Turkish, Gagauz also has the letters Ä, Ê, Ș and Ț. Ș and Ț are derived from the Romanian alphabet for the same sounds. Sometimes the Turkish Ş may be used instead of Ș.
:* Turkish uses a G with a breve (Ğ), two letters with two dots (Ö and Ü, representing two rounded front vowels), two letters with a cedilla (Ç and Ş, representing the affricate /tʃ/ and the fricative /ʃ/), and also possesses a dotted capital İ (and a dotless lowercase ı representing a high unrounded back vowel). In Turkish each of these is a separate letter, rather than a version of another letter: dotted capital İ and lower case i are the same letter, as are dotless capital I and lowercase ı. Typographically, Ç and Ş are sometimes rendered with an underdot. The new Azerbaijani, Crimean Tatar, and Gagauz alphabets are based on the Turkish alphabet and its same diacriticized letters, with some additions.
:* Turkmen includes the distinct Turkish alphabet letters Ç, Ö, Ş and Ü. In addition, Turkmen uses A with diaeresis (''Ä'') to represent /æ/, N with caron (''Ň'') to represent the velar nasal, Y with acute (''Ý'') to represent the palatal approximant, and Z with caron (''Ž'') to represent /ʒ/.
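The Turkish pairing of dotted İ with i and dotless I with ı, described in the list above, conflicts with the default case mapping that most software applies. A minimal Python sketch of a Turkish-aware mapping follows; the function names `turkish_upper` and `turkish_lower` are hypothetical, and the sketch handles only these two letter pairs.

```python
def turkish_upper(text: str) -> str:
    """Uppercase with Turkish rules: i -> İ and ı -> I."""
    return text.replace("i", "İ").replace("ı", "I").upper()


def turkish_lower(text: str) -> str:
    """Lowercase with Turkish rules: İ -> i and I -> ı.

    The dotted capital is replaced first, since the default lower()
    would turn it into "i" plus a combining dot above.
    """
    return text.replace("İ", "i").replace("I", "ı").lower()


# Python's built-in mapping is locale-independent and loses the dot.
default_upper = "istanbul".upper()
turkish_result = turkish_upper("istanbul")
```

The same concern applies to the other alphabets in this group that adopted the Turkish letters, to the extent that they keep the same case pairs.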
Other
:* Albanian has two special letters Ç and Ë upper and lowercase. They are placed next to the most similar letters in the alphabet, c and e correspondingly.
:* Esperanto
has the symbols '' ŭ'', '' ĉ, ĝ, ĥ, ĵ'' and '' ŝ'', which are included in the alphabet, and considered separate letters.
:* Filipino also has the character ''ñ'' as a letter, collated between n and o.
:* Modern Greenlandic does not use any diacritics, although ''ø'' and ''å'' are used to spell loanwords, especially from Danish and English. From 1851 until 1973, Greenlandic was written in an alphabet invented by Samuel Kleinschmidt, where long vowels and geminate consonants were indicated by diacritics on vowels (in the case of consonant gemination, the diacritics were placed on the vowel preceding the affected consonant). For example, the name '' Kalaallit Nunaat'' was spelled ''Kalâdlit Nunât''. This scheme uses the circumflex
(◌̂) to indicate a long vowel (e.g. ; modern: ), an acute accent
(◌́) to indicate gemination of the following consonant: (i.e. ; modern: ) and, finally, a tilde (◌̃) or a grave accent (◌̀), depending on the author, indicates vowel length and gemination of the following consonant (e.g. ; modern: ). , used only before , are now written in Greenlandic.
:* Hawaiian uses the kahakō (macron) over vowels, although there is some disagreement over considering them as individual letters. The kahakō over a vowel can completely change the meaning of a word that is spelled the same but without the kahakō.
:* Kurdish uses the symbols Ç, Ê, Î, Ş and Û with other 26 standard Latin alphabet symbols.
:* Lakota alphabet uses the caron
for the letters ''č'', ''ȟ'', ''ǧ'', ''š'', and ''ž''. It also uses the acute accent
for stressed vowels á, é, í, ó, ú, áŋ, íŋ, úŋ.
:* Malay uses some diacritics such as ''á, ā, ç, í, ñ, ó, š, ú''. The use of diacritics continued until the late 19th century, except for ''ā'' and ''ē''.
:* Maltese uses a C, G, and Z with a dot over them (Ċ, Ġ, Ż), and also has an H with an extra horizontal bar. For uppercase H, the extra bar is written slightly above the usual bar. For lowercase H, the extra bar is written crossing the vertical, like a ''t'', and not touching the lower part ( Ħ, ħ). The above characters are considered separate letters. The letter 'c' without a dot has fallen out of use due to redundancy. 'Ċ' is pronounced like the English 'ch' and 'k' is used as a hard c as in 'cat'. 'Ż' is pronounced just like the English 'Z' as in 'Zebra', while 'Z' is used to make the sound of 'ts' in English (like 'tsunami' or 'maths'). 'Ġ' is used as a soft 'G' like in 'geometry', while the 'G' sounds like a hard 'G' like in 'log'. The digraph 'għ' (called ''għajn'' after the Arabic
letter name ''ʻayn'' for غ) is considered separate, and sometimes ordered after 'g', whilst in other volumes it is placed between 'n' and 'o' (the Latin letter 'o' originally evolved from the shape of Phoenician ''ʻayin'', which was traditionally collated after Phoenician ''nūn'').
:* The romanization of Syriac uses the altered letters ''Ā, Č, Ḏ, Ē, Ë, Ġ, Ḥ, Ō, Š, Ṣ, Ṭ, Ū, Ž'' alongside the 26 standard Latin alphabet symbols.
:* Vietnamese uses the horn diacritic for the letters ''ơ'' and ''ư''; the circumflex
for the letters ''â'', ''ê'', and ''ô''; the breve
for the letter ''ă''; and a bar through the letter ''đ''. Separately, it also has á, à, ả, ã and ạ, the five tones used for vowels besides the flat tone 'a'.
Cyrillic letters
:* Belarusian and Uzbek Cyrillic have a letter ў.
:* Belarusian, Bulgarian, Russian and Ukrainian have the letter й.
:* Belarusian and Russian have the letter ё. In Russian, this letter is usually replaced by е, although it has a different pronunciation. The use of е instead of ё does not affect the pronunciation. ''Ё'' is always used in children's books and in dictionaries. A minimal pair is все (''vs'e'', "everybody" pl.) and всё (''vs'o'', "everything" n. sg.). In Belarusian the replacement by е is a mistake; in Russian, it is permissible to use either е or ё for ё, but the former is more common in everyday writing (as opposed to instructional or juvenile writing).
:* The Cyrillic Ukrainian alphabet has the letters ґ, й and ї. Ukrainian Latynka has many more.
:* Macedonian has the letters ѓ and ќ.
:* In Bulgarian and Macedonian the possessive pronoun ѝ (''ì'', "her") is spelled with a grave accent in order to distinguish it from the conjunction и (''i'', "and").
:* The acute accent above any vowel in Cyrillic alphabets is used in dictionaries, books for children and foreign learners to indicate the word stress; it can also be used for disambiguation of similarly spelled words with different lexical stresses.
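In Unicode text, the Cyrillic stress mark from the last item is typically written as the combining acute accent (U+0301) after the vowel, since no precomposed stressed Cyrillic vowels are encoded. The following sketch shows adding and removing it; the helper names `mark_stress` and `strip_stress` are hypothetical, and the input is assumed to be in composed (NFC) form.

```python
import unicodedata

COMBINING_ACUTE = "\u0301"


def mark_stress(word: str, vowel_index: int) -> str:
    """Insert a combining acute after the vowel at vowel_index."""
    return word[: vowel_index + 1] + COMBINING_ACUTE + word[vowel_index + 1 :]


def strip_stress(text: str) -> str:
    """Remove stress marks, e.g. before a dictionary lookup or search."""
    return text.replace(COMBINING_ACUTE, "")


# The Russian spelling замок is stressed on either syllable
# depending on the meaning, so the mark disambiguates the two words.
first_syllable = mark_stress("замок", 1)
second_syllable = mark_stress("замок", 3)
```

Because the mark is a separate code point, normalizing to NFC leaves the stressed word one character longer than the plain spelling.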
Diacritics that do not produce new letters
English
English is one of the few European languages that does not have many words that contain diacritical marks. Instead, digraphs are the main way the Modern English alphabet adapts the Latin to its phonemes. Exceptions are unassimilated foreign loanwords, including borrowings from French (and, increasingly, Spanish, like ''jalapeño'' and ''piñata''); however, the diacritic is also sometimes omitted from such words. Loanwords that frequently appear with the diacritic in English include ''café'', ''résumé'' or ''resumé'' (a usage that helps distinguish it from the verb ''resume''), ''soufflé'', and ''naïveté'' (see '' English terms with diacritical marks''). In older practice (and even among some orthographically conservative modern writers), one may see examples such as ''élite'', ''mêlée'' and ''rôle.''
English speakers and writers once used the diaeresis more often than now in words such as ''coöperation'' (from Fr. ''coopération''), ''zoölogy'' (from Grk. ''zoologia''), and ''seeër'' (now more commonly ''see-er ''or simply'' seer'') as a way of indicating that adjacent vowels belonged to separate syllables, but this practice has become far less common. ''The New Yorker
'' magazine is a major publication that continues to use the diaeresis in place of a hyphen for clarity and economy of space.
A few English words, often when used out of context, especially in isolation, can only be distinguished from other words of the same spelling by using a diacritic or modified letter. These include ''exposé'', ''lamé'', ''maté'', ''öre'', ''øre'', ''résumé'' and ''rosé''. In a few words, diacritics that did not exist in the original have been added for disambiguation, as in ''maté'' (from Sp. and Port. ''mate''), ''saké'' (the standard Romanization of the Japanese has no accent mark), and ''Malé'' (from Dhivehi މާލެ), to clearly distinguish them from the English words ''mate'', ''sake'', and ''male''.
The acute and grave accents are occasionally used in poetry and lyrics: the acute to indicate stress overtly where it might be ambiguous (''rébel'' vs. ''rebél'') or nonstandard for metrical reasons (''caléndar''), the grave to indicate that an ordinarily silent or elided syllable is pronounced (''warnèd,'' ''parlìament'').
In certain personal names such as ''Renée'' and ''Zoë'', often two spellings exist, and the person's own preference will be known only to those close to them. Even when the name of a person is spelled with a diacritic, like ''Charlotte Brontë'', this may be dropped in English-language articles, and even in official documents such as passports, due either to carelessness, the typist not knowing how to enter letters with diacritical marks, or technical reasons (California, for example, does not allow names with diacritics, as the computer system cannot process such characters). Diacritics also appear in some worldwide company names and/or trademarks, such as ''Nestlé'' and ''Citroën''.
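Where a system cannot store diacritics, names are often reduced to their base letters. One common way to do this, shown here as a sketch rather than as the method any particular agency uses, is to decompose the text and discard the combining marks; the function name `strip_diacritics` is hypothetical.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Drop combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


plain_name = strip_diacritics("Charlotte Brontë")
plain_brand = strip_diacritics("Citroën")
```

The approach only affects letters that decompose into a base letter plus a mark. Modified letters such as ø have no such decomposition and pass through unchanged, so they would need a separate substitution table.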
Other languages
The following languages have letter-diacritic combinations that are not considered independent letters.
* Afrikaans
uses a diaeresis to mark vowels that are pronounced separately and not as one would expect where they occur together, for example ''voel'' (to feel) as opposed to ''voël'' (bird). The circumflex is used in ''ê, î, ô'' and ''û'' generally to indicate long close-mid, as opposed to open-mid vowels, for example in the words ''wêreld'' (world) and ''môre'' (morning, tomorrow). The acute accent is used to add emphasis in the same way as underlining or writing in bold or italics in English, for example ''Dit is jóú boek'' (It is your book). The grave accent is used to distinguish between words that are different only in placement of the stress, for example ''appel'' (apple) and ''appèl'' (appeal) and in a few cases where it makes no difference to the pronunciation but distinguishes between homophones. The two most usual cases of the latter are in the sayings ''òf... òf'' (either... or) and ''nòg... nòg'' (neither... nor) to distinguish them from ''of'' (or) and ''nog'' (again, still).
* Aymara uses a diacritical horn over ''p, q, t, k, ch''.
* Catalan has the following composite characters: ''à, ç, é, è, í, ï, ó, ò, ú, ü, l·l''. The acute and the grave indicate stress and vowel height; the cedilla marks the result of a historical palatalization; the diaeresis indicates either a hiatus, or that the letter ''u'' is pronounced when the graphemes ''gü, qü'' are followed by ''e'' or ''i''; and the interpunct (·) distinguishes the different values of ''ll/l·l''.
* Some orthographies of Cornish such as Kernowek Standard and Unified Cornish
use diacritics, while others such as Kernewek Kemmyn and the Standard Written Form do not (or only use them optionally in teaching materials).
* Dutch uses the diaeresis. For example, in ''ruïne'' it means that the ''u'' and the ''i'' are separately pronounced in their usual way, and not in the way that the combination ''ui'' is normally pronounced. Thus it works as a separation sign and not as an indication for an alternative version of the ''i''. Diacritics can be used for emphasis (''érg koud'' for ''very'' cold) or for disambiguation between a number of words that are spelled the same when context does not indicate the correct meaning (''één appel'' = one apple, ''een appel'' = an apple; ''vóórkomen'' = to occur, ''voorkómen'' = to prevent). Grave and acute accents are used on a very small number of words, mostly loanwords. The ç also appears in some loanwords.
* Faroese. Non-Faroese accented letters are not added to the Faroese alphabet. These include ''é'', ''ö'', ''ü'', ''å'' and recently also letters like ''š'', ''ł'', and ''ć''.
* Filipino has the following composite characters: ''á, à, â, é, è, ê, í, ì, î, ó, ò, ô, ú, ù, û''. Everyday use of diacritics for Filipino is, however, uncommon, and is meant only to distinguish homonyms, one with the usual penultimate stress and one with a different stress placement. This aids both comprehension and pronunciation if both are relatively adjacent in a text, or if a word is itself ambiguous in meaning. The letter ''ñ'' ("''eñe''") is not an ''n'' with a diacritic, but rather is collated as a separate letter, one of eight borrowed from Spanish. Diacritics appear in Spanish loanwords and names observing Spanish orthography rules.
* Finnish. Carons in ''š'' and ''ž'' appear only in foreign proper names and loanwords, but may be substituted with ''sh'' or ''zh'' if and only if it is technically impossible to produce accented letters in the medium. Unlike in Estonian, ''š'' and ''ž'' are not considered distinct letters in Finnish.
* French uses five diacritics. The grave (''accent grave'') marks the sound /ɛ/ when over an e, as in ''père'' ("father"), or is used to distinguish words that are otherwise homographs, such as ''a''/''à'' ("has"/"to") or ''ou''/''où'' ("or"/"where"). The acute (''accent aigu'') is used only in "é", modifying the "e" to make the sound /e/, as in ''étoile'' ("star"). The circumflex (''accent circonflexe'') generally denotes that an "s" once followed the vowel in Old French or Latin, as in ''fête'' ("party"), the Old French being ''feste'' and the Latin being ''festum''. Whether the circumflex modifies the vowel's pronunciation depends on the dialect and the vowel. The cedilla (''cédille'') indicates that a normally hard "c" (before the vowels "a", "o", and "u") is to be pronounced /s/, as in ''ça'' ("that"). The diaeresis (''tréma'') indicates that two adjacent vowels that would normally be pronounced as one are to be pronounced separately, as in ''Noël'' ("Christmas").
* Galician vowels can bear an acute (''á, é, í, ó, ú'') to indicate stress or difference between two otherwise same written words (''é'', 'is' vs. ''e'', 'and'), but the diaeresis is only used with ''ï'' and ''ü'' to show two separate vowel sounds in pronunciation. Only in foreign words may Galician use other diacritics such as ''ç'' (common during the Middle Ages), ''ê'', or ''à''.
* German uses the three umlauted characters ''ä'', ''ö'' and ''ü''. These diacritics indicate vowel changes. For instance, the word ''Ofen'' "oven" has the plural ''Öfen''. The mark originated as a superscript ''e''; a handwritten blackletter ''e'' resembles two parallel vertical lines, like a diaeresis. Because of this history, "ä", "ö" and "ü" can be written as "ae", "oe" and "ue" respectively if the umlaut letters are not available.
* Hebrew
has many diacritic marks known as ''niqqud'' that are placed above and below the script to represent vowels. These must be distinguished from cantillation marks, which are keys to pronunciation and syntax.
* The International Phonetic Alphabet
uses diacritic symbols and characters to indicate phonetic features or secondary articulations.
* Irish uses the acute to indicate that a vowel is long: ''á'', ''é'', ''í'', ''ó'', ''ú''. It is known as ''síneadh fada'' "long sign" or simply ''fada'' "long" in Irish. In the older Gaelic type, overdots are used to indicate lenition of a consonant: ''ḃ'', ''ċ'', ''ḋ'', ''ḟ'', ''ġ'', ''ṁ'', ''ṗ'', ''ṡ'', ''ṫ''.
* Italian mainly has the acute and the grave
(''à'', ''è''/''é'', ''ì'', ''ò''/''ó'', ''ù''), typically to indicate a stressed syllable that would not be stressed under the normal rules of pronunciation but sometimes also to distinguish between words that are otherwise spelled the same way (e.g. "e", and; "è", is). Despite its rare use, Italian orthography allows the circumflex (î) too, in two cases: it can be found in old literary context (roughly up to 19th century) to signal a syncope (fêro→fecero, they did), or in modern Italian to signal the contraction of ″-ii″ due to the plural ending -i whereas the root ends with another -i; e.g., s. demonio, p. demonii→demonî; in this case the circumflex also signals that the word intended is not demoni, plural of "demone" by shifting the accent (demònî, "devils"; dèmoni, "demons").
* Lithuanian uses the acute, grave
and tilde in dictionaries to indicate stress types in the language's pitch accent
system.
* Maltese also uses the grave on its vowels to indicate stress at the end of a word with two syllables or more. Lowercase letters: ''à, è, ì, ò, ù''; capital letters: ''À, È, Ì, Ò, Ù''.
* Māori makes use of macrons to mark long vowels.
* Occitan has the following composite characters: ''á, à, ç, é, è, í, ï, ó, ò, ú, ü, n·h, s·h''. The acute and the grave indicate stress and vowel height, the cedilla marks the result of a historical palatalization, the diaeresis indicates either a hiatus, or that the letter ''u'' is pronounced when the graphemes ''gü, qü'' are followed by ''e'' or ''i'', and the interpunct
(·) distinguishes the different values of ''nh/n·h'' and ''sh/s·h'' (i.e., that the letters are supposed to be pronounced separately, not combined into "ny" and "sh").
* Portuguese has the following composite characters: ''à, á, â, ã, ç, é, ê, í, ó, ô, õ, ú''. The acute and the circumflex indicate stress and vowel height, the grave indicates crasis, the tilde represents nasalization, and the cedilla marks the result of a historical lenition.
* Acutes are also used in Slavic language
dictionaries and textbooks to indicate lexical stress
, placed over the vowel of the stressed syllable. This can also serve to disambiguate meaning: in Russian, писа́ть (''pisáť'') means "to write" but пи́сать (''písať'') means "to piss", and "бо́льшая часть" (the biggest part) contrasts with "больша́я часть" (the big part).
* Spanish uses the acute and the diaeresis. The acute is used on a vowel in a stressed syllable in words with irregular stress patterns. It can also be used to "break up" a diphthong
as in ''tío'' (pronounced as two syllables, rather than as a single syllable with a diphthong, as it would be without the accent). Moreover, the acute can be used to distinguish words that otherwise are spelled alike, such as ''si'' ("if") and ''sí'' ("yes"), and also to distinguish interrogative and exclamatory pronouns from homophones with a different grammatical function, such as ''donde/¿dónde?'' ("where"/"where?") or ''como/¿cómo?'' ("as"/"how?"). The acute may also be used to avoid typographical ambiguity, as in ''1 ó 2'' ("1 or 2"; without the acute this might be interpreted as "1 0 2"). The diaeresis is used only over ''u'' (''ü'') for it to be pronounced in the combinations ''gue'' and ''gui'', where ''u'' is normally silent, for example ''ambigüedad''. In poetry, the diaeresis may be used on ''i'' and ''u'' as a way to force a hiatus. As foreshadowed above, in nasal ''ñ'' the tilde (squiggle) is not considered a diacritic sign at all, but a composite part of a distinct glyph, with its own chapter in the dictionary: a glyph that denotes the 15th letter of the Spanish alphabet.
* Swedish uses the acute to show non-standard stress, for example in ''kafé'' (café) and ''resumé'' (résumé). This occasionally helps resolve ambiguities, such as ''ide'' (hibernation) versus ''idé'' (idea). In these words, the acute is not optional. Some proper names use non-standard diacritics, such as Carolina Klüft and Staël von Holstein. For foreign loanwords the original accents are strongly recommended, unless the word has been absorbed into the language, in which case they are optional. Hence ''crème fraîche'' but ''ampere''. Swedish also has the letters ''å'', ''ä'', and ''ö'', but these are considered distinct letters, not ''a'' and ''o'' with diacritics.
* Tamil does not have any diacritics in itself, but uses the Arabic numerals
2, 3 and 4 as diacritics to represent aspirated, voiced, and voiced-aspirated consonants when Tamil script is used to write long passages in Sanskrit
.
* Thai has its own system of diacritics derived from Indian numerals
, which denote different tones.
* Vietnamese uses the acute (''dấu sắc''), the grave (''dấu huyền''), the tilde (''dấu ngã''), the underdot (''dấu nặng'') and the hook above (''dấu hỏi'') on vowels as tone indicators.
* Welsh uses the circumflex, diaeresis, acute, and grave on its seven vowels ''a, e, i, o, u, w, y''. The most common is the circumflex (which it calls ''to bach'', meaning "little roof", or ''acen grom'' "crooked accent", or ''hirnod'' "long sign") to denote a long vowel, usually to disambiguate it from a similar word with a short vowel or a semivowel. The rarer grave accent has the opposite effect, shortening vowel sounds that would usually be pronounced long. The acute accent and diaeresis are also occasionally used, to denote stress and vowel separation respectively. The ''w''-circumflex and the ''y''-circumflex are among the most commonly accented characters in Welsh, but unusual in languages generally, and were until recently very hard to obtain in word-processed and HTML documents.
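The German convention noted in the list above, writing "ae", "oe" and "ue" when the umlaut letters are unavailable, can be sketched in a few lines of Python. This is only an illustration: the function name `german_ascii_fallback` and the table `UMLAUT_FALLBACK` are made up for this example, the capitalised forms (Ae, Oe, Ue) are an assumption, and the mapping should not be applied to other languages (in Dutch ''ruïne'', for instance, the diaeresis has a different function).

```python
import unicodedata

# Hypothetical mapping following the ae/oe/ue convention described for German.
# The capitalised digraphs are an assumption of this sketch.
UMLAUT_FALLBACK = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
})


def german_ascii_fallback(text: str) -> str:
    """Replace German umlaut letters with their digraph spellings."""
    # Normalise to NFC first so that a base letter followed by a separate
    # combining diaeresis (U+0308) becomes the single precomposed letter.
    composed = unicodedata.normalize("NFC", text)
    return composed.translate(UMLAUT_FALLBACK)


print(german_ascii_fallback("Öfen"))
```

The substitution is lossy: once "Öfen" has become "Oefen", the original spelling cannot be recovered reliably, because "oe" also occurs in words that never had an umlaut.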
Transliteration
Several languages that are not written with the Roman alphabet are transliterated
, or romanized, using diacritics. Examples:
* Arabic
has several romanisations, depending on the type of application, region, intended audience, country, etc. Many of them use diacritics extensively; for example, some methods use an underdot for rendering emphatic consonants (ṣ, ṭ, ḍ, ẓ, ḥ). The macron is often used to render long vowels. š is often used for /ʃ/, ġ for /ɣ/.
* Chinese has several romanizations that use the umlaut, but only on ''u'' (''ü''). In Hanyu Pinyin, the four tones of Mandarin Chinese
are denoted by the macron (first tone), acute (second tone), caron (third tone) and grave (fourth tone) diacritics. Example: ''ā, á, ǎ, à''.
* Romanized Japanese (Rōmaji) occasionally uses macrons to mark long vowels. The Hepburn romanization
system uses macrons to mark long vowels, and the Kunrei-shiki and Nihon-shiki systems use a circumflex
.
* Sanskrit
, as well as many of its descendants, like Hindi
and Bengali, uses a lossless romanization
system, IAST
. This includes several letters with diacritical markings, such as the macron (ā, ī, ū), over- and underdots (ṛ, ḥ, ṃ, ṇ, ṣ, ṭ, ḍ) as well as a few others (ś, ñ).
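The Hanyu Pinyin tone marks listed above (macron, acute, caron, grave) correspond to Unicode combining characters, so a toned vowel can be built by appending the mark and normalising. The following Python sketch assumes that approach; the names `TONE_MARKS` and `add_tone` are hypothetical and chosen only for this illustration.

```python
import unicodedata

# Combining marks for the four Mandarin tones, in the order given above.
TONE_MARKS = {
    1: "\u0304",  # COMBINING MACRON (first tone)
    2: "\u0301",  # COMBINING ACUTE ACCENT (second tone)
    3: "\u030C",  # COMBINING CARON (third tone)
    4: "\u0300",  # COMBINING GRAVE ACCENT (fourth tone)
}


def add_tone(vowel: str, tone: int) -> str:
    """Attach a Pinyin tone mark to a vowel and return the composed form."""
    # NFC merges the base letter and the mark into one precomposed
    # character where Unicode defines one.
    return unicodedata.normalize("NFC", vowel + TONE_MARKS[tone])


print([add_tone("a", t) for t in (1, 2, 3, 4)])
```

The same function also handles ''ü'', which already carries a diaeresis: the tone mark is stacked on top of it.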
Limits
Orthographic
Possibly the greatest number of combining diacritics ''required'' to compose a valid character in any Unicode language is 8, for the "well-known grapheme cluster in Tibetan and Ranjana scripts".
Unorthographic/ornamental
Some users have explored the limits of rendering in web browsers and other software by "decorating" words with excessive nonsensical diacritics per character to produce so-called Zalgo text.
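Software that wants to detect or tame such text can count how many combining marks follow each base character. The Python sketch below is one crude way to do that, under the assumption that any character in a Unicode "Mark" general category (Mn, Mc, Me) counts as a diacritic; the function names and the default limit of 2 are hypothetical. A real filter would need a per-script limit, since some scripts legitimately stack several marks.

```python
import unicodedata


def _is_mark(ch: str) -> bool:
    # General categories Mn, Mc and Me all start with "M".
    return unicodedata.category(ch).startswith("M")


def max_marks_per_base(text: str) -> int:
    """Return the longest run of marks attached to a single base character."""
    longest = run = 0
    for ch in unicodedata.normalize("NFD", text):
        run = run + 1 if _is_mark(ch) else 0
        longest = max(longest, run)
    return longest


def limit_marks(text: str, limit: int = 2) -> str:
    """Drop every mark beyond `limit` on each base character."""
    kept = []
    run = 0
    for ch in unicodedata.normalize("NFD", text):
        run = run + 1 if _is_mark(ch) else 0
        if run <= limit:
            kept.append(ch)
    # Recompose so ordinary accented letters come back in precomposed form.
    return unicodedata.normalize("NFC", "".join(kept))


zalgo = "Z" + "\u0301" * 10 + "a"
print(max_marks_per_base(zalgo), max_marks_per_base(limit_marks(zalgo)))
```

Ordinary accented text passes through `limit_marks` unchanged, because a single accent never exceeds the limit.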
List of diacritics in Unicode
Diacritics for Latin script in Unicode:
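As a small illustration of how these marks are identified in Unicode, the Python sketch below decomposes a string and reports the official names of the nonspacing marks it contains. The function name `describe_marks` is hypothetical, and restricting the check to general category Mn is an assumption that suits Latin-script diacritics.

```python
import unicodedata


def describe_marks(text: str) -> list[str]:
    """Return the Unicode names of the nonspacing marks found in text."""
    return [
        unicodedata.name(ch)
        # NFD splits precomposed letters into base letter + combining mark.
        for ch in unicodedata.normalize("NFD", text)
        if unicodedata.category(ch) == "Mn"
    ]


print(describe_marks("\u00e7a"))  # the French word "ça"
```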
See also
* Latin-script alphabets
* Alt code
* Letters with diacritics
* Collating sequence
* Combining character
* Compose key
* English terms with diacritical marks
* Heavy metal umlaut
* ISO/IEC 8859 8-bit extended-Latin-alphabet European character encodings
* Latin alphabet
* List of Latin letters
* List of precomposed Latin characters in Unicode
* List of U.S. cities with diacritics
* Romanization
Notes
References
External links
Context of Diacritics A research project
Diacritics Project
Unicode
* http://www.elisanet.fi/mlang/strip.html Notes on the use of the diacritics, by Markus Lång
Entering International Characters (in Linux, KDE)
Standard Character Set for Macintosh
PDF at Adobe