Diacritical marks of two dots, placed side-by-side over or under a letter, are used in a number of languages for several different purposes. The most familiar to English language speakers are the diaeresis and the umlaut, though there are numerous others. For example, in Albanian, ë represents a schwa. Such dots are also sometimes used for stylistic reasons (as in the family name Brontë or the band name Mötley Crüe).
In modern computer systems using Unicode, the two-dot diacritics are almost always encoded identically, having the same code point. For example, U+00E4 (ä, LATIN SMALL LETTER A WITH DIAERESIS) represents both ''a-umlaut'' and ''a-diaeresis''. Their appearance in print or on screen may vary between typefaces but rarely within the same typeface.
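The shared code point can be inspected with Python's standard `unicodedata` module. The following is a minimal sketch; the helper name `describe_non_ascii` and the two sample words are illustrative assumptions, not part of any standard.

```python
import unicodedata

def describe_non_ascii(word):
    """Return (code point, Unicode name) for each non-ASCII character in word."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch))
        for ch in word
        if ord(ch) > 0x7F
    ]

# "Käse" uses the dots as an umlaut, "naïve" as a diaeresis. The words are
# written with escapes so that the precomposed code points are unambiguous.
for word in ("K\u00e4se", "na\u00efve"):
    print(word, describe_non_ascii(word))
```

Both character names contain the word DIAERESIS, whichever function the dots serve in the word.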
Uses
Diaeresis
The "diaeresis" diacritic is used to mark the separation of two distinct vowels in adjacent syllables when an instance of diaeresis (or hiatus) occurs, so as to distinguish from a digraph or diphthong. For example, in the spelling "coöperate", the diaeresis reminds the reader that the word has four syllables, ''co-op-er-ate'', not three. It is used in several languages of western and southern Europe, though rarely now in English.
Umlaut
The "umlaut" diacritic indicates a sound shift phenomenon, also known as umlaut, in which a back vowel becomes a front vowel. It is a specific phenomenon in German and other Germanic languages, affecting the graphemes "a", "o", "u" and "au", which are modified to "ä", "ö", "ü" and "äu".
Stylistic use
The two-dot diacritic is also sometimes used for purely stylistic reasons. One example is the Brontë family, whose surname was derived from Gaelic and had been anglicised as "Prunty" or "Brunty". At some point, the father of the sisters, Patrick Brontë (born Brunty), decided on the alternative spelling with a diaeresis diacritic over the terminal "e" to indicate that the name had two syllables.
Similarly, the "metal umlaut" is a diacritic that is sometimes used gratuitously or decoratively over letters in the names of hard rock or heavy metal bands (for example, those of Motörhead and Mötley Crüe) and of parody bands, such as Spın̈al Tap.
Other uses by language
A double dot is also used as a diacritic in cases where it functions as neither a diaeresis nor an umlaut. In the International Phonetic Alphabet (IPA), a double dot above a letter is used for a centralized vowel, a situation more similar to umlaut than to diaeresis. In other languages it is used for vowel length, nasalization, tone, and various other uses where diaeresis or umlaut was available typographically. The IPA uses a double dot below a letter to indicate breathy (murmured) voice.
Vowels
* In Albanian, Tagalog, and Kashubian, ë represents a schwa.
* In Aymara, a double dot is used on vowels for vowel length.
* In the Basque dialect of Soule, ü represents [y].
* In the DMG romanization of Tunisian Arabic, , , , , and represent , , , , and .
* In Ligurian official orthography, is used to represent the sound .
* In Māori, a diaeresis was often used on computers in the past instead of the macron to indicate long vowels, as the diaeresis was relatively easy to produce on many systems, and the macron difficult or impossible.
* In Seneca, are nasal vowels, though is , as in German umlaut.
* In Vurës (Vanuatu), and encode respectively and .
* In the Pahawh Hmong script, a double dot is used as one of several tone marks.
* The double dot was used in the early Cyrillic alphabet, which was used to write Old Church Slavonic. The modern Cyrillic Belarusian and Russian alphabets include the letter ё (''yo''), although replacing it with the letter е without the diacritic is allowed in Russian.
* Since the 1870s, ї (Cyrillic letter ''yi'') has been used in the Ukrainian alphabet for the iotated vowel; plain і is not iotated. In Udmurt, ӥ is used for the uniotated vowel, with и for the iotated one.
* The form ÿ is common in Dutch handwriting and is also occasionally used in printed text, but it is a form of the digraph "ij" rather than a modification of the letter "y".
* Komi and Udmurt use ӧ (a Cyrillic O with two dots) for .
* The Swedish, Finnish and Estonian languages use ä and ö to represent [æ] and [ø].
* In the languages of J.R.R. Tolkien's ''Middle-earth'' novels, a diaeresis is used to separate vowels belonging to different syllables (e.g. in ''Eärendil'') and on final e to mark it as ''not'' a schwa (e.g. in ''Manwë'', ''Aulë'', ''Oromë'', etc.). (There is no schwa in these languages but Tolkien wanted to make sure that readers wouldn't mistakenly pronounce one when speaking the names aloud.)
Consonants
Jacaltec (a Mayan language) and Malagasy are among the very few languages with a double dot on the letter "n"; in both, n̈ is the velar nasal.
In Udmurt, a double dot is also used with the consonant letters ӝ (from ж), ӟ (from з) and ӵ (from ч).
When distinction is important, ḧ and ẍ are used for representing [ħ] and [ɣ] in the Kurdish Kurmanji alphabet (which are otherwise represented by "h" and "x"). These sounds are borrowed from Arabic.
Ẅ and ÿ: ''Ÿ'' is generally a vowel, but it is used as a (semi-vowel) consonant (a "w" without the use of the lips) in Tlingit. This sound is also found in Coast Tsimshian, where it is written ẅ.
A number of languages in Vanuatu use double dots on consonants to represent linguolabial (or "apicolabial") phonemes in their orthography. Thus Araki contrasts bilabial ''p'' with linguolabial ''p̈''; bilabial ''m'' with linguolabial ''m̈''; and bilabial ''v'' with linguolabial ''v̈''.
Seneca uses for .
In Arabic, the letter ẗ is used in the ISO 233 transliteration for the tāʾ marbūṭah, used to mark feminine gender in nouns and adjectives.
Syriac uses two dots above a letter, called Siyame, to indicate that the word should be understood as plural. For instance, () means "house", while () means "houses". The sign is used especially when no vowel marks are present, which could differentiate between the two forms. Although the origin of the Siyame is different from that of the diaeresis sign, in modern computer systems both are represented by the same Unicode character. This, however, often leads to wrong rendering of the Syriac text.
The N'Ko script, used to write the Mandé languages of West Africa, uses a two-dot diacritic (among others) to represent non-native sounds. The dots are slightly larger than those used for diaeresis or umlaut.
Diacritic underneath
The IPA specifies a "subscript umlaut", for example Hindi "potter"; the ALA-LC romanization system provides for its use and is one of the main schemes to romanize Persian (for example, rendering as ). The notation was used to write some Asian languages in Latin script, for example Red Karen.
Computer encodings
In Unicode
Character encoding generally treats the umlaut and the diaeresis as the same diacritic mark. Unicode refers to both as diaereses without making any distinction, although the term itself has a more precise literary meaning. For example, U+00E4 (ä, LATIN SMALL LETTER A WITH DIAERESIS) represents both ''a-umlaut'' and ''a-diaeresis'', while similar codes are used to represent all such cases.
Unicode encodes a number of cases of "letter with a two dots diacritic" as precomposed characters and these are displayed above. (Unicode uses the term "Diaeresis" for all two-dot diacritics, irrespective of the actual term used for the language in question.) In addition, many more symbols may be composed using the combining character facility, U+0308 (COMBINING DIAERESIS), which may be used with any letter or other diacritic to create a customised symbol, but this does not mean that the result has any real-world application.
Both the combining character and the pre-composed codepoints may be regarded as an umlaut or a diaeresis according to context. Compound diacritics are possible, for example U+01DA (ǚ, LATIN SMALL LETTER U WITH DIAERESIS AND CARON), used as a tonal mark for Hanyu Pinyin, which combines a two dots diacritic with a caron diacritic. Conversely, when the letter to be accented is an "i", the diacritic replaces the tittle, thus: ï.
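The relationship between precomposed characters and combining sequences can be explored with Unicode normalization. The sketch below uses Python's standard `unicodedata.normalize`; the helper `code_points` and the sample letters are assumptions chosen for illustration.

```python
import unicodedata

def code_points(text):
    """Format a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)

precomposed = "\u00e4"   # ä as a single precomposed code point
combined = "a\u0308"     # a followed by COMBINING DIAERESIS

# The two spellings differ as raw strings but agree after NFC normalization.
print(precomposed == combined)
print(unicodedata.normalize("NFC", combined) == precomposed)

# NFD splits a compound diacritic into base letter, diaeresis and caron.
print(code_points(unicodedata.normalize("NFD", "\u01da")))

# A letter with no precomposed form (p with two dots) stays a sequence.
print(code_points(unicodedata.normalize("NFC", "p\u0308")))
```

Comparing text that may mix both spellings is therefore usually done after normalizing both sides to the same form.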
Sometimes, there's a need to distinguish between the umlaut sign and the diaeresis sign. For instance, either may appear in a German name. ISO/IEC JTC 1/SC 2/WG 2 recommends the following for these cases:
* To represent the umlaut use Combining Diaeresis (U+0308)
* To represent the diaeresis use Combining Grapheme Joiner (CGJ, U+034F) + Combining Diaeresis (U+0308)
The same advice can be found in the official Unicode FAQ.
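The effect of this recommendation on stored text can be sketched in Python. The function names `mark_umlaut` and `mark_diaeresis` are hypothetical helpers for this illustration; only `unicodedata.normalize` is a standard-library call.

```python
import unicodedata

COMBINING_DIAERESIS = "\u0308"
CGJ = "\u034f"  # COMBINING GRAPHEME JOINER

def mark_umlaut(base):
    """Encode base + umlaut as base followed by U+0308."""
    return base + COMBINING_DIAERESIS

def mark_diaeresis(base):
    """Encode base + diaeresis as base, CGJ, then U+0308."""
    return base + CGJ + COMBINING_DIAERESIS

umlaut = unicodedata.normalize("NFC", mark_umlaut("a"))
diaeresis = unicodedata.normalize("NFC", mark_diaeresis("a"))

# The umlaut sequence composes to the single code point U+00E4, while the
# CGJ keeps the diaeresis sequence from composing, so the two stay distinct.
print(len(umlaut), len(diaeresis))
print(umlaut == diaeresis)
```

Under these assumptions the distinction survives NFC normalization, because the joiner sits between the base letter and the combining mark and blocks their composition.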
Since version 3.2.0, Unicode also provides U+0364 (COMBINING LATIN SMALL LETTER E), which can produce the older umlaut typography.
Unicode provides a combining double dot below as U+0324 (COMBINING DIAERESIS BELOW).
Finally, for use with the N'Ko script, there is U+07F3 (NKO COMBINING DOUBLE DOT ABOVE).
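The properties of these combining marks can be looked up in the Unicode Character Database through Python's `unicodedata` module. This is a rough sketch: the list of code points is an assumption based on the marks discussed in this section, and `mark_info` is an illustrative helper name.

```python
import unicodedata

# Code points assumed here: diaeresis above, diaeresis below, the small
# letter e above, the grapheme joiner, and the N'Ko double dot above.
MARKS = ["\u0308", "\u0324", "\u0364", "\u034f", "\u07f3"]

def mark_info(ch):
    """Return (code point, name, canonical combining class) for a character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.combining(ch))

for mark in MARKS:
    print(mark_info(mark))
```

The combining class separates marks placed above the base letter (230) from those placed below (220), while the grapheme joiner has class 0.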
In ASCII, ISO/IEC 646 and ISO 8859
ASCII, a seven-bit code with just 95 "printable" characters, has no provision for any kind of dot diacritic. Subsequent standardisation treated ASCII as the US national variant of ISO/IEC 646: the French, German and other national variants reassigned a few code points to specific vowels with diacritics, as precomposed characters.
The subsequent (eight-bit) ISO 8859-1 character encoding includes the letters ''ä'', ''ë'', ''ï'', ''ö'', ''ü'', and their respective capital forms, as well as ''ÿ'' in lower case only, with ''Ÿ'' added in the revised edition ISO 8859-15 and Windows-1252.
These standards are technically obsolete, having been replaced by Unicode.
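The coverage of these legacy encodings can be checked with Python's built-in codecs. In this minimal sketch, `try_encode` is a hypothetical helper, and the codec names `ascii`, `latin-1`, `iso8859-15` and `cp1252` are the standard Python aliases for ASCII, ISO 8859-1, ISO 8859-15 and Windows-1252.

```python
def try_encode(ch, codec):
    """Return the encoded bytes, or None if the codec lacks the character."""
    try:
        return ch.encode(codec)
    except UnicodeEncodeError:
        return None

CODECS = ["ascii", "latin-1", "iso8859-15", "cp1252"]

# ä, lower-case ÿ and capital Ÿ, written as escapes for clarity.
for ch in ("\u00e4", "\u00ff", "\u0178"):
    print(f"U+{ord(ch):04X}", {codec: try_encode(ch, codec) for codec in CODECS})
```

ASCII rejects all three letters, ISO 8859-1 accepts ä and ÿ but not Ÿ, and the two later encodings place Ÿ at different byte values.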
Computer usage
Keyboard input
If letters with double dots are not present on the keyboard (or if they are not recognized by the operating system), there are a number of ways to input them into a computer system.
Apple MacOS, iOS
iOS provides accented letters through press-and-hold on most European Latin-script keyboards, including English. Some keyboard layouts feature combining-accent keys that can add accents to any appropriate letter. A letter with double dots can be produced by pressing , then the letter. This works on English and other keyboards and is documented further in the supplied manuals.
Google ChromeOS
For ChromeOS with US-International keyboard setting, the combination is . For ChromeOS with UK extended setting, use , release, then the letter. Alternatively, the Unicode codepoint may be entered directly, using , release, then the four-digit code, then or .
Linux
In some Linux desktop environments a letter with double dots can be produced by pressing , then the letter. When the system has a compose key, the same procedure as that described at X-Windows (below) may be used.
Microsoft Windows
AZERTY and QZERTY keyboards (as used in much of Europe) include precomposed characters (accented letters) as standard, and these are fully supported by Microsoft Windows, typically accessed using the AltGr key.
For users with a US keyboard layout, Windows includes a setting, "US International", which supports creation of accented letters by changing the function of some keys into dead keys. If the user enters ", nothing will appear on screen until the user types another character, after which the characters will be merged if possible, or added independently at once if not. Otherwise, the desired character may be generated using the Alt table above.
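The dead-key behaviour described above can be illustrated with a minimal Python sketch. The function name `dead_key_quote` and the set of letters that merge are assumptions made for illustration; the real set depends on the keyboard layout, and this is not how Windows itself is implemented.

```python
import unicodedata

# Assumption: only these base letters merge with the double-dot dead key.
MERGEABLE = "aeiouyAEIOUY"

def dead_key_quote(next_char):
    """Simulate typing the dead key " followed by next_char."""
    if next_char in MERGEABLE:
        # Merge: base letter + COMBINING DIAERESIS, composed into one character.
        return unicodedata.normalize("NFC", next_char + "\u0308")
    # No merge possible: both characters are emitted independently.
    return '"' + next_char

print(dead_key_quote("o"))
print(dead_key_quote("k"))
```

The first call yields a single precomposed letter, while the second falls back to emitting the quotation mark and the letter separately.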
For users in the United Kingdom and Ireland with QWERTY keyboards, Windows has an "Extended" setting such that an accented letter can be created using AltGr-2, then the base letter.
When using Microsoft Word or Outlook, a letter with double dots can be produced by pressing Ctrl-Shift-: and then the letter.
On Microsoft Windows keyboard layouts that do not have double-dotted characters, one may use Windows Alt keycodes. Double dots are then entered by holding down the left Alt key and entering the full decimal value of the character's position in the Windows code page on the numeric keypad, provided that a compatible code page is used as the system code page. One can also use numbers from Code page 850; these are used without a leading 0.
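The two kinds of Alt code can be derived from the code pages themselves. The following Python sketch assumes that the system code page is Windows-1252 and that its values are typed with a leading 0 (the usual Windows convention); the function name `alt_codes` is hypothetical.

```python
def alt_codes(ch):
    """Return (Windows code page Alt code, code page 850 Alt code) for ch."""
    windows = ch.encode("cp1252")[0]  # assumption: cp1252 is the system code page
    oem = ch.encode("cp850")[0]       # code page 850 value, no leading 0
    return f"0{windows}", str(oem)

# Uppercase Y with double dots is omitted because code page 850 lacks it.
for ch in "äëïöüÿÄËÏÖÜ":
    win, oem = alt_codes(ch)
    print(f"{ch}: Alt+{win} or Alt+{oem}")
```

For ä, for example, this gives Alt+0228 from Windows-1252 and Alt+132 from code page 850.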
X Window System
X-based systems with a Compose key set in the system can usually insert characters with double dots by typing Compose, Shift-' (i.e. ") followed by the letter. Other Compose sequences may also work, depending on the system's set-up. In addition, most modern UNIX-like systems accept the sequence Ctrl-Shift-U to initiate the direct input of a Unicode value. Thus, typing Ctrl-Shift-U, 00F6, finishing with Enter or Space, will insert ö into the document.
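The relationship between the typed hexadecimal digits and the resulting character can be checked with a short Python sketch. The function name `from_hex_input` is hypothetical; the sketch only converts a code point and does not emulate the input method itself.

```python
import unicodedata

def from_hex_input(digits):
    """Convert typed hexadecimal digits (e.g. '00F6') to the character they denote."""
    return chr(int(digits, 16))

ch = from_hex_input("00F6")
print(ch, unicodedata.name(ch))
```

The Unicode name reported for this code point is LATIN SMALL LETTER O WITH DIAERESIS.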
Dedicated keys
The German keyboard has dedicated keys for ''ü ö ä''. Scandinavian and Turkish keyboards have dedicated keys for their respective language-specific letters, including ''ö'' for Swedish, Finnish, and Icelandic, and both ''ö'' and ''ü'' for Turkish. French and Belgian AZERTY keyboards have a dead key which adds a circumflex (if without Shift) or a diaeresis/umlaut (if with Shift) to the letter key immediately following (for instance Shift-^ followed by e gives ë).
Other scripts
For non-Latin scripts, Greek and Russian use press-and-hold for double-dot diacritics on only a few characters. The Greek keyboard has dialytica and dialytica–tonos variants for upsilon and iota (ϋ ΰ ϊ ΐ), but not for ε ο α η ω, following modern monotonic usage. Russian keyboards feature separate keys for е and ё.
On-screen keyboards
The early 21st century has seen noticeable growth in stylus- and touch-operated interfaces, making the use of on-screen keyboards operated by pointing devices (mouse, stylus, or finger) more important. These "soft" keyboards may replicate the modifier keys found on hardware keyboards, but they may also employ other means of selecting options from a base key, such as right-click or press-and-hold. Soft keyboards may also have multiple contexts, such as letter, numeric, and symbol.
HTML
In HTML, vowels with double dots can be entered with an entity reference of the form &?uml;, where ? can be any of a, e, i, o, u, y or their majuscule counterparts. With the exception of the uppercase ''Ÿ'', these characters are all available in ISO-8859-1, and many of them appear at the same codepoints in other parts of ISO 8859 (-2, -3, -4, -9, -10, -13, -14, -15, -16) and in Unicode. The uppercase ''Ÿ'' is available in ISO 8859-15 and Unicode, and Unicode provides a number of other letters with double dots as well.
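The entity references and their character-set coverage can be explored with Python's standard `html` module. This is a minimal sketch; the helper names `uml_entity` and `in_charset` are hypothetical, and only ISO-8859-1 and ISO 8859-15 are checked here.

```python
import html

LETTERS = "aeiouyAEIOUY"

def uml_entity(letter):
    """Build the entity reference &?uml; for one base letter."""
    return f"&{letter}uml;"

def in_charset(ch, charset):
    """Report whether a character can be encoded in the named character set."""
    try:
        ch.encode(charset)
        return True
    except UnicodeEncodeError:
        return False

for letter in LETTERS:
    ch = html.unescape(uml_entity(letter))
    coverage = "latin-1" if in_charset(ch, "latin-1") else "not in latin-1"
    print(uml_entity(letter), ch, f"U+{ord(ch):04X}", coverage)
```

Only the uppercase Y with double dots is reported as missing from ISO-8859-1; `in_charset` confirms that it can be encoded in `"iso8859-15"`.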
Note: when replacing umlaut characters with plain ASCII, use ''ae'', ''oe'', etc. for the German language, and the simple character replacements for all other languages.
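The replacement rule in the note above can be sketched in Python. The function name `strip_double_dots` and the `german` flag are hypothetical. The sketch only handles the double-dot mark (U+0308), so other non-ASCII characters such as ß pass through unchanged.

```python
import unicodedata

# German convention from the note above: the umlaut becomes a following "e".
GERMAN = {"ä": "ae", "ö": "oe", "ü": "ue", "Ä": "Ae", "Ö": "Oe", "Ü": "Ue"}

def strip_double_dots(text, german=False):
    """Replace double-dotted letters using German digraphs or plain base letters."""
    if german:
        text = "".join(GERMAN.get(c, c) for c in text)
    # Decompose, drop COMBINING DIAERESIS, then recompose whatever remains.
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(c for c in decomposed if c != "\u0308")
    return unicodedata.normalize("NFC", stripped)

print(strip_double_dots("Müller", german=True))
print(strip_double_dots("Brontë"))
```

With `german=True` the name Müller becomes Mueller, whereas Brontë is reduced to Bronte by simple replacement.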
TeX and LaTeX
TeX (and its derivatives, most notably LaTeX) also allows double dots to be placed over letters. The standard way is to use the control sequence \" followed by the relevant letter, e.g. \"u. It is good practice to set the sequence off with curly braces: {\"u} or \"{u}.
TeX's "German" package can be used: it adds the " control sequence (without the backslash) to produce the umlaut. However, this can cause conflicts if the main language of the document is not German. Since the integration of Unicode through the development of XeTeX and XeLaTeX, it is also possible to input the Unicode character directly into the document, using one of the recognized methods such as Compose key or direct Unicode input.
TeX's traditional control sequences can still be used and will produce the same output (in very early versions of TeX these sequences would produce double dots that were too far above the letter's body).
All these methods can be used with all available font variations (italic, bold etc.).
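For documents that must stay in plain ASCII, the traditional control sequences can be generated from Unicode text. The following Python sketch uses a hypothetical function name, `to_tex`, and converts only single letters that decompose into a base letter plus the double-dot mark. It is an illustration under those assumptions, not a complete converter.

```python
import unicodedata

def to_tex(text):
    """Rewrite double-dotted letters as TeX control sequences of the form \\"{x}."""
    out = []
    for ch in text:
        decomposed = unicodedata.normalize("NFD", ch)
        if len(decomposed) == 2 and decomposed[1] == "\u0308":
            # Base letter plus COMBINING DIAERESIS: emit \"{base}.
            out.append('\\"{' + decomposed[0] + "}")
        else:
            out.append(ch)
    return "".join(out)

print(to_tex("Mötley Crüe"))
```

The braces follow the good-practice form mentioned above, so the output can be placed next to other letters without the control sequence absorbing them.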
See also
* Dot (diacritic)
* Two dots (disambiguation)