Unicode supports several phonetic scripts and notation systems through its existing scripts and the addition of extra blocks with phonetic characters. These phonetic characters are derived from an existing script, usually Latin, Greek or Cyrillic. Apart from the International Phonetic Alphabet (IPA), extensions to the IPA and obsolete and nonstandard IPA symbols, these blocks also contain characters from the Uralic Phonetic Alphabet and the Americanist Phonetic Alphabet.
Phonetic scripts
The International Phonetic Alphabet (IPA), like most phonetic scripts, makes use of letters from other writing systems, notably Latin, Greek and Cyrillic characters. Combining diacritics also add meaning to the phonetic text. Finally, these phonetic alphabets make use of modifier letters, which are specially constructed for phonetic meaning. A "modifier letter" is strictly intended not as an independent grapheme but as a modification of the preceding character, resulting in a distinct grapheme, notably in the context of the IPA. For example, ʰ should not occur on its own but modifies the preceding or following symbol; a letter combined with ʰ is thus a single IPA symbol, distinct from the bare letter. In practice, however, several of these "modifier letters" are also used as full graphemes, e.g. ʿ transliterating Semitic ayin or Hawaiian okina, or ˚ transliterating Abkhaz ә.
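The point that a modifier letter is a separate, spacing character (not a combining mark) can be checked directly. The sketch below uses Python's standard `unicodedata` module, which is our choice for illustration and not something the article prescribes, and assumes the ayin sign shown above is U+02BF MODIFIER LETTER LEFT HALF RING. It shows that an aspirated p is stored as two code points even though IPA treats it as one symbol.

```python
import unicodedata

# An aspirated p is stored as two code points: the base letter followed by
# U+02B0 MODIFIER LETTER SMALL H, a spacing character (general category "Lm"),
# not a combining mark.
aspirated_p = "p\u02b0"

for ch in aspirated_p:
    print(f"U+{ord(ch):04X} {unicodedata.category(ch)} {unicodedata.name(ch)}")

# A modifier letter used as a full grapheme (assumed here to be U+02BF,
# the half ring commonly used to transliterate ayin).
ayin_sign = "\u02bf"
print(f"U+{ord(ayin_sign):04X} {unicodedata.category(ayin_sign)} {unicodedata.name(ayin_sign)}")
```

Because the modifier letter has a canonical combining class of 0, text-processing tools treat it as its own character; keeping the base and modifier together as one IPA symbol is left to the application.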
From IPA to Unicode
Consonants
The following tables indicate the Unicode code point sequences for phonemes as used in the International Phonetic Alphabet. A bold code point indicates that the Unicode chart provides an application note for the character, such as "voiced retroflex lateral". An entry in bold italics indicates that the character name itself refers to a phoneme.
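Code point sequences in this article are written as space-separated hexadecimal values, e.g. the palatal ejective listed as (0063 02BC). A minimal Python sketch for converting between that notation and actual strings follows; the helper names `to_code_points` and `from_code_points` are hypothetical, chosen here for illustration.

```python
def to_code_points(text: str) -> str:
    """Render a string as space-separated hex code points, e.g. '0063 02BC'."""
    # At least four hex digits, matching the article's notation; supplementary
    # code points such as 1DF06 simply come out with five digits.
    return " ".join(f"{ord(ch):04X}" for ch in text)


def from_code_points(seq: str) -> str:
    """Parse a sequence such as '0063 02BC' back into a string."""
    return "".join(chr(int(cp, 16)) for cp in seq.split())


# Palatal ejective: c followed by U+02BC MODIFIER LETTER APOSTROPHE.
ejective = from_code_points("0063 02BC")
print(ejective, to_code_points(ejective))
```

The two functions are inverses for any string, so a sequence copied from a table can be round-tripped to check for typos.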
Vowels
The following figures depict the phonetic vowels and their Unicode/UCS code points, arranged to represent the phonetic vowel trapezium. Vowels appearing in pairs in the figure indicate unrounded and rounded variants respectively: the unrounded vowel comes before the bullet and the rounded vowel after it (unrounded • rounded). Again, characters with Unicode names referring to phonemes are indicated by bold text. Those with explicit application notes are indicated by bold italic text. Those borrowed unchanged from another script (Latin, Greek or Cyrillic) are indicated by italics.
Diacritics
Diacritics may be encoded as either modifier (e.g. ˳) or combining (e.g. ◌̥) characters.
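The difference between the two encodings shows up in the character properties. In the Python sketch below (using the standard `unicodedata` module, and assuming the two examples above are U+02F3 MODIFIER LETTER LOW RING and U+0325 COMBINING RING BELOW), only the combining form has a nonzero canonical combining class. It also builds the voiceless bilabial nasal (006D 0325) listed later in this article.

```python
import unicodedata

modifier = "\u02f3"   # MODIFIER LETTER LOW RING: a spacing character
combining = "\u0325"  # COMBINING RING BELOW: attaches to the preceding base

for ch in (modifier, combining):
    # name, general category, canonical combining class (0 = not combining)
    print(unicodedata.name(ch), unicodedata.category(ch), unicodedata.combining(ch))

# Voiceless bilabial nasal: m + combining ring below. Unicode has no
# precomposed "m with ring below", so NFC leaves both code points in place.
voiceless_m = "m" + combining
print(unicodedata.normalize("NFC", voiceless_m) == voiceless_m)
```

In practice the combining form is usually preferred for IPA, because renderers position it relative to the base letter, whereas the spacing modifier simply occupies its own horizontal slot.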
Unicode blocks
* Basic Latin (0020–007E), IPA example: Palatal approximant (006A)
* Latin-1 Supplement (0080–00FF), IPA example: Voiceless palatal fricative (00E7)
* Latin Extended-A (0100–017F), IPA example: Velar nasal (014B)
* Latin Extended-B (0180–024F), IPA example: Tenuis dental click (01C0 0287)
* IPA Extensions (0250–02AF), IPA example: Voiced retroflex fricative (0290)
* Spacing Modifier Letters (02B0–02FF), IPA example: Palatal ejective (0063 02BC)
* Combining Diacritical Marks (0300–036F), IPA example: Voiceless bilabial nasal (006D 0325)
* Greek and Coptic (0370–03FF), IPA example: Voiceless dental fricative (03B8)
* Combining Diacritical Marks Extended (1AB0–1AFF), extIPA examples: combining parentheses
* Combining Diacritical Marks Supplement (1DC0–1DFF), IPA example: Rising-falling contour tone (1DC8)
* General Punctuation (2000–206F), IPA example: Linking (absence of a break) (203F)
* Superscripts and Subscripts (2070–209F), IPA example: Nasal release (207F)
* Arrows (2190–21FF), IPA example: Global rise (2197)
* Latin Extended-C (2C60–2C7F), IPA example: Labiodental flap (2C71)
* Modifier Tone Letters (A700–A71F), IPA example: Upstep (A71B)
* Phonetic Extensions (1D00–1D7F)
* Phonetic Extensions Supplement (1D80–1DBF)
* Latin Extended-D (A720–A7FF), extIPA example: Voiceless retroflex lateral fricative (A78E)
* Latin Extended-E (AB30–AB6F), IPA example: Voiceless retroflex affricate ligature (AB67)
* Latin Extended-F (10780–107BF)
* Latin Extended-G (1DF00–1DFFF), extIPA example: Voiceless palatal lateral fricative (1DF06)
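Python's standard `unicodedata` module reports names and categories but has no function for a character's block. A minimal sketch of a block lookup follows, assuming only the ranges in the list above (so Basic Latin is 0020–007E as given there, and unlisted blocks such as Cyrillic return `None`); `PHONETIC_BLOCKS` and `block_of` are hypothetical names, not part of any library.

```python
import bisect
from typing import Optional

# (first, last, block name), sorted by first code point. Ranges are copied
# from the list above, not read from the Unicode Character Database.
PHONETIC_BLOCKS = [
    (0x0020, 0x007E, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0100, 0x017F, "Latin Extended-A"),
    (0x0180, 0x024F, "Latin Extended-B"),
    (0x0250, 0x02AF, "IPA Extensions"),
    (0x02B0, 0x02FF, "Spacing Modifier Letters"),
    (0x0300, 0x036F, "Combining Diacritical Marks"),
    (0x0370, 0x03FF, "Greek and Coptic"),
    (0x1AB0, 0x1AFF, "Combining Diacritical Marks Extended"),
    (0x1D00, 0x1D7F, "Phonetic Extensions"),
    (0x1D80, 0x1DBF, "Phonetic Extensions Supplement"),
    (0x1DC0, 0x1DFF, "Combining Diacritical Marks Supplement"),
    (0x2000, 0x206F, "General Punctuation"),
    (0x2070, 0x209F, "Superscripts and Subscripts"),
    (0x2190, 0x21FF, "Arrows"),
    (0x2C60, 0x2C7F, "Latin Extended-C"),
    (0xA700, 0xA71F, "Modifier Tone Letters"),
    (0xA720, 0xA7FF, "Latin Extended-D"),
    (0xAB30, 0xAB6F, "Latin Extended-E"),
    (0x10780, 0x107BF, "Latin Extended-F"),
    (0x1DF00, 0x1DFFF, "Latin Extended-G"),
]
_FIRSTS = [first for first, _, _ in PHONETIC_BLOCKS]


def block_of(cp: int) -> Optional[str]:
    """Return the name of the listed block containing code point cp, or None."""
    i = bisect.bisect_right(_FIRSTS, cp) - 1  # last block starting at or before cp
    if i >= 0 and cp <= PHONETIC_BLOCKS[i][1]:
        return PHONETIC_BLOCKS[i][2]
    return None


for cp in (0x014B, 0x0290, 0xA78E, 0x1DF06):
    print(f"U+{cp:04X}: {block_of(cp)}")
```

A binary search over the block start points keeps each lookup logarithmic; for a table this small a linear scan would work equally well.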
Unicode blocks with many phonetic symbols
Six Unicode blocks contain many phonetic symbols:
IPA Extensions (U+0250–02AF)
Spacing Modifier Letters (U+02B0–02FF)
The characters in the "Spacing Modifier Letters" block are intended to form a unit with the preceding letter (which they "modify"). For example, the aspiration modifier U+02B0 is not intended simply as a superscript ''h'', but as the mark of aspiration placed after the letter being aspirated, as in an aspirated voiceless bilabial plosive. The block contains:
*Latin superscript modifier letters (U+02B0–U+02B8): ʰ aspiration; ʱ breathy voice, murmured; ʲ palatalization; ʳ, ʴ, ʵ, ʶ r-coloring or r-offglides; ʷ labialization; ʸ palatalization (Americanist usage for U+02B2)
*Miscellaneous phonetic modifiers (U+02B9–U+02D7): ʹ ʺ ʻ ʼ ʽ ʾ ʿ ˀ ˁ ˂ ˃ ˄ ˅ ˆ ˇ ˈ ˉ ˊ ˋ ˌ ˍ ˎ ˏ ː ˑ ˒ ˓ ˔ ˕ ˖ ˗
*Spacing clones of diacritics (U+02D8–U+02DD): ˘ breve; ˙ dot above; ˚ ring above; ˛ ogonek; ˜ small tilde; ˝ double acute accent
*Additions based on 1989 IPA (U+02DE–U+02E4): ˞ ˟ ˠ ˡ ˢ ˣ ˤ
*Tone letters (U+02E5–U+02E9): ˥ ˦ ˧ ˨ ˩
*Extended Bopomofo tone marks: ;
*IPA modifiers: , unaspirated
*Other modifier letters: for Nenets
*Uralic Phonetic Alphabet (UPA) modifiers (U+02EF–U+02FF): ˯ ˰ ˱ ˲ ˳ ˴ ˵ ˶ ˷ ˸ ˹ ˺ ˻ ˼ ˽ ˾ ˿
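To inspect a sub-range of the block, such as the superscript letters or the tone letters above, the character names can be listed programmatically. The sketch below is a hypothetical helper (`describe_range` is our own name) built on Python's standard `unicodedata` module; the names it prints come from the Unicode database bundled with the running Python version.

```python
import unicodedata


def describe_range(start: int, end: int) -> list[str]:
    """Return 'U+XXXX <char> NAME' for each named code point in [start, end]."""
    lines = []
    for cp in range(start, end + 1):
        ch = chr(cp)
        name = unicodedata.name(ch, "")
        if name:  # skip code points with no name in this Python's database
            lines.append(f"U+{cp:04X} {ch} {name}")
    return lines


# Latin superscript modifier letters, then the five tone letters.
for line in describe_range(0x02B0, 0x02B8) + describe_range(0x02E5, 0x02E9):
    print(line)
```

A contour tone such as a high-to-low fall is written as a sequence of these tone letters, so each bar remains a separate code point.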
Phonetic Extensions (U+1D00–1D7F)
This block, together with Phonetic Extensions Supplement below, contains:
* Small capitals "ɢ ɪ ɴ ɶ ʀ ʏ ʙ ʜ ʟ"
* Turned small letters "ɐ ɥ ɯ ɹ ɺ ɻ ʇ ʌ ʍ ʎ ʞ ʮ ʯ"
* Extra small capitals "ʁ ʛ ᴀ ᴁ ᴃ ᴄ ᴅ ᴆ ᴇ ᴊ ᴋ ᴌ ᴍ ᴎ ᴏ ᴐ ᴘ ᴙ ᴚ ᴛ ᴜ ᴠ ᴡ ᴢ ᴣ ᴦ ᴧ ᴨ ᴩ ᴪ"
* Letters with palatal hooks "ƫ ᶀ ᶁ ᶂ ᶃ ᶄ ᶅ ᶆ ᶇ ᶈ ᶉ ᶊ ᶋ ᶌ ᶍ ᶎ ᶪ ᶵ"
* Letters with retroflex hooks "ᶏ ᶐ ᶒ ᶓ ᶔ ᶕ ᶖ ᶗ ᶘ ᶙ ᶚ ᶩ ᶯ ᶼ"
Phonetic Extensions Supplement (U+1D80–1DBF)
Modifier Tone Letters (U+A700–A71F)
Superscripts and Subscripts (U+2070–209F)
Font support for IPA
Input by selection from a screen

Many systems provide a way to select Unicode characters visually.
ISO/IEC 14755 refers to this as a ''screen-selection entry method''.
Microsoft Windows has provided a Unicode version of the Character Map program (open the Run dialog, type charmap and press Enter) since Windows NT 4.0; it has appeared in consumer editions since Windows XP. This is limited to characters in the Basic Multilingual Plane (BMP). Characters are searchable by Unicode character name, and the table can be limited to a particular code block. More advanced third-party tools of the same type are also available (a notable freeware example is BabelMap).
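Searching by character name, as Character Map does, can also be reproduced in a few lines of code. The Python sketch below is an assumption-laden stand-in, not how any of the tools above work internally: `search_by_name` is a hypothetical helper that scans a code point range (by default the BMP, mirroring Character Map's limit) with the standard `unicodedata` module and returns characters whose name contains the query.

```python
import unicodedata


def search_by_name(query: str, start: int = 0x0000, end: int = 0xFFFF) -> list[str]:
    """Return characters in [start, end] whose Unicode name contains query.

    Matching is case-insensitive; unnamed code points are skipped.
    """
    query = query.upper()
    hits = []
    for cp in range(start, end + 1):
        name = unicodedata.name(chr(cp), "")
        if query in name:
            hits.append(chr(cp))
    return hits


# Restrict the search to the IPA Extensions block (0250–02AF).
print(search_by_name("retroflex hook", 0x0250, 0x02AF))
```

When the exact name is already known, the standard `unicodedata.lookup` function returns the character directly, e.g. `unicodedata.lookup("LATIN SMALL LETTER ENG")`.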
macOS provides a "character palette" with much the same functionality, along with searching by related characters, glyph tables in a font, etc. It can be enabled in the input menu in the menu bar under System Preferences → International → Input Menu (or System Preferences → Language and Text → Input Sources) or can be viewed under Edit → Emoji & Symbols in many programs.
Equivalent tools, such as gucharmap (GNOME) or kcharselect (KDE), exist on most Linux desktop environments.
See also
* Unicode symbols
* Universal Character Set characters
* Latin script in Unicode
* IPA
References
External links
Links to PDFs of Unicode codes for several phonetic symbol sets