Unicode
Unicode, formally The Unicode Standard,The formal version reference is is an information technology standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems. The standard, ...
has
subscripted and superscripted versions of a number of characters including a full set of
Arabic numerals.
These characters allow any
polynomial,
chemical and certain other
equations to be represented in plain text without using any form of
markup
Markup or mark-up can refer to:
* Markup language, a standardized set of notations used to annotate a plain-text document's content to give information regarding the structure of the text or instructions for how it is to be displayed
** Lightweigh ...
like
HTML
The HyperText Markup Language or HTML is the standard markup language for documents designed to be displayed in a web browser. It can be assisted by technologies such as Cascading Style Sheets (CSS) and scripting languages such as JavaScri ...
or
TeX.
The
World Wide Web Consortium and the
Unicode Consortium have made recommendations on the choice between using markup and using superscript and subscript characters:
When used in mathematical context ( MathML) it is recommended to consistently use style markup for superscripts and subscripts.... However, when super and sub-scripts are to reflect semantic distinctions, it is easier to work with these meanings encoded in text rather than markup, for example, in phonetic
Phonetics is a branch of linguistics that studies how humans produce and perceive sounds, or in the case of sign languages, the equivalent aspects of sign. Linguists who specialize in studying the physical properties of speech are phoneticians. ...
or phonemic transcription.
Uses
The ''intended'' use
[ when these characters were added to Unicode was to allow chemical and algebra formulas and phonetics to be written without markup, but produce true superscripts and subscripts. Thus "Hâ‚‚O" (using a subscript character) is ''supposed'' to be identical to "H2O" (with subscript markup).
In reality most fonts that include these characters ignore the Unicode definition, and design the digits for mathematical numerator and denominator glyphs, which are smaller than normal characters but are aligned with the ]cap line
In typography, cap height is the height of a capital letter above the baseline for a particular typeface.http://pfaedit.sourceforge.net/glossary.html Glossary of (some) Typographic Terms It specifically is the height of capital letters that are fl ...
and the baseline, respectively. When used with the solidus, these glyphs are useful for making arbitrary diagonal fractions (similar to the ½ glyph). Making fractions using existing software super/subscripts requires many characters and does not look like the rendered fraction (example: 1/2), so font designers provided this alternative. This also makes the superscript letters useful for ordinal indicators, more closely matching the ª and º characters. However it makes them incorrect for normal super and subscripts, and formulas are rendered correctly by using markup rather than these characters.
Unicode intended to produce diagonal fractions through a different mechanism but it is very poorly supported. The fraction slash U+2044 is visually similar to the solidus, but when used with the ordinary digits (not the superscripts and subscripts) is intended to tell a layout system that a fraction such as ¾ should be rendered using automatic glyph substitution[For a general overview and technical information on glyph substitution (though not specifically for fractions)]
GSUB — Glyph Substitution Table
in th
on th
Microsoft Typography site
for the digits. Some browsers support this[Such a]
on Windows
Firefox
/ref> but not in all fonts. A selection of fonts is shown in the below table.
Superscripts and subscripts block
The most common superscript digits (1, 2, and 3) were in ISO-8859-1 and were therefore carried over into those positions in the Latin-1 range of Unicode. The rest were placed in a dedicated section of Unicode at to U+209F. The two tables below show these characters. Each superscript or subscript character is preceded by a normal ''x'' to show the subscripting/superscripting. The table on the left contains the actual Unicode characters; the one on the right contains the equivalents using HTML
The HyperText Markup Language or HTML is the standard markup language for documents designed to be displayed in a web browser. It can be assisted by technologies such as Cascading Style Sheets (CSS) and scripting languages such as JavaScri ...
markup for the subscript or superscript.
Other superscript and subscript characters
Unicode version 15.0 also includes subscript and superscript characters that are intended for semantic usage, in the following blocks:
;Superscript
* The Latin-1 Supplement block contains the feminine and masculine ordinal indicators ª and º.
* The Latin Extended-C block contains one additional superscript, â±½.
* The Latin Extended-D block contains six superscripts: ê° êŸ² ꟳ ꟴ ꟸ ꟹ.
* The Latin Extended-E block contains five superscripts: êœ ê êž êŸ ê©.
* The Latin Extended-F block is entirely superscript IPA letters
The International Phonetic Alphabet (IPA) is an alphabetic system of phonetic notation based primarily on the Latin script. It was devised by the International Phonetic Association in the late 19th century as a standardized representation ...
: ðž 𞂠𞃠𞄠𞅠𞇠𞈠𞉠𞊠𞋠𞌠ðž 𞎠ðž ðž 𞑠𞒠𞓠𞔠𞕠𞖠𞗠𞘠𞙠𞚠𞛠𞜠ðž 𞞠𞟠ðž 𞡠𞢠𞣠𞤠𞥠𞦠𞧠𞨠𞩠𞪠𞫠𞬠ðž 𞮠𞯠𞰠𞲠𞳠𞴠𞵠𞶠𞷠𞸠𞹠ðžº.
* The Spacing Modifier Letters block has superscripted letters and symbols used for phonetic transcription: ʰ ʱ ʲ ʳ Ê´ ʵ ʶ Ê· ʸ Ë€ Ë Ë Ë¡ Ë¢ Ë£ ˤ.
* The Phonetic Extensions block has several superscripted letters and symbols: Latin/IPA ᴬ ᴠᴮ ᴯ ᴰ ᴱ ᴲ ᴳ ᴴ ᴵ ᴶ ᴷ ᴸ ᴹ ᴺ ᴻ ᴼ ᴽ ᴾ ᴿ ᵀ ᵠᵂ ᵃ ᵄ ᵅ ᵆ ᵇ ᵈ ᵉ ᵊ ᵋ ᵌ ᵠᵠᵠᵑ ᵒ ᵓ ᵖ ᵗ ᵘ ᵚ ᵛ, Greek ᵠᵞ ᵟ ᵠ, Cyrillic ᵸ, other ᵎ ᵔ ᵕ ᵙ ᵜ. These are intended to indicate secondary articulation
In phonetics, secondary articulation occurs when the articulation of a consonant is equivalent to the combined articulations of two or three simpler consonants, at least one of which is an approximant. The secondary articulation of such co-artic ...
.
* The Phonetic Extensions Supplement block has several more: Latin/IPA ᶛ ᶜ ᶠᶞ ᶟ ᶠᶡ ᶢ ᶣ ᶤ ᶥ ᶦ ᶧ ᶨ ᶩ ᶪ ᶫ ᶬ ᶠᶮ ᶯ ᶰ ᶱ ᶲ ᶳ ᶴ ᶵ ᶶ ᶷ ᶸ ᶹ ᶺ ᶻ ᶼ ᶽ ᶾ, Greek ᶿ.
* The IPA Extensions block has several Latin superscripts: 𞄠𞒠𞖠𞪠ðž².
* The Cyrillic Extended-B
Cyrillic Extended-B is a Unicode block
A Unicode block is one of several contiguous ranges of numeric character codes ( code points) of the Unicode character set that are defined by the Unicode Consortium for administrative and documentation pu ...
block contains two Cyrillic superscripts: êšœ êš.
* The Cyrillic Extended-D block contains many Cyrillic superscripts: 𞀰 𞀱 𞀲 𞀳 𞀷 𞀵 𞀶 𞀷 𞀸 𞀹 𞀺 𞀻 𞀼 𞀽 𞀾 𞀿 𞀠ðž 𞂠𞃠𞄠𞅠𞆠𞇠𞈠𞉠𞊠𞋠𞌠ðž 𞎠ðž ðž 𞫠𞬠ðž.
* The Georgian block contains one superscripted Mkhedruli letter: ჼ.
* The Kanbun block has superscripted annotation characters used in Japanese copies of Classical Chinese
Classical Chinese, also known as Literary Chinese (夿–‡ ''gÇ”wén'' "ancient text", or 文言 ''wényán'' "text speak", meaning
"literary language/speech"; modern vernacular: 文言文 ''wényánwén'' "text speak text", meaning
"literar ...
texts: ㆒ ㆓ ㆔ ㆕ ㆖ ㆗ ㆘ ㆙ ㆚ ㆛ ㆜ ㆠ㆞ ㆟.
* The Tifinagh block has one superscript letter : ⵯ.
* The Unified Canadian Aboriginal Syllabics
Canadian syllabic writing, or simply syllabics, is a family of writing systems used in a number of Indigenous Canadian languages of the Algonquian, Inuit, and (formerly) Athabaskan language families. These languages had no formal writing sy ...
and its Extended
Extension, extend or extended may refer to:
Mathematics
Logic or set theory
* Axiom of extensionality
* Extensible cardinal
* Extension (model theory)
* Extension (predicate logic), the set of tuples of values that satisfy the predicate
* Exte ...
blocks contain several mostly consonant
In articulatory phonetics, a consonant is a speech sound that is articulated with complete or partial closure of the vocal tract. Examples are and pronounced with the lips; and pronounced with the front of the tongue; and pronounced ...
-only letters to indicate syllable coda
A syllable is a unit of organization for a sequence of Phone (phonetics), speech sounds typically made up of a syllable nucleus (most often a vowel) with optional initial and final margins (typically, consonants). Syllables are often considered t ...
called Finals, along with some characters that indicate syllable medial
A syllable is a unit of organization for a sequence of speech sounds typically made up of a syllable nucleus (most often a vowel) with optional initial and final margins (typically, consonants). Syllables are often considered the phonological "bu ...
known as Medials: Main block ᜠá ហ០á ᡠᢠᣠᤠᥠᦠ᧠ᨠ᩠᪠ᑉ ᑊ á‘‹ á’ƒ á’„ á’¡ á’¢ á’» á’¼ á’½ á’¾ á“ á“‘ á“’ ᓪ á“« á”… ᔆ ᔇ ᔈ ᔉ ᔊ ᔋ ᔥ ᔾ ᔿ á•€ á• á• á•‘ ᕠᕪ á•» ᕯ ᕽ á–… á–• á–– á–Ÿ á–¦ á–® á—® ᘠᙆ ᙇ ᙚ ᙾ ᙿ; Extended block: ᣔ ᣕ ᣖ ᣗ ᣘ ᣙ ᣚ ᣛ ᣜ ᣠᣞ ᣟ ᣳ ᣴ ᣵ.
;Combining superscript
* The Combining Diacritical Marks block contains medieval superscript letter diacritics. These letters are written directly above other letters appearing in medieval Germanic manuscripts, and so these glyphs do not include spacing, for example uͤ. They are shown here over the dotted circle placeholder â—Œ: ◌ͣ ◌ͤ ◌ͥ ◌ͦ ◌ͧ ◌ͨ ◌ͩ ◌ͪ ◌ͫ ◌ͬ â—ŒÍ â—ŒÍ® ◌ͯ.
* The Combining Diacritical Marks Extended block contains two combining letters for linguistic transcriptions of Scots
Scots usually refers to something of, from, or related to Scotland, including:
* Scots language, a language of the West Germanic language family native to Scotland
* Scots people, a nation and ethnic group native to Scotland
* Scoti, a Latin na ...
(◌ᪿ ◌ᫀ) and three combining insular letters for the Middle English Ormulum (◌ᫌ ◌᫠◌ᫎ).
* The Combining Diacritical Marks Supplement block contains additional medieval superscript letter diacritics, enough to complete the basic lowercase Latin alphabet except for j, q and y, a few small capitals and ligatures (ae, ao, av), and additional letters: ◌᷒ ◌ᷓ ◌ᷔ ◌ᷕ ◌ᷖ ◌ᷗ ◌ᷘ ◌ᷙ ◌ᷚ ◌ᷛ ◌ᷜ ◌ᷠ◌ᷞ ◌ᷟ ◌ᷠ◌ᷡ ◌ᷢ ◌ᷣ ◌ᷤ ◌ᷥ ◌ᷦ ◌ᷧ ◌ᷨ ◌ᷪ ◌ᷫ ◌ᷬ ◌ᷠ◌ᷮ ◌ᷯ ◌ᷰ ◌ᷱ ◌ᷲ ◌ᷳ ◌ᷴ, Greek ◌ᷩ.
* The Cyrillic Extended-A
Cyrillic Extended-A is a Unicode block containing combining Cyrillic
, bg, кирилица , mk, кирилица , russian: кириллица , sr, ћирилица, uk, кирилицÑ
, fam1 = Egyptian hieroglyphs
, fam2 ...
and -B blocks contains multiple medieval superscript letter diacritics, enough to complete the basic lowercase Cyrillic alphabet used in Church Slavonic texts, also includes an additional ligature (ÑÑ‚): ◌ⷠ◌ⷡ ◌ⷢ ◌ⷣ ◌ⷤ ◌ⷥ ◌ⷦ ◌ⷧ ◌ⷨ ◌ⷩ ◌ⷪ ◌ⷫ ◌ⷬ ◌ⷠ◌ⷮ ◌ⷯ ◌ⷰ ◌ⷱ ◌ⷲ ◌ⷳ ◌ⷴ ◌ⷵ ◌ⷶ ◌ⷷ ◌ⷸ ◌ⷹ ◌ⷺ ◌ⷻ ◌ⷼ ◌ⷽ ◌ⷾ ◌ⷿ ◌ꙴ ◌ꙵ ◌ꙶ ◌ꙷ ◌ꙸ ◌ꙹ ◌ꙺ ◌ꙻ ◌ꚞ ◌ꚟ.
;Subscript
* The Latin Extended-C block contains one additional subscript, â±¼.
* The Phonetic Extensions block has several subscripted letters and symbols: Latin/IPA ᵢ ᵣ ᵤ ᵥ and Greek ᵦ ᵧ ᵨ ᵩ ᵪ.
* The Cyrillic Extended-D block also contains many Cyrillic subscripts: 𞑠𞒠𞓠𞔠𞕠𞖠𞗠𞘠𞙠𞚠𞛠𞜠ðž 𞞠𞟠ðž 𞡠𞢠𞣠𞤠𞥠𞦠𞧠𞨠𞩠ðžª.
;Combining subscript
* The Combining Diacritical Marks Supplement block contains a combining subscript: ◌᷊.
Latin, Greek and Cyrillic tables
Consolidated, the Unicode standard contains superscript and subscript versions of a subset of Latin, Greek and Cyrillic letters. Here they are arranged in alphabetical order for comparison (or for copy and paste convenience). Since these characters appear in different Unicode ranges, they may not appear to be the same size or position due to font substitution in the browser. Shaded cells mark small capitals that are not very distinct from minuscules, and Greek letters that are indistinguishable from Latin, and so would not be expected to be supported by Unicode.
Many of these characters were added to Unicode 15, in the Cyrillic Extended-D block, and published in 2022.
See also small caps in Unicode.
Composite characters
Primarily for compatibility with earlier character sets, Unicode contains a number of characters that compose super- and subscripts with other symbols. In most fonts these render much better than attempts to construct these symbols from the above characters or by using markup.
* The Latin-1 Supplement block contains the precomposed fractions ½, ¼, and ¾. The copyright
A copyright is a type of intellectual property that gives its owner the exclusive right to copy, distribute, adapt, display, and perform a creative work, usually for a limited time. The creative work may be in a literary, artistic, education ...
© and registered trademark sign
The registered trademark symbol, , is a typographic symbol that provides notice that the preceding word or symbol is a trademark or service mark that has been registered with a national trademark office. A trademark is a symbol, word, or word ...
s ® are also in this block.
* The General Punctuation block contains the permille sign ‰ and the per-ten-thousand sign ‱, and Basic Latin has the percent sign %.
* The Number Forms
Number Forms is a Unicode block containing Unicode compatibility characters that have specific meaning as numbers, but are constructed from other characters. They consist primarily of vulgar fractions and Roman numerals. In addition to the cha ...
block contains several precomposed fractions: â… â…‘ â…’ â…“ â…” â…• â…– â…— â…˜ â…™ â…š â…› â…œ â… â…ž â…Ÿ ↉.
* The Letterlike Symbols block contains a few symbols composed of subscript and superscript characters: â„€ â„ â„… ℆ â„– â„ â„¢ â….
* The Enclosed Alphanumeric Supplement block contains three superscript abbreviations 🅪 🅫 🅬: MC for (trademark
A trademark (also written trade mark or trade-mark) is a type of intellectual property consisting of a recognizable sign, design, or expression that identifies products or services from a particular source and distinguishes them from oth ...
), MD for ( registered trademark), both used in Canada; MR for (registered trademark) in Spanish and Portuguese speaking countries.
* The Miscellaneous Technical block has one additional subscript, a subscript 10 (â¨), for the purpose of scientific notation.
* The Unified Canadian Aboriginal Syllabics
Canadian syllabic writing, or simply syllabics, is a family of writing systems used in a number of Indigenous Canadian languages of the Algonquian, Inuit, and (formerly) Athabaskan language families. These languages had no formal writing sy ...
and its Extended
Extension, extend or extended may refer to:
Mathematics
Logic or set theory
* Axiom of extensionality
* Extensible cardinal
* Extension (model theory)
* Extension (predicate logic), the set of tuples of values that satisfy the predicate
* Exte ...
blocks contain several letters composed with superscripted letters to indicate extended sound values: Main block Ⴀ᫠ᬠá á® á° á‘ á‘§ ᑨ á‘© ᑪ ᑬ á’… á’† á’‡ á’ˆ á’Š á’¤ á“ á“” á“® ᔌ ᔠᔎ á” á”§ á•… á•” á•¿ á–€ á– á–‚ á–ƒ á–„ á–Ž á– á– á–‘ á–’ á–“ á–” ᙯ á™° á™± ᙲ ᙳ á™´ ᙵ á™¶, Extended block ᢰ ᢱ ᢲ ᢳ ᢴ ᢵ ᢶ ᢷ ᢸ ᢹ ᢺ ᢻ ᢼ ᢽ ᢾ ᢿ ᣀ ᣠᣂ ᣃ ᣄ ᣅ.
Notes
References
{{DEFAULTSORT:Subscripts and superscripts, Unicode
Unicode