In the Unicode standard, a plane is a continuous group of 65,536 (2¹⁶) code points. There are 17 planes, identified by the numbers 0 to 16, which correspond to the possible values 00–10₁₆ of the first two positions in the six-position hexadecimal format (U+''hhhhhh''). Plane 0 is the Basic Multilingual Plane (BMP), which contains most commonly used characters. The higher planes 1 through 16 are called "supplementary planes". The last code point in Unicode is the last code point in plane 16, U+10FFFF. As of Unicode version , five of the planes have assigned code points (characters), and seven are named.
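Because every plane spans exactly 2¹⁶ code points, the plane number is the code point with its low 16 bits dropped. The following Python sketch illustrates this; the function names `plane_of` and `plane_bounds` are illustrative and not part of any standard API.

```python
PLANE_SIZE = 1 << 16          # 65,536 code points per plane
MAX_CODE_POINT = 0x10FFFF     # last code point of plane 16


def plane_of(code_point: int) -> int:
    """Return the plane number (0 to 16) that contains code_point."""
    if not 0 <= code_point <= MAX_CODE_POINT:
        raise ValueError(f"not a Unicode code point: {code_point:#x}")
    # Dropping the low 16 bits leaves the plane number, i.e. the first
    # two hex digits of the six-digit form U+hhhhhh.
    return code_point >> 16


def plane_bounds(plane: int) -> tuple[int, int]:
    """Return the first and last code point of the given plane."""
    if not 0 <= plane <= 16:
        raise ValueError(f"not a Unicode plane: {plane}")
    first = plane * PLANE_SIZE
    return first, first + PLANE_SIZE - 1
```

For example, `plane_of(0x1F600)` evaluates to 1 (a supplementary plane), and `plane_bounds(16)` ends at U+10FFFF.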
The limit of 17 planes is due to UTF-16, which can encode 2²⁰ code points (16 planes) as pairs of 16-bit words, plus the BMP as a single word. UTF-8 was designed with a much larger limit of 2³¹ (2,147,483,648) code points (32,768 planes), and would still be able to encode 2²¹ (2,097,152) code points (32 planes) even under the current limit of 4 bytes.
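The arithmetic behind these limits can be checked in a few lines of Python. This is a sketch; the payload-bit counts for UTF-8 (3 + 6 + 6 + 6 bits in a 4-byte sequence, 1 + 5 × 6 bits in the original 6-byte form) are background facts about the encoding rather than figures stated in this article.

```python
PLANE_SIZE = 1 << 16  # 65,536 code points

# UTF-16: 1,024 high surrogates x 1,024 low surrogates give 2**20 pairs,
# and the BMP adds another 2**16 code points as single 16-bit words.
high_surrogates = 0xDBFF - 0xD800 + 1
low_surrogates = 0xDFFF - 0xDC00 + 1
utf16_pairs = high_surrogates * low_surrogates
utf16_total = utf16_pairs + PLANE_SIZE
utf16_planes = utf16_total // PLANE_SIZE

# UTF-8: payload bits available in the longest sequence.
utf8_4byte_bits = 3 + 3 * 6   # lead byte 11110xxx + three 10xxxxxx bytes
utf8_6byte_bits = 1 + 5 * 6   # lead byte 1111110x + five 10xxxxxx bytes
utf8_4byte_planes = (1 << utf8_4byte_bits) // PLANE_SIZE
utf8_6byte_planes = (1 << utf8_6byte_bits) // PLANE_SIZE

print(utf16_pairs, utf16_total, utf16_planes)
print(utf8_4byte_planes, utf8_6byte_planes)
```

The 16 planes reachable through surrogate pairs plus the BMP give the 17 planes of the Unicode code space.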
The 17 planes can accommodate 1,114,112 code points. Of these, 2,048 are surrogates (used to make the pairs in UTF-16), 66 are non-characters, and 137,468 are reserved for private use, leaving 974,530 for public assignment.
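The totals above can be reproduced from the underlying ranges. In the sketch below, the breakdown of the 66 non-characters (U+FDD0–U+FDEF plus the last two code points of every plane) and of the private-use count (U+E000–U+F8FF plus planes 15 and 16 minus their two non-characters each) comes from the Unicode Standard, not from this article, and the variable names are illustrative.

```python
PLANE_SIZE = 1 << 16
PLANES = 17

total = PLANES * PLANE_SIZE

# Surrogates: U+D800 through U+DFFF.
surrogates = 0xDFFF - 0xD800 + 1

# Non-characters: U+FDD0..U+FDEF, plus U+xxFFFE and U+xxFFFF in each plane.
noncharacters = (0xFDEF - 0xFDD0 + 1) + 2 * PLANES

# Private use: U+E000..U+F8FF in the BMP, plus planes 15 and 16
# except for the two non-characters at the end of each.
private_use = (0xF8FF - 0xE000 + 1) + 2 * (PLANE_SIZE - 2)

public = total - surrogates - noncharacters - private_use
print(total, surrogates, noncharacters, private_use, public)
```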
Planes are further subdivided into Unicode blocks, which, unlike planes, do not have a fixed size. The 327 blocks defined in Unicode cover 26% of the possible code point space, and range in size from a minimum of 16 code points (sixteen blocks) to a maximum of 65,536 code points (Supplementary Private Use Area-A and -B, which constitute the entirety of planes 15 and 16). For future usage, ranges of characters have been tentatively mapped out for most known current and ancient writing systems.
Overview
Assigned characters
Basic Multilingual Plane
The first plane, plane 0, the Basic Multilingual Plane (BMP), contains characters for almost all modern languages, and a large number of symbols. A primary objective for the BMP is to support the unification of prior character sets as well as characters for writing. Most of the assigned code points in the BMP are used to encode Chinese, Japanese, and Korean (CJK) characters.
The High Surrogate (U+D800–U+DBFF) and Low Surrogate (U+DC00–U+DFFF) codes are reserved for encoding non-BMP characters in UTF-16 by using a ''pair'' of 16-bit codes: one High Surrogate and one Low Surrogate. A single surrogate code point will never be assigned a character.
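The pairing works by subtracting 0x10000 from the code point and splitting the remaining 20 bits into two 10-bit halves, one added to the start of each surrogate range. A minimal Python sketch of that mapping follows; the function names are illustrative, and in practice Python's built-in `utf-16` codecs do this work.

```python
def to_surrogates(code_point: int) -> tuple[int, int]:
    """Split a supplementary-plane code point into (high, low) surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points in planes 1 to 16 use surrogate pairs")
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


def from_surrogates(high: int, low: int) -> int:
    """Combine a (high, low) surrogate pair back into a code point."""
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError("not a high surrogate")
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError("not a low surrogate")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
```

For example, U+1F600 maps to the pair (0xD83D, 0xDE00), which matches the bytes produced by `"\U0001F600".encode("utf-16-be")`.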
65,520 of the 65,536 code points in this plane have been allocated to a Unicode block, leaving just 16 code points in a single unallocated range (2FE0..2FEF).
The BMP comprises the following 164 blocks:
* Basic Latin (Lower half of ISO/IEC 8859-1: ISO/IEC 646:1991-IRV aka ASCII) (0000–007F)
* Latin-1 Supplement (Upper half of ISO/IEC 8859-1) (0080–00FF)
* Latin Extended-A (0100–017F)
* Latin Extended-B (0180–024F)
* IPA Extensions (0250–02AF)
* Spacing Modifier Letters (02B0–02FF)
* Combining Diacritical Marks (0300–036F)
* Greek and Coptic (0370–03FF)
* Cyrillic (0400–04FF)
* Cyrillic Supplement (0500–052F)
* Armenian (0530–058F)
* Aramaic scripts:
** Hebrew (0590–05FF)
** Arabic (0600–06FF)
** Syriac (0700–074F)
** Arabic Supplement (0750–077F)
** Thaana (0780–07BF)
** N'Ko (07C0–07FF)
** Samaritan (0800–083F)
** Mandaic (0840–085F)
** Syriac Supplement (0860–086F)
** Arabic Extended-B (0870–089F)
** Arabic Extended-A (08A0–08FF)
* Brahmic scripts:
** Devanagari (0900–097F)
** Bengali (0980–09FF)
** Gurmukhi (0A00–0A7F)
** Gujarati (0A80–0AFF)
** Oriya (0B00–0B7F)
** Tamil (0B80–0BFF)
** Telugu (0C00–0C7F)
** Kannada (0C80–0CFF)
** Malayalam (0D00–0D7F)
** Sinhala (0D80–0DFF)
** Thai (0E00–0E7F)
** Lao (0E80–0EFF)
** Tibetan (0F00–0FFF)
** Myanmar (1000–109F)
* Georgian (10A0–10FF)
* Hangul Jamo (1100–11FF)
* Ethiopic (1200–137F)
* Ethiopic Supplement (1380–139F)
* Cherokee (13A0–13FF)
* Unified Canadian Aboriginal Syllabics (1400–167F)
* Ogham (1680–169F)
* Runic (16A0–16FF)
* Philippine scripts:
** Tagalog (1700–171F)
** Hanunoo (1720–173F)
** Buhid (1740–175F)
** Tagbanwa (1760–177F)
* Khmer (1780–17FF)
* Mongolian (1800–18AF)
* Unified Canadian Aboriginal Syllabics Extended (18B0–18FF)
* Brahmic scripts:
** Limbu (1900–194F)
* Tai scripts:
** Tai Le (1950–197F)
** New Tai Lue (1980–19DF)
** Khmer Symbols (19E0–19FF)
** Buginese (1A00–1A1F)
** Tai Tham (1A20–1AAF)
* Combining Diacritical Marks Extended (1AB0–1AFF)
* Indonesian scripts:
** Balinese (1B00–1B7F)
** Sundanese (1B80–1BBF)
** Batak (1BC0–1BFF)
* Lepcha (1C00–1C4F)
* Ol Chiki (1C50–1C7F)
* Cyrillic Extended-C (1C80–1C8F)
* Georgian Extended (1C90–1CBF)
* Sundanese Supplement (1CC0–1CCF)
* Vedic Extensions (1CD0–1CFF)
* Latin supplements:
** Phonetic Extensions (1D00–1D7F)
** Phonetic Extensions Supplement (1D80–1DBF)
** Combining Diacritical Marks Supplement (1DC0–1DFF)
** Latin Extended Additional (1E00–1EFF)
* Greek Extended (1F00–1FFF)
* Symbols:
** General Punctuation (2000–206F)
** Superscripts and Subscripts (2070–209F)
** Currency Symbols (20A0–20CF)
** Combining Diacritical Marks for Symbols (20D0–20FF)
** Letterlike Symbols (2100–214F)
** Number Forms (2150–218F)
** Arrows (2190–21FF)
** Mathematical Operators (2200–22FF)
** Miscellaneous Technical (2300–23FF)
** Control Pictures (2400–243F)
** Optical Character Recognition (2440–245F)
** Enclosed Alphanumerics (2460–24FF)
** Box Drawing (2500–257F)
** Block Elements (2580–259F)
** Geometric Shapes (25A0–25FF)
** Miscellaneous Symbols (2600–26FF)
** Dingbats (2700–27BF)
** Miscellaneous Mathematical Symbols-A (27C0–27EF)
** Supplemental Arrows-A (27F0–27FF)
** Braille Patterns (2800–28FF)
** Supplemental Arrows-B (2900–297F)
** Miscellaneous Mathematical Symbols-B (2980–29FF)
** Supplemental Mathematical Operators (2A00–2AFF)
** Miscellaneous Symbols and Arrows (2B00–2BFF)
* Glagolitic (2C00–2C5F)
* Latin Extended-C (2C60–2C7F)
* Coptic (2C80–2CFF)
* Georgian Supplement (2D00–2D2F)
* Tifinagh (2D30–2D7F)
* Ethiopic Extended (2D80–2DDF)
* Cyrillic Extended-A (2DE0–2DFF)
* Supplemental Punctuation (2E00–2E7F)
*
CJK scripts and symbols:
**
CJK Radicals Supplement
CJK Radicals Supplement is a Unicode block containing alternative, often positional, forms of the Kangxi radicals
The 214 Kangxi radicals (), also known as the Zihui radicals, form a system of radicals () of Chinese characters.
The radicals ...
(2E80–2EFF)
**
Kangxi Radicals
The 214 Kangxi radicals (), also known as the Zihui radicals, form a system of radicals () of Chinese characters.
The radicals are numbered in stroke count order. They are the most popular system of radicals for dictionaries that order Traditio ...
(2F00–2FDF)
**
Ideographic Description Characters (2FF0–2FFF)
**
CJK Symbols and Punctuation
CJK Symbols and Punctuation is a Unicode block containing symbols and punctuation used for writing the Chinese, Japanese and Korean languages. It also contains one Chinese character.
Block
The block has variation sequences defined for East ...
(3000–303F)
**
Hiragana
is a Japanese syllabary, part of the Japanese writing system, along with ''katakana'' as well as ''kanji''.
It is a phonetic lettering system. The word ''hiragana'' literally means "flowing" or "simple" kana ("simple" originally as contrast ...
(3040–309F)
**
Katakana
is a Japanese syllabary, one component of the Japanese writing system along with hiragana, kanji and in some cases the Latin script (known as rōmaji). The word ''katakana'' means "fragmentary kana", as the katakana characters are derived fr ...
(30A0–30FF)
**
Bopomofo
Bopomofo (), or Mandarin Phonetic Symbols, also named Zhuyin (), is a Chinese transliteration system for Mandarin Chinese and other related languages and dialects. More commonly used in Taiwanese Mandarin, it may also be used to transcribe ...
(3100–312F)
**
Hangul Compatibility Jamo
Hangul Compatibility Jamo is a Unicode block containing Hangul characters for compatibility with the South Korean national standard KS X 1001
KS X 1001, "''Code for Information Interchange (Hangul and Hanja)''", formerly called KS C 5601, ...
(3130–318F)
**
Kanbun
A is a form of Classical Chinese used in Japan from the Nara period to the mid-20th century. Much of Japanese literature was written in this style and it was the general writing style for official and intellectual works throughout the period. A ...
(3190–319F)
**
Bopomofo Extended (31A0–31BF)
**
CJK Strokes
CJK strokes () are the calligraphic strokes needed to write the Chinese characters in regular script used in East Asian calligraphy. CJK strokes are the classified set of line patterns that may be arranged and combined to form Chinese charact ...
(31C0–31EF)
**
Katakana Phonetic Extensions (31F0–31FF)
**
Enclosed CJK Letters and Months (3200–32FF)
**
CJK Compatibility (3300–33FF)
**
CJK Unified Ideographs Extension A (3400–4DBF)
**
Yijing Hexagram Symbols (4DC0–4DFF)
**
CJK Unified Ideographs (4E00–9FFF)
*
Yi Syllables (A000–A48F)
*
Yi Radicals (A490–A4CF)
*
Lisu (A4D0–A4FF)
*
Vai (A500–A63F)
*
Cyrillic Extended-B (A640–A69F)
*
Bamum (A6A0–A6FF)
*
Modifier Tone Letters (A700–A71F)
*
Latin Extended-D (A720–A7FF)
*
Brahmic scripts:
**
Syloti Nagri (A800–A82F)
**
Common Indic Number Forms (A830–A83F)
**
Phags-pa (A840–A87F)
**
Saurashtra (A880–A8DF)
**
Devanagari Extended (A8E0–A8FF)
**
Kayah Li (A900–A92F)
**
Rejang (A930–A95F)
*
Hangul Jamo Extended-A (A960–A97F)
*
Brahmic scripts:
**
Javanese (A980–A9DF)
**
Myanmar Extended-B (A9E0–A9FF)
**
Cham (AA00–AA5F)
**
Myanmar Extended-A (AA60–AA7F)
**
Tai Viet (AA80–AADF)
**
Meetei Mayek Extensions (AAE0–AAFF)
*
Ethiopic Extended-A (AB00–AB2F)
*
Latin Extended-E (AB30–AB6F)
*
Cherokee Supplement (AB70–ABBF)
*
Meetei Mayek (ABC0–ABFF)
*
Hangul Syllables (AC00–D7AF)
*
Hangul Jamo Extended-B (D7B0–D7FF)
*
Surrogates:
**
High Surrogates (D800–DB7F)
**
High Private Use Surrogates (DB80–DBFF)
**
Low Surrogates (DC00–DFFF)
*
Private Use Area (E000–F8FF)
*
CJK Compatibility Ideographs (F900–FAFF)
*
Alphabetic Presentation Forms (FB00–FB4F)
*
Arabic Presentation Forms-A (FB50–FDFF)
*
Variation Selectors (FE00–FE0F)
*
Vertical Forms (FE10–FE1F)
*
Combining Half Marks (FE20–FE2F)
*
CJK Compatibility Forms (FE30–FE4F)
*
Small Form Variants (FE50–FE6F)
*
Arabic Presentation Forms-B (FE70–FEFF)
*
Halfwidth and Fullwidth Forms (FF00–FFEF)
*
Specials (FFF0–FFFF)
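The block ranges above can be tied back to planes and surrogates with a little arithmetic. The following is a minimal sketch, not part of the standard's text: the function names `plane_of` and `to_surrogate_pair` are hypothetical, chosen only for illustration. It assumes the usual UTF-16 scheme, in which a supplementary code point is reduced by 0x10000 and its 20-bit offset is split between a high surrogate (D800–DBFF) and a low surrogate (DC00–DFFF).

```python
def plane_of(code_point: int) -> int:
    """Return the plane number (0 to 16) that contains code_point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    # Each plane holds 65,536 (2**16) code points, so the plane is
    # everything above the low 16 bits.
    return code_point >> 16


def to_surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a supplementary code point into (high, low) surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only planes 1 to 16 need a surrogate pair")
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


# U+3042 (in the Hiragana block) lies in plane 0, the BMP.
print(plane_of(0x3042))
# U+10000 is the first code point of plane 1.
print([hex(u) for u in to_surrogate_pair(0x10000)])
```

The last code point, U+10FFFF, maps to the last high and low surrogates (DBFF, DFFF), which is one way to see why the surrogate blocks bound the code space at 17 planes.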
Supplementary Multilingual Plane
Plane 1, the Supplementary Multilingual Plane (SMP), contains historic scripts (except CJK ideographic), and symbols and notation used within certain fields. Scripts include Linear B, Egyptian hieroglyphs, and cuneiform scripts. It also includes English reform orthographies like Shavian and Deseret, and some modern scripts like Osage, Warang Citi, Adlam, Wancho and Toto. Symbols and notations include historic and modern musical notation; mathematical alphanumerics; shorthands; Emoji and other pictographic sets; and game symbols for playing cards, mahjong, and dominoes.
The SMP comprises the following 151 blocks:
*
Archaic Greek and other left-to-right scripts:
**
Linear B Syllabary (10000–1007F)
**
Linear B Ideograms (10080–100FF)
**
Aegean Numbers (10100–1013F)
**
Ancient Greek Numbers (10140–1018F)
**
Ancient Symbols (10190–101CF)
**
Phaistos Disc (101D0–101FF)
**
Lycian (10280–1029F)
**
Carian (102A0–102DF)
**
Coptic Epact Numbers (102E0–102FF)
**
Old Italic (10300–1032F)
**
Gothic (10330–1034F)
**
Old Permic (10350–1037F)
**
Ugaritic (10380–1039F)
**
Old Persian (103A0–103DF)
**
Deseret (10400–1044F)
**
Shavian (10450–1047F)
**
Osmanya (10480–104AF)
**
Osage (104B0–104FF)
**
Elbasan (10500–1052F)
**
Caucasian Albanian (10530–1056F)
**
Vithkuqi (10570–105BF)
**
Linear A (10600–1077F)
**
Latin Extended-F (10780–107BF)
* Right-to-left scripts:
**
Cypriot Syllabary (10800–1083F)
**
Imperial Aramaic (10840–1085F)
**
Palmyrene (10860–1087F)
**
Nabataean (10880–108AF)
**
Hatran (108E0–108FF)
**
Phoenician (10900–1091F)
**
Lydian (10920–1093F)
**
Meroitic Hieroglyphs (10980–1099F)
**
Meroitic Cursive (109A0–109FF)
**
Kharoshthi (10A00–10A5F)
**
Old South Arabian (10A60–10A7F)
**
Old North Arabian (10A80–10A9F)
**
Manichaean (10AC0–10AFF)
**
Avestan (10B00–10B3F)
**
Inscriptional Parthian (10B40–10B5F)
**
Inscriptional Pahlavi (10B60–10B7F)
**
Psalter Pahlavi (10B80–10BAF)
**
Old Turkic (10C00–10C4F)
**
Old Hungarian (10C80–10CFF)
**
Hanifi Rohingya (10D00–10D3F)
**
Rumi Numeral Symbols (10E60–10E7F)
**
Yezidi (10E80–10EBF)
**
Arabic Extended-C (10EC0–10EFF)
**
Old Sogdian (10F00–10F2F)
**
Sogdian (10F30–10F6F)
**
Old Uyghur (10F70–10FAF)
**
Chorasmian (10FB0–10FDF)
**
Elymaic (10FE0–10FFF)
*
Brahmic scripts:
**
Brahmi (11000–1107F)
**
Kaithi (11080–110CF)
**
Sora Sompeng (110D0–110FF)
**
Chakma (11100–1114F)
**
Mahajani (11150–1117F)
**
Sharada (11180–111DF)
**
Sinhala Archaic Numbers (111E0–111FF)
**
Khojki (11200–1124F)
**
Multani (11280–112AF)
**
Khudawadi (112B0–112FF)
**
Grantha (11300–1137F)
**
Newa (11400–1147F)
**
Tirhuta (11480–114DF)
**
Siddham (11580–115FF)
**
Modi (11600–1165F)
**
Mongolian Supplement (11660–1167F)
**
Takri (11680–116CF)
**
Ahom (11700–1174F)
**
Dogra (11800–1184F)
**
Warang Citi (118A0–118FF)
**
Dives Akuru (11900–1195F)
**
Nandinagari (119A0–119FF)
**
Zanabazar Square (11A00–11A4F)
**
Soyombo (11A50–11AAF)
*
Unified Canadian Aboriginal Syllabics Extended-A (11AB0–11ABF)
* Brahmic scripts:
**
Pau Cin Hau (11AC0–11AFF)
**
Devanagari Extended-A (11B00–11B5F)
**
Bhaiksuki (11C00–11C6F)
**
Marchen (11C70–11CBF)
**
Masaram Gondi (11D00–11D5F)
**
Gunjala Gondi (11D60–11DAF)
**
Makasar (11EE0–11EFF)
**
Kawi (11F00–11F5F)
*
Lisu Supplement (11FB0–11FBF)
*
Tamil Supplement (11FC0–11FFF)
*
Cuneiform (12000–123FF)
*
Cuneiform Numbers and Punctuation (12400–1247F)
*
Early Dynastic Cuneiform (12480–1254F)
*
Cypro-Minoan (12F90–12FFF)
*
Egyptian Hieroglyphs (13000–1342F)
*
Egyptian Hieroglyph Format Controls (13430–1345F)
*
Anatolian Hieroglyphs (14400–1467F)
*
Bamum Supplement (16800–16A3F)
*
Mro (16A40–16A6F)
*
Tangsa (16A70–16ACF)
*
Bassa Vah (16AD0–16AFF)
*
Pahawh Hmong (16B00–16B8F)
*
Medefaidrin (16E40–16E9F)
*
Miao (16F00–16F9F)
*
Ideographic Symbols and Punctuation
Ideographic Symbols and Punctuation is a Unicode block containing symbols and punctuation marks used by ideographic scripts such as Tangut and Nüshu.
History
The following Unicode-related documents record the purpose and process of defining ...
(16FE0–16FFF)
*
Tangut (17000–187FF)
* Tangut Components (18800–18AFF)
* Khitan Small Script (18B00–18CFF)
* Tangut Supplement (18D00–18D7F)
* Kana Extended-B (1AFF0–1AFFF)
* Kana Supplement (1B000–1B0FF)
* Kana Extended-A (1B100–1B12F)
* Small Kana Extension (1B130–1B16F)
* Nushu (1B170–1B2FF)
* Duployan (1BC00–1BC9F)
* Shorthand Format Controls (1BCA0–1BCAF)
* Symbols:
** Musical notation:
*** Znamenny Musical Notation (1CF00–1CFCF)
*** Byzantine Musical Symbols (1D000–1D0FF)
*** Musical Symbols (1D100–1D1FF)
*** Ancient Greek Musical Notation (1D200–1D24F)
** Kaktovik Numerals (1D2C0–1D2DF)
** Mayan Numerals (1D2E0–1D2FF)
** Mathematical symbols:
*** Tai Xuan Jing Symbols (1D300–1D35F)
*** Counting Rod Numerals (1D360–1D37F)
*** Mathematical Alphanumeric Symbols (1D400–1D7FF)
** Sutton SignWriting (1D800–1DAAF)
* Latin Extended-G (1DF00–1DFFF)
* Glagolitic Supplement (1E000–1E02F)
* Cyrillic Extended-D (1E030–1E08F)
* Nyiakeng Puachue Hmong (1E100–1E14F)
* Toto (1E290–1E2BF)
* Wancho (1E2C0–1E2FF)
* Nag Mundari (1E4D0–1E4FF)
* Ethiopic Extended-B (1E7E0–1E7FF)
* Mende Kikakui (1E800–1E8DF)
* Adlam (1E900–1E95F)
* Symbols:
** Indic Siyaq Numbers (1EC70–1ECBF)
** Ottoman Siyaq Numbers (1ED00–1ED4F)
** Arabic Mathematical Alphabetic Symbols (1EE00–1EEFF)
** Game tiles and cards:
*** Mahjong Tiles (1F000–1F02F)
*** Domino Tiles (1F030–1F09F)
*** Playing Cards (1F0A0–1F0FF)
** Enclosed Alphanumeric Supplement (1F100–1F1FF)
** Enclosed Ideographic Supplement (1F200–1F2FF)
** Miscellaneous Symbols and Pictographs (1F300–1F5FF)
** Emoticons (1F600–1F64F)
** Ornamental Dingbats (1F650–1F67F)
** Transport and Map Symbols (1F680–1F6FF)
** Alchemical Symbols (1F700–1F77F)
** Geometric Shapes Extended (1F780–1F7FF)
** Supplemental Arrows-C (1F800–1F8FF)
** Supplemental Symbols and Pictographs (1F900–1F9FF)
** Chess Symbols (1FA00–1FA6F)
** Symbols and Pictographs Extended-A (1FA70–1FAFF)
** Symbols for Legacy Computing (1FB00–1FBFF)
Supplementary Ideographic Plane
Plane 2, the Supplementary Ideographic Plane (SIP), is used for CJK ideographs, mostly CJK Unified Ideographs, that were not included in earlier character encoding standards.
The SIP comprises the following six blocks:
* CJK Unified Ideographs Extension B (20000–2A6DF)
* CJK Unified Ideographs Extension C (2A700–2B73F)
* CJK Unified Ideographs Extension D (2B740–2B81F)
* CJK Unified Ideographs Extension E (2B820–2CEAF)
* CJK Unified Ideographs Extension F (2CEB0–2EBEF)
* CJK Compatibility Ideographs Supplement (2F800–2FA1F)
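Because the blocks are contiguous, non-overlapping ranges, finding the block that contains a given code point reduces to a search over sorted start values. The following is a minimal sketch using only the six SIP ranges listed above; the names `SIP_BLOCKS` and `sip_block` are illustrative and not part of any Unicode library.

```python
from bisect import bisect_right

# (start, end, name) for the six SIP blocks listed above, sorted by start.
SIP_BLOCKS = [
    (0x20000, 0x2A6DF, "CJK Unified Ideographs Extension B"),
    (0x2A700, 0x2B73F, "CJK Unified Ideographs Extension C"),
    (0x2B740, 0x2B81F, "CJK Unified Ideographs Extension D"),
    (0x2B820, 0x2CEAF, "CJK Unified Ideographs Extension E"),
    (0x2CEB0, 0x2EBEF, "CJK Unified Ideographs Extension F"),
    (0x2F800, 0x2FA1F, "CJK Compatibility Ideographs Supplement"),
]
_STARTS = [start for start, _, _ in SIP_BLOCKS]


def sip_block(code_point):
    """Return the name of the SIP block containing code_point, or None."""
    # Index of the last block whose start is <= code_point.
    i = bisect_right(_STARTS, code_point) - 1
    if i < 0:
        return None
    _, end, name = SIP_BLOCKS[i]
    # Code points in the gaps between blocks belong to no block.
    return name if code_point <= end else None


if __name__ == "__main__":
    for cp in (0x20000, 0x2A6E0, 0x2B740, 0x2F800):
        print(f"U+{cp:04X}: {sip_block(cp)}")
```

Code points that fall in the gaps between blocks, such as U+2A6E0, are reported as belonging to no block.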
Tertiary Ideographic Plane
Plane 3 is the Tertiary Ideographic Plane (TIP).
CJK Unified Ideographs Extension G was added to the TIP in Unicode 13.0, released in March 2020. The plane is also tentatively allocated for Oracle Bone script and Small Seal Script.
The TIP comprises the following two blocks:
* CJK Unified Ideographs Extension G (30000–3134F)
* CJK Unified Ideographs Extension H (31350–323AF)
Unassigned planes
No characters have yet been assigned, or proposed for assignment, to planes 4 through 13 (planes 4 to D in hexadecimal).
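Since each plane holds 65,536 (2^16) code points, the plane number of a code point is simply its value shifted right by 16 bits. The sketch below illustrates that arithmetic and the decimal-to-hexadecimal plane numbering; the function names and the `PLANE_NAMES` table are illustrative, and the labels for planes 15 and 16 are shorthand chosen for this example.

```python
# Plane names as used in this article; planes 4 to 13 are unassigned.
PLANE_NAMES = {
    0: "Basic Multilingual Plane",
    1: "Supplementary Multilingual Plane",
    2: "Supplementary Ideographic Plane",
    3: "Tertiary Ideographic Plane",
    14: "Supplementary Special-purpose Plane",
    15: "Private Use Area plane",  # illustrative label
    16: "Private Use Area plane",  # illustrative label
}


def plane_of(code_point):
    """Return the plane number (0 to 16) of a Unicode code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {code_point:#x}")
    # Each plane spans 0x10000 code points, so drop the low 16 bits.
    return code_point >> 16


def describe_plane(code_point):
    """Describe the plane in decimal and hexadecimal, with its name."""
    plane = plane_of(code_point)
    name = PLANE_NAMES.get(plane, "unassigned")
    return f"plane {plane} (hex {plane:X}): {name}"


if __name__ == "__main__":
    for cp in (0x0041, 0x1F600, 0x40000, 0xD0000, 0xE0100):
        print(f"U+{cp:04X} -> {describe_plane(cp)}")
```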
Supplementary Special-purpose Plane
Plane 14 (E in hexadecimal) is designated as the Supplementary Special-purpose Plane (SSP). It comprises the following two blocks:
* Tags (E0000–E007F)
* Variation Selectors Supplement (E0100–E01EF) – used to indicate alternate glyphs for characters.
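As a small illustration of how the Variation Selectors Supplement range is laid out: its 240 code points continue the numbering of the 16 variation selectors in the Basic Multilingual Plane, so the first one is variation selector-17. The sketch below assumes that numbering and uses the hypothetical helper name `variation_selector_number`.

```python
# Range of the Variation Selectors Supplement block listed above.
VS_SUPPLEMENT_START = 0xE0100
VS_SUPPLEMENT_END = 0xE01EF


def variation_selector_number(code_point):
    """Return the selector number for a code point in the supplement block.

    Returns None for code points outside the block. Numbering starts at 17
    because selectors 1 to 16 live in a separate block in the BMP.
    """
    if VS_SUPPLEMENT_START <= code_point <= VS_SUPPLEMENT_END:
        return code_point - VS_SUPPLEMENT_START + 17
    return None


if __name__ == "__main__":
    for cp in (0xE0100, 0xE01EF, 0xE0001):
        print(f"U+{cp:04X}: {variation_selector_number(cp)}")
```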
Private Use Area Planes
Planes 15 and 16 (planes F and 10 in hexadecimal) each contain a Private Use Area: the blocks named Supplementary Private Use Area-A (PUA-A) and Supplementary Private Use Area-B (PUA-B), respectively. The Private Use Areas are available for use by parties outside ISO and Unicode (private character encoding).
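A check for whether a code point falls in one of these two supplementary Private Use Areas can be sketched as follows. The section above does not give the block ranges; this example assumes the ranges F0000–FFFFD and 100000–10FFFD, on the basis that the last two code points of every plane are noncharacters in the Unicode Standard. The function name is illustrative.

```python
def is_supplementary_private_use(code_point):
    """Return True if code_point lies in PUA-A (plane 15) or PUA-B (plane 16).

    Assumption: each block covers its plane except the final two code
    points (xxFFFE and xxFFFF), which are noncharacters.
    """
    plane = code_point >> 16       # which plane the code point is in
    offset = code_point & 0xFFFF   # position within that plane
    return plane in (15, 16) and offset <= 0xFFFD


if __name__ == "__main__":
    for cp in (0xF0000, 0xFFFFD, 0xFFFFE, 0x100000, 0x10FFFF, 0xE000):
        print(f"U+{cp:04X}: {is_supplementary_private_use(cp)}")
```

The Private Use Area in the Basic Multilingual Plane is deliberately not covered by this check, so U+E000 yields `False`.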