HOME

TheInfoList



OR:

In the
Unicode Unicode, formally The Unicode Standard,The formal version reference is is an information technology standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems. The standard, ...
standard, a plane is a continuous group of 65,536 (216)
code point In character encoding terminology, a code point, codepoint or code position is a numerical value that maps to a specific character. Code points usually represent a single grapheme—usually a letter, digit, punctuation mark, or whitespace—but ...
s. There are 17 planes, identified by the numbers 0 to 16, which corresponds with the possible values 00–1016 of the first two positions in six position
hexadecimal In mathematics and computing, the hexadecimal (also base-16 or simply hex) numeral system is a positional numeral system that represents numbers using a radix (base) of 16. Unlike the decimal system representing numbers using 10 symbols, he ...
format (U+''hhhhhh''). Plane 0 is the Basic Multilingual Plane (BMP), which contains most commonly used characters. The higher planes 1 through 16 are called "supplementary planes". The last code point in Unicode is the last code point in plane 16, U+10FFFF. As of Unicode version , five of the planes have assigned code points (characters), and seven are named. The limit of 17 planes is due to UTF-16, which can encode 220 code points (16 planes) as pairs of words, plus the BMP as a single word.
UTF-8 UTF-8 is a variable-length character encoding used for electronic communication. Defined by the Unicode Standard, the name is derived from ''Unicode'' (or ''Universal Coded Character Set'') ''Transformation Format 8-bit''. UTF-8 is capable of e ...
was designed with a much larger limit of 231 (2,147,483,648) code points (32,768 planes), and would still be able to encode 221 (2,097,152) code points (32 planes) even under the current limit of 4
byte The byte is a unit of digital information that most commonly consists of eight bits. Historically, the byte was the number of bits used to encode a single character of text in a computer and for this reason it is the smallest addressable uni ...
s. The 17 planes can accommodate 1,114,112 code points. Of these, 2,048 are
surrogates ''Surrogates'' is a 2009 American science fiction action film based on the 2005–2006 comic book series '' The Surrogates''. Directed by Jonathan Mostow, it stars Bruce Willis as Tom Greer, an FBI agent who ventures out into the real world to ...
(used to make the pairs in UTF-16), 66 are non-characters, and 137,468 are reserved for private use, leaving 974,530 for public assignment. Planes are further subdivided into
Unicode block A Unicode block is one of several contiguous ranges of numeric character codes ( code points) of the Unicode character set that are defined by the Unicode Consortium for administrative and documentation purposes. Typically, proposals such as the ...
s, which, unlike planes, do not have a fixed size. The 327 blocks defined in Unicode cover 26% of the possible code point space, and range in size from a minimum of 16 code points (sixteen blocks) to a maximum of 65,536 code points (Supplementary Private Use Area-A and -B, which constitute the entirety of planes 15 and 16). For future usage, ranges of characters have been tentatively mapped out for most known current and ancient writing systems.


Overview


Assigned characters


Basic Multilingual Plane

The first plane, plane 0, the Basic Multilingual Plane (BMP) contains characters for almost all modern languages, and a large number of
symbols A symbol is a mark, sign, or word that indicates, signifies, or is understood as representing an idea, object, or relationship. Symbols allow people to go beyond what is known or seen by creating linkages between otherwise very different co ...
. A primary objective for the BMP is to support the unification of prior character sets as well as characters for
writing Writing is a medium of human communication which involves the representation of a language through a system of physically inscribed, mechanically transferred, or digitally represented symbols. Writing systems do not themselves constitute h ...
. Most of the assigned code points in the BMP are used to encode Chinese, Japanese, and Korean ( CJK) characters. The High Surrogate (U+D800–U+DBFF) and Low Surrogate (U+DC00–U+DFFF) codes are reserved for encoding non-BMP characters in UTF-16 by using a ''pair'' of 16- bit codes: one High Surrogate and one Low Surrogate. A single surrogate code point will never be assigned a character. 65,520 of the 65,536 code points in this plane have been allocated to a
Unicode block A Unicode block is one of several contiguous ranges of numeric character codes ( code points) of the Unicode character set that are defined by the Unicode Consortium for administrative and documentation purposes. Typically, proposals such as the ...
, leaving just 16 code points in a single unallocated range (2FE0..2FEF). , the BMP comprises the following 164 blocks: * Basic Latin (Lower half of
ISO/IEC 8859-1 ISO/IEC 8859-1:1998, ''Information technology — 8-bit single-byte coded graphic character sets — Part 1: Latin alphabet No. 1'', is part of the ISO/IEC 8859 series of ASCII-based standard character encodings, first edition published in ...
: ISO/IEC 646:1991-IRV aka
ASCII ASCII ( ), abbreviated from American Standard Code for Information Interchange, is a character encoding standard for electronic communication. ASCII codes represent text in computers, telecommunications equipment, and other devices. Because ...
) (0000–007F) * Latin-1 Supplement (Upper half of
ISO/IEC 8859-1 ISO/IEC 8859-1:1998, ''Information technology — 8-bit single-byte coded graphic character sets — Part 1: Latin alphabet No. 1'', is part of the ISO/IEC 8859 series of ASCII-based standard character encodings, first edition published in ...
) (0080–00FF) *
Latin Extended-A Latin Extended-A is a Unicode block and is the third block of the Unicode standard. It encodes Latin letters from the Latin ISO character sets other than Latin-1 (which is already encoded in the Latin-1 Supplement block) and also legacy characte ...
(0100–017F) *
Latin Extended-B Latin Extended-B is the fourth block (0180-024F) of the Unicode Standard. It has been included since version 1.0, where it was only allocated to the code points 0180-01FF and contained 113 characters. During unification with ISO 10646 for versio ...
(0180–024F) *
IPA Extensions IPA Extensions is a block (U+0250–U+02AF) of the Unicode standard that contains full size letters used in the International Phonetic Alphabet (IPA). Both modern and historical characters are included, as well as former and proposed IPA sign ...
(0250–02AF) * Spacing Modifier Letters (02B0–02FF) *
Combining Diacritical Marks Combining Diacritical Marks is a Unicode block containing the most common combining characters. It also contains the character " Combining Grapheme Joiner", which prevents canonical reordering of combining characters, and despite the name, actu ...
(0300–036F) * Greek and Coptic (0370–03FF) *
Cyrillic The Cyrillic script ( ), Slavonic script or the Slavic script, is a writing system used for various languages across Eurasia. It is the designated national script in various Slavic, Turkic, Mongolic, Uralic, Caucasian and Iranic-speaking co ...
(0400–04FF) *
Cyrillic Supplement Cyrillic Supplement is a Unicode block containing Cyrillic letters for writing several minority languages, including Abkhaz, Kurdish, Komi, Mordvin, Aleut The Aleuts ( ; russian: Алеуты, Aleuty) are the indigenous people of the ...
(0500–052F) * Armenian (0530–058F) *
Aramaic The Aramaic languages, short Aramaic ( syc, ܐܪܡܝܐ, Arāmāyā; oar, 𐤀𐤓𐤌𐤉𐤀; arc, 𐡀𐡓𐡌𐡉𐡀; tmr, אֲרָמִית), are a language family containing many varieties (languages and dialects) that originated i ...
Scripts: **
Hebrew Hebrew (; ; ) is a Northwest Semitic language of the Afroasiatic language family. Historically, it is one of the spoken languages of the Israelites and their longest-surviving descendants, the Jews and Samaritans. It was largely preserved ...
(0590–05FF) **
Arabic Arabic (, ' ; , ' or ) is a Semitic language spoken primarily across the Arab world.Semitic languages: an international handbook / edited by Stefan Weninger; in collaboration with Geoffrey Khan, Michael P. Streck, Janet C. E.Watson; Walter ...
(0600–06FF) ** Syriac (0700–074F) **
Arabic Supplement Arabic Supplement is a Unicode block that encodes Arabic Arabic (, ' ; , ' or ) is a Semitic language spoken primarily across the Arab world.Semitic languages: an international handbook / edited by Stefan Weninger; in collaboration wit ...
(0750–077F) **
Thaana Thaana, Taana or Tāna (  ) is the present writing system of the Maldivian language spoken in the Maldives. Thaana has characteristics of both an abugida (diacritic, vowel-killer strokes) and a true alphabet (all vowels are written), ...
(0780–07BF) **
N'Ko N'Ko () is a script devised by Solomana Kante in 1949, as a modern writing system for the Mandé languages of West Africa. The term ''N'Ko'', which means ''I say'' in all Mandé languages, is also used for the Mandé literary standard written ...
(07C0–07FF) ** Samaritan (0800–083F) ** Mandaic (0840–085F) ** Syriac Supplement (0860–086F) ** Arabic Extended-B (0870–089F) **
Arabic Extended-A Arabic Extended-A is a Unicode block encoding Qur'anic The Quran (, ; Standard Arabic: , Quranic Arabic: , , 'the recitation'), also romanized Qur'an or Koran, is the central religious text of Islam, believed by Muslims to be a revelation ...
(08A0–08FF) *
Brahmic The Brahmic scripts, also known as Indic scripts, are a family of abugida writing systems. They are used throughout the Indian subcontinent, Southeast Asia and parts of East Asia. They are descended from the Brahmi script of ancient In ...
scripts: **
Devanagari Devanagari ( ; , , Sanskrit pronunciation: ), also called Nagari (),Kathleen Kuiper (2010), The Culture of India, New York: The Rosen Publishing Group, , page 83 is a left-to-right abugida (a type of segmental writing system), based on the ...
(0900–097F) **
Bengali Bengali or Bengalee, or Bengalese may refer to: *something of, from, or related to Bengal, a large region in South Asia * Bengalis, an ethnic and linguistic group of the region * Bengali language, the language they speak ** Bengali alphabet, the w ...
(0980–09FF) **
Gurmukhi Gurmukhī ( pa, ਗੁਰਮੁਖੀ, , Shahmukhi: ) is an abugida developed from the Laṇḍā scripts, standardized and used by the second Sikh guru, Guru Angad (1504–1552). It is used by Punjabi Sikhs to write the language, commonly ...
(0A00–0A7F) **
Gujarati Gujarati may refer to: * something of, from, or related to Gujarat, a state of India * Gujarati people, the major ethnic group of Gujarat * Gujarati language, the Indo-Aryan language spoken by them * Gujarati languages, the Western Indo-Aryan sub- ...
(0A80–0AFF) ** Oriya (0B00–0B7F) **
Tamil Tamil may refer to: * Tamils, an ethnic group native to India and some other parts of Asia ** Sri Lankan Tamils, Tamil people native to Sri Lanka also called ilankai tamils **Tamil Malaysians, Tamil people native to Malaysia * Tamil language, na ...
(0B80–0BFF) **
Telugu Telugu may refer to: * Telugu language, a major Dravidian language of India *Telugu people, an ethno-linguistic group of India * Telugu script, used to write the Telugu language ** Telugu (Unicode block), a block of Telugu characters in Unicode ...
(0C00–0C7F) **
Kannada Kannada (; ಕನ್ನಡ, ), originally romanised Canarese, is a Dravidian language spoken predominantly by the people of Karnataka in southwestern India, with minorities in all neighbouring states. It has around 47 million native s ...
(0C80–0CFF) **
Malayalam Malayalam (; , ) is a Dravidian language spoken in the Indian state of Kerala and the union territories of Lakshadweep and Puducherry ( Mahé district) by the Malayali people. It is one of 22 scheduled languages of India. Malayalam wa ...
(0D00–0D7F) ** Sinhala (0D80–0DFF) ** Thai (0E00–0E7F) ** Lao (0E80–0EFF) **
Tibetan Tibetan may mean: * of, from, or related to Tibet * Tibetan people, an ethnic group * Tibetan language: ** Classical Tibetan, the classical language used also as a contemporary written standard ** Standard Tibetan, the most widely used spoken diale ...
(0F00–0FFF) **
Myanmar Myanmar, ; UK pronunciations: US pronunciations incl. . Note: Wikipedia's IPA conventions require indicating /r/ even in British English although only some British English speakers pronounce r at the end of syllables. As John Wells explai ...
(1000–109F) * Georgian (10A0–10FF) * Hangul Jamo (1100–11FF) * Ethiopic (1200–137F) *
Ethiopic Supplement Ethiopic Supplement is a Unicode block containing extra Geʽez characters for writing the Sebatbeit language, and Ethiopic tone marks. Block History The following Unicode-related documents record the purpose and process of defining specific ...
(1380–139F) *
Cherokee The Cherokee (; chr, ᎠᏂᏴᏫᏯᎢ, translit=Aniyvwiyaʔi or Anigiduwagi, or chr, ᏣᎳᎩ, links=no, translit=Tsalagi) are one of the indigenous peoples of the Southeastern Woodlands of the United States. Prior to the 18th century, th ...
(13A0–13FF) * Unified Canadian Aboriginal Syllabics (1400–167F) * Ogham (1680–169F) * Runic (16A0–16FF) *
Philippine The Philippines (; fil, Pilipinas, links=no), officially the Republic of the Philippines ( fil, Republika ng Pilipinas, links=no), * bik, Republika kan Filipinas * ceb, Republika sa Pilipinas * cbk, República de Filipinas * hil, Republ ...
scripts: **
Tagalog Tagalog may refer to: Language * Tagalog language, a language spoken in the Philippines ** Old Tagalog, an archaic form of the language ** Batangas Tagalog, a dialect of the language * Tagalog script, the writing system historically used for Taga ...
(1700–171F) ** Hanunoo (1720–173F) ** Buhid (1740–175F) **
Tagbanwa The Tagbanwa people ( Tagbanwa: ) are one of the oldest ethnic groups in the Philippines, and can be mainly found in the central and northern Palawan. Research has shown that the Tagbanwa are possible descendants of the Tabon Man, thus making th ...
(1760–177F) * Khmer (1780–17FF) * Mongolian (1800–18AF) *
Unified Canadian Aboriginal Syllabics Extended Unified Canadian Aboriginal Syllabics Extended is a Unicode block containing extensions to the Canadian syllabics contained in the Unified Canadian Aboriginal Syllabics Unicode block for some dialects of Cree, Ojibwe, Dene, and Carrier Carrie ...
(18B0–18FF) *
Brahmic The Brahmic scripts, also known as Indic scripts, are a family of abugida writing systems. They are used throughout the Indian subcontinent, Southeast Asia and parts of East Asia. They are descended from the Brahmi script of ancient In ...
scripts: ** Limbu (1900–194F) * Tai scripts: ** Tai Le (1950–197F) ** New Tai Lue (1980–19DF) ** Khmer Symbols (19E0–19FF) ** Buginese (1A00–1A1F) ** Tai Tham (1A20–1AAF) *
Combining Diacritical Marks Extended Combining Diacritical Marks Extended is a Unicode block containing diacritical marks used in German dialectology (Teuthonista Teuthonista is a phonetic transcription system used predominantly for the transcription of (High) German dialects. I ...
(1AB0–1AFF) * Indonesian scripts: ** Balinese (1B00–1B7F) ** Sundanese (1B80–1BBF) **
Batak Batak is a collective term used to identify a number of closely related Austronesian ethnic groups predominantly found in North Sumatra, Indonesia, who speak Batak languages. The term is used to include the Karo, Pakpak, Simalungun, Tob ...
(1BC0–1BFF) * Lepcha (1C00–1C4F) * Ol Chiki (1C50–1C7F) *
Cyrillic Extended-C Cyrillic Extended-C is a Unicode block containing Cyrillic , bg, кирилица , mk, кирилица , russian: кириллица , sr, ћирилица, uk, кирилиця , fam1 = Egyptian hieroglyphs , fam2 = ...
(1C80–1C8F) *
Georgian Extended Georgian Extended is a Unicode block containing Georgian ''Mtavruli'' ( ka, მთავრული, "title" or "heading") letters that function as uppercase versions of their ''Mkhedruli'' counterparts in the Georgian block. Unlike all other ...
(1C90–1CBF) * Sundanese Supplement (1CC0–1CCF) * Vedic Extensions (1CD0–1CFF) * Latin supplements: ** Phonetic Extensions (1D00–1D7F) ** Phonetic Extensions Supplement (1D80–1DBF) **
Combining Diacritical Marks Supplement Combining Diacritical Marks Supplement is a Unicode block containing combining characters for the Uralic Phonetic Alphabet, Medievalist notations, and German dialectology (Teuthonista). It is an extension of the diacritic characters found in the ...
(1DC0–1DFF) **
Latin Extended Additional Latin Extended Additional is a Unicode block. The characters in this block are mostly precomposed combinations of Latin letters with one or more general diacritical marks. Ninety of the characters are used in the Vietnamese alphabet. There are a ...
(1E00–1EFF) *
Greek Extended Greek Extended is a Unicode block containing the accented vowels necessary for writing polytonic Greek. The regular, unaccented Greek characters as well as the characters with tonos and diaeresis can be found in the Greek and Coptic block. G ...
(1F00–1FFF) *
Symbols A symbol is a mark, sign, or word that indicates, signifies, or is understood as representing an idea, object, or relationship. Symbols allow people to go beyond what is known or seen by creating linkages between otherwise very different co ...
: ** General Punctuation (2000–206F) ** Superscripts and Subscripts (2070–209F) ** Currency Symbols (20A0–20CF) ** Combining Diacritical Marks for Symbols (20D0–20FF) ** Letterlike Symbols (2100–214F) **
Number Forms Number Forms is a Unicode block containing Unicode compatibility characters that have specific meaning as numbers, but are constructed from other characters. They consist primarily of vulgar fractions and Roman numerals. In addition to the cha ...
(2150–218F) ** Arrows (2190–21FF) **
Mathematical Operators Mathematical Operators is a Unicode block containing characters for mathematical, logical, and set notation. Notably absent are the plus sign (+), greater than sign (>) and less than sign (<), due to them already appearing in the Bas ...
(2200–22FF) ** Miscellaneous Technical (2300–23FF) ** Control Pictures (2400–243F) **
Optical Character Recognition Optical character recognition or optical character reader (OCR) is the electronic or mechanical conversion of images of typed, handwritten or printed text into machine-encoded text, whether from a scanned document, a photo of a document, a sc ...
(2440–245F) **
Enclosed Alphanumerics Enclosed Alphanumerics is a Unicode block of typographical symbols of an alphanumeric within a circle, a bracket or other not-closed enclosure, or ending in a full stop. It is currently fully allocated. Within the Basic Multilingual Plan ...
(2460–24FF) ** Box Drawing (2500–257F) ** Block Elements (2580–259F) **
Geometric Shapes Geometric Shapes is a Unicode block of 96 symbols at code point range U+25A0–25FF. U+25A0–U+25CF The BLACK CIRCLE is displayed when typing in a password field, in order to hide characters from a screen recorder or shoulder surfing. U+ ...
(25A0–25FF) **
Miscellaneous Symbols Miscellaneous Symbols is a Unicode block (U+2600–U+26FF) containing glyphs representing concepts from a variety of categories: astrological, astronomical, chess, dice, musical notation, political symbols, recycling, religious symbols, trigr ...
(2600–26FF) ** Dingbats (2700–27BF) **
Miscellaneous Mathematical Symbols-A Miscellaneous Mathematical Symbols-A is a Unicode block containing characters for mathematical, logical, and database notation. Character table Compact table History The following Unicode-related documents record the purpose and process ...
(27C0–27EF) **
Supplemental Arrows-A Supplemental Arrows-A is a Unicode block containing various arrow symbols. Block History The following Unicode-related documents record the purpose and process of defining specific characters in the Supplemental Arrows-A block: See also ...
(27F0–27FF) **
Braille Patterns The Unicode block Braille Patterns (U+2800..U+28FF) contains all 256 possible patterns of an 8-dot braille cell, thereby including the complete 6-dot cell range.
(2800–28FF) **
Supplemental Arrows-B Supplemental Arrows-B is a Unicode block containing miscellaneous arrows, arrow tails, crossing arrows used in knot descriptions, curved arrows, and harpoons. Block Emoji The Supplemental Arrows-B block contains two emoji: U+2934–U+2935. ...
(2900–297F) **
Miscellaneous Mathematical Symbols-B Miscellaneous Mathematical Symbols-B is a Unicode block containing miscellaneous mathematical symbols, including brackets, angles, and circle symbols. Block Some of these symbols are used in Z notation. Specifically * * * * * * The last two ...
(2980–29FF) ** Supplemental Mathematical Operators (2A00–2AFF) **
Miscellaneous Symbols and Arrows Miscellaneous Symbols and Arrows is a Unicode block containing arrows and geometric shapes with various fills, astrological symbols, technical symbols, intonation marks, and others. Block Emoji The Miscellaneous Symbols and Arrows block co ...
(2B00–2BFF) * Glagolitic (2C00–2C5F) * Latin Extended-C (2C60–2C7F) * Coptic (2C80–2CFF) *
Georgian Supplement Georgian Supplement is a Unicode block containing characters for the ecclesiastical form of the Georgian script, Nuskhuri ( ka, ნუსხური). To write the full ecclesiastical Khutsuri orthography, the Asomtavruli The Georgian scri ...
(2D00–2D2F) * Tifinagh (2D30–2D7F) *
Ethiopic Extended Ethiopic Extended is a Unicode block containing Geʽez script, Geʽez characters for the Me'en_language, Me'en, Blin_language, Blin, and Sebatbeit_language, Sebatbeit languages. Block History The following Unicode-related documents record the ...
(2D80–2DDF) *
Cyrillic Extended-A Cyrillic Extended-A is a Unicode block containing combining Cyrillic , bg, кирилица , mk, кирилица , russian: кириллица , sr, ћирилица, uk, кирилиця , fam1 = Egyptian hieroglyphs , fam2 ...
(2DE0–2DFF) *
Supplemental Punctuation Supplemental Punctuation is a Unicode block containing historic and specialized punctuation characters, including biblical editorial symbols, ancient Greek punctuation, and German dictionary marks. Additional punctuation characters are in the G ...
(2E00–2E7F) * CJK scripts and symbols: **
CJK Radicals Supplement CJK Radicals Supplement is a Unicode block containing alternative, often positional, forms of the Kangxi radicals. They are used as headers in dictionary indices and other CJK ideograph collections organized by radical-stroke. Block History ...
(2E80–2EFF) ** Kangxi Radicals (2F00–2FDF) ** Ideographic Description Characters (2FF0–2FFF) ** CJK Symbols and Punctuation (3000–303F) **
Hiragana is a Japanese syllabary, part of the Japanese writing system, along with ''katakana'' as well as ''kanji''. It is a phonetic lettering system. The word ''hiragana'' literally means "flowing" or "simple" kana ("simple" originally as contras ...
(3040–309F) **
Katakana is a Japanese syllabary, one component of the Japanese writing system along with hiragana, kanji and in some cases the Latin script (known as rōmaji). The word ''katakana'' means "fragmentary kana", as the katakana characters are derived f ...
(30A0–30FF) **
Bopomofo Bopomofo (), or Mandarin Phonetic Symbols, also named Zhuyin (), is a Chinese transliteration system for Mandarin Chinese and other related languages and dialects. More commonly used in Taiwanese Mandarin, it may also be used to transcribe ...
(3100–312F) **
Hangul Compatibility Jamo Hangul Compatibility Jamo is a Unicode block containing Hangul characters for compatibility with the South Korean national standard KS X 1001 KS X 1001, "''Code for Information Interchange (Hangul and Hanja)''", formerly called KS C 5601, ...
(3130–318F) **
Kanbun A is a form of Classical Chinese used in Japan from the Nara period to the mid-20th century. Much of Japanese literature was written in this style and it was the general writing style for official and intellectual works throughout the period. ...
(3190–319F) **
Bopomofo Extended Bopomofo Extended is a Unicode block containing additional Bopomofo characters for writing phonetic Min Nan, Hakka Chinese, Cantonese, Hmu, and Ge. The basic set of Bopomofo characters can be found in the Bopomofo block. Block History The f ...
(31A0–31BF) ** CJK Strokes (31C0–31EF) **
Katakana Phonetic Extensions Katakana Phonetic Extensions is a Unicode block containing additional small katakana characters for writing the Ainu language, in addition to characters in the Katakana block. Further small katakana are present in the Small Kana Extension Small ...
(31F0–31FF) **
Enclosed CJK Letters and Months Enclosed CJK Letters and Months is a Unicode block containing circled and parenthesized Katakana, Hangul, and CJK ideographs. Also included in the block are miscellaneous glyphs that would more likely fit in CJK Compatibility or Enclosed Alp ...
(3200–32FF) ** CJK Compatibility (3300–33FF) **
CJK Unified Ideographs Extension A CJK Unified Ideographs Extension-A is a Unicode block containing rare Han ideographs. The block has dozens of variation sequences defined for standardized variants. It also has thousands of ideographic variation sequences registered in the Un ...
(3400–4DBF) ** Yijing Hexagram Symbols (4DC0–4DFF) **
CJK Unified Ideographs The Chinese, Japanese and Korean (CJK) scripts share a common background, collectively known as CJK characters. In the process called Han unification, the common (shared) characters were identified and named CJK Unified Ideographs. As of Unicode ...
(4E00–9FFF) * Yi Syllables (A000–A48F) * Yi Radicals (A490–A4CF) * Lisu (A4D0–A4FF) * Vai (A500–A63F) *
Cyrillic Extended-B Cyrillic Extended-B is a Unicode block containing Cyrillic , bg, кирилица , mk, кирилица , russian: кириллица , sr, ћирилица, uk, кирилиця , fam1 = Egyptian hieroglyphs , fam2 = ...
(A640–A69F) *
Bamum Bamum, also spelled Bamoum, Bamun, or Bamoun, may refer to: *The Bamum people *The Bamum kingdom *The Bamum language *The Bamum script The Bamum scripts are an evolutionary series of six scripts created for the Bamum language by Ibrahim Njoya, ...
(A6A0–A6FF) * Modifier Tone Letters (A700–A71F) * Latin Extended-D (A720–A7FF) *
Brahmic The Brahmic scripts, also known as Indic scripts, are a family of abugida writing systems. They are used throughout the Indian subcontinent, Southeast Asia and parts of East Asia. They are descended from the Brahmi script of ancient In ...
scripts: ** Syloti Nagri (A800–A82F) **
Common Indic Number Forms Common Indic Number Forms is a Unicode block containing characters for representing fractions in north India, Pakistan, and Nepal Nepal (; ne, नेपाल ), formerly the Federal Democratic Republic of Nepal ( ne, सङ्घ� ...
(A830–A83F) ** Phags-pa (A840–A87F) ** Saurashtra (A880–A8DF) ** Devanagari Extended (A8E0–A8FF) ** Kayah Li (A900–A92F) ** Rejang (A930–A95F) *
Hangul Jamo Extended-A Hangul Jamo Extended-A is a Unicode block containing ''choseong'' (initial consonant) forms of archaic Hangul consonant clusters. They can be used to dynamically compose syllables that are not available as precomposed Hangul syllables in Unic ...
(A960–A97F) *
Brahmic The Brahmic scripts, also known as Indic scripts, are a family of abugida writing systems. They are used throughout the Indian subcontinent, Southeast Asia and parts of East Asia. They are descended from the Brahmi script of ancient In ...
scripts: ** Javanese (A980–A9DF) ** Myanmar Extended-B (A9E0–A9FF) ** Cham (AA00–AA5F) **
Myanmar Extended-A Myanmar Extended-A is a Unicode block containing Myanmar characters for writing the Khamti Shan The Tai Khamti, ( Khamti: တဲး ၵံးတီႈ, ( th, ชาวไทคำตี่, my, ခန္တီးရှမ်းလူမ� ...
(AA60–AA7F) ** Tai Viet (AA80–AADF) ** Meetei Mayek Extensions (AAE0–AAFF) * Ethiopic Extended-A (AB00–AB2F) *
Latin Extended-E Latin Extended-E is a Unicode block containing Latin script characters used in German dialectology (Teuthonista),, Anthropos alphabet, Sakha Sakha, officially the Republic of Sakha (Yakutia),, is the largest republic of Russia, locate ...
(AB30–AB6F) * Cherokee Supplement (AB70–ABBF) *
Meetei Mayek ) , altname = , type = Abugida , languages = Meitei language (officially known as Manipuri language) , region = * Manipur , sample = "Meitei Mayek" (literally meaning "Meitei script" in Meitei language) written ...
(ABC0–ABFF) * Hangul Syllables (AC00–D7AF) *
Hangul Jamo Extended-B Hangul Jamo Extended-B is a Unicode block containing positional (''jungseong'' and ''jongseong'') forms of archaic Hangul vowel and consonant clusters. They can be used to dynamically compose syllables that are not available as precomposed Ha ...
(D7B0–D7FF) *
Surrogates ''Surrogates'' is a 2009 American science fiction action film based on the 2005–2006 comic book series '' The Surrogates''. Directed by Jonathan Mostow, it stars Bruce Willis as Tom Greer, an FBI agent who ventures out into the real world to ...
: ** High Surrogates (D800–DB7F) ** High Private Use Surrogates (DB80–DBFF) ** Low Surrogates (DC00–DFFF) * Private Use Area (E000–F8FF) *
CJK Compatibility Ideographs CJK Compatibility Ideographs is a Unicode block created to contain Han characters that were encoded in multiple locations in other established character encodings, in addition to their CJK Unified Ideographs assignments, in order to retain rou ...
(F900–FAFF) * Alphabetic Presentation Forms (FB00–FB4F) * Arabic Presentation Forms-A (FB50–FDFF) *
Variation Selectors Variation Selectors is the block name of a Unicode code point block containing 16 variation selectors. Each variation selector is used to specify a specific glyph variant for a preceding character. They are currently used to specify standardize ...
(FE00–FE0F) * Vertical Forms (FE10–FE1F) *
Combining Half Marks Combining Half Marks is a Unicode block containing diacritical combining characters for spanning multiple characters. Block History The following Unicode-related documents record the purpose and process of defining specific characters in the ...
(FE20–FE2F) * CJK Compatibility Forms (FE30–FE4F) * Small Form Variants (FE50–FE6F) * Arabic Presentation Forms-B (FE70–FEFF) *
Halfwidth and Fullwidth Forms In CJK (Chinese, Japanese and Korean) computing, graphic characters are traditionally classed into fullwidth (in Taiwan and Hong Kong: 全形; in CJK: 全角) and halfwidth (in Taiwan and Hong Kong: 半形; in CJK: 半角) characters. Unlike ...
(FF00–FFEF) * Specials (FFF0–FFFF)


Supplementary Multilingual Plane

Plane 1, the Supplementary Multilingual Plane (SMP), contains historic scripts (except CJK ideographic), and symbols and notation used within certain fields. Scripts include
Linear B Linear B was a syllabic script used for writing in Mycenaean Greek, the earliest attested form of Greek. The script predates the Greek alphabet by several centuries. The oldest Mycenaean writing dates to about 1400 BC. It is descended from ...
,
Egyptian hieroglyphs Egyptian hieroglyphs (, ) were the formal writing system used in Ancient Egypt, used for writing the Egyptian language. Hieroglyphs combined logographic, syllabic and alphabetic elements, with some 1,000 distinct characters.There were about 1, ...
, and
cuneiform Cuneiform is a logo- syllabic script that was used to write several languages of the Ancient Middle East. The script was in active use from the early Bronze Age until the beginning of the Common Era. It is named for the characteristic wedge- ...
scripts. It also includes English reform orthographies like Shavian and Deseret, and some modern scripts like Osage, Warang Citi, Adlam, Wancho and Toto. Symbols and notations include historic and modern
musical notation Music notation or musical notation is any system used to visually represent aurally perceived music played with instruments or sung by the human voice through the use of written, printed, or otherwise-produced symbols, including notation f ...
; mathematical alphanumerics; shorthands;
Emoji An emoji ( ; plural emoji or emojis) is a pictogram, logogram, ideogram or smiley embedded in text and used in electronic messages and web pages. The primary function of emoji is to fill in emotional cues otherwise missing from typed conv ...
and other pictographic sets; and game symbols for
playing card A playing card is a piece of specially prepared card stock, heavy paper, thin cardboard, plastic-coated paper, cotton-paper blend, or thin plastic that is marked with distinguishing motifs. Often the front (face) and back of each card has a f ...
s,
mahjong Mahjong or mah-jongg (English pronunciation: ) is a tile-based game that was developed in the 19th century in China and has spread throughout the world since the early 20th century. It is commonly played by four players (with some three-pla ...
, and
dominoes Dominoes is a family of tile-based games played with gaming pieces, commonly known as dominoes. Each domino is a rectangular tile, usually with a line dividing its face into two square ''ends''. Each end is marked with a number of spots (also c ...
. , the SMP comprises the following 151 blocks: *
Archaic Greek Archaic Greece was the period in Greek history lasting from circa 800 BC to the second Persian invasion of Greece in 480 BC, following the Greek Dark Ages and succeeded by the Classical period. In the archaic period, Greeks settled across the Me ...
and Other Left-to-right scripts: ** Linear B Syllabary (10000–1007F) ** Linear B Ideograms (10080–100FF) ** Aegean Numbers (10100–1013F) ** Ancient Greek Numbers (10140–1018F) ** Ancient Symbols (10190–101CF) ** Phaistos Disc (101D0–101FF) ** Lycian (10280–1029F) ** Carian (102A0–102DF) **
Coptic Epact Numbers Coptic Epact Numbers is a Unicode block containing old Coptic number forms. These numbers were used in some regions instead of letters of the Coptic alphabet that were used for encoding numbers, as was common in much of the world at the time, ...
(102E0–102FF) ** Old Italic (10300–1032F) **
Gothic Gothic or Gothics may refer to: People and languages *Goths or Gothic people, the ethnonym of a group of East Germanic tribes **Gothic language, an extinct East Germanic language spoken by the Goths **Crimean Gothic, the Gothic language spoken b ...
(10330–1034F) **
Old Permic The Old Permic script ( kv, Важ Перым гижӧм, ), sometimes known by its initial 2 characters as Abur or Anbur, is a "highly idiosyncratic adaptation" of the Cyrillic script once used to write medieval Komi (a member of the Permic bra ...
(10350–1037F) ** Ugaritic (10380–1039F) **
Old Persian Old Persian is one of the two directly attested Old Iranian languages (the other being Avestan) and is the ancestor of Middle Persian (the language of Sasanian Empire). Like other Old Iranian languages, it was known to its native speakers as ( ...
(103A0–103DF) ** Deseret (10400–1044F) ** Shavian (10450–1047F) ** Osmanya (10480–104AF) ** Osage (104B0–104FF) **
Elbasan Elbasan ( ; sq-definite, Elbasani ) is the fourth most populous city of Albania and seat of Elbasan County and Elbasan Municipality. It lies to the north of the river Shkumbin between the Skanderbeg Mountains and the Myzeqe Plain in central ...
(10500–1052F) **
Caucasian Albanian Caucasian Albania is a modern exonym for a former state located in ancient times in the Caucasus: mostly in what is now Azerbaijan (where both of its capitals were located). The modern endonyms for the area are ''Aghwank'' and ''Aluank'', amon ...
(10530–1056F) ** Vithkuqi (10570–105BF) ** Linear A (10600–1077F) ** Latin Extended-F (10780–107BF) * Right-to-left scripts: ** Cypriot Syllabary (10800–1083F) **
Imperial Aramaic Imperial Aramaic is a linguistic term, coined by modern scholars in order to designate a specific historical variety of Aramaic language. The term is polysemic, with two distinctive meanings, wider (sociolinguistic) and narrower (dialectologica ...
(10840–1085F) ** Palmyrene (10860–1087F) **
Nabataean The Nabataeans or Nabateans (; Nabataean Aramaic: , , vocalized as ; Arabic: , , singular , ; compare grc, Ναβαταῖος, translit=Nabataîos; la, Nabataeus) were an ancient Arab people who inhabited northern Arabia and the southern L ...
(10880–108AF) ** Hatran (108E0–108FF) ** Phoenician (10900–1091F) ** Lydian (10920–1093F) ** Meroitic Hieroglyphs (10980–1099F) **
Meroitic Cursive The Meroitic script consists of two alphasyllabic scripts developed to write the Meroitic language at the beginning of the Meroitic Period (3rd century BC) of the Kingdom of Kush. The two scripts are Meroitic Cursive, derived from Demotic Egyp ...
(109A0–109FF) ** Kharoshthi (10A00–10A5F) ** Old South Arabian (10A60–10A7F) ** Old North Arabian (10A80–10A9F) **
Manichaean Manichaeism (; in New Persian ; ) is a former major religionR. van den Broek, Wouter J. Hanegraaff ''Gnosis and Hermeticism from Antiquity to Modern Times''SUNY Press, 1998 p. 37 founded in the 3rd century AD by the Parthian prophet Mani (AD ...
(10AC0–10AFF) **
Avestan Avestan (), or historically Zend, is an umbrella term for two Old Iranian languages: Old Avestan (spoken in the 2nd millennium BCE) and Younger Avestan (spoken in the 1st millennium BCE). They are known only from their conjoined use as the scrip ...
(10B00–10B3F) **
Inscriptional Parthian Inscriptional Parthian is a script used to write Parthian language on coins of Parthia from the time of Arsaces I of Parthia (250 BC). It was also used for inscriptions of Parthian (mostly on clay fragments) and later Sassanian Empire, Sassanian p ...
(10B40–10B5F) **
Inscriptional Pahlavi Inscriptional Pahlavi is the earliest attested form of Pahlavi scripts, and is evident in clay fragments that have been dated to the reign of Mithridates I (''r.'' 171–138 BC). Other early evidence includes the Pahlavi inscriptions of Arsaci ...
(10B60–10B7F) **
Psalter Pahlavi Psalter Pahlavi is a cursive abjad that was used for writing Middle Persian on paper; it is thus described as one of the Pahlavi scripts. It was written right to left, usually with spaces between words. It takes its name from the Pahlavi Psalter ...
(10B80–10BAF) ** Old Turkic (10C00–10C4F) ** Old Hungarian (10C80–10CFF) ** Hanifi Rohingya (10D00–10D3F) **
Rumi Numeral Symbols Rumi Numeral Symbols is a Unicode block containing numeric characters used in Fez, Morocco Fez or Fes (; ar, فاس, fās; zgh, ⴼⵉⵣⴰⵣ, fizaz; french: Fès) is a city in northern inland Morocco and the capital of the Fès-Meknè ...
(10E60–10E7F) **
Yezidi Yazidis or Yezidis (; ku, ئێزیدی, translit=Êzidî) are a Kurmanji-speaking endogamous minority group who are indigenous to Kurdistan, a geographical region in Western Asia that includes parts of Iraq, Syria, Turkey and Iran. The ma ...
(10E80–10EBF) **
Arabic Extended-C Arabic Extended-C is a Unicode block encoding Qur'anic The Quran (, ; Standard Arabic: , Quranic Arabic: , , 'the recitation'), also romanized Qur'an or Koran, is the central religious text of Islam, believed by Muslims to be a revelation ...
(10EC0–10EFF) ** Old Sogdian (10F00–10F2F) ** Sogdian (10F30–10F6F) **
Old Uyghur Old Uyghur () was a Turkic language which was spoken in Qocho from the 9th–14th centuries and in Gansu. History The Old Uyghur language evolved from Old Turkic after the Uyghur Khaganate broke up and remnants of it migrated to Turfan, ...
(10F70–10FAF) ** Chorasmian (10FB0–10FDF) ** Elymaic (10FE0–10FFF) *
Brahmic The Brahmic scripts, also known as Indic scripts, are a family of abugida writing systems. They are used throughout the Indian subcontinent, Southeast Asia and parts of East Asia. They are descended from the Brahmi script of ancient In ...
scripts: **
Brahmi Brahmi (; ; ISO: ''Brāhmī'') is a writing system of ancient South Asia. "Until the late nineteenth century, the script of the Aśokan (non-Kharosthi) inscriptions and its immediate derivatives was referred to by various names such as 'lath' o ...
(11000–1107F) ** Kaithi (11080–110CF) ** Sora Sompeng (110D0–110FF) **
Chakma Chakma may refer to: *Chakma people, a Tibeto-Burman people of Bangladesh and Northeast India *Chakma language, the Indo-Aryan language spoken by them **Chakma script The Chakma Script (''Ajhā pāṭh''), also called Ajhā pāṭh, Ojhapath, O ...
(11100–1114F) **
Mahajani Mahajani is a Laṇḍā scripts, Laṇḍā mercantile script that was historically used in northern India for writing accounts and financial records in Marwari language, Marwari, Hindi and Punjabi language, Punjabi. It is a Brahmic script and ...
(11150–1117F) ** Sharada (11180–111DF) ** Sinhala Archaic Numbers (111E0–111FF) **
Khojki Khojkī, Khojakī, or Khwājā Sindhī ( sd, خوجڪي (Arabic script) खोजकी (Devanagari)), is a script used formerly and almost exclusively by the Khoja community of parts of the Indian subcontinent, including Sindh, Gujarat, and P ...
(11200–1124F) ** Multani (11280–112AF) ** Khudawadi (112B0–112FF) ** Grantha (11300–1137F) **
Newa Newar (), or Newari and known officially in Nepal as Nepal Bhasa, is a Sino-Tibetan language spoken by the Newar people, the indigenous inhabitants of Nepal Mandala, which consists of the Kathmandu Valley and surrounding regions in Nep ...
(11400–1147F) **
Tirhuta The Tirhuta or Maithili script is the primary historical script for the Maithili language, as well as one of the historical scripts for Sanskrit. It is believed to have originated in the 10th century CE. It is very similar to Bengali–Ass ...
(11480–114DF) ** Siddham (11580–115FF) ** Modi (11600–1165F) ** Mongolian Supplement (11660–1167F) **
Takri The Tākri script (Takri (Chamba): ; Takri (Jammu/Dogra): ; sometimes called Tankri ) is an abugida writing system of the Brahmic family of scripts. It is derived from the Sharada script formerly employed for Kashmiri. It is the sister script ...
(11680–116CF) ** Ahom (11700–1174F) **
Dogra The Dogras or Dogra people, are an Indo-Aryan ethno-linguistic group in India and Pakistan consisting of the Dogri language speakers. They live predominantly in the Jammu region of Jammu and Kashmir, and in adjoining areas of Punjab, Himac ...
(11800–1184F) ** Warang Citi (118A0–118FF) ** Dives Akuru (11900–1195F) ** Nandinagari (119A0–119FF) ** Zanabazar Square (11A00–11A4F) ** Soyombo (11A50–11AAF) * Unified Canadian Aboriginal Syllabics Extended-A (11AB0–11ABF) * Brahmic scripts: ** Pau Cin Hau (11AC0–11AFF) ** Devanagari Extended-A (11B00–11B5F) ** Bhaiksuki (11C00–11C6F) ** Marchen (11C70–11CBF) ** Masaram Gondi (11D00–11D5F) ** Gunjala Gondi (11D60–11DAF) **
Makasar Makassar (, mak, ᨆᨀᨔᨑ, Mangkasara’, ) is the capital of the Indonesian province of South Sulawesi. It is the largest city in the region of Eastern Indonesia and the country's fifth-largest urban center after Jakarta, Surabaya, Meda ...
(11EE0–11EFF) ** Kawi (11F00–11F5F) * Lisu Supplement (11FB0–11FBF) * Tamil Supplement (11FC0–11FFF) *
Cuneiform Cuneiform is a logo- syllabic script that was used to write several languages of the Ancient Middle East. The script was in active use from the early Bronze Age until the beginning of the Common Era. It is named for the characteristic wedge- ...
(12000–123FF) * Cuneiform Numbers and Punctuation (12400–1247F) *
Early Dynastic Cuneiform Early Dynastic Cuneiform is the name of a Unicode block of the Supplementary Multilingual Plane (SMP), at U+12480–U+1254F, introduced in version 8.0 (June 2015). It is a supplement to the earlier encoding of the cuneiform script in the t ...
(12480–1254F) * Cypro-Minoan (12F90–12FFF) *
Egyptian Hieroglyphs Egyptian hieroglyphs (, ) were the formal writing system used in Ancient Egypt, used for writing the Egyptian language. Hieroglyphs combined logographic, syllabic and alphabetic elements, with some 1,000 distinct characters.There were about 1, ...
(13000–1342F) * Egyptian Hieroglyph Format Controls (13430–1345F) *
Anatolian Hieroglyphs Anatolian hieroglyphs are an indigenous logographic script native to central Anatolia, consisting of some 500 signs. They were once commonly known as Hittite hieroglyphs, but the language they encode proved to be Luwian, not Hittite, and the te ...
(14400–1467F) *
Bamum Supplement Bamum Supplement is a Unicode block containing the characters of the historic stage A-F of the Bamum script, used for writing the Bamum language Bamum (Shü Pamom "language of the Bamum", or ''Shümom'' "Mum language"), also spelled Bamun or ...
(16800–16A3F) * Mro (16A40–16A6F) *
Tangsa The Tangsa or Tangshang in India and Myanmar (Burma) respectively, is a tribe native to Changlang District of Arunachal Pradesh, parts of Tinsukia District of Assam, in north-eastern India, and across the border in Sagaing Region, parts of ...
(16A70–16ACF) * Bassa Vah (16AD0–16AFF) * Pahawh Hmong (16B00–16B8F) * Medefaidrin (16E40–16E9F) *
Miao Miao may refer to: * Miao people, linguistically and culturally related group of people, recognized as such by the government of the People's Republic of China * Miao script or Pollard script, writing system used for Miao languages * Miao (Unicode ...
(16F00–16F9F) *
Ideographic Symbols and Punctuation Ideographic Symbols and Punctuation is a Unicode block containing symbols and punctuation marks used by ideographic scripts such as Tangut and Nüshu. History The following Unicode-related documents record the purpose and process of definin ...
(16FE0–16FFF) * Tangut (17000–187FF) * Tangut Components (18800–18AFF) *
Khitan Small Script The Khitan small script () was one of two writing systems used for the now-extinct Khitan language (the other was the Khitan large script). It was used during the 10th–12th century by the Khitan people, who had created the Liao Empire in present ...
(18B00–18CFF) * Tangut Supplement (18D00–18D7F) *
Kana Extended-B Kana Extended-B is a Unicode block containing kana originally created by Japanese linguists to write Taiwanese Hokkien known as Taiwanese kana. Block History The following Unicode-related documents record the purpose and process of defining sp ...
(1AFF0–1AFFF) * Kana Supplement (1B000–1B0FF) * Kana Extended-A (1B100–1B12F) *
Small Kana Extension Small Kana Extension is a Unicode block containing additional small variants for the Hiragana and Katakana syllabaries, in addition to those in the Hiragana, Katakana and Katakana Phonetic Extensions blocks. Block History The following Unicode ...
(1B130–1B16F) * Nushu (1B170–1B2FF) * Duployan (1BC00–1BC9F) * Shorthand Format Controls (1BCA0–1BCAF) *
Symbols A symbol is a mark, sign, or word that indicates, signifies, or is understood as representing an idea, object, or relationship. Symbols allow people to go beyond what is known or seen by creating linkages between otherwise very different co ...
: **
Musical notation Music notation or musical notation is any system used to visually represent aurally perceived music played with instruments or sung by the human voice through the use of written, printed, or otherwise-produced symbols, including notation f ...
: ***
Znamenny Musical Notation Znamenny Musical Notation is a Unicode block containing characters for Znamenny musical notation from Russia. Few fonts support this block as of 2021. Ones that do and are free for personal use include '' Symbola'' 14.0 and Slavonic' 1.00 (non- ...
(1CF00–1CFCF) ***
Byzantine Musical Symbols Byzantine Musical Symbols is a Unicode block containing characters for representing Byzantine-era musical notation. Block History The following Unicode-related documents record the purpose and process of defining specific characters in the Byz ...
(1D000–1D0FF) *** Musical Symbols (1D100–1D1FF) ***
Ancient Greek Musical Notation Ancient Greek Musical Notation is a Unicode block containing symbols representing musical notations used in ancient Greece. Some can be used as rotated Greek letters (e.g. Α𝈗 Ε𝈧𝈨 Κ𝈎𝈲 Μ𝈌 Π𝈈 Ρ𝈃𝈇𝈟 Τ𝈜 Υ� ...
(1D200–1D24F) ** Kaktovik Numerals (1D2C0–1D2DF) ** Mayan Numerals (1D2E0–1D2FF) ** Mathematical symbols: *** Tai Xuan Jing Symbols (1D300–1D35F) *** Counting Rod Numerals (1D360–1D37F) ***
Mathematical Alphanumeric Symbols Mathematical Alphanumeric Symbols is a Unicode block comprising styled forms of Latin and Greek letters and decimal digits that enable mathematicians to denote different notions with different letter styles. The letters in various fonts o ...
(1D400–1D7FF) **
Sutton SignWriting Sutton SignWriting, or simply SignWriting, is a system of writing sign languages. It is highly featural and visually iconic, both in the shapes of the characters, which are abstract pictures of the hands, face, and body, and in their spatial arr ...
(1D800–1DAAF) * Latin Extended-G (1DF00–1DFFF) * Glagolitic Supplement (1E000–1E02F) *
Cyrillic Extended-D Cyrillic Extended-D is a Unicode block containing superscript and subscript Cyrillic characters used in Cyrillic-based phonetic transcription. The block contains the first Cyrillic characters defined outside of the Basic Multilingual Plane In ...
(1E030–1E08F) * Nyiakeng Puachue Hmong (1E100–1E14F) * Toto (1E290–1E2BF) * Wancho (1E2C0–1E2FF) * Nag Mundari (1E4D0–1E4FF) * Ethiopic Extended-B (1E7E0–1E7FF) * Mende Kikakui (1E800–1E8DF) * Adlam (1E900–1E95F) *
Symbols A symbol is a mark, sign, or word that indicates, signifies, or is understood as representing an idea, object, or relationship. Symbols allow people to go beyond what is known or seen by creating linkages between otherwise very different co ...
: ** Indic Siyaq Numbers (1EC70–1ECBF) **
Ottoman Siyaq Numbers Ottoman Siyaq Numbers is a Unicode block containing a specialized subset of the Arabic script that was used for accounting in Ottoman Turkish Ottoman Turkish ( ota, لِسانِ عُثمانى, Lisân-ı Osmânî, ; tr, Osmanlı Türkçesi ...
(1ED00–1ED4F) **
Arabic Mathematical Alphabetic Symbols Arabic Mathematical Alphabetic Symbols is a Unicode block encoding characters used in Arabic mathematical expressions. Block History The following Unicode-related documents record the purpose and process of defining specific characters in th ...
(1EE00–1EEFF) ** Game tiles and cards: ***
Mahjong Tiles Mahjong tiles () are tiles of Chinese origin that are used to play mahjong as well as mahjong solitaire and other games. Although they are most commonly tiles, they may refer to playing cards with similar contents as well. Development The ...
(1F000–1F02F) *** Domino Tiles (1F030–1F09F) *** Playing Cards (1F0A0–1F0FF) ** Enclosed Alphanumeric Supplement (1F100–1F1FF) **
Enclosed Ideographic Supplement Enclosed Ideographic Supplement is a Unicode block containing forms of characters and words from Chinese, Japanese and Korean enclosed within or stylised as squares, brackets, or circles. It contains three such characters containing one or more ...
(1F200–1F2FF) ** Miscellaneous Symbols and Pictographs (1F300–1F5FF) **
Emoticons An emoticon (, , rarely , ), short for "emotion icon", also known simply as an emote, is a pictorial representation of a facial expression using characters—usually punctuation marks, numbers, and letters—to express a person's feelings ...
(1F600–1F64F) ** Ornamental Dingbats (1F650–1F67F) **
Transport and Map Symbols Transport and Map Symbols is a Unicode block containing transportation and map icons, largely for compatibility with Japanese telephone carriers' emoji implementations of Shift JIS, and to encode characters in the Wingdings and Wingdings 2 char ...
(1F680–1F6FF) ** Alchemical Symbols (1F700–1F77F) ** Geometric Shapes Extended (1F780–1F7FF) ** Supplemental Arrows-C (1F800–1F8FF) **
Supplemental Symbols and Pictographs Supplemental Symbols and Pictographs is a Unicode block containing emoji characters. It extends the set of symbols included in the Miscellaneous Symbols and Pictographs block. It also includes Typikon symbols. Emoji The Unicode 14.0 Suppleme ...
(1F900–1F9FF) ** Chess Symbols (1FA00–1FA6F) **
Symbols and Pictographs Extended-A Symbols and Pictographs Extended-A is a Unicode block containing emoji characters. It extends the set of symbols included in the Supplemental Symbols and Pictographs block. All of the characters in the Symbols and Pictographs Extended-A block ...
(1FA70–1FAFF) ** Symbols for Legacy Computing (1FB00–1FBFF)


Supplementary Ideographic Plane

Plane 2, the Supplementary Ideographic Plane (SIP), is used for CJK Ideographs, mostly
CJK Unified Ideographs The Chinese, Japanese and Korean (CJK) scripts share a common background, collectively known as CJK characters. In the process called Han unification, the common (shared) characters were identified and named CJK Unified Ideographs. As of Unicode ...
, that were not included in earlier character encoding standards. , the SIP comprises the following six blocks: *
CJK Unified Ideographs Extension B CJK Unified Ideographs Extension B is a Unicode block containing rare and historic CJK ideographs for Chinese, Japanese, Korean, and Vietnamese. The block has dozens of variation sequences defined for standardized variants. It also has thous ...
(20000–2A6DF) *
CJK Unified Ideographs Extension C __FORCETOC__ CJK Unified Ideographs Extension C is a Unicode block containing rare and historic CJK ideographs for Chinese, Japanese, Korean, and Vietnamese. The block has dozens of ideographic variation sequences registered in the Unicode I ...
(2A700–2B73F) *
CJK Unified Ideographs Extension D CJK Unified Ideographs Extension D is a Unicode block containing rare and historic CJK ideographs for Chinese, Japanese, Korean, and Vietnamese. The block has hundreds of ideographic variation sequences registered in the Unicode Ideographic Va ...
(2B740–2B81F) *
CJK Unified Ideographs Extension E CJK Unified Ideographs Extension E is a Unicode block containing rare and historic CJK ideographs for Chinese, Japanese, Korean, and Vietnamese. The block has dozens of ideographic variation sequences registered in the Unicode Ideographic Vari ...
(2B820–2CEAF) *
CJK Unified Ideographs Extension F CJK Unified Ideographs Extension F is a Unicode block containing rare and historic CJK ideographs for Chinese, Japanese, Korean, and Vietnamese, as well as more than a thousand Sawndip characters for writing the Zhuang language. The block has ...
(2CEB0–2EBEF) *
CJK Compatibility Ideographs Supplement CJK Compatibility Ideographs Supplement is a Unicode block containing Han characters used only for roundtrip compatibility mapping with planes 3, 4, 5, 6, 7, and 15 of CNS 11643-1992. Block History The following Unicode-related documents recor ...
(2F800–2FA1F)


Tertiary Ideographic Plane

Plane 3 is the Tertiary Ideographic Plane (TIP).
CJK Unified Ideographs Extension G CJK Unified Ideographs Extension G is a Unicode block containing rare and historic CJK Unified Ideographs for Chinese, Japanese, Korean, and Vietnamese. It is the first block to be allocated to the Tertiary Ideographic Plane. The exotic charac ...
was added to the TIP in Unicode 13.0, released in March 2020. It also is tentatively allocated for
Oracle Bone script Oracle bone script () is an ancient form of Chinese characters that were engraved on oracle bonesanimal bones or turtle plastrons used in pyromantic divination. Oracle bone script was used in the late 2nd millennium BC, and is the earliest k ...
and
Small Seal Script The small seal script (), or Qin script (, ''Qínzhuàn''), is an archaic form of Chinese calligraphy. It was standardized and promulgated as a national standard by the government of Qin Shi Huang, the founder of the Chinese Qin dynasty. Name ...
. , the TIP comprises the following two blocks: *
CJK Unified Ideographs Extension G CJK Unified Ideographs Extension G is a Unicode block containing rare and historic CJK Unified Ideographs for Chinese, Japanese, Korean, and Vietnamese. It is the first block to be allocated to the Tertiary Ideographic Plane. The exotic charac ...
(30000–3134F) *
CJK Unified Ideographs Extension H __FORCETOC__ CJK Unified Ideographs Extension H is a Unicode block containing rare and historic CJK Unified Ideographs The Chinese, Japanese and Korean (CJK) scripts share a common background, collectively known as CJK characters. In the proc ...
(31350–323AF)


Unassigned planes

Planes 4 to 13 (planes to in
hexadecimal In mathematics and computing, the hexadecimal (also base-16 or simply hex) numeral system is a positional numeral system that represents numbers using a radix (base) of 16. Unlike the decimal system representing numbers using 10 symbols, he ...
): No characters have yet been assigned, or proposed for assignment, to Planes 4 through 13.


Supplementary Special-purpose Plane

Plane 14 ( in hexadecimal) is designated as the Supplementary Special-purpose Plane (SSP). It comprises the following two blocks, : * Tags (E0000–E007F) *
Variation Selectors Supplement Variation Selectors Supplement is a Unicode block containing additional Variation Selectors beyond those found in the Variation Selectors Variation Selectors is the block name of a Unicode code point block containing 16 variation selectors. ...
(E0100–E01EF) – used to indicate alternate glyphs for characters.


Private Use Area Planes

The two planes 15 and 16 (planes and in hexadecimal) each contain a " Private Use Area". They contain blocks named Supplementary Private Use Area-A (PUA-A) and -B (PUA-B). The Private Use Areas are available for use by parties outside ISO and Unicode (private character encoding).


References

{{Unicode navigation Plane