HOME

TheInfoList



OR:

In the
Unicode Unicode, formally The Unicode Standard,The formal version reference is is an information technology standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems. The standard, wh ...
standard, a plane is a continuous group of 65,536 (216)
code point In character encoding terminology, a code point, codepoint or code position is a numerical value that maps to a specific character. Code points usually represent a single grapheme—usually a letter, digit, punctuation mark, or whitespace—but ...
s. There are 17 planes, identified by the numbers 0 to 16, which corresponds with the possible values 00–1016 of the first two positions in six position hexadecimal format (U+''hhhhhh''). Plane 0 is the Basic Multilingual Plane (BMP), which contains most commonly used characters. The higher planes 1 through 16 are called "supplementary planes". The last code point in Unicode is the last code point in plane 16, U+10FFFF. As of Unicode version , five of the planes have assigned code points (characters), and seven are named. The limit of 17 planes is due to
UTF-16 UTF-16 (16-bit Unicode Transformation Format) is a character encoding capable of encoding all 1,112,064 valid code points of Unicode (in fact this number of code points is dictated by the design of UTF-16). The encoding is variable-length, as cod ...
, which can encode 220 code points (16 planes) as pairs of
words A word is a basic element of language that carries an objective or practical meaning, can be used on its own, and is uninterruptible. Despite the fact that language speakers often have an intuitive grasp of what a word is, there is no conse ...
, plus the BMP as a single word.
UTF-8 UTF-8 is a variable-length character encoding used for electronic communication. Defined by the Unicode Standard, the name is derived from ''Unicode'' (or ''Universal Coded Character Set'') ''Transformation Format 8-bit''. UTF-8 is capable of ...
was designed with a much larger limit of 231 (2,147,483,648) code points (32,768 planes), and would still be able to encode 221 (2,097,152) code points (32 planes) even under the current limit of 4
byte The byte is a unit of digital information that most commonly consists of eight bits. Historically, the byte was the number of bits used to encode a single character of text in a computer and for this reason it is the smallest addressable uni ...
s. The 17 planes can accommodate 1,114,112 code points. Of these, 2,048 are surrogates (used to make the pairs in UTF-16), 66 are non-characters, and 137,468 are reserved for private use, leaving 974,530 for public assignment. Planes are further subdivided into
Unicode block A Unicode block is one of several contiguous ranges of numeric character codes ( code points) of the Unicode character set that are defined by the Unicode Consortium for administrative and documentation purposes. Typically, proposals such as the ...
s, which, unlike planes, do not have a fixed size. The 327 blocks defined in Unicode cover 26% of the possible code point space, and range in size from a minimum of 16 code points (sixteen blocks) to a maximum of 65,536 code points (Supplementary Private Use Area-A and -B, which constitute the entirety of planes 15 and 16). For future usage, ranges of characters have been tentatively mapped out for most known current and ancient writing systems.


Overview


Assigned characters


Basic Multilingual Plane

The first plane, plane 0, the Basic Multilingual Plane (BMP) contains characters for almost all modern languages, and a large number of symbols. A primary objective for the BMP is to support the unification of prior character sets as well as characters for
writing Writing is a medium of human communication which involves the representation of a language through a system of physically inscribed, mechanically transferred, or digitally represented symbols. Writing systems do not themselves constitute h ...
. Most of the assigned code points in the BMP are used to encode Chinese, Japanese, and Korean ( CJK) characters. The High Surrogate (U+D800–U+DBFF) and Low Surrogate (U+DC00–U+DFFF) codes are reserved for encoding non-BMP characters in UTF-16 by using a ''pair'' of 16-
bit The bit is the most basic unit of information in computing and digital communications. The name is a portmanteau of binary digit. The bit represents a logical state with one of two possible values. These values are most commonly represente ...
codes: one High Surrogate and one Low Surrogate. A single surrogate code point will never be assigned a character. 65,520 of the 65,536 code points in this plane have been allocated to a
Unicode block A Unicode block is one of several contiguous ranges of numeric character codes ( code points) of the Unicode character set that are defined by the Unicode Consortium for administrative and documentation purposes. Typically, proposals such as the ...
, leaving just 16 code points in a single unallocated range (2FE0..2FEF). , the BMP comprises the following 164 blocks: * Basic Latin (Lower half of ISO/IEC 8859-1: ISO/IEC 646:1991-IRV aka
ASCII ASCII ( ), abbreviated from American Standard Code for Information Interchange, is a character encoding standard for electronic communication. ASCII codes represent text in computers, telecommunications equipment, and other devices. Because ...
) (0000–007F) *
Latin-1 Supplement The Latin-1 Supplement (also called C1 Controls and Latin-1 Supplement) is the second Unicode block in the Unicode standard. It encodes the upper range of ISO 8859-1: 80 (U+0080) - FF (U+00FF). C1 Controls (0080–009F) are not graphic. Thi ...
(Upper half of ISO/IEC 8859-1) (0080–00FF) *
Latin Extended-A Latin Extended-A is a Unicode block and is the third block of the Unicode standard. It encodes Latin letters from the Latin ISO character sets other than Latin-1 (which is already encoded in the Latin-1 Supplement block) and also legacy characte ...
(0100–017F) *
Latin Extended-B Latin Extended-B is the fourth block (0180-024F) of the Unicode Standard. It has been included since version 1.0, where it was only allocated to the code points 0180-01FF and contained 113 characters. During unification with ISO 10646 for version ...
(0180–024F) *
IPA Extensions IPA Extensions is a block (U+0250–U+02AF) of the Unicode standard that contains full size letters used in the International Phonetic Alphabet (IPA). Both modern and historical characters are included, as well as former and proposed IPA signs ...
(0250–02AF) *
Spacing Modifier Letters Spacing Modifier Letters is a Unicode block containing characters for the IPA IPA commonly refers to: * India pale ale, a style of beer * International Phonetic Alphabet, a system of phonetic notation * Isopropyl alcohol, a chemical compound ...
(02B0–02FF) * Combining Diacritical Marks (0300–036F) *
Greek and Coptic Greek and Coptic is the Unicode block for representing modern (monotonic) Greek. It was originally used for writing Coptic, using the similar Greek letters, in addition to the uniquely Coptic additions. Beginning with version 4.1 of the Unicode ...
(0370–03FF) * Cyrillic (0400–04FF) * Cyrillic Supplement (0500–052F) *
Armenian Armenian may refer to: * Something of, from, or related to Armenia, a country in the South Caucasus region of Eurasia * Armenians, the national people of Armenia, or people of Armenian descent ** Armenian Diaspora, Armenian communities across the ...
(0530–058F) *
Aramaic The Aramaic languages, short Aramaic ( syc, ܐܪܡܝܐ, Arāmāyā; oar, 𐤀𐤓𐤌𐤉𐤀; arc, 𐡀𐡓𐡌𐡉𐡀; tmr, אֲרָמִית), are a language family containing many varieties (languages and dialects) that originated in ...
Scripts: **
Hebrew Hebrew (; ; ) is a Northwest Semitic language of the Afroasiatic language family. Historically, it is one of the spoken languages of the Israelites and their longest-surviving descendants, the Jews and Samaritans. It was largely preserved ...
(0590–05FF) **
Arabic Arabic (, ' ; , ' or ) is a Semitic language spoken primarily across the Arab world.Semitic languages: an international handbook / edited by Stefan Weninger; in collaboration with Geoffrey Khan, Michael P. Streck, Janet C. E.Watson; Walter ...
(0600–06FF) **
Syriac Syriac may refer to: *Syriac language, an ancient dialect of Middle Aramaic *Sureth, one of the modern dialects of Syriac spoken in the Nineveh Plains region * Syriac alphabet ** Syriac (Unicode block) ** Syriac Supplement * Neo-Aramaic languages a ...
(0700–074F) ** Arabic Supplement (0750–077F) **
Thaana Thaana, Taana or Tāna (  ) is the present writing system of the Maldivian language spoken in the Maldives. Thaana has characteristics of both an abugida (diacritic, vowel-killer strokes) and a true alphabet (all vowels are written), ...
(0780–07BF) **
N'Ko N'Ko () is a script devised by Solomana Kante in 1949, as a modern writing system for the Mandé languages of West Africa. The term ''N'Ko'', which means ''I say'' in all Mandé languages, is also used for the Mandé literary standard written i ...
(07C0–07FF) ** Samaritan (0800–083F) ** Mandaic (0840–085F) **
Syriac Supplement Syriac Supplement is a Unicode block containing supplementary Syriac letters used for writing the Suriyani Malayalam Suriyani Malayalam (സുറിയാനി മലയാളം, ܣܘܪܝܢܝ ܡܠܝܠܡ), also known as Karshoni, Syro-Mala ...
(0860–086F) **
Arabic Extended-B Arabic Extended-B is a Unicode block A Unicode block is one of several contiguous ranges of numeric character codes ( code points) of the Unicode character set that are defined by the Unicode Consortium for administrative and documentation purpo ...
(0870–089F) **
Arabic Extended-A Arabic Extended-A is a Unicode block encoding Qur'anic The Quran (, ; Standard Arabic: , Quranic Arabic: , , 'the recitation'), also romanized Qur'an or Koran, is the central religious text of Islam, believed by Muslims to be a revelation ...
(08A0–08FF) *
Brahmic The Brahmic scripts, also known as Indic scripts, are a family of abugida writing systems. They are used throughout the Indian subcontinent, Southeast Asia and parts of East Asia. They are descended from the Brahmi script of ancient India ...
scripts: **
Devanagari Devanagari ( ; , , Sanskrit pronunciation: ), also called Nagari (),Kathleen Kuiper (2010), The Culture of India, New York: The Rosen Publishing Group, , page 83 is a left-to-right abugida (a type of segmental writing system), based on the ...
(0900–097F) **
Bengali Bengali or Bengalee, or Bengalese may refer to: *something of, from, or related to Bengal, a large region in South Asia * Bengalis, an ethnic and linguistic group of the region * Bengali language, the language they speak ** Bengali alphabet, the w ...
(0980–09FF) **
Gurmukhi Gurmukhī ( pa, ਗੁਰਮੁਖੀ, , Shahmukhi: ) is an abugida developed from the Laṇḍā scripts, standardized and used by the second Sikh guru, Guru Angad (1504–1552). It is used by Punjabi Sikhs to write the language, commonly ...
(0A00–0A7F) **
Gujarati Gujarati may refer to: * something of, from, or related to Gujarat, a state of India * Gujarati people, the major ethnic group of Gujarat * Gujarati language, the Indo-Aryan language spoken by them * Gujarati languages, the Western Indo-Aryan sub ...
(0A80–0AFF) ** Oriya (0B00–0B7F) **
Tamil Tamil may refer to: * Tamils, an ethnic group native to India and some other parts of Asia **Sri Lankan Tamils, Tamil people native to Sri Lanka also called ilankai tamils **Tamil Malaysians, Tamil people native to Malaysia * Tamil language, nativ ...
(0B80–0BFF) ** Telugu (0C00–0C7F) **
Kannada Kannada (; ಕನ್ನಡ, ), originally romanised Canarese, is a Dravidian language spoken predominantly by the people of Karnataka in southwestern India, with minorities in all neighbouring states. It has around 47 million native s ...
(0C80–0CFF) **
Malayalam Malayalam (; , ) is a Dravidian languages, Dravidian language spoken in the Indian state of Kerala and the union territories of Lakshadweep and Puducherry (union territory), Puducherry (Mahé district) by the Malayali people. It is one of 2 ...
(0D00–0D7F) ** Sinhala (0D80–0DFF) ** Thai (0E00–0E7F) ** Lao (0E80–0EFF) ** Tibetan (0F00–0FFF) **
Myanmar Myanmar, ; UK pronunciations: US pronunciations incl. . Note: Wikipedia's IPA conventions require indicating /r/ even in British English although only some British English speakers pronounce r at the end of syllables. As John Wells explai ...
(1000–109F) *
Georgian Georgian may refer to: Common meanings * Anything related to, or originating from Georgia (country) ** Georgians, an indigenous Caucasian ethnic group ** Georgian language, a Kartvelian language spoken by Georgians **Georgian scripts, three scrip ...
(10A0–10FF) * Hangul Jamo (1100–11FF) * Ethiopic (1200–137F) * Ethiopic Supplement (1380–139F) *
Cherokee The Cherokee (; chr, ᎠᏂᏴᏫᏯᎢ, translit=Aniyvwiyaʔi or Anigiduwagi, or chr, ᏣᎳᎩ, links=no, translit=Tsalagi) are one of the indigenous peoples of the Southeastern Woodlands of the United States. Prior to the 18th century, t ...
(13A0–13FF) * Unified Canadian Aboriginal Syllabics (1400–167F) *
Ogham Ogham ( Modern Irish: ; mga, ogum, ogom, later mga, ogam, label=none ) is an Early Medieval alphabet used primarily to write the early Irish language (in the "orthodox" inscriptions, 4th to 6th centuries AD), and later the Old Irish langu ...
(1680–169F) *
Runic Runes are the letters in a set of related alphabets known as runic alphabets native to the Germanic peoples. Runes were used to write various Germanic languages (with some exceptions) before they adopted the Latin alphabet, and for specialised ...
(16A0–16FF) *
Philippine The Philippines (; fil, Pilipinas, links=no), officially the Republic of the Philippines ( fil, Republika ng Pilipinas, links=no), * bik, Republika kan Filipinas * ceb, Republika sa Pilipinas * cbk, República de Filipinas * hil, Republ ...
scripts: ** Tagalog (1700–171F) ** Hanunoo (1720–173F) ** Buhid (1740–175F) ** Tagbanwa (1760–177F) * Khmer (1780–17FF) * Mongolian (1800–18AF) *
Unified Canadian Aboriginal Syllabics Extended Unified Canadian Aboriginal Syllabics Extended is a Unicode block containing extensions to the Canadian syllabics contained in the Unified Canadian Aboriginal Syllabics Unicode block for some dialects of Cree, Ojibwe, Dene The Dene people ...
(18B0–18FF) *
Brahmic The Brahmic scripts, also known as Indic scripts, are a family of abugida writing systems. They are used throughout the Indian subcontinent, Southeast Asia and parts of East Asia. They are descended from the Brahmi script of ancient India ...
scripts: ** Limbu (1900–194F) *
Tai Tai or TAI may refer to: Arts and entertainment *Tai (comics) a fictional Marvel Comics supervillain *Tai Fraiser, a fictional character in the 1995 film ''Clueless'' *Tai Kamiya, a fictional character in ''Digimon'' Businesses and organisations ...
scripts: ** Tai Le (1950–197F) ** New Tai Lue (1980–19DF) **
Khmer Symbols Khmer Symbols is a Unicode block A Unicode block is one of several contiguous ranges of numeric character codes ( code points) of the Unicode character set that are defined by the Unicode Consortium for administrative and documentation purpose ...
(19E0–19FF) ** Buginese (1A00–1A1F) ** Tai Tham (1A20–1AAF) *
Combining Diacritical Marks Extended Combining Diacritical Marks Extended is a Unicode block containing diacritical marks used in German dialectology (Teuthonista Teuthonista is a phonetic transcription system used predominantly for the transcription of (High) German dialects. I ...
(1AB0–1AFF) * Indonesian scripts: ** Balinese (1B00–1B7F) ** Sundanese (1B80–1BBF) ** Batak (1BC0–1BFF) * Lepcha (1C00–1C4F) * Ol Chiki (1C50–1C7F) * Cyrillic Extended-C (1C80–1C8F) * Georgian Extended (1C90–1CBF) * Sundanese Supplement (1CC0–1CCF) *
Vedic Extensions Vedic Extensions is a Unicode block containing characters for representing tones and other vedic symbols in Devanagari and other Indic scripts. Related symbols (also used in many scripts to represent vedic accents) are defined in two other blocks ...
(1CD0–1CFF) * Latin supplements: **
Phonetic Extensions Phonetic Extensions is a Unicode block containing phonetic characters used in the Uralic Phonetic Alphabet, Old Irish phonetic notation, the Oxford English dictionary and American dictionaries, and Americanist and Russianist phonetic notations. ...
(1D00–1D7F) **
Phonetic Extensions Supplement Phonetic Extensions Supplement is a Unicode block containing characters for specialized and deprecated forms of the International Phonetic Alphabet The International Phonetic Alphabet (IPA) is an alphabetic system of phonetic notation ...
(1D80–1DBF) **
Combining Diacritical Marks Supplement Combining Diacritical Marks Supplement is a Unicode block containing combining characters for the Uralic Phonetic Alphabet, Medievalist notations, and German dialectology (Teuthonista). It is an extension of the diacritic characters found in the ...
(1DC0–1DFF) **
Latin Extended Additional Latin Extended Additional is a Unicode block. The characters in this block are mostly precomposed combinations of Latin letters with one or more general diacritical marks. Ninety of the characters are used in the Vietnamese alphabet. There are a ...
(1E00–1EFF) *
Greek Extended Greek Extended is a Unicode block containing the accented vowels necessary for writing polytonic Greek. The regular, unaccented Greek characters as well as the characters with tonos and diaeresis can be found in the Greek and Coptic block. Gre ...
(1F00–1FFF) * Symbols: **
General Punctuation General Punctuation is a Unicode block containing punctuation, spacing, and formatting characters for use with all scripts and writing systems. Included are the defined-width spaces, joining formats, directional formats, smart quotes, archaic a ...
(2000–206F) **
Superscripts and Subscripts Superscripts and Subscripts is a Unicode block containing superscript and subscript numerals, mathematical operators, and letters used in mathematics and phonetics. The use of subscripts and superscripts in Unicode allows any polynomial, chemic ...
(2070–209F) **
Currency Symbols A currency symbol or currency sign is a graphic symbol used to denote a currency unit. Usually it is defined by the monetary authority, like the national central bank for the currency concerned. In formatting, the symbol can use various format ...
(20A0–20CF) **
Combining Diacritical Marks for Symbols Combining may refer to: * Combine harvester use in agriculture * Combining capacity, in chemistry * Combining character, in digital typography * Combining form, in linguistics * Combining grapheme joiner, Unicode character that has no visible gly ...
(20D0–20FF) **
Letterlike Symbols Letterlike Symbols is a Unicode block containing 80 characters which are constructed mainly from the glyphs of one or more letters. In addition to this block, Unicode includes full styled mathematical alphabets, although Unicode does not expl ...
(2100–214F) ** Number Forms (2150–218F) ** Arrows (2190–21FF) **
Mathematical Operators Mathematical Operators is a Unicode block containing characters for mathematical, logical, and set notation. Notably absent are the plus sign (+), greater than sign (>) and less than sign (<), due to them already appearing in the Basi ...
(2200–22FF) **
Miscellaneous Technical Miscellaneous Technical is a Unicode block ranging from U+2300 to U+23FF, which contains various common symbols which are related to and used in the various technical, programming language, and academic professions. For example: * Symbol ⌂ (H ...
(2300–23FF) **
Control Pictures Control Pictures is a Unicode block A Unicode block is one of several contiguous ranges of numeric character codes ( code points) of the Unicode character set that are defined by the Unicode Consortium for administrative and documentation purpos ...
(2400–243F) **
Optical Character Recognition Optical character recognition or optical character reader (OCR) is the electronic or mechanical conversion of images of typed, handwritten or printed text into machine-encoded text, whether from a scanned document, a photo of a document, a sc ...
(2440–245F) **
Enclosed Alphanumerics Enclosed Alphanumerics is a Unicode block of typographical symbols of an alphanumeric within a circle, a bracket or other not-closed enclosure, or ending in a full stop. It is currently fully allocated. Within the Basic Multilingual Plane, ...
(2460–24FF) ** Box Drawing (2500–257F) **
Block Elements Block Elements is a Unicode block containing square block symbols of various fill and shading. Used along with block elements are box-drawing characters, shade characters, and terminal graphic characters. These can be used for filling regions of th ...
(2580–259F) **
Geometric Shapes Geometric Shapes is a Unicode block of 96 symbols at code point range U+25A0–25FF. U+25A0–U+25CF The BLACK CIRCLE is displayed when typing in a password field, in order to hide characters from a screen recorder or shoulder surfing. U+2 ...
(25A0–25FF) **
Miscellaneous Symbols Miscellaneous Symbols is a Unicode block (U+2600–U+26FF) containing glyphs representing concepts from a variety of categories: astrological, astronomical, chess, dice, musical notation, political symbols, recycling, religious symbols, trigr ...
(2600–26FF) ** Dingbats (2700–27BF) **
Miscellaneous Mathematical Symbols-A Miscellaneous Mathematical Symbols-A is a Unicode block A Unicode block is one of several contiguous ranges of numeric character codes ( code points) of the Unicode character set that are defined by the Unicode Consortium for administrative and ...
(27C0–27EF) ** Supplemental Arrows-A (27F0–27FF) **
Braille Patterns The Unicode block Braille Patterns (U+2800..U+28FF) contains all 256 possible patterns of an 8-dot braille cell, thereby including the complete 6-dot cell range.
(2800–28FF) ** Supplemental Arrows-B (2900–297F) **
Miscellaneous Mathematical Symbols-B Miscellaneous Mathematical Symbols-B is a Unicode block containing miscellaneous mathematical symbols, including brackets, angles, and circle symbols. Block Some of these symbols are used in Z notation. Specifically * * * * * * The last two ...
(2980–29FF) **
Supplemental Mathematical Operators Supplemental Mathematical Operators is a Unicode block containing various mathematical symbols, including N-ary operators, summations and integrals, intersections and unions, logical and relational operators, and subset/superset relations. Block ...
(2A00–2AFF) **
Miscellaneous Symbols and Arrows Miscellaneous Symbols and Arrows is a Unicode block A Unicode block is one of several contiguous ranges of numeric character codes ( code points) of the Unicode character set that are defined by the Unicode Consortium for administrative and ...
(2B00–2BFF) *
Glagolitic The Glagolitic script (, , ''glagolitsa'') is the oldest known Slavic alphabet. It is generally agreed to have been created in the 9th century by Saint Cyril, a monk from Thessalonica. He and his brother Saint Methodius were sent by the Byzan ...
(2C00–2C5F) *
Latin Extended-C Latin Extended-C is a Unicode block containing Latin characters for Uighur New Script, the Uralic Phonetic Alphabet The Uralic Phonetic Alphabet (UPA) or Finno-Ugric transcription system is a phonetic transcription or notational system used ...
(2C60–2C7F) * Coptic (2C80–2CFF) *
Georgian Supplement Georgian Supplement is a Unicode block containing characters for the ecclesiastical form of the Georgian script, Nuskhuri ( ka, ნუსხური). To write the full ecclesiastical Khutsuri orthography, the Asomtavruli The Georgian scrip ...
(2D00–2D2F) *
Tifinagh Tifinagh ( Tuareg Berber language: or , ) is a script used to write the Berber languages. Tifinagh is descended from the ancient Libyco-Berber alphabet. The traditional Tifinagh, sometimes called Tuareg Tifinagh, is still favored by the Tuar ...
(2D30–2D7F) * Ethiopic Extended (2D80–2DDF) * Cyrillic Extended-A (2DE0–2DFF) * Supplemental Punctuation (2E00–2E7F) * CJK scripts and symbols: ** CJK Radicals Supplement (2E80–2EFF) **
Kangxi Radicals The 214 Kangxi radicals (), also known as the Zihui radicals, form a system of radicals () of Chinese characters. The radicals are numbered in stroke count order. They are the most popular system of radicals for dictionaries that order Traditio ...
(2F00–2FDF) ** Ideographic Description Characters (2FF0–2FFF) **
CJK Symbols and Punctuation CJK Symbols and Punctuation is a Unicode block containing symbols and punctuation used for writing the Chinese, Japanese and Korean languages. It also contains one Chinese character. Block The block has variation sequences defined for East ...
(3000–303F) **
Hiragana is a Japanese syllabary, part of the Japanese writing system, along with ''katakana'' as well as ''kanji''. It is a phonetic lettering system. The word ''hiragana'' literally means "flowing" or "simple" kana ("simple" originally as contrast ...
(3040–309F) **
Katakana is a Japanese syllabary, one component of the Japanese writing system along with hiragana, kanji and in some cases the Latin script (known as rōmaji). The word ''katakana'' means "fragmentary kana", as the katakana characters are derived f ...
(30A0–30FF) ** Bopomofo (3100–312F) **
Hangul Compatibility Jamo Hangul Compatibility Jamo is a Unicode block containing Hangul characters for compatibility with the South Korean national standard KS X 1001 KS X 1001, "''Code for Information Interchange (Hangul and Hanja)''", formerly called KS C 5601, ...
(3130–318F) **
Kanbun A is a form of Classical Chinese used in Japan from the Nara period to the mid-20th century. Much of Japanese literature was written in this style and it was the general writing style for official and intellectual works throughout the period. ...
(3190–319F) ** Bopomofo Extended (31A0–31BF) **
CJK Strokes CJK strokes () are the calligraphic strokes needed to write the Chinese characters in regular script used in East Asian calligraphy. CJK strokes are the classified set of line patterns that may be arranged and combined to form Chinese charact ...
(31C0–31EF) **
Katakana Phonetic Extensions Katakana Phonetic Extensions is a Unicode block containing additional small katakana characters for writing the Ainu language, in addition to characters in the Katakana is a Japanese syllabary, one component of the Japanese writing system ...
(31F0–31FF) **
Enclosed CJK Letters and Months Enclosed CJK Letters and Months is a Unicode block containing circled and parenthesized Katakana, Hangul, and CJK ideographs. Also included in the block are miscellaneous glyphs that would more likely fit in CJK Compatibility or Enclosed Alpha ...
(3200–32FF) **
CJK Compatibility CJK Compatibility is a Unicode block containing square symbols (both CJK and Latin alphanumeric) encoded for compatibility with East Asian character sets. In Unicode 1.0, it was divided into two blocks, named CJK Squared Words (U+3300–U+337F) ...
(3300–33FF) ** CJK Unified Ideographs Extension A (3400–4DBF) ** Yijing Hexagram Symbols (Unicode block), Yijing Hexagram Symbols (4DC0–4DFF) ** CJK Unified Ideographs (Unicode block), CJK Unified Ideographs (4E00–9FFF) * Yi Syllables (Unicode block), Yi Syllables (A000–A48F) * Yi Radicals (Unicode block), Yi Radicals (A490–A4CF) * Lisu (Unicode block), Lisu (A4D0–A4FF) * Vai (Unicode block), Vai (A500–A63F) * Cyrillic Extended-B (A640–A69F) * Bamum (Unicode block), Bamum (A6A0–A6FF) * Modifier Tone Letters (A700–A71F) * Latin Extended-D (A720–A7FF) *
Brahmic The Brahmic scripts, also known as Indic scripts, are a family of abugida writing systems. They are used throughout the Indian subcontinent, Southeast Asia and parts of East Asia. They are descended from the Brahmi script of ancient India ...
scripts: ** Syloti Nagri (Unicode block), Syloti Nagri (A800–A82F) ** Common Indic Number Forms (A830–A83F) ** Phags-pa (Unicode block), Phags-pa (A840–A87F) ** Saurashtra (Unicode block), Saurashtra (A880–A8DF) ** Devanagari Extended (A8E0–A8FF) ** Kayah Li (Unicode block), Kayah Li (A900–A92F) ** Rejang (Unicode block), Rejang (A930–A95F) * Hangul Jamo Extended-A (A960–A97F) *
Brahmic The Brahmic scripts, also known as Indic scripts, are a family of abugida writing systems. They are used throughout the Indian subcontinent, Southeast Asia and parts of East Asia. They are descended from the Brahmi script of ancient India ...
scripts: ** Javanese (Unicode block), Javanese (A980–A9DF) ** Myanmar Extended-B (A9E0–A9FF) ** Cham (Unicode block), Cham (AA00–AA5F) ** Myanmar Extended-A (AA60–AA7F) ** Tai Viet (Unicode block), Tai Viet (AA80–AADF) ** Meetei Mayek Extensions (AAE0–AAFF) * Ethiopic Extended-A (AB00–AB2F) * Latin Extended-E (AB30–AB6F) * Cherokee Supplement (AB70–ABBF) * Meetei Mayek (Unicode block), Meetei Mayek (ABC0–ABFF) * Hangul Syllables (Unicode block), Hangul Syllables (AC00–D7AF) * Hangul Jamo Extended-B (D7B0–D7FF) * Universal Character Set characters#Surrogates, Surrogates: ** High Surrogates (Unicode block), High Surrogates (D800–DB7F) ** High Private Use Surrogates (Unicode block), High Private Use Surrogates (DB80–DBFF) ** Low Surrogates (Unicode block), Low Surrogates (DC00–DFFF) * Private Use Areas, Private Use Area (E000–F8FF) * CJK Compatibility Ideographs (F900–FAFF) * Alphabetic Presentation Forms (FB00–FB4F) * Arabic Presentation Forms-A (FB50–FDFF) * Variation Selectors (Unicode block), Variation Selectors (FE00–FE0F) * Vertical Forms (FE10–FE1F) * Combining Half Marks (FE20–FE2F) * CJK Compatibility Forms (FE30–FE4F) * Small Form Variants (FE50–FE6F) * Arabic Presentation Forms-B (FE70–FEFF) * Halfwidth and fullwidth forms (Unicode block), Halfwidth and Fullwidth Forms (FF00–FFEF) * Specials (Unicode block), Specials (FFF0–FFFF)


Supplementary Multilingual Plane

Plane 1, the Supplementary Multilingual Plane (SMP), contains historic scripts (except CJK ideographic), and symbols and notation used within certain fields. Scripts include Linear B, Egyptian hieroglyphs, and cuneiform scripts. It also includes English reform orthographies like Shavian and Deseret alphabet, Deseret, and some modern scripts like Osage alphabet, Osage, Warang Citi, Adlam alphabet, Adlam, Wancho language, Wancho and Toto language, Toto. Symbols and notations include historic and modern musical notation; Mathematical alphanumeric symbols, mathematical alphanumerics; shorthands; Emoji and other pictographic sets; and game symbols for playing cards, mahjong, and dominoes. , the SMP comprises the following 151 blocks: * Archaic Greek and Other Left-to-right scripts: ** Linear B Syllabary (Unicode block), Linear B Syllabary (10000–1007F) ** Linear B Ideograms (Unicode block), Linear B Ideograms (10080–100FF) ** Aegean Numbers (Unicode block), Aegean Numbers (10100–1013F) ** Ancient Greek Numbers (Unicode block), Ancient Greek Numbers (10140–1018F) ** Ancient Symbols (Unicode block), Ancient Symbols (10190–101CF) ** Phaistos Disc (Unicode block), Phaistos Disc (101D0–101FF) ** Lycian (Unicode block), Lycian (10280–1029F) ** Carian (Unicode block), Carian (102A0–102DF) ** Coptic Epact Numbers (Unicode block), Coptic Epact Numbers (102E0–102FF) ** Old Italic (Unicode block), Old Italic (10300–1032F) ** Gothic (Unicode block), Gothic (10330–1034F) ** Old Permic (Unicode block), Old Permic (10350–1037F) ** Ugaritic (Unicode block), Ugaritic (10380–1039F) ** Old Persian (Unicode block), Old Persian (103A0–103DF) ** Deseret (Unicode block), Deseret (10400–1044F) ** Shavian (Unicode block), Shavian (10450–1047F) ** Osmanya (Unicode block), Osmanya (10480–104AF) ** Osage (Unicode block), Osage (104B0–104FF) ** Elbasan (Unicode block), Elbasan (10500–1052F) ** Caucasian Albanian (Unicode block), Caucasian Albanian (10530–1056F) ** Vithkuqi (Unicode block), Vithkuqi (10570–105BF) ** Linear A (Unicode block), Linear A (10600–1077F) ** Latin Extended-F (10780–107BF) * Right-to-left scripts: ** Cypriot Syllabary (Unicode block), Cypriot Syllabary (10800–1083F) ** Imperial Aramaic (Unicode block), Imperial Aramaic (10840–1085F) ** Palmyrene (Unicode block), Palmyrene (10860–1087F) ** Nabataean (Unicode block), Nabataean (10880–108AF) ** Hatran (Unicode block), Hatran (108E0–108FF) ** Phoenician (Unicode block), Phoenician (10900–1091F) ** Lydian (Unicode block), Lydian (10920–1093F) ** Meroitic Hieroglyphs (Unicode block), Meroitic Hieroglyphs (10980–1099F) ** Meroitic Cursive (Unicode block), Meroitic Cursive (109A0–109FF) ** Kharoshthi (Unicode block), Kharoshthi (10A00–10A5F) ** Old South Arabian (Unicode block), Old South Arabian (10A60–10A7F) ** Old North Arabian (Unicode block), Old North Arabian (10A80–10A9F) ** Manichaean (Unicode block), Manichaean (10AC0–10AFF) ** Avestan (Unicode block), Avestan (10B00–10B3F) ** Inscriptional Parthian (Unicode block), Inscriptional Parthian (10B40–10B5F) ** Inscriptional Pahlavi (Unicode block), Inscriptional Pahlavi (10B60–10B7F) ** Psalter Pahlavi (Unicode block), Psalter Pahlavi (10B80–10BAF) ** Old Turkic (Unicode block), Old Turkic (10C00–10C4F) ** Old Hungarian (Unicode block), Old Hungarian (10C80–10CFF) ** Hanifi Rohingya (Unicode block), Hanifi Rohingya (10D00–10D3F) ** Rumi Numeral Symbols (Unicode block), Rumi Numeral Symbols (10E60–10E7F) ** Yezidi (Unicode block), Yezidi (10E80–10EBF) ** Arabic Extended-C (10EC0–10EFF) ** Old Sogdian (Unicode block), Old Sogdian (10F00–10F2F) ** Sogdian (Unicode block), Sogdian (10F30–10F6F) ** Old Uyghur (Unicode block), Old Uyghur (10F70–10FAF) ** Chorasmian (Unicode block), Chorasmian (10FB0–10FDF) ** Elymaic (Unicode block), Elymaic (10FE0–10FFF) *
Brahmic The Brahmic scripts, also known as Indic scripts, are a family of abugida writing systems. They are used throughout the Indian subcontinent, Southeast Asia and parts of East Asia. They are descended from the Brahmi script of ancient India ...
scripts: ** Brahmi (Unicode block), Brahmi (11000–1107F) ** Kaithi (Unicode block), Kaithi (11080–110CF) ** Sora Sompeng (Unicode block), Sora Sompeng (110D0–110FF) ** Chakma (Unicode block), Chakma (11100–1114F) ** Mahajani (Unicode block), Mahajani (11150–1117F) ** Sharada (Unicode block), Sharada (11180–111DF) ** Sinhala Archaic Numbers (Unicode block), Sinhala Archaic Numbers (111E0–111FF) ** Khojki (Unicode block), Khojki (11200–1124F) ** Multani (Unicode block), Multani (11280–112AF) ** Khudawadi (Unicode block), Khudawadi (112B0–112FF) ** Grantha (Unicode block), Grantha (11300–1137F) ** Newa (Unicode block), Newa (11400–1147F) ** Tirhuta (Unicode block), Tirhuta (11480–114DF) ** Siddham (Unicode block), Siddham (11580–115FF) ** Modi (Unicode block), Modi (11600–1165F) ** Mongolian Supplement (Unicode block), Mongolian Supplement (11660–1167F) ** Takri (Unicode block), Takri (11680–116CF) ** Ahom (Unicode block), Ahom (11700–1174F) ** Dogra (Unicode block), Dogra (11800–1184F) ** Warang Citi (Unicode block), Warang Citi (118A0–118FF) ** Dives Akuru (Unicode block), Dives Akuru (11900–1195F) ** Nandinagari (Unicode block), Nandinagari (119A0–119FF) ** Zanabazar Square (Unicode block), Zanabazar Square (11A00–11A4F) ** Soyombo (Unicode block), Soyombo (11A50–11AAF) * Unified Canadian Aboriginal Syllabics Extended-A (11AB0–11ABF) * Brahmic scripts: ** Pau Cin Hau (Unicode block), Pau Cin Hau (11AC0–11AFF) ** Devanagari Extended-A (11B00–11B5F) ** Bhaiksuki (Unicode block), Bhaiksuki (11C00–11C6F) ** Marchen (Unicode block), Marchen (11C70–11CBF) ** Masaram Gondi (Unicode block), Masaram Gondi (11D00–11D5F) ** Gunjala Gondi (Unicode block), Gunjala Gondi (11D60–11DAF) ** Makasar (Unicode block), Makasar (11EE0–11EFF) ** Kawi (Unicode block), Kawi (11F00–11F5F) * Lisu Supplement (11FB0–11FBF) * Tamil Supplement (11FC0–11FFF) * Cuneiform (Unicode block), Cuneiform (12000–123FF) * Cuneiform Numbers and Punctuation (Unicode block), Cuneiform Numbers and Punctuation (12400–1247F) * Early Dynastic Cuneiform (Unicode block), Early Dynastic Cuneiform (12480–1254F) * Cypro-Minoan (Unicode block), Cypro-Minoan (12F90–12FFF) * Egyptian Hieroglyphs (Unicode block), Egyptian Hieroglyphs (13000–1342F) * Egyptian Hieroglyph Format Controls (13430–1345F) * Anatolian Hieroglyphs (Unicode block), Anatolian Hieroglyphs (14400–1467F) * Bamum Supplement (Unicode block), Bamum Supplement (16800–16A3F) * Mro (Unicode block), Mro (16A40–16A6F) * Tangsa (Unicode block), Tangsa (16A70–16ACF) * Bassa Vah (Unicode block), Bassa Vah (16AD0–16AFF) * Pahawh Hmong (Unicode block), Pahawh Hmong (16B00–16B8F) * Medefaidrin (Unicode block), Medefaidrin (16E40–16E9F) * Miao (Unicode block), Miao (16F00–16F9F) * Ideographic Symbols and Punctuation (Unicode block), Ideographic Symbols and Punctuation (16FE0–16FFF) * Tangut (Unicode block), Tangut (17000–187FF) * Tangut Components (Unicode block), Tangut Components (18800–18AFF) * Khitan Small Script (Unicode block), Khitan Small Script (18B00–18CFF) * Tangut Supplement (18D00–18D7F) * Kana Extended-B (1AFF0–1AFFF) * Kana Supplement (Unicode block), Kana Supplement (1B000–1B0FF) * Kana Extended-A (Unicode block), Kana Extended-A (1B100–1B12F) * Small Kana Extension (1B130–1B16F) * Nushu (Unicode block), Nushu (1B170–1B2FF) * Duployan (Unicode block), Duployan (1BC00–1BC9F) * Shorthand Format Controls (Unicode block), Shorthand Format Controls (1BCA0–1BCAF) * Symbols: ** Musical notation: *** Znamenny Musical Notation (1CF00–1CFCF) *** Byzantine Musical Symbols (Unicode block), Byzantine Musical Symbols (1D000–1D0FF) *** Musical Symbols (Unicode block), Musical Symbols (1D100–1D1FF) *** Ancient Greek Musical Notation (Unicode block), Ancient Greek Musical Notation (1D200–1D24F) ** Kaktovik Numerals (Unicode block), Kaktovik Numerals (1D2C0–1D2DF) ** Mayan Numerals (Unicode block), Mayan Numerals (1D2E0–1D2FF) ** Mathematical symbols: *** Tai Xuan Jing Symbols (Unicode block), Tai Xuan Jing Symbols (1D300–1D35F) *** Counting Rod Numerals (Unicode block), Counting Rod Numerals (1D360–1D37F) *** Mathematical Alphanumeric Symbols (Unicode block), Mathematical Alphanumeric Symbols (1D400–1D7FF) ** Sutton SignWriting (Unicode block), Sutton SignWriting (1D800–1DAAF) * Latin Extended-G (1DF00–1DFFF) * Glagolitic Supplement (Unicode block), Glagolitic Supplement (1E000–1E02F) * Cyrillic Extended-D (1E030–1E08F) * Nyiakeng Puachue Hmong (Unicode block), Nyiakeng Puachue Hmong (1E100–1E14F) * Toto (Unicode block), Toto (1E290–1E2BF) * Wancho (Unicode block), Wancho (1E2C0–1E2FF) * Nag Mundari (Unicode block), Nag Mundari (1E4D0–1E4FF) * Ethiopic Extended-B (1E7E0–1E7FF) * Mende Kikakui (Unicode block), Mende Kikakui (1E800–1E8DF) * Adlam (Unicode block), Adlam (1E900–1E95F) * Symbols: ** Indic Siyaq Numbers (1EC70–1ECBF) ** Ottoman Siyaq Numbers (1ED00–1ED4F) ** Arabic Mathematical Alphabetic Symbols (Unicode block), Arabic Mathematical Alphabetic Symbols (1EE00–1EEFF) ** Game tiles and cards: *** Mahjong Tiles (Unicode block), Mahjong Tiles (1F000–1F02F) *** Domino Tiles (Unicode block), Domino Tiles (1F030–1F09F) *** Playing Cards (Unicode block), Playing Cards (1F0A0–1F0FF) ** Enclosed Alphanumeric Supplement (Unicode block), Enclosed Alphanumeric Supplement (1F100–1F1FF) ** Enclosed Ideographic Supplement (Unicode block), Enclosed Ideographic Supplement (1F200–1F2FF) ** Miscellaneous Symbols and Pictographs (Unicode block), Miscellaneous Symbols and Pictographs (1F300–1F5FF) ** Emoticons (Unicode block), Emoticons (1F600–1F64F) ** Ornamental Dingbats (Unicode block), Ornamental Dingbats (1F650–1F67F) ** Transport and Map Symbols (Unicode block), Transport and Map Symbols (1F680–1F6FF) ** Alchemical Symbols (Unicode block), Alchemical Symbols (1F700–1F77F) ** Geometric Shapes Extended (Unicode block), Geometric Shapes Extended (1F780–1F7FF) ** Supplemental Arrows-C (Unicode block), Supplemental Arrows-C (1F800–1F8FF) ** Supplemental Symbols and Pictographs (Unicode block), Supplemental Symbols and Pictographs (1F900–1F9FF) ** Chess Symbols (1FA00–1FA6F) ** Symbols and Pictographs Extended-A (1FA70–1FAFF) ** Symbols for Legacy Computing (1FB00–1FBFF)


Supplementary Ideographic Plane

Plane 2, the Supplementary Ideographic Plane (SIP), is used for CJK Ideographs, mostly CJK Unified Ideographs, that were not included in earlier character encoding standards. , the SIP comprises the following six blocks: * CJK Unified Ideographs Extension B (20000–2A6DF) * CJK Unified Ideographs Extension C (2A700–2B73F) * CJK Unified Ideographs Extension D (2B740–2B81F) * CJK Unified Ideographs Extension E (2B820–2CEAF) * CJK Unified Ideographs Extension F (2CEB0–2EBEF) * CJK Compatibility Ideographs Supplement (2F800–2FA1F)


Tertiary Ideographic Plane

Plane 3 is the Tertiary Ideographic Plane (TIP). CJK Unified Ideographs Extension G was added to the TIP in Unicode 13.0, released in March 2020. It also is tentatively allocated for Oracle bone script, Oracle Bone script and Small Seal Script. , the TIP comprises the following two blocks: * CJK Unified Ideographs Extension G (30000–3134F) * CJK Unified Ideographs Extension H (31350–323AF)


Unassigned planes

Planes 4 to 13 (planes to in hexadecimal): No characters have yet been assigned, or proposed for assignment, to Planes 4 through 13.


Supplementary Special-purpose Plane

Plane 14 ( in hexadecimal) is designated as the Supplementary Special-purpose Plane (SSP). It comprises the following two Unicode block, blocks, : *Tags (Unicode block), Tags (E0000–E007F) *Variation Selectors Supplement (Unicode block), Variation Selectors Supplement (E0100–E01EF) – used to indicate alternate glyphs for characters.


Private Use Area Planes

The two planes 15 and 16 (planes and in hexadecimal) each contain a "Private Use Areas, Private Use Area". They contain blocks named Supplementary Private Use Area-A (PUA-A) and -B (PUA-B). The Private Use Areas are available for use by parties outside ISO and Unicode (private character encoding).


References

{{Unicode navigation Unicode, Plane