Unicode blocks





A Unicode block is one of several contiguous ranges of numeric character codes (code points) of the Unicode character set that are defined by the Unicode Consortium for administrative and documentation purposes. Typically, proposals such as the addition of new characters are discussed and evaluated by considering the relevant block or blocks as a whole. Each block is generally, but not always, meant to supply characters used by one or more specific languages, or by some general application area such as mathematics, surveying, decorative typesetting, or social forums.


Design and implementation

Unicode blocks are identified by unique names, which use only ASCII characters and are usually descriptive, in English, of the nature of the symbols, such as "Tibetan" or "Supplemental Arrows-A". When comparing block names, uppercase and lowercase letters are treated as equal, and whitespace, hyphens, and underscores are ignored; the last name is therefore equivalent to "supplemental_arrows_a", "SupplementalArrowsA" and "SUPPLEMENTALARROWSA".

Blocks are pairwise disjoint; that is, they do not overlap. The starting code point and the size (number of code points) of each block are always multiples of 16; therefore, in hexadecimal notation, the starting (smallest) point is U+xxx0 and the ending (largest) point is U+yyyF, where xxx and yyy are three or more hexadecimal digits. (These constraints are intended to simplify the display of glyphs in Unicode Consortium documents, as tables with 16 rows labeled with the last hexadecimal digit of the code point.) The size of a block may range from a minimum of 16 to a maximum of 65,536 code points.

Every code point has a character property called "Block", whose value is a string naming the unique block that contains that point. A block may also contain unassigned code points, usually reserved for future additions of characters that "logically" belong to that block. Code points not belonging to any named block, e.g. those in the unassigned planes 4–13, have the value Block=No_Block.

Membership in a particular block does not by itself imply any particular properties for the characters it contains or is expected to contain; the identity of any character is determined by its properties as stated in the Unicode Character Database. For example, the 32 contiguous noncharacter code points U+FDD0..U+FDEF share none of the properties common to the other characters in the Arabic Presentation Forms-A block: they are not Arabic-script characters, nor are they "right-to-left noncharacters". They were placed there as filler because it had been agreed that no further Arabic compatibility characters would be encoded.
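The naming, alignment and lookup rules above can be illustrated with a minimal Python sketch. It assumes a small, hand-copied excerpt of block ranges rather than the full Blocks.txt file from the Unicode Character Database, and the helper names (`normalize_block_name`, `validate`, `block_of`) are hypothetical, not part of any library. Because the excerpt is partial, code points in blocks it omits also come back as "No_Block".

```python
import bisect
import re
import unicodedata

# Illustrative excerpt of block ranges as (first, last, name), inclusive.
# The complete, authoritative list is Blocks.txt in the Unicode Character
# Database; only a handful of blocks are copied here.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0F00, 0x0FFF, "Tibetan"),
    (0x2600, 0x26FF, "Miscellaneous Symbols"),
    (0x27F0, 0x27FF, "Supplemental Arrows-A"),
    (0xFB50, 0xFDFF, "Arabic Presentation Forms-A"),
    (0x1FA00, 0x1FA6F, "Chess Symbols"),
]
_STARTS = [first for first, _, _ in BLOCKS]


def normalize_block_name(name: str) -> str:
    """Comparison key: drop whitespace, hyphens and underscores, fold case."""
    return re.sub(r"[\s\-_]", "", name).lower()


def validate(blocks) -> None:
    """Check the stated constraints: 16-aligned, sized 16..65,536, disjoint."""
    prev_last = -1
    for first, last, name in blocks:
        size = last - first + 1
        if first % 16 != 0:
            raise ValueError(f"{name}: start is not a multiple of 16")
        if size % 16 != 0 or not 16 <= size <= 65_536:
            raise ValueError(f"{name}: invalid size {size}")
        if first <= prev_last:
            raise ValueError(f"{name}: overlaps or is out of order")
        prev_last = last


def block_of(cp: int) -> str:
    """Name of the block containing cp, or "No_Block".

    With this partial excerpt, code points in omitted blocks
    are also reported as "No_Block".
    """
    # Index of the last block whose first code point is <= cp.
    i = bisect.bisect_right(_STARTS, cp) - 1
    if i >= 0 and cp <= BLOCKS[i][1]:
        return BLOCKS[i][2]
    return "No_Block"


validate(BLOCKS)

if __name__ == "__main__":
    for cp in (0x0F40, 0xFB50, 0xFDD0, 0x40000):
        print(f"U+{cp:04X}", block_of(cp), unicodedata.category(chr(cp)))
```

Running it shows the point made about noncharacters: U+FDD0 is reported as belonging to Arabic Presentation Forms-A, yet its General Category is "Cn", unlike the Arabic letter U+FB50 in the same block.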


Other classifications

Each code point also has a property called "General Category", which attempts to describe the role of the corresponding symbol in the languages or applications for whose sake it was included in the system. Examples of General Categories are "Lu" (uppercase letter), "Nd" (decimal digit), "Pi" (initial-quote punctuation), and "Mn" (nonspacing mark, e.g. a diacritic that combines with the preceding character). This classification is independent of blocks: the code points with a given General Category generally span many blocks, and need not be contiguous, even within a single block. Each code point also has a Script property, specifying which writing system it is intended for, or whether it is used by multiple writing systems. This, too, is independent of block. In descriptions of the Unicode system, a block may be subdivided into more specific subgroups, such as the "Chess symbols" in the Miscellaneous Symbols block (not to be confused with the separate Chess Symbols block). Those subgroups are not "blocks" in the technical sense used by the Unicode Consortium, and are named only for the convenience of users.
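The independence of General Category and block can be checked with Python's standard `unicodedata.category`. In this sketch the block table is again a small hand-copied excerpt (an assumption; Blocks.txt is the real source), and `block_of`, `blocks_by_category` and `categories_in_block` are hypothetical helpers written for illustration.

```python
import bisect
import unicodedata
from collections import defaultdict

# Illustrative excerpt of block ranges (inclusive); see Blocks.txt in the
# Unicode Character Database for the full, authoritative list.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0300, 0x036F, "Combining Diacritical Marks"),
    (0x0370, 0x03FF, "Greek and Coptic"),
    (0x0530, 0x058F, "Armenian"),
    (0x0600, 0x06FF, "Arabic"),
    (0x0E00, 0x0E7F, "Thai"),
    (0x2000, 0x206F, "General Punctuation"),
]
_STARTS = [first for first, _, _ in BLOCKS]


def block_of(cp: int) -> str:
    """Block name for cp; "No_Block" also covers blocks omitted here."""
    i = bisect.bisect_right(_STARTS, cp) - 1
    if i >= 0 and cp <= BLOCKS[i][1]:
        return BLOCKS[i][2]
    return "No_Block"


def blocks_by_category(chars: str) -> dict:
    """Map each General Category to the set of blocks its sample chars come from."""
    result = defaultdict(set)
    for ch in chars:
        result[unicodedata.category(ch)].add(block_of(ord(ch)))
    return dict(result)


def categories_in_block(first: int, last: int) -> set:
    """All General Categories that occur within one block's range."""
    return {unicodedata.category(chr(cp)) for cp in range(first, last + 1)}


# Sample characters: uppercase letters, digit five in three scripts,
# opening quotation marks, and combining marks.
SAMPLES = (
    "\u0041\u03a9\u0531"   # A, GREEK CAPITAL OMEGA, ARMENIAN CAPITAL AYB (Lu)
    "5\u0665\u0e55"        # digit five: ASCII, Arabic-Indic, Thai (Nd)
    "\u00ab\u201c"         # LEFT-POINTING DOUBLE ANGLE QUOTE, LEFT DOUBLE QUOTE (Pi)
    "\u0301\u064b\u0e31"   # COMBINING ACUTE, ARABIC FATHATAN, THAI MAI HAN-AKAT (Mn)
)

if __name__ == "__main__":
    for cat, blocks in sorted(blocks_by_category(SAMPLES).items()):
        print(cat, sorted(blocks))
    print(sorted(categories_in_block(0x0000, 0x007F)))
```

Each category in the sample spreads over several blocks, while the single Basic Latin block contains control characters, letters of both cases, digits, punctuation and symbols.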


List of blocks

Unicode defines 338 blocks:
* 164 in plane 0, the Basic Multilingual Plane
* 161 in plane 1, the Supplementary Multilingual Plane
* 7 in plane 2, the Supplementary Ideographic Plane
* 2 in plane 3, the Tertiary Ideographic Plane
* 2 in plane 14 (E in hexadecimal), the Supplementary Special-purpose Plane
* One each in planes 15 (F in hexadecimal) and 16 (10 in hexadecimal), called Supplementary Private Use Area-A and -B
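The per-plane counts and the plane numbering can be cross-checked with a little arithmetic. In this Python sketch the counts are copied from the list, while `plane_of` and `block_size` are hypothetical helpers; the plane of a code point is its value shifted right by 16 bits, since each plane holds 0x10000 (65,536) code points.

```python
# Block counts per plane, copied from the list in the text.
BLOCKS_PER_PLANE = {0: 164, 1: 161, 2: 7, 3: 2, 14: 2, 15: 1, 16: 1}


def plane_of(cp: int) -> int:
    """Plane number of a code point: each plane spans 0x10000 code points."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"{cp:#x} is outside the Unicode code space")
    return cp >> 16


def block_size(first: int, last: int) -> int:
    """Number of code points in an inclusive range."""
    return last - first + 1


if __name__ == "__main__":
    print(sum(BLOCKS_PER_PLANE.values()))      # total number of blocks
    print(plane_of(0xE0000), f"{plane_of(0xE0000):X}")
    # Supplementary Private Use Area-A fills plane 15 (U+F0000..U+FFFFF),
    # the maximum block size mentioned earlier.
    print(block_size(0xF0000, 0xFFFFF))
```

The per-plane counts add up to the stated total of 338, and a block occupying a whole plane, as each Supplementary Private Use Area does, reaches the 65,536 code-point maximum.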


Moved blocks

The Unicode Stability Policy requires that a character, once assigned, may not be moved or removed, although it may be deprecated. This applies to Unicode 2.0 and all subsequent versions. Prior to this, the following former blocks were moved:

