Ideographic Variation Database

A variant form is an alternate glyph for a character, encoded in Unicode through the mechanism of variation sequences: sequences that consist of a base character followed by a variation selector character. A variant form usually has an appearance and meaning very similar to those of its base form. The mechanism is intended for variant forms where, generally, displaying the base character when the variant form is unavailable does not change the meaning of the text, and may not even be noticeable to many readers.

Unicode defines two types of variation sequences:

* ''Standardized variation sequences'', defined in StandardizedVariants.txt
* ''Ideographic variation sequences'', defined in the Ideographic Variation Database (IVD)

Variation selector characters reside in several Unicode blocks:
* Variation Selectors (16 characters, abbreviated VS1–VS16)
* Variation Selectors Supplement (240 characters, abbreviated VS17–VS256)
* Mongolian (4 characters, abbreviated FVS1–FVS4)

Variation selectors are not required for Arabic or cursive Latin characters, where glyph substitution can be determined by context: glyphs may be joined differently depending on whether the character is initial, medial, or final in a word, or stands alone. Such substitutions follow from the surrounding characters alone, with no further input from the author. Authors may also use special-purpose characters such as joiners and non-joiners to force an alternate glyph form where it would not otherwise appear. Ligatures are a similar case: glyphs can be substituted simply by turning ligatures on or off as a rich-text attribute.

For other glyph substitutions, the author's intent cannot be determined from context and may need to be encoded with the text. This is the case with the characters or glyphs known as gaiji, where different glyphs are used for the same character, either historically or for ideographs in family names. This is one of the gray areas in distinguishing a glyph from a character: if a family name is written with a form that differs slightly from the ideograph it derives from, is that a simple glyph variant or a distinct character? Character substitutions may also occur outside of Unicode, for example with OpenType Layout tags.
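The VS numbering in the block list above maps onto two contiguous code point ranges: VS1–VS16 are U+FE00–U+FE0F and VS17–VS256 are U+E0100–U+E01EF. The minimal Python sketch below converts between VS numbers and code points, and also models the fallback described at the start of this article, where removing the selectors leaves only the base characters. The function names are illustrative assumptions, not part of any Unicode or library API, and the Mongolian free variation selectors are deliberately left out.

```python
from typing import Optional

# Variation selector numbering (Mongolian FVS1-FVS4 are not handled here):
#   VS1-VS16   -> U+FE00..U+FE0F   (Variation Selectors block)
#   VS17-VS256 -> U+E0100..U+E01EF (Variation Selectors Supplement block)


def vs_code_point(n: int) -> int:
    """Return the code point of variation selector VSn, for 1 <= n <= 256."""
    if 1 <= n <= 16:
        return 0xFE00 + (n - 1)
    if 17 <= n <= 256:
        return 0xE0100 + (n - 17)
    raise ValueError(f"there is no variation selector VS{n}")


def vs_number(cp: int) -> Optional[int]:
    """Return n if code point cp is VSn, otherwise None."""
    if 0xFE00 <= cp <= 0xFE0F:
        return cp - 0xFE00 + 1
    if 0xE0100 <= cp <= 0xE01EF:
        return cp - 0xE0100 + 17
    return None


def strip_variation_selectors(text: str) -> str:
    """Fallback display: drop VS1-VS256 so only the base characters remain."""
    return "".join(ch for ch in text if vs_number(ord(ch)) is None)


if __name__ == "__main__":
    # U+845B followed by VS17, i.e. a base ideograph plus a selector.
    seq = "\u845B" + chr(vs_code_point(17))
    print([f"U+{ord(ch):04X}" for ch in seq])  # ['U+845B', 'U+E0100']
    print(strip_variation_selectors(seq) == "\u845B")  # True
```

Keeping the two ranges as explicit offsets mirrors the block layout directly, so the round trip `vs_number(vs_code_point(n)) == n` holds for every n from 1 to 256.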


Blocks with standardized variation sequences

Standardized variation sequences specifically for emoji/text presentation are defined for base characters in 20 blocks:

* Arrows
* Basic Latin
* CJK Symbols and Punctuation
* Dingbats
* Emoticons
* Enclosed Alphanumeric Supplement
* Enclosed Alphanumerics
* Enclosed CJK Letters and Months
* Enclosed Ideographic Supplement
* General Punctuation
* Geometric Shapes
* Latin-1 Supplement
* Letterlike Symbols
* Mahjong Tiles
* Miscellaneous Symbols
* Miscellaneous Symbols and Arrows
* Miscellaneous Symbols and Pictographs
* Miscellaneous Technical
* Supplemental Arrows-B
* Transport and Map Symbols

Other standardized variation sequences are formed with base characters in the following fourteen blocks:

* CJK Unified Ideographs
* CJK Unified Ideographs Extension A
* CJK Unified Ideographs Extension B
* Egyptian Hieroglyph Format Controls
* Egyptian Hieroglyphs
* Halfwidth and Fullwidth Forms
* Manichaean
* Mathematical Alphanumeric Symbols
* Mathematical Operators
* Mongolian
* Myanmar
* Myanmar Extended-A
* Phags-pa
* Supplemental Mathematical Operators
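For the emoji/text presentation sequences in the first list, the selector is VS15 (U+FE0E) to request text-style presentation or VS16 (U+FE0F) to request emoji-style presentation. The minimal Python sketch below builds and inspects such sequences; the helper names are hypothetical, and the sketch does not check whether Unicode actually defines a presentation sequence for a given base character, which would require consulting Unicode's own data files.

```python
import unicodedata
from typing import Optional

VS15 = "\uFE0E"  # VARIATION SELECTOR-15: request text presentation
VS16 = "\uFE0F"  # VARIATION SELECTOR-16: request emoji presentation


def with_presentation(base: str, emoji: bool) -> str:
    """Append VS16 (emoji style) or VS15 (text style) to one base character."""
    if len(base) != 1:
        raise ValueError("expected exactly one base character")
    return base + (VS16 if emoji else VS15)


def requested_presentation(seq: str) -> Optional[str]:
    """Return 'emoji' or 'text' for a two-character presentation sequence, else None."""
    if len(seq) != 2:
        return None
    if seq[1] == VS16:
        return "emoji"
    if seq[1] == VS15:
        return "text"
    return None


if __name__ == "__main__":
    heart = "\u2764"  # HEAVY BLACK HEART, a base character from the Dingbats block
    for emoji in (False, True):
        seq = with_presentation(heart, emoji)
        # Print the requested style and the character names in the sequence.
        print(requested_presentation(seq), [unicodedata.name(ch) for ch in seq])
```

A renderer that does not support the requested style can simply ignore the selector and draw the base character, which is the fallback behavior the variation sequence mechanism is designed around.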


Blocks with ideographic variation sequences

Ideographic variation sequences are defined for base characters in nine blocks:

* CJK Compatibility Ideographs
* CJK Unified Ideographs
* CJK Unified Ideographs Extension A
* CJK Unified Ideographs Extension B
* CJK Unified Ideographs Extension C
* CJK Unified Ideographs Extension D
* CJK Unified Ideographs Extension E
* CJK Unified Ideographs Extension F
* CJK Unified Ideographs Extension H
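An ideographic variation sequence pairs a base character from one of these blocks with a selector from VS17–VS256 (U+E0100–U+E01EF), and the IVD registers each sequence under a named collection. The Python sketch below parses one registry line, assuming a semicolon-separated layout of `<base> <selector>; <collection>; <sequence id>` with `#` comments, as in the IVD's IVD_Sequences.txt file. The class and function names, and the collection name and ID in the usage line, are hypothetical placeholders. The ideograph check relies on `unicodedata` character names, so bases from newer extensions (such as Extension H) are only recognized if the running Python's Unicode database includes them.

```python
import unicodedata
from typing import NamedTuple, Optional


class IVDEntry(NamedTuple):
    base: int         # base ideograph code point
    selector: int     # variation selector code point (VS17-VS256)
    collection: str   # registered collection name
    sequence_id: str  # identifier within that collection

    def as_text(self) -> str:
        """The two-code-point string that encodes this sequence."""
        return chr(self.base) + chr(self.selector)


def is_ideograph(cp: int) -> bool:
    """True for CJK unified or compatibility ideographs known to this Python."""
    name = unicodedata.name(chr(cp), "")
    return name.startswith(("CJK UNIFIED IDEOGRAPH-", "CJK COMPATIBILITY IDEOGRAPH-"))


def parse_ivd_line(line: str) -> Optional[IVDEntry]:
    """Parse '<base> <selector>; <collection>; <id>'; return None for blank or comment lines."""
    body = line.split("#", 1)[0].strip()
    if not body:
        return None
    seq_field, collection, seq_id = (field.strip() for field in body.split(";"))
    base_hex, selector_hex = seq_field.split()
    base, selector = int(base_hex, 16), int(selector_hex, 16)
    # Ideographic variation sequences use only the Supplement selectors.
    if not 0xE0100 <= selector <= 0xE01EF:
        raise ValueError(f"U+{selector:04X} is not in VS17-VS256")
    if not is_ideograph(base):
        raise ValueError(f"U+{base:04X} is not a recognized ideograph")
    return IVDEntry(base, selector, collection, seq_id)


if __name__ == "__main__":
    # Hypothetical registry line: the collection name and ID are placeholders.
    entry = parse_ivd_line("845B E0100; Example-Collection; EX-0001")
    print(entry)
    print([f"U+{ord(ch):04X}" for ch in entry.as_text()])
```

Rejecting selectors outside VS17–VS256 keeps ideographic sequences separate from the standardized ones, which draw on VS1–VS16.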


See also

* Unicode control characters
* Variant Chinese characters
* List of typographic features

