HOME



picture info

Supplementary Ideographic Plane
In the Unicode standard, a plane is a contiguous group of 65,536 (216) code points. There are 17 planes, identified by the numbers 0 to 16, which corresponds with the possible values 00–1016 of the first two positions in six position hexadecimal format (U+''hhhhhh''). Plane 0 is the Basic Multilingual Plane (BMP), which contains most commonly used characters. The higher planes 1 through 16 are called "supplementary planes". The last code point in Unicode is the last code point in plane 16, U+10FFFF. As of Unicode version , five of the planes have assigned code points (characters), and seven are named. The limit of 17 planes is due to UTF-16, which can encode 220 code points (16 planes) as pairs of words, plus the BMP as a single word. UTF-8 was designed with a much larger limit of 231 (2,147,483,648) code points (32,768 planes), and would still be able to encode 221 (2,097,152) code points (32 planes) even under the current limit of 4 bytes. The 17 planes can accommodate 1,114, ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Unicode
Unicode or ''The Unicode Standard'' or TUS is a character encoding standard maintained by the Unicode Consortium designed to support the use of text in all of the world's writing systems that can be digitized. Version 16.0 defines 154,998 Character (computing), characters and 168 script (Unicode), scripts used in various ordinary, literary, academic, and technical contexts. Unicode has largely supplanted the previous environment of a myriad of incompatible character sets used within different locales and on different computer architectures. The entire repertoire of these sets, plus many additional characters, were merged into the single Unicode set. Unicode is used to encode the vast majority of text on the Internet, including most web pages, and relevant Unicode support has become a common consideration in contemporary software development. Unicode is ultimately capable of encoding more than 1.1 million characters. The Unicode character repertoire is synchronized with Univers ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Writing
Writing is the act of creating a persistent representation of language. A writing system includes a particular set of symbols called a ''script'', as well as the rules by which they encode a particular spoken language. Every written language arises from a corresponding spoken language; while the use of language is universal across human societies, most spoken languages are not written. Writing is a cognitive and social activity involving neuropsychological and physical processes. The outcome of this activity, also called ''writing'' (or a ''text'') is a series of physically inscribed, mechanically transferred, or digitally represented symbols. Reading is the corresponding process of interpreting a written text, with the interpreter referred to as a ''reader''. In general, writing systems do not constitute languages in and of themselves, but rather a means of encoding language such that it can be read by others across time and space. While not all languages use a writ ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  




Cyrillic (Unicode Block)
Cyrillic is a Unicode block containing the characters used to write the most widely used languages with a Cyrillic The Cyrillic script ( ) is a writing system used for various languages across Eurasia. It is the designated national script in various Slavic, Turkic, Mongolic, Uralic, Caucasian and Iranic-speaking countries in Southeastern Europe, Ea ... orthography. The core of the block is based on the ISO 8859-5 standard, with additions for minority languages and historic orthographies. Block List of characters * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * History The following Unicode-related documents record the purpose and process of defining specific characters in ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Greek And Coptic
Greek and Coptic is the Unicode block for representing modern (monotonic) Greek. It was originally also used for writing Coptic, using the similar Greek letters in addition to the uniquely Coptic additions. Beginning with version 4.1 of the Unicode Standard, a separate Coptic block has been included in Unicode, allowing for mixed Greek/Coptic text that is stylistically contrastive, as is convention in scholarly works. Writing polytonic Greek requires the use of combining characters or the precomposed vowel + tone characters in the Greek Extended character block. Its block name in Unicode 1.0 was simply Greek, although Coptic letters were already included. Block Points were reserved for the uppercase forms of ΐ, ΰ and ς. While letter-diacritic combinations such as ΐ and ΰ are no longer accepted by Unicode, a capital ς remains a theoretical possibility. There is in addition room for three additional casing pairs, or for capital forms of letters such as lunate ϵ and ϶. ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Combining Diacritical Marks
Combining Diacritical Marks is a Unicode block containing the most common combining characters. It also contains the character " Combining Grapheme Joiner", which prevents canonical reordering of combining characters, and despite the name, actually separates characters that would otherwise be considered a single grapheme In linguistics, a grapheme is the smallest functional unit of a writing system. The word ''grapheme'' is derived from Ancient Greek ('write'), and the suffix ''-eme'' by analogy with ''phoneme'' and other emic units. The study of graphemes ... in a given context. Its block name in Unicode 1.0 was Generic Diacritical Marks. Block Character table History The following Unicode-related documents record the purpose and process of defining specific characters in the Combining Diacritical Marks block: See also * Phonetic symbols in Unicode References {{Reflist Unicode blocks ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Spacing Modifier Letters
Spacing Modifier Letters is a Unicode block containing characters for the IPA, UPA, and other phonetic transcriptions. Included are the IPA tone marks, and modifiers for aspiration and palatalization. The word ''spacing'' indicates that these characters occupy their own horizontal space within a line of text. Its block name in Unicode 1.0 was simply Modifier Letters. Character table Compact table History The following Unicode-related documents record the purpose and process of defining specific characters in the Spacing Modifier Letters block: See also * Modifier letter * Phonetic symbols in Unicode Unicode supports several phonetic scripts and notation systems through its existing scripts and the addition of extra Unicode block, blocks with phonetic characters. These phonetic characters are derived from an existing script, usually Latin, Gr ... References {{reflist Unicode blocks ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


IPA Extensions
IPA Extensions is a block (U+0250–U+02AF) of the Unicode standard that contains full size letters used in the International Phonetic Alphabet (IPA). Both modern and historical characters are included, as well as former and proposed IPA signs and non-IPA phonetic letters. Additional characters employed for phonetics, like the palatalization sign, are encoded in the blocks Phonetic Extensions (1D00–1D7F) and Phonetic Extensions Supplement (1D80–1DBF). Diacritics are found in the Spacing Modifier Letters (02B0–02FF) and Combining Diacritical Marks Combining Diacritical Marks is a Unicode block containing the most common combining characters. It also contains the character " Combining Grapheme Joiner", which prevents canonical reordering of combining characters, and despite the name, actua ... (0300–036F) blocks. Its block name in Unicode 1.0 was Standard Phonetic. With the ability to use Unicode for the presentation of IPA symbols, ASCII-based syst ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  




Latin Extended-B
Latin Extended-B is the fourth block (0180-024F) of the Unicode Standard. It has been included since version 1.0, where it was only allocated to the code points 0180-01FF and contained 113 characters. During unification with ISO 10646 for version 1.1, the block range was extended by 80 code points and another 35 characters were assigned. In version 3.0 and later, the last 60 available code points in the block were assigned. Its block name in Unicode 1.0 was Extended Latin. Character table Subheadings The Latin Extended-B block contains ten subheadings for groups of characters: Non-European and historic Latin, African letters for clicks, Croatian digraphs matching Serbian Cyrillic letters, Pinyin diacritic-vowel combinations, Phonetic and historic letters, Additions for Slovenian and Croatian, Additions for Romanian, Miscellaneous additions, Additions for Livonian, and Additions for Sinology. The Non-European and historic, African clicks, Croatian digraphs, Pinyin, and the first p ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Latin Extended-A
Latin Extended-A is a Unicode block and is the third block of the Unicode standard. It encodes Latin letters from the Latin ISO character sets other than Latin-1 (which is already encoded in the Latin-1 Supplement block) and also legacy characters from the ISO 6937 standard. The Latin Extended-A block has been in the Unicode Standard since version 1.0, with its entire character repertoire, except for the Latin Small Letter Long S, which was added during unification with ISO 10646 in version 1.1. Its block name in Unicode 1.0 was European Latin. Character table Subheadings The Latin Extended-A block contains only two subheadings: European Latin and Deprecated letter. European Latin The European Latin subheading contains all but one character in the Latin Extended-A block. It is populated with accented and variant majuscule Letter case is the distinction between the letters that are in larger uppercase or capitals (more formally '' majuscule'') and smaller lowercase (mor ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Latin-1 Supplement
The Latin-1 Supplement (also called C1 Controls and Latin-1 Supplement) is the second Unicode block in the Unicode standard. It encodes the upper range of ISO 8859-1: 80 (U+0080) – FF (U+00FF). C1 Controls (0080–009F) are not graphic. This block ranges from U+0080 to U+00FF, contains 128 characters and includes the C1 controls, Latin-1 punctuation and symbols, 30 pairs of majuscule and minuscule accented Latin characters and 2 mathematical operators. The C1 Controls and Latin-1 Supplement block has been included in its present form, with the same character repertoire since version 1.0 of the Unicode Standard. Its block name in Unicode 1.0 was simply Latin1. Character table Subheadings The C1 Controls and Latin-1 Supplement block has four subheadings within its character collection: C1 controls, Latin-1 Punctuation and Symbols, Letters, and Mathematical operator(s). C1 controls The C1 controls subheading contains 32 supplementary control codes inherited from ISO/I ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

ASCII
ASCII ( ), an acronym for American Standard Code for Information Interchange, is a character encoding standard for representing a particular set of 95 (English language focused) printable character, printable and 33 control character, control characters a total of 128 code points. The set of available punctuation had significant impact on the syntax of computer languages and text markup. ASCII hugely influenced the design of character sets used by modern computers; for example, the first 128 code points of Unicode are the same as ASCII. ASCII encodes each code-point as a value from 0 to 127 storable as a seven-bit integer. Ninety-five code-points are printable, including digits ''0'' to ''9'', lowercase letters ''a'' to ''z'', uppercase letters ''A'' to ''Z'', and commonly used punctuation symbols. For example, the letter is represented as 105 (decimal). Also, ASCII specifies 33 non-printing control codes which originated with ; most of which are now obsolete. The control cha ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]