KOI Character Encodings
KOI (КОИ) is a family of code pages for the Cyrillic script. The name stands for Kod obmena informatsiey (Russian: Код обмена информацией), which means "Code for Information Interchange". A particular feature of the KOI code pages is that text remains human-readable when the most significant bit is stripped, as can happen if it inadvertently passes through equipment or software that can only handle 7-bit characters. This works because each Cyrillic letter is placed 128 code points away from the Latin letter it sounds most similar to. That order, however, does not correspond to the alphabetical order of any language written in Cyrillic, so sorting requires lookup tables. These encodings are derived from ASCII on the basis of a nearly phonetic correspondence between Latin and Cyrillic letters, which was already used in the Russian dialect of Morse code and in the MTK-2 telegraph code. The first 26 characters from А (0xE1) in ...
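The bit-stripping property and the sorting caveat described above can be tried out with a minimal Python sketch. It assumes KOI8-R as the concrete member of the family (Python's built-in `koi8_r` codec); the helper names `strip_high_bit` and `alphabetical_key`, the sample words, and the Russian-only lookup table are illustrative assumptions, not part of any standard.

```python
# Sketch only: KOI8-R is assumed as the example KOI code page.

def strip_high_bit(text: str, encoding: str = "koi8_r") -> str:
    """Encode text, clear bit 7 of every byte, and read the result as ASCII."""
    raw = text.encode(encoding)
    return bytes(b & 0x7F for b in raw).decode("ascii")

# Cyrillic letters land on similar-sounding Latin letters (with case inverted).
print(strip_high_bit("Русский Текст"))  # rUSSKIJ tEKST

# Byte order is not alphabetical order, so sorting needs a lookup table.
# Hypothetical table covering the Russian alphabet only.
RUSSIAN_ORDER = "абвгдеёжзийклмнопрстуфхцчшщъыьэюя"
RANK = {ch: i for i, ch in enumerate(RUSSIAN_ORDER)}

def alphabetical_key(word: str) -> list:
    """Map each letter to its alphabet position; unknown characters sort last."""
    return [RANK.get(ch, len(RANK)) for ch in word.lower()]

words = ["вода", "цирк", "баня"]
by_bytes = sorted(words, key=lambda w: w.encode("koi8_r"))
by_alphabet = sorted(words, key=alphabetical_key)
print(by_bytes)     # ['баня', 'цирк', 'вода']
print(by_alphabet)  # ['баня', 'вода', 'цирк']
```

In the byte-wise sort, ц (which sits on Latin "C") comes before в (which sits on Latin "W"), which is why a lookup table is needed for alphabetical order.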

Code Page
In computing, a code page is a character encoding: a specific association of a set of printable characters and control characters with unique numbers. Typically each number represents the binary value in a single byte. (In some contexts these terms are used more precisely.) The term "code page" originated from IBM's EBCDIC-based mainframe systems, but Microsoft, SAP, and Oracle Corporation are among the vendors that also use it. The majority of vendors identify their own character sets by name. Where there are a great many character sets (as at IBM), identifying them by number is a convenient way to distinguish them. Originally, the code page numbers referred to the page numbers in the IBM standard character set manual, a correspondence that has not held for a long time. Vendors that use a code page system allocate their own code page number to a character encoding, even if it is better known by another name; for example, U ...


KOI8-UKRAINE
KOI8-U (RFC 2319) is an 8-bit character encoding designed to cover Ukrainian, which uses a Cyrillic alphabet. It is based on KOI8-R, which covers Russian and Bulgarian, but replaces eight box-drawing characters with the four Ukrainian letters Ґ, Є, І, and Ї, each in upper and lower case. KOI8-RU is closely related, but adds Ў for Belarusian. In both, the letter allocations match those in KOI8-E, except for Ґ, which is added to KOI8-F. In Microsoft Windows, KOI8-U is assigned the code page number 21866. In IBM, KOI8-U is assigned code page/CCSID 1168. KOI8 remains much more commonly used than ISO 8859-5, which never really caught on. Another common Cyrillic character encoding is Windows-1251. In the future, both may eventually give way to Unicode. KOI8 stands for Kod Obmena Informatsiey, 8 bit (Russian: Код Обмена Информацией, 8 бит), which means "Code for Information Exchange, 8 bit". The KOI8 character sets have the property that the Rus ...
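The relationship between KOI8-U and KOI8-R described above can be checked with a short Python sketch. It assumes that Python's built-in `koi8_u` and `koi8_r` codecs follow the respective specifications; the helper name `differing_bytes` and the sample word are illustrative.

```python
# Sketch only: compares Python's koi8_r and koi8_u codecs byte by byte.

def differing_bytes(enc_a: str, enc_b: str) -> dict:
    """Return {byte value: (char in enc_a, char in enc_b)} for the upper half."""
    diffs = {}
    for value in range(0x80, 0x100):
        raw = bytes([value])
        a = raw.decode(enc_a, errors="replace")
        b = raw.decode(enc_b, errors="replace")
        if a != b:
            diffs[value] = (a, b)
    return diffs

diffs = differing_bytes("koi8_r", "koi8_u")
for value, (in_r, in_u) in sorted(diffs.items()):
    # Left: the box-drawing character in KOI8-R; right: the letter in KOI8-U.
    print(f"0x{value:02X}: {in_r!r} -> {in_u!r}")

# A Ukrainian word encodes in KOI8-U but not in KOI8-R.
word = "Київ"
encoded = word.encode("koi8_u")
try:
    word.encode("koi8_r")
    encodable_in_koi8_r = True
except UnicodeEncodeError:
    encodable_in_koi8_r = False
print(encoded.decode("koi8_u"), encodable_in_koi8_r)
```

Listing the differences this way, rather than hard-coding byte values, keeps the sketch independent of any particular table of code points.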


KOI8-F
KOI8-F or KOI8 Unified is an 8-bit character set. It was designed by Peter Cassetta of Fingertip Software (now defunct) as an attempt to support all the encoded letters from both KOI8-E (ISO-IR-111) and KOI8-RU (and hence also KOI8-U and KOI8-R), along with some of the pseudographics from KOI8-R and some additional punctuation in the remaining space, sourced partly from Windows-1251. This encoding was only used in the software of that company.




KOI8-E
ISO-IR-111 or KOI8-E is an 8-bit character set. It is a multinational extension of KOI-8 for Belarusian, Macedonian, Serbian, and Ukrainian (except Ґґ, which is added to KOI8-F). The name "ISO-IR-111" refers to its registration number in the ISO-IR registry, and denotes it as a set usable with ISO/IEC 2022. It was defined by the first (1986) edition of ECMA-113, which is the Ecma International standard corresponding to ISO-8859-5, and as such also corresponds to a 1987 draft version of ISO-8859-5. The published editions of ISO-8859-5 instead correspond to subsequent editions of ECMA-113, which define a different encoding.

Naming confusion: ISO-IR-111, the 1985 edition of ECMA-113 (also called "ECMA-Cyrillic" or "KOI8-E"), was based on the 1974 edition of GOST 19768 (i.e. KOI-8). In 1987 ECMA-113 was redesigned. These newer editions of ECMA-113 are equivalent to ISO-8859-5, and do not follow the KOI layout. This confusion has led to a common misconception that ISO-8859-5 was defined in or based ...



Central Asia
Central Asia, also known as Middle Asia, is a subregion of Asia that stretches from the Caspian Sea in the west to western China and Mongolia in the east, and from Afghanistan and Iran in the south to Russia in the north. It includes the former Soviet republics of Kazakhstan, Kyrgyzstan, Tajikistan, Turkmenistan, and Uzbekistan, which are colloquially referred to as the "-stans" because their names all end with the Persian suffix "-stan", meaning "land of". The area now called Central Asia was formerly part of the historic region of Turkistan, also known as Turan. In the pre-Islamic and early Islamic eras, Central Asia was inhabited predominantly by Iranian peoples, including the Eastern Iranian-speaking Bactrians, Sogdians, and Chorasmians and the semi-nomadic Scythians and Dahae. After expansion by Turkic peop ...

Caucasus
The Caucasus, or Caucasia, is a region between the Black Sea and the Caspian Sea, mainly comprising Armenia, Azerbaijan, Georgia, and parts of Southern Russia. The Caucasus Mountains, including the Greater Caucasus range, have historically been considered a natural barrier between Eastern Europe and Western Asia. Mount Elbrus in Russia, Europe's highest mountain, is situated in the Western Caucasus. On the southern side, the Lesser Caucasus includes the Javakheti Plateau and the Armenian highlands, part of which is in Turkey. The Caucasus is divided into the North Caucasus and South Caucasus, although the Western Caucasus also exists as a distinct geographic space within the North Caucasus. The Greater Caucasus mountain range in the north is mostly shared by Russia and Georgia as well as the northernmost parts of Azerbaijan. The Lesser Caucasus mountain range in the south is occupied by several independent states, mostly by Armenia, Azerbaijan, and Georgia, but also ...

Tajik Language
Tajik, also called Tajiki Persian or Tajiki, is the variety of Persian spoken in Tajikistan and Uzbekistan by Tajiks. It is closely related to neighbouring Dari, with which it forms a continuum of mutually intelligible varieties of the Persian language. Several scholars consider Tajik a dialectal variety of Persian rather than a language in its own right. This conception of Tajik as a variety of Persian was so widespread that, during the period in which Tajik intellectuals were trying to establish Tajik as a language separate from Persian, the prominent intellectual Sadriddin Ayni countered that Tajik was not a "bastardised dialect" of Persian (Shinji Ido, Tajik, LINCOM EUROPA, 2005). The issue of whether Tajik and Persian are to be considered two dialects of a single language or two discrete languages has a political side to it. By way of Early New Persian, Tajik, like Iranian Persian and Dari Persian, is a continuation of Midd ...




KOI8-T
KOI8-T is an 8-bit single-byte extended ASCII character encoding adapting KOI8 to cover the Tajik Cyrillic alphabet. It was introduced by Michael Davis as an interim solution for representing Tajiki Cyrillic text in an interchangeable manner appropriate for use on the web, in an attempt to bridge the gap between existing non-interoperable font-specific encodings and the eventual wide adoption of Unicode. It is used by the GNU C Library as its default encoding for Tajik. The Cyrillic letters that are also used in Russian are encoded according to the KOI8-R layout, making the encoding a KOI8-B superset, whereas the punctuation mostly follows the layout in Windows-1251 and Windows-1252 as applicable.
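The claim that KOI8-T keeps the KOI8-R layout for the letters shared with Russian can be checked with a small Python sketch. It assumes a Python version that ships the `koi8_t` codec (3.5 or later); the helper names `same_layout` and `can_encode` and the sample word are illustrative, and the comparison covers only the 64 letters А to я (Ё and ё are left out).

```python
# Sketch only: compares Python's koi8_t and koi8_r codecs.

# The 32 upper-case and 32 lower-case letters from U+0410 (А) to U+044F (я).
RUSSIAN_LETTERS = "".join(chr(cp) for cp in range(0x0410, 0x0450))

def same_layout(chars: str, enc_a: str = "koi8_t", enc_b: str = "koi8_r") -> bool:
    """True if both encodings assign identical bytes to every character given."""
    return chars.encode(enc_a) == chars.encode(enc_b)

def can_encode(text: str, encoding: str) -> bool:
    """True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

print(same_layout(RUSSIAN_LETTERS))

# A Tajik word with letters that Russian does not use.
tajik_word = "тоҷикӣ"
print(can_encode(tajik_word, "koi8_t"), can_encode(tajik_word, "koi8_r"))
```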