KOI-8
KOI-8 (КОИ-8) is an 8-bit character set standardized in GOST 19768-74 (Маркелова Л. Н. Эксплуатация программоуправляемой вычислительной машины «Искра 226». М.: Машиностроение, 1987. С. 41–42). It is an extension of KOI-7 that allows the Latin alphabet to be used along with the Russian alphabet, in both upper and lower case; however, the letter Ёё and the uppercase Ъ are missing, the latter to avoid a conflict with the delete character (both are added in most extensions, see KOI8-B). The lower half (code points 0–127) is identical to ASCII, except that the dollar sign $ (code point 24 hex) is replaced by the universal currency sign ¤. The rows x8_ and x9_ (code points 128–159) may be filled with the additional control characters from EBCDIC (code points 32–63). This standard became the basis for later Internet standards such as KOI8-R, KOI8-U, KOI8-RU and ...


KOI Character Encodings
KOI (КОИ) is a family of code pages for the Cyrillic script. The name stands for Kod obmena informatsiey (Russian: Код обмена информацией), which means "Code for Information Interchange". A particular feature of the KOI code pages is that text remains human-readable when the leftmost bit is stripped, should it inadvertently pass through equipment or software that can only handle 7-bit characters. This works because each Cyrillic letter is placed 128 code points away from the Latin letter it sounds most similar to. That order, however, does not correspond to the alphabetical order of any language written in Cyrillic, so sorting requires lookup tables. These encodings are derived from ASCII on the basis of a nearly phonetic correspondence between Latin and Cyrillic letters, which was already used in the Russian dialect of Morse code and in the MTK-2 telegraph code. The first 26 characters from А (0xE1) in ...
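The sorting consequence can be illustrated with a minimal Python sketch. It assumes Python's built-in `koi8_r` codec as a stand-in for the KOI letter layout; the names `ALPHABET`, `RANK`, and `alphabetical_key` are illustrative and not part of any standard.

```python
# Uppercase Russian alphabet in dictionary order (Ё omitted for brevity).
ALPHABET = "АБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ"

# Lookup table: letter -> position in the alphabet.
RANK = {letter: i for i, letter in enumerate(ALPHABET)}


def alphabetical_key(word: str) -> list[int]:
    """Sort key that follows alphabetical order instead of byte order."""
    return [RANK[ch] for ch in word]


letters = ["В", "А", "Д", "Б", "Г"]

# Sorting by raw KOI8-R byte values follows the pseudo-Latin layout.
by_code = sorted(letters, key=lambda s: s.encode("koi8_r"))

# Sorting through the lookup table restores alphabetical order.
by_alphabet = sorted(letters, key=alphabetical_key)

print(by_code)      # ['А', 'Б', 'Д', 'Г', 'В']
print(by_alphabet)  # ['А', 'Б', 'В', 'Г', 'Д']
```

The byte-order result mirrors the Latin sequence a, b, d, g, w, which is why В (paired with w) lands last.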


ISO-IR-153
ISO-IR-153 (ST SEV 358-88) is an 8-bit character set that covers the Russian and Bulgarian alphabets. Unlike the KOI encodings, this encoding lists the Cyrillic letters in their correct traditional order. It became the basis for ISO/IEC 8859-5 and the Cyrillic Unicode block.
Standards and Naming: The name ISO-IR-153 refers to this set's number in the ISO-IR registry, and marks it as a set which may be used within ISO/IEC 2022. ISO-IR-153 is a subset of ISO/IEC 8859-5 (synchronised with ECMA-113 since 1988). The ISO-IR-153 documentation cites ST SEV 358-88 as the source standard. While it also cites the earlier GOST 19768-74 (which defines KOI-8 and was conformed to by the first version of ECMA-113, i.e. ISO-IR-111), it does not follow the KOI-8 layout (rather using a close modification of the letter layout from the Main code page), so this appears to be in error. The ISO-IR-153 encoding was intended to replace GOST 19768-74, and is sometimes referred to as GOST-19768-8 ...


KOI8-B
KOI8-B is the informal name for an 8-bit Roman / Cyrillic character set constituting the common subset of the major KOI-8 variants (KOI8-R, KOI8-U, KOI8-RU, KOI8-E, KOI8-F). Accordingly, it is closely related to KOI8-R, but defines only the letter subset in the upper half. As such, it was implemented by some font vendors for PC Unixes like Xenix in the late 1980s.
See also: KOI character encodings
External links: http://czyborra.com/charsets/koi8-b.txt.gz, http://czyborra.com/charsets/koi8-b.bdf.gz
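The idea of a "common subset" can be approximated with a short Python sketch. It compares only KOI8-R and KOI8-U, because those are the two listed variants assumed here to be available as standard-library codecs (`koi8_r`, `koi8_u`); it is not a definition of KOI8-B, and the helper name `upper_half` is illustrative.

```python
def upper_half(codec: str) -> dict[int, str]:
    """Map each byte 0x80-0xFF to the character the codec assigns to it."""
    return {b: bytes([b]).decode(codec) for b in range(0x80, 0x100)}


koi8_r = upper_half("koi8_r")
koi8_u = upper_half("koi8_u")

# Bytes on which the two variants agree, and bytes on which they differ.
common = {b: ch for b, ch in koi8_r.items() if koi8_u[b] == ch}
differing = sorted(b for b in koi8_r if b not in common)

# Keep only alphabetic characters, mirroring the "letter subset" idea.
common_letters = {b: ch for b, ch in common.items() if ch.isalpha()}

print([hex(b) for b in differing])
print("".join(common_letters[b] for b in sorted(common_letters)))
```

Extending the comparison to KOI8-RU, KOI8-E, and KOI8-F would need mapping tables from another source.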




ISO-IR-111
ISO-IR-111 or KOI8-E is an 8-bit character set. It is a multinational extension of KOI-8 for Belarusian, Macedonian, Serbian, and Ukrainian (except Ґґ, which is added to KOI8-F). The name "ISO-IR-111" refers to its registration number in the ISO-IR registry, and denotes it as a set usable with ISO/IEC 2022. It was defined by the first (1986) edition of ECMA-113, which is the Ecma International standard corresponding to ISO-8859-5, and as such also corresponds to a 1987 draft version of ISO-8859-5. The published editions of ISO-8859-5 instead correspond to subsequent editions of ECMA-113, which define a different encoding.
Naming confusion: ISO-IR-111, the 1985 edition of ECMA-113 (also called "ECMA-Cyrillic" or "KOI8-E"), was based on the 1974 edition of GOST 19768 (i.e. KOI-8). In 1987 ECMA-113 was redesigned. These newer editions of ECMA-113 are equivalent to ISO-8859-5, and do not follow the KOI layout. This confusion has led to a common misconception that ISO-8859-5 was defined in or bas ...


KOI8-R
KOI8-R (RFC 1489) is an 8-bit character encoding, derived from the KOI-8 encoding by the programmer Andrei Chernov in 1993 and designed to cover Russian, which uses a Cyrillic alphabet. KOI8-R was based on Russian Morse code, which was created from a phonetic version of Latin Morse code. As a result, Russian Cyrillic letters are in pseudo-Roman order rather than the normal Cyrillic alphabetical order. Although this may seem unnatural, if the 8th bit is stripped, the text is partially readable in ASCII and may convert to syntactically correct KOI-7. For example, "Русский Текст" in KOI8-R becomes "rUSSKIJ tEKST" ("Russian Text"). KOI8 stands for Kod Obmena Informatsiey, 8 bit (Russian: Код Обмена Информацией, 8 бит), which means "Code for Information Exchange, 8 bit". In Microsoft Windows, KOI8-R is assigned the code page number 20866. IBM assigns KOI8-R code page 878. KOI8-R also happens to cover Bulgarian, but has not ...
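The bit-stripping behaviour described above can be reproduced with a minimal Python sketch. It assumes Python's built-in `koi8_r` codec; the function name `strip_high_bit` is illustrative.

```python
def strip_high_bit(data: bytes) -> bytes:
    """Clear the 8th (most significant) bit of every byte."""
    return bytes(b & 0x7F for b in data)


encoded = "Русский Текст".encode("koi8_r")
stripped = strip_high_bit(encoded).decode("ascii")

print(stripped)  # rUSSKIJ tEKST
```

The letter case comes out inverted because uppercase Cyrillic letters sit 128 positions above lowercase Latin letters, and lowercase Cyrillic letters sit above uppercase Latin ones.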


ASCII
ASCII, abbreviated from American Standard Code for Information Interchange, is a character encoding standard for electronic communication. ASCII codes represent text in computers, telecommunications equipment, and other devices. Because of technical limitations of computer systems at the time it was invented, ASCII has just 128 code points, of which only 95 are printable characters, which severely limited its scope. Modern computer systems instead use Unicode, which has more than a million code points, the first 128 of which are the same as the ASCII set. The Internet Assigned Numbers Authority (IANA) prefers the name US-ASCII for this character encoding. ASCII is one of the IEEE milestones.
Overview: ASCII was developed from telegraph code. Its first commercial use was as a seven-bit teleprinter code promoted by Bell data services. Work on the ASCII standard began in May 1961, with the first meeting of the American Standards Association's (ASA) (now the American National Standards ...


ISO 646
ISO/IEC 646 is a set of ISO/IEC standards, described as "Information technology — ISO 7-bit coded character set for information interchange" and developed in cooperation with ASCII at least since 1964. Since its first edition in 1967 it has specified a 7-bit character code from which several national standards are derived. ISO/IEC 646 was also ratified by ECMA as ECMA-6. The first version of ECMA-6 had been published in 1965, based on work that ECMA's Technical Committee TC1 had carried out since December 1960. Characters in the ISO/IEC 646 Basic Character Set are invariant characters. Since that portion of ISO/IEC 646 (the invariant character set shared by all countries) specified only the letters of the ISO basic Latin alphabet, countries using additional letters needed to create national variants of ISO/IEC 646 to be able to use their native scripts. Since transmission and storage of 8-bit codes was not standard at the ti ...


INIS-8
INIS-8 is an 8-bit character encoding developed by the International Nuclear Information System (INIS). It is an 8-bit extension of the 7-bit INIS character set (itself a subset of ASCII), adding a G1 set, and has MIB 52. It is also known as iso-ir-50 (after the ISO 2022 registration of its G1 set) and csISO50INIS8.
ISO-IR-51: ISO-IR-51, "INIS Cyrillic Extension", is an alternative G1 set for 8-bit INIS, supporting KOI-8 encoded Russian alphabet letters at the expense of the superscript and subscript digits.
See also: INIS character set


Extended ASCII
Extended ASCII is a repertoire of character encodings that include (most of) the original 96-character ASCII set, plus up to 128 additional characters. There is no formal definition of "extended ASCII", and even use of the term is sometimes criticized, because it can be mistakenly interpreted to mean that the American National Standards Institute (ANSI) had updated its standard to include more characters, or that the term identifies a single unambiguous encoding, neither of which is the case. The ISO standard ISO 8859 was the first international standard to formalise a (limited) expansion of the ASCII character set: of the many language variants it encoded, ISO 8859-1 ("ISO Latin 1"), which supports most Western European languages, is best known in the West. There are many other extended ASCII encodings (more than 220 DOS and Windows codepages). EBCDIC ("the other" major character code) likewise developed many extended variants (more than 186 EBCDIC codepages) over the decades. ...


Request For Comments
A Request for Comments (RFC) is a publication in a series from the principal technical development and standards-setting bodies for the Internet, most prominently the Internet Engineering Task Force (IETF). An RFC is authored by individuals or groups of engineers and computer scientists in the form of a memorandum describing methods, behaviors, research, or innovations applicable to the working of the Internet and Internet-connected systems. It is submitted either for peer review or to convey new concepts, information, or, occasionally, engineering humor. The IETF adopts some of the proposals published as RFCs as Internet Standards. However, many RFCs are informational or experimental in nature and are not standards. The RFC system was invented by Steve Crocker in 1969 to help record unofficial notes on the development of ARPANET. RFCs have since become official documents of Internet specifications, communications protocols, procedures, and events. According to Crocker, the ...




Cyrillic Script In Unicode
As of Unicode version 15.0, Cyrillic script is encoded across several blocks:
* Cyrillic: U+0400–U+04FF, 256 characters
* Cyrillic Supplement: U+0500–U+052F, 48 characters
* Cyrillic Extended-A: U+2DE0–U+2DFF, 32 characters
* Cyrillic Extended-B: U+A640–U+A69F, 96 characters
* Cyrillic Extended-C: U+1C80–U+1C8F, 9 characters
* Cyrillic Extended-D: U+1E030–U+1E08F, 63 characters
* Phonetic Extensions: U+1D2B, U+1D78, 2 Cyrillic characters
* Combining Half Marks: U+FE2E–U+FE2F, 2 Cyrillic characters
The characters in the range U+0400–U+045F are basically the characters from ISO 8859-5 moved upward by 864 positions. The next characters in the Cyrillic block, range U+0460–U+0489, are historical letters, some of which are still used for Church Slavonic. The characters in the range U+048A–U+04FF and the complete Cyrillic Supplement block (U+0500–U+052F) are additional letters for various languages that are written with Cyrillic script. Two characters in the Phonetic Extensions block ...
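The "moved upward by 864 positions" relationship can be checked with a small Python sketch. It assumes Python's built-in `iso8859_5` codec; the names `OFFSET` and `mismatches` are illustrative.

```python
OFFSET = 864  # 0x360, the shift between ISO 8859-5 bytes and Unicode

# Collect the upper-half bytes whose Unicode code point is NOT byte + OFFSET.
mismatches = []
for byte in range(0xA1, 0x100):
    ch = bytes([byte]).decode("iso8859_5")
    if ord(ch) != byte + OFFSET:
        mismatches.append((hex(byte), f"U+{ord(ch):04X}"))

print(mismatches)
# [('0xad', 'U+00AD'), ('0xf0', 'U+2116'), ('0xfd', 'U+00A7')]
```

Under this codec, every Cyrillic letter follows the offset, and the three exceptions are non-letter symbols (soft hyphen, numero sign, section sign), which is consistent with the word "basically" in the description above.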


Windows-1251
Windows-1251 is an 8-bit character encoding, designed to cover languages that use the Cyrillic script, such as Russian, Ukrainian, Belarusian, Bulgarian, Serbian Cyrillic, and Macedonian. On the web, it is the second most-used single-byte character encoding (or third most-used character encoding overall), and the most used of the single-byte encodings supporting Cyrillic. 0.4% of all websites use Windows-1251. It is used mostly for Russian, though only a small minority of Russian websites use it: 93.7% of Russian (.ru) websites use UTF-8, with the legacy 8-bit encoding a distant second. In Linux, the encoding is known as cp1251. IBM uses code page 1251 (CCSID 1251 and euro sign extended CCSID 5347) for Windows-1251. Windows-1251 and KOI8-R (or its Ukrainian variant KOI8-U) are much more commonly used than ISO 8859-5 (which is used by less than 0.0004% of websites). In contrast to Windows-1252 and ISO 8859-1, Windows-1251 is not closely related to I ...