
KOI8-RU is an 8-bit character encoding designed to cover Russian, Ukrainian, and Belarusian, all of which use a Cyrillic alphabet. It is closely related to KOI8-R, which covers Russian and Bulgarian, but replaces ten box drawing characters with the five Ukrainian and Belarusian letters Ґ, Є, І, Ї, and Ў, each in upper and lower case. It is even more closely related to KOI8-U, which does not include Ў but otherwise makes the same replacements. The additional letter allocations are matched by KOI8-E, except for Ґ, which is added to KOI8-F. IBM assigns KOI8-RU code page/CCSID 1167.

KOI8 remains much more commonly used than ISO 8859-5, which never really caught on. Another common Cyrillic character encoding is Windows-1251. In the future, both may eventually give way to Unicode.

KOI8 stands for ''Kod obmena informatsiey, 8 bit'' (Russian: Код обмена информацией, 8 бит), which means "Code for Information Exchange, 8 bit". The KOI8 character sets place the Russian Cyrillic letters in pseudo-Roman order rather than in the natural Cyrillic alphabetical order used by ISO 8859-5. Although this may seem unnatural, it has a useful consequence: if the eighth bit is stripped, the text can still be read (or at least deciphered) in case-reversed transliteration on an ordinary ASCII terminal. For instance, "Русский Текст" in KOI8-RU becomes ''rUSSKIJ tEKST'' ("Russian Text") if the 8th bit is stripped.
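The bit-stripping behaviour can be illustrated with a minimal Python sketch. It assumes Python's built-in `koi8_r` codec as a stand-in, since the Russian letters sit at the same byte positions in KOI8-R and KOI8-RU; the function name `strip_high_bit` is hypothetical and chosen only for this example.

```python
def strip_high_bit(text: str, codec: str = "koi8_r") -> str:
    """Encode text in a KOI8 variant, then clear bit 7 of every byte.

    Assumption: "koi8_r" stands in for KOI8-RU here, because the Russian
    letters occupy the same byte positions in both encodings.
    """
    raw = text.encode(codec)
    # Masking with 0x7F drops the eighth bit, leaving a 7-bit ASCII byte.
    return bytes(b & 0x7F for b in raw).decode("ascii")


print(strip_high_bit("Русский Текст"))  # rUSSKIJ tEKST
```

The case reversal arises because the lower-case Cyrillic letters occupy 0xC0 to 0xDF, which map onto the ASCII upper-case range once the high bit is cleared, while the upper-case letters at 0xE0 to 0xFF map onto ASCII lower case.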


Character set

The following table shows the KOI8-RU encoding. Each character is shown with its equivalent Unicode code point. Although RFC 2319 says that character 0x95 should be U+2219 (∙), it may also be U+2022 (•) to match the bullet character in Windows-1251. Some references have a typo and incorrectly state that character 0xB4 is U+0403, rather than the correct U+0404. This typo is present in Appendix A of RFC 2319 (but the table in the main text of the RFC gives the correct mapping).
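As a rough sketch of how these mapping notes play out in practice, a KOI8-RU decoding table can be derived in Python from the built-in `koi8_u` codec. Two things here are assumptions not stated in the text above: that Ў and ў occupy bytes 0xBE and 0xAE (the positions given in commonly published KOI8-RU tables), and that KOI8-RU otherwise matches KOI8-U byte for byte. The names `KOI8_RU_DECODE`, `decode_koi8_ru`, and `windows_bullet` are hypothetical.

```python
# Start from Python's built-in KOI8-U table; every one of the 256 bytes is mapped.
_table = list(bytes(range(256)).decode("koi8_u"))

# Assumption: KOI8-RU differs from KOI8-U only by the Belarusian letter
# "short U", placed at these two byte positions.
_table[0xAE] = "\u045E"  # ў CYRILLIC SMALL LETTER SHORT U
_table[0xBE] = "\u040E"  # Ў CYRILLIC CAPITAL LETTER SHORT U

KOI8_RU_DECODE = "".join(_table)


def decode_koi8_ru(data: bytes, windows_bullet: bool = False) -> str:
    """Decode KOI8-RU bytes using the table built above.

    With windows_bullet=True, byte 0x95 yields U+2022 (the Windows-1251
    bullet) instead of U+2219, the value given by RFC 2319.
    """
    text = "".join(KOI8_RU_DECODE[b] for b in data)
    if windows_bullet:
        text = text.replace("\u2219", "\u2022")
    return text


# 0xB4 should be U+0404 (Є), not the U+0403 typo found in some references.
print(hex(ord(decode_koi8_ru(b"\xb4"))))
```

Deriving the table from an existing codec rather than typing out all 256 entries keeps the sketch short and avoids copying the Appendix A typo by hand.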


See also

* KOI character encodings

