KOI8-U (RFC 2319) is an 8-bit

character encoding Character encoding is the process of assigning numbers to graphical characters, especially the written characters of human language, allowing them to be stored, transmitted, and transformed using digital computers. The numerical values tha ...

, designed to cover

Ukrainian Ukrainian may refer to: * Something of, from, or related to Ukraine * Something relating to Ukrainians, an East Slavic people from Eastern Europe * Something relating to demographics of Ukraine in terms of demography and population of Ukraine * So ...

, which uses a

Cyrillic The Cyrillic script ( ), Slavonic script or the Slavic script, is a writing system used for various languages across Eurasia. It is the designated national script in various Slavic, Turkic, Mongolic, Uralic, Caucasian and Iranic-speaking co ...

alphabet. It is based on

KOI8-R KOI8-R (RFC 1489) is an 8-bit character encoding, derived from the KOI-8 encoding by the programmer Andrei Chernov in 1993 and designed to cover Russian, which uses a Cyrillic alphabet. KOI8-R was based on Russian Morse code, which was creat ...

, which covers Russian and Bulgarian, but replaces eight box drawing characters with four Ukrainian letters Ґ, Є, І, and Ї in both upper case and lower case.

KOI8-RU KOI8-RU is an 8-bit character encoding, designed to cover Russian, Ukrainian, and Belarusian which use a Cyrillic alphabet. It is closely related to KOI8-R, which covers Russian and Bulgarian, but replaces ten box drawing characters with five Ukra ...

is closely related, but adds Ў for Belarusian. In both, the letter allocations match those in

KOI8-E ISO-IR-111 or KOI8-E is an 8-bit character set. It is a multinational extension of KOI-8 for Belarusian, Macedonian, Serbian, and Ukrainian (except Ґґ which is added to KOI8-F). The name "ISO-IR-111" refers to its registration number in the IS ...

, except for Ґ which is added to

KOI8-F KOI8-F or KOI8 Unified is an 8-bit character set. It was designed by Peter Cassetta of Fingertip Software (now defunct) as an attempt to support all the encoded letters from both KOI8-E (ISO-IR-111) and KOI8-RU (and hence also, KOI8-U and KOI8-R ...

. In

Microsoft Windows Windows is a group of several proprietary graphical operating system families developed and marketed by Microsoft. Each family caters to a certain sector of the computing industry. For example, Windows NT for consumers, Windows Server for ...

, KOI8-U is assigned the code page number 21866. In IBM, KOI8-U is assigned code page/

CCSID A CCSID (coded character set identifier) is a 16-bit number that represents a particular encoding of a specific code page. For example, Unicode is a code page that has several encoding (so called "transformation") forms, like UTF-8, UTF-16 and U ...

1168. KOI8 remains much more commonly used than

ISO 8859-5 ISO/IEC 8859-5:1999, ''Information technology — 8-bit single-byte coded graphic character sets — Part 5: Latin/Cyrillic alphabet'', is part of the ISO/IEC 8859 series of ASCII-based standard character encodings, first edition published in 1 ...

, which never really caught on. Another common Cyrillic character encoding is

Windows-1251 Windows-1251 is an 8-bit character encoding, designed to cover languages that use the Cyrillic script such as Russian, Ukrainian, Belarusian, Bulgarian, Serbian Cyrillic, Macedonian and other languages. On the web, it is the second most-used ...

. In the future, both may eventually give way to

Unicode Unicode, formally The Unicode Standard,The formal version reference is is an information technology standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems. The standard, ...

. KOI8 stands for ''Kod Obmena Informatsiey, 8 bit'' (russian: Код Обмена Информацией, 8 бит) which means "Code for Information Exchange, 8 bit". The KOI8 character sets have the property that the Russian Cyrillic letters are in pseudo-Roman order rather than the natural Cyrillic alphabetical order as in ISO 8859-5. Although this may seem unnatural, it has the useful property that if the eighth bit is stripped, the text can still be read (or at least deciphered) in case-reversed transliteration on an ordinary ASCII terminal. For instance, "Русский Текст" in KOI8-U becomes ''rUSSKIJ tEKST'' ("Russian Text") if the 8th bit is stripped.

Character set

The following table shows the KOI8-U encoding. Each character is shown with its equivalent

code point. Although RFC 2319 says that character 0x95 should be U+2219 (∙), it may also be U+2022 (•) to match the bullet character in

. Some references have a typo and incorrectly state that character 0xB4 is U+0403, rather than the correct U+0404. This typo is present in Appendix A of RFC 2319 (but the table in the main text of the RFC gives the correct mapping).

References

External links

* * * * https://web.archive.org/web/20050206230944/http://www.net.ua/KOI8-U/ {{Character encoding Character sets

Character set

See also

References

Further reading

External links