KOI8-R (RFC 1489) is an 8-bit

character encoding Character encoding is the process of assigning numbers to Graphics, graphical character (computing), characters, especially the written characters of Language, human language, allowing them to be Data storage, stored, Data communication, transmi ...

, derived from the

KOI-8 KOI-8 (КОИ-8) is an 8-bit character set standardized in GOST 19768-74. Маркелова Л. Н. Эксплуатация программоуправляемой вычислительной машины «Искра 226». — М.: Ма ...

encoding by the programmer

Andrei Chernov Andrei Aleksandrovich Chernov (russian: Андре́й Алекса́ндрович Чернов, translit=Andréj Aleksándrovič Černóv; 27 August 1966 – 16 August 2017), also known as Andrew Chernov and Ache, was a Soviet and Russian prog ...

in 1993 and designed to cover

Russian Russian(s) refers to anything related to Russia, including: *Russians (, ''russkiye''), an ethnic group of the East Slavic peoples, primarily living in Russia and neighboring countries *Rossiyane (), Russian language term for all citizens and peo ...

, which uses a

Cyrillic , bg, кирилица , mk, кирилица , russian: кириллица , sr, ћирилица, uk, кирилиця , fam1 = Egyptian hieroglyphs , fam2 = Proto-Sinaitic , fam3 = Phoenician , fam4 = G ...

alphabet. KOI8-R was based on

Russian Morse code The Russian Morse code approximates the Morse code for the Latin alphabet. It was enacted by the Russian government in 1856. Полное собрание законов Российской Империи. Собрание Второе. Том XX ...

, which was created from a

phonetic Phonetics is a branch of linguistics that studies how humans produce and perceive sounds, or in the case of sign languages, the equivalent aspects of sign. Linguists who specialize in studying the physical properties of speech are phoneticians. ...

version of Latin

Morse code Morse code is a method used in telecommunication to encode text characters as standardized sequences of two different signal durations, called ''dots'' and ''dashes'', or ''dits'' and ''dahs''. Morse code is named after Samuel Morse, one of ...

. As a result, Russian Cyrillic letters are in pseudo-Roman order rather than the normal Cyrillic alphabetical order. Although this may seem unnatural, if the 8th bit is stripped, the text is partially readable in ASCII and may convert to syntactically correct KOI-7. For example, "Русский Текст" in KOI8-R becomes ''rUSSKIJ tEKST'' ("Russian Text"). KOI8 stands for ''Kod Obmena Informatsiey, 8 bit'' (russian: Код Обмена Информацией, 8 бит) which means "Code for Information Exchange, 8 bit". In

Microsoft Windows Windows is a group of several proprietary graphical operating system families developed and marketed by Microsoft. Each family caters to a certain sector of the computing industry. For example, Windows NT for consumers, Windows Server for serv ...

, KOI8-R is assigned the code page number 20866. In IBM, KOI8-R is assigned code page 878. KOI8-R also happens to cover

Bulgarian Bulgarian may refer to: * Something of, from, or related to the country of Bulgaria * Bulgarians, a South Slavic ethnic group * Bulgarian language, a Slavic language * Bulgarian alphabet * A citizen of Bulgaria, see Demographics of Bulgaria * Bul ...

, but has not been used for that purpose since

CP1251 Windows-1251 is an 8-bit character encoding, designed to cover languages that use the Cyrillic script such as Russian, Ukrainian, Belarusian, Bulgarian, Serbian Cyrillic, Macedonian and other languages. On the web, it is the second most-used ...

was accepted. The use of these older code pages is being replaced with

Unicode Unicode, formally The Unicode Standard,The formal version reference is is an information technology Technical standard, standard for the consistent character encoding, encoding, representation, and handling of Character (computing), text expre ...

as a more common way to represent Cyrillic together with other languages.

is preferred to

and its variants (KOI8-R, the most popular variant, is used by less than 0.004% of websites, mainly used for Russians, which prefer other encodings, and so do Bulgarians too) or other Cyrillic encodings in modern applications, especially on the Internet, making

UTF-8 UTF-8 is a variable-width encoding, variable-length character encoding used for electronic communication. Defined by the Unicode Standard, the name is derived from ''Unicode'' (or ''Universal Coded Character Set'') ''Transformation Format 8-bit'' ...

the dominant encoding for web pages. (For further discussion of Unicode's complete coverage, of 436 Cyrillic letters/code points, including for

Old Cyrillic The Early Cyrillic alphabet, also called classical Cyrillic or paleo-Cyrillic, is a writing system that was developed in the First Bulgarian Empire during the late 9th century on the basis of the Greek alphabet for the Slavic people livin ...

, and how single-byte character encodings, such as

Windows-1251 Windows-1251 is an 8-bit character encoding, designed to cover languages that use the Cyrillic script such as Russian, Ukrainian, Belarusian, Bulgarian, Serbian Cyrillic, Macedonian and other languages. On the web, it is the second most-used si ...

and KOI8 variants, cannot provide this, see

Cyrillic script in Unicode As of Unicode version 15.0 Cyrillic script is encoded across several blocks: * CyrillicU+0400–U+04FF 256 characters * Cyrillic SupplementU+0500–U+052F 48 characters * Cyrillic Extended-AU+2DE0–U+2DFF 32 characters * Cyrillic Extended-BU ...

Character set

The following table shows the KOI8-R encoding. Each character is shown with its equivalent

code point.

References

External links

Universal Cyrillic decoder
an online program that may help recovering

texts with broken KOI8-R or other

s. * * * * {{Character encoding Character sets Computing in the Soviet Union

Character set

See also

References

Further reading

External links