KOI8-R

KOI8-R (RFC 1489) is an 8-bit character encoding, derived from the KOI-8 encoding by the programmer Andrei Chernov in 1993 and designed to cover Russian, which uses a Cyrillic alphabet. KOI8-R was based on Russian Morse code, which was created from a phonetic version of Latin Morse code. As a result, Russian Cyrillic letters are in pseudo-Roman order rather than the normal Cyrillic alphabetical order. Although this may seem unnatural, it means that if the 8th bit is stripped, the text remains partially readable in ASCII and may convert to syntactically correct KOI-7. For example, "Русский Текст" in KOI8-R becomes "rUSSKIJ tEKST" ("Russian Text"). KOI8 stands for Kod Obmena Informatsiey, 8 bit (Russian: Код Обмена Информацией, 8 бит), which means "Code for Information Exchange, 8 bit". In
Microsoft Windows, KOI8-R is assigned the code page number 20866. IBM assigns KOI8-R code page 878. KOI8-R also happens to cover Bulgarian, but has not been used for that purpose since CP1251 (Windows-1251) was accepted. These older code pages are being replaced by Unicode, a more common way to represent Cyrillic together with other languages.
Unicode is preferred to KOI-8 and its variants, and to other Cyrillic encodings, in modern applications, especially on the Internet, where UTF-8 is the dominant encoding for web pages. KOI8-R, the most popular KOI-8 variant, is used by less than 0.004% of websites, mainly Russian-language ones, although Russian and Bulgarian sites alike generally prefer other encodings. (For further discussion of Unicode's complete coverage of 436 Cyrillic letters/code points, including those for Old Cyrillic, and of why single-byte character encodings such as Windows-1251 and the KOI8 variants cannot provide this, see Cyrillic script in Unicode.)
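
The 8th-bit stripping described above can be illustrated with a short Python sketch. It assumes Python's built-in koi8_r codec; the helper name strip_high_bit is hypothetical and exists only for this example.

```python
def strip_high_bit(text: str) -> str:
    """Encode text as KOI8-R, clear the 8th bit of every byte,
    and interpret the result as 7-bit ASCII."""
    koi8_bytes = text.encode("koi8_r")               # assumes the stdlib koi8_r codec
    stripped = bytes(b & 0x7F for b in koi8_bytes)   # drop the high bit
    return stripped.decode("ascii")


print(strip_high_bit("Русский Текст"))  # rUSSKIJ tEKST
```

The letter case comes out inverted because KOI8-R places the lowercase Cyrillic letters at 0xC0 to 0xDF and the uppercase letters at 0xE0 to 0xFF, so clearing the high bit lands them on uppercase and lowercase Latin letters respectively.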


Character set

The following table shows the KOI8-R encoding. Each character is shown with its equivalent Unicode code point.
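
As a rough sketch, the byte-to-code-point mapping for the upper half of the encoding (bytes 0x80 to 0xFF) can be regenerated with Python's built-in koi8_r codec, which is assumed here to follow RFC 1489. The function name koi8r_upper_half is hypothetical.

```python
import unicodedata


def koi8r_upper_half():
    """Return (byte, 'U+XXXX', Unicode name) rows for bytes 0x80-0xFF."""
    rows = []
    for byte in range(0x80, 0x100):
        ch = bytes([byte]).decode("koi8_r")   # single-byte decode via the stdlib codec
        rows.append((byte, f"U+{ord(ch):04X}", unicodedata.name(ch)))
    return rows


for byte, code_point, name in koi8r_upper_half():
    print(f"0x{byte:02X}  {code_point}  {name}")
```

Bytes 0x00 to 0x7F are omitted because they coincide with ASCII.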


See also

* KOI8-B, a derivation of KOI8-R with only the letter subset implemented
* KOI8-U, another derivative encoding which adds Ukrainian characters
* KOI character encodings
* RELCOM
* Windows-1251, another common Cyrillic character encoding




External links


Universal Cyrillic decoder, an online program that may help recover Cyrillic texts with broken KOI8-R or other character encodings.