Code page 866 (CCSID 866) (CP 866, "DOS Cyrillic Russian") is a code page used under DOS and OS/2 in Russia to write Cyrillic script.
It is based on the "alternative code page" developed in 1984 in IHNA AS USSR and published in 1986 by a research group at the Academy of Sciences of the USSR. [Брябрин В. М., Ландау И. Я., Неменман М. Е. О системе кодирования для персональных ЭВМ = On the encoding system for personal computers // Микропроцессорные средства и системы. 1986. № 4. С. 61–64.] The code page was widely used during the DOS era because it preserves all of the pseudographic symbols of code page 437 (unlike the "Main code page" or Code page 855) and maintains the alphabetic order of Cyrillic letters, although not contiguously (unlike KOI8-R). Initially this encoding was only available in the Russian version of MS-DOS 4.01 (1990), but with MS-DOS 6.22 it became available in any language version.
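The two layout properties described above can be checked with the `cp866` and `cp437` codecs that ship with Python. This is only a minimal sketch: the byte ranges are assumptions taken from the usual code page 866 layout, and the variable names are illustrative.

```python
# Box-drawing and block characters are assumed to occupy bytes 0xB0-0xDF
# in both code pages.
pseudographics = bytes(range(0xB0, 0xE0))
same_pseudographics = (
    pseudographics.decode("cp866") == pseudographics.decode("cp437")
)

# The basic Cyrillic letters sit in two separate runs: 0x80-0xAF and 0xE0-0xEF.
cyrillic = (bytes(range(0x80, 0xB0)) + bytes(range(0xE0, 0xF0))).decode("cp866")

# Byte order follows alphabetical (Unicode) order even across the gap.
in_alphabetic_order = list(cyrillic) == sorted(cyrillic)

print(same_pseudographics, in_alphabetic_order)
print(cyrillic)
```

The gap between the two Cyrillic runs is exactly the block that code page 437 uses for pseudographics, which is why the alphabet cannot be contiguous.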
The WHATWG Encoding Standard, which specifies the character encodings permitted in HTML5 that compliant browsers must support, includes Code page 866. It is the only single-byte encoding listed which is not named as an ISO 8859 part, a Mac OS specific encoding, a Microsoft Windows specific encoding (Windows-874 or Windows-125x) or a KOI-8 variant. Authors of new pages and the designers of new protocols are instructed to use UTF-8 instead.
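As an illustration of moving legacy data to UTF-8, the following sketch decodes a code page 866 byte string with Python's built-in `cp866` codec and re-encodes it. The sample bytes and variable names are made up for this example.

```python
# "Привет" as it would be stored in a code page 866 file (sample data).
legacy = b"\x8f\xe0\xa8\xa2\xa5\xe2"

# Decode the single-byte legacy encoding into a Unicode string...
text = legacy.decode("cp866")

# ...and re-encode it as UTF-8 for new pages or protocols.
utf8 = text.encode("utf-8")

print(text)
print(len(legacy), len(utf8))
```

Each Cyrillic letter takes one byte in code page 866 and two bytes in UTF-8, so the converted data is larger but no longer depends on the reader guessing the code page.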
A number of variants were used in different Russian territories that had slightly different sets of characters.
Character set
Each non-ASCII character is shown with its equivalent Unicode code point. The first half (code points 0–127) of this table is the same as that of code page 437.
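A listing of the upper half with Unicode code points can be regenerated from Python's `cp866` codec. This is a sketch rather than an authoritative table; the helper name `cp866_upper_half` is hypothetical, and Python's codecs map bytes 0–31 to control characters rather than to the glyphs DOS displayed for them.

```python
def cp866_upper_half():
    """Return (byte, character, 'U+XXXX') triples for bytes 0x80-0xFF."""
    rows = []
    for byte in range(0x80, 0x100):
        ch = bytes([byte]).decode("cp866")
        rows.append((byte, ch, f"U+{ord(ch):04X}"))
    return rows


rows = cp866_upper_half()
for byte, ch, code_point in rows:
    print(f"0x{byte:02X}  {ch}  {code_point}")

# The lower half (0-127) decodes identically under both codecs.
lower = bytes(range(0x80))
lower_half_same = lower.decode("cp866") == lower.decode("cp437")
```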
Variants
There existed a few variants of the code page, but the differences were mostly in the last 16 code points (240–255).
Alternative code page
The original version of the code page by Bryabrin et al. (1986) is called the "Alternative code page", to distinguish it from the "Main code page" by the same authors. It supports only Russian and Bulgarian. It is mostly the same as code page 866, except for codes 0xF2 through 0xF7 (which code page 866 changes to Ukrainian and Belarusian letters) and codes 0xF8 through 0xFB (where code page 866 matches code page 437 instead). The differing row is shown below.
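Python ships no codec for the Alternative code page, so only the code page 866 side of this comparison can be checked directly. The sketch below, with illustrative variable names, lists the letters code page 866 places at 0xF2 through 0xF7 and finds which bytes of the last row agree with code page 437.

```python
# Letters that code page 866 assigns to 0xF2-0xF7.
added_letters = bytes(range(0xF2, 0xF8)).decode("cp866")

# Bytes in the last row (0xF0-0xFF) where cp866 and cp437 agree.
matches_cp437 = [
    byte
    for byte in range(0xF0, 0x100)
    if bytes([byte]).decode("cp866") == bytes([byte]).decode("cp437")
]

print(added_letters)
print([f"0x{byte:02X}" for byte in matches_cp437])
```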
Modified code page 866
An unofficial variant has code points 240–255 identical to code page 437, except that the letters Ё and ё are usually placed at 240 and 241. This version supports only Russian and Bulgarian. The differing row is shown below.
GOST R 34.303-92
The GOST R 34.303-92 standard [ГОСТ Р 34.303-92. Наборы 8-битных кодированных символов. 8-битный код обмена и обработки информации = 8-bit coded character sets. 8-bit code for information interchange] defines two variants, KOI-8 N1 and KOI-8 N2. These are not to be confused with the KOI-8 encoding, which they do not adhere to.
KOI-8 N2
KOI-8 N2 is the more extensive variant and matches code page 866 and the Alternative code page except for the last row or ''stick''. For this last row, it supports letters for Belarusian and Ukrainian in addition to Russian, but in a layout unrelated to code page 866 or 1125. Notably the Russian Ё/ё (which was unchanged between the Alternative code page and code page 866) is also in a different location. KOI-8 N2's final stick is shown below.
KOI-8 N1
The other variant, KOI-8 N1, is a subset of KOI-8 N2 which omits the non-Russian Cyrillic letters and the mixed single/double lined box-drawing characters, leaving them empty for further internationalization (compare with code page 850). The affected sticks are shown below.
Lithuanian variants
KBL
The ''KBL'' code page, unofficially known as Code page 771, is the earliest DOS character encoding for Lithuanian. It mostly matches code page 866 and the Alternative code page, but replaces the last row and some block characters with letters from the Lithuanian alphabet not otherwise present in ASCII. The Russian Ё/ё is not supported, similarly to KOI-7.
A modified version, Code page 773, which replaces the Cyrillic letters with Latvian and Estonian letters, also exists.
LST 1284
Lithuanian Standard LST 1284:1993, known as Code page 1119 or unofficially as Code page 772, mostly matches the "modified" Code page 866, except for the addition of quotation marks in the last row and the replacement of the mixed single-double box-drawing characters with Lithuanian letters (compare code page 850). Unlike KBL, the Russian Ё/ё is retained.
It accompanies LST 1283 (Code page 774/1118), which encodes the additional Lithuanian letters at the same locations as LST 1284, but is based on Code page 437 instead. It was later superseded by LST 1590-1 (Code page 775), which encodes these Lithuanian letters in the same locations, but does not include Cyrillic letters, replacing them with Latvian and Estonian letters.
Ukrainian and Belarusian variants
Ukrainian standard RST 2018-91 is designated by IBM as Code page 1125 (CCSID 1125), abbreviated CP1125, and also known as CP866U, CP866NAV or RUSCII. It matches the original Alternative code page for all points except for 0xF2 through 0xF9 inclusive, which are replaced with Ukrainian letters.
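Recent Python versions include a `cp1125` codec, so the affected range can be located programmatically. Note the assumption: this sketch compares code page 1125 against code page 866, not against the Alternative code page described above, because Python has no codec for the latter. Variable names are illustrative.

```python
# Collect every byte whose decoded character differs between the two codecs.
differing = [
    byte
    for byte in range(256)
    if bytes([byte]).decode("cp1125") != bytes([byte]).decode("cp866")
]

for byte in differing:
    raw = bytes([byte])
    print(f"0x{byte:02X}  cp866={raw.decode('cp866')}  cp1125={raw.decode('cp1125')}")
```

A comparison like this is a quick way to see which bytes are unsafe when a file of unknown origin might be in either encoding.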
Code page/CCSID 1131 matches code page 866 for all points except for 0xF8, 0xF9, and 0xFC through 0xFE inclusive, which are replaced with otherwise-missing Ukrainian and Belarusian letters, in the process displacing the bullet character (∙) from 0xF9 to 0xFE. The differing rows are shown below.
In addition, the so-called CP 866ukr code page is a modified version of CP866 that replaces Ўў with Іі. Unlike CP1125, it maintains full compatibility of Ukrainian letters with CP866, although Ґґ is missing. It is not included in the standard Windows distributions, but some users install a home-made patch that allows this encoding to be used in command-line programs (such as FAR Manager) with filenames containing the Cyrillic Іі.
Hryvnia variants
FreeDOS code page 30040 is a variant of code page 866 which replaces the currency sign (¤) at byte 0xFD with the hryvnia sign (₴, U+20B4).
FreeDOS code page 30039 is a variant of code page 1125 which makes the same replacement.
Euro sign updates
IBM code page/CCSID 808 is a variant of code page/CCSID 866 with the euro sign (€, U+20AC) in position 0xFD, replacing the universal currency sign (¤).
IBM code page/CCSID 848 is a variant of code page/CCSID 1125 with the euro sign at 0xFD, replacing ¤.
IBM code page/CCSID 849 is a variant of code page/CCSID 1131 with the euro sign at 0xFB, replacing ¤.
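Single-position variants such as the euro and hryvnia updates can be modelled as a base code page plus a small override table. The sketch below is an assumption-laden illustration, not an official mapping: the helpers `build_table`, `decode` and `encode` and the `*_SKETCH` names are hypothetical, and the tables are derived from Python's `cp866` and `cp1125` codecs using only the replacements stated in the text.

```python
def build_table(base_codec, overrides):
    """Build a 256-character decoding table from a base codec plus overrides."""
    table = [bytes([byte]).decode(base_codec) for byte in range(256)]
    for byte, ch in overrides.items():
        table[byte] = ch
    return "".join(table)


def decode(data, table):
    """Decode bytes by looking each one up in the table."""
    return "".join(table[byte] for byte in data)


def encode(text, table):
    """Encode text; raises KeyError for characters missing from the table."""
    reverse = {ch: byte for byte, ch in enumerate(table)}
    return bytes(reverse[ch] for ch in text)


# Euro sign at 0xFD on top of code page 866 (as described for CCSID 808).
CP808_SKETCH = build_table("cp866", {0xFD: "\u20ac"})
# Euro sign at 0xFD on top of code page 1125 (as described for CCSID 848).
CP848_SKETCH = build_table("cp1125", {0xFD: "\u20ac"})
# Hryvnia sign at 0xFD on top of code page 866 (FreeDOS code page 30040).
FREEDOS_30040_SKETCH = build_table("cp866", {0xFD: "\u20b4"})

print(decode(b"100 \xfd", CP808_SKETCH))
print(decode(b"100 \xfd", FREEDOS_30040_SKETCH))
```

Keeping the override separate from the base table makes it clear that these variants differ from their parent code page in a single byte.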
Lehner–Czech modification
An unofficial modification used in software developed by Michael Lehner and Peter R. Czech. It replaces three mathematical symbols with guillemets and the section sign, which are commonly used in the Russian language. (Lehner and Czech created a number of alternative character sets for other European languages as well, including one based on CWI-2 for Hungarian, a Kamenicky-based one for Czech and Slovak, a Mazovia variant for Polish and a seemingly unique encoding for Lithuanian.) The modified row is shown below.
Latvian variant
A Latvian variant, supported by Star printers and FreeDOS, is code page 3012 (earlier FreeDOS called it code page 61282). This encoding is nicknamed "RusLat".
FreeDOS
FreeDOS provides additional unofficial extensions of code page 866 for various non-Slavic languages:
* 30002 – Cyrillic Tajik
* 30008 – Cyrillic Abkhaz and Ossetian
* 30010 – Cyrillic Gagauz and Moldovan
* 30011 – Cyrillic Russian Southern District (Kalmyk, Karachay-Balkar, Ossetian, North Caucasian)
* 30012 – Cyrillic Russian Siberian and Far Eastern Districts (Altai, Buryat, Khakas, Tuvan, Yakut, Tungusic, Paleo-Siberian)
* 30013 – Cyrillic Volga District – Turkic languages (Bashkir, Chuvash, Tatar)
* 30014 – Cyrillic Volga District – Finno-Ugric languages (Mari, Udmurt)
* 30015 – Cyrillic Khanty
* 30016 – Cyrillic Mansi
* 30017 – Cyrillic Northwestern District (Cyrillic Nenets, Latin Karelian, Latin Veps)
* 30018 – Latin Tatar and Cyrillic Russian
* 30019 – Latin Chechen and Cyrillic Russian
* 58152 – Cyrillic Kazakh with euro
* 58210 – Cyrillic Azeri
* 59234 – Cyrillic Tatar
* 60258 – Latin Azeri and Cyrillic Russian
* 62306 – Cyrillic Uzbek
Code page 900
Before Microsoft's final code page for Russian MS-DOS 4.01 was registered with IBM by Franz Rau of Microsoft as CP866 in January 1990, draft versions of it developed by Yuri Starikov (Юрий Стариков) of Dialogue were still called code page 900 internally. The documentation was corrected to reflect the new name before the product was released, but sketches of earlier drafts were published in the Russian press in 1990. These drafts were still named code page 900 and lacked the Ukrainian and Belarusian letters, which had been added in autumn 1989.
Code page 900 slipped through into the distribution of the Russian MS-DOS 5.0 LCD.CPI codepage information file.