Code page 866-Latvian

Code page 866 (CCSID 866) (CP 866, "DOS Cyrillic Russian") is a code page used under DOS and OS/2 in Russia to write Cyrillic script. It is based on the "alternative code page" (Russian: Альтернативная кодировка), developed in 1984 at IHNA AS USSR and published in 1986 by a research group at the Academy of Sciences of the USSR (Bryabrin V. M., Landau I. Ya., Nemenman M. E. "On a coding system for personal computers", Mikroprotsessornye sredstva i sistemy [Microprocessor Tools and Systems], 1986, no. 4, pp. 61–64).
The code page was widely used during the DOS era because it preserves all of the pseudographic (box-drawing) symbols of code page 437 (unlike the "Main code page" or code page 855) and keeps the Cyrillic letters in alphabetical order, although not in one contiguous block (unlike KOI8-R). Initially, this encoding was available only in the Russian version of MS-DOS 4.01 (1990); since MS-DOS 6.22 it has been available in every language version. The WHATWG Encoding Standard, which specifies the character encodings permitted in HTML5 that compliant browsers must support, includes code page 866. It is the only single-byte encoding listed that is not an ISO 8859 part, a Mac OS-specific encoding, a Microsoft Windows-specific encoding (Windows-874 or Windows-125x) or a KOI-8 variant. Authors of new pages and designers of new protocols are instructed to use UTF-8 instead. Two very similar, though not identical, encodings are standardised in GOST R 34.303-92 ("8-bit coded character sets. 8-bit code for information interchange and processing") as KOI-8 N1 and KOI-8 N2 (not to be confused with the original KOI-8).
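
The ordering claim can be checked with Python's built-in `cp866` and `koi8_r` codecs. The following is a minimal sketch, not part of the original description: it sorts a few sample words by their raw encoded bytes, which is what a naive byte-oriented sort would do, and shows where the lowercase alphabet jumps over the pseudographic block. The helper `sort_by_encoded_bytes` and the sample words are illustrative choices; the alphabet string deliberately leaves out ё, which code page 866 keeps outside the main run of letters.

```python
# Sketch: compare byte order with alphabetical order using Python's
# built-in "cp866" and "koi8_r" codecs. Sorting bytes objects compares
# raw byte values, as a naive byte-oriented sort would.

def sort_by_encoded_bytes(words, encoding):
    """Sort words by their encoded byte values in `encoding`."""
    return sorted(words, key=lambda w: w.encode(encoding))

words = ["юла", "бар", "абв", "рыба", "пол"]
print(sort_by_encoded_bytes(words, "cp866"))   # alphabetical
print(sort_by_encoded_bytes(words, "koi8_r"))  # "юла" comes first

# The 32 lowercase letters (without ё) keep their order in code page 866
# but split into two runs, 0xA0-0xAF (а..п) and 0xE0-0xEF (р..я), with
# shading, box-drawing and block characters in 0xB0-0xDF between them.
alphabet = "абвгдежзийклмнопрстуфхцчшщъыьэюя"
codes = list(alphabet.encode("cp866"))
print(hex(codes[15]), hex(codes[16]))  # 0xaf 0xe0
```

Under code page 866 the words come out in dictionary order; under KOI8-R, "юла" sorts first because ю occupies the first lowercase position (0xC0) in that encoding.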


Character set

Each character is shown with its equivalent Unicode code point. Only the second half of the table (code points 128–255) is shown; the first half (code points 0–127) is the same as code page 437.
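
The upper half of the table can be regenerated from Python's built-in `cp866` codec, assuming Python's codec tables match the code page as described. One caveat applies to the lower-half comparison: Python decodes bytes 0x00–0x1F and 0x7F as control characters in both `cp866` and `cp437`, not as the graphic glyphs DOS displayed, so the check only confirms that the two codecs agree. `upper_half_table` is a hypothetical helper name.

```python
# Sketch: regenerate the upper half (0x80-0xFF) of code page 866 from
# Python's built-in "cp866" codec and check that the lower half matches
# the built-in "cp437" codec. Python decodes 0x00-0x1F and 0x7F as
# control characters in both codecs, not as DOS graphic glyphs.

lower_866 = bytes(range(0x80)).decode("cp866")
lower_437 = bytes(range(0x80)).decode("cp437")
print(lower_866 == lower_437)  # True

def upper_half_table(encoding="cp866"):
    """Return (byte, character, "U+XXXX") triples for bytes 0x80-0xFF."""
    rows = []
    for b in range(0x80, 0x100):
        ch = bytes([b]).decode(encoding)
        rows.append((b, ch, f"U+{ord(ch):04X}"))
    return rows

# Print the table as 8 rows of 16 cells, one row per high nibble.
table = upper_half_table()
for start in range(0, len(table), 16):
    cells = [f"{ch} {cp}" for _, ch, cp in table[start:start + 16]]
    print(f"{table[start][0]:02X}: " + "  ".join(cells))
```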


Variants

Several variants of the code page existed, but they differed mostly in the last 16 code points (240–255, or F0 through FF hex).


Alternative code page

The original version of the code page by Bryabrin et al. (1986) is called the "Alternative code page" (Russian: Альтернативная кодировка), to distinguish it from the "Main code page" (Russian: Основная кодировка) by the same authors. It supports only Russian and Bulgarian. It is mostly the same as code page 866, except for codes F2 through F7 (hex), which code page 866 changes to Ukrainian and Belarusian letters, and codes F8 through FB (hex), where code page 866 matches code page 437 instead. The differing row is shown below.


Modified code page 866

An unofficial variant with code points 240–255 identical to code page 437, except that the letters Ё and ё are usually placed at 240 and 241. This version supports only Russian and Bulgarian. The differing row is shown below.
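
Python has no codec for this unofficial variant, but its layout, as described above, can be assembled from the built-in `cp866` and `cp437` codecs. This is a hypothetical sketch that assumes the usual placement of Ё and ё at 240 and 241; the names `MODIFIED_866` and `decode_modified_866` are made up for illustration.

```python
# Hypothetical sketch of the unofficial "modified code page 866":
# bytes 0x00-0xEF follow code page 866, bytes 0xF0-0xFF follow code
# page 437, and Ё/ё sit at 240/241 (0xF0/0xF1), the usual but not
# universal arrangement. MODIFIED_866 is an illustrative name only.

CP866 = bytes(range(256)).decode("cp866")
CP437 = bytes(range(256)).decode("cp437")

table = list(CP866[:0xF0]) + list(CP437[0xF0:])
table[0xF0] = "\u0401"  # Ё
table[0xF1] = "\u0451"  # ё
MODIFIED_866 = "".join(table)

def decode_modified_866(data: bytes) -> str:
    """Decode bytes with the hand-built table (every byte is mapped)."""
    return "".join(MODIFIED_866[b] for b in data)

# Byte positions where the modified variant disagrees with code page 866.
diff = [b for b in range(256) if MODIFIED_866[b] != CP866[b]]
print([f"{b:02X}" for b in diff])
```

With CPython's codec tables the differences should fall at F2 through F7 (hex), where code page 866 has Ukrainian and Belarusian letters, and at FC and FD, where code page 866 has № and ¤.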


Lithuanian variants


KBL

The KBL code page, unofficially known as Code page 771, is the earliest DOS character encoding for Lithuanian. It mostly matches code page 866 and the Alternative code page, but replaces the last row and some block characters with the letters of the Lithuanian alphabet that are not otherwise present in ASCII. As in KOI-7, the Russian Ё/ё is not supported. A modified version, Code page 773, which replaces the Cyrillic letters with Latvian and Estonian letters, also exists.


LST 1284

Lithuanian Standard LST 1284:1993, known as Code page 1119 or unofficially as Code page 772, mostly matches the "modified" code page 866, except for the addition of quotation marks in the last row and the replacement of the mixed single/double box-drawing characters with Lithuanian letters (compare code page 850). Unlike KBL, it retains the Russian Ё/ё. It accompanies LST 1283 (Code page 774/1118), which encodes the additional Lithuanian letters at the same locations as LST 1284 but is based on code page 437 instead. It was later superseded by LST 1590-1 (Code page 775), which encodes these Lithuanian letters in the same locations but does not include Cyrillic letters, replacing them with Latvian and Estonian letters.
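
Python's standard library has no codec for LST 1284 (Code page 1119/772), but it does ship `cp775`, so the statement that LST 1590-1 drops the Cyrillic letters can be checked directly. A minimal sketch, treating the Unicode Cyrillic block U+0400–U+04FF as the definition of "Cyrillic letter"; `cyrillic_count` is an illustrative helper name.

```python
# Sketch: count byte values that decode to the Unicode Cyrillic block
# (U+0400-U+04FF) in Python's built-in "cp866" and "cp775" codecs, and
# confirm that the lowercase Lithuanian letters are encodable in cp775.

def cyrillic_count(encoding: str) -> int:
    """Number of byte values that decode to a Cyrillic-block character."""
    # Undefined bytes (if any) become U+FFFD and are not counted.
    chars = bytes(range(256)).decode(encoding, errors="replace")
    return sum(1 for ch in chars if 0x0400 <= ord(ch) <= 0x04FF)

print(cyrillic_count("cp866"))  # 72
print(cyrillic_count("cp775"))  # 0

lithuanian = "ąčęėįšųūž"
print(lithuanian.encode("cp775").hex(" "))
```

The 72 Cyrillic characters in code page 866 are the 64 basic Russian letters plus Ё, ё, Є, є, Ї, ї, Ў and ў in the last row.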


Ukrainian and Belarusian variants

Ukrainian standard RST 2018-91 is designated by IBM as Code page 1125 (CCSID 1125), abbreviated CP1125, and is also known as CP866U, CP866NAV or RUSCII. It matches the original Alternative code page at all points except F2 through F9 (hex) inclusive, which are replaced with Ukrainian letters. Code page/CCSID 1131 matches code page 866 at all points except F8, F9 and FC through FE (hex) inclusive, which are replaced with otherwise-missing Ukrainian and Belarusian letters, in the process displacing the bullet character (∙) from F9 to FE. The differing rows are shown below.
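
Python's standard library includes a `cp1125` codec but none for the original Alternative code page or for CP1131, so a quick way to see where these DOS Cyrillic variants diverge is to diff CP1125 against CP866 byte by byte. The sketch below assumes Python's codec tables are faithful to the IBM definitions; `differing_bytes` is an illustrative helper name.

```python
# Sketch: list the byte values where two code pages disagree, using
# Python's built-in "cp866" and "cp1125" codecs. There is no built-in
# codec for the Alternative code page or CP1131, so CP1125 is compared
# with CP866 instead.
from typing import List

def differing_bytes(enc_a: str, enc_b: str) -> List[int]:
    """Byte values 0-255 that decode differently in enc_a and enc_b."""
    all_bytes = bytes(range(256))
    a = all_bytes.decode(enc_a)
    b = all_bytes.decode(enc_b)
    return [i for i in range(256) if a[i] != b[i]]

for i in differing_bytes("cp866", "cp1125"):
    old = bytes([i]).decode("cp866")
    new = bytes([i]).decode("cp1125")
    print(f"{i:02X}: U+{ord(old):04X} {old} -> U+{ord(new):04X} {new}")
```

With CPython's tables this should report the eight bytes F2 through F9 (hex): CP1125 places Ґ ґ Є є І і Ї ї there, where CP866 has Є є Ї ї Ў ў ° ∙.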


Euro sign updates

IBM code page/CCSID 808 is a variant of code page/CCSID 866 with the euro sign (€, U+20AC) at position FD (hex), replacing the universal currency sign (¤). IBM code page/CCSID 848 is a variant of code page/CCSID 1125 with the euro sign at FD (hex), replacing ¤. IBM code page/CCSID 849 is a variant of code page/CCSID 1131 with the euro sign at FB (hex), replacing ¤.
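
Each euro update is a single-byte substitution, so it can be derived mechanically from the base table. Python's standard library has no `cp808` or `cp848` codec, so the hypothetical sketch below builds the tables from the built-in `cp866` and `cp1125` codecs; CCSID 849 is left out because there is no built-in CP1131 codec to start from. `euro_variant` and `make_codec` are illustrative names, not a library API.

```python
# Hypothetical sketch: derive IBM's euro updates by swapping ¤ (U+00A4)
# for € (U+20AC) at one byte position of an existing decoding table.
# euro_variant and make_codec are illustrative names, not a library API.

EURO = "\u20ac"
CURRENCY_SIGN = "\u00a4"

def euro_variant(base_encoding: str, position: int) -> str:
    """Return a 256-character decoding table with € at `position`."""
    table = bytes(range(256)).decode(base_encoding)
    if table[position] != CURRENCY_SIGN:
        raise ValueError(f"{base_encoding} has no ¤ at 0x{position:02X}")
    return table[:position] + EURO + table[position + 1:]

def make_codec(table: str):
    """Build simple decode/encode functions from a decoding table."""
    reverse = {ch: i for i, ch in enumerate(table)}

    def decode(data: bytes) -> str:
        return "".join(table[b] for b in data)

    def encode(text: str) -> bytes:
        # Raises KeyError for characters the table cannot represent.
        return bytes(reverse[ch] for ch in text)

    return decode, encode

CP808 = euro_variant("cp866", 0xFD)
CP848 = euro_variant("cp1125", 0xFD)
decode_808, encode_808 = make_codec(CP808)

price = encode_808("Цена: 10 €")
print(price)
print(decode_808(price))
```

Because ¤ is removed from the CP808 table, text containing ¤ can no longer be encoded with it, mirroring the replacement in the real code page.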


GOST R 34.303-92

The GOST R 34.303-92 standard defines two variants. The more extensive variant, KOI-8 N2 (not to be confused with the KOI-8 encoding, which it does not follow), matches code page 866 and the Alternative code page up to the last row (codes 240 through 255, or F0 through FF hex). In the last row, it supports letters for Belarusian and Ukrainian in addition to Russian, but in a layout unrelated to code page 866 or 1125. Notably, even the Russian Ё/ё (which was unchanged between the Alternative code page and code page 866) is in a different location. The differing row is shown below. The other variant, KOI-8 N1, is a subset of KOI-8 N2 that omits the non-Russian Cyrillic letters and the mixed single/double-lined box-drawing characters, leaving those positions empty for further internationalization (compare code page 850). The affected rows are shown below.


Lehner–Czech modification

This is an unofficial modification used in software developed by Michael Lehner and Peter R. Czech. It replaces three mathematical symbols with guillemets and the section sign, which are commonly used in Russian. (Lehner and Czech also created a number of alternative character sets for other European languages, including one based on CWI-2 for Hungarian, a Kamenický-based one for Czech and Slovak, a Mazovia variant for Polish and a seemingly unique encoding for Lithuanian.) The modified row is shown below.


Latvian variant

A Latvian variant, supported by Star printers and FreeDOS, is code page 3012. This encoding is nicknamed "RusLat".


FreeDOS

FreeDOS provides additional unofficial extensions of code page 866 for various non-Slavic languages:
* 30002 – Cyrillic Tajik
* 30008 – Cyrillic Abkhaz and Ossetian
* 30010 – Cyrillic Gagauz and Moldovan
* 30011 – Cyrillic Russian Southern District (Kalmyk, Karachay-Balkar, Ossetian, North Caucasian)
* 30012 – Cyrillic Russian Siberian and Far Eastern Districts (Altai, Buryat, Khakas, Tuvan, Yakut, Tungusic, Paleo-Siberian)
* 30013 – Cyrillic Volga District – Turkic languages (Bashkir, Chuvash, Tatar)
* 30014 – Cyrillic Volga District – Finno-Ugric languages (Mari, Udmurt)
* 30015 – Cyrillic Khanty
* 30016 – Cyrillic Mansi
* 30017 – Cyrillic Northwestern District (Cyrillic Nenets, Latin Karelian, Latin Veps)
* 30018 – Latin Tatar and Cyrillic Russian
* 30019 – Latin Chechen and Cyrillic Russian
* 58152 – Cyrillic Kazakh with euro
* 58210 – Cyrillic Azeri
* 59234 – Cyrillic Tatar
* 60258 – Latin Azeri and Cyrillic Russian
* 62306 – Cyrillic Uzbek


Code page 900

Before Microsoft's final code page for Russian MS-DOS 4.01 was registered with IBM as CP866 by Franz Rau of Microsoft in January 1990, draft versions of it, developed by Yuri Starikov (Юрий Стариков) of Dialogue, were still called code page 900 internally. Although the documentation was corrected to reflect the new name before the product was released, sketches of earlier draft versions, still named code page 900 and lacking the Ukrainian and Belarusian letters that had been added in autumn 1989, were published in the Russian press in 1990. Code page 900 also slipped into the LCD.CPI code page information file distributed with Russian MS-DOS 5.0.

