Code page 1251
   HOME

TheInfoList



OR:

Windows-1251 is an 8-bit
character encoding Character encoding is the process of assigning numbers to Graphics, graphical character (computing), characters, especially the written characters of Language, human language, allowing them to be Data storage, stored, Data communication, transmi ...
, designed to cover languages that use the
Cyrillic script The Cyrillic script ( ), Slavonic script or the Slavic script, is a writing system used for various languages across Eurasia. It is the designated national script in various Slavic, Turkic, Mongolic, Uralic, Caucasian and Iranic-speaking co ...
such as
Russian Russian(s) refers to anything related to Russia, including: *Russians (, ''russkiye''), an ethnic group of the East Slavic peoples, primarily living in Russia and neighboring countries *Rossiyane (), Russian language term for all citizens and peo ...
,
Ukrainian Ukrainian may refer to: * Something of, from, or related to Ukraine * Something relating to Ukrainians, an East Slavic people from Eastern Europe * Something relating to demographics of Ukraine in terms of demography and population of Ukraine * So ...
, Belarusian,
Bulgarian Bulgarian may refer to: * Something of, from, or related to the country of Bulgaria * Bulgarians, a South Slavic ethnic group * Bulgarian language, a Slavic language * Bulgarian alphabet * A citizen of Bulgaria, see Demographics of Bulgaria * Bul ...
,
Serbian Cyrillic The Serbian Cyrillic alphabet ( sr, / , ) is a variation of the Cyrillic script used to write the Serbian language, updated in 1818 by Serbian linguist Vuk Karadžić. It is one of the two alphabets used to write standard modern Serbian, t ...
, Macedonian and other languages. On the web, it is the second most-used single-byte character encoding (or third most-used character encoding overall), and most used of the single-byte encodings supporting Cyrillic. , 0.4% of all
website A website (also written as a web site) is a collection of web pages and related content that is identified by a common domain name and published on at least one web server. Examples of notable websites are Google, Facebook, Amazon, and Wi ...
s use Windows-1251. It's by far mostly used for Russian, while a small minority of Russian websites use it, with 93.7% of Russian (.ru) websites using
UTF-8 UTF-8 is a variable-length character encoding used for electronic communication. Defined by the Unicode Standard, the name is derived from ''Unicode'' (or ''Universal Coded Character Set'') ''Transformation Format 8-bit''. UTF-8 is capable of ...
, and the legacy 8-bit encoding is distant second. In Linux, the encoding is known as cp1251. IBM uses code page 1251 (
CCSID A CCSID (coded character set identifier) is a 16-bit number that represents a particular encoding of a specific code page. For example, Unicode is a code page that has several encoding (so called "transformation") forms, like UTF-8, UTF-16 and U ...
1251 and euro sign extended CCSID 5347) for Windows-1251. Windows-1251 and
KOI8-R KOI8-R (RFC 1489) is an 8-bit character encoding, derived from the KOI-8 encoding by the programmer Andrei Chernov in 1993 and designed to cover Russian, which uses a Cyrillic alphabet. KOI8-R was based on Russian Morse code, which was created ...
(or its
Ukrainian Ukrainian may refer to: * Something of, from, or related to Ukraine * Something relating to Ukrainians, an East Slavic people from Eastern Europe * Something relating to demographics of Ukraine in terms of demography and population of Ukraine * So ...
variant
KOI8-U KOI8-U (RFC 2319) is an 8-bit character encoding, designed to cover Ukrainian, which uses a Cyrillic alphabet. It is based on KOI8-R, which covers Russian and Bulgarian, but replaces eight box drawing characters with four Ukrainian letters Ґ ...
) are much more commonly used than
ISO 8859-5 ISO/IEC 8859-5:1999, ''Information technology — 8-bit single-byte coded graphic character sets — Part 5: Latin/Cyrillic alphabet'', is part of the ISO/IEC 8859 series of ASCII-based standard character encodings, first edition published in 198 ...
(which is used by less than 0.0004% of websites). In contrast to
Windows-1252 Windows-1252 or CP-1252 ( code page 1252) is a single-byte character encoding of the Latin alphabet, used by default in the legacy components of Microsoft Windows for English and many European languages including Spanish, French, and German. I ...
and ISO 8859-1, Windows-1251 is not closely related to ISO 8859-5.
Unicode Unicode, formally The Unicode Standard,The formal version reference is is an information technology standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems. The standard, wh ...
(e.g. UTF-8) is preferred to Windows-1251 or other Cyrillic encodings in modern applications, especially on the Internet, making UTF-8 the dominant encoding for web pages. (For further discussion of Unicode's complete coverage, of 436 Cyrillic letters/code points, including for
Old Cyrillic The Early Cyrillic alphabet, also called classical Cyrillic or paleo-Cyrillic, is a writing system that was developed in the First Bulgarian Empire during the late 9th century on the basis of the Greek alphabet for the Slavic people living ...
, and how single-byte character encodings, such as Windows-1251 and
KOI8-R KOI8-R (RFC 1489) is an 8-bit character encoding, derived from the KOI-8 encoding by the programmer Andrei Chernov in 1993 and designed to cover Russian, which uses a Cyrillic alphabet. KOI8-R was based on Russian Morse code, which was created ...
, cannot provide this, see
Cyrillic script in Unicode As of Unicode version 15.0 Cyrillic script is encoded across several blocks: * CyrillicU+0400–U+04FF 256 characters * Cyrillic SupplementU+0500–U+052F 48 characters * Cyrillic Extended-AU+2DE0–U+2DFF 32 characters * Cyrillic Extended-BU ...
.)


Character set

The following table shows Windows-1251. Each character is shown with its
Unicode Unicode, formally The Unicode Standard,The formal version reference is is an information technology standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems. The standard, wh ...
equivalent and its Alt code.


Kazakh variant

An altered version of Windows-1251 was standardised in
Kazakhstan Kazakhstan, officially the Republic of Kazakhstan, is a transcontinental country located mainly in Central Asia and partly in Eastern Europe. It borders Russia to the north and west, China to the east, Kyrgyzstan to the southeast, Uzbeki ...
as Kazakh standard STRK1048, and is known by the label . It differs in the rows shown below:


Amiga variant

Russian
Amiga OS AmigaOS is a family of proprietary native operating systems of the Amiga and AmigaOne personal computers. It was developed first by Commodore International and introduced with the launch of the first Amiga, the Amiga 1000, in 1985. Early versions ...
systems used a version of code page 1251 which matches Windows-1251 for the Russian subset of the Cyrillic letters, but otherwise mostly follows ISO-8859-1. This version is known as Amiga-1251, under which name it is registered with the IANA.


See also

*
Latin script in Unicode Over a thousand characters from the Latin script are encoded in the Unicode Standard, grouped in several basic and extended Latin blocks. The extended ranges contain mainly precomposed letters plus diacritics that are equivalently encoded with co ...
*
Unicode Unicode, formally The Unicode Standard,The formal version reference is is an information technology standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems. The standard, wh ...
*
Universal Character Set The Universal Coded Character Set (UCS, Unicode) is a standard set of characters defined by the international standard ISO/IEC 10646, ''Information technology — Universal Coded Character Set (UCS)'' (plus amendments to that standard), w ...
** European Unicode subset (DIN 91379) *
UTF-8 UTF-8 is a variable-length character encoding used for electronic communication. Defined by the Unicode Standard, the name is derived from ''Unicode'' (or ''Universal Coded Character Set'') ''Transformation Format 8-bit''. UTF-8 is capable of ...


References


Further reading

*


External links


Windows 1251 reference chart
!-- not found in archive.org-->
IANA Charset Name RegistrationUnicode mappings of windows 1251 with "best fit"Universal Cyrillic decoder
an online program that may help recovering unreadable Cyrillic texts with broken Windows-1251 or other
character encoding Character encoding is the process of assigning numbers to Graphics, graphical character (computing), characters, especially the written characters of Language, human language, allowing them to be Data storage, stored, Data communication, transmi ...
s. {{character encoding Windows code pages