Windows-1251 is an 8-bit character encoding designed to cover languages that use the Cyrillic script, such as Russian, Ukrainian, Belarusian, Bulgarian, Serbian Cyrillic, Macedonian and other languages. On the web, it is the second most-used single-byte character encoding (or third most-used character encoding overall), and the most used of the single-byte encodings supporting Cyrillic. It is used by 0.3% of all websites. It is used overwhelmingly for Russian, yet only a small minority of Russian websites use it: 94.6% of Russian (.ru) websites use UTF-8, with the legacy 8-bit encoding a distant second. In Linux, the encoding is known as cp1251.
IBM uses code page 1251 (CCSID 1251 and euro-sign-extended CCSID 5347) for Windows-1251. Windows-1251 and KOI8-R (or its Ukrainian variant KOI8-U) are much more commonly used than ISO 8859-5 (which is used by less than 0.0004% of websites). In contrast to Windows-1252 and ISO 8859-1, Windows-1251 is not closely related to ISO 8859-5.
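
The lack of a close relationship between these encodings can be illustrated with a short Python sketch. It assumes the codec names `cp1251`, `koi8_r`, `iso8859_5`, `cp1252` and `latin_1` from Python's standard library; the helper functions `high_half` and `shared_positions` are hypothetical names introduced only for this illustration. It decodes the same bytes under each Cyrillic encoding and counts how many byte values in the upper half map to the same character in two encodings.

```python
# The same six bytes read under three single-byte Cyrillic encodings.
# "cp1251", "koi8_r" and "iso8859_5" are Python's codec names.
raw = "Привет".encode("cp1251")

for codec in ("cp1251", "koi8_r", "iso8859_5"):
    print(f"{codec:<10} {raw.hex()} -> {raw.decode(codec)}")


def high_half(codec):
    """Decode bytes 0x80-0xFF one at a time; unassigned bytes become U+FFFD."""
    return [bytes([b]).decode(codec, errors="replace") for b in range(0x80, 0x100)]


def shared_positions(codec_a, codec_b):
    """Count byte values in 0x80-0xFF that both codecs map to the same character."""
    return sum(
        a == b and a != "\ufffd"
        for a, b in zip(high_half(codec_a), high_half(codec_b))
    )


print("cp1252 vs latin_1:  ", shared_positions("cp1252", "latin_1"))
print("cp1251 vs iso8859_5:", shared_positions("cp1251", "iso8859_5"))
```

Only the first decoding reproduces the original word; the other two yield unrelated Cyrillic letters, which is why the encoding of legacy Cyrillic text has to be known or guessed before it can be read.
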
Unicode (e.g. UTF-8) is preferred to Windows-1251 or other Cyrillic encodings in modern applications, especially on the Internet, making UTF-8 the dominant encoding for web pages. (For further discussion of Unicode's complete coverage of 436 Cyrillic letters/code points, including Old Cyrillic, and of how single-byte character encodings such as Windows-1251 and KOI8-R cannot provide this, see Cyrillic script in Unicode.)
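
As a minimal sketch of what this means in practice, the following Python snippet transcodes Windows-1251 bytes to UTF-8 and checks whether a string fits into the Windows-1251 repertoire at all. The function name `fits_cp1251` and the sample strings are assumptions chosen for illustration; `cp1251` is the codec name used by Python's standard library.

```python
# Transcode legacy Windows-1251 bytes to UTF-8.
legacy = bytes.fromhex("cff0e8e2e5f2")  # the word "Привет" stored as Windows-1251
text = legacy.decode("cp1251")
utf8 = text.encode("utf-8")             # each Cyrillic letter takes two bytes here

print(text, len(legacy), "bytes ->", len(utf8), "bytes")


def fits_cp1251(s):
    """Return True if every character of s can be encoded in Windows-1251."""
    try:
        s.encode("cp1251")
        return True
    except UnicodeEncodeError:
        return False


# Modern Russian fits; the Old Cyrillic letter yat (U+0463) and the
# Kazakh letter ka with descender (U+049A) do not.
for sample in ("Привет", "\u0463", "\u049a\u0430\u0437\u0430\u049b"):
    print(repr(sample), fits_cp1251(sample))
```
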


Character set

The following table shows Windows-1251. Each character is shown with its Unicode equivalent and its Alt code.
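
The upper half of the code chart can also be regenerated programmatically. The sketch below is one way to do so, assuming Python's built-in `cp1251` codec reflects the mapping; `cp1251_rows` is a hypothetical helper name. It prints one row per high nibble, giving the Unicode code point of each byte value and dots for a byte the codec leaves unassigned.

```python
def cp1251_rows():
    """Return eight rows of (byte value, character or None) for bytes 0x80-0xFF."""
    rows = []
    for start in range(0x80, 0x100, 0x10):
        row = []
        for b in range(start, start + 0x10):
            try:
                ch = bytes([b]).decode("cp1251")
            except UnicodeDecodeError:
                ch = None  # byte value with no assigned character
            row.append((b, ch))
        rows.append(row)
    return rows


for row in cp1251_rows():
    cells = ["...." if ch is None else f"{ord(ch):04X}" for _, ch in row]
    print(f"{row[0][0]:02X}: " + " ".join(cells))
```

Bytes 0x00 to 0x7F are omitted because they coincide with ASCII.
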


Kazakh variants


KZ-1048

An altered version of Windows-1251 was standardised in Kazakhstan as Kazakh standard STRK1048, and is known by the label KZ-1048. It differs in the rows shown below:
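
The byte values at which the two encodings disagree can be listed with a short Python sketch. This assumes a Python version that ships the `kz1048` codec (3.5 or later) alongside `cp1251`; `decode_byte` and `diffs` are hypothetical names used only here, and unassigned bytes are shown as U+FFFD.

```python
def decode_byte(b, codec):
    """Decode a single byte value; unassigned bytes come back as U+FFFD."""
    return bytes([b]).decode(codec, errors="replace")


# Collect every byte value that the two codecs map to different characters.
diffs = [
    (b, decode_byte(b, "cp1251"), decode_byte(b, "kz1048"))
    for b in range(0x100)
    if decode_byte(b, "cp1251") != decode_byte(b, "kz1048")
]

for b, win, kaz in diffs:
    print(f"{b:02X}: U+{ord(win):04X} {win} -> U+{ord(kaz):04X} {kaz}")
```
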


Code Page 1174

Code Page 1174 is another variant created for the Kazakh language, which matches Windows-1251 for the Russian subset of the Cyrillic letters. It differs from KZ-1048 by moving the Cyrillic letter Shha from 8E/9E to 8A/9A.


Latvian variant

Windows Latvian + Russian is a modification of Windows-1251 to support the Latvian language.


Finnish variant

Windows Cyrillic + Finnish is a modification of Windows-1251 that was used by Paratype to cover the Finnish language. This encoding is supported by FontLab Studio 5. This variant is missing the letters Š and Ž, which are used in loanwords in Finnish and can be replaced by the digraphs SH and ZH.


Amiga variant

Russian AmigaOS systems used a version of code page 1251 which matches Windows-1251 for the Russian subset of the Cyrillic letters, but otherwise mostly follows ISO 8859-1. This version is known as Amiga-1251, under which name it is registered with the IANA.


See also

* Latin script in Unicode
* Cyrillic script in Unicode
* Unicode
* Universal Character Set
** European Unicode subset (DIN 91379)
* UTF-8


External links


* Windows 1251 reference chart
* IANA Charset Name Registration
* Unicode mappings of windows 1251 with "best fit"
* Universal Cyrillic decoder, an online program that may help recover unreadable Cyrillic texts with broken Windows-1251 or other character encodings.