Windows-1252 or CP-1252 (code page 1252) is a single-byte character encoding of the Latin alphabet, used by default in the legacy components of Microsoft Windows for English and many European languages including Spanish, French, and German.
It is the most-used single-byte character encoding in the world (on websites at least). 0.3% of all websites declared use of Windows-1252, while 1.3% declared ISO 8859-1 (though only 8 of the top 1000 websites did so). Because HTML5 requires the two labels to be treated as the same encoding, 1.6% of websites effectively use Windows-1252. Pages declared as US-ASCII would also count as this character set. An unknown (but probably large) subset of other pages use only the ASCII portion of UTF-8, or only the codes matching Windows-1252 from their declared character set, and could also be counted.
Usage varies by country and can be much higher than the global average: counting pages declared as ISO-8859-1, it is 7.9% of websites in Brazil and 4.0% in Germany.
Details
This character encoding is a superset of ISO 8859-1 in terms of printable characters, but differs from the IANA's ISO-8859-1 by using displayable characters rather than control characters in the 80 to 9F (hex) range. Notable additional characters include curly quotation marks and all the printable characters that are in ISO 8859-15 (at different places than ISO 8859-15). It is known to Windows by the code page number 1252, and by the IANA-approved name "windows-1252".
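The relationship to ISO 8859-1 and ISO 8859-15 described above can be checked with a short sketch. It assumes that Python's built-in `cp1252`, `latin-1`, and `iso8859-15` codecs are faithful stand-ins for the three encodings; the helper name `same_in_both` is made up for this illustration.

```python
# Assumption: Python's "cp1252", "latin-1" and "iso8859-15" codecs stand in
# for Windows-1252, ISO 8859-1 and ISO 8859-15 respectively.

def same_in_both(byte: int) -> bool:
    """True if the byte decodes to the same character in both encodings."""
    b = bytes([byte])
    return b.decode("cp1252") == b.decode("latin-1")

# The printable ISO 8859-1 range (A0-FF) is unchanged in Windows-1252.
upper_range_matches = all(same_in_both(b) for b in range(0xA0, 0x100))

# In 80-9F, Windows-1252 has printable characters such as curly quotes and
# the euro sign, where ISO 8859-1 has C1 control characters.
raw = b"\x93quoted\x94 \x80"
as_cp1252 = raw.decode("cp1252")   # curly quotes and a euro sign
as_latin1 = raw.decode("latin-1")  # U+0093, U+0094, U+0080 controls

# Every printable ISO 8859-15 character exists in Windows-1252, though not
# always at the same byte. encode() would raise if one were missing.
iso15_chars = bytes(range(0xA0, 0x100)).decode("iso8859-15")
iso15_as_cp1252 = iso15_chars.encode("cp1252")

# The euro sign sits at 0x80 in Windows-1252 and at 0xA4 in ISO 8859-15.
euro_cp1252 = "\u20ac".encode("cp1252")
euro_iso15 = "\u20ac".encode("iso8859-15")
```

The same byte string therefore reads as typographic quotes under one label and as invisible control characters under the other, which is the practical difference between the two encodings.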
It is very common to mislabel Windows-1252 text with the charset label ISO-8859-1. A common result was that all the quotes and apostrophes (produced by "smart quotes" in word-processing software) were replaced with question marks or boxes on non-Windows operating systems, making text difficult to read. Most modern web browsers and e-mail clients treat the media type charset ISO-8859-1 as Windows-1252 to accommodate such mislabeling. This is now standard behavior in the HTML5 specification, which requires that documents advertised as ISO-8859-1 actually be parsed with the Windows-1252 encoding.
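The mislabeling problem and the lenient handling described above can be sketched as follows. This is only an illustration: `decode_declared` and `TREAT_AS_CP1252` are hypothetical names, the label set is deliberately minimal, and Python's `cp1252` codec is assumed to stand in for Windows-1252 (it rejects the five unassigned bytes, which this sketch does not handle).

```python
# Hypothetical helper: labels that are decoded as Windows-1252, following the
# behavior the text attributes to browsers and the HTML5 specification.
TREAT_AS_CP1252 = {"iso-8859-1", "windows-1252"}

def decode_declared(data: bytes, declared_charset: str) -> str:
    """Decode data, treating an ISO-8859-1 label as Windows-1252."""
    label = declared_charset.strip().lower()
    codec = "cp1252" if label in TREAT_AS_CP1252 else label
    return data.decode(codec)

# Text with "smart quotes", saved as Windows-1252 but labeled ISO-8859-1.
original = "It\u2019s \u201csmart\u201d"
page = original.encode("cp1252")

# Taking the label literally turns the quotes into C1 control characters.
strict = page.decode("latin-1")

# Treating the label as Windows-1252 recovers the intended text.
lenient = decode_declared(page, "ISO-8859-1")

# Text that was already decoded literally can be repaired by round-tripping.
repaired = strict.encode("latin-1").decode("cp1252")
```

The round trip works because ISO 8859-1 maps every byte to the code point of the same value, so no information is lost by the wrong decode.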
Historically, the phrase "ANSI Code Page" was used in Windows to refer to non-DOS encodings; the intention was that most of these would be ANSI standards such as ISO-8859-1. Even though Windows-1252 was the first and by far most popular code page named so in Microsoft Windows parlance, the code page has never been an ANSI standard. Microsoft explains, "The term ANSI as used to signify Windows code pages is a historical reference, but is nowadays a misnomer that continues to persist in the Windows community."
In LaTeX packages, CP-1252 is referred to as "ansinew".
IBM uses code page 1252 (CCSID 1252 and euro sign extended CCSID 5348) for Windows-1252.
It is called "WE8MSWIN1252" by Oracle.
Codepage layout
The following table shows Windows-1252. Differences from ISO-8859-1 have the Unicode code point number below the character, based on the Unicode.org mapping of Windows-1252 with "best fit".
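The positions that differ from ISO-8859-1 can also be listed programmatically. The sketch below assumes Python's `cp1252` codec as the source of the mapping (not the Unicode.org "best fit" file itself), and the function name `differences_from_latin1` is invented for this example.

```python
import unicodedata

def differences_from_latin1():
    """List (byte, character, code point, name) where cp1252 != latin-1."""
    rows = []
    for byte in range(0x100):
        b = bytes([byte])
        try:
            ch = b.decode("cp1252")
        except UnicodeDecodeError:
            continue  # position not assigned in Python's cp1252 codec
        if ch != b.decode("latin-1"):
            rows.append((byte, ch, f"U+{ord(ch):04X}", unicodedata.name(ch)))
    return rows

# One text line per differing position, e.g. "80  U+20AC  EURO SIGN".
table = "\n".join(
    f"{byte:02X}  {cp}  {name}"
    for byte, _ch, cp, name in differences_from_latin1()
)
```

Calling `print(table)` shows one line per position; with this codec, every differing position falls in the 80 to 9F range.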
According to the information on Microsoft's and the Unicode Consortium's websites, positions 81, 8D, 8F, 90, and 9D are unused; however, the Windows API MultiByteToWideChar maps these to the corresponding C1 control codes. The "best fit" mapping documents this behavior, too.
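The treatment of the five unused positions can be illustrated with a small sketch. It does not call the Windows API; `decode_best_fit` is a hypothetical helper that only mimics the mapping described above, and it relies on Python's `cp1252` codec, which rejects those five bytes rather than mapping them.

```python
# The five positions the text lists as unused in Windows-1252.
UNASSIGNED = (0x81, 0x8D, 0x8F, 0x90, 0x9D)

def rejected_by_python_codec():
    """Return the byte values that Python's cp1252 codec refuses to decode."""
    bad = []
    for byte in range(0x100):
        try:
            bytes([byte]).decode("cp1252")
        except UnicodeDecodeError:
            bad.append(byte)
    return bad

def decode_best_fit(data: bytes) -> str:
    """Decode Windows-1252, mapping unused bytes to the C1 control code
    with the same value (an imitation of the behavior described above)."""
    out = []
    for byte in data:
        if byte in UNASSIGNED:
            out.append(chr(byte))  # e.g. 0x81 -> U+0081
        else:
            out.append(bytes([byte]).decode("cp1252"))
    return "".join(out)
```

A strict decoder and a "best fit" decoder therefore disagree only on these five bytes: the first raises an error, the second passes them through as control characters.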
History
* The first version of code page 1252, used in Microsoft Windows 1.0, did not have positions D7 and F7 defined. All the characters in the range 80–9F were undefined too.
* In the second version, used in Microsoft Windows 2.0, positions D7, F7, 91, and 92 had been defined.
* The third version, used since Microsoft Windows 3.1, had all the present-day positions defined, except the euro sign and the Z with caron character pair.
* The final version listed above debuted in Microsoft Windows 98 and was ported to older versions of Windows with the euro symbol update.
OS/2 extensions
The OS/2 operating system supports an encoding by the name of Code page 1004 (CCSID 1004) or "Windows Extended". This mostly matches code page 1252, with the exception of certain C0 control characters being replaced by diacritic characters.
MSDOS extensions
There is a rarely used, but useful, graphics-extended code page 1252 in which codes 0x00 to 0x1F are used for box drawing, as in applications such as MS-DOS Edit and CodeView. One of the applications to use this code page was an Intel Corporation Install/Recovery disk image utility from mid/late 1995. These programs were written for its P6 User Test Program machines (US example). The code page was used exclusively in Intel's then EMEA region (Europe, Middle East & Africa). In time the programs were changed to use code page 850.
Palm OS variant
This variant of Windows-1252 is used by Palm OS 3.5. Python gives it the label.
Differences from Windows-1252 have their Unicode code point.
See also
* Latin script in Unicode
* Unicode
* Universal Coded Character Set
** European Unicode subset (DIN 91379)
* UTF-8
* Western Latin character sets (computing)
* Windows-1250
* Windows code pages
* ISO/IEC JTC 1/SC 2
External links
* Microsoft's code charts for Windows-1252 ("Code Page 1252 Windows Latin 1 (ANSI)")
* Unicode mapping table and code page definition with best fit mappings for Windows-1252