
T.51 / ISO/IEC 6937:2001, ''Information technology — Coded graphic character set for text communication — Latin alphabet'', is a multibyte extension of ASCII, or rather of ISO/IEC 646-IRV. It was developed in common with ITU-T (then CCITT) for telematic services under the name of ''T.51'', and first became an ISO standard in 1983. Certain byte codes are used as lead bytes for letters with diacritics (''accents''). The value of the lead byte often indicates which diacritic the letter has, and the follow byte then has the ASCII value of the letter that the diacritic is on.

ISO/IEC 6937's architects were Hugh McGregor Ross, Peter Fenwick, Bernard Marti and Loek Zeckendorf.

ISO6937/2 defines 327 characters found in modern European languages using the Latin alphabet. Non-Latin European characters, such as Cyrillic and Greek, are not included in the standard. Also, some diacritics used with the Latin alphabet, like the Romanian comma, are not included; the cedilla is used instead, as no distinction between cedilla and comma below was made at the time.

IANA has registered the charset names ''ISO_6937-2-25'' and ''ISO_6937-2-add'' for two (older) versions of this standard (plus control codes). In practice, however, this character encoding is unused on the Internet.


Single byte characters

The primary set (first half) originally followed ISO 646-IRV ''before'' the ISO/IEC 646:1991 revision, that is, mostly following ASCII but with character 0x24 still denoted as an "international currency sign" (¤) instead of the dollar sign ($). The 1992 edition of ITU T.51 permits existing CCITT services to continue to interpret 0x24 as the international currency sign, but stipulates that new telecommunication applications should use it for the dollar sign (i.e. following the current ISO 646-IRV), and instead represent the international currency sign using the supplementary set.

The supplementary set (second half) contains a selection of spacing and non-spacing graphic characters, additional symbols and some locations reserved for future standardisation.

Both of these are ISO/IEC 2022 graphical character sets, with the primary set being a 94-code set and the supplementary set being a 96-code set. In contexts where ISO 2022 code extension techniques are not in use, the primary set is designated as the G0 set and invoked over GL (0x20..0x7F), whereas the supplementary set is designated as the G2 set and invoked over GR (0xA0..0xFF) in an 8-bit environment, or by using the control code 0x19 as a single-shift in a 7-bit environment. This encoding of the Single Shift Two code matches its location in ISO-IR-106. The ISO/IEC 2022 escape sequence to designate the supplementary set of ISO/IEC 6937 as the G2 set is ESC . R (hex 1B 2E 52). (The left-hand side is US-ASCII.) The older ISO 6937/2:1983 supplementary set is registered as a 94-code set, and designated to G2 with ESC * l (hex 1B 2A 6C).
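
The relationship between the 7-bit and 8-bit forms can be illustrated with a minimal Python sketch. It assumes standard ISO/IEC 2022 single-shift behaviour, in which the byte after 0x19 is read from the G2 set and so corresponds to the same byte with the high bit set in an 8-bit environment. The function and constant names are hypothetical and not taken from the standard.

```python
# Byte values quoted in the text above.
SS2 = 0x19                                      # single-shift two (7-bit environment)
DESIGNATE_G2 = bytes([0x1B, 0x2E, 0x52])        # ESC . R  (current supplementary set)
DESIGNATE_G2_1983 = bytes([0x1B, 0x2A, 0x6C])   # ESC * l  (ISO 6937/2:1983 set)


def seven_to_eight_bit(data: bytes) -> bytes:
    """Rewrite SS2-prefixed bytes as GR bytes (hypothetical helper).

    Assumption: SS2 affects exactly one following byte, which must lie
    in the GL range 0x20..0x7F given in the text.
    """
    out = bytearray()
    stream = iter(data)
    for byte in stream:
        if byte != SS2:
            out.append(byte)
            continue
        shifted = next(stream, None)
        if shifted is None:
            raise ValueError("SS2 at end of input")
        if not 0x20 <= shifted <= 0x7F:
            raise ValueError("SS2 must be followed by a GL byte")
        out.append(shifted | 0x80)  # move the code from GL to GR
    return bytes(out)
```

For instance, a 7-bit sequence 0x19 0x42 0x65 would become 0xC2 0x65 in the 8-bit form. Only the byte directly after the single-shift is moved; the follow byte stays in the primary set.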


Two byte characters

Accented letters which are not allocated single codes in the primary or supplementary set are coded using two bytes. The first byte, the "non-spacing diacritical mark", is followed by a letter from the base set, e.g.:
small e with acute accent (é) = acute + e

The ITU T.51 standard allocates column 4 of the supplementary set (i.e. 0xC0–0xCF when used in 8-bit format) to non-spacing diacritic characters. However, ISO/IEC 6937 defines a fully specified character repertoire, mapping a list of composition sequences to ISO/IEC 10646 character names. The isolated non-spacing bytes are not included in this repertoire, although spacing variants of the diacritics not otherwise present in ASCII are included, with the ASCII space being the trail byte. Hence, only certain combinations of lead byte and follow byte conform to the ISO/IEC standard. This repertoire is also affixed to the ITU version of the specification as Annex A, although the ITU version does not reference it from the main text. It is described as a "unified superset" of the Latin-script character repertoires. It corresponds to the repertoire of ISO/IEC 10367 when the ASCII, Latin-1 (or Latin-5), Latin-2 and supplementary Latin sets are used.

This system also differs from the Unicode combining character system in that the diacritic code precedes the letter (as opposed to following it), making it more similar to ANSEL.

A minor anomaly is that ''Latin Small Letter G with Cedilla'' is coded as if it had an acute accent, that is, with a 0xC2 lead byte: because the descender of the lowercase letter interferes with a cedilla, the letter is usually written with a turned comma above (ģ). In total, 13 diacritical marks can be followed by selected characters from the primary set.
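
The lead-byte scheme can be sketched in Python as follows. This is an illustration under stated assumptions, not a conforming decoder: the table of 13 lead bytes is recalled from the published code chart rather than taken from the text above (only 0xC2 as the acute accent is confirmed here), the function name is hypothetical, and the sketch neither checks a sequence against the standard's repertoire nor handles the spacing variants that use a space as the trail byte.

```python
import unicodedata

# Assumed lead bytes (8-bit form) and the Unicode combining marks they resemble.
NONSPACING = {
    0xC1: "\u0300",  # grave
    0xC2: "\u0301",  # acute
    0xC3: "\u0302",  # circumflex
    0xC4: "\u0303",  # tilde
    0xC5: "\u0304",  # macron
    0xC6: "\u0306",  # breve
    0xC7: "\u0307",  # dot above
    0xC8: "\u0308",  # diaeresis
    0xCA: "\u030A",  # ring above
    0xCB: "\u0327",  # cedilla
    0xCD: "\u030B",  # double acute
    0xCE: "\u0328",  # ogonek
    0xCF: "\u030C",  # caron
}

# Sequences that must not be composed naively: small g with cedilla
# is coded with the acute lead byte, as described above.
EXCEPTIONS = {(0xC2, ord("g")): "\u0123"}


def decode_two_byte(data: bytes) -> str:
    """Decode ASCII text mixed with lead-byte + letter pairs (hypothetical helper)."""
    out = []
    i = 0
    while i < len(data):
        byte = data[i]
        if byte in NONSPACING:
            if i + 1 >= len(data):
                raise ValueError("lead byte without follow byte")
            follow = data[i + 1]
            if follow >= 0x80 or not chr(follow).isalpha():
                raise ValueError("this sketch only accepts ASCII letters as follow bytes")
            special = EXCEPTIONS.get((byte, follow))
            if special is not None:
                out.append(special)
            else:
                # Unicode puts the mark after the letter, so swap the order.
                out.append(unicodedata.normalize("NFC", chr(follow) + NONSPACING[byte]))
            i += 2
        elif byte < 0x80:
            out.append(chr(byte))
            i += 1
        else:
            raise ValueError("single-byte supplementary characters are not handled here")
    return "".join(out)
```

Calling `decode_two_byte(bytes([0xC2, 0x65]))` yields "é", while the pair 0xC2 0x67 goes through the exception table, since composing g with a combining acute would give a different character. A real decoder would more likely enumerate the permitted composition sequences explicitly rather than rely on Unicode normalisation.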


Codepage layout

The reference to combining characters in the U+0300–U+036F range for the codes in the range 0xC1–0xCF below is subject to the caveats mentioned above; they cannot simply be mapped to the codepoints listed. Also, Unicode separates 0xE2 into uppercase D with stroke and uppercase Eth, whose lowercase forms (0xF2 and 0xF3) usually look different.

The older 1988 edition of ITU T.51 defined two versions of the supplementary set, with the first version lacking the non-breaking space, soft hyphen, not sign (¬) and broken bar (¦) present in the second version. The first version was defined as an extension of the T.61 supplementary set, and the second version as an extension of the first version. The current (1992) edition only includes the second version, deprecates certain characters, and updates the primary set to the current ISO-646-IRV (ASCII), although existing telematic services are permitted to retain the older behaviour.
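
The D with stroke / Eth ambiguity can be illustrated with a small Python sketch. The byte values 0xE2, 0xF2 and 0xF3 come from the text above; which lowercase byte belongs to which letter is an assumption of this sketch, and the function names and the `prefer_eth` flag are hypothetical.

```python
# Unicode characters mapped to supplementary-set bytes.
TO_BYTE = {
    "\u0110": 0xE2,  # capital D with stroke
    "\u00D0": 0xE2,  # capital Eth shares the same byte
    "\u0111": 0xF2,  # small d with stroke (assumed position)
    "\u00F0": 0xF3,  # small eth (assumed position)
}


def encode_char(ch: str) -> int:
    """Return the supplementary-set byte for one of the four letters."""
    return TO_BYTE[ch]


def decode_byte(byte: int, prefer_eth: bool = False) -> str:
    """Decode one byte; the caller has to resolve the 0xE2 ambiguity."""
    if byte == 0xE2:
        return "\u00D0" if prefer_eth else "\u0110"
    if byte == 0xF2:
        return "\u0111"
    if byte == 0xF3:
        return "\u00F0"
    raise ValueError("byte not covered by this sketch")
```

Both capital letters encode to 0xE2, so a round trip through this encoding cannot recover which one was meant without outside knowledge such as the language of the text. The lowercase letters round-trip unchanged.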


Videotex version

The versions of the supplementary set used by the ITU T.101 standard for Videotex are based on the first supplementary set of the 1988 edition of T.51. The default G2 set for Data Syntax 2 adds a ΅ at 0xC0, for combination with codes from a Greek primary set. The supplementary set for Data Syntax 3 adds non-spacing marks for a "vector overbar" and solidus, as well as several semigraphic characters.


ETS 300 706 version

The ETS 300 706 standard for World System Teletext bases its G2 set on ISO 6937. It is a superset of the supplementary set of T.61, and a superset of the first supplementary set of the 1988 edition of T.51, but collides with the current edition of T.51 in certain positions. Diacritic codes in the ETS version are specified as being "for association with" characters from the G0 set in use, such as US-ASCII or BS_viewdata. This version is shown in the chart below.


See also

* ITU T.50
* ITU T.61, a closely related character encoding for Teletex use


External links


* ITU Recommendation T.51
* ISO pages
* WD 6937, Coded graphic character set for text communication - Latin alphabet (Revision of ISO/IEC 6937:1994) (ISO/IEC 6937:1994 draft)
* ISO-IR-156 (ISO-IR registration of right-hand part)