ISO/IEC 646 is a set of ISO/IEC standards, described as ''Information technology — ISO 7-bit coded character set for information interchange'' and developed in cooperation with ASCII at least since 1964. Since its first edition in 1967 it has specified a 7-bit character code from which several national standards are derived.

ISO/IEC 646 was also ratified by ECMA as ECMA-6. The first version of ECMA-6 had been published in 1965, based on work the ECMA's Technical Committee TC1 had carried out since December 1960.

Characters in the ISO/IEC 646 Basic Character Set are ''invariant characters''. Since that portion of ISO/IEC 646, that is the ''invariant character set'' shared by all countries, specified only those letters used in the ISO basic Latin alphabet, countries using additional letters needed to create national variants of ISO/IEC 646 to be able to use their native scripts. Since transmission and storage of 8-bit codes was not standard at the time, the national characters had to be made to fit within the constraints of 7 bits, meaning that some characters that appear in ASCII do not appear in other national variants of ISO/IEC 646.
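The substitution described above can be illustrated with a minimal sketch. It assumes the German variant (DIN 66003) as the example; the override table and the helper name `decode_variant` are illustrative and not taken from the text above.

```python
# Assumed example table: the German national variant (DIN 66003).
# Each entry replaces the ASCII character at that 7-bit code point.
DIN_66003 = {
    0x40: "§",  # ASCII @
    0x5B: "Ä",  # ASCII [
    0x5C: "Ö",  # ASCII \
    0x5D: "Ü",  # ASCII ]
    0x7B: "ä",  # ASCII {
    0x7C: "ö",  # ASCII |
    0x7D: "ü",  # ASCII }
    0x7E: "ß",  # ASCII ~
}


def decode_variant(data: bytes, overrides: dict) -> str:
    """Decode 7-bit data, applying a national variant's replacements."""
    chars = []
    for b in data:
        if b > 0x7F:
            raise ValueError(f"not a 7-bit code: {b:#04x}")
        # Code points without an override keep their ASCII meaning.
        chars.append(overrides.get(b, chr(b)))
    return "".join(chars)


print(decode_variant(b"Gr|~e", DIN_66003))        # German text as intended
print(decode_variant(b"a[i] = {0};", DIN_66003))  # ASCII source code misread
```

The second call shows the cost of the scheme: the same bytes that spell brackets and braces in ASCII come out as accented letters under the national variant.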

History

ISO/IEC 646 and its predecessor ASCII (ASA X3.4) largely endorsed existing practice regarding character encodings in the telecommunications industry.

As ASCII did not provide a number of characters needed for languages other than English, a number of national variants were made that substituted some less-used characters with needed ones. Due to the incompatibility of the various national variants, an International Reference Version (IRV) of ISO/IEC 646 was introduced, in an attempt to at least restrict the replaced set to the same characters in all variants. The original version (ISO 646 IRV) differed from ASCII only in that code point 0x24, ASCII's dollar sign ($), was replaced by the international currency symbol (¤). The final 1991 version of the code, ISO/IEC 646:1991, is also known as ITU T.50, International Reference Alphabet or IRA, formerly International Alphabet No. 5 (IA5). This standard allows users to exercise options for the 12 variable characters (i.e., two alternative graphic characters and 10 national defined characters). Among the possible choices, ISO 646:1991 IRV (International Reference Version) is explicitly defined and identical to ASCII.

The ISO/IEC 8859 series of standards governing 8-bit character encodings supersedes the ISO/IEC 646 international standard and its national variants, by providing 96 additional characters with the additional bit and thus avoiding any substitution of ASCII codes. The ISO/IEC 10646 standard, directly related to Unicode, supersedes all of the ISO 646 and ISO/IEC 8859 sets with one unified set of character encodings using a larger 21-bit value.
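The arithmetic behind the two supersession steps can be checked with a short sketch. It rests on two assumptions not spelled out above: that the 96 additional positions of an ISO/IEC 8859 part are 0xA0 to 0xFF, and that the largest Unicode code point is 0x10FFFF.

```python
# Assumption: the 96 additional graphic positions are 0xA0..0xFF.
HIGH_GRAPHIC = range(0xA0, 0x100)
print(len(HIGH_GRAPHIC))  # 96

# With the eighth bit available, ASCII brackets and a national letter
# can coexist in one text instead of competing for the same code point.
sample = bytes([0x5B, 0xC4, 0x5D])
text = sample.decode("iso8859_1")
print(text)  # [Ä]

# Assumption: the largest code point is 0x10FFFF, which needs 21 bits.
MAX_CODE_POINT = 0x10FFFF
print(MAX_CODE_POINT.bit_length())  # 21
```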

A legacy of ISO/IEC 646 is visible on Windows, where in many East Asian locales the backslash character used in filenames is rendered as ¥ or other characters such as ₩. A different code for ¥ was available even on the original IBM PC's code page 437, and a separate double-byte code for ¥ is available in Shift JIS (although this often uses alternative mapping). Nevertheless, so much text was created with the backslash code used for ¥ (due to Shift_JIS being officially based on ISO 646:JP, although Microsoft maps it as ASCII) that even modern Windows fonts have found it necessary to render the code that way. A similar situation exists with ₩ and EUC-KR.

Another legacy is the existence of trigraphs in the C programming language.
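Both legacies can be sketched in a few lines. The first part assumes the ISO 646-JP replacements (0x5C as ¥ and 0x7E as an overline); the second assumes the nine trigraph sequences of standard C. The function names are illustrative only.

```python
# Assumed ISO 646-JP replacements; every other 7-bit code keeps its ASCII meaning.
ISO646_JP = {0x5C: "¥", 0x7E: "‾"}


def decode_iso646_jp(data: bytes) -> str:
    """Decode 7-bit data the way an ISO 646-JP display would show it."""
    return "".join(ISO646_JP.get(b, chr(b)) for b in data)


# The nine C trigraphs, each standing in for a character that a
# national variant might have replaced.
TRIGRAPHS = {
    "??=": "#", "??(": "[", "??/": "\\", "??)": "]", "??'": "^",
    "??<": "{", "??!": "|", "??>": "}", "??-": "~",
}


def replace_trigraphs(source: str) -> str:
    """Scan left to right, replacing each trigraph with its character."""
    out = []
    i = 0
    while i < len(source):
        tri = source[i:i + 3]
        if tri in TRIGRAPHS:
            out.append(TRIGRAPHS[tri])
            i += 3
        else:
            out.append(source[i])
            i += 1
    return "".join(out)


print(decode_iso646_jp(b"C:\\Windows"))
print(replace_trigraphs("??=define FIRST(a) a??(0??)"))
```

The trigraph table lets a programmer on a terminal without brackets or braces still write valid C using only characters from the invariant set.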

Published standards

* ISO/R646-1967
* ISO 646:1972
* ISO 646:1983
* ISO/IEC 646:1991
* ECMA-6 (1965-04-30), first edition
* ECMA-6 (1967-06), second edition
* ECMA-6 (1970-07), third edition
* ECMA-6 (1973-08), fourth edition
* ECMA-6 (1984-12, 1985-03), fifth edition
* ECMA-6 (1991-12, 1997-08), sixth edition

Code page layout

The following table shows the ISO/IEC 646 Invariant character set. Each character is shown with its Unicode equivalent. National code points are gray with the ASCII character that is replaced. Yellow indicates a character that, in some regions, could be combined with a previous character as a diacritic using the backspace character, which may affect glyph choice.

In addition to the invariant set restrictions, 0x23 is restricted to be either # or £ and 0x24 is restricted to be either $ or ¤ in ECMA-6:1991, equivalent to ISO/IEC 646:1991. However, these restrictions are not followed by all national variants.
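The restrictions above can be expressed as a small conformance check. This is a sketch under stated assumptions: the text does not list the twelve variable positions, so the set below is supplied from the commonly cited layout, and the helper name `check_variant` is hypothetical.

```python
# Assumption: the twelve variable positions (ASCII # $ @ [ \ ] ^ ` { | } ~).
VARIABLE = {0x23, 0x24, 0x40, 0x5B, 0x5C, 0x5D, 0x5E, 0x60, 0x7B, 0x7C, 0x7D, 0x7E}

# The two positions limited to a pair of alternatives in ECMA-6:1991.
ALTERNATIVES = {0x23: {"#", "£"}, 0x24: {"$", "¤"}}


def check_variant(overrides: dict) -> list:
    """Return (code point, reason) pairs for every rule a variant breaks."""
    problems = []
    for code, char in sorted(overrides.items()):
        if char == chr(code):
            continue  # same as ASCII, nothing to check
        if code not in VARIABLE:
            problems.append((code, "replaces an invariant character"))
        elif code in ALTERNATIVES and char not in ALTERNATIVES[code]:
            problems.append((code, "not one of the two permitted alternatives"))
    return problems


print(check_variant({0x23: "£", 0x5B: "Ä"}))  # a conforming table
print(check_variant({0x24: "¥", 0x3F: "X"}))  # a hypothetical non-conforming table
```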

Related encoding families

National Replacement Character Set

The National Replacement Character Set (NRCS) is a family of 7-bit encodings introduced in 1983 by DEC with the VT200 series of computer terminals. It is closely related to ISO/IEC 646, being based on a similar invariant subset of ASCII, differing in retaining $ as invariant but not _ (although most NRCS variants retain the _, and hence comply with the ISO/IEC 646 invariant set). Most NRCS variants are closely related to corresponding national ISO/IEC 646 variants where they exist, with the exception of the Dutch variant.

World System Teletext

The European telecommunications standard ETS 300 706, "Enhanced Teletext specification", defines Latin, Greek, Cyrillic, Arabic and Hebrew code sets with several national variants for both Latin and Cyrillic. Like NRCS and ISO/IEC 646, the Latin variants of the family of encodings known as the G0 set are based on a similar invariant subset of ASCII, but retain neither $ nor _ as invariant. Unlike NRCS, the variants often differ considerably from the corresponding national ISO/IEC 646 variants.
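The relationship between the three families' variable positions can be written as set arithmetic. The sketch below derives the NRCS and Teletext sets purely from the descriptions in this section, starting from an assumed list of the twelve ISO/IEC 646 variable characters; it is not taken from the DEC or ETS documents themselves.

```python
# Assumption: the twelve ISO/IEC 646 variable positions, written as
# the ASCII characters that occupy them.
ISO646_VARIABLE = set("#$@[\\]^`{|}~")

# NRCS keeps $ invariant but allows _ to vary.
NRCS_VARIABLE = (ISO646_VARIABLE - {"$"}) | {"_"}

# The Teletext G0 Latin sets keep neither $ nor _ invariant.
TELETEXT_VARIABLE = ISO646_VARIABLE | {"_"}


def is_invariant(char: str, variable: set) -> bool:
    """True if a printable ASCII character may not be replaced in that family."""
    return char not in variable


for name, variable in [
    ("ISO/IEC 646", ISO646_VARIABLE),
    ("NRCS", NRCS_VARIABLE),
    ("Teletext G0", TELETEXT_VARIABLE),
]:
    print(name, len(variable), is_invariant("$", variable), is_invariant("_", variable))
```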

Variant codes and descriptions

ISO/IEC 646 national variants

Some national variants of ISO/IEC 646 are as follows:

National derivatives

Some national character sets also exist which are based on ISO/IEC 646 but do not strictly follow its invariant set (see also § Derivatives for other alphabets):

Control characters

All the variants listed above are solely graphical character sets, and are to be used with a C0 control character set such as listed in the following table:

Associated supplementary character sets

The following table lists supplementary graphical character sets defined by the same standard as specific ISO/IEC 646 variants. These would be selected by using a mechanism such as

shift out Shift Out (SO) and Shift In (SI) are ASCII control characters 14 and 15, respectively (0x0E and 0x0F). These are sometimes also called "Control-N" and "Control-O". The original meaning of those characters provided a way to shift a coloured ribbon ...

or the NATS super shift (single shift), or by setting the eighth bit in environments where one was available:

Variant comparison chart

The specifics of the changes for some of these variants are given in the following table. Character assignments unchanged across all listed variants (i.e. which remain the same as ASCII) are not shown.

For ease of comparison, variants detailed include national variants of ISO/IEC 646, DEC's closely related National Replacement Character Set (NRCS) series used on VT200 terminals, the related European World System Teletext encoding series defined in ETS 300 706, and a few other closely related encodings based on ISO/IEC 646. Individual code charts are linked from the second column. The cells with non-white background emphasize the differences from US-ASCII (also the Basic Latin subset of ISO/IEC 10646 and Unicode).

Several characters could be used as combining characters when preceded or followed by a backspace C0 control. This is attested in the code charts for IRV, GB, FR1, CA and CA2, which note that the characters ", ', the comma (,) and ^ would behave as the diaeresis, acute accent, cedilla and circumflex (rather than quotation marks, a comma and an upward arrowhead) when preceded or followed by a backspace. The tilde character (~) was similarly introduced as a diacritic (˜). This encoding method originated in the typewriter/teletype era when use of backspace would overstamp a glyph, and may be considered deprecated.

Later, when wider character sets gained more acceptance, ISO/IEC 8859, vendor-specific character sets and eventually Unicode became the preferred methods of coding most of these variants.
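The backspace overstrike convention can be modelled with Unicode combining marks. This is a minimal sketch, assuming that a letter and one of the five punctuation characters named above, joined by a backspace in either order, should be read as an accented letter; the mapping to specific combining code points and the function name are illustrative choices, not part of any of the cited code charts.

```python
import unicodedata

BACKSPACE = "\x08"

# Assumed mapping from the overstruck character to a Unicode combining mark.
OVERSTRIKE = {
    '"': "\u0308",  # diaeresis
    "'": "\u0301",  # acute accent
    ",": "\u0327",  # cedilla
    "^": "\u0302",  # circumflex
    "~": "\u0303",  # tilde
}


def compose_overstrikes(text: str) -> str:
    """Turn letter/backspace/mark sequences (either order) into accented letters."""
    out = []
    i = 0
    while i < len(text):
        if i + 2 < len(text) and text[i + 1] == BACKSPACE:
            first, second = text[i], text[i + 2]
            if first.isalpha() and second in OVERSTRIKE:
                out.append(first + OVERSTRIKE[second])
                i += 3
                continue
            if first in OVERSTRIKE and second.isalpha():
                out.append(second + OVERSTRIKE[first])
                i += 3
                continue
        out.append(text[i])
        i += 1
    # NFC folds base letter plus combining mark into one precomposed character.
    return unicodedata.normalize("NFC", "".join(out))


print(compose_overstrikes("fac\x08,ade"))
print(compose_overstrikes("^\x08ole"))
```

Sequences that do not match the pattern, including a lone backspace, are passed through unchanged, which keeps the sketch conservative about text it does not understand.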

Derivatives for other alphabets

Some 7-bit character sets for non-Latin alphabets are derived from the ISO/IEC 646 standard. These do not themselves constitute ISO/IEC 646 variants, because they do not follow its invariant code points (often replacing the letters of at least one case): they support alphabets for which the set of national code points provides insufficient encoding space. Examples include:

* 7-bit Turkmen (ISO-IR-230).
* 7-bit Greek.
** In ELOT 927 (ISO-IR-088), the Greek alphabet is mapped in alphabetical order (except for the final-sigma) to positions 0x61–0x71 and 0x73–0x79, on top of the Latin lowercase letters.
** ISO-IR-018 maps the Greek alphabet over both letter cases using a different scheme (not in alphabetical order, but trying where possible to match Greek letters over Roman letters which correspond in some sense), and ISO-IR-019 maps the Greek uppercase alphabet over the Latin lowercase letters using the same scheme as ISO-IR-018.
** The lower half of the Symbol font character encoding uses its own scheme for mapping Greek letters of both cases over the ASCII Roman letters, also trying to map Greek letters over Roman letters which correspond in some sense, but making different decisions in this regard (see chart below). It also replaces invariant code points 0x22 and 0x27 and five national code points with mathematical symbols. Although not intended for use in typesetting Greek prose, it is sometimes used for that purpose.
** ISO-IR-027 (detailed in the chart above rather than below) includes the Latin alphabet unchanged, but adds some Greek capital letters which cannot be represented with Latin-script homoglyphs; while it is explicitly based on ISO/IEC 646, some of these are mapped to code points which are invariant in ISO/IEC 646 (0x21, 0x3A and 0x3F), and it is therefore not a true ISO/IEC 646 variant.
** The World System Teletext encoding for Greek uses yet another scheme of mapping Greek letters in alphabetical order over the ASCII letters of both cases, notably including several letters with diacritics.
* 7-bit Cyrillic
** KOI-7 or Short KOI, used for Russian. The Cyrillic characters are mapped to positions 0x60–0x7E, on top of the Latin lowercase letters, matching homologous letters where possible (where в is mapped to w, not v). Superseded by the KOI-8 variants.
** SRPSCII and MAKSCII, Cyrillic variants of YUSCII (the Latin variant is YU/ISO-IR-141 in the chart above), used for Serbian and Macedonian respectively. Largely homologous to the Latin variant of YUSCII (following Serbian digraphia rules), except for Љ (lj), Њ (nj), Џ (dž) and ѕ (dz), which correspond to digraphs in Latin-script orthography, and are mapped over letters which are not used in Serbian or Macedonian (q, w, x, y).
** The World System Teletext encodings for Russian/Bulgarian and Ukrainian use G0 sets similar to KOI-7 with some modifications. The corresponding G0 set for Serbian Cyrillic uses a scheme based on the Teletext encoding for Latin-script Serbo-Croatian and Slovene, as opposed to the significantly different YUSCII.
* 7-bit Hebrew, SI 960. The Hebrew alphabet is mapped to positions 0x60–0x7A, on top of the lowercase Latin letters (and grave accent for aleph). 7-bit Hebrew was always stored in visual order. This mapping with the high bit set, i.e. with the Hebrew letters in 0xE0–0xFA, is ISO/IEC 8859-8. The World System Teletext encoding for Hebrew uses the same letter mappings, but uses BS_Viewdata as its base encoding (whereas SI 960 uses US-ASCII) and includes a shekel sign at 0x7B.
* 7-bit Arabic, ASMO 449 (ISO-IR-089). The Arabic alphabet is mapped to positions 0x41–0x5A and 0x60–0x6A, on top of both uppercase and lowercase Latin letters.

A comparison of some of these encodings is below. Only one case is shown, except in instances where the cases are mapped to different letters. In such instances, the mapping with the smallest code is shown first. Possible transcriptions are given for some letters; where this is omitted, the letter can be considered to correspond to the Roman one which it is mapped over.
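The relationship between SI 960 and ISO/IEC 8859-8 noted above lends itself to a short sketch: setting the high bit on the letter range 0x60–0x7A yields the 8-bit Hebrew positions 0xE0–0xFA. The function names are illustrative, and the sketch assumes that all other 7-bit codes are plain US-ASCII, as the text states for SI 960. No reordering is attempted, so the output stays in the stored visual order.

```python
def si960_to_iso8859_8(data: bytes) -> bytes:
    """Move SI 960 Hebrew letters (0x60-0x7A) to their ISO/IEC 8859-8 positions."""
    out = bytearray()
    for b in data:
        if b > 0x7F:
            raise ValueError(f"not a 7-bit code: {b:#04x}")
        if 0x60 <= b <= 0x7A:
            out.append(b | 0x80)  # 0x60-0x7A becomes 0xE0-0xFA
        else:
            out.append(b)         # everything else is treated as US-ASCII
    return bytes(out)


def decode_si960(data: bytes) -> str:
    """Decode SI 960 text by way of Python's ISO/IEC 8859-8 codec."""
    return si960_to_iso8859_8(data).decode("iso8859_8")


print(decode_si960(bytes([0x60])))  # first letter of the range (alef)
print(decode_si960(bytes([0x7A])))  # last letter of the range (tav)
```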

External links

ISO/IEC 646:1991 Information technology — ISO 7-bit coded character set for information interchange

ISO646 Character Tables Character Tables by Koichi Yasuoka (安岡孝)
(see ''Domestic ISO646 Character Tables'' and ''Quasi-ISO646 Character Tables'')

a tool (based on statistical pentagram analysis of the Turkish language) which reverts an ASCII'fied Turkish text by determining the appropriate (but ambiguous) diacritics normally needed in Turkish but missing in the US-ASCII set.