The MARC-8 charset is a MARC standard used in MARC-21 library records. The MARC formats are standards for the representation and communication of bibliographic and related information in machine-readable form, and they are frequently used in library database systems. The character encoding now known as MARC-8 was introduced in 1968 as part of the MARC format. It was originally based on the Latin alphabet; from 1979 to 1983 the JACKPHY initiative expanded the repertoire to include Japanese, Arabic, Chinese, and Hebrew characters (among others), and Cyrillic and Greek scripts were added later. If a character in a MARC-21 record cannot be represented in MARC-8, then UTF-8 must be used instead. UTF-8 supports many more characters than MARC-8, which is rarely used outside library data.
Technical details
MARC-8 uses a variant of the ISO-2022 encoding. It uses escape sequences to represent characters beyond the 7-bit ASCII range. It generally uses the same logical BiDi ordering as Unicode.
The combining characters and base characters are in a different order than in Unicode: in MARC-8 a combining character precedes the base character it modifies, whereas in Unicode it follows the base character. When a base character carries several combining characters, their MARC-8 order is not always simply the reverse of the order produced by Unicode normalization. The MARC-21 standard describes the MARC-8 to Unicode conversion issues in more detail.
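As a rough illustration of the ordering difference, the sketch below moves MARC-8 combining marks from before the base character to after it and then applies NFC normalization. It assumes the default Extended Latin (ANSEL) set is active in G1 and that no escape sequences occur. The `COMBINING` table is a small illustrative subset, and `reorder_combining` is a hypothetical helper name, not part of any library. Several marks on one base character are kept in their MARC-8 order here, which may not match what a full converter does.

```python
import unicodedata

# Illustrative subset of ANSEL combining marks (assumed to be active in G1).
COMBINING = {
    0xE1: "\u0300",  # grave accent
    0xE2: "\u0301",  # acute accent
    0xE3: "\u0302",  # circumflex
    0xE4: "\u0303",  # tilde
    0xE8: "\u0308",  # diaeresis / umlaut
}


def reorder_combining(data: bytes) -> str:
    """Convert ASCII plus the marks above from MARC-8 order to Unicode order."""
    out = []
    pending = []  # marks seen so far that still wait for their base character
    for byte in data:
        if byte in COMBINING:
            pending.append(COMBINING[byte])
        elif byte < 0x80:
            # Base character: emit it first, then the marks that preceded it.
            out.append(chr(byte))
            out.extend(pending)
            pending.clear()
        else:
            raise ValueError(f"byte 0x{byte:02X} is outside this sketch's table")
    if pending:
        raise ValueError("combining mark without a following base character")
    return unicodedata.normalize("NFC", "".join(out))
```

For instance, `reorder_combining(b"caf\xe2e")` places the acute accent after the final "e" and composes the pair into a single precomposed character.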
Code structure
The ISO/IEC 2022 coding specifies a two-layer mapping between character codes and displayed characters. In MARC-8, character codes from the 7-bit ASCII graphic range (0x20–0x7F) are referred to as "G0" codes, while codes from the "high ASCII" range (0xA0–0xFF) are referred to as the "G1" codes. Graphic character sets are designated and invoked by means of a multi-byte escape sequence consisting of the escape character, an Intermediate character sequence, and a Final character, in the form ESC I F.
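The ESC I F form can be tokenized with a few lines of code. The sketch below is a minimal illustration, not a MARC-8 decoder: it assumes the general ISO/IEC 2022 byte classes (intermediate bytes in 0x20–0x2F, final bytes in 0x30–0x7E) and does not look up which character set a sequence designates. The function name `parse_escape` is hypothetical.

```python
ESC = 0x1B


def parse_escape(data: bytes, pos: int = 0):
    """Return (intermediates, final, next_pos) for the escape sequence at data[pos]."""
    if pos >= len(data) or data[pos] != ESC:
        raise ValueError("no escape character at this position")
    i = pos + 1
    # Skip over zero or more intermediate bytes (assumed range 0x20-0x2F).
    while i < len(data) and 0x20 <= data[i] <= 0x2F:
        i += 1
    # The next byte must be the final byte (assumed range 0x30-0x7E).
    if i >= len(data) or not 0x30 <= data[i] <= 0x7E:
        raise ValueError("escape sequence has no final byte")
    return data[pos + 1:i], data[i], i + 1
```

Called on `b"\x1b\x24\x31"`, the function reports one intermediate byte (0x24), the final byte 0x31, and the offset at which ordinary character data resumes.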
The following table shows the intermediate byte after the ESC byte (hexadecimal 1B), and the corresponding ASCII characters.
The following table shows the final bytes in hexadecimal and the corresponding ASCII characters after the intermediate bytes.
EACC is the only multibyte character set in MARC-8; it encodes each CJK character in three bytes, each in the 7-bit ASCII range.
For example, to encode the CJK character U+4EBA (人), the following bytes are needed:
\x1B\x24\x31\x21\x30\x64
The sequence \x1B\x24\x31 switches to EACC/CJK, and \x21\x30\x64 corresponds to U+4EBA.
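The byte sequence above can be decoded with a sketch along the following lines. It is deliberately tiny: `EACC_TO_UNICODE` holds only the single mapping given in this example (a real table has many thousands of entries), `decode_eacc` is a hypothetical name, and the input is assumed to start with the EACC designation and to contain nothing but three-byte EACC codes afterwards.

```python
EACC_DESIGNATION = b"\x1b\x24\x31"  # ESC $ 1, as in the example above

# Illustrative one-entry table; only this mapping is taken from the text.
EACC_TO_UNICODE = {b"\x21\x30\x64": "\u4eba"}


def decode_eacc(data: bytes) -> str:
    """Decode a run of three-byte EACC codes that follows the designation."""
    if not data.startswith(EACC_DESIGNATION):
        raise ValueError("expected the EACC designation sequence")
    body = data[len(EACC_DESIGNATION):]
    if len(body) % 3 != 0:
        raise ValueError("EACC characters are three bytes each")
    chars = []
    for i in range(0, len(body), 3):
        code = body[i:i + 3]
        if code not in EACC_TO_UNICODE:
            raise KeyError(f"no mapping for EACC code {code.hex()}")
        chars.append(EACC_TO_UNICODE[code])
    return "".join(chars)
```

Reading fixed three-byte groups keeps the sketch simple; a fuller decoder would also have to handle escape sequences that switch back to a single-byte set in the middle of the data.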
Custom set extension
In addition to the ISO-2022 character sets, the following custom sets are available. The designating byte directly follows the escape byte (hexadecimal 1B); there is no intermediate byte.
External links
MARC 21 Specifications for Record Structure, Character Sets, and Exchange Media - The official MARC-8 standard as maintained by the US Library of Congress