MARC-8
The MARC-8 charset is a MARC standard used in MARC-21 library records. The MARC formats are standards for the representation and communication of bibliographic and related information in machine-readable form, and they are frequently used in library database systems. The character encoding now known as MARC-8 was introduced in 1968 as part of the MARC format. It was originally based on the Latin alphabet; from 1979 to 1983 the JACKPHY initiative expanded the repertoire to include Japanese, Arabic, Chinese, and Hebrew characters (among others), with Cyrillic and Greek scripts added later. If a character in a MARC-21 record cannot be represented in MARC-8, UTF-8 must be used instead. UTF-8 supports many more characters than MARC-8, which is rarely used outside library data. Technical details: MARC-8 uses a variant of the ISO-2022 encoding. It uses escape characters to represent characters beyond the 7-bit ASCII range. It generally uses the same logical ...
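As a rough illustration of the escape mechanism described above, the following Python sketch only locates ESC bytes (0x1B, the ISO 2022 escape control code) in a byte string. The function name `escape_offsets` and the sample bytes are assumptions made for this example; the sketch does not decode MARC-8 or interpret the bytes that follow each ESC.

```python
ESC = 0x1B  # the ISO 2022 escape control code


def escape_offsets(data: bytes) -> list[int]:
    """Return the offset of every ESC byte found in data."""
    return [i for i, b in enumerate(data) if b == ESC]


# Hypothetical sample: the bytes after each ESC are placeholders, not a
# claim about which escape sequences MARC-8 actually defines.
sample = b"ab\x1bXcd\x1bYe"
offsets = escape_offsets(sample)
print(offsets)
```

A real decoder would read the bytes after each offset to decide which character set is in effect; that lookup is deliberately left out here.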


ISO-2022
ISO/IEC 2022, "Information technology—Character code structure and extension techniques", is an ISO/IEC standard (equivalent to the ECMA standard ECMA-35, the ANSI standard ANSI X3.41 and the Japanese Industrial Standard JIS X 0202) in the field of character encoding. Originating in 1971, it was most recently revised in 1994. ISO 2022 specifies a general structure which character encodings can conform to, dedicating particular ranges of bytes (0x00–1F and 0x7F–9F) to non-printing control codes for formatting and in-band instructions (such as line breaks or formatting instructions for text terminals), rather than graphical characters. It also specifies a syntax for escape sequences, multiple-byte sequences beginning with the ESC control code, which can likewise be used for in-band instructions. Specific sets of control codes and escape sequences designed to be used with ISO 2022 include ISO/IEC 6429, portions of which are implemented by ANSI.SYS and terminal ...
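The byte ranges reserved for control codes can be expressed as a simple predicate. This is a minimal Python sketch based only on the two ranges quoted above (0x00–1F and 0x7F–9F); the name `is_control_byte` and the sample bytes are assumptions for illustration, not part of the standard.

```python
def is_control_byte(b: int) -> bool:
    """True if b falls in the ranges reserved for control codes."""
    if not 0 <= b <= 0xFF:
        raise ValueError("expected a single byte value (0-255)")
    return b <= 0x1F or 0x7F <= b <= 0x9F


# Sample: "Hi", carriage return, line feed, then ESC followed by "[0m".
sample = b"Hi\r\n\x1b[0m"
control_count = sum(is_control_byte(b) for b in sample)
print(control_count)
```

Only the carriage return, line feed and ESC bytes count as controls here; the bytes that follow ESC are ordinary graphic bytes that gain their meaning from the escape-sequence syntax.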


MARC Standards
MARC (machine-readable cataloging) standards are a set of digital formats for the description of items catalogued by libraries, such as books, DVDs, and digital resources. Computerized library catalogs and library management software need to structure their catalog records according to an industry-wide standard, which is MARC, so that bibliographic information can be shared freely between computers. The structure of bibliographic records almost universally follows the MARC standard. Other standards work in conjunction with MARC; for example, Anglo-American Cataloguing Rules (AACR)/Resource Description and Access (RDA) provide guidelines on formulating bibliographic data into the MARC record structure, while the International Standard Bibliographic Description (ISBD) provides guidelines for displaying MARC records in a standard, human-readable form. History: Working with the Library of Congress, American computer scientist Henriette Avram developed MARC during 1965–1968 to create re ...


Chinese Character Code For Information Interchange
The Chinese Character Code for Information Interchange, or CCCII, is a character set developed by the Chinese Character Analysis Group in Taiwan. It was first published in 1980, and significantly expanded in 1982 and 1987. It is used mostly by library systems. It is one of the earliest established and most sophisticated encodings for traditional Chinese (predating the establishment of Big5 in 1984 and CNS 11643 in 1986). It is distinguished by its unique system for encoding simplified versions and other variants of its main set of hanzi characters. A variant of an earlier version of CCCII is used by the Library of Congress as part of MARC-8, under the name East Asian Character Code (EACC, ANSI/NISO Z39.64), where it comprises part of MARC 21's JACKPHY support. However, EACC contains fewer characters than the most recent versions of CCCII. Design (byte ranges): CCCII is designed as a 94^n set, as defined by ISO/IEC 2022. Each Chinese character is represented by a 3-byte code i ...
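In an ISO/IEC 2022 94^n set, each byte takes one of 94 values (0x21 through 0x7E), so an n-byte code has 94^n possible positions. The short Python check below works through that arithmetic for a 3-byte code; the names are hypothetical, and the result is the size of the code space, not the number of characters CCCII actually assigns.

```python
BYTE_MIN, BYTE_MAX = 0x21, 0x7E  # byte values available in a 94-set


def code_positions(n_bytes: int) -> int:
    """Number of distinct codes in a 94^n set with n_bytes bytes per code."""
    values_per_byte = BYTE_MAX - BYTE_MIN + 1  # 94
    return values_per_byte ** n_bytes


three_byte_positions = code_positions(3)
print(three_byte_positions)
```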


ANSEL
ANSEL, the American National Standard for Extended Latin Alphabet Coded Character Set for Bibliographic Use, was a character set used in text encoding. It provided a table of coded values for the representation of characters of the extended Latin alphabet in machine-readable form for thirty-five languages written in the Latin alphabet and for fifty-one romanized languages. ANSEL adds 63 graphic characters to ASCII, including 29 combining diacritic characters. The initial revision of ANSEL was released in 1985, and before 1993 it was registered as Registration #231 in the ISO International Register of Coded Character Sets to be Used with Escape Sequences. The standard was reaffirmed in 2003, but it was administratively withdrawn by ANSI effective 14 February 2013. The requirement for hardware capable of overprinting accents prevented it from ever becoming a popular extended ASCII. Code page layout: The following table shows ANSI/NISO Z39.47-1993 (R2003). Non-ASCII charact ...




JACKPHY
In library automation, the initialism JACKPHY refers to a group of language scripts not based on Roman characters, specifically: Japanese, Arabic, Chinese, Korean, Persian, Hebrew, and Yiddish. Focus on these seven writing systems by the Library of Congress, based on sharing bibliographic records using MARC standards, included a partnership between 1979 and 1983 with the Research Libraries Group to develop cataloging capability for non-Roman scripts in the RLIN bibliographic utility (Susan Morris, "Finding JACKPHY: Online Cataloging to Include Arabic, Hebrew, Other Scripts", Library of Congress Information Bulletin, vol. 66, no. 12, December 2007). Ongoing efforts (JACKPHY Plus) enabled functionality for ...


Unicode Normalization
Unicode equivalence is the specification by the Unicode character encoding standard that some sequences of code points represent essentially the same character. This feature was introduced in the standard to allow compatibility with preexisting standard character sets, which often included similar or identical characters. Unicode provides two such notions, canonical equivalence and compatibility. Code point sequences that are defined as canonically equivalent are assumed to have the same appearance and meaning when printed or displayed. For example, the code point U+006E (the Latin lowercase "n") followed by U+0303 (the combining tilde "◌̃") is defined by Unicode to be canonically equivalent to the single code point U+00F1 (the lowercase letter "ñ" of the Spanish alphabet). Therefore, those sequences should be displayed in the same manner, should be treated in the same way by applications such as alphabetizing names or searching, and may be substituted for each other. S ...
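The "n" plus combining tilde example can be checked with Python's standard `unicodedata` module. This is a small sketch, with variable names chosen for illustration: the two spellings differ as raw strings but become identical once both are normalized to the same form (NFC composes, NFD decomposes).

```python
import unicodedata

decomposed = "n\u0303"   # U+006E followed by U+0303 (combining tilde)
precomposed = "\u00f1"   # U+00F1, the single code point for the same letter

# As raw code point sequences the two strings are different.
raw_equal = decomposed == precomposed

# After normalization to a common form they compare equal.
nfc_equal = unicodedata.normalize("NFC", decomposed) == precomposed
nfd_equal = unicodedata.normalize("NFD", precomposed) == decomposed

print(raw_equal, nfc_equal, nfd_equal)
```

Applications that sort or search text typically normalize both sides before comparing, which is what makes the two sequences interchangeable in practice.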


Multi-byte Character Set
A variable-width encoding is a type of character encoding scheme in which codes of differing lengths are used to encode a character set (a repertoire of symbols) for representation, usually in a computer. The most common variable-width encodings are multibyte encodings, which use varying numbers of bytes (octets) to encode different characters. (Some authors, notably in Microsoft documentation, use the term "multibyte character set", which is a misnomer, because representation size is an attribute of the encoding, not of the character set.) Early variable-width encodings using less than a byte per character were sometimes used to pack English text into fewer bytes in adventure games for early microcomputers. However, disks (which, unlike tapes, allow random access, so text can be loaded on demand), increases in computer memory, and general-purpose compression algorithms have rendered such tricks largely obsolete. Multibyte encodings are usually the result of a need to incre ...
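UTF-8 is one widely used multibyte encoding, and it is chosen here only as a convenient example of variable width. The Python sketch below encodes four sample characters (an arbitrary selection) and records how many bytes each one occupies.

```python
# Sample characters written as escapes so the source stays ASCII-only:
# "A", n-with-tilde, a CJK ideograph, and an emoji.
samples = ["A", "\u00f1", "\u4e2d", "\U0001F600"]

# Map each character to the number of bytes its UTF-8 encoding uses.
lengths = {ch: len(ch.encode("utf-8")) for ch in samples}

for ch, n in lengths.items():
    print(f"U+{ord(ch):04X} -> {n} byte(s)")
```

The same character set (Unicode) could equally be stored with a fixed-width encoding, which is why the width is a property of the encoding rather than of the character set.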


SBCS
SBCS, or Single Byte Character Set, is used to refer to character encodings that use exactly one byte for each graphic character. An SBCS can accommodate a maximum of 256 symbols, and is useful for scripts that do not have many symbols or accented letters, such as the Latin, Greek and Cyrillic scripts used mainly for European languages. Examples of SBCS encodings include ISO/IEC 646, the various ISO 8859 encodings, and the various Microsoft/IBM code pages. The term SBCS is commonly contrasted against the terms DBCS (double-byte character set) and TBCS (triple-byte character set), as well as MBCS (multi-byte character set). The multi-byte character sets are used to accommodate languages with scripts that have large numbers of characters and symbols, predominantly Asian languages such as Chinese, Japanese, and Korean. These are sometimes referred to by the acronym CJK. In these computing systems, SBCSs are traditionally associated with half-width characters, so-called because su ...
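The one-byte-per-character property and the 256-symbol ceiling can be observed with Python's built-in `latin-1` codec, used here as a stand-in for an ISO 8859 encoding. Note the assumption: Python's codec maps all 256 byte values, including the control ranges, which is a property of that codec rather than a statement about the ISO 8859-1 standard itself.

```python
text = "caf\u00e9"                 # four characters, one of them accented
encoded = text.encode("latin-1")   # single-byte encoding: one byte each

# Every possible byte value decodes to exactly one distinct character.
decoded_all = bytes(range(256)).decode("latin-1")

# A character outside the 256-symbol repertoire cannot be encoded at all.
try:
    "\u4e2d".encode("latin-1")
    cjk_fits = True
except UnicodeEncodeError:
    cjk_fits = False

print(len(text), len(encoded), len(set(decoded_all)), cjk_fits)
```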


ASCII
ASCII, abbreviated from American Standard Code for Information Interchange, is a character encoding standard for electronic communication. ASCII codes represent text in computers, telecommunications equipment, and other devices. Because of technical limitations of computer systems at the time it was invented, ASCII has just 128 code points, of which only 95 are printable characters, which severely limited its scope. All modern computer systems instead use Unicode, which has more than a million code points, but the first 128 of these are the same as the ASCII set. The Internet Assigned Numbers Authority (IANA) prefers the name US-ASCII for this character encoding. ASCII is one of the IEEE milestones. Overview: ASCII was developed from telegraph code. Its first commercial use was as a seven-bit teleprinter code promoted by Bell data services. Work on the ASCII standard began in May 1961, with the first meeting of the American Standards Association's (ASA) (now the American National Standards I ...
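The figures above (128 code points, 95 of them printable) and the overlap with Unicode can be verified in a few lines of Python. The sketch treats 0x20 (space) through 0x7E (tilde) as the printable range; the variable names are illustrative.

```python
all_codes = range(128)  # seven bits give 2**7 = 128 code points

# Printable characters run from space (0x20) to tilde (0x7E) inclusive.
printable = [c for c in all_codes if 0x20 <= c <= 0x7E]
controls = [c for c in all_codes if c < 0x20 or c == 0x7F]

# The first 128 Unicode code points coincide with ASCII.
ascii_text = bytes(all_codes).decode("ascii")
same_as_unicode = ascii_text == "".join(chr(c) for c in all_codes)

print(len(printable), len(controls), same_as_unicode)
```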


Unicode
Unicode, formally The Unicode Standard, is an information technology standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems. The standard, which is maintained by the Unicode Consortium, defines as of the current version (15.0) 149,186 characters covering 161 modern and historic scripts, as well as symbols, emoji (including in colors), and non-visual control and formatting codes. Unicode's success at unifying character sets has led to its widespread and predominant use in the internationalization and localization of computer software. The standard has been implemented in many recent technologies, including modern operating systems, XML, and most modern programming languages. The Unicode character repertoire is synchronized with ISO/IEC 10646, each being code-for-code identical with the other. The Unicode Standard, however, includes more than just the base code. Along ...