
This is a list of some binary codes that are (or have been) used to represent text as a sequence of binary digits "0" and "1". Fixed-width binary codes use a set number of bits to represent each character in the text, while in variable-width binary codes, the number of bits may vary from character to character.


Five-bit binary codes

Several different five-bit codes were used for early punched tape systems. Five bits per character allow only 32 different characters, so many of the five-bit codes used two sets of characters per value, referred to as FIGS (figures) and LTRS (letters), and reserved two characters to switch between these sets. This effectively allowed the use of 60 characters: the 30 non-shift values in each of the two sets.

Standard five-bit codes are:
* International Telegraph Alphabet No. 1 (ITA1) – Also commonly referred to as Baudot code
* International Telegraph Alphabet No. 2 (ITA2) – Also commonly referred to as Murray code
* American Teletypewriter code (USTTY) – A variant of ITA2 used in the USA
* DIN 66006 – Developed for the presentation of ALGOL/ALCOR programs on paper tape and punch cards

The following early computer systems each used its own five-bit code:
* J. Lyons and Co. LEO (Lyons Electronic Office)
* English Electric DEUCE
* University of Illinois at Urbana-Champaign ILLIAC
* ZEBRA
* EMI 1100
* Ferranti Mercury, Pegasus, and Orion systems

The steganographic code commonly known as Bacon's cipher uses groups of 5 binary-valued elements to represent letters of the alphabet.
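The FIGS/LTRS shift mechanism described in this section can be sketched in a few lines. The code table below is a hypothetical toy table, not the real ITA1 or ITA2 assignments, and the shift values and the choice to start in letters mode are assumptions made for illustration only.

```python
# Hypothetical toy table: these are NOT the real ITA1/ITA2 assignments.
LTRS_SHIFT = 0b11111  # assumed value meaning "switch to the letters set"
FIGS_SHIFT = 0b11011  # assumed value meaning "switch to the figures set"

LETTERS = {0b00001: "A", 0b00010: "B", 0b00011: "C"}
FIGURES = {0b00001: "1", 0b00010: "2", 0b00011: "3"}


def decode_shifted(codes):
    """Decode 5-bit values where two reserved values switch character sets."""
    table = LETTERS  # assumption: the receiver starts in letters mode
    out = []
    for code in codes:
        if code == LTRS_SHIFT:
            table = LETTERS
        elif code == FIGS_SHIFT:
            table = FIGURES
        else:
            out.append(table[code])  # same 5-bit value, meaning depends on mode
    return "".join(out)


# 32 values per set, minus the 2 shift values, in each of the 2 sets.
usable = 2 * (2**5 - 2)

decoded = decode_shifted(
    [0b00001, 0b00010, FIGS_SHIFT, 0b00001, 0b00011, LTRS_SHIFT, 0b00011]
)
print(decoded, usable)
```

The same five-bit value 00001 decodes as "A" or "1" depending on the most recent shift character, which is how 32 values can cover roughly twice as many symbols.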


Six-bit binary codes

Six bits per character allows 64 distinct characters to be represented.

Examples of six-bit binary codes are:
* International Telegraph Alphabet No. 4 (ITA4)
* Six-bit BCD (Binary Coded Decimal), used by early mainframe computers
* Six-bit ASCII, a subset of the primitive seven-bit ASCII
* Braille – Braille characters are represented using six dot positions, arranged in a rectangle. Each position may contain a raised dot or not, so Braille can be considered to be a six-bit binary code.

See also: Six-bit character codes
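To illustrate the view of Braille as a six-bit code, the sketch below maps a set of raised dots to a six-bit mask. It leans on the Unicode Braille Patterns block, which starts at U+2800 and stores dot n in bit n-1; the function name `braille_cell` is made up for this example.

```python
BRAILLE_BASE = 0x2800  # first code point of the Unicode Braille Patterns block


def braille_cell(dots):
    """Return the Braille character whose raised dots are listed (1 to 6)."""
    mask = 0
    for dot in dots:
        if not 1 <= dot <= 6:
            raise ValueError("six-dot Braille uses dot positions 1..6")
        mask |= 1 << (dot - 1)  # dot n is stored in bit n-1
    return chr(BRAILLE_BASE + mask)


cell_one_dot = braille_cell([1])      # only dot 1 raised
cell_two_dots = braille_cell([1, 2])  # dots 1 and 2 raised
pattern_count = 2**6                  # every raised/not-raised combination

print(cell_one_dot, cell_two_dots, pattern_count)
```

Each of the six positions contributes one bit, which gives the 64 patterns (including the blank cell) that six bits allow.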


Seven-bit binary codes

Examples of seven-bit binary codes are:
* International Telegraph Alphabet No. 3 (ITA3) – derived from the Moore ARQ code, and also known as the RCA
* ASCII – The ubiquitous ASCII code was originally defined as a seven-bit character set. The ASCII article provides a detailed set of equivalent standards and variants. In addition, there are various extensions of ASCII to eight bits (see Eight-bit binary codes)
* CCIR 476 – Extends ITA2 from 5 to 7 bits, using the extra 2 bits as check digits
* International Telegraph Alphabet No. 4 (ITA4)
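As a small illustration of ASCII as a seven-bit code, the following sketch prints each character of a string as seven binary digits. The helper name `to_seven_bit` is hypothetical; the only library behaviour relied on is Python's built-in "ascii" codec.

```python
def to_seven_bit(text):
    """Return each character of an ASCII string as a 7-digit binary string."""
    data = text.encode("ascii")  # raises UnicodeEncodeError for non-ASCII input
    return [format(byte, "07b") for byte in data]


bits = to_seven_bit("Hi")
print(bits)
```

Every ASCII value is below 128, so seven digits are always enough; the eighth bit of a byte is what the eight-bit extensions make use of.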


Eight-bit binary codes

* Extended ASCII – A number of standards extend ASCII to eight bits by adding a further 128 characters, such as:
** HP Roman
** ISO/IEC 8859
** Mac OS Roman
** Windows-1252
* EBCDIC – Used in early IBM computers and current IBM i and System z systems.
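Because these eight-bit codes assign the 256 byte values differently, the same byte can stand for different characters. The sketch below decodes one byte with three of Python's standard codecs; choosing "cp037" as the EBCDIC example is an assumption, since EBCDIC exists in many code pages.

```python
raw = bytes([0xC1])  # a single byte with the high bit set

as_latin1 = raw.decode("latin-1")  # ISO/IEC 8859-1
as_cp1252 = raw.decode("cp1252")   # Windows-1252
as_ebcdic = raw.decode("cp037")    # one EBCDIC code page, picked as an example

print(repr(as_latin1), repr(as_cp1252), repr(as_ebcdic))

# The reverse direction: the letter "A" is a different byte in each family.
a_in_latin1 = "A".encode("latin-1")
a_in_ebcdic = "A".encode("cp037")
print(a_in_latin1.hex(), a_in_ebcdic.hex())
```

The two ASCII extensions agree on this byte, while EBCDIC, which is not ASCII-based, reads it as a plain capital letter.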


10-bit binary codes

* AUTOSPEC – Also known as Bauer code. AUTOSPEC repeats a five-bit character twice, but if the character has odd parity, the repetition is inverted.
* Decabit – A datagram of electronic pulses which are transmitted commonly through power lines. Decabit is mainly used in Germany and other European countries.
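The AUTOSPEC rule stated above can be sketched as follows. This follows only the one-sentence description in this list; the bit order, the treatment of "odd parity" as an odd number of 1 bits, and the function name `autospec_encode` are assumptions rather than details taken from a specification.

```python
def autospec_encode(value):
    """Encode a 5-bit value as 10 bits: the value, then its (maybe inverted) repeat."""
    if not 0 <= value < 32:
        raise ValueError("expected a 5-bit value")
    ones = bin(value).count("1")
    # Assumption: "odd parity" means the 5-bit value has an odd number of 1 bits.
    repeat = value ^ 0b11111 if ones % 2 else value
    return format(value, "05b") + format(repeat, "05b")


odd_example = autospec_encode(0b00001)   # one 1 bit: repetition is inverted
even_example = autospec_encode(0b00011)  # two 1 bits: repetition is unchanged
print(odd_example, even_example)
```

The second half carries no new information; it exists so that a receiver can check the first half against it.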


16-bit binary codes

* UCS-2 – An obsolete encoding capable of representing the basic multilingual plane of Unicode


32-bit binary codes

* UTF-32/UCS-4 – A four-bytes-per-character representation of Unicode.


Variable-length binary codes

* UTF-8 – Encodes characters in a way that is mostly compatible with ASCII but can also encode the full repertoire of Unicode characters with sequences of up to four 8-bit bytes.
* UTF-16 – Extends UCS-2 to cover the whole of Unicode with sequences of one or two 16-bit elements
* GB 18030 – A full-Unicode variable-length code designed for compatibility with older Chinese multibyte encodings
* Huffman coding – A technique for expressing more common characters using shorter bit strings than are used for less common characters
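The difference between the variable-length Unicode encodings in this list and a fixed-width one can be seen by measuring encoded sizes. The sketch below uses Python's built-in codecs; the four sample characters are arbitrary choices, and the big-endian codec names are used only to avoid a byte-order mark in the output.

```python
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]  # A, e-acute, euro sign, an emoji


def encoded_lengths(ch):
    """Return the number of bytes one character occupies in each encoding."""
    return {name: len(ch.encode(name)) for name in ("utf-8", "utf-16-be", "utf-32-be")}


table = {ch: encoded_lengths(ch) for ch in samples}
for ch, sizes in table.items():
    print(f"U+{ord(ch):04X}", sizes)
```

UTF-8 ranges from one to four bytes and UTF-16 uses one or two 16-bit units, while UTF-32 always takes four bytes.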
Data compression systems such as Lempel–Ziv–Welch can compress arbitrary binary data. They are therefore not binary codes themselves but may be applied to binary codes to reduce storage needs.
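Huffman coding, listed above, can be illustrated with a compact sketch that builds a code from character frequencies. This is one minimal way to write it, not a reference implementation; the function name `huffman_code` and the sample text are invented for the example, and the exact bit strings depend on how ties are broken.

```python
import heapq
from collections import Counter


def huffman_code(text):
    """Build a prefix code in which frequent characters get shorter bit strings."""
    counts = Counter(text)
    if not counts:
        return {}
    if len(counts) == 1:
        return {next(iter(counts)): "0"}  # a lone symbol still needs one bit
    # Heap entries: (weight, tie_breaker, {char: code_so_far}).
    heap = [(w, i, {ch: ""}) for i, (ch, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)   # two lightest subtrees
        w2, _, right = heapq.heappop(heap)
        merged = {ch: "0" + code for ch, code in left.items()}
        merged.update({ch: "1" + code for ch, code in right.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


text = "aaaaaaaabbbc"
code = huffman_code(text)
encoded = "".join(code[ch] for ch in text)
print(code, len(encoded))
```

For this sample the most frequent character receives a one-bit code and the two rarer characters receive two bits each, so the 12 characters fit in 16 bits.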


Other

* Morse code is a variable-length telegraphy code, which traditionally uses a series of long and short pulses to encode characters. It relies on gaps between the pulses to provide separation between letters and words, as the letter codes do not have the "prefix property". This means that Morse code is not necessarily a binary system, but in a sense may be a ternary system, with a 10 for a "dit" or a "dot", a 1110 for a "dah" or a "dash", and a 00 for a single unit of separation. Morse code can be represented as a binary stream by allowing each bit to represent one unit of time. Thus a "dit" or "dot" is represented as a 1 bit, while a "dah" or "dash" is represented as three consecutive 1 bits. Spaces between symbols, letters, and words are represented as one, three, or seven consecutive 0 bits. For example, "NO U" in Morse code is "-. --- / ..-" (with "/" marking the word gap), which could be represented in binary as "1110100011101110111000000010101110". If, however, Morse code is represented as a ternary system, "NO U" would be represented as "1110, 10, 00, 1110, 1110, 1110, 00, 00, 00, 10, 10, 1110".
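The "NO U" example above can be reproduced with a short sketch using the three ternary tokens from the text (10 for a dot, 1110 for a dash, 00 for one unit of separation). The dictionary holds only the three letters needed here, and the function name `morse_tokens` is hypothetical.

```python
MORSE = {"N": "-.", "O": "---", "U": "..-"}  # only the letters used in the example
TOKEN = {".": "10", "-": "1110"}             # each symbol carries its trailing gap
GAP = "00"                                   # one extra unit of separation


def morse_tokens(message):
    """Turn a message into the ternary token list described in the text."""
    tokens = []
    for word_index, word in enumerate(message.split(" ")):
        if word_index:
            tokens.extend([GAP] * 3)  # word gap: three separation tokens
        for letter_index, letter in enumerate(word):
            if letter_index:
                tokens.append(GAP)    # letter gap: one separation token
            tokens.extend(TOKEN[symbol] for symbol in MORSE[letter])
    return tokens


tokens = morse_tokens("NO U")
bitstream = "".join(tokens)
print(", ".join(tokens))
print(bitstream)
```

Joining the tokens gives the binary stream quoted in the text. The stream ends in a 0 because the final dash token includes the one-unit gap that follows every symbol.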


See also

* List of computer character sets

