8-bit clean
   HOME

TheInfoList



OR:

''8-bit clean'' is an attribute of
computer system A computer is a machine that can be programmed to carry out sequences of arithmetic or logical operations ( computation) automatically. Modern digital electronic computers can perform generic sets of operations known as programs. These prog ...
s, communication channels, and other devices and software, that handle
8-bit In computer architecture, 8-bit integers or other data units are those that are 8 bits wide (1 octet). Also, 8-bit central processing unit (CPU) and arithmetic logic unit (ALU) architectures are those that are based on registers or data buses ...
character encoding Character encoding is the process of assigning numbers to graphical characters, especially the written characters of human language, allowing them to be stored, transmitted, and transformed using digital computers. The numerical values tha ...
s correctly. Such encoding include the
ISO 8859 ISO/IEC 8859 is a joint ISO and IEC series of standards for 8-bit character encodings. The series of standards consists of numbered parts, such as ISO/IEC 8859-1, ISO/IEC 8859-2, etc. There are 15 parts, excluding the abandoned ISO/IEC 8859-12. ...
series and the
UTF-8 UTF-8 is a variable-length character encoding used for electronic communication. Defined by the Unicode Standard, the name is derived from ''Unicode'' (or ''Universal Coded Character Set'') ''Transformation Format 8-bit''. UTF-8 is capable of e ...
encoding of
Unicode Unicode, formally The Unicode Standard,The formal version reference is is an information technology standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems. The standard, ...
.


History

Until the early 1990s, many programs and data transmission channels were character-oriented and treated some characters, e.g., ETX, as
control characters In computing and telecommunication, a control character or non-printing character (NPC) is a code point (a number) in a character set, that does not represent a written symbol. They are used as in-band signaling to cause effects other than th ...
. Other assumed a stream of seven-bit characters, with values between 0 and 127; for example, the
ASCII ASCII ( ), abbreviated from American Standard Code for Information Interchange, is a character encoding standard for electronic communication. ASCII codes represent text in computers, telecommunications equipment, and other devices. Because ...
standard used only seven bits per character, avoiding an 8-bit representation in order to save on data transmission costs. On computers and data links using 8-bit bytes this left the top
bit The bit is the most basic unit of information in computing and digital communications. The name is a portmanteau of binary digit. The bit represents a logical state with one of two possible values. These values are most commonly represente ...
of each
byte The byte is a unit of digital information that most commonly consists of eight bits. Historically, the byte was the number of bits used to encode a single character of text in a computer and for this reason it is the smallest addressable uni ...
free for use as a
parity Parity may refer to: * Parity (computing) ** Parity bit in computing, sets the parity of data for the purpose of error detection ** Parity flag in computing, indicates if the number of set bits is odd or even in the binary representation of the ...
, flag bit, or meta data control bit. 7-bit systems and data links are unable to directly handle more complex character codes which are commonplace in non-
English English usually refers to: * English language * English people English may also refer to: Peoples, culture, and language * ''English'', an adjective for something of, from, or related to England ** English national ...
-speaking countries with larger
alphabet An alphabet is a standardized set of basic written graphemes (called letters) that represent the phonemes of certain spoken languages. Not all writing systems represent language in this way; in a syllabary, each character represents a syllab ...
s.
Binary file A binary file is a computer file that is not a text file. The term "binary file" is often used as a term meaning "non-text file". Many binary file formats contain parts that can be interpreted as text; for example, some computer document fil ...
s of
octets Octet may refer to: Music * Octet (music), ensemble consisting of eight instruments or voices, or composition written for such an ensemble ** String octet, a piece of music written for eight string instruments *** Octet (Mendelssohn), 1825 compo ...
cannot be transmitted through 7-bit data channels directly. To work around this,
binary-to-text encoding A binary-to-text encoding is encoding of data in plain text. More precisely, it is an encoding of binary data in a sequence of printable characters. These encodings are necessary for transmission of data when the channel does not allow binary ...
s have been devised which use only 7-bit
ASCII ASCII ( ), abbreviated from American Standard Code for Information Interchange, is a character encoding standard for electronic communication. ASCII codes represent text in computers, telecommunications equipment, and other devices. Because ...
characters. Some of these encodings are uuencoding, Ascii85, SREC,
BinHex BinHex, originally short for "binary-to-hexadecimal", is a binary-to-text encoding system that was used on the classic Mac OS for sending binary files through e-mail. Originally a hexadecimal encoding, subsequent versions of BinHex are more simil ...
, kermit and
MIME Multipurpose Internet Mail Extensions (MIME) is an Internet standard that extends the format of email messages to support text in character sets other than ASCII, as well as attachments of audio, video, images, and application programs. Message ...
's
Base64 In computer programming, Base64 is a group of binary-to-text encoding schemes that represent binary data (more specifically, a sequence of 8-bit bytes) in sequences of 24 bits that can be represented by four 6-bit Base64 digits. Common to all bina ...
.
EBCDIC Extended Binary Coded Decimal Interchange Code (EBCDIC; ) is an eight- bit character encoding used mainly on IBM mainframe and IBM midrange computer operating systems. It descended from the code used with punched cards and the corresponding ...
-based systems cannot handle all characters used in UUencoded data. However, the base64 encoding does not have this problem.


SMTP and NNTP 8-bit cleanness

Historically, various media were used to transfer messages, some of them only supporting 7-bit data, so an 8-bit message had high chances to be garbled during transmission in the 20th century. But some implementations really did not care about formal discouraging of 8-bit data and allowed high bit set bytes to pass through. Such implementations are said to be 8-bit clean. In general, a communications protocol is said to be 8-bit clean if it correctly passes through the high bit of each byte in the communication process. Many early
communications protocol A communication protocol is a system of rules that allows two or more entities of a communications system to transmit information via any kind of variation of a physical quantity. The protocol defines the rules, syntax, semantics and synch ...
standards, such as (for
SMTP The Simple Mail Transfer Protocol (SMTP) is an Internet standard communication protocol for electronic mail transmission. Mail servers and other message transfer agents use SMTP to send and receive mail messages. User-level email clients ty ...
), (for
NNTP The Network News Transfer Protocol (NNTP) is an application protocol used for transporting Usenet news articles (''netnews'') between news servers, and for reading/posting articles by the end user client applications. Brian Kantor of the Univers ...
) and , were designed to work over such "7-bit" communication links. They specifically require the use of ASCII character set "transmitted as an 8-bit byte with the high-order bit cleared to zero" and some of these explicitly restrict ''all'' data to 7-bit characters. For the first few decades of email networks (1971 to the early 1990s), most email messages were
plain text In computing, plain text is a loose term for data (e.g. file contents) that represent only characters of readable material but not its graphical representation nor other objects (floating-point numbers, images, etc.). It may also include a limit ...
in the 7-bit US-ASCII character set. The definition of SMTP, like its predecessor , limits Internet Mail to lines (1000 characters or less) of 7-bit US-ASCII characters. Later the format of email messages was re-defined in order to support messages that are not entirely US-ASCII text (text messages in character sets other than US-ASCII, and non-text messages, such as audio and images). specifies "NNTP operates over any reliable bi-directional 8-bit-wide data stream channel." and changes the character set for commands to UTF-8. However, still limits the character set to ASCII, including and MIME encoding of non-ASCII data. The Internet community generally adds features by ''extension'', allowing communication in both directions between upgraded machines and not-yet-upgraded machines, rather than declaring formerly standards-compliant legacy software to be "broken" and insisting that all software worldwide be upgraded to the latest standard. In the mid-1990s, people objected to "just send 8 bits (to SMTP servers)", perhaps because of a perception that "just send 8 bits" is an implicit declaration that
ISO 8859-1 ISO/IEC 8859-1:1998, ''Information technology — 8-bit single-byte coded graphic character sets — Part 1: Latin alphabet No. 1'', is part of the ISO/IEC 8859 series of ASCII-based standard character encodings, first edition published in ...
become the new "standard encoding", forcing everyone in the world to use the same
character set Character encoding is the process of assigning numbers to graphical characters, especially the written characters of human language, allowing them to be stored, transmitted, and transformed using digital computers. The numerical values tha ...
. Instead, the recommended way to take advantage of 8-bit-clean links between machines is to use the ESMTP ()
8BITMIME The Simple Mail Transfer Protocol (SMTP) is an Internet standard communication protocol for electronic mail transmission. Mail servers and other message transfer agents use SMTP to send and receive mail messages. User-level email clients typi ...
extension for message bodies and the SMTP
SMTPUTF8 The Simple Mail Transfer Protocol (SMTP) is an Internet standard communication protocol for electronic mail transmission. Mail servers and other message transfer agents use SMTP to send and receive mail messages. User-level email clients typica ...
extension for message headers. Despite this, some
mail transfer agent The mail or post is a system for physically transporting postcards, letters, and parcels. A postal service can be private or public, though many governments place restrictions on private systems. Since the mid-19th century, national postal syst ...
s, notably Exim and
qmail qmail is a mail transfer agent (MTA) that runs on Unix. It was written, starting December 1995, by Daniel J. Bernstein as a more secure replacement for the popular Sendmail program. Originally license-free software, qmail's source cod ...
, relay mail to servers that do not advertise 8BITMIME without performing the conversion to 7-bit MIME (typically
quoted-printable Quoted-Printable, or QP encoding, is a binary-to-text encoding system using printable ASCII characters (alphanumeric and the equals sign =) to transmit 8-bit data over a 7-bit data path or, generally, over a medium which is not 8-bit clean. Hist ...
, "Q-P conversion") required by . This "just-send-8" attitude does not in fact cause problems in practice, since virtually all modern email servers are 8-bit clean.


See also

*
32-bit clean Historically, the classic Mac OS used a form of memory management that has fallen out of favor in modern systems. Criticism of this approach was one of the key areas addressed by the change to . The original problem for the engineers of the Maci ...
* *


References

{{Reflist Character encoding