''8-bit clean'' is an attribute of
computer system
A computer is a machine that can be programmed to automatically carry out sequences of arithmetic or logical operations (''computation''). Modern digital electronic computers can perform generic sets of operations known as ''programs'', wh ...
s, communication channels, and other devices and software, that process
8-bit
In computer architecture, 8-bit integers or other data units are those that are 8 bits wide (1 octet). Also, 8-bit central processing unit (CPU) and arithmetic logic unit (ALU) architectures are those that are based on registers or data bu ...
character encoding
Character encoding is the process of assigning numbers to graphical character (computing), characters, especially the written characters of human language, allowing them to be stored, transmitted, and transformed using computers. The numerical v ...
s without treating any byte as an
in-band
In telecommunications, in-band signaling is the sending of control information within the same band or channel used for data such as voice or video. This is in contrast to out-of-band signaling which is sent over a different channel, or even o ...
control code.
History
Until the early 1990s, many programs and data transmission channels were character-oriented and treated some characters, e.g.,
ETX, as
control characters
In computing and telecommunications, a control character or non-printing character (NPC) is a code point in a character set that does not represent a written character or symbol. They are used as in-band signaling to cause effects other than ...
. Others assumed a stream of seven-bit characters, with values between 0 and 127; for example, the
ASCII
ASCII ( ), an acronym for American Standard Code for Information Interchange, is a character encoding standard for representing a particular set of 95 (English language focused) printable character, printable and 33 control character, control c ...
standard used only seven bits per character, avoiding an 8-bit representation
in order to save on data transmission costs. On computers and data links using
8-bit bytes, this left the top
bit
The bit is the most basic unit of information in computing and digital communication. The name is a portmanteau of binary digit. The bit represents a logical state with one of two possible values. These values are most commonly represented as ...
of each
byte
The byte is a unit of digital information that most commonly consists of eight bits. Historically, the byte was the number of bits used to encode a single character of text in a computer and for this reason it is the smallest addressable un ...
free for use as a
parity,
flag bit, or metadata control bit. 7-bit systems and data links are unable to directly handle more complex character codes which are commonplace in non-
English-speaking countries with larger
alphabet
An alphabet is a standard set of letter (alphabet), letters written to represent particular sounds in a spoken language. Specifically, letters largely correspond to phonemes as the smallest sound segments that can distinguish one word from a ...
s.
Binary file
A binary file is a computer file that is not a text file. The term "binary file" is often used as a term meaning "non-text file". Many binary file formats contain parts that can be interpreted as text; for example, some computer document files ...
s of
octets
Octet may refer to:
Music
* Octet (music), ensemble consisting of eight instruments or voices, or composition written for such an ensemble
** String octet, a piece of music written for eight string instruments
*** Octet (Mendelssohn), 1825 compos ...
cannot be transmitted through 7-bit data channels directly. To work around this,
binary-to-text encoding
A binary-to-text encoding is code, encoding of data (computing), data in plain text. More precisely, it is an encoding of binary data in a sequence of character (computing), printable characters. These encodings are necessary for transmission of ...
s have been devised which use only 7-bit
ASCII
ASCII ( ), an acronym for American Standard Code for Information Interchange, is a character encoding standard for representing a particular set of 95 (English language focused) printable character, printable and 33 control character, control c ...
characters. Some of these encodings are
uuencoding,
Ascii85
Ascii85, also called Base85, is a form of binary-to-text encoding developed by Paul E. Rutter for the btoa utility. By using five ASCII characters to represent four bytes of binary data (making the encoded size larger than the original, assuming e ...
,
SREC,
BinHex,
kermit and
MIME
A mime artist, or simply mime (from Greek language, Greek , , "imitator, actor"), is a person who uses ''mime'' (also called ''pantomime'' outside of Britain), the acting out of a story through body motions without the use of speech, as a the ...
's
Base64
In computer programming, Base64 is a group of binary-to-text encoding schemes that transforms binary data into a sequence of printable characters, limited to a set of 64 unique characters. More specifically, the source binary data is taken 6 bits ...
.
EBCDIC
Extended Binary Coded Decimal Interchange Code (EBCDIC; ) is an eight- bit character encoding used mainly on IBM mainframe and IBM midrange computer operating systems. It descended from the code used with punched cards and the corresponding si ...
-based systems cannot handle all characters used in UUencoded data. However, the base64 encoding does not have this problem.
SMTP and NNTP 8-bit cleanness
Historically, various media were used to transfer messages, some of them only supporting 7-bit data, so an 8-bit message had high chances to be
garbled during transmission in the 20th century. But some implementations really did not care about formal discouraging of 8-bit data and allowed high bit set bytes to pass through. Such implementations are said to be 8-bit clean. In general, a communications protocol is said to be 8-bit clean if it correctly passes through the high bit of each byte in the communication process.
Many early
communications protocol
A communication protocol is a system of rules that allows two or more entities of a communications system to transmit information via any variation of a physical quantity. The protocol defines the rules, syntax, semantics (computer science), sem ...
standards, such as (for
SMTP
The Simple Mail Transfer Protocol (SMTP) is an Internet standard communication protocol for electronic mail transmission. Mail servers and other message transfer agents use SMTP to send and receive mail messages. User-level email clients typi ...
), (for
NNTP) and , were designed to work over such "7-bit" communication links. They specifically require the use of ASCII character set "transmitted as an 8-bit byte with the high-order bit cleared to zero" and some of these explicitly restrict ''all'' data to 7-bit characters.
For the first few decades of email networks (1971 to the early 1990s), most email messages were
plain text
In computing, plain text is a loose term for data (e.g. file contents) that represent only characters of readable material but not its graphical representation nor other objects ( floating-point numbers, images, etc.). It may also include a lim ...
in the 7-bit US-ASCII character set.
The definition of SMTP, like its predecessor , limits Internet Mail to lines (1000 characters or less) of 7-bit US-ASCII characters.
Later, the format of email messages was redefined in order to support messages that are not entirely US-ASCII text (text messages in character sets other than US-ASCII, and non-text messages, such as audio and images).
[
] The header field Content-Transfer-Encoding=binary requires an 8-bit clean transport.
specifies that "NNTP operates over any reliable bi-directional 8-bit-wide data stream channel", and changes the character set for commands to UTF-8. However, still limits the character set to ASCII, including and MIME encoding of non-ASCII data.
The Internet community generally adds features by ''extension'', allowing communication in both directions between upgraded machines and not-yet-upgraded machines, rather than declaring formerly standards-compliant legacy software to be "broken" and insisting that all software worldwide be upgraded to the latest standard. The recommended way to take advantage of 8-bit-clean links between machines is to use the ESMTP ()
8BITMIME extension for message bodies and the SMTP
SMTPUTF8 extension for message headers. Despite this, some
mail transfer agent
Within the Internet email system, a message transfer agent (MTA), mail transfer agent, or mail relay is software that transfers electronic mail messages from one computer to another using the Simple Mail Transfer Protocol. In some contexts, the a ...
s, notably
Exim and
qmail, relay mail to servers that do not advertise 8BITMIME without performing the conversion to 7-bit MIME (typically
quoted-printable
Quoted-Printable, or QP encoding, is a binary-to-text encoding system using printable ASCII characters (alphanumeric and the equals sign =) to transmit 8-bit data over a 7-bit data path or, generally, over a medium which is not 8-bit clean. Hi ...
, "Q-P conversion") required by . This "just-send-8" attitude does not, in fact, cause problems in practice because virtually all modern email servers are 8-bit clean.
See also
*
32-bit clean
*
*
Notes
References
{{Reflist
Character encoding