Ascii85

Ascii85, also called Base85, is a form of binary-to-text encoding developed by Paul E. Rutter for the btoa utility. By using five ASCII characters to represent four bytes of binary data (making the encoded size 1/4 larger than the original, assuming eight bits per ASCII character), it is more efficient than uuencode or Base64, which use four characters to represent three bytes of data (a 1/3 increase, assuming eight bits per ASCII character). Its main modern uses are in Adobe's PostScript and Portable Document Format file formats, as well as in the patch encoding for binary files used by Git.


Overview

The basic need for a binary-to-text encoding comes from a need to communicate arbitrary binary data over preexisting communications protocols that were designed to carry only English-language human-readable text. Those protocols may be only 7-bit safe (and within that range may avoid certain ASCII control codes), may require line breaks at certain maximum intervals, and may not preserve whitespace. Thus, only the 94 printable ASCII characters are "safe" to use to convey data. Eighty-five is the minimum integral value of n such that n^5 ≥ 2^32, so any sequence of 4 bytes can be encoded as 5 symbols, as long as at least 85 distinct symbols are available. (Five radix-85 digits can represent the integers from 0 to 4 437 053 124 inclusive, which suffice to represent all 4 294 967 296 possible 4-byte sequences.)
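The arithmetic behind the choice of 85 can be checked directly. The following is only an illustrative sketch; the variable names are chosen here and do not come from any specification.

```python
# Smallest base n for which five digits cover every 32-bit value.
# The search is limited to the 94 printable ASCII characters.
minimal_base = next(n for n in range(2, 95) if n ** 5 >= 2 ** 32)

# Largest integer representable with five radix-85 digits.
largest_five_digit_value = 85 ** 5 - 1

print(minimal_base)              # 85
print(84 ** 5 < 2 ** 32)         # True: base 84 is not enough
print(largest_five_digit_value)  # 4437053124
```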


Encoding

When encoding, each group of 4 bytes is taken as a 32-bit binary number, most significant byte first (Ascii85 uses a big-endian convention). This number is converted, by repeatedly dividing by 85 and taking the remainder, into 5 radix-85 digits. Each digit (again, most significant first) is then encoded as a printable ASCII character by adding 33 to it, giving the ASCII characters 33 (!) through 117 (u). Because all-zero data is quite common, an exception is made for the sake of data compression: an all-zero group is encoded as the single character z instead of !!!!!. Groups of characters that decode to a value greater than 2^32 − 1 (which is encoded as s8W-!) cause a decoding error, as do z characters in the middle of a group. White space between the characters is ignored and may occur anywhere to accommodate line-length limitations.
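The per-group procedure described above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation; the function names `encode_group` and `decode_group` are hypothetical and handle exactly one full group, without whitespace or partial-group handling.

```python
def encode_group(group: bytes) -> str:
    """Encode exactly 4 bytes as 5 Ascii85 characters ('z' if all zero)."""
    if len(group) != 4:
        raise ValueError("expected exactly 4 bytes")
    value = int.from_bytes(group, "big")  # most significant byte first
    if value == 0:
        return "z"
    digits = []
    for _ in range(5):
        value, remainder = divmod(value, 85)
        digits.append(remainder)  # least significant digit first
    # Emit the most significant digit first, offset by 33 ('!').
    return "".join(chr(33 + d) for d in reversed(digits))


def decode_group(chars: str) -> bytes:
    """Decode one 'z' or one 5-character group back into 4 bytes."""
    if chars == "z":
        return b"\x00" * 4
    if len(chars) != 5 or any(not "!" <= c <= "u" for c in chars):
        raise ValueError("expected 5 characters in the range '!' to 'u'")
    value = 0
    for c in chars:
        value = value * 85 + (ord(c) - 33)
    if value > 0xFFFFFFFF:
        raise ValueError("group decodes to more than 2^32 - 1")
    return value.to_bytes(4, "big")


print(encode_group(b"Man "))              # 9jqo^
print(encode_group(b"\xff\xff\xff\xff"))  # s8W-!
print(encode_group(b"\x00\x00\x00\x00"))  # z
```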


Limitations

The original specification only allows a stream whose length is a multiple of 4 bytes to be encoded. Encoded data may contain characters that have special meaning in many programming languages and in some text-based protocols, such as the left angle bracket <, the backslash \, and the single and double quotes ' and ". Other base-85 encodings, such as Z85, are designed to be safe in source code.


History


btoa version

The original btoa program always encoded full groups (padding the source as necessary), with a prefix line of "xbtoa Begin" and a suffix line of "xbtoa End", followed by the original file length (in decimal and hexadecimal) and three 32-bit checksums. The decoder needs the file length to determine how much of the final group was padding. The initial proposal for btoa encoding used an alphabet running from the ASCII space character through "t" inclusive, but this was replaced with an alphabet of "!" to "u" to avoid "problems with some mailers (stripping off trailing blanks)". This program also introduced the special "z" short form for an all-zero group. Version 4.2 added a "y" exception for a group of all ASCII space characters (0x20202020).


ZMODEM version

"ZMODEM Pack-7 encoding" encodes groups of 4 octets into groups of 5 printable ASCII characters in a way similar to, or possibly the same as, Ascii85. When a ZMODEM program sends pre-compressed 8-bit data files over 7-bit data channels, it uses "ZMODEM Pack-7 encoding". In Chuck Forsberg's words: "ZMODEM Pack-7 packs 4 bytes into 5 printing characters."


Adobe version

Adobe adopted the basic btoa encoding, with slight changes, and gave it the name Ascii85. The characters used are the ASCII characters 33 (!) through 117 (u) inclusive (representing the base-85 digits 0 through 84), together with the letter z (a special case representing a 32-bit zero value); white space is ignored. Adobe uses the delimiter "~>" to mark the end of an Ascii85-encoded string and represents the length by truncating the final group: if the last block of source bytes contains fewer than 4 bytes, it is padded with up to 3 null bytes before encoding, and after encoding, as many characters as there were padding bytes are removed from the end of the output.

The reverse is applied when decoding: the last block is padded to 5 characters with the Ascii85 character u, and as many bytes as there were padding characters are omitted from the end of the output (see example).

The padding is not arbitrary. Converting from binary to Base64 only regroups bits and does not change them or their order (a high bit in binary does not affect the low bits in the Base64 representation). In converting a binary number to base 85 (85 is not a power of two), high bits do affect the low-order base-85 digits, and conversely. Padding the binary value low (with zero bits) while encoding and padding the base-85 value high (with u characters) while decoding ensures that the high-order bits are preserved: the zero padding in the binary leaves enough room that a small addition is absorbed and there is no carry into the high bits.

In Ascii85-encoded blocks, whitespace and line-break characters may be present anywhere, including in the middle of a 5-character block, and must be silently ignored. Adobe's specification does not support the y exception.
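The truncation rule for the final group can be illustrated with a short Python sketch. It follows the description above under a few assumptions: only the "~>" end delimiter is handled, the z short form is used only for full groups, and the names `a85_encode` and `a85_decode` are hypothetical rather than part of Adobe's specification.

```python
def a85_encode(data: bytes) -> str:
    """Encode bytes, truncating the final group as described above."""
    out = []
    for i in range(0, len(data), 4):
        chunk = data[i:i + 4]
        pad = 4 - len(chunk)  # number of null bytes added (0 to 3)
        value = int.from_bytes(chunk + b"\x00" * pad, "big")
        if value == 0 and pad == 0:
            out.append("z")  # short form, full all-zero groups only
            continue
        chars = []
        for _ in range(5):
            value, digit = divmod(value, 85)
            chars.append(chr(33 + digit))
        # Drop as many trailing characters as padding bytes were added.
        out.append("".join(reversed(chars))[:5 - pad])
    return "".join(out) + "~>"


def a85_decode(text: str) -> bytes:
    """Decode text up to the '~>' delimiter, ignoring all whitespace."""
    body = "".join(text.split("~>", 1)[0].split())
    out = bytearray()
    i = 0
    while i < len(body):
        if body[i] == "z":
            out += b"\x00" * 4
            i += 1
            continue
        chunk = body[i:i + 5]
        i += 5
        pad = 5 - len(chunk)  # number of 'u' characters to add
        if "z" in chunk or pad == 4:
            raise ValueError("malformed group")
        value = 0
        for c in chunk + "u" * pad:
            digit = ord(c) - 33
            if not 0 <= digit <= 84:
                raise ValueError("character outside '!' to 'u'")
            value = value * 85 + digit
        if value > 0xFFFFFFFF:
            raise ValueError("group decodes to more than 2^32 - 1")
        # Drop as many trailing bytes as 'u' characters were added.
        out += value.to_bytes(4, "big")[:4 - pad]
    return bytes(out)


print(a85_encode(b"."))           # /c~>
print(a85_decode("9jqo^\n/c ~>"))  # b'Man .'
```

Padding with u (the highest digit) when decoding means the reconstructed number is never smaller than the original, so the surviving high-order bytes come out unchanged.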


Example for Ascii85

A quote from Thomas Hobbes's Leviathan:

"Man is distinguished, not only by his reason, but by this singular passion from other animals, which is a lust of the mind, that by a perseverance of delight in the continued and indefatigable generation of knowledge, exceeds the short vehemence of any carnal pleasure."

If this is initially encoded using US-ASCII, it can be re-encoded in Ascii85 as follows:
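9jqo^BlbD-BleB1DJ+*+F(f,q/0JhKF<GL>Cj@.4Gp$d7F!,L7@<6@)/0JDEF<G%<+EV:2F!,O<
DJ+*.@<*K0@<6L(Df-\0Ec5e;DffZ(EZee.Bl.9pF"AGXBPCsi+DGm>@3BB/F*&OCAfu2/AKYi(
DIb:@FD,*)+C]U=@3BN#EcYf8ATD3s@q?d$AftVqCh[NqF<G:8+EV:.+Cf>-FD5W8ARlolDIal(
DId<j@<?3r@:F%a+D58'ATD4$Bl@l3De:,-DJs`8ARoFb/0JMK@qB4^F!,R<AKZ&-DfTqBG%G>u
D.RTpAKYo'+CT/5+Cei#DII?(E,9)oF*2M7/c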
Since the last 4-tuple is incomplete (it contains only the final period, byte 0x2E), it must be padded with three zero bytes, giving the 32-bit value 0x2E000000, which encodes as '/cYkO'. Since three bytes of padding had to be added, the three final characters 'YkO' are omitted from the output, leaving '/c'.

Decoding is done inversely, except that the last 5-tuple is padded with 'u' characters: '/c' becomes '/cuuu', which decodes to the bytes 0x2E 0x03 0x19 0xB4. Since the input had to be padded with three 'u' characters, the last three bytes of the output are ignored and we end up with the original period.

The input sentence does not contain 4 consecutive zero bytes, so the example does not show the use of the 'z' abbreviation.
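The example can be reproduced with the `base64` module of Python's standard library, whose `a85encode` and `a85decode` functions implement this encoding. The sketch below assumes the quote is typed exactly as above (269 ASCII bytes); with the default arguments no delimiters or line breaks are added.

```python
import base64

quote = (
    "Man is distinguished, not only by his reason, but by this singular "
    "passion from other animals, which is a lust of the mind, that by a "
    "perseverance of delight in the continued and indefatigable generation "
    "of knowledge, exceeds the short vehemence of any carnal pleasure."
)

# 269 bytes = 67 full groups plus 1 trailing byte,
# so the output has 67 * 5 + 2 = 337 characters.
encoded = base64.a85encode(quote.encode("ascii")).decode("ascii")

print(len(quote), len(encoded))  # 269 337
print(encoded[:10])              # 9jqo^BlbD-
print(encoded[-2:])              # /c

# Decoding restores the original text.
decoded = base64.a85decode(encoded).decode("ascii")
print(decoded == quote)          # True
```

Passing `wrapcol=75` to `a85encode` breaks the output into lines of at most 75 characters, and `adobe=True` adds the `<~` and `~>` framing.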


Compatibility

The Ascii85 encoding is compatible with 7-bit and 8-bit MIME, while having less overhead than Base64. One potential compatibility issue of Ascii85 is that some of the characters it uses are significant in markup languages such as XML or SGML. To include Ascii85 data in these documents, it may be necessary to escape the quotes, angle brackets, and ampersands.


RFC 1924 version

Published on April 1, 1996, the informational RFC 1924, "A Compact Representation of IPv6 Addresses" by Robert Elz, suggests a base-85 encoding of IPv6 addresses. This differs from the scheme used above in that he proposes a different set of 85 ASCII characters, and proposes to do all arithmetic on the 128-bit number, converting it to a single 20-digit base-85 number (internal whitespace not allowed), rather than breaking it into four 32-bit groups. The proposed character set is, in order, 0–9, A–Z, a–z, and then the 23 characters !#$%&()*+-;<=>?@^_`{|}~. The highest possible representable address, 2^128 − 1 = 74×85^19 + 53×85^18 + 5×85^17 + ..., would be encoded as =r54lj&NUUO~Hi%c2ym0.

This character set excludes the characters " ' , . / : [ \ ] and the space, making it suitable for use in JSON strings (where " and \ would require escaping). However, for SGML-based protocols, notably including XML, string escapes may still be required (to accommodate <, > and &).
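The single-number conversion can be sketched in Python as follows. This is an illustrative sketch only: the names `RFC1924_ALPHABET`, `rfc1924_encode` and `rfc1924_decode` are hypothetical, and the standard-library `ipaddress` module is used here merely to convert between textual addresses and 128-bit integers.

```python
import ipaddress

# The 85 characters in digit order: 0-9, A-Z, a-z, then 23 symbols.
RFC1924_ALPHABET = (
    "0123456789"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "!#$%&()*+-;<=>?@^_`{|}~"
)


def rfc1924_encode(address: str) -> str:
    """Encode an IPv6 address as one 20-digit base-85 number."""
    value = int(ipaddress.IPv6Address(address))
    digits = []
    for _ in range(20):
        value, remainder = divmod(value, 85)
        digits.append(RFC1924_ALPHABET[remainder])
    return "".join(reversed(digits))  # most significant digit first


def rfc1924_decode(text: str) -> str:
    """Decode 20 base-85 digits back into a textual IPv6 address."""
    if len(text) != 20:
        raise ValueError("expected exactly 20 characters")
    value = 0
    for ch in text:
        # str.index raises ValueError for characters outside the alphabet.
        value = value * 85 + RFC1924_ALPHABET.index(ch)
    # IPv6Address rejects integers above 2^128 - 1.
    return str(ipaddress.IPv6Address(value))


print(rfc1924_encode("::"))  # 00000000000000000000
print(rfc1924_decode(rfc1924_encode("1080::8:800:200c:417a")))
```

Unlike Ascii85, no grouping, padding or z short form is involved, because the input always has a fixed length of 128 bits.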


See also

* Base32
* Base36
* Base64
* Binary-to-text encoding, for a comparison of various encoding algorithms
* PostScript Standard Encoding


External links


* BasE91
* PostScript Language Reference (Adobe), see ASCII85Encode Filter