In computer programming, Base64 is a group of binary-to-text encoding schemes that represent binary data (more specifically, a sequence of 8-bit bytes) in sequences of 24 bits that can be represented by four 6-bit Base64 digits.
As with all binary-to-text encoding schemes, Base64 is designed to carry data stored in binary formats across channels that only reliably support text content. Base64 is particularly prevalent on the World Wide Web, where one of its uses is to embed image files or other binary assets inside textual assets such as HTML and CSS files.
Base64 is also widely used for sending e-mail attachments. This is required because SMTP, in its original form, was designed to transport 7-bit ASCII characters only. This encoding causes an overhead of 33–37% (33% by the encoding itself; up to 4% more by the inserted line breaks).
Design
Each Base64 digit can take on 64 different values, encoding 6 bits of data. Which characters are chosen to represent the 64 values varies between implementations. The general strategy is to choose 64 characters that are common to most encodings and that are also printable. This combination leaves the data unlikely to be modified in transit through information systems, such as email, that were traditionally not 8-bit clean.
For example, MIME's Base64 implementation uses A–Z, a–z, and 0–9 for the first 62 values. Other variations share this property but differ in the symbols chosen for the last two values; an example is UTF-7.
The earliest instances of this type of encoding were created for dial-up communication between systems running the same OS, for example, uuencode for UNIX and BinHex for the TRS-80 (later adapted for the Macintosh), and could therefore make more assumptions about what characters were safe to use. For instance, uuencode uses uppercase letters, digits, and many punctuation characters, but no lowercase.
Base64 table from RFC 4648
This is the Base64 alphabet defined in RFC 4648 §4. See also Variants summary (below).
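As a rough sketch, the standard alphabet can be regenerated programmatically. The names ALPHABET and print_table are illustrative only and do not come from the RFC; the character order (A–Z, a–z, 0–9, then + and /) follows the description in the Design section.

```python
import string

# Standard Base64 alphabet: indices 0-25 are A-Z, 26-51 are a-z,
# 52-61 are 0-9, and indices 62 and 63 are '+' and '/'.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
PAD = "="  # padding character, not part of the 64-value alphabet


def print_table(columns: int = 4) -> None:
    """Print index, 6-bit pattern, and character in a column-major layout."""
    rows = len(ALPHABET) // columns
    for r in range(rows):
        cells = []
        for c in range(columns):
            i = r + c * rows
            cells.append(f"{i:2d} {i:06b} {ALPHABET[i]}")
        print("   ".join(cells))


print_table()
```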
Examples
The example below uses ASCII text for simplicity, but this is not a typical use case, as it can already be safely transferred across all systems that can handle Base64. The more typical use is to encode binary data (such as an image); the resulting Base64 data will only contain 64 different ASCII characters, all of which can reliably be transferred across systems that may corrupt the raw source bytes.
Here is a well-known idiom from distributed computing:
When the quote (without trailing whitespace) is encoded into Base64, it is represented as a byte sequence of 8-bit-padded ASCII characters encoded in MIME's Base64 scheme as follows (newlines and white spaces may be present anywhere but are to be ignored on decoding):
In the above quote, the encoded value of ''Man'' is ''TWFu''. Encoded in ASCII, the characters ''M'', ''a'', and ''n'' are stored as the byte values 77, 97, and 110, which are the 8-bit binary values 01001101, 01100001, and 01101110. These three values are joined together into a 24-bit string, producing 010011010110000101101110. Groups of 6 bits (6 bits have a maximum of 2^6 = 64 different binary values) are converted into individual numbers from start to end (in this case, there are four numbers in a 24-bit string), which are then converted into their corresponding Base64 character values.
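The steps just described can be traced in a few lines of Python. This is a minimal sketch for a single 3-byte group; the helper name encode_triplet is hypothetical, and the alphabet is the standard one described in the Design section.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def encode_triplet(three_bytes: bytes) -> str:
    """Encode exactly three bytes as four Base64 characters."""
    if len(three_bytes) != 3:
        raise ValueError("this sketch handles exactly three input bytes")
    # Join the three 8-bit values into one 24-bit string.
    bits = "".join(f"{b:08b}" for b in three_bytes)
    # Split into four 6-bit groups and read each as an index.
    sextets = [int(bits[i:i + 6], 2) for i in range(0, 24, 6)]
    return "".join(ALPHABET[s] for s in sextets)


print(encode_triplet(b"Man"))  # prints TWFu
```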
As this example illustrates, Base64 encoding converts three octets into four encoded characters.
Padding characters (=) might be added to make the last encoded block contain four Base64 characters.
Hexadecimal to octal transformation is useful to convert between binary and Base64. Such conversion is available for both advanced calculators and programming languages. For example, the hexadecimal representation of the 24 bits above is 4D616E. The octal representation is 23260556. Those 8 octal digits can be split into pairs (23 26 05 56), and each pair can be converted to decimal to yield 19 22 05 46. Using those four decimal numbers as indices for the Base64 alphabet, the corresponding ASCII characters are ''TWFu''.
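The octal shortcut works because two octal digits carry exactly six bits. A small sketch of the conversion, assuming the standard alphabet (variable names are illustrative):

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

value = 0x4D616E                      # the 24 bits of "Man"
octal = format(value, "08o")          # eight octal digits, zero-padded
pairs = [octal[i:i + 2] for i in range(0, 8, 2)]
indices = [int(pair, 8) for pair in pairs]
encoded = "".join(ALPHABET[i] for i in indices)

print(octal, pairs, indices, encoded)
```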
If there are only two significant input octets (e.g., 'Ma'), or when the last input group contains only two octets, all 16 bits will be captured in the first three Base64 digits (18 bits); the two least significant bits of the last content-bearing 6-bit block will turn out to be zero, and are discarded on decoding (along with the succeeding = padding character):
If there is only one significant input octet (e.g., 'M'), or when the last input group contains only one octet, all 8 bits will be captured in the first two Base64 digits (12 bits); the four least significant bits of the last content-bearing 6-bit block will turn out to be zero, and are discarded on decoding (along with the succeeding two = padding characters):
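The two short-input cases can be checked with Python's standard base64 module. The helper last_sextet is a hypothetical name introduced here to show the bit pattern of the final content-bearing digit.

```python
import base64

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def last_sextet(encoded: str) -> str:
    """Return the 6-bit pattern of the last non-padding Base64 digit."""
    last_char = encoded.rstrip("=")[-1]
    return f"{ALPHABET.index(last_char):06b}"


for data in (b"Man", b"Ma", b"M"):
    encoded = base64.b64encode(data).decode("ascii")
    # For two input octets the final sextet ends in 00,
    # for one input octet it ends in 0000.
    print(data, encoded, last_sextet(encoded))
```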
Output padding
Because Base64 is a six-bit encoding, and because the decoded values are divided into 8-bit octets, every four characters of Base64-encoded text (4 sextets = 4 × 6 = 24 bits) represent three octets of unencoded text or data (3 octets = 3 × 8 = 24 bits). This means that when the length of the unencoded input is not a multiple of three, the encoded output must have padding added so that its length is a multiple of four. The padding character is =, which indicates that no further bits are needed to fully encode the input. (This is different from A, which means that the remaining bits are all zeros.) The example below illustrates how truncating the input of the above quote changes the output padding:
The padding character is not essential for decoding, since the number of missing bytes can be inferred from the length of the encoded text. In some implementations, the padding character is mandatory, while for others it is not used. An exception in which padding characters are required is when multiple Base64 encoded files have been concatenated.
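The relationship between input length, output length, and padding can be written down directly. A minimal sketch, with hypothetical helper names, checked against the standard library encoder:

```python
import base64


def encoded_length(n_bytes: int) -> int:
    """Length of padded Base64 output for n_bytes of input."""
    return 4 * ((n_bytes + 2) // 3)


def padding_count(n_bytes: int) -> int:
    """Number of '=' characters appended for n_bytes of input (0, 1, or 2)."""
    return (3 - n_bytes % 3) % 3


for n in range(7):
    sample = base64.b64encode(bytes(n)).decode("ascii")
    print(n, encoded_length(n), padding_count(n), sample)
```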
Decoding Base64 with padding
When decoding Base64 text, four characters are typically converted back to three bytes. The only exceptions are when padding characters exist. A single = indicates that the four characters will decode to only two bytes, while == indicates that the four characters will decode to only a single byte. For example:
Another way to interpret the padding character is to consider it as an instruction to discard 2 trailing bits from the bit string each time a = is encountered. For example, when a string ending in dw== is decoded, we convert each character (except the trailing occurrences of =) into its corresponding 6-bit representation, and then discard 2 trailing bits for the first = and another 2 trailing bits for the other =. In this instance, we would get 6 bits from the d, and another 6 bits from the w for a bit string of length 12, but since we remove 2 bits for each = (for a total of 4 bits), the dw ends up producing 8 bits (1 byte) when decoded.
Decoding Base64 without padding
Without padding, after normal decoding of four characters to three bytes over and over again, fewer than four encoded characters may remain. In this situation, only two or three characters can remain. A single remaining encoded character is not possible, because a single Base64 character only contains 6 bits, and 8 bits are required to create a byte, so a minimum of two Base64 characters are required: The first character contributes 6 bits, and the second character contributes its first 2 bits. For example:
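A small decoder sketch that accepts both padded and unpadded input makes the bit accounting concrete. The function name decode_base64 is hypothetical; it assumes the standard alphabet and does no whitespace handling.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def decode_base64(text: str) -> bytes:
    """Decode standard-alphabet Base64, with or without '=' padding."""
    stripped = text.rstrip("=")
    if len(stripped) % 4 == 1:
        # One leftover character carries only 6 bits, not enough for a byte.
        raise ValueError("a single leftover Base64 character cannot encode a byte")
    # Each character contributes 6 bits (str.index raises ValueError
    # for characters outside the alphabet).
    bits = "".join(f"{ALPHABET.index(ch):06b}" for ch in stripped)
    # Drop the 2 or 4 trailing bits that do not fill a whole byte.
    usable = len(bits) - len(bits) % 8
    return bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))


print(decode_base64("TWE"), decode_base64("TQ=="), decode_base64("dw"))
```

Two remaining characters yield 12 bits (one byte after dropping 4), and three remaining characters yield 18 bits (two bytes after dropping 2), matching the description above.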
Implementations and history
Variants summary table
Implementations may have some constraints on the alphabet used for representing some bit patterns. This notably concerns the last two characters used in the alphabet at positions 62 and 63, and the character used for padding (which may be mandatory in some protocols or removed in others). The table below summarizes these known variants and provides links to the subsections below.
Privacy-enhanced mail
The first known standardized use of the encoding now called MIME Base64 was in the Privacy-enhanced Electronic Mail (PEM) protocol, proposed in 1987. PEM defines a "printable encoding" scheme that uses Base64 encoding to transform an arbitrary sequence of octets to a format that can be expressed in short lines of 6-bit characters, as required by transfer protocols such as SMTP.
The current version of PEM uses a 64-character alphabet consisting of upper- and lower-case Roman letters (A–Z, a–z), the numerals (0–9), and the + and / symbols. The = symbol is also used as a padding suffix.
The original specification additionally used the * symbol to delimit encoded but unencrypted data within the output stream.
To convert data to PEM printable encoding, the first byte is placed in the most significant eight bits of a 24-bit buffer, the next in the middle eight, and the third in the least significant eight bits. If there are fewer than three bytes left to encode (or in total), the remaining buffer bits will be zero. The buffer is then used, six bits at a time, most significant first, as indices into the string "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/", and the indicated character is output.
The process is repeated on the remaining data until fewer than four octets remain. If three octets remain, they are processed normally. If fewer than three octets (24 bits) are remaining to encode, the input data is right-padded with zero bits to form an integral multiple of six bits.
After encoding the non-padded data, if two octets of the 24-bit buffer are padded zeros, two = characters are appended to the output; if one octet of the 24-bit buffer is filled with padded zeros, one = character is appended. This signals the decoder that the zero bits added due to padding should be excluded from the reconstructed data. This also guarantees that the encoded output length is a multiple of 4 bytes.
PEM requires that all encoded lines consist of exactly 64 printable characters, with the exception of the last line, which may contain fewer printable characters. Lines are delimited by whitespace characters according to local (platform-specific) conventions.
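The buffer-based procedure above can be sketched as follows. This is an illustrative reading of the description, not a reference implementation: the function name pem_printable_encode is hypothetical, and a single "\n" stands in for the platform-specific line delimiter.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def pem_printable_encode(data: bytes, line_length: int = 64) -> str:
    """Encode data with a 24-bit buffer and wrap lines at line_length."""
    groups = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        missing = 3 - len(chunk)
        # First byte in the most significant eight bits; absent bytes are zero.
        buffer = int.from_bytes(chunk + b"\x00" * missing, "big")
        # Use the buffer six bits at a time, most significant first.
        chars = [ALPHABET[(buffer >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        if missing:
            # Replace the digits that hold only padded zeros with '='.
            chars[4 - missing:] = "=" * missing
        groups.append("".join(chars))
    text = "".join(groups)
    lines = [text[i:i + line_length] for i in range(0, len(text), line_length)]
    return "\n".join(lines)


print(pem_printable_encode(bytes(range(60))))
```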
MIME
The MIME (Multipurpose Internet Mail Extensions) specification lists Base64 as one of two binary-to-text encoding schemes (the other being quoted-printable).
MIME's Base64 encoding is based on that of PEM: it uses the same 64-character alphabet and encoding mechanism as PEM and uses the = symbol for output padding in the same way.
MIME does not specify a fixed length for Base64-encoded lines, but it does specify a maximum line length of 76 characters. Additionally, it specifies that any character outside the standard set of 64 encoding characters (for example, CRLF sequences) must be ignored by a compliant decoder, although most implementations use a CR/LF newline pair to delimit encoded lines.
Thus, the actual length of MIME-compliant Base64-encoded binary data is usually about 137% of the original data length (4/3 × 78/76, that is, four characters per three bytes plus a two-byte CRLF after every 76 characters), though for very short messages the overhead can be much higher due to the overhead of the headers. Very roughly, the final size of Base64-encoded binary data is equal to 1.37 times the original data size + 814 bytes (for headers). The size of the decoded data can be approximated with this formula:
bytes = (string_length(encoded_string) − 814) / 1.37
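The 1.37 factor can be reproduced by counting characters directly. The sketch below models only the encoded body (76-character lines, each followed by CRLF) and deliberately ignores headers, so the 814-byte term from the formula above is not represented; the function name is hypothetical.

```python
import math


def mime_body_size(n_bytes: int, line_length: int = 76) -> int:
    """Size in bytes of a Base64 body wrapped at line_length with CRLF."""
    chars = 4 * math.ceil(n_bytes / 3)       # padded Base64 characters
    lines = math.ceil(chars / line_length)   # number of output lines
    return chars + 2 * lines                 # two bytes of CRLF per line


for n in (57, 1_000, 57_000):
    size = mime_body_size(n)
    print(n, size, round(size / n, 3))
```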
UTF-7
UTF-7, whose first specification was later superseded by a revised one, introduced a system called ''modified Base64''. This data encoding scheme is used to encode UTF-16 as ASCII characters for use in 7-bit transports such as SMTP. It is a variant of the Base64 encoding used in MIME.
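As a rough illustration of modified Base64, the sketch below serializes text as big-endian UTF-16, applies ordinary Base64, and drops any trailing = characters. The helper name modified_base64 is hypothetical, and the surrounding UTF-7 framing (the + shift character and so on) is left to Python's built-in "utf-7" codec.

```python
import base64


def modified_base64(text: str) -> str:
    """Base64 of the UTF-16BE form of text, without '=' padding."""
    utf16 = text.encode("utf-16-be")
    return base64.b64encode(utf16).decode("ascii").rstrip("=")


sample = "\u2262\u0391"                    # two non-ASCII characters
print(modified_base64(sample))
print("A\u2262\u0391.".encode("utf-7"))    # full UTF-7 form for comparison
```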
The "Modified Base64" alphabet consists of the MIME Base64 alphabet, but does not use the "=" padding character. UTF-7 is intended for use in mail headers, and the "=" character is reserved in that context as the escape character for "quoted-printable" encoding. Modified Base64 simply omits the padding and ends immediately after the last Base64 digit containing useful bits, leaving up to four unused bits in the last Base64 digit (two UTF-16 code units occupy 32 bits, which need six digits, or 36 bits).
OpenPGP
OpenPGP describes Radix-64 encoding, also known as "ASCII armor". Radix-64 is identical to the "Base64" encoding described by MIME, with the addition of an optional 24-bit CRC. The checksum is calculated on the input data before encoding; the checksum is then encoded with the same Base64 algorithm and, prefixed by the "=" symbol as the separator, appended to the encoded output data.
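A minimal sketch of the checksum step follows. The initial value 0xB704CE and generator 0x1864CFB are the CRC-24 constants from the OpenPGP specification and are not stated in the text above; the function names are illustrative, and the rest of the armor format (header lines, line wrapping) is not modeled.

```python
import base64

CRC24_INIT = 0xB704CE   # initial register value (from the OpenPGP specification)
CRC24_POLY = 0x1864CFB  # generator polynomial, including the top bit


def crc24(data: bytes) -> int:
    """Compute the 24-bit CRC over the raw (unencoded) input data."""
    crc = CRC24_INIT
    for byte in data:
        crc ^= byte << 16
        for _ in range(8):
            crc <<= 1
            if crc & 0x1000000:
                crc ^= CRC24_POLY
    return crc & 0xFFFFFF


def armor_checksum(data: bytes) -> str:
    """Return '=' followed by the Base64 form of the 3-byte checksum."""
    encoded = base64.b64encode(crc24(data).to_bytes(3, "big")).decode("ascii")
    return "=" + encoded


print(armor_checksum(b""))
```

Because the checksum is exactly three bytes, its Base64 form is always four characters with no padding of its own.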
RFC 3548
RFC 3548, entitled ''The Base16, Base32, and Base64 Data Encodings'', is an informational (non-normative) memo that attempts to unify the earlier specifications of Base64 encodings, alternative-alphabet encodings, and the Base32 (which is seldom used) and Base16 encodings.
Unless implementations are written to a specification that refers to RFC 3548 and specifically requires otherwise, RFC 3548 forbids implementations from generating messages containing characters outside the encoding alphabet or without padding, and it also declares that decoder implementations must reject data that contain characters outside the encoding alphabet.
RFC 4648
This RFC obsoletes RFC 3548 and focuses on Base64/32/16:
: ''This document describes the commonly used Base64, Base32, and Base16 encoding schemes. It also discusses the use of line feeds in encoded data, the use of padding in encoded data, the use of non-alphabet characters in encoded data, the use of different encoding alphabets, and canonical encodings.''
URL applications
Base64 encoding can be helpful when fairly lengthy identifying information is used in an HTTP environment. For example, a database persistence framework for Java objects might use Base64 encoding to encode a relatively large unique id (generally 128-bit UUIDs) into a string for use as an HTTP parameter in HTTP forms or HTTP GET URLs. Also, many applications need to encode binary data in a way that is convenient for inclusion in URLs, including in hidden web form fields, and Base64 is a convenient encoding to render them in a compact way.
Using standard Base64 in a URL requires encoding of the '+', '/' and '=' characters into special percent-encoded hexadecimal sequences ('+' becomes '%2B', '/' becomes '%2F' and '=' becomes '%3D'), which makes the string unnecessarily longer.
For this reason, modified Base64 for URL variants exist (such as base64url), where the '+' and '/' characters of standard Base64 are respectively replaced by '-' and '_', so that using URL encoders/decoders is no longer necessary and has no effect on the length of the encoded value, leaving the same encoded form intact for use in relational databases, web forms, and object identifiers in general. YouTube is a popular site that uses such a variant. Some variants allow or require omitting the padding '=' signs to avoid them being confused with field separators, or require that any such padding be percent-encoded. Some libraries will encode '=' to '.', potentially exposing applications to relative path attacks when a folder name is encoded from user data.
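Python's standard library exposes the '-' and '_' alphabet through base64.urlsafe_b64encode and base64.urlsafe_b64decode. The wrappers below are a sketch of one possible convention (padding stripped on output, restored on input); their names are hypothetical.

```python
import base64
import uuid


def to_base64url(data: bytes, keep_padding: bool = False) -> str:
    """Encode with the URL-safe alphabet, optionally dropping '=' padding."""
    text = base64.urlsafe_b64encode(data).decode("ascii")
    return text if keep_padding else text.rstrip("=")


def from_base64url(text: str) -> bytes:
    """Decode URL-safe Base64, restoring any omitted padding first."""
    padded = text + "=" * (-len(text) % 4)
    return base64.urlsafe_b64decode(padded)


raw = b"\xfb\xff"
print(base64.b64encode(raw).decode("ascii"), to_base64url(raw))

# A 128-bit identifier fits in 22 URL-safe characters once padding is dropped.
token = to_base64url(uuid.uuid4().bytes)
print(token, len(token))
```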
HTML
The atob() and btoa() JavaScript methods, defined in the HTML5 draft specification, provide Base64 encoding and decoding functionality to web pages. The btoa() method outputs padding characters, but these are optional in the input of the atob() method.
Other applications
Base64 can be used in a variety of contexts:
* Base64 can be used to transmit and store text that might otherwise cause delimiter collision
* Base64 is used to encode character strings in LDAP Data Interchange Format files
* Base64 is often used to embed binary data in an XML file, using a syntax similar to … e.g. favicons in Firefox's exported bookmarks.html.
* Base64 is used to encode binary files such as images within scripts, to avoid depending on external files.
* The data URI scheme can use Base64 to represent file contents. For instance, background images and fonts can be specified in a CSS stylesheet file as data: URIs, instead of being supplied in separate files.
* Although not part of the official specification for SVG, some viewers can interpret Base64 when used for embedded elements, such as images inside SVG.
* Base64 can be used to store/transmit relatively small amounts of binary data via a computer's text clipboard functionality, especially in cases where the information doesn't warrant being permanently saved or when information must be quickly sent between a wide variety of different, potentially incompatible programs. An example is the representation of the public keys of cryptocurrency recipients as Base64 encoded text strings, which can be easily copied and pasted into users' wallet software.
* Binary data that must be quickly verified by humans as a safety mechanism, such as file checksums or key fingerprints, is often represented in Base64 for easy checking, sometimes with additional formatting, such as separating each group of four characters in the representation of a PGP key fingerprint with a space.
* QR codes that contain binary data sometimes store it encoded in Base64 rather than as raw binary data, because there is a stronger guarantee that all QR code readers will accurately decode text, and because some devices will more readily save text from a QR code than potentially malicious binary data.
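The checksum case above can be sketched in Python. This is a minimal illustration, not a standard tool: the helper name `grouped_base64_digest`, the choice of SHA-256 as the checksum, and the default group size of four are all assumptions made for the example.

```python
import base64
import hashlib


def grouped_base64_digest(data: bytes, group_size: int = 4) -> str:
    """Return the SHA-256 digest of data as Base64, split into space-separated groups.

    Hypothetical helper: SHA-256 and the grouping are illustrative choices.
    """
    digest = hashlib.sha256(data).digest()               # 32 raw bytes
    encoded = base64.b64encode(digest).decode("ascii")   # 44 characters, including padding
    groups = [encoded[i:i + group_size] for i in range(0, len(encoded), group_size)]
    return " ".join(groups)


if __name__ == "__main__":
    # Prints eleven groups of four characters that a person can compare visually.
    print(grouped_base64_digest(b"example file contents"))
```

Removing the spaces from the result gives back the plain Base64 form of the digest, so the grouping is purely a readability aid.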
Applications not compatible with RFC 4648 Base64
Some applications use a Base64 alphabet that is significantly different from the alphabets used in the most common Base64 variants (see
Variants summary table above).
* The Uuencoding alphabet includes no lowercase characters, instead using ASCII codes 32 (" " (space)) through 95 ("_"), consecutively. Uuencoding uses the alphabet " !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_". Avoiding all lowercase letters was helpful, because many older printers only printed uppercase. Using consecutive ASCII characters saved computing power, because it was only necessary to add 32, without requiring a lookup table. Its use of most punctuation characters and the space character may limit its usefulness in some applications, such as those that use these characters as syntax.
* BinHex 4 (HQX), which was used within the classic Mac OS, excludes some visually confusable characters like '7', 'O', 'g' and 'o'. Its alphabet includes additional punctuation characters. It uses the alphabet "!"#$%&'()*+,-012345689@ABCDEFGHIJKLMNPQRSTUVXYZ[`abcdefhijklmpqr".
* A UTF-8 environment can use non-synchronized continuation bytes as base64: 0b10xxxxxx. See UTF-8#Self-synchronization.
* Several other applications use alphabets similar to the common variations, but in a different order:
** Unix stores password hashes computed with crypt in the /etc/passwd file using an encoding called B64. crypt's alphabet puts the punctuation . and / before the alphanumeric characters. crypt uses the alphabet "./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz". Padding is not used.
** The GEDCOM 5.5 standard for genealogical data interchange encodes multimedia files in its text-line hierarchical file format. GEDCOM uses the same alphabet as crypt, which is "./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz".
** bcrypt hashes are designed to be used in the same way as traditional crypt(3) hashes, but bcrypt's alphabet is in a different order than crypt's. bcrypt uses the alphabet "./ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789".
** Xxencoding uses a mostly-alphanumeric character set similar to crypt, but using + and - rather than . and /. Xxencoding uses the alphabet "+-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz".
** 6PACK, used with some terminal node controllers, uses an alphabet from 0x00 to 0x3f.
** Bash supports numeric literals in Base64. Bash uses the alphabet "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ@_".
One issue with the RFC 4648 alphabet is that, when a sorted list of ASCII-encoded strings is Base64-transformed and sorted again, the order of elements can change. This is because the padding character and the characters in the substitution alphabet are not ordered by ASCII character value. Alphabets like (unpadded) B64 address this.
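The alphabet differences in this section, and the ordering issue, can be illustrated with a short Python sketch. It is a simplified model under stated assumptions: the names `to_sextets` and `encode_with_alphabet` are hypothetical, input is split into 6-bit values most significant bit first, no padding is emitted, and only the alphabets are borrowed from each format (the framing and bit ordering that uuencode or crypt actually use are not reproduced).

```python
RFC4648 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
B64_CRYPT = "./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
# Uuencoding alphabet: ASCII codes 32 through 95, consecutively.
UUENCODE = "".join(chr(code) for code in range(32, 96))


def to_sextets(data: bytes) -> list:
    """Split bytes into 6-bit values (assumption: zero-filled final group)."""
    bits = "".join(f"{byte:08b}" for byte in data)
    bits += "0" * (-len(bits) % 6)
    return [int(bits[i:i + 6], 2) for i in range(0, len(bits), 6)]


def encode_with_alphabet(data: bytes, alphabet: str) -> str:
    """Map each 6-bit value to a character of a 64-character alphabet, unpadded."""
    assert len(alphabet) == 64
    return "".join(alphabet[value] for value in to_sextets(data))


# With consecutive ASCII codes, a lookup is the same as adding 32 to each value.
sample = b"Man"
added = "".join(chr(value + 32) for value in to_sextets(sample))
print(encode_with_alphabet(sample, UUENCODE) == added)  # True

# Two inputs that are already in sorted order.
words = [b"c0", b"ca"]
rfc = [encode_with_alphabet(word, RFC4648) for word in words]
b64 = [encode_with_alphabet(word, B64_CRYPT) for word in words]
print(rfc, rfc == sorted(rfc))  # ['YzA', 'Y2E'] False
print(b64, b64 == sorted(b64))  # ['Mn.', 'Mq2'] True
```

In the RFC 4648 alphabet the digit "2" stands for a larger 6-bit value than "z" but has a smaller ASCII code, which is why the encoded pair sorts in the opposite order. The crypt-style alphabet lists its characters in ascending ASCII order, so this pair keeps its order.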
See also
* 8BITMIME
* Ascii85 (also called Base85)
* Base16
* Base32
* Base36
* Base62
* Binary-to-text encoding for a comparison of various encoding algorithms
* Binary number
* URL