Base32 is the base-32 numeral system. It uses a set of 32 digits, each of which can be represented by 5 bits (2^5 = 32). One way to represent Base32 numbers in a human-readable way is by using a standard 32-character set, such as the twenty-two upper-case letters A–V and the digits 0–9. However, many other variations are used in different contexts. The rest of this article discusses the use of Base32 for representing byte strings rather than unsigned integers, similar to the way Base64 works. This is an example of a Base32 representation using the previously described 32-character set (IPFS CIDv1 in Base32 upper-case encoding):


Advantages

Base32 has a number of advantages over Base64:
1. The resulting character set is all one case, which can often be beneficial when using a case-insensitive filesystem, DNS names, spoken language, or human memory.
2. The result can be used as a file name because it cannot contain the '/' symbol, which is the Unix path separator.
3. The alphabet can be selected to avoid similar-looking pairs of different symbols, so the strings can be accurately transcribed by hand. (For example, the RFC 4648 symbol set omits the digits for one, eight and zero, since they could be confused with the letters 'I', 'B', and 'O'.)
4. A result excluding padding can be included in a URL without encoding any characters.
5. Five-bit symbols allow six characters to be stored in a 32-bit integer (with 2 bits to spare) instead of four 8-bit characters, saving bandwidth in constrained domains such as radio meshes.

Base32 also has advantages over hexadecimal/Base16:
1. Base32 representation takes roughly 20% less space (1000 bits take 200 characters, compared with 250 for Base16).
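The 200-versus-250 figure follows directly from the number of bits each symbol carries. A minimal sketch of that arithmetic, where the helper name `symbols_needed` is made up for illustration and padding is ignored:

```python
import math

def symbols_needed(bit_count: int, bits_per_symbol: int) -> int:
    """Number of symbols needed to carry bit_count bits, ignoring padding."""
    return math.ceil(bit_count / bits_per_symbol)

base16 = symbols_needed(1000, 4)   # hexadecimal carries 4 bits per symbol
base32 = symbols_needed(1000, 5)   # Base32 carries 5 bits per symbol
saving = 1 - base32 / base16       # fraction of characters saved over Base16

print(base16, base32, round(saving, 2))
```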


Disadvantages

Base32 representation takes roughly 20% more space than Base64. Also, because it encodes 5 bytes to 8 characters (rather than 3 bytes to 4 characters), padding to an 8-character boundary is a greater burden on short messages (which may be a reason to elide padding, which is an option in RFC 4648).
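The padding burden on short messages can be seen by computing padded output lengths for both encodings. This is a rough sketch; the function names are illustrative and not taken from any library:

```python
import math

def base32_len(n_bytes: int) -> int:
    """Padded Base32 length: every group of up to 5 bytes becomes 8 characters."""
    return 8 * math.ceil(n_bytes / 5)

def base64_len(n_bytes: int) -> int:
    """Padded Base64 length: every group of up to 3 bytes becomes 4 characters."""
    return 4 * math.ceil(n_bytes / 3)

# Compare a one-byte message with longer ones.
for n in (1, 5, 15, 300):
    print(n, base32_len(n), base64_len(n))
```

For a single byte the padded Base32 form is twice as long as the Base64 form, while for longer inputs the ratio settles near the 20% figure.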


RFC 4648 Base32 alphabet

The most widely used Base32 alphabet is defined in RFC 4648. It uses an alphabet of A–Z, followed by 2–7. The digits 0, 1 and 8 are skipped due to their similarity with the letters O, I and B (thus "2" has a decimal value of 26). In some circumstances padding is not required or used (the padding can be inferred from the length of the string modulo 8). RFC 4648 states that padding must be used unless the specification of the standard referring to the RFC explicitly states otherwise. Excluding padding is useful when using Base32-encoded data in URL tokens or file names where the padding character could pose a problem.
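As an illustration, Python's standard `base64` module implements this alphabet. The sketch below encodes a short byte string, strips the padding, and restores it from the length modulo 8; the helper `restore_padding` is a hypothetical name introduced here:

```python
import base64

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"  # A-Z followed by 2-7
value_of_2 = ALPHABET.index("2")               # position of "2" in the alphabet

data = b"foobar"
padded = base64.b32encode(data).decode("ascii")  # length is a multiple of 8
unpadded = padded.rstrip("=")                    # form suited to URLs and file names

def restore_padding(text: str) -> str:
    """Re-append '=' until the length is a multiple of 8 again."""
    return text + "=" * (-len(text) % 8)

decoded = base64.b32decode(restore_padding(unpadded))
print(padded, unpadded, decoded)
```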


Alternative versions

The alternative standards change the Base32 alphabet, but all of them use similar combinations of alphanumeric symbols.


z-base-32

z-base-32 is a Base32 encoding designed by Zooko Wilcox-O'Hearn to be easier for human use and more compact. It includes 1, 8 and 9 but excludes l, v and 2. It also permutes the alphabet so that the easier characters are the ones that occur more frequently. It compactly encodes bitstrings whose length in bits is not a multiple of 8 and omits trailing padding characters. z-base-32 was used in the Mnet open source project, and is currently used in Phil Zimmermann's ZRTP protocol and in the Tahoe-LAFS open source project.
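For whole bytes, one way to sketch z-base-32 is to encode with the RFC 4648 alphabet, drop the padding, and swap alphabets. The alphabet string below is quoted from memory of the z-base-32 design and should be treated as an assumption, as should the function name. The sketch does not cover bitstrings whose length is not a multiple of 8.

```python
import base64

RFC4648 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"
# Assumed z-base-32 alphabet (permuted; contains 1, 8, 9 but no l, v, 2).
ZBASE32 = "ybndrfg8ejkmcpqxot1uwisza345h769"
TO_ZBASE32 = str.maketrans(RFC4648, ZBASE32)

def zbase32_encode(data: bytes) -> str:
    """Encode whole bytes: RFC 4648 encode, strip '=' padding, swap alphabets."""
    standard = base64.b32encode(data).decode("ascii").rstrip("=")
    return standard.translate(TO_ZBASE32)

print(zbase32_encode(b"\x00"))
```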


Crockford's Base32

Another alternative design for Base32 was created by Douglas Crockford, who proposes using additional characters for a mod-37 checksum. It excludes the letters I, L, and O to avoid confusion with digits. It also excludes the letter U to reduce the likelihood of accidental obscenity. Libraries to encode binary data in Crockford's Base32 are available in a variety of languages.
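A minimal sketch of the scheme for non-negative integers follows. The five extra check symbols and the decoding of I and L as 1 and O as 0 are taken from Crockford's published description as recalled here, so treat them as assumptions; the function names are illustrative.

```python
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"   # no I, L, O or U
CHECK_SYMBOLS = CROCKFORD + "*~$=U"              # 37 symbols for the mod-37 check

def crockford_encode(number: int, checksum: bool = False) -> str:
    """Encode a non-negative integer, optionally appending a check symbol."""
    if number < 0:
        raise ValueError("number must be non-negative")
    digits = ""
    remaining = number
    while True:
        remaining, rem = divmod(remaining, 32)
        digits = CROCKFORD[rem] + digits
        if remaining == 0:
            break
    if checksum:
        digits += CHECK_SYMBOLS[number % 37]
    return digits

def crockford_decode(text: str) -> int:
    """Decode without a check symbol, reading I and L as 1 and O as 0."""
    normalized = text.upper().replace("I", "1").replace("L", "1").replace("O", "0")
    value = 0
    for ch in normalized:
        value = value * 32 + CROCKFORD.index(ch)
    return value

print(crockford_encode(1234), crockford_encode(1234, checksum=True))
```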


Electrologica

An earlier form of base-32 notation was used by programmers working on the Electrologica X1 to represent machine addresses. The "digits" were represented as decimal numbers from 0 to 31. For example, 12-16 would represent the machine address 400 (= 12 × 32 + 16).
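The conversion in the 12-16 example can be sketched as follows. The function names are hypothetical, and the hyphen-separated string form is assumed from the example in the text:

```python
def x1_to_address(notation: str) -> int:
    """Convert hyphen-separated base-32 digits (each 0-31) to an integer address."""
    value = 0
    for part in notation.split("-"):
        digit = int(part)
        if not 0 <= digit <= 31:
            raise ValueError(f"digit out of range: {digit}")
        value = value * 32 + digit
    return value

def address_to_x1(address: int) -> str:
    """Convert a non-negative integer address to hyphen-separated base-32 digits."""
    digits = []
    while True:
        address, rem = divmod(address, 32)
        digits.append(str(rem))
        if address == 0:
            break
    return "-".join(reversed(digits))

print(x1_to_address("12-16"), address_to_x1(400))
```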


base32hex

Triacontakaidecimal is another alternative design for Base32, which extends hexadecimal in a more natural way. It was first proposed by Christian Lanctot, a programmer working at Sage software, in a letter to Dr. Dobb's magazine in March 1999 as a solution to the Y2K bug, and was referred to as "Double Hex". This version was described under the name "Base-32". RFC 4648, while acknowledging existing use of this version in NSEC3, refers to it as base32hex and discourages labelling it as "base32". Similarly to hexadecimal, the digits used are 0–9 followed by consecutive letters of the alphabet. This matches the digits used by the JavaScript parseInt() function and the Python int() constructor when a base larger than 10 (such as 16 or 32) is specified. It also retains hexadecimal's property of preserving bitwise sort order of the represented data, unlike RFC 4648's base-32 or base-64. Unlike many other base-32 notation systems, triacontakaidecimal is contiguous and includes characters that may visually conflict. With the right font it is possible to visually distinguish between 0 and O, and between 1 and I. Other fonts are unsuitable because the context that English usually provides is missing from a notation system that expresses numbers. However, the choice of font is not controlled by the notation or encoding, which is why it is risky to assume a distinguishable font will be used.
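Both properties can be checked with a short sketch. It derives base32hex output by translating the standard library's RFC 4648 output into the 0–9, A–V alphabet (recent Python versions also provide `base64.b32hexencode`); the helper names are illustrative.

```python
import base64

RFC4648 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"
BASE32HEX = "0123456789ABCDEFGHIJKLMNOPQRSTUV"
TO_HEX_ALPHABET = str.maketrans(RFC4648, BASE32HEX)

def b32_encode(data: bytes) -> str:
    """RFC 4648 base32 with padding."""
    return base64.b32encode(data).decode("ascii")

def b32hex_encode(data: bytes) -> str:
    """base32hex: same bit grouping, digits 0-9 then A-V ('=' is left as is)."""
    return b32_encode(data).translate(TO_HEX_ALPHABET)

# int() with base 32 reads the same digit set: C = 12, G = 16.
address = int("CG", 32)

# Equal-length inputs keep their byte order under base32hex but not under base32.
samples = [b"\x00", b"\x7f", b"\xff"]
order_hex = sorted(samples, key=b32hex_encode)
order_std = sorted(samples, key=b32_encode)
print(address, order_hex == sorted(samples), order_std == sorted(samples))
```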


Geohash

See the Geohash algorithm, used to represent latitude and longitude values in one (bit-interlaced) positive integer. The base32 representation of Geohash uses all decimal digits (0–9) and almost all of the lower-case alphabet, except the letters "a", "i", "l" and "o", as shown by the following character map:

Decimal   0–9    10–16    17–18    19–20    21–31
Base32    0–9    b–h      j–k      m–n      p–z
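The alphabet can be built directly from that description. The encoder below is a simplified sketch of the usual bit-interleaving procedure (longitude bit first, five bits per character) and is not taken from the source; `geohash_encode` is a hypothetical name.

```python
import string

# All digits, then the lower-case letters except a, i, l and o.
GEOHASH_ALPHABET = string.digits + "".join(
    c for c in string.ascii_lowercase if c not in "ailo"
)

def geohash_encode(lat: float, lon: float, length: int = 5) -> str:
    """Interleave longitude and latitude bisection bits, 5 bits per character."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    use_lon = True          # bits alternate, starting with longitude
    value = 0
    bit_count = 0
    while len(chars) < length:
        bounds, coord = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (bounds[0] + bounds[1]) / 2
        if coord >= mid:
            value = (value << 1) | 1
            bounds[0] = mid  # keep the upper half
        else:
            value <<= 1
            bounds[1] = mid  # keep the lower half
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:
            chars.append(GEOHASH_ALPHABET[value])
            value = 0
            bit_count = 0
    return "".join(chars)

print(GEOHASH_ALPHABET, geohash_encode(57.64911, 10.40744))
```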


Video games

Before NVRAM became universal, several video games for Nintendo platforms used base 31 numbers for passwords. These systems omit vowels (except Y) to prevent the game from accidentally giving a profane password. Thus, the characters are generally some minor variation of the following set: 0–9, B, C, D, F, G, H, J, K, L, M, N, P, Q, R, S, T, V, W, X, Y, Z, and some punctuation marks. Games known to use such a system include Mario Is Missing!, Mario's Time Machine, Tetris Blast, and The Lord of the Rings (Super NES).


Word-safe alphabet

The word-safe Base32 alphabet is an extension of the Open Location Code Base20 alphabet. That alphabet uses 8 numeric digits and 12 case-sensitive letter digits chosen to avoid accidentally forming words. Treating the alphabet as case-sensitive produces a 32-digit set (8 + 12 + 12).
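The 8 + 12 + 12 construction can be sketched as below. The Base20 string is the Open Location Code alphabet as recalled here, and appending the lower-case letters after the upper-case ones is an assumed ordering; both should be checked against the relevant specification before use.

```python
# Assumed Open Location Code Base20 alphabet: 8 digits followed by 12 letters.
OLC_BASE20 = "23456789CFGHJMPQRVWX"

digits = [c for c in OLC_BASE20 if c.isdigit()]
letters = [c for c in OLC_BASE20 if c.isalpha()]

# Extend with the lower-case forms of the 12 letters (ordering is an assumption).
WORD_SAFE = OLC_BASE20 + "".join(letters).lower()

print(len(digits), len(letters), len(WORD_SAFE))
```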


Software

Base32 is a notation for encoding arbitrary byte data using a restricted set of symbols that can be conveniently used by humans and processed by computers. Base32 consists of a symbol set made up of 32 different characters, as well as an algorithm for encoding arbitrary sequences of 8-bit bytes into the Base32 alphabet. Because more than one 5-bit Base32 symbol is needed to represent each 8-bit input byte, it also specifies requirements on the allowed lengths of Base32 strings (which must be multiples of 40 bits). The closely related Base64 system, in contrast, uses a set of 64 symbols. Base32 implementations in C/C++, Perl, Java, JavaScript, Python, Go and Ruby are available.
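The 40-bit grouping can be made concrete with a hand-written encoder. This is a minimal sketch assuming the RFC 4648 alphabet and '=' padding, written for clarity rather than speed; `base32_encode` is an illustrative name, and the result is compared against Python's standard `base64` module.

```python
import base64
import math

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"

def base32_encode(data: bytes) -> str:
    """Encode bytes in 5-byte (40-bit) groups of eight 5-bit symbols each."""
    out = []
    for start in range(0, len(data), 5):
        chunk = data[start:start + 5]
        # Right-pad a short final group with zero bytes to a full 40 bits.
        group = int.from_bytes(chunk.ljust(5, b"\x00"), "big")
        # Slice the 40 bits into eight 5-bit values, most significant first.
        symbols = [ALPHABET[(group >> shift) & 0x1F] for shift in range(35, -1, -5)]
        # Only the symbols that carry input bits are kept; the rest become '='.
        used = math.ceil(len(chunk) * 8 / 5)
        out.append("".join(symbols[:used]) + "=" * (8 - used))
    return "".join(out)

encoded = base32_encode(b"foobar")
print(encoded, encoded == base64.b32encode(b"foobar").decode("ascii"))
```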

