In computing and telecommunications, a unit of information is the capacity of some standard data storage system or communication channel, used to measure the capacities of other systems and channels. In information theory, units of information are also used to measure information contained in messages and the entropy of random variables.
The most commonly used units of data storage capacity are the bit, the capacity of a system that has only two states, and the byte (or octet), which is equivalent to eight bits. Multiples of these units can be formed with the SI prefixes (power-of-ten prefixes) or the newer IEC binary prefixes (power-of-two prefixes).
Primary units
In 1928, Ralph Hartley observed a fundamental storage principle, which was further formalized by Claude Shannon in 1945: the information that can be stored in a system is proportional to the logarithm of the number ''N'' of possible states of that system, denoted $\log_b N$. Changing the base of the logarithm from ''b'' to a different number ''c'' has the effect of multiplying the value of the logarithm by a fixed constant, namely $\log_c b$, because $\log_c N = \log_c b \cdot \log_b N$.
Therefore, the choice of the base ''b'' determines the unit used to measure information. In particular, if ''b'' is an integer greater than 1, then the unit is the amount of information that can be stored in a system with ''b'' possible states.
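The following minimal Python sketch provides a numerical check of the change-of-base rule. It computes $\log_b N$ for a system with ''N'' states. The helper name `capacity` is hypothetical and is not a standard API.

```python
import math

def capacity(num_states: int, base: float = 2) -> float:
    """Information a system with num_states states can hold, measured in log-base-`base` units."""
    if num_states < 1:
        raise ValueError("a system needs at least one state")
    return math.log(num_states) / math.log(base)

# A system with b states stores exactly one base-b unit.
print(capacity(3, 3))  # one trit

# Change of base: log_c N equals log_c b times log_b N.
N, b, c = 1000, 10, 2
direct = capacity(N, c)                     # log_2 1000
via_b = capacity(b, c) * capacity(N, b)     # log_2 10 * log_10 1000
print(math.isclose(direct, via_b))
```

Floating-point logarithms are inexact, so the sketch compares the two results with `math.isclose` rather than `==`.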
When ''b'' is 2, the unit is the shannon, equal to the information content of one "bit" (a portmanteau of binary digit). A system with 8 possible states, for example, can store up to $\log_2 8 = 3$ bits of information. Other units that have been named include:
; Base ''b'' = 3 : the unit is called "trit", and is equal to $\log_2 3$ (≈ 1.585) bits.
; Base ''b'' = 10 : the unit is called ''decimal digit'', ''hartley'', ''ban'', ''decit'', or ''dit'', and is equal to $\log_2 10$ (≈ 3.322) bits.
; Base ''b'' = ''e'', the base of natural logarithms : the unit is called a ''nat'', ''nit'', or ''nepit'' (from Neperian), and is worth $\log_2 e$ (≈ 1.443) bits.
The trit, ban, and nat are rarely used to measure storage capacity; but the nat, in particular, is often used in information theory, because natural logarithms are mathematically more convenient than logarithms in other bases.
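Since a unit of base ''b'' is worth $\log_2 b$ bits, converting between these units only requires a ratio of logarithms. The Python sketch below shows this under that assumption. The table name `UNITS_IN_BITS` and the function `convert` are hypothetical names chosen for illustration.

```python
import math

# Size of each unit in bits (shannons): one base-b unit equals log2(b) bits.
UNITS_IN_BITS = {
    "bit": 1.0,
    "trit": math.log2(3),
    "hartley": math.log2(10),
    "nat": math.log2(math.e),
}

def convert(amount: float, from_unit: str, to_unit: str) -> float:
    """Convert an amount of information between logarithmic units."""
    return amount * UNITS_IN_BITS[from_unit] / UNITS_IN_BITS[to_unit]

for name, bits in UNITS_IN_BITS.items():
    print(f"1 {name} = {bits:.3f} bits")

print(convert(1, "bit", "nat"))  # one bit expressed in nats, i.e. ln 2
```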
Units derived from bit
Several conventional names are used for collections or groups of bits.
Byte
Historically, a byte was the number of bits used to encode a character of text in the computer, which depended on computer hardware architecture; but today it almost always means eight bits, that is, an octet. A byte can represent 256 ($2^8$) distinct values, such as non-negative integers from 0 to 255, or signed integers from −128 to 127 (in two's-complement representation). The IEEE 1541-2002 standard specifies "B" (upper case) as the symbol for byte (IEC 80000-13 uses "o" for octet in French, but also allows "B" in English, which is what is actually being used). Bytes, or multiples thereof, are almost always used to specify the sizes of computer files and the capacity of storage units. Most modern computers and peripheral devices are designed to manipulate data in whole bytes or groups of bytes, rather than individual bits.
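Here is a minimal Python sketch of the byte value ranges described above. It assumes the two's-complement convention for signed values. The helper name `interpret` is hypothetical, while `int.from_bytes` is Python's built-in conversion.

```python
BITS_PER_BYTE = 8
distinct = 2 ** BITS_PER_BYTE                          # 256 distinct values
unsigned_range = (0, distinct - 1)                     # (0, 255)
signed_range = (-(distinct // 2), distinct // 2 - 1)   # (-128, 127) in two's complement

def interpret(raw: int) -> tuple[int, int]:
    """Read one byte (0..255) as an unsigned and as a two's-complement signed integer."""
    data = bytes([raw])  # raises ValueError if raw is outside 0..255
    return (int.from_bytes(data, "big", signed=False),
            int.from_bytes(data, "big", signed=True))

print(distinct, unsigned_range, signed_range)
print(interpret(0x7F), interpret(0x80), interpret(0xFF))
```

The same eight bits read as 255 unsigned and as −1 signed. Only the interpretation differs, not the stored information.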
Nibble
A group of four bits, or half a byte, is sometimes called a nibble, nybble or nyble. This unit is most often used in the context of hexadecimal number representations, since a nibble has the same amount of information as one hexadecimal digit.
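The correspondence between nibbles and hexadecimal digits can be shown with a short Python sketch. The function name `nibbles` is a hypothetical helper. The sketch splits a byte into its high and low halves with a shift and a mask.

```python
def nibbles(byte: int) -> tuple[int, int]:
    """Split a byte into its high and low 4-bit halves (nibbles)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a value in 0..255")
    return byte >> 4, byte & 0x0F

value = 0xB7
high, low = nibbles(value)
print(high, low)                            # each nibble is in 0..15
print(format(high, "X"), format(low, "X"))  # each nibble is exactly one hex digit
print(format(value, "02X"))                 # so a byte is always two hex digits
```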
Crumb
A group of two bits or a quarter byte was called a crumb, often used in early 8-bit computing (see Atari 2600, ZX Spectrum). It is now largely defunct.
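A crumb can hold four values, which makes it suitable, for example, for a pixel with four possible colours. That use case is assumed here only for illustration. The Python sketch below unpacks and repacks the four crumbs of one byte. The names `crumbs` and `pack_crumbs` are hypothetical helpers.

```python
def crumbs(byte: int) -> list[int]:
    """Split a byte into four 2-bit crumbs, most significant first."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a value in 0..255")
    return [(byte >> shift) & 0b11 for shift in (6, 4, 2, 0)]

def pack_crumbs(values: list[int]) -> int:
    """Inverse of crumbs(): pack four 2-bit values into one byte."""
    if len(values) != 4 or any(not 0 <= v <= 3 for v in values):
        raise ValueError("expected four values in 0..3")
    byte = 0
    for v in values:
        byte = (byte << 2) | v
    return byte

print(crumbs(0b11100100))
print(pack_crumbs([3, 2, 1, 0]) == 0b11100100)
```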
Word, block, and page
Computers usually manipulate bits in groups of a fixed size, conventionally called ''words''. The number of bits in a word is usually defined by the size of the registers in the computer's CPU, or by the number of data bits that are fetched from its main memory in a single operation. In the IA-32 architecture, more commonly known as x86-32, a word is 16 bits (following Intel's terminology inherited from the 16-bit 8086), even though its general-purpose registers are 32 bits wide. Other past and current architectures use words with 4, 8, 9, 12, 13, 16, 18, 20, 21, 22, 24, 25, 29, 30, 31, 32, 33, 35, 36, 38, 39, 40, 42, 44, 48, 50, 52, 54, 56, 60, 64, 72 bits or others.
Some machine instructions and computer number formats use two words (a "double word" or "dword"), or four words (a "quad word" or "quad").
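The following Python sketch illustrates these multiples. It assumes the x86 naming in which a word is 16 bits, so a double word is 32 bits and a quad word is 64 bits. It uses the standard `struct` module, where the `<` prefix selects little-endian byte order with standard sizes and no padding. The `FORMATS` table is a hypothetical name.

```python
import struct

# Standard-size struct formats for the x86-style word multiples.
FORMATS = {"word": "<H", "dword": "<I", "qword": "<Q"}

for name, fmt in FORMATS.items():
    print(name, struct.calcsize(fmt) * 8, "bits")

# A quad word viewed as two double words (little-endian: low half first).
raw = struct.pack("<Q", 0x1122334455667788)
low, high = struct.unpack("<II", raw)
print(hex(low), hex(high))
```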
Computer memory caches usually operate on ''blocks'' of memory that consist of several consecutive words. These units are customarily called ''cache blocks'', or, in CPU caches, ''cache lines''.
Virtual memory systems partition the computer's main storage into even larger units, traditionally called ''pages''.
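When block and page sizes are powers of two, locating an address within them reduces to a shift and a mask. The Python sketch below assumes 4 KiB pages and 64-byte cache lines. These sizes are common but illustrative, and real systems vary. The function `split_address` is a hypothetical helper.

```python
PAGE_SIZE = 4096        # assumed: 4 KiB pages
CACHE_LINE_SIZE = 64    # assumed: 64-byte cache lines

def split_address(addr: int, unit: int) -> tuple[int, int]:
    """Return (index of the unit containing addr, offset within it) for a power-of-two unit."""
    if unit <= 0 or unit & (unit - 1):
        raise ValueError("unit must be a power of two")
    shift = unit.bit_length() - 1
    return addr >> shift, addr & (unit - 1)

addr = 0x12345
print(split_address(addr, PAGE_SIZE))        # (page number, offset in page)
print(split_address(addr, CACHE_LINE_SIZE))  # (cache line index, offset in line)
```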
Systematic multiples
Terms for large quantities of bits can be formed using the standard range of SI prefixes for powers of 10, e.g., kilo = $10^3$ = 1000 (as in kilobit or kbit), mega = $10^6$ = 1,000,000 (as in megabit or Mbit) and giga = $10^9$ = 1,000,000,000 (as in gigabit or Gbit). These prefixes are more often used for multiples of bytes, as in kilobyte (1 kB = 8000 bit), megabyte (1 MB = 8,000,000 bit), and gigabyte (1 GB = 8,000,000,000 bit).
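The decimal conversions above can be reproduced with a small Python sketch. It assumes that a byte is an 8-bit octet and that prefixes carry their SI meaning. The names `SI_PREFIXES`, `BITS_PER_UNIT` and `to_bits` are hypothetical. The 100 Mbit/s link in the last step is also a made-up example.

```python
# Decimal SI prefixes, as used for bits and bytes in this section.
SI_PREFIXES = {"": 1, "k": 10**3, "M": 10**6, "G": 10**9}
BITS_PER_UNIT = {"bit": 1, "B": 8}  # one byte taken to be an 8-bit octet

def to_bits(value: int, prefix: str, unit: str) -> int:
    """Express value * prefix * unit (e.g. 1 MB) as a plain number of bits."""
    return value * SI_PREFIXES[prefix] * BITS_PER_UNIT[unit]

for prefix in ("k", "M", "G"):
    print(f"1 {prefix}B = {to_bits(1, prefix, 'B'):,} bit")

# Hypothetical example: time to move 1 GB over a 100 Mbit/s link, ignoring protocol overhead.
seconds = to_bits(1, "G", "B") / to_bits(100, "M", "bit")
print(seconds)
```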
However, for technical reasons (memory is addressed with binary addresses), the capacities of computer memories and some storage units are often multiples of some large power of two, such as $2^{28}$ = 268,435,456 bytes. To avoid such unwieldy numbers, people have often repurposed the SI prefixes to mean the nearest power of two, e.g., using the prefix ''kilo'' for $2^{10}$ = 1024, ''mega'' for $2^{20}$ = 1,048,576, and ''giga'' for $2^{30}$ = 1,073,741,824, and so on. For example, a random access memory chip with a capacity of $2^{28}$ bytes would be referred to as a 256-megabyte chip. The table below illustrates these differences.
In the past, uppercase ''K'' has been used instead of lowercase ''k'' to indicate 1024 instead of 1000. However, this usage was never consistently applied.
On the other hand, for external storage systems (such as optical discs), the SI prefixes are commonly used with their decimal values (powers of 10). There have been many attempts to resolve the confusion by providing alternative notations for power-of-two multiples. In 1998 the International Electrotechnical Commission (IEC) issued a standard for this purpose, namely a series of binary prefixes that use 1024 instead of 1000 as the main radix:
* kibi (Ki) = $2^{10}$ = 1024
* mebi (Mi) = $2^{20}$ = 1,048,576
* gibi (Gi) = $2^{30}$ = 1,073,741,824
* tebi (Ti) = $2^{40}$
* pebi (Pi) = $2^{50}$
* exbi (Ei) = $2^{60}$
The series was later extended with zebi (Zi, $2^{70}$) and yobi (Yi, $2^{80}$).
The JEDEC memory standard JESD88F notes that the definitions of kilo (K), mega (M), and giga (G) based on powers of two are included only to reflect common usage.
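The gap between the decimal and binary readings grows with each prefix. The Python sketch below formats a byte count both ways. It uses the SI symbols for decimal multiples and the IEC symbols for binary ones. The function `human_size` and its symbol lists are hypothetical and do not come from any particular library.

```python
SI = ["B", "kB", "MB", "GB", "TB", "PB", "EB"]
IEC = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]

def human_size(n_bytes: int, binary: bool = False) -> str:
    """Format a byte count with decimal (SI, 1000) or binary (IEC, 1024) prefixes."""
    step, names = (1024, IEC) if binary else (1000, SI)
    if n_bytes < step:
        return f"{n_bytes} B"
    value = float(n_bytes)
    for name in names[1:]:
        value /= step
        if value < step or name == names[-1]:
            break
    return f"{value:.2f} {name}"

print(human_size(2**28), "=", human_size(2**28, binary=True))
print(human_size(10**12), "=", human_size(10**12, binary=True))
```

The second line shows why a drive sold as 1 TB (decimal) is often reported by software as about 931 GiB.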
Size examples
* 1 bit: Answer to a yes/no question
* 1 byte: A number from 0 to 255
* 90 bytes: Enough to store a typical line of text from a book
* 512 bytes = 0.5 KiB: The typical sector of a hard disk
* 1024 bytes = 1 KiB: The classical block size in UNIX filesystems
* 2048 bytes = 2 KiB: A CD-ROM sector
* 4096 bytes = 4 KiB: A memory page in x86 (since Intel 80386)
* 4 kB: About one page of text from a novel
* 120 kB: The text of a typical pocket book
* 1 MiB: A 1024×1024 pixel bitmap image with 256 colors (8 bpp color depth)
* 3 MB: A three-minute song (133 kbit/s)
* 650–900 MB: A CD-ROM
* 1 GB: About 95 minutes of uncompressed CD-quality audio at 1.4 Mbit/s
* 32/64/128 GB: Three common sizes of USB flash drives
* 6 TB: The size of a $100 hard disk (as of early 2022)
* 20 TB: Largest hard disk drive (as of early 2022)
* 100 TB: Largest commercially available solid state drive (as of early 2022)
* 200 TB: Largest solid state drive constructed (prediction for mid 2022)
* 1.3 ZB: Prediction of the volume of the whole internet in 2016
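Several of the entries above follow from simple arithmetic on their stated parameters. The Python sketch below recomputes three of them. It assumes decimal kilo and giga for bit rates and file sizes, binary mebi for the bitmap, no file headers, and the standard CD audio format of 44,100 samples per second, 16 bits per sample and two channels. All variable names are illustrative.

```python
MIB = 1024 ** 2

# 1024 x 1024 pixels at 8 bits per pixel (256 colours), ignoring any file header.
bitmap_bytes = 1024 * 1024 * 8 // 8
print(bitmap_bytes == MIB)

# Three minutes of audio at 133 kbit/s.
song_bytes = 3 * 60 * 133_000 / 8
print(song_bytes / 1e6, "MB")

# Uncompressed CD audio bit rate, and how many minutes of it fit in 1 GB (8e9 bits).
cd_bitrate = 44_100 * 16 * 2
minutes_per_gb = 8e9 / cd_bitrate / 60
print(cd_bitrate, "bit/s;", round(minutes_per_gb, 1), "minutes per GB")
```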
Obsolete and unusual units
Several other units of information storage have been named:
* 1 bit: unibit, sniff
* 2 bits: dibit, crumb, quartic digit, quad, quarter, taste, tayste, tidbit, tydbit, lick, lyck, semi-nibble, snort, nyp
* 3 bits: tribit, triad, triade, tribble
* 4 bits: character (on Intel 4004; on other processors, characters are typically 8 bits wide or larger), for others see ''nibble''
* 5 bits: pentad, pentade, nickel, nyckle
* 6 bits: byte (in early IBM machines using BCD alphamerics), hexad, hexade, sextet
* 7 bits: heptad, heptade
* 8 bits: octet, commonly also called byte
* 9 bits: nonet, rarely used
* 10 bits: declet, decle, deckle, dyme
* 12 bits: slab
* 15 bits: parcel (on CDC 6600 and CDC 7600)
* 16 bits: doublet, wyde, parcel (on Cray-1), plate, playte, chomp, chawmp (on a 32-bit machine)
* 18 bits: chomp, chawmp (on a 36-bit machine)
* 32 bits: quadlet, tetra, dinner, dynner, gawble (on a 32-bit machine)
* 48 bits: gobble, gawble (under circumstances that remain obscure)
* 64 bits: octlet, octa
* 96 bits: bentobox (in ITRON OS)
* 128 bits: hexlet
* 16 bytes: paragraph (on Intel x86 processors)
* 256 bytes: page (on Intel 4004, 8080 and 8086 processors, and many other 8-bit processors; typically much larger on many 16-bit and 32-bit processors)
* 6 trits: tryte
* combit, comword
Some of these names are jargon, obsolete, or used only in very restricted contexts.
See also
* Metric prefix
* File size
External links
* Representation of numerical values and SI units in character strings for information interchanges
* Bit Calculator: make conversions between bits, bytes, kilobits, kilobytes, megabits, megabytes, gigabits, gigabytes, terabits, terabytes, petabits, petabytes, exabits, exabytes, zettabits, zettabytes, yottabits, yottabytes.
* Paper on standardized units for use in information technology
* Data Byte Converter
* High Precision Data Unit Converters