The Hamming weight of a string is the number of symbols that are different from the zero-symbol of the alphabet used. It is thus equivalent to the Hamming distance from the all-zero string of the same length. For the most typical case, a string of bits, this is the number of 1's in the string, or the digit sum of the binary representation of a given number and the ''ℓ''₁ norm of a bit vector. In this binary case, it is also called the population count, popcount, sideways sum, or bit summation.
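The definition can be illustrated with a minimal C sketch. The function names hamming_weight_str and hamming_weight_bits are hypothetical, and the zero-symbol is passed in explicitly because it depends on the alphabet in use.

```c
#include <stddef.h>

/* Hamming weight of a string over an arbitrary alphabet: the number of
 * positions whose symbol differs from the chosen zero-symbol.
 * (Function and parameter names are illustrative only.) */
size_t hamming_weight_str(const unsigned char *s, size_t len, unsigned char zero)
{
    size_t weight = 0;
    for (size_t i = 0; i < len; i++) {
        if (s[i] != zero)
            weight++;
    }
    return weight;
}

/* Binary special case: count the 1 bits of a machine word one bit at a time. */
unsigned hamming_weight_bits(unsigned long long x)
{
    unsigned weight = 0;
    while (x != 0) {
        weight += (unsigned)(x & 1u);  /* add the lowest bit */
        x >>= 1;                       /* move on to the next bit */
    }
    return weight;
}
```

For the character string "11001010" with zero-symbol '0', and for the integer 0xCA (binary 11001010), both functions give a weight of 4.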
History and usage
The Hamming weight is named after Richard Hamming, although he did not originate the notion. The Hamming weight of binary numbers was already used in 1899 by James W. L. Glaisher to give a formula for the number of odd binomial coefficients in a single row of Pascal's triangle. Irving S. Reed introduced a concept, equivalent to Hamming weight in the binary case, in 1954.
Hamming weight is used in several disciplines including information theory, coding theory, and cryptography. Examples of applications of the Hamming weight include:
* In modular exponentiation by squaring, the number of modular multiplications required for an exponent ''e'' is approximately log₂ ''e'' + weight(''e''); more precisely, ⌊log₂ ''e''⌋ squarings plus weight(''e'') − 1 further multiplications. This is the reason that the public key value ''e'' used in RSA is typically chosen to be a number of low Hamming weight.
* The Hamming weight determines path lengths between nodes in Chord distributed hash tables.
* IrisCode lookups in biometric databases are typically implemented by calculating the Hamming distance to each stored record.
* In computer chess programs using a bitboard representation, the Hamming weight of a bitboard gives the number of pieces of a given type remaining in the game, or the number of squares of the board controlled by one player's pieces, and is therefore an important contributing term to the value of a position.
* Hamming weight can be used to efficiently compute find first set using the identity ffs(x) = pop(x ^ (x - 1)), which holds for nonzero x. This is useful on platforms such as SPARC that have hardware Hamming weight instructions but no hardware find first set instruction.
* The Hamming weight operation can be interpreted as a conversion from the unary numeral system to binary numbers.
* The Hamming weight is used in the implementation of some succinct data structures, such as bit vectors and wavelet trees.
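Two of the applications above can be sketched in C. The names pop64, ffs_via_pop and modexp_mult_count are hypothetical. The sketch assumes a left-to-right square-and-multiply loop, and it treats x = 0 as a special case because the find-first-set identity only holds for nonzero x.

```c
#include <stdint.h>

/* Portable population count, standing in for a hardware instruction. */
static int pop64(uint64_t x)
{
    int n = 0;
    while (x) {
        x &= x - 1;  /* clear the least significant 1 bit */
        n++;
    }
    return n;
}

/* Find first set: 1-based index of the lowest 1 bit, or 0 if x is 0.
 * x ^ (x - 1) has 1s from bit 0 up to and including the lowest set bit. */
int ffs_via_pop(uint64_t x)
{
    if (x == 0)
        return 0;
    return pop64(x ^ (x - 1));
}

/* Number of modular multiplications performed by left-to-right
 * square-and-multiply for an exponent e >= 1: one squaring per bit below
 * the leading 1, plus one multiplication per further 1 bit. */
int modexp_mult_count(uint64_t e)
{
    int top = 63;
    int count = 0;
    while (top > 0 && !((e >> top) & 1))
        top--;                       /* locate the leading 1 bit */
    for (int i = top - 1; i >= 0; i--) {
        count++;                     /* squaring */
        if ((e >> i) & 1)
            count++;                 /* multiplication for a 1 bit */
    }
    return count;
}
```

For the commonly used RSA exponent 65537 (binary weight 2), this count is 16 squarings plus 1 multiplication, which illustrates why low-weight exponents are cheap.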
Efficient implementation
The population count of a
bitstring is often needed in cryptography and other applications. The
Hamming distance of two words ''A'' and ''B'' can be calculated as the Hamming weight of ''A''
xor ''B''.
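As a minimal sketch of this relation in C (the names weight_u64 and hamming_distance_u64 are illustrative, and a simple bit-clearing loop stands in for whichever population count is available):

```c
#include <stdint.h>

/* Population count by repeatedly clearing the lowest 1 bit. */
static int weight_u64(uint64_t x)
{
    int n = 0;
    while (x) {
        x &= x - 1;
        n++;
    }
    return n;
}

/* Hamming distance of two 64-bit words: the XOR has a 1 exactly in the
 * positions where a and b differ, so its weight is the distance. */
int hamming_distance_u64(uint64_t a, uint64_t b)
{
    return weight_u64(a ^ b);
}
```

For example, 1011101 and 1001001 differ in two positions, so their XOR, 0010100, has weight 2.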
The problem of how to implement it efficiently has been widely studied. A single operation for the calculation, or parallel operations on bit vectors are
available on some processors. For processors lacking those features, the best solutions known are based on adding counts in a tree pattern. For example, to count the number of 1 bits in the 16-bit binary number a = 0110 1100 1011 1010, these operations can be done:
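A minimal sketch of these steps for the 16-bit example, written in C. The function name popcount16_tree and the variable names c, e, g and i are illustrative; the hexadecimal values in the comments are the intermediate results for a = 0110 1100 1011 1010 (0x6CBA).

```c
#include <stdint.h>

/* Tree-pattern count of the 1 bits in a 16-bit word: adjacent fields are
 * added pairwise, doubling the field width at every step. */
int popcount16_tree(uint16_t a)
{
    uint32_t x = a;
    /* count of 1s in each 2-bit slice of a      (0x6CBA -> 0x5865) */
    uint32_t c = ((x >> 0) & 0x5555u) + ((x >> 1) & 0x5555u);
    /* count of 1s in each 4-bit slice of a      (0x5865 -> 0x2232) */
    uint32_t e = ((c >> 0) & 0x3333u) + ((c >> 2) & 0x3333u);
    /* count of 1s in each 8-bit slice of a      (0x2232 -> 0x0405) */
    uint32_t g = ((e >> 0) & 0x0F0Fu) + ((e >> 4) & 0x0F0Fu);
    /* count of 1s in the whole 16-bit word      (0x0405 -> 9)      */
    uint32_t i = ((g >> 0) & 0x00FFu) + ((g >> 8) & 0x00FFu);
    return (int)i;
}
```

Each step uses two shifts, two ANDs and one addition, and the number of steps grows with the logarithm of the word width.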
Here, the operations are as in the C programming language, so X >> Y means to shift X right by Y bits, X & Y means the bitwise AND of X and Y, and + is ordinary addition. The best algorithms known for this problem are based on the concept illustrated above and are given here:
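//types and constants used in the functions below
#include <stdint.h> //uint64_t is an unsigned 64-bit integer type (defined in the C99 version of the C language)
const uint64_t m1  = 0x5555555555555555; //binary: 0101...
const uint64_t m2  = 0x3333333333333333; //binary: 00110011..
const uint64_t m4  = 0x0f0f0f0f0f0f0f0f; //binary:  4 zeros,  4 ones ...
const uint64_t m8  = 0x00ff00ff00ff00ff; //binary:  8 zeros,  8 ones ...
const uint64_t m16 = 0x0000ffff0000ffff; //binary: 16 zeros, 16 ones ...
const uint64_t m32 = 0x00000000ffffffff; //binary: 32 zeros, 32 ones
const uint64_t h01 = 0x0101010101010101; //the sum of 256 to the power of 0,1,2,3...
//This is a naive implementation, shown for comparison,
//and to help in understanding the better functions.
//This algorithm uses 24 arithmetic operations (shift, add, and).
int popcount64a(uint64_t x) {
    x = (x & m1 ) + ((x >>  1) & m1 ); //put count of each  2 bits into those  2 bits
    x = (x & m2 ) + ((x >>  2) & m2 ); //put count of each  4 bits into those  4 bits
    x = (x & m4 ) + ((x >>  4) & m4 ); //put count of each  8 bits into those  8 bits
    x = (x & m8 ) + ((x >>  8) & m8 ); //put count of each 16 bits into those 16 bits
    x = (x & m16) + ((x >> 16) & m16); //put count of each 32 bits into those 32 bits
    x = (x & m32) + ((x >> 32) & m32); //put count of each 64 bits into those 64 bits
    return (int)x;
}
//This uses fewer arithmetic operations than any other known
//implementation on machines with slow multiplication.
//This algorithm uses 17 arithmetic operations.
int popcount64b(uint64_t x) {
    x -= (x >> 1) & m1;             //put count of each 2 bits into those 2 bits
    x = (x & m2) + ((x >> 2) & m2); //put count of each 4 bits into those 4 bits
    x = (x + (x >> 4)) & m4;        //put count of each 8 bits into those 8 bits
    x += x >>  8;                   //put count of each 16 bits into their lowest 8 bits
    x += x >> 16;                   //put count of each 32 bits into their lowest 8 bits
    x += x >> 32;                   //put count of each 64 bits into their lowest 8 bits
    return (int)(x & 0x7f);
}
//This uses fewer arithmetic operations than any other known
//implementation on machines with fast multiplication.
//This algorithm uses 12 arithmetic operations, one of which is a multiply.
int popcount64c(uint64_t x) {
    x -= (x >> 1) & m1;             //put count of each 2 bits into those 2 bits
    x = (x & m2) + ((x >> 2) & m2); //put count of each 4 bits into those 4 bits
    x = (x + (x >> 4)) & m4;        //put count of each 8 bits into those 8 bits
    return (int)((x * h01) >> 56);  //returns left 8 bits of x + (x<<8) + (x<<16) + (x<<24) + ...
}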
The above implementations have the best worst-case behavior of any known algorithm. However, when a value is expected to have few nonzero bits, it may instead be more efficient to use algorithms that count these bits one at a time. As Wegner described in 1960,
the
bitwise AND of ''x'' with ''x'' − 1 differs from ''x'' only in zeroing out the least significant nonzero bit: subtracting 1 changes the rightmost string of 0s to 1s, and changes the rightmost 1 to a 0. If ''x'' originally had ''n'' bits that were 1, then after only ''n'' iterations of this operation, ''x'' will be reduced to zero. The following implementation is based on this principle.
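//This is better when most bits in x are 0
//This algorithm works the same for all data sizes.
//This algorithm uses 3 arithmetic operations and 1 comparison/branch per "1" bit in x.
int popcount64d(uint64_t x) {
    int count;
    for (count = 0; x; count++) x &= x - 1; //clear the least significant 1 bit
    return count;
}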
If greater memory usage is allowed, we can calculate the Hamming weight faster than the above methods. With unlimited memory, we could simply create a large lookup table of the Hamming weight of every 64-bit integer. If we can store a lookup table of the Hamming weight of every 16-bit integer, we can do the following to compute the Hamming weight of any 32-bit integer.
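static uint8_t wordbits[65536]; //bitcounts of integers 0 through 65535, inclusive
//This algorithm uses 3 arithmetic operations and 2 memory reads.
int popcount32e(uint32_t x) { return wordbits[x & 0xFFFF] + wordbits[x >> 16]; }
//Optionally, the wordbits[] table could be filled using this function
//(here each entry reuses the count for i >> 1 and adds the lowest bit of i)
int popcount32e_init(void) {
    for (uint32_t i = 1; i <= 0xFFFF; i++)
        wordbits[i] = (uint8_t)(wordbits[i >> 1] + (i & 1));
    return 0;
}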
Muła et al.
have shown that a vectorized version of popcount64b can run faster than dedicated instructions (e.g., popcnt on x64 processors).
Minimum weight
In error-correcting coding, the minimum Hamming weight, commonly referred to as the minimum weight ''w''ₘᵢₙ of a code, is the weight of the lowest-weight non-zero code word. The weight ''w'' of a code word is the number of 1s in the word. For example, the word 11001010 has a weight of 4.
In a linear block code the minimum weight is also the minimum Hamming distance (''d''ₘᵢₙ) and defines the error correction capability of the code. If ''w''ₘᵢₙ = ''n'', then ''d''ₘᵢₙ = ''n'' and the code will correct up to ⌊(''d''ₘᵢₙ − 1)/2⌋ errors.
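As an illustration, the minimum weight of a small binary linear code can be found by brute force over all non-zero code words. The sketch below is in C; the names min_weight, correctable_errors and hamming74_gen are hypothetical, and the generator rows are one systematic choice for the (7,4) Hamming code, assumed here for the example.

```c
/* Rows of a systematic generator matrix for a (7,4) Hamming code,
 * each row packed into the low 7 bits (an assumed example code). */
const unsigned hamming74_gen[4] = {
    0x46, /* 1000110 */
    0x25, /* 0100101 */
    0x13, /* 0010011 */
    0x0F  /* 0001111 */
};

/* Weight of a code word: number of 1 bits. */
static int word_weight(unsigned w)
{
    int n = 0;
    while (w) {
        w &= w - 1;
        n++;
    }
    return n;
}

/* Minimum weight over all non-zero code words of the binary linear code
 * spanned by the k rows in gen (k < 32). Returns -1 if none exists. */
int min_weight(const unsigned *gen, int k)
{
    int best = -1;
    for (unsigned mask = 1; mask < (1u << k); mask++) {
        unsigned word = 0;
        for (int r = 0; r < k; r++) {
            if (mask & (1u << r))
                word ^= gen[r];      /* add row r over GF(2) */
        }
        int w = word_weight(word);
        if (w > 0 && (best < 0 || w < best))
            best = w;
    }
    return best;
}

/* Number of errors a code with minimum distance d_min >= 1 can correct. */
int correctable_errors(int d_min)
{
    return (d_min - 1) / 2;
}
```

For these generator rows the minimum weight is 3, so the minimum distance is 3 and the code corrects a single error. The exhaustive search is exponential in k and is only meant for small codes.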
Language support
Some C compilers provide intrinsic functions that provide bit counting facilities. For example,
GCC (since version 3.4 in April 2004) includes a builtin function
__builtin_popcount
that will use a processor instruction if available or an efficient library implementation otherwise.
LLVM-GCC has included this function since version 1.5 in June 2005.
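A minimal sketch of using the builtin from C, with a portable fallback for other compilers. The wrapper name popcount_u64 is hypothetical; __builtin_popcountll is the variant of the builtin that takes an unsigned long long argument.

```c
#include <stdint.h>

/* Population count of a 64-bit value: uses the compiler builtin where
 * available (GCC and Clang), otherwise a bit-clearing loop. */
int popcount_u64(uint64_t x)
{
#if defined(__GNUC__) || defined(__clang__)
    return __builtin_popcountll((unsigned long long)x);
#else
    int n = 0;
    while (x) {
        x &= x - 1;  /* clear the least significant 1 bit */
        n++;
    }
    return n;
#endif
}
```

Whether the builtin compiles to a single instruction depends on the target; on x86, for example, this typically requires enabling the relevant instruction set with a compiler option.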
In the C++ Standard Library, the bit-array data structure bitset has a count() method that counts the number of bits that are set. In C++20, a new header <bit> was added, containing functions std::popcount and std::has_single_bit, taking arguments of unsigned integer types.
In Java, the growable bit-array data structure BitSet has a BitSet.cardinality() method that counts the number of bits that are set. In addition, there are Integer.bitCount(int) and Long.bitCount(long) functions to count bits in primitive 32-bit and 64-bit integers, respectively. The arbitrary-precision integer class BigInteger also has a BigInteger.bitCount() method that counts bits.
In
Python, the
int
type has a
bit_count()
method to count the number of bits set. This functionality was introduced in Python 3.10, released in October 2021.
In Common Lisp, the function logcount, given a non-negative integer, returns the number of 1 bits. (For negative integers it returns the number of 0 bits in 2's complement notation.) In either case the integer can be a BIGNUM.
Starting in
GHC 7.4, the
Haskell base package has a
popCount
function available on all types that are instances of the
Bits
class (available from the
Data.Bits
module).
The MySQL dialect of SQL provides BIT_COUNT() as a built-in function.
Fortran 2008 has the standard, intrinsic, elemental function
popcnt
returning the number of nonzero bits within an integer (or integer array).
Some programmable scientific pocket calculators feature special commands to calculate the number of set bits, e.g.
#B
on the
HP-16C and
WP 43S,
#BITS
or
BITSUM
on HP-16C emulators, and
nBITS
on the
WP 34S.
FreePascal has implemented popcnt since version 3.0.
Processor support
* The IBM STRETCH computer in the 1960s calculated the number of set bits as well as the number of leading zeros as a by-product of all logical operations.
* Cray supercomputers early on featured a population count machine instruction, rumoured to have been specifically requested by the U.S. government National Security Agency for cryptanalysis applications.
* Control Data Corporation's (CDC) 6000 and Cyber 70/170 series machines included a population count instruction; in COMPASS, this instruction was coded as CXi.
* The 64-bit SPARC version 9 architecture defines a POPC instruction, but most implementations do not implement it, requiring it be emulated by the operating system.
* Donald Knuth's model computer MMIX, which is intended to replace MIX in his book The Art of Computer Programming, has had an SADD instruction since 1999. SADD a,b,c counts all bits that are 1 in b and 0 in c and writes the result to a.
* Compaq's Alpha 21264A, released in 1999, was the first Alpha series CPU design that had the count extension (CIX).
* Analog Devices' Blackfin processors feature the ONES instruction to perform a 32-bit population count.
* AMD's Barcelona architecture introduced the advanced bit manipulation (ABM) ISA, which added the POPCNT instruction as part of the SSE4a extensions in 2007.
* Intel Core processors introduced a POPCNT instruction with the SSE4.2 instruction set extension, first available in a Nehalem-based Core i7 processor, released in November 2008.
* The ARM architecture introduced the VCNT instruction as part of the Advanced SIMD (NEON) extensions.
* The RISC-V architecture introduced the PCNT instruction as part of the Bit Manipulation (B) extension.
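The behaviour described above for MMIX's SADD a,b,c can be modelled in C as a population count of b AND NOT c. This is only a sketch of the stated semantics, not MMIX code; the names sadd_model and count_ones are hypothetical.

```c
#include <stdint.h>

/* Population count by clearing the lowest 1 bit until none remain. */
static int count_ones(uint64_t x)
{
    int n = 0;
    while (x) {
        x &= x - 1;
        n++;
    }
    return n;
}

/* Model of the described SADD result: the number of bit positions
 * that are 1 in b and 0 in c. */
int sadd_model(uint64_t b, uint64_t c)
{
    return count_ones(b & ~c);
}
```

With c = 0 the model reduces to the plain population count of b.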
See also
* Two's complement
* Fan out
Further reading
* Item 169: Population count assembly code for the PDP/6-10.
External links
* Aggregate Magic Algorithms. Optimized population count and other algorithms explained with sample code.
* Several algorithms with code for counting bits set.
* Necessary and Sufficient, by Damien Wintour. Has code in C# for various Hamming weight implementations.
* Best algorithm to count the number of set bits in a 32-bit integer? (Stack Overflow)