The non-cryptographic hash functions (NCHFs) are
hash function
A hash function is any Function (mathematics), function that can be used to map data (computing), data of arbitrary size to fixed-size values, though there are some hash functions that support variable-length output. The values returned by a ...
s intended for applications that do not need the rigorous security requirements of the
cryptographic hash function
A cryptographic hash function (CHF) is a hash algorithm (a map (mathematics), map of an arbitrary binary string to a binary string with a fixed size of n bits) that has special properties desirable for a cryptography, cryptographic application: ...
s (e.g.,
preimage resistance) and therefore can be faster and less resource-intensive. Typical examples of CPU-optimized non-cryptographic hashes include
FNV-1a and
Murmur3. Some non-cryptographic hash functions are used in cryptographic applications (usually in combination with other cryptographic primitives); in this case they are described as
universal hash functions.
Applications and requirements
Among the typical uses of non-cryptographic hash functions are
bloom filter
In computing, a Bloom filter is a space-efficient probabilistic data structure, conceived by Burton Howard Bloom in 1970, that is used to test whether an element is a member of a set. False positive matches are possible, but false negatives ar ...
s,
hash table
In computer science, a hash table is a data structure that implements an associative array, also called a dictionary or simply map; an associative array is an abstract data type that maps Unique key, keys to Value (computer science), values. ...
s, and
count sketches. These applications require, in addition to speed,
uniform distribution and
avalanche
An avalanche is a rapid flow of snow down a Grade (slope), slope, such as a hill or mountain. Avalanches can be triggered spontaneously, by factors such as increased precipitation or snowpack weakening, or by external means such as humans, othe ...
properties.
Collision resistance
In cryptography, collision resistance is a property of cryptographic hash functions: a hash function ''H'' is collision-resistant if it is hard to find two inputs that hash to the same output; that is, two inputs ''a'' and ''b'' where ''a'' ≠ ' ...
is an additional feature that can be useful against
hash flooding attacks; simple NCHFs, like the
cyclic redundancy check
A cyclic redundancy check (CRC) is an error-detecting code commonly used in digital networks and storage devices to detect accidental changes to digital data. Blocks of data entering these systems get a short ''check value'' attached, based on ...
(CRC), have essentially no collision resistance and thus cannot be used with an input open to manipulation by an attacker.
NCHFs are used in diverse systems:
lexical analyzers,
compiler
In computing, a compiler is a computer program that Translator (computing), translates computer code written in one programming language (the ''source'' language) into another language (the ''target'' language). The name "compiler" is primaril ...
s,
database
In computing, a database is an organized collection of data or a type of data store based on the use of a database management system (DBMS), the software that interacts with end users, applications, and the database itself to capture and a ...
s,
communication network
A telecommunications network is a group of nodes interconnected by telecommunications links that are used to exchange messages between the nodes. The links may use a variety of technologies based on the methodologies of circuit switching, messa ...
s, video games,
DNS server
A name server is a computer application that implements a network service for providing responses to queries against a directory service. It translates an often humanly meaningful, text-based identifier to a system-internal, often numeric identi ...
s,
filesystems—anywhere in computing where there is a need to find the information very quickly (preferably in the
O(1) time, which will also achieve perfect
scalability
Scalability is the property of a system to handle a growing amount of work. One definition for software systems specifies that this may be done by adding resources to the system.
In an economic context, a scalable business model implies that ...
).
Estébanez et al. list the "most important" NCHFs:
* The
Fowler–Noll–Vo hash function
Fowler–Noll–Vo (or FNV) is a non-cryptographic hash function created by Glenn Fowler, Landon Curt Noll, and Kiem-Phong Vo.
The basis of the FNV hash algorithm was taken from an idea sent as reviewer comments to the IEEE POSIX P1003.2 committe ...
(FNV) was created by Glenn Fowler and Phong Vo in 1991 with contributions from
Landon Curt Noll
Landon Curt Noll (born October 28, 1960) is an American computer scientist, co-discoverer of the 25th Mersenne prime and discoverer of the 26th, which he found while still enrolled at Hayward High School (California), Hayward High School and conc ...
. FNV with its two variants, FNV-1 and FNV-1a, is very widely used in
Linux
Linux ( ) is a family of open source Unix-like operating systems based on the Linux kernel, an kernel (operating system), operating system kernel first released on September 17, 1991, by Linus Torvalds. Linux is typically package manager, pac ...
,
FreeBSD
FreeBSD is a free-software Unix-like operating system descended from the Berkeley Software Distribution (BSD). The first version was released in 1993 developed from 386BSD, one of the first fully functional and free Unix clones on affordable ...
OSes, DNS servers,
NFS,
Twitter
Twitter, officially known as X since 2023, is an American microblogging and social networking service. It is one of the world's largest social media platforms and one of the most-visited websites. Users can share short text messages, image ...
,
PlayStation 2
The PlayStation 2 (PS2) is a home video game console developed and marketed by Sony Interactive Entertainment, Sony Computer Entertainment. It was first released in Japan on 4 March 2000, in North America on 26 October, in Europe on 24 Novembe ...
, and
Xbox
Xbox is a video gaming brand that consists of four main home video game console lines, as well as application software, applications (games), the streaming media, streaming service Xbox Cloud Gaming, and online services such as the Xbox networ ...
, among others.
*
lookup3
The Jenkins hash functions are a family of non-cryptographic hash functions for multi-byte keys designed by Bob Jenkins. The first one was formally published in 1997.
The hash functions
one_at_a_time
Jenkins's one_at_a_time hash is adapted here ...
was created by
Robert Jenkins. This hash is also widely used and can be found in
PostgreSQL
PostgreSQL ( ) also known as Postgres, is a free and open-source software, free and open-source relational database management system (RDBMS) emphasizing extensibility and SQL compliance. PostgreSQL features transaction processing, transactions ...
, Linux,
Perl
Perl is a high-level, general-purpose, interpreted, dynamic programming language. Though Perl is not officially an acronym, there are various backronyms in use, including "Practical Extraction and Reporting Language".
Perl was developed ...
,
Ruby
Ruby is a pinkish-red-to-blood-red-colored gemstone, a variety of the mineral corundum ( aluminium oxide). Ruby is one of the most popular traditional jewelry gems and is very durable. Other varieties of gem-quality corundum are called sapph ...
, and
Infoseek
Infoseek (also known as the "big yellow") was an American internet search engine founded in 1994 by Steve Kirsch.
Infoseek was originally operated by the Infoseek Corporation, headquartered in Sunnyvale, California. Infoseek was bought by The W ...
.
* SuperFastHash was created by Paul Hsieh using ideas from FNV and lookup3, with one of the goals being a high degree of avalanche effect. The hash is used in
WebKit
WebKit is a browser engine primarily used in Apple's Safari web browser, as well as all web browsers on iOS and iPadOS. WebKit is also used by the PlayStation consoles starting with the PS3, the Tizen mobile operating systems, the Amazon K ...
(part of
Safari
A safari (; originally ) is an overland journey to observe wildlife, wild animals, especially in East Africa. The so-called big five game, "Big Five" game animals of Africa – lion, African leopard, leopard, rhinoceros, African elephant, elep ...
and
Google Chrome
Google Chrome is a web browser developed by Google. It was first released in 2008 for Microsoft Windows, built with free software components from Apple WebKit and Mozilla Firefox. Versions were later released for Linux, macOS, iOS, iPadOS, an ...
).
*
MurmurHash2 was created by Austin Appleby in 2008 and is used in
libmemcached, Maatkit, and
Apache Hadoop
Apache Hadoop () is a collection of open-source software utilities for reliable, scalable, distributed computing. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model. Hadoop wa ...
.
* DJBX33A ("Daniel J. Bernstein, Times 33 with Addition"). This very simple multiplication-and-addition function was proposed by
Daniel J. Bernstein. It is fast and efficient during initialization. Many programming environments based on
PHP 5,
Python, and
ASP.NET
ASP.NET is a server-side web-application framework designed for web development to produce dynamic web pages. It was developed by Microsoft to allow programmers to build dynamic web sites, applications and services. The name stands for Ac ...
use variants of this hash. The hash is easy to
flood
A flood is an overflow of water (list of non-water floods, or rarely other fluids) that submerges land that is usually dry. In the sense of "flowing water", the word may also be applied to the inflow of the tide. Floods are of significant con ...
, exposing the servers.
* BuzHash was created by Robert Uzgalis in 1992. It is designed around a substitution table and can tolerate extremely skewed distributions on the input.
* DEK is an early multiplicative hash based on a proposal by
Donald Knuth
Donald Ervin Knuth ( ; born January 10, 1938) is an American computer scientist and mathematician. He is a professor emeritus at Stanford University. He is the 1974 recipient of the ACM Turing Award, informally considered the Nobel Prize of comp ...
and is one of the oldest hashes that is still in use.
Design
Non-cryptographic hash functions optimized for software frequently involve the multiplication operation. Since in-hardware multiplication is resource-intensive and frequency-limiting,
ASIC
An application-specific integrated circuit (ASIC ) is an integrated circuit (IC) chip customized for a particular use, rather than intended for general-purpose use, such as a chip designed to run in a digital voice recorder or a high-efficien ...
-friendlier designs had been proposed, including
SipHash (which has an additional benefit of being able to use a secret
key for
message authentication), NSGAhash, and XORhash. Although technically
lightweight cryptography can be used for the same applications, the
latency of its algorithms is usually too high due to a large number of
rounds. Sateesan et al. propose using the
reduced-round versions of lightweight hashes and ciphers as non-cryptographic hash functions.
Many NCHFs have a relatively small result size (e.g., 64 bits for
SipHash or even less): large result size does not increase the performance of the target applications, but slows down the calculation, as more bits need to be generated.
See also
* A list of
non-cryptographic hash functions
References
Sources
*
*
*
*
* {{cite book , last=Mittelbach , first=Arno , last2=Fischlin , first2=Marc , title=The Theory of Hash Functions and Random Oracles , chapter=Non-cryptographic Hashing , publisher=Springer International Publishing , publication-place=Cham , date=2021 , isbn=978-3-030-63286-1 , doi=10.1007/978-3-030-63287-8_7 , pages=303–334
Hash function (non-cryptographic)