computer science Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to practical disciplines (includin ...

, SUHA (Simple Uniform Hashing Assumption) is a basic assumption that facilitates the mathematical analysis of

hash tables In computing, a hash table, also known as hash map, is a data structure that implements an associative array or dictionary. It is an abstract data type that maps keys to values. A hash table uses a hash function to compute an ''index'', ...

. The assumption states that a hypothetical

hashing function A hash function is any function that can be used to map data of arbitrary size to fixed-size values. The values returned by a hash function are called ''hash values'', ''hash codes'', ''digests'', or simply ''hashes''. The values are usually ...

will evenly distribute items into the slots of a hash table. Moreover, each item to be hashed has an equal

probability Probability is the branch of mathematics concerning numerical descriptions of how likely an Event (probability theory), event is to occur, or how likely it is that a proposition is true. The probability of an event is a number between 0 and ...

of being placed into a slot, regardless of the other elements already placed. This assumption generalizes the details of the hash function and allows for certain assumptions about the stochastic system.

Applications

SUHA is most commonly used as a foundation for mathematical proofs describing the properties and behavior of hash tables in

theoretical computer science Theoretical computer science (TCS) is a subset of general computer science and mathematics that focuses on mathematical aspects of computer science such as the theory of computation, lambda calculus, and type theory. It is difficult to circumsc ...

. Minimizing hashing collisions can be achieved with a uniform hashing function. These functions often rely on the specific input data set and can be quite difficult to implement. Assuming uniform hashing allows hash table analysis to be made without exact knowledge of the input or the hash function used.

Mathematical implications

Certain properties of hash tables can be derived once uniform hashing is assumed.

Uniform distribution

Under the assumption of uniform hashing, given a hash function ''h'', and a hash table of size ''m'', the probability that two non-equal elements will hash to the same slot is :

P(h(a) = h(b)) =  \frac.

Collision chain length

Under the assumption of uniform hashing, the load factor

\alpha

and the

average In ordinary language, an average is a single number taken as representative of a list of numbers, usually the sum of the numbers divided by how many numbers are in the list (the arithmetic mean). For example, the average of the numbers 2, 3, 4, 7, ...

chain length of a hash table of size ''m'' with ''n'' elements will be :

\alpha = \tfrac

Successful lookup

Under the assumption of uniform hashing, the average time (in

big-O notation Big ''O'' notation is a mathematical notation that describes the limiting behavior of a function when the argument tends towards a particular value or infinity. Big O is a member of a family of notations invented by Paul Bachmann, Edmund Land ...

) to successfully find an element in a hash table using

chaining Chaining is a type of intervention that aims to create associations between behaviors in a behavior chain. A behavior chain is a sequence of behaviors that happen in a particular order where the outcome of the previous step in the chain serves as ...

is :

O(\alpha + 1)\,

Unsuccessful lookup

Under the assumption of uniform hashing, the average time (in big-O notation) to unsuccessfully find an element in a hash table using chaining is :

O(\alpha + 1)\,

Example

A simple example of using SUHA can be seen while observing an arbitrary hash table of size 10 and a data set of 30 unique elements. If chaining is used to deal with collisions, the average chain length of this hash table may be a desirable value. Without any assumptions and with no more additional information about the data or hash function, the chain length cannot be estimated. With SUHA however, we can state that because of an assumed uniform hashing, each element has an equal probability of mapping to a slot. Since no particular slot should be favored over another, the 30 elements should hash into the 10 slots uniformly. This will produce a hash table with, on average, 10 chains each of length 3 :

\alpha = \tfrac

\alpha = \tfrac

\alpha = 3\,

References

General

* * {{cite book , last = Cormen , first = Thomas H. , author-link = Thomas H. Cormen , author2=Charles E. Leiserson , author2-link=Charles E. Leiserson , author3=Ronald L. Rivest , author3-link=Ronald L. Rivest , author4=Clifford Stein , author4-link=Clifford Stein , title =

Introduction to Algorithms ''Introduction to Algorithms'' is a book on computer programming by Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. The book has been widely used as the textbook for algorithms courses at many universities and is ...

, publisher = MIT Press and McGraw-Hill , date= 2001 , chapter = Section 11.2: Hash Tables , pages = 226–228 , isbn = 0-262-03293-7 Hashing