SUHA (computer Science)
   HOME

TheInfoList



OR:

In
computer science Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to Applied science, practical discipli ...
, SUHA (Simple Uniform Hashing Assumption) is a basic assumption that facilitates the mathematical analysis of
hash tables In computing, a hash table, also known as hash map, is a data structure that implements an associative array or dictionary. It is an abstract data type that maps keys to values. A hash table uses a hash function to compute an ''index'', als ...
. The assumption states that a hypothetical hashing function will evenly distribute items into the slots of a hash table. Moreover, each item to be hashed has an equal
probability Probability is the branch of mathematics concerning numerical descriptions of how likely an Event (probability theory), event is to occur, or how likely it is that a proposition is true. The probability of an event is a number between 0 and ...
of being placed into a slot, regardless of the other elements already placed. This assumption generalizes the details of the hash function and allows for certain assumptions about the stochastic system.


Applications

SUHA is most commonly used as a foundation for mathematical proofs describing the properties and behavior of hash tables in
theoretical computer science Theoretical computer science (TCS) is a subset of general computer science and mathematics that focuses on mathematical aspects of computer science such as the theory of computation, lambda calculus, and type theory. It is difficult to circumsc ...
. Minimizing hashing collisions can be achieved with a uniform hashing function. These functions often rely on the specific input data set and can be quite difficult to implement. Assuming uniform hashing allows hash table analysis to be made without exact knowledge of the input or the hash function used.


Mathematical implications

Certain properties of hash tables can be derived once uniform hashing is assumed.


Uniform distribution

Under the assumption of uniform hashing, given a hash function ''h'', and a hash table of size ''m'', the probability that two non-equal elements will hash to the same slot is :P(h(a) = h(b)) = \frac.


Collision chain length

Under the assumption of uniform hashing, the load factor \alpha and the
average In ordinary language, an average is a single number taken as representative of a list of numbers, usually the sum of the numbers divided by how many numbers are in the list (the arithmetic mean). For example, the average of the numbers 2, 3, 4, 7, ...
chain length of a hash table of size ''m'' with ''n'' elements will be :\alpha = \tfrac


Successful lookup

Under the assumption of uniform hashing, the average time (in
big-O notation Big ''O'' notation is a mathematical notation that describes the limiting behavior of a function when the argument tends towards a particular value or infinity. Big O is a member of a family of notations invented by Paul Bachmann, Edmund Land ...
) to successfully find an element in a hash table using
chaining Chaining is a type of intervention that aims to create associations between behaviors in a behavior chain. A behavior chain is a sequence of behaviors that happen in a particular order where the outcome of the previous step in the chain serves as ...
is :O(\alpha + 1)\,


Unsuccessful lookup

Under the assumption of uniform hashing, the average time (in big-O notation) to unsuccessfully find an element in a hash table using chaining is :O(\alpha + 1)\,


Example

A simple example of using SUHA can be seen while observing an arbitrary hash table of size 10 and a data set of 30 unique elements. If chaining is used to deal with collisions, the average chain length of this hash table may be a desirable value. Without any assumptions and with no more additional information about the data or hash function, the chain length cannot be estimated. With SUHA however, we can state that because of an assumed uniform hashing, each element has an equal probability of mapping to a slot. Since no particular slot should be favored over another, the 30 elements should hash into the 10 slots uniformly. This will produce a hash table with, on average, 10 chains each of length 3 :\alpha = \tfrac :\alpha = \tfrac :\alpha = 3\,


See also

*
Hash Table In computing, a hash table, also known as hash map, is a data structure that implements an associative array or dictionary. It is an abstract data type that maps keys to values. A hash table uses a hash function to compute an ''index'', als ...
*
Hash Collision In computer science, a hash collision or hash clash is when two pieces of data in a hash table share the same hash value. The hash value in this case is derived from a hash function which takes a data input and returns a fixed length of bits. Al ...
*
Perfect Hashing In computer science, a perfect hash function for a set is a hash function that maps distinct elements in to a set of integers, with no collisions. In mathematical terms, it is an injective function. Perfect hash functions may be used to im ...


References


General

* * {{cite book , last = Cormen , first = Thomas H. , author-link = Thomas H. Cormen , author2=Charles E. Leiserson , author2-link=Charles E. Leiserson , author3=Ronald L. Rivest , author3-link=Ronald L. Rivest , author4=Clifford Stein , author4-link=Clifford Stein , title =
Introduction to Algorithms ''Introduction to Algorithms'' is a book on computer programming by Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. The book has been widely used as the textbook for algorithms courses at many universities and is co ...
, publisher = MIT Press and McGraw-Hill , date= 2001 , chapter = Section 11.2: Hash Tables , pages = 226–228 , isbn = 0-262-03293-7 Hashing