Rendezvous Hashing

Rendezvous or highest random weight (HRW) hashing is an algorithm that allows clients to achieve distributed agreement on a set of k options out of a possible set of n options. A typical application is when clients need to agree on which sites (or proxies) objects are assigned to.
Consistent hashing addresses the special case k = 1, using a different method. Rendezvous hashing is both much simpler and more general than consistent hashing (see below).


History

Rendezvous hashing was invented by David Thaler and Chinya Ravishankar at the University of Michigan in 1996.
Consistent hashing appeared a year later in the literature. Given its simplicity and generality, rendezvous hashing is now being preferred to consistent hashing in real-world applications. Rendezvous hashing was used very early on in many applications including mobile caching, router design, secure key establishment, and sharding and distributed databases. Other examples of real-world systems that use rendezvous hashing include the GitHub load balancer, the Apache Ignite distributed database, the Tahoe-LAFS file store, the CoBlitz large-file distribution service, Apache Druid, IBM's Cloud Object Store, the Arvados Data Management System, Apache Kafka, and the Twitter EventBus pub/sub platform.

One of the first applications of rendezvous hashing was to enable multicast clients on the Internet (in contexts such as the MBONE) to identify multicast rendezvous points in a distributed fashion. It was used in 1998 by Microsoft's Cache Array Routing Protocol (CARP) for distributed cache coordination and routing. Some Protocol Independent Multicast routing protocols use rendezvous hashing to pick a rendezvous point.


Problem Definition and Approach


Algorithm

Rendezvous hashing solves a general version of the distributed hash table problem: We are given a set of n sites (servers or proxies, say). How can any set of clients, given an object O, agree on a ''k''-subset of sites to assign to O? The standard version of the problem uses ''k'' = 1. Each client is to make its selection independently, but all clients must end up picking the same subset of sites. This is non-trivial if we add a ''minimal disruption'' constraint, and require that when a site fails or is removed, only objects mapping to that site need be reassigned to other sites.

The basic idea is to give each site S_j a score (a ''weight'') for each object O_i, and assign the object to the highest scoring site. All clients first agree on a hash function h(\cdot). For object O_i, the site S_j is defined to have weight w_{ij} = h(O_i, S_j). Each client independently computes these weights w_{i1}, w_{i2}, \dots, w_{in} and picks the ''k'' sites that yield the ''k'' largest hash values. The clients have thereby achieved distributed k-agreement. If a site S is added or removed, only the objects mapping to S are remapped to different sites, satisfying the minimal disruption constraint above. The HRW assignment can be computed independently by any client, since it depends only on the identifiers for the set of sites S_1, S_2, \dots, S_n and the object being assigned.

HRW easily accommodates different capacities among sites. If site S_k has twice the capacity of the other sites, we simply represent S_k twice in the list, say, as S_{k,1} and S_{k,2}. Clearly, twice as many objects will now map to S_k as to the other sites.
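To make the procedure concrete, here is a minimal Python sketch of basic HRW. SHA-256 over the concatenated identifiers stands in for the agreed hash function h(\cdot) (the original paper recommends a particular hash function; the site and object names here are hypothetical):

import hashlib

def weight(object_id: str, site_id: str) -> int:
    # w_ij = h(O_i, S_j): hash the object and site identifiers together.
    # SHA-256 is an illustrative stand-in for the agreed hash function h.
    digest = hashlib.sha256(f"{object_id}:{site_id}".encode()).digest()
    return int.from_bytes(digest, "big")

def top_k_sites(object_id: str, sites: list, k: int = 1) -> list:
    # Every client, given the same site list and object, computes the same
    # weights and therefore independently picks the same k sites.
    return sorted(sites, key=lambda s: weight(object_id, s), reverse=True)[:k]

print(top_k_sites("object-42", ["site-A", "site-B", "site-C", "site-D"], k=2))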


Properties

Consider the simple version of the problem, with ''k'' = 1, where all clients are to agree on a single site for an object ''O''. Approaching the problem naively, it might appear sufficient to treat the ''n'' sites as buckets in a hash table and hash the object name ''O'' into this table. Unfortunately, if any of the sites fails or is unreachable, the hash table size changes, forcing all objects to be remapped. This massive disruption makes such direct hashing unworkable. Under rendezvous hashing, however, clients handle site failures by picking the site that yields the next largest weight. Remapping is required only for objects currently mapped to the failed site, and disruption is minimal.

Rendezvous hashing has the following properties:
# Low overhead: The hash function used is efficient, so overhead at the clients is very low.
# Load balancing: Since the hash function is randomizing, each of the ''n'' sites is equally likely to receive the object ''O''. Loads are uniform across the sites.
## Site capacity: Sites with different capacities can be represented in the site list with multiplicity in proportion to capacity. A site with twice the capacity of the other sites will be represented twice in the list, while every other site is represented once.
# High hit rate: Since all clients agree on placing an object ''O'' into the same site ''S_O'', each fetch or placement of ''O'' into ''S_O'' yields the maximum utility in terms of hit rate. The object ''O'' will always be found unless it is evicted by some replacement algorithm at ''S_O''.
# Minimal disruption: When a site fails, only the objects mapped to that site need to be remapped. Disruption is at the minimal possible level, as proved in the original paper (see the sketch following this list).
# Distributed ''k''-agreement: Clients can reach distributed agreement on ''k'' sites simply by selecting the top ''k'' sites in the ordering.
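The minimal disruption property can be checked directly. In the sketch below (again with SHA-256 standing in for the hash function, and hypothetical site and object names), removing one site remaps only the objects that were assigned to it:

import hashlib

def hrw_site(obj: str, sites: list) -> str:
    # Pick the site with the highest hash weight for this object (k = 1).
    return max(sites, key=lambda s: hashlib.sha256(f"{obj}:{s}".encode()).digest())

sites = ["site-A", "site-B", "site-C", "site-D"]
objects = [f"object-{i}" for i in range(1000)]

before = {o: hrw_site(o, sites) for o in objects}
survivors = [s for s in sites if s != "site-B"]       # site-B fails
after = {o: hrw_site(o, survivors) for o in objects}

moved = [o for o in objects if before[o] != after[o]]
assert all(before[o] == "site-B" for o in moved)      # only site-B's objects move
print(f"{len(moved)} of {len(objects)} objects remapped")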


Comparison with Consistent Hashing

Because of its simplicity, lower overhead, and generality (it works for any ''k'' < n), rendezvous hashing is increasingly being preferred over consistent hashing. Recent examples of its use include the GitHub load balancer, the Apache Ignite distributed database, and the Twitter EventBus pub/sub platform.

Consistent hashing operates by mapping sites uniformly and randomly to points on a unit circle called tokens. Objects are also mapped to the unit circle and placed in the site whose token is the first encountered traveling clockwise from the object's location. When a site is removed, the objects it owns are transferred to the site owning the next token encountered moving clockwise. Provided each site is mapped to a large number (100–200, say) of tokens, this will reassign objects in a relatively uniform fashion among the remaining sites.

If sites are mapped to points on the circle randomly by hashing 200 variants of the site ID, say, the assignment of any object requires storing or recalculating 200 hash values for each site. However, the tokens associated with a given site can be precomputed and stored in a sorted list, requiring only a single application of the hash function to the object, and a binary search to compute the assignment (see the sketch below). Even with many tokens per site, however, the basic version of consistent hashing may not balance objects uniformly over sites, since when a site is removed each object assigned to it is distributed only over as many other sites as the site has tokens (say 100–200).

Variants of consistent hashing (such as Amazon's Dynamo) that use more complex logic to distribute tokens on the unit circle offer better load balancing than basic consistent hashing, reduce the overhead of adding new sites, and reduce metadata overhead, among other benefits.
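For concreteness, here is a minimal sketch of the token-ring lookup just described, with MD5-derived token positions, precomputed sorted tokens, and a binary search per lookup (illustrative only; production systems use more elaborate token placement):

import bisect
import hashlib

def ring_position(s: str) -> float:
    # Map a string to a point on the unit circle [0, 1).
    return int.from_bytes(hashlib.md5(s.encode()).digest(), "big") / 2**128

class ConsistentHashRing:
    def __init__(self, sites: list, tokens_per_site: int = 200):
        # Precompute every site's tokens once and keep them sorted.
        self._ring = sorted(
            (ring_position(f"{site}#{i}"), site)
            for site in sites
            for i in range(tokens_per_site))

    def lookup(self, obj: str) -> str:
        # One hash of the object, then a binary search for the next
        # token clockwise (wrapping around the circle).
        i = bisect.bisect(self._ring, (ring_position(obj), "")) % len(self._ring)
        return self._ring[i][1]

ring = ConsistentHashRing(["site-A", "site-B", "site-C"])
print(ring.lookup("object-42"))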


Advantages of Rendezvous Hashing Over Consistent Hashing

Rendezvous hashing (HRW) is much simpler conceptually and in practice. It also distributes objects uniformly over all sites, given a uniform hash function. Unlike consistent hashing, HRW requires no precomputing or storage of tokens. Consider ''k'' = 1. An object O_i is placed into one of n sites S_1, S_2, \dots, S_n by computing the n hash values h(O_i, S_j) and picking the site S_k that yields the highest hash value. If a new site S_{n+1} is added, new object placements or requests will compute n + 1 hash values, and pick the largest of these. If an object already in the system at S_k maps to this new site S_{n+1}, it will be fetched afresh and cached at S_{n+1}. All clients will henceforth obtain it from this site, and the old cached copy at S_k will ultimately be replaced by the local cache management algorithm. If S_k is taken offline, its objects will be remapped uniformly to the remaining n - 1 sites.

Variants of the HRW algorithm, such as the use of a skeleton (see below), can reduce the O(n) time for object location to O(\log n), at the cost of less global uniformity of placement. When n is not too large, however, the O(n) placement cost of basic HRW is not likely to be a problem. HRW completely avoids all the overhead and complexity associated with correctly handling multiple tokens for each site and associated metadata. Rendezvous hashing also has the great advantage that it provides simple solutions to other important problems, such as distributed k-agreement.


Consistent hashing is a special case of Rendezvous hashing

Rendezvous hashing is both simpler and more general than consistent hashing. Consistent hashing can be shown to be a special case of HRW by an appropriate choice of a two-place hash function. From the site identifier S, the simplest version of consistent hashing computes a list of token positions, e.g., t_S = h_c(S), where h_c hashes values to locations on the unit circle. Define the two-place hash function h(S, O) to be \frac{1}{d(h_c(O), t_S)}, where d(h_c(O), t_S) denotes the clockwise distance along the unit circle from h_c(O) to t_S (since d(h_c(O), t_S) has some minimal non-zero value, there is no problem translating this value to a unique integer in some bounded range). This will duplicate exactly the assignment produced by consistent hashing.

It is not possible, however, to reduce HRW to consistent hashing (assuming the number of tokens per site is bounded), since HRW potentially reassigns the objects from a removed site to an unbounded number of other sites.
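The construction can be sketched in a few lines, assuming one token per site, SHA-256 as an illustrative h_c, and the clockwise distance taken mod 1:

import hashlib

def h_c(s: str) -> float:
    # Map a string to a point on the unit circle [0, 1).
    return int.from_bytes(hashlib.sha256(s.encode()).digest(), "big") / 2**256

def h(site: str, obj: str) -> float:
    # Two-place HRW hash replicating consistent hashing: the reciprocal of
    # the clockwise distance from h_c(O) to the site's token t_S = h_c(S).
    distance = (h_c(site) - h_c(obj)) % 1.0
    return 1.0 / distance if distance > 0 else float("inf")

sites = ["site-A", "site-B", "site-C"]
# The highest HRW weight belongs to the first token clockwise from the
# object, i.e., the same site a one-token-per-site ring would choose.
print(max(sites, key=lambda s: h(s, "object-42")))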


Weighted variations

In the standard implementation of rendezvous hashing, every node receives a statically equal proportion of the keys. This behavior, however, is undesirable when the nodes have different capacities for processing or holding their assigned keys. For example, if one of the nodes had twice the storage capacity of the others, it would be beneficial if the algorithm could take this into account such that this more powerful node would receive twice the number of keys as each of the others. A straightforward mechanism to handle this case is to assign two virtual locations to this node, so that if either of that larger node's virtual locations has the highest hash, that node receives the key (see the sketch below). But this strategy does not work when the relative weights are not integer multiples. For example, if one node had 42% more storage capacity, it would require adding many virtual nodes in different proportions, leading to greatly reduced performance. Several modifications to rendezvous hashing have been proposed to overcome this limitation.
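A minimal sketch of the virtual-locations mechanism for integer weights (hypothetical names; the logarithmic method that handles arbitrary weights appears under Implementation below):

import hashlib

def score(virtual_node: str, key: str) -> bytes:
    return hashlib.sha256(f"{virtual_node}:{key}".encode()).digest()

def responsible_node(weights: dict, key: str) -> str:
    # Each node competes once per unit of integer weight, so a node with
    # weight 2 holds two virtual locations and wins about twice as often.
    best_node, best_score = None, b""
    for node, weight in weights.items():
        for replica in range(weight):                 # virtual locations
            s = score(f"{node}#{replica}", key)
            if s > best_score:
                best_node, best_score = node, s
    return best_node

print(responsible_node({"node-A": 1, "node-B": 2, "node-C": 1}, "some-key"))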


Cache Array Routing Protocol

The Cache Array Routing Protocol (CARP) is a 1998 IETF draft that describes a method for computing ''load factors'', which can be multiplied by each node's hash score to yield an arbitrary level of precision for weighting nodes differently (see the sketch below). However, one disadvantage of this approach is that when any node's weight is changed, or when any node is added or removed, all the load factors must be re-computed and relatively scaled. When the load factors change relative to one another, it triggers movement of keys between nodes whose weight was not changed, but whose load factor did change relative to other nodes in the system. This results in excess movement of keys.
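Schematically, the lookup multiplies a precomputed per-node load factor into the hash score; this is an illustration of the idea, not the exact load-factor computation given in the CARP draft:

import hashlib

def hash_score(node: str, url: str) -> float:
    # Combined hash of node and URL, scaled to (0, 1].
    h = hashlib.sha256(f"{node}|{url}".encode()).digest()
    return (int.from_bytes(h, "big") + 1) / 2**256

def carp_style_lookup(load_factors: dict, url: str) -> str:
    # load_factors are precomputed from the node weights; changing any
    # weight forces all of them to be recomputed and rescaled, which is
    # the source of the excess key movement described above.
    return max(load_factors, key=lambda n: load_factors[n] * hash_score(n, url))

# Hypothetical precomputed load factors for three proxies.
print(carp_style_lookup({"proxy-1": 1.0, "proxy-2": 1.26, "proxy-3": 1.4},
                        "http://example.com/x"))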


Controlled replication

Controlled replication under scalable hashing or CRUSH is an extension to RUSH that improves upon rendezvous hashing by constructing a tree where a pseudo-random function (hash) is used to navigate down the tree to find which node is ultimately responsible for a given key. It permits perfect stability when adding nodes; however, it is not perfectly stable when removing or re-weighting nodes, with the excess movement of keys being proportional to the height of the tree. The CRUSH algorithm is used by the Ceph data storage system to map data objects to the nodes responsible for storing them.


Skeleton-based variant

When n is extremely large, a skeleton-based variant can improve running time. This approach creates a virtual hierarchical structure (called a "skeleton"), and achieves O(\log n) running time by applying HRW at each level while descending the hierarchy. The idea is to first choose some constant m and organize the n sites into c = \lceil n/m \rceil clusters C_1 = \{S_1, \dots, S_m\}, C_2 = \{S_{m+1}, \dots, S_{2m}\}, \dots. Next, build a virtual hierarchy by choosing a constant f and imagining these c clusters placed at the leaves of a tree T of virtual nodes, each with fanout f.

For example, take cluster size m = 4 and skeleton fanout f = 3. Assuming 108 sites (real nodes) for convenience, we get c = 27 clusters and a three-tier virtual hierarchy. Since f = 3, each virtual node has a natural numbering in ternary (base 3). Thus, the 27 virtual nodes at the lowest tier would be numbered 000, 001, 002, \dots, 221, 222 in ternary (we can, of course, vary the fanout at each level; in that case, each node will be identified with the corresponding mixed-radix number).

Instead of applying HRW to all 108 real nodes, we can first apply HRW to the 27 lowest-tier virtual nodes, selecting one. We then apply HRW to the four real nodes in its cluster, and choose the winning site. We only need 27 + 4 = 31 hashes, rather than 108. If we apply this method starting one level higher in the hierarchy, we would need 9 + 3 + 4 = 16 hashes to get to the winning site. For instance, proceeding from the root of the skeleton, we might successively choose the virtual nodes (2)_3, (20)_3, and (200)_3, and finally end up with site 74. (A code sketch of this descent follows below.)

We can start at any level in the virtual hierarchy, not just at the root. Starting lower in the hierarchy requires more hashes, but may improve load distribution in the case of failures. Also, the virtual hierarchy need not be stored, but can be created on demand, since the virtual node names are simply prefixes of base-f (or mixed-radix) representations. We can easily create appropriately sorted strings from the digits, as required. In the example, we would be working with the strings 0, 1, 2 (at tier 1), 20, 21, 22 (at tier 2), and 200, 201, 202 (at tier 3).

Clearly, T has height h = O(\log c) = O(\log n), since m and f are both constants. The work done at each level is O(1), since f is a constant. For any given object, it is clear that the method chooses each cluster, and hence each of the n sites, with equal probability. If the site finally selected is unavailable, we can select a different site within the same cluster, in the usual manner. Alternatively, we could go up one or more tiers in the skeleton and select an alternate from among the sibling virtual nodes at that tier, and once again descend the hierarchy to the real nodes, as above.

The value of m can be chosen based on factors like the anticipated failure rate and the degree of desired load balancing. A higher value of m leads to less load skew in the event of failure, at the cost of higher search overhead. The choice m = n is equivalent to non-hierarchical rendezvous hashing. In practice, the hash function h(\cdot) is very cheap, so m = n can work quite well unless n is very high.
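A compact sketch of the descent for the example above (108 sites, m = 4, f = 3; names and hash choice are illustrative), applying HRW among the f children at each virtual level and then among the m real nodes of the chosen cluster:

import hashlib

M, F, TIERS = 4, 3, 3                      # cluster size, fanout, virtual tiers
SITES = [f"site-{i}" for i in range(108)]  # 108 real nodes -> 27 clusters

def w(obj: str, name: str) -> bytes:
    return hashlib.sha256(f"{obj}|{name}".encode()).digest()

def skeleton_lookup(obj: str) -> str:
    prefix = ""                            # virtual node names are base-3 prefixes
    for _ in range(TIERS):                 # descend from the root: 3 + 3 + 3 hashes
        prefix = max((prefix + d for d in "012"), key=lambda v: w(obj, v))
    cluster = int(prefix, F)               # leaf name -> cluster index
    members = SITES[cluster * M:(cluster + 1) * M]
    return max(members, key=lambda s: w(obj, s))   # final HRW among 4 real nodes

print(skeleton_lookup("object-42"))        # 3*3 + 4 = 13 hashes instead of 108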


Other variants

In 2005, Christian Schindelhauer and Gunnar Schomaker described a logarithmic method for re-weighting hash scores in a way that does not require relative scaling of load factors when a node's weight changes or when nodes are added or removed. This enabled the dual benefits of perfect precision when weighting nodes, along with perfect stability, as only a minimum number of keys needed to be remapped to new nodes. A similar logarithm-based hashing strategy is used to assign data to storage nodes in Cleversafe's data storage system, now IBM Cloud Object Storage.


Systems Using Rendezvous Hashing

Rendezvous hashing is used widely in real-world systems. A partial list includes Oracle Database In-Memory, the GitHub load balancer, the Apache Ignite distributed database, the Tahoe-LAFS file store, the CoBlitz large-file distribution service, Apache Druid, IBM's Cloud Object Store, the Arvados Data Management System, Apache Kafka, and the Twitter EventBus pub/sub platform.


Implementation

Implementation is straightforward once a hash function h(\cdot) is chosen (the original work on the HRW method makes a hash function recommendation). Each client only needs to compute a hash value for each of the n sites, and then pick the largest. This algorithm runs in O(n) time. If the hash function is efficient, the O(n) running time is not a problem unless n is very large.


Weighted rendezvous hash

Python code implementing a weighted rendezvous hash:

import mmh3
import math
from dataclasses import dataclass
from typing import List


def hash_to_unit_interval(s: str) -> float:
    """Hashes a string onto the unit interval (0, 1]"""
    return (mmh3.hash128(s) + 1) / 2**128


@dataclass
class Node:
    """Class representing a node that is assigned keys as part of a
    weighted rendezvous hash."""
    name: str
    weight: float

    def compute_weighted_score(self, key: str):
        score = hash_to_unit_interval(f"{self.name}: {key}")
        log_score = 1.0 / -math.log(score)
        return self.weight * log_score


def determine_responsible_node(nodes: list[Node], key: str):
    """Determines which node of a set of nodes of various weights is
    responsible for the provided key."""
    return max(
        nodes, key=lambda node: node.compute_weighted_score(key), default=None)
Example outputs of WRH:

>>> import wrh
>>> node1 = wrh.Node("node1", 100)
>>> node2 = wrh.Node("node2", 200)
>>> node3 = wrh.Node("node3", 300)
>>> str(wrh.determine_responsible_node([node1, node2, node3], "foo"))
"Node(name='node1', weight=100)"
>>> str(wrh.determine_responsible_node([node1, node2, node3], "bar"))
"Node(name='node2', weight=200)"
>>> str(wrh.determine_responsible_node([node1, node2, node3], "hello"))
"Node(name='node2', weight=200)"
>>> nodes = [node1, node2, node3]
>>> from collections import Counter
>>> responsible_nodes = [wrh.determine_responsible_node(
...     nodes, f"key: {key}").name for key in range(45_000)]
>>> print(Counter(responsible_nodes))
Counter()

(The counts in the final Counter output are omitted in this copy of the transcript; with 45,000 keys they fall roughly in proportion to the node weights 100 : 200 : 300.)




External links


Rendezvous Hashing: an alternative to Consistent Hashing