P-Grid
   HOME

TheInfoList



OR:

In
distributed data storage A distributed data store is a computer network where information is stored on more than one node, often in a replicated fashion. It is usually specifically used to refer to either a distributed database where users store information on a ''nu ...
, a P-Grid is a self-organizing structured
peer-to-peer Peer-to-peer (P2P) computing or networking is a distributed application architecture that partitions tasks or workloads between peers. Peers are equally privileged, equipotent participants in the network. They are said to form a peer-to-peer ...
system, which can accommodate arbitrary key distributions (and hence support lexicographic key ordering and range queries), still providing storage load-balancing and efficient search by using randomized routing.


Salient features

* Good storage load-balancing despite arbitrary load-distribution over the key-space. * Range queries can be naturally supported and efficiently processed on P-Grid because P-Grid abstracts a trie-structure, and supports (rather) arbitrary distribution of keys, as observed in realistic scenarios. * A Self-referential directory is realized to provide peer identity persistence over multiple sessions. * A gossip primitive based update mechanism for keeping replicated content up-to-date. * Easy merger of multiple P-Grids, and hence decentralized bootstrapping of the P-Grid network. * Query-adaptive caching is easy to realize on P-Grid to provide query load-balancing where peers have restricted capacity.


Overview

P-Grid abstracts a
trie In computer science, a trie, also called digital tree or prefix tree, is a type of ''k''-ary search tree, a tree data structure used for locating specific keys from within a set. These keys are most often strings, with links between nodes ...
and resolves queries based on prefix matching. The actual topology has no hierarchy. Queries are resolved by matching prefixes. This also determines the choice of routing table entries. Each peer, for each level of the trie, maintains autonomously routing entries chosen randomly from the complementary sub-trees. In fact, multiple entries are maintained for each level at each peer to provide fault-tolerance (as well as potentially for query-load management). For diverse reasons including fault-tolerance and load-balancing, multiple peers are responsible for each leaf node in the P-Grid tree. These are called replicas. The replica peers maintain an independent replica sub-network and uses gossip based communication to keep the replica group up-to-date. The redundancy in both the replication of key-space partitions as well as the routing network together is called structural replication. The figure above shows how a query is resolved by forwarding it based on prefix matching.


Range queries in P-Grid

P-Grid partitions the key-space in a granularity adaptive to the load at that part of the key-space. Consequently, its possible to realize a P-Grid overlay network where each peer has similar storage load even for non-uniform load distributions. This network probably provides as efficient search of keys as traditional
distributed hash table A distributed hash table (DHT) is a distributed system that provides a lookup service similar to a hash table: key–value pairs are stored in a DHT, and any participating node can efficiently retrieve the value associated with a given key. The ...
s (DHTs) do. Note that in contrast to P-Grid, DHTs work efficiently only for uniform load-distributions. Hence we can use a lexicographic order preserving function to generate the keys, and still realize a load-balanced P-Grid network which supports efficient search of exact keys. Moreover, because of the preservation of lexicographic ordering, range queries can be done efficiently and precisely on P-Grid. The trie-structure of P-Grid allows different range query strategies, processed serially or in parallel, trading off message overheads and query resolution latency. Simple vector-based data storage architectural frameworks are also subject to variable query limitations within the P-Grid environment.


References

{{reflist


External links


manfredhauswirth.org
File sharing Peer-to-peer file sharing Distributed data storage