Hash List
   HOME

TheInfoList



OR:

In
computer science Computer science is the study of computation, information, and automation. Computer science spans Theoretical computer science, theoretical disciplines (such as algorithms, theory of computation, and information theory) to Applied science, ...
, a hash list is typically a
list A list is a Set (mathematics), set of discrete items of information collected and set forth in some format for utility, entertainment, or other purposes. A list may be memorialized in any number of ways, including existing only in the mind of t ...
of hashes of the data blocks in a file or set of files. Lists of hashes are used for many different purposes, such as fast table lookup (
hash table In computer science, a hash table is a data structure that implements an associative array, also called a dictionary or simply map; an associative array is an abstract data type that maps Unique key, keys to Value (computer science), values. ...
s) and distributed databases (
distributed hash table A distributed hash table (DHT) is a Distributed computing, distributed system that provides a lookup service similar to a hash table. Key–value pairs are stored in a DHT, and any participating node (networking), node can efficiently retrieve the ...
s). A hash list is an extension of the concept of hashing an item (for instance, a file). A hash list is a
subtree In computer science, a tree is a widely used abstract data type that represents a hierarchical tree structure with a set of connected Node (computer science), nodes. Each node in the tree can be connected to many children (depending on the type of ...
of a Merkle tree.


Root hash

Often, an additional hash of the hash list itself (a ''top hash'', also called ''root hash'' or ''master hash'') is used. Before downloading a file on a
p2p network Peer-to-peer (P2P) computing or networking is a distributed application architecture that partitions tasks or workloads between peers. Peers are equally privileged, equipotent participants in the network, forming a peer-to-peer network of Node ...
, in most cases the top hash is acquired from a trusted source, for instance a friend or a web site that is known to have good recommendations of files to download. When the top hash is available, the hash list can be received from any non-trusted source, like any peer in the p2p network. Then the received hash list is checked against the trusted top hash, and if the hash list is damaged or fake, another hash list from another source will be tried until the program finds one that matches the top hash. In some systems (for example,
BitTorrent BitTorrent is a Protocol (computing), communication protocol for peer-to-peer file sharing (P2P), which enables users to distribute data and electronic files over the Internet in a Decentralised system, decentralized manner. The protocol is d ...
), instead of a top hash the whole hash list is available on a web site in a small file. Such a "
torrent file In the BitTorrent file distribution system, a torrent file or meta-info file is a computer file that contains metadata about files and folders to be distributed, and usually also a list of the network locations of trackers, which are computers ...
" contains a description, file names, a hash list and some additional data.


Applications

Hash lists can be used to protect any kind of data stored, handled and transferred in and between computers. An important use of hash lists is to make sure that data blocks received from other peers in a
peer-to-peer network Peer-to-peer (P2P) computing or networking is a distributed application Distributed computing is a field of computer science that studies distributed systems, defined as computer systems whose inter-communicating components are located on d ...
are received undamaged and unaltered, and to check that the other peers do not "lie" and send fake blocks. Usually a
cryptographic hash function A cryptographic hash function (CHF) is a hash algorithm (a map (mathematics), map of an arbitrary binary string to a binary string with a fixed size of n bits) that has special properties desirable for a cryptography, cryptographic application: ...
such as
SHA-256 SHA-2 (Secure Hash Algorithm 2) is a set of cryptographic hash functions designed by the United States National Security Agency (NSA) and first published in 2001. They are built using the Merkle–Damgård construction, from a one-way compressi ...
is used for the hashing. If the hash list only needs to protect against unintentional damage unsecured
checksum A checksum is a small-sized block of data derived from another block of digital data for the purpose of detecting errors that may have been introduced during its transmission or storage. By themselves, checksums are often used to verify dat ...
s such as CRCs can be used. Hash lists are better than a simple hash of the entire file since, in the case of a data block being damaged, this is noticed, and only the damaged block needs to be redownloaded. With only a hash of the file, many undamaged blocks would have to be redownloaded, and the file reconstructed and tested until the correct hash of the entire file is obtained. Hash lists also protect against nodes that try to sabotage by sending fake blocks, since in such a case the damaged block can be acquired from some other source. Hash lists are used to identify CSAM online.


Protocols using hash lists

*
Rsync rsync (remote sync) is a utility for transferring and synchronizing files between a computer and a storage drive and across networked computers by comparing the modification times and sizes of files. It is commonly found on Unix-like opera ...
* Zsync *
Bittorrent BitTorrent is a Protocol (computing), communication protocol for peer-to-peer file sharing (P2P), which enables users to distribute data and electronic files over the Internet in a Decentralised system, decentralized manner. The protocol is d ...


See also

* Hash tree *
Hash table In computer science, a hash table is a data structure that implements an associative array, also called a dictionary or simply map; an associative array is an abstract data type that maps Unique key, keys to Value (computer science), values. ...
*
Hash chain A hash chain is the successive application of a cryptographic hash function to a piece of data. In computer security, a hash chain is a method used to produce many one-time keys from a single key or password. For non-repudiation, a hash functi ...
* Ed2k: URI scheme, which uses an MD4 top hash of an MD4 hash list to uniquely identify a file *
Cryptographic hash function A cryptographic hash function (CHF) is a hash algorithm (a map (mathematics), map of an arbitrary binary string to a binary string with a fixed size of n bits) that has special properties desirable for a cryptography, cryptographic application: ...
*
List A list is a Set (mathematics), set of discrete items of information collected and set forth in some format for utility, entertainment, or other purposes. A list may be memorialized in any number of ways, including existing only in the mind of t ...


References

{{Cryptography navbox Error detection and correction Cryptographic hash functions Hashing