Merkel Tree
   HOME

TheInfoList



OR:

In
cryptography Cryptography, or cryptology (from grc, , translit=kryptós "hidden, secret"; and ''graphein'', "to write", or ''-logia'', "study", respectively), is the practice and study of techniques for secure communication in the presence of adver ...
and
computer science Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to Applied science, practical discipli ...
, a hash tree or Merkle tree is a
tree In botany, a tree is a perennial plant with an elongated stem, or trunk, usually supporting branches and leaves. In some usages, the definition of a tree may be narrower, including only woody plants with secondary growth, plants that are ...
in which every "leaf" (
node In general, a node is a localized swelling (a "knot") or a point of intersection (a vertex). Node may refer to: In mathematics *Vertex (graph theory), a vertex in a mathematical graph *Vertex (geometry), a point where two or more curves, lines, ...
) is labelled with the
cryptographic hash A cryptographic hash function (CHF) is a hash algorithm (a map of an arbitrary binary string to a binary string with fixed size of n bits) that has special properties desirable for cryptography: * the probability of a particular n-bit output re ...
of a data block, and every node that is not a leaf (called a ''branch'', ''inner node'', or ''inode'') is labelled with the cryptographic hash of the labels of its child nodes. A hash tree allows efficient and secure verification of the contents of a large
data structure In computer science, a data structure is a data organization, management, and storage format that is usually chosen for efficient access to data. More precisely, a data structure is a collection of data values, the relationships among them, a ...
. A hash tree is a generalization of a
hash list In computer science, a hash list is typically a list of hashes of the data blocks in a file or set of files. Lists of hashes are used for many different purposes, such as fast table lookup (hash tables) and distributed databases (distributed has ...
and a
hash chain A hash chain is the successive application of a cryptographic hash function to a piece of data. In computer security, a hash chain is a method to produce many one-time keys from a single key or password. For non-repudiation a hash function can be ...
. Demonstrating that a leaf node is a part of a given binary hash tree requires computing a number of hashes proportional to the
logarithm In mathematics, the logarithm is the inverse function to exponentiation. That means the logarithm of a number  to the base  is the exponent to which must be raised, to produce . For example, since , the ''logarithm base'' 10 o ...
of the number of leaf nodes in the tree. Conversely, in a hash list, the number is proportional to the number of leaf nodes itself. A Merkle tree is therefore an efficient example of a cryptographic commitment scheme, in which the root of the tree is seen as a commitment and leaf nodes may be revealed and proven to be part of the original commitment. The concept of a hash tree is named after
Ralph Merkle Ralph C. Merkle (born February 2, 1952) is a computer scientist and mathematician. He is one of the inventors of public-key cryptography, the inventor of cryptographic hashing, and more recently a researcher and speaker on cryonics. Contribution ...
, who patented it in 1979.


Uses

Hash trees can be used to verify any kind of data stored, handled and transferred in and between computers. They can help ensure that data blocks received from other peers in a
peer-to-peer network Peer-to-peer (P2P) computing or networking is a distributed application architecture that partitions tasks or workloads between peers. Peers are equally privileged, equipotent participants in the network. They are said to form a peer-to-peer n ...
are received undamaged and unaltered, and even to check that the other peers do not lie and send fake blocks. Hash trees are used in
hash-based cryptography Hash-based cryptography is the generic term for constructions of cryptographic primitives based on the security of hash functions. It is of interest as a type of post-quantum cryptography. So far, hash-based cryptography is used to construct digit ...
. Hash trees are also used in the
InterPlanetary File System The InterPlanetary File System (IPFS) is a protocol, hypermedia and file sharing peer-to-peer network for storing and sharing data in a distributed file system. IPFS uses content-addressing to uniquely identify each file in a global namespace ...
(IPFS),
Btrfs Btrfs (pronounced as "better F S", "butter F S", "b-tree F S", or simply by spelling it out) is a computer storage format that combines a file system based on the copy-on-write (COW) principle with a logical volume manager (not to be confused ...
and
ZFS ZFS (previously: Zettabyte File System) is a file system with volume management capabilities. It began as part of the Sun Microsystems Solaris operating system in 2001. Large parts of Solaris – including ZFS – were published under an ope ...
file systems (to counter
data degradation Data degradation is the gradual corruption of computer data due to an accumulation of non-critical failures in a data storage device. The phenomenon is also known as data decay, data rot or bit rot. Example Below are several digital images ill ...
); Dat protocol;
Apache Wave Google Wave, later known as Apache Wave, was a software framework for real-time collaborative editing online. Originally developed by Google and announced on May 28, 2009, it was renamed to ''Apache Wave'' when the project was adopted by the Ap ...
protocol;
Git Git () is a distributed version control system: tracking changes in any set of files, usually used for coordinating work among programmers collaboratively developing source code during software development. Its goals include speed, data in ...
and
Mercurial Mercurial is a distributed revision control tool for software developers. It is supported on Microsoft Windows and Unix-like systems, such as FreeBSD, macOS, and Linux. Mercurial's major design goals include high performance and scalability, d ...
distributed revision control systems; the
Tahoe-LAFS Tahoe-LAFS (Tahoe Least-Authority File Store) is a free and open, secure, decentralized, fault-tolerant, distributed data store and distributed file system. It can be used as an online backup system, or to serve as a file or Web host similar to ...
backup system;
Zeronet ZeroNet is a decentralized web-like network of peer-to-peer users, created by Tamas Kocsis in 2015, programming for the network was based in Budapest, Hungary; is built in Python; and is fully open source. Instead of having an IP address, sites ...
; the
Bitcoin Bitcoin ( abbreviation: BTC; sign: ₿) is a decentralized digital currency that can be transferred on the peer-to-peer bitcoin network. Bitcoin transactions are verified by network nodes through cryptography and recorded in a public distr ...
and
Ethereum Ethereum is a decentralized, open-source blockchain with smart contract functionality. Ether (Abbreviation: ETH; sign: Ξ) is the native cryptocurrency of the platform. Among cryptocurrencies, ether is second only to bitcoin in market capita ...
peer-to-peer networks; the
Certificate Transparency Certificate Transparency (CT) is an Internet security standard for monitoring and auditing the issuance of digital certificates. The standard creates a system of public logs that seek to eventually record all certificates issued by publicly trus ...
framework; the
Nix package manager Nix is a cross-platform package manager that utilizes a purely functional deployment model where software is installed into unique directories generated through cryptographic hashes. It is also the name of the tool's programming language. A pac ...
and descendants like
GNU Guix GNU Guix () is a functional cross-platform package manager and a tool to instantiate and manage Unix-like operating systems, based on the Nix package manager. Configuration and package recipes are written in Guile Scheme. GNU Guix is the default ...
; and a number of
NoSQL A NoSQL (originally referring to "non- SQL" or "non-relational") database provides a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases. Such databases have existed ...
systems such as
Apache Cassandra Cassandra is a free and open-source, distributed, wide-column store, NoSQL database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. Cassandr ...
,
Riak Riak (pronounced "ree-ack" ) is a distributed NoSQL key-value data store based on Amazon's Dynamo paper, including its "tunable AP" approach, that is tunable consistency, to the tradeoffs imposed by the CAP Theorem. Riak offers high availability, ...
, and
Dynamo file:DynamoElectricMachinesEndViewPartlySection USP284110.png, "Dynamo Electric Machine" (end view, partly section, ) A dynamo is an electrical generator that creates direct current using a commutator (electric), commutator. Dynamos were the f ...
. Suggestions have been made to use hash trees in
trusted computing Trusted Computing (TC) is a technology developed and promoted by the Trusted Computing Group. The term is taken from the field of trusted systems and has a specialized meaning that is distinct from the field of Confidential Computing. The core ide ...
systems. The initial Bitcoin implementation of Merkle trees by
Satoshi Nakamoto Satoshi Nakamoto is the name used by the presumed pseudonymous person or persons who developed bitcoin, authored the bitcoin white paper, and created and deployed bitcoin's original reference implementation. As part of the implementation, Nakam ...
applies the compression step of the hash function to an excessive degree, which is mitigated by using Fast Merkle Trees.


Overview

A hash tree is a
tree In botany, a tree is a perennial plant with an elongated stem, or trunk, usually supporting branches and leaves. In some usages, the definition of a tree may be narrower, including only woody plants with secondary growth, plants that are ...
of hashes in which the leaves (i.e., leaf nodes, sometimes also called "leafs") are hashes of data blocks in, for instance, a file or set of files. Nodes farther up in the tree are the hashes of their respective children. For example, in the above picture ''hash 0'' is the result of hashing the
concatenation In formal language, formal language theory and computer programming, string concatenation is the operation of joining character string (computer science), character strings wikt:end-to-end, end-to-end. For example, the concatenation of "sno ...
of ''hash 0-0'' and ''hash 0-1''. That is, ''hash 0'' = ''hash''( ''hash 0-0'' + ''hash 0-1'' ) where "+" denotes concatenation. Most hash tree implementations are binary (two child nodes under each node) but they can just as well use many more child nodes under each node. Usually, a
cryptographic hash function A cryptographic hash function (CHF) is a hash algorithm (a map of an arbitrary binary string to a binary string with fixed size of n bits) that has special properties desirable for cryptography: * the probability of a particular n-bit output re ...
such as
SHA-2 SHA-2 (Secure Hash Algorithm 2) is a set of cryptographic hash functions designed by the United States National Security Agency (NSA) and first published in 2001. They are built using the Merkle–Damgård construction, from a one-way compression ...
is used for the hashing. If the hash tree only needs to protect against unintentional damage, unsecured
checksum A checksum is a small-sized block of data derived from another block of digital data for the purpose of detecting errors that may have been introduced during its transmission or storage. By themselves, checksums are often used to verify data ...
s such as CRCs can be used. In the top of a hash tree there is a ''top hash'' (or ''root hash'' or ''master hash''). Before downloading a file on a
P2P network Peer-to-peer (P2P) computing or networking is a distributed application architecture that partitions tasks or workloads between peers. Peers are equally privileged, equipotent participants in the network. They are said to form a peer-to-peer n ...
, in most cases the top hash is acquired from a trusted source, for instance a friend or a web site that is known to have good recommendations of files to download. When the top hash is available, the hash tree can be received from any non-trusted source, like any peer in the P2P network. Then, the received hash tree is checked against the trusted top hash, and if the hash tree is damaged or fake, another hash tree from another source will be tried until the program finds one that matches the top hash. The main difference from a
hash list In computer science, a hash list is typically a list of hashes of the data blocks in a file or set of files. Lists of hashes are used for many different purposes, such as fast table lookup (hash tables) and distributed databases (distributed has ...
is that one branch of the hash tree can be downloaded at a time and the integrity of each branch can be checked immediately, even though the whole tree is not available yet. For example, in the picture, the integrity of ''data block L2'' can be verified immediately if the tree already contains ''hash 0-0'' and ''hash 1'' by hashing the data block and iteratively combining the result with ''hash 0-0'' and then ''hash 1'' and finally comparing the result with the ''top hash''. Similarly, the integrity of ''data block L3'' can be verified if the tree already has ''hash 1-1'' and ''hash 0''. This can be an advantage since it is efficient to split files up in very small data blocks so that only small blocks have to be re-downloaded if they get damaged. If the hashed file is big, such a hash list or hash chain becomes fairly big. But if it is a tree, one small branch can be downloaded quickly, the integrity of the branch can be checked, and then the downloading of data blocks can start.


Second preimage attack

The Merkle hash root does not indicate the tree depth, enabling a
second-preimage attack In cryptography, a preimage attack on cryptographic hash functions tries to find a message that has a specific hash value. A cryptographic hash function should resist attacks on its preimage (set of possible inputs). In the context of attack, th ...
in which an attacker creates a document other than the original that has the same Merkle hash root. For the example above, an attacker can create a new document containing two data blocks, where the first is ''hash 0-0'' + ''hash 0-1'', and the second is ''hash 1-0'' + ''hash 1-1''. One simple fix is defined in
Certificate Transparency Certificate Transparency (CT) is an Internet security standard for monitoring and auditing the issuance of digital certificates. The standard creates a system of public logs that seek to eventually record all certificates issued by publicly trus ...
: when computing leaf node hashes, a 0x00 byte is prepended to the hash data, while 0x01 is prepended when computing internal node hashes. Limiting the hash tree size is a prerequisite of some formal security proofs, and helps in making some proofs tighter. Some implementations limit the tree depth using hash tree depth prefixes before hashes, so any extracted hash chain is defined to be valid only if the prefix decreases at each step and is still positive when the leaf is reached.


Tiger tree hash

The Tiger tree hash is a widely used form of hash tree. It uses a binary hash tree (two child nodes under each node), usually has a data block size of 1024
byte The byte is a unit of digital information that most commonly consists of eight bits. Historically, the byte was the number of bits used to encode a single character of text in a computer and for this reason it is the smallest addressable unit ...
s and uses the Tiger hash. Tiger tree hashes are used in
Gnutella Gnutella is a peer-to-peer network protocol. Founded in 2000, it was the first decentralized peer-to-peer network of its kind, leading to other, later networks adopting the model. In June 2005, Gnutella's population was 1.81 million computer ...
,
Gnutella2 Gnutella2, often referred to as G2, is a peer-to-peer protocol developed mainly by Michael Stokes and released in 2002. While inspired by the gnutella protocol, G2 shares little of its design with the exception of its connection handshake and ...
, and Direct Connect
P2P P2P may refer to: * Pay to play, where money is exchanged for services * Peer-to-peer, a distributed application architecture in computing or networking ** List of P2P protocols * Phenylacetone, an organic compound commonly known as P2P * Poin ...
file sharing protocols and in
file sharing File sharing is the practice of distributing or providing access to digital media, such as computer programs, multimedia (audio, images and video), documents or electronic books. Common methods of storage, transmission and dispersion include r ...
applications such as
Phex Phex is a peer-to-peer file sharing client for the gnutella network, released under the terms of the GNU General Public License, so Phex is free software. Phex is based on Java SE 5.0 or later. Features Phex supports most of the recent feature ...
,
BearShare BearShare was a peer-to-peer-file-sharing-application originally created by Free Peers, Inc. for Microsoft Windows and also a rebranded version of iMesh by MusicLab, LLC, tightly integrated with their music subscription service. History The pr ...
,
LimeWire LimeWire was a free software, free peer-to-peer file sharing client for Microsoft Windows, Windows, MacOS, Linux and Solaris OS, Solaris. Created by Mark Gorton in 2000, it was most prominently a tool used for the download and distribution of O ...
,
Shareaza Shareaza is a peer-to-peer file sharing client running under Microsoft Windows which supports the gnutella, Gnutella2 (G2), eDonkey, BitTorrent, FTP, HTTP and HTTPS network protocols and handles magnet links, ed2k links, and the now deprecat ...
,
DC++ DC++ is a free and open-source, peer-to-peer file-sharing client that can be used for connecting to the Direct Connect network or to the ADC protocol. It is developed primarily by Jacek Sieka, nicknamed arnetheduck. History and background DC+ ...
and
gtk-gnutella gtk-gnutella is a peer-to-peer file sharing application which runs on the gnutella network. gtk-gnutella uses the GTK+ toolkit for its graphical user interface. Released under the GNU General Public License, gtk-gnutella is free software. Histo ...
.


Example

Base32 Base32 is the base-32 numeral system. It uses a set of 32 digits, each of which can be represented by 5 bits (25). One way to represent Base32 numbers in a human-readable way is by using a standard 32-character set, such as the twenty-two upper- ...
: R5T6Y8UYRYO5SUXGER5NMUOEZ5O6E4BHPP2MRFQ
URN An urn is a vase, often with a cover, with a typically narrowed neck above a rounded body and a footed pedestal. Describing a vessel as an "urn", as opposed to a vase or other terms, generally reflects its use rather than any particular shape or ...
: urn:tree:tiger:R5T6Y8UYRYO5SUXGER5NMUOEZ5O6E4BHPP2MRFQ
magnet A magnet is a material or object that produces a magnetic field. This magnetic field is invisible but is responsible for the most notable property of a magnet: a force that pulls on other ferromagnetic materials, such as iron, steel, nickel, ...
: magnet:?xt=urn:tree:tiger:R5T6Y8UYRYO5SUXGER5NMUOEZ5O6E4BHPP2MRFQ


See also

*
Binary tree In computer science, a binary tree is a k-ary k = 2 tree data structure in which each node has at most two children, which are referred to as the ' and the '. A recursive definition using just set theory notions is that a (non-empty) binary t ...
*
Blockchain A blockchain is a type of distributed ledger technology (DLT) that consists of growing lists of records, called ''blocks'', that are securely linked together using cryptography. Each block contains a cryptographic hash of the previous block, a ...
*
Distributed hash table A distributed hash table (DHT) is a distributed system that provides a lookup service similar to a hash table: key–value pairs are stored in a DHT, and any participating node can efficiently retrieve the value associated with a given key. The m ...
*
Hash table In computing, a hash table, also known as hash map, is a data structure that implements an associative array or dictionary. It is an abstract data type that maps keys to values. A hash table uses a hash function to compute an ''index'', als ...
*
Hash trie In computer science, hash trie can refer to: * Hash tree (persistent data structure), a trie used to map hash values to keys * A space-efficient implementation of a sparse trie, in which the descendants of each node may be interleaved in memory. ( ...
*
Linked timestamping Linked timestamping is a type of trusted timestamping where issued time-stamps are related to each other. Description Linked timestamping creates time-stamp tokens which are dependent on each other, entangled in some authenticated data structure. ...
*
Radix tree In computer science, a radix tree (also radix trie or compact prefix tree or compressed trie) is a data structure that represents a space-optimized trie (prefix tree) in which each node that is the only child is merged with its parent. The resul ...


References


Further reading

* explains both the hash tree structure and the use of it to handle many one-time signatures
Tree Hash EXchange format (THEX)
a detailed description of Tiger trees


External links


A C implementation of a dynamically re-sizeable binary SHA-256 hash tree (Merkle tree)

Merkle tree implementation in Java

Tiger Tree Hash (TTH) source code in C#
by Gil Schmidt
Tiger Tree Hash (TTH) implementations in C and Java

RHash
an open source command-line tool, which can calculate TTH and magnet links with TTH {{CS-Trees Error detection and correction Cryptographic hash functions Hashing Trees (data structures)