Venti

Venti is a network storage system that permanently stores data blocks. A 160-bit SHA-1 hash of the data (called ''score'' by Venti) acts as the address of the data. This enforces a ''write-once'' policy since no other data block can be found with the same address: the addresses of multiple writes of the same data are identical, so duplicate data is easily identified and the data block is stored only once. Data blocks cannot be removed, which makes Venti well suited to permanent or backup storage. Venti is typically used with Fossil to provide a file system with permanent snapshots.
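
The addressing scheme described above can be illustrated with a small in-memory sketch. This is not Venti's implementation or API: the class name `ScoreStore` and its methods are hypothetical, and the block ''type'' that Venti stores alongside each block is left out for brevity.

```python
import hashlib


class ScoreStore:
    """Toy content-addressed store (hypothetical; not Venti's real API)."""

    def __init__(self):
        # Maps a score (20-byte SHA-1 digest) to the block it addresses.
        self._blocks = {}

    def write(self, data: bytes) -> bytes:
        """Store a block and return its score."""
        score = hashlib.sha1(data).digest()
        # Write-once: if the score already exists, the same data is
        # already stored, so nothing is overwritten or duplicated.
        self._blocks.setdefault(score, data)
        return score

    def read(self, score: bytes) -> bytes:
        """Return the block for a score, checking it against the score."""
        data = self._blocks[score]
        # A reader can verify the answer by hashing it again.
        if hashlib.sha1(data).digest() != score:
            raise ValueError("returned data does not match the requested score")
        return data

    def __len__(self) -> int:
        return len(self._blocks)
```

Writing the same bytes twice returns the same score and leaves a single stored block, which is the deduplication behaviour the paragraph describes. There is deliberately no method to delete or modify a block.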


History

Venti was designed and implemented by Sean Quinlan and Sean Dorward at Bell Labs. It appeared in the Plan 9 distribution in 2002. Development has been continued by Russ Cox, who has reimplemented most of the server, written a library for creating data structures (files, directories and metadata) to store in Venti, and implemented optimizations. Venti is available both in the Plan 9 distribution and for many UNIX-like operating systems as part of Plan 9 from User Space. Venti is also included in Inferno, with accompanying modules for access. There is also a set of Go programs for building your own Venti servers, including examples that use different kinds of backend storage.


Details

Venti is a user space daemon (Lukkien, Mechiel. Venti Analysis and Memventi Implementation. Thesis. University of Twente, 2007). Clients connect to Venti over TCP and communicate using a simple RPC protocol. The most important messages of the protocol are listed below. Note that there is no message to delete an address or modify data at a given address.

* ''read(score, type)'', returns the data identified by ''score'' and ''type''
* ''write(data, type)'', stores ''data'' at the address calculated by SHA-1 hashing ''data'', combined with ''type''.

The data block stored by Venti must be greater than 512 bytes in length and smaller than 56 kilobytes. A client that wants to store larger data therefore has to build a data structure out of blocks, which can itself be stored in Venti. For example, Fossil uses hash trees to store large files. Venti itself is not concerned with the contents of a data block; it does, however, store the ''type'' of a data block.

The design of Venti has the following consequences:

* Since writes are permanent, the file system is append-only (which allows for a simple implementation with lower chance of data-destroying bugs); no file system fragmentation occurs.
* Clients can verify the correctness of the server: the score of the returned data should be the same as the address requested. Returning different data for a requested score would require a second preimage for SHA-1, which is still considered computationally infeasible even though SHA-1 collisions can now be produced (see Hash collisions).
* Data cannot be overwritten. If an ''address'' is already present, the ''data'' is already present.
* There is little need for user authentication: data cannot be deleted, and can be read only if the score is known. The only potential problem is a user filling up the disks.
* Data can be compressed without making the disk structure complicated.

The data blocks are stored on hard drives. The disks making up the available storage, typically a RAID, are called the ''data log''. This data log is split into smaller pieces called ''arenas'', which are sized so they can be written to other media such as CD/DVD or magnetic tape. Another set of hard drives is used for the index, which maps scores to addresses in the data log. The data structure used for the index is a hash table with fixed-size buckets. Venti relies on the scores being randomly distributed so that buckets do not fill up. Since each lookup costs one disk seek, an index usually consists of multiple hard drives with low access time.
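
The index described above can be sketched as a hash table whose bucket is chosen directly from the score. This is a simplified illustration and not Venti's on-disk format: the names `ScoreIndex` and `BucketFull`, the bucket count, and the bucket capacity are all assumptions made for the example.

```python
import hashlib

BUCKET_COUNT = 1024   # hypothetical number of buckets
BUCKET_CAPACITY = 8   # hypothetical fixed number of entries per bucket


class BucketFull(Exception):
    """Raised when a fixed-size bucket has no room left."""


class ScoreIndex:
    """Toy score -> data-log address index with fixed-size buckets."""

    def __init__(self, bucket_count=BUCKET_COUNT, capacity=BUCKET_CAPACITY):
        self.capacity = capacity
        self.buckets = [[] for _ in range(bucket_count)]

    def _bucket(self, score: bytes) -> list:
        # Scores are SHA-1 output, so their leading bytes are already
        # close to uniformly distributed; no second hash is needed.
        n = int.from_bytes(score[:4], "big")
        return self.buckets[n % len(self.buckets)]

    def insert(self, score: bytes, address: int) -> None:
        bucket = self._bucket(score)
        for known, _ in bucket:
            if known == score:
                return  # already indexed; the data is already stored
        if len(bucket) >= self.capacity:
            raise BucketFull("bucket overflow; the index is too small")
        bucket.append((score, address))

    def lookup(self, score: bytes):
        # One bucket is examined per lookup, which corresponds to the
        # single disk seek mentioned in the text.
        for known, address in self._bucket(score):
            if known == score:
                return address
        return None
```

Because entries spread evenly across buckets, an index sized with enough headroom rarely overflows; a skewed key distribution would break that assumption.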


Usage

The Venti server may be used by clients in several ways. The Plan 9 operating system uses Venti for daily archival snapshots of the file system. These copies of the main file system can be mounted as a file tree of full copies organized by date. The utility programs 'vac' and 'unvac' store and retrieve data on a Venti server, either as individual files or as a directory and its contents. 'Vacfs' allows browsing of the data associated with a vac score without retrieving all of the remotely stored data. Data and index scores can be duplicated between Venti servers using 'rdarena' and 'wrarena'.
Plan 9 from Bell Labs, Plan 9 from User Space, Inferno and any other clients that implement the Venti protocol can all be used interchangeably to store and retrieve data.
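
A single score can stand for a whole file because a client can store the file as a tree of blocks and keep only the root score. The following sketch shows the general idea. It is not the format used by vac or Fossil: the function names, the block size, and the use of a plain dictionary as the store are assumptions, and Venti's block size limits and block types are ignored.

```python
import hashlib

BLOCK_SIZE = 8 * 1024  # assumed leaf size for this sketch
SCORE_SIZE = 20        # length of a SHA-1 digest in bytes


def put(store: dict, block: bytes) -> bytes:
    """Store one block under its SHA-1 score and return the score."""
    score = hashlib.sha1(block).digest()
    store.setdefault(score, block)
    return score


def write_tree(store: dict, data: bytes, block_size: int = BLOCK_SIZE):
    """Store data as a hash tree; return (root_score, depth)."""
    fanout = block_size // SCORE_SIZE  # scores that fit in a pointer block
    # Leaf level: split the data into fixed-size blocks.
    scores = [put(store, data[i:i + block_size])
              for i in range(0, len(data), block_size)] or [put(store, b"")]
    depth = 0
    # Pointer levels: pack child scores into blocks until one root remains.
    while len(scores) > 1:
        groups = [scores[i:i + fanout] for i in range(0, len(scores), fanout)]
        scores = [put(store, b"".join(group)) for group in groups]
        depth += 1
    return scores[0], depth


def read_tree(store: dict, root: bytes, depth: int) -> bytes:
    """Rebuild the original data from a root score and tree depth."""
    block = store[root]
    if depth == 0:
        return block
    children = [block[i:i + SCORE_SIZE]
                for i in range(0, len(block), SCORE_SIZE)]
    return b"".join(read_tree(store, child, depth - 1) for child in children)
```

Storing the same file twice yields the same root score, and identical blocks inside or across files are stored only once, which is what makes repeated snapshots of a mostly unchanged file system cheap.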


Hash collisions

The pigeonhole principle states that if set A contains more values than set B, then any function that maps A to B must send at least two members of A to the same member of B. In the case of Venti, the set of possible SHA-1 hashes is smaller than the set of all possible blocks that could be stored in the file system, and thus a hash collision is possible. The risk of accidental hash collision in a 160-bit hash is very small, even for exabytes of data. Historically, however, many hash functions have become increasingly vulnerable to malicious hash collisions due to both cryptographic and computational advances ("Hash Collision Q&A." Cryptography Research. Rambus. Web. 12 Jan. 2010). Venti does not address the issue of hash collisions. Finding collisions in SHA-1 was long considered computationally infeasible, but on 23 February 2017 Google announced the SHAttered attack, in which they generated two different PDF files with the same SHA-1 hash in roughly 2^63.1 SHA-1 evaluations. It may therefore become necessary for Venti to switch to a different hash function.
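
The claim about accidental collisions can be checked with a short calculation. The sketch below uses the standard union bound: among n blocks with uniformly random b-bit hashes, the probability of any collision is at most n(n-1)/2 divided by 2^b. The 8 KB block size and the binary definition of an exabyte (2^60 bytes) are assumptions chosen for the example, and the bound says nothing about deliberately constructed collisions.

```python
from fractions import Fraction


def collision_bound(num_blocks: int, hash_bits: int = 160) -> Fraction:
    """Upper bound on the chance of any accidental collision.

    Assumes hash values are uniformly random and independent.
    """
    pairs = num_blocks * (num_blocks - 1) // 2  # number of block pairs
    return Fraction(pairs, 2 ** hash_bits)


EXABYTE = 2 ** 60      # assumed binary exabyte, in bytes
BLOCK = 8 * 1024       # assumed block size, in bytes

n = EXABYTE // BLOCK   # 2^47 blocks
p = collision_bound(n)
print(f"blocks: 2^{n.bit_length() - 1}, collision probability <= {float(p):.2e}")
```

Under these assumptions the bound is about 2^-67, far below the failure rates of the hardware storing the data.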


See also

* Fossil - snapshot file system that uses Venti for permanent storage
* Plan 9 from User Space




External links


* Venti: a new approach to archival storage - paper describing Venti.
* New Venti manual page (overview) - section 7 venti manual page including general description and storage format.
* New Venti manual page (server) - section 8 venti server manual page.
* New Venti manual page (tools) - section 1 venti utilities manual page.
* Go code for implementing clients and servers


kindly brought to life thanks to the Google Summer of Code.