Grid file system

A grid file system is a computer file system whose goal is improved reliability and availability by taking advantage of many smaller file storage areas.


Components

File systems contain up to three components:
* File table (FAT table, MFT, etc.)
* File data
* Metadata (user permissions, etc.)

A grid file system would have similar needs:
* File table (or search index)
* File data
* Metadata


Comparisons

Because conventional file systems are designed to appear as a single disk managed entirely by a single computer, many new challenges arise in a grid scenario, in which any single disk within the grid should be capable of handling requests for any data contained in the grid.


Features

Most file storage uses layers of redundancy to achieve a high level of data protection (resistance to data loss). Current means of redundancy include replication and parity checks. Such redundancy can be implemented via a RAID array (whereby multiple physical disks appear to a local computer as a single disk, which may include data replication and/or disk partitioning). Similarly, a grid file system would provide some level of redundancy (either at the logical file level or at the block level, possibly including some sort of parity check) across the various disks present in the "grid".
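
As a rough illustration of the parity idea, the sketch below keeps one XOR parity block alongside equal-sized data blocks, each imagined to live on a different disk in the grid, so that any single lost block can be rebuilt. The function names and the single-parity scheme are assumptions made for illustration; they are not prescribed by any particular grid file system.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of equal-length blocks."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("all blocks must have the same length")
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks, parity):
    """Rebuild the single missing data block from the survivors and the parity block."""
    return xor_blocks(list(surviving_blocks) + [parity])


# Three data blocks, each imagined to live on a different disk in the grid.
data = [b"grid", b"file", b"syst"]
parity = xor_blocks(data)

# Simulate losing the disk that held the second block, then recover it.
recovered = rebuild_missing([data[0], data[2]], parity)
```

A single parity block only tolerates the loss of one block per stripe; surviving several simultaneous losses would need replication or a stronger erasure code.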


Framework

First and foremost, a file table mechanism is necessary, and it must include a way of locating the target (destination) file within the grid. Second, a mechanism for working with file data must exist. This mechanism is responsible for making file data available to requests.
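
The two mechanisms can be sketched in a few lines. In this minimal example, `FileTable`, `DataNode`, and `read_file` are hypothetical names: the table only records which grid nodes hold a path, and the data nodes only store and serve contents. A real system would add networking, consistency, and failure handling.

```python
from dataclasses import dataclass, field


@dataclass
class FileTable:
    """Maps a logical path to the grid nodes that hold a copy of the file."""
    locations: dict = field(default_factory=dict)

    def register(self, path, node_id):
        self.locations.setdefault(path, set()).add(node_id)

    def locate(self, path):
        return sorted(self.locations.get(path, ()))


@dataclass
class DataNode:
    """Holds file contents and serves them on request."""
    node_id: str
    files: dict = field(default_factory=dict)

    def store(self, path, content):
        self.files[path] = content

    def read(self, path):
        return self.files[path]


def read_file(table, nodes, path):
    """Resolve the path through the file table, then fetch from the first holder that has it."""
    for node_id in table.locate(path):
        node = nodes.get(node_id)
        if node is not None and path in node.files:
            return node.read(path)
    raise FileNotFoundError(path)


table = FileTable()
nodes = {"node-a": DataNode("node-a"), "node-b": DataNode("node-b")}
nodes["node-b"].store("/docs/report.txt", b"quarterly numbers")
table.register("/docs/report.txt", "node-b")
content = read_file(table, nodes, "/docs/report.txt")
```

Keeping lookup and storage in separate types mirrors the split described above: either role could be scaled or replaced without changing the other.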


Implementation

With BitTorrent technology, a parallel can be drawn to a grid file system: a torrent tracker (and search engine) would be the "file table", and the torrent applications (transmitting the files) would be the "file data" component. An RSS-feed-like mechanism could be used by file table nodes to indicate when new files are added to the table, to trigger replication and other similar processes. A file system may incorporate similar technology (distributed replication, distributed data request/fulfillment). If both such systems (file table and file data) were capable of being addressed as a single entity (i.e. using virtual nodes in a cluster), then growth of such a system could be controlled simply by deciding which roles each grid member would be responsible for (file table and file lookups, and/or file data).
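
The feed-like notification idea can be sketched as an append-only list that replica nodes poll. This is an in-memory stand-in, not an actual RSS implementation; `FileTableFeed` and `ReplicaNode` are hypothetical names, and the sequence-number scheme is an assumption chosen to keep the example small.

```python
class FileTableFeed:
    """Append-only feed of file-table additions, numbered by sequence."""

    def __init__(self):
        self._entries = []

    def publish(self, path):
        self._entries.append(path)
        return len(self._entries)  # sequence number of the new entry

    def since(self, last_seen):
        """Return (sequence, path) pairs added after last_seen."""
        return list(enumerate(self._entries[last_seen:], start=last_seen + 1))


class ReplicaNode:
    """Polls the feed and records which files it should replicate."""

    def __init__(self):
        self.last_seen = 0
        self.to_replicate = []

    def poll(self, feed):
        for sequence, path in feed.since(self.last_seen):
            self.to_replicate.append(path)
            self.last_seen = sequence


feed = FileTableFeed()
feed.publish("/a.txt")
feed.publish("/b.txt")

replica = ReplicaNode()
replica.poll(feed)  # picks up both existing entries
feed.publish("/c.txt")
replica.poll(feed)  # picks up only the new entry
```

Because each replica remembers the last sequence number it saw, it can be offline for a while and still catch up on the next poll.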


Availability

Assuming there exists some method of managing data replication (assigning quotas, etc.) autonomously within the grid, data could be configured for high availability, regardless of the loss or outage of individual grid members.
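
One way such autonomous management might work is to compare each file's surviving copies against a target replication factor and plan new copies when a member drops out. The sketch below is a simplified assumption, not a description of any existing system; `plan_re_replication` and its parameters are hypothetical, and it ignores quotas and node capacity.

```python
def plan_re_replication(placements, live_nodes, replication_factor):
    """Return {path: [nodes to copy to]} for files that have fallen below the target.

    placements maps each path to the set of nodes holding a copy.
    Files with no surviving copy are skipped, since there is nothing left to copy from.
    """
    plan = {}
    for path, holders in sorted(placements.items()):
        alive = holders & live_nodes
        missing = replication_factor - len(alive)
        if alive and missing > 0:
            candidates = sorted(live_nodes - alive)
            plan[path] = candidates[:missing]
    return plan


placements = {
    "/a.txt": {"n1", "n2"},
    "/b.txt": {"n2", "n3"},
}
live_nodes = {"n1", "n3", "n4"}  # n2 has gone offline
plan = plan_re_replication(placements, live_nodes, replication_factor=2)
```

Here each file lost one of its two copies when `n2` went offline, so the plan assigns one new holder per file from the remaining live nodes.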


Challenges

The largest problem currently revolves around distributing data updates. Torrents support minimal hierarchy (currently implemented either as metadata in the torrent tracker, or strictly as UI and basic categorization). Updating multiple nodes concurrently (assuming atomic transactions are required) introduces latency during updates and additions, usually to the point of not being feasible. Additionally, a grid (network-based) file system breaks traditional TCP/IP paradigms in that a file system (generally low-level, ring 0 operations) requires complicated TCP/IP implementations, introducing layers of abstraction and complication to the process of creating such a grid file system.


Examples

Examples of highly available data include:
* Network load balancing / CARP – splitting incoming requests to multiple computers, usually configured identically or as one whole.
* Shared storage clustering / SANs – a single disk (one or more physical disks acting as a single logical disk) is presented to multiple computers which split incoming requests. This is usually used when more computing power is required than disk access.
* Data replication / mirroring – multiple computers may attempt to synchronize data (usually point-in-time or snapshot based). Used more often for either reporting (based on the last snapshot) or backup purposes.
* Data partitioning – splitting data among multiple computers. In databases, data is often partitioned based on tables (certain tables exist on certain computers, or a table is split among multiple computers at certain "break points"). General files tend to be partitioned either by category (category-based folders) or by location (geographically separated).
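
The "break points" form of data partitioning listed above can be sketched as a range lookup. The break points, node names, and the assumption of lowercase string keys are all hypothetical and chosen only to keep the example short.

```python
from bisect import bisect_right

# Hypothetical break points: keys below "h" go to node-0,
# keys from "h" up to (but not including) "q" go to node-1,
# and everything from "q" onward goes to node-2.
BREAK_POINTS = ["h", "q"]
NODES = ["node-0", "node-1", "node-2"]


def node_for_key(key):
    """Return the node responsible for a lowercase string key."""
    return NODES[bisect_right(BREAK_POINTS, key)]


owners = {name: node_for_key(name) for name in ["alice", "henry", "zoe"]}
```

Range partitioning keeps neighbouring keys on the same node, which suits range scans, but the break points need rebalancing if the key distribution becomes skewed.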


See also

* Distributed computing
* XtreemFS
* Tahoe-LAFS
* List of volunteer computing projects
* InterPlanetary File System
* Grid-oriented storage
* Grid computing
* GNUnet
* GlusterFS
* Distributed hash table
* Cooperative storage cloud
* Clustered file system
* CephFS

