GlusterFS

Gluster Inc. (formerly known as Z RESEARCH) was a software company that provided an open source platform for scale-out public and private cloud storage. The company was privately funded and headquartered in Sunnyvale, California, with an engineering center in Bangalore, India. Gluster was funded by Nexus Venture Partners and Index Ventures. Gluster was acquired by Red Hat on October 7, 2011.


History

The name ''Gluster'' comes from the combination of the terms ''GNU'' and ''cluster''. Despite the similarity in names, Gluster is not related to the Lustre file system and does not incorporate any Lustre code. Gluster based its product on ''GlusterFS'', an open-source software-based network-attached filesystem that deploys on commodity hardware. The initial version of GlusterFS was written by Anand Babu Periasamy, Gluster's founder and CTO. In May 2010 Ben Golub became the president and chief executive officer.

Red Hat became the primary author and maintainer of the GlusterFS open-source project after acquiring the Gluster company in October 2011. The product was first marketed as Red Hat Storage Server, but in early 2015 it was renamed Red Hat Gluster Storage, since Red Hat had also acquired the Ceph file system technology. Red Hat Gluster Storage is in the retirement phase of its lifecycle, with an end-of-life date of December 31, 2024.


Architecture

The GlusterFS architecture aggregates compute, storage, and I/O resources into a global namespace. Each server plus attached commodity storage (configured as direct-attached storage, JBOD, or using a storage area network) is considered to be a node. Capacity is scaled by adding nodes or by adding storage to each node. Performance is increased by deploying storage among more nodes. High availability is achieved by replicating data n-way between nodes.
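
As a rough illustration of the scaling rules above, the following Python sketch estimates usable capacity for a replicated deployment. It is a back-of-the-envelope model and not part of GlusterFS: the function name `usable_capacity_tb`, the one-brick-per-node layout, and the requirement that the node count be a multiple of the replica count are assumptions made for this example.

```python
def usable_capacity_tb(nodes: int, tb_per_node: float, replicas: int) -> float:
    """Estimate usable capacity when every file is stored `replicas` times.

    Assumptions (not taken from the article): each node contributes one
    brick of `tb_per_node` terabytes, and nodes are grouped into replica
    sets of exactly `replicas` members.
    """
    if nodes <= 0 or tb_per_node <= 0 or replicas <= 0:
        raise ValueError("nodes, tb_per_node and replicas must be positive")
    if nodes % replicas != 0:
        raise ValueError("node count must be a multiple of the replica count")
    # Raw capacity is divided by the number of copies kept of each file.
    return nodes * tb_per_node / replicas


if __name__ == "__main__":
    # Six 10 TB nodes with 3-way replication.
    print(usable_capacity_tb(6, 10, 3))  # 20.0
    # Adding three more nodes (one more replica set) grows capacity.
    print(usable_capacity_tb(9, 10, 3))  # 30.0
```

The model shows the trade-off described in the text: adding nodes increases capacity, while a higher replica count buys availability at the cost of usable space.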


Public cloud deployment

For public cloud deployments, GlusterFS offers an Amazon Web Services (AWS) Amazon Machine Image (AMI), which is deployed on Elastic Compute Cloud (EC2) instances rather than physical servers; the underlying storage is Amazon's Elastic Block Store (EBS). In this environment, capacity is scaled by deploying more EBS storage units, performance is scaled by deploying more EC2 instances, and availability is scaled by n-way replication between AWS availability zones.


Private cloud deployment

A typical on-premises, or private cloud, deployment will consist of GlusterFS installed as a virtual appliance on top of multiple commodity servers running hypervisors such as KVM, Xen, or VMware; or on bare metal.


GlusterFS

GlusterFS is a scale-out network-attached storage file system. It has found applications including cloud computing, streaming media services, and content delivery networks. GlusterFS was developed originally by Gluster, Inc. and then by Red Hat, Inc., as a result of Red Hat acquiring Gluster in 2011. In June 2012, Red Hat Storage Server was announced as a commercially supported integration of GlusterFS with Red Hat Enterprise Linux. In April 2014, Red Hat bought Inktank Storage, the company behind the Ceph distributed file system, and re-branded the GlusterFS-based Red Hat Storage Server as "Red Hat Gluster Storage".


Design

GlusterFS aggregates various storage servers over Ethernet or InfiniBand RDMA interconnect into one large parallel network file system. It is free software, with some parts licensed under the GNU General Public License (GPL) v3 while others are dual licensed under either GPL v2 or the Lesser General Public License (LGPL) v3. GlusterFS is based on a stackable user space design.

GlusterFS has a client and server component. Servers are typically deployed as ''storage bricks'', with each server running a daemon to export a local file system as a ''volume''. The client process, which connects to servers with a custom protocol over TCP/IP, InfiniBand or Sockets Direct Protocol, creates composite virtual volumes from multiple remote servers using stackable ''translators''. By default, files are stored whole, but striping of files across multiple remote volumes is also possible. The client may mount the composite volume using a GlusterFS native protocol via the FUSE mechanism, mount it using the NFS v3 protocol through a built-in server translator, or access the volume via the client library. The client may re-export a native-protocol mount, for example via the kernel NFSv4 server, Samba, or the object-based OpenStack Storage (Swift) protocol using the "UFO" (Unified File and Object) translator.

Most of the functionality of GlusterFS is implemented as translators, including file-based mirroring and replication, file-based striping, file-based load balancing, volume failover, scheduling and disk caching, storage quotas, and volume snapshots with user serviceability (since GlusterFS version 3.6).

The GlusterFS server is intentionally kept simple: it exports an existing directory as-is, leaving it up to client-side translators to structure the store. The clients themselves are stateless, do not communicate with each other, and are expected to have translator configurations consistent with each other. GlusterFS relies on an elastic hashing algorithm, rather than using either a centralized or distributed metadata model. The user can add, delete, or migrate volumes dynamically, which helps to avoid configuration coherency problems. This allows GlusterFS to scale up to several petabytes on commodity hardware by avoiding bottlenecks that normally affect more tightly coupled distributed file systems.

GlusterFS provides data reliability and availability through various kinds of replication: replicated volumes and geo-replication. Replicated volumes ensure that at least one copy of each file exists across the bricks, so if one fails, data is still stored and accessible. Geo-replication provides a master-slave model of replication, where volumes are copied across geographically distinct locations. This happens asynchronously and is useful for availability in case of a whole data center failure.

GlusterFS has been used as the foundation for academic research and a survey article. Red Hat markets the software for three markets: "on-premises", "public cloud" and "private cloud".
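
The idea of locating files by hashing, with no metadata server to consult, can be sketched in a few lines of Python. This is a simplified illustration and not the GlusterFS implementation: `zlib.crc32` stands in for the real hash function, the hash space is split into equal ranges, and the names `build_layout`, `brick_for_hash`, `locate`, and the brick labels are hypothetical.

```python
import zlib
from bisect import bisect_right

HASH_SPACE = 2**32  # crc32 yields values in [0, 2**32)


def build_layout(bricks):
    """Split the hash space into equal contiguous ranges, one per brick.

    Returns (starts, bricks), where starts[i] is the first hash value
    owned by bricks[i]. The last brick also owns any remainder.
    """
    bricks = list(bricks)
    if not bricks:
        raise ValueError("at least one brick is required")
    step = HASH_SPACE // len(bricks)
    starts = [i * step for i in range(len(bricks))]
    return starts, bricks


def brick_for_hash(hash_value, layout):
    """Return the brick whose range contains hash_value."""
    starts, bricks = layout
    # bisect_right finds the first range start greater than hash_value;
    # the owning range is the one just before it.
    return bricks[bisect_right(starts, hash_value) - 1]


def locate(filename, layout):
    """Map a file name to a brick using only the name and the layout."""
    # crc32 is an illustrative stand-in, not the hash GlusterFS uses.
    return brick_for_hash(zlib.crc32(filename.encode("utf-8")), layout)


if __name__ == "__main__":
    layout = build_layout(["server1:/brick", "server2:/brick", "server3:/brick"])
    for name in ("report.pdf", "video.mp4", "notes.txt"):
        print(name, "->", locate(name, layout))
```

Because the result depends only on the file name and the shared layout, any client holding the same configuration computes the same location independently, which is consistent with the stateless, non-communicating clients described above.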


See also

* BeeGFS
* Ceph (software)
* Distributed file system
* Distributed parallel fault-tolerant file systems
* Gfarm file system
* IBM Spectrum Scale (GPFS)
* LizardFS
* Lustre
* MapR FS
* Moose File System
* OrangeFS
* Parallel Virtual File System
* Quantcast File System
* RozoFS
* XtreemFS
* ZFS


External links

* Official website: https://www.gluster.org/