IBM Spectrum Scale
   HOME

TheInfoList



OR:

GPFS (General Parallel File System, brand name IBM Spectrum Scale) is high-performance clustered file system software developed by IBM. It can be deployed in
shared-disk In computing, a shared resource, or network share, is a computer resource made available from one host to other hosts on a computer network. It is a device or piece of information on a computer that can be remotely accessed from another compu ...
or
shared-nothing A shared-nothing architecture (SN) is a distributed computing architecture in which each update request is satisfied by a single node (processor/memory/storage unit) in a computer cluster. The intent is to eliminate contention among nodes. Nodes do ...
distributed parallel modes, or a combination of these. It is used by many of the world's largest commercial companies, as well as some of the supercomputers on the Top 500 List. For example, it is the filesystem of the Summit at
Oak Ridge National Laboratory Oak Ridge National Laboratory (ORNL) is a U.S. multiprogram science and technology national laboratory sponsored by the U.S. Department of Energy (DOE) and administered, managed, and operated by UT–Battelle as a federally funded research an ...
which was the #1 fastest supercomputer in the world in the November 2019 TOP500 list of supercomputers. Summit is a 200
Petaflops In computing, floating point operations per second (FLOPS, flops or flop/s) is a measure of computer performance, useful in fields of scientific computations that require floating-point calculations. For such cases, it is a more accurate meas ...
system composed of more than 9,000
POWER9 POWER9 is a family of superscalar, multithreading, multi-core microprocessors produced by IBM, based on the Power ISA. It was announced in August 2016. The POWER9-based processors are being manufactured using a 14 nm FinFET process, in ...
processors and 27,000 NVIDIA Volta
GPU A graphics processing unit (GPU) is a specialized electronic circuit designed to manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display device. GPUs are used in embedded systems, mobi ...
s. The storage filesystem called Alpine has 250 PB of storage using Spectrum Scale on IBM ESS storage hardware, capable of approximately 2.5TB/s of sequential I/O and 2.2TB/s of random I/O. Like typical cluster filesystems, GPFS provides concurrent high-speed file access to applications executing on multiple nodes of clusters. It can be used with
AIX Aix or AIX may refer to: Computing * AIX, a line of IBM computer operating systems *An Alternate Index, for a Virtual Storage Access Method Key Sequenced Data Set * Athens Internet Exchange, a European Internet exchange point Places Belgi ...
clusters,
Linux Linux ( or ) is a family of open-source Unix-like operating systems based on the Linux kernel, an operating system kernel first released on September 17, 1991, by Linus Torvalds. Linux is typically packaged as a Linux distribution, w ...
clusters, on Microsoft
Windows Server Windows Server (formerly Windows NT Server) is a group of operating systems (OS) for servers that Microsoft has been developing since July 27, 1993. The first OS that was released for this platform was Windows NT 3.1 Advanced Server. With the r ...
, or a heterogeneous cluster of AIX, Linux and Windows nodes running on
x86 x86 (also known as 80x86 or the 8086 family) is a family of complex instruction set computer (CISC) instruction set architectures initially developed by Intel based on the Intel 8086 microprocessor and its 8088 variant. The 8086 was intr ...
,
Power Power most often refers to: * Power (physics), meaning "rate of doing work" ** Engine power, the power put out by an engine ** Electric power * Power (social and political), the ability to influence people or events ** Abusive power Power may a ...
or
IBM Z IBM Z is a family name used by IBM for all of its z/Architecture mainframe computers. In July 2017, with another generation of products, the official family was changed to IBM Z from IBM z Systems; the IBM Z family now includes the newest mod ...
processor architectures. In addition to providing filesystem storage capabilities, it provides tools for management and administration of the GPFS cluster and allows for shared access to file systems from remote clusters.


History

GPFS began as the ''Tiger Shark'' file system, a research project at IBM's
Almaden Research Center IBM Research is the research and development division for IBM, an American multinational information technology company headquartered in Armonk, New York, with operations in over 170 countries. IBM Research is the largest industrial research org ...
as early as 1993. Tiger Shark was initially designed to support high throughput multimedia applications. This design turned out to be well suited to scientific computing. Another ancestor is IBM's ''Vesta'' filesystem, developed as a research project at IBM's
Thomas J. Watson Research Center The Thomas J. Watson Research Center is the headquarters for IBM Research. The center comprises three sites, with its main laboratory in Yorktown Heights, New York, U.S., 38 miles (61 km) north of New York City, Albany, New York and wit ...
between 1992 and 1995. Vesta introduced the concept of file partitioning to accommodate the needs of parallel applications that run on high-performance multicomputers with parallel I/O subsystems. With partitioning, a file is not a sequence of bytes, but rather multiple disjoint sequences that may be accessed in parallel. The partitioning is such that it abstracts away the number and type of I/O nodes hosting the filesystem, and it allows a variety of logically partitioned views of files, regardless of the physical distribution of data within the I/O nodes. The disjoint sequences are arranged to correspond to individual processes of a parallel application, allowing for improved scalability. Vesta was commercialized as the PIOFS filesystem around 1994, and was succeeded by GPFS around 1998. The main difference between the older and newer filesystems was that GPFS replaced the specialized interface offered by Vesta/PIOFS with the standard
Unix Unix (; trademarked as UNIX) is a family of multitasking, multiuser computer operating systems that derive from the original AT&T Unix, whose development started in 1969 at the Bell Labs research center by Ken Thompson, Dennis Ritchie, an ...
API An application programming interface (API) is a way for two or more computer programs to communicate with each other. It is a type of software interface, offering a service to other pieces of software. A document or standard that describes how ...
: all the features to support high performance parallel I/O were hidden from users and implemented under the hood. GPFS has been available on IBM's
AIX Aix or AIX may refer to: Computing * AIX, a line of IBM computer operating systems *An Alternate Index, for a Virtual Storage Access Method Key Sequenced Data Set * Athens Internet Exchange, a European Internet exchange point Places Belgi ...
since 1998, on Linux since 2001, and on Windows Server since 2008. Today it is used by many of the top 500 supercomputers listed on the Top 500 Supercomputing List. Since inception, it has been successfully deployed for many commercial applications including digital media, grid analytics, and scalable file services. In 2010, IBM previewed a version of GPFS that included a capability known as GPFS-SNC, where SNC stands for Shared Nothing Cluster. This was officially released with GPFS 3.5 in December 2012, and is now known as FPO (File Placement Optimizer). This allows it to use locally attached disks on a cluster of network connected servers rather than requiring dedicated servers with shared disks (e.g. using a SAN). FPO is suitable for workloads with high data locality such as shared nothing database clusters such as SAP HANA and DB2 DPF, and can be used as a
HDFS Apache Hadoop () is a collection of open-source software utilities that facilitates using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage ...
-compatible filesystem.


Architecture

It is a clustered file system. It breaks a file into blocks of a configured size, less than 1 megabyte each, which are distributed across multiple cluster nodes. The system stores data on standard block storage volumes, but includes an internal RAID layer that can virtualize those volumes for redundancy and parallel access much like a RAID block storage system. It also has the ability to replicate across volumes at the higher file level. Features of the architecture include * Distributed metadata, including the directory tree. There is no single "directory controller" or "index server" in charge of the filesystem. * Efficient indexing of directory entries for very large directories. * Distributed locking. This allows for full
POSIX The Portable Operating System Interface (POSIX) is a family of standards specified by the IEEE Computer Society for maintaining compatibility between operating systems. POSIX defines both the system- and user-level application programming inter ...
filesystem semantics, including locking for exclusive file access. * Partition Aware. A failure of the network may partition the filesystem into two or more groups of nodes that can only see the nodes in their group. This can be detected through a heartbeat protocol, and when a partition occurs, the filesystem remains live for the largest partition formed. This offers a graceful degradation of the filesystem — some machines will remain working. * Filesystem maintenance can be performed online. Most of the filesystem maintenance chores (adding new disks, rebalancing data across disks) can be performed while the filesystem is live. This ensures the filesystem is available more often, so keeps the supercomputer cluster itself available for longer. Other features include high availability, ability to be used in a heterogeneous cluster, disaster recovery, security,
DMAPI Data Management API (DMAPI) is the interface defined in the X/Open document "Systems Management: Data Storage Management (XDSM) API" dated February 1997. XFS, IBM JFS, VxFS, AdvFS, StorNext and IBM Spectrum Scale file systems support DMAPI for Hie ...
, HSM and
ILM Ilm or ILM may refer to: Acronyms * Identity Lifecycle Manager, a Microsoft Server Product * '' I Love Money,'' a TV show on VH1 * Independent Loading Mechanism, a mounting system for CPU sockets * Industrial Light & Magic, an American motion ...
.


Compared to Hadoop Distributed File System (HDFS)

Hadoop Apache Hadoop () is a collection of open-source software utilities that facilitates using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage an ...
's HDFS filesystem, is designed to store similar or greater quantities of data on commodity hardware — that is, datacenters without
RAID Raid, RAID or Raids may refer to: Attack * Raid (military), a sudden attack behind the enemy's lines without the intention of holding ground * Corporate raid, a type of hostile takeover in business * Panty raid, a prankish raid by male college ...
disks and a
storage area network A storage area network (SAN) or storage network is a computer network which provides access to consolidated, block-level data storage. SANs are primarily used to access data storage devices, such as disk arrays and tape libraries from ser ...
(SAN). * HDFS also breaks files up into blocks, and stores them on different filesystem nodes. * GPFS has full Posix filesystem semantics. * GPFS distributes its directory indices and other metadata across the filesystem. Hadoop, in contrast, keeps this on the Primary and Secondary Namenodes, large servers which must store all index information in-RAM. * GPFS breaks files up into small blocks. Hadoop HDFS likes blocks of or more, as this reduces the storage requirements of the Namenode. Small blocks or many small files fill up a filesystem's indices fast, so limit the filesystem's size.


Information lifecycle management

Storage pools allow for the grouping of disks within a file system. An administrator can create tiers of storage by grouping disks based on performance, locality or reliability characteristics. For example, one pool could be high-performance Fibre Channel disks and another more economical SATA storage. A fileset is a sub-tree of the file system namespace and provides a way to partition the namespace into smaller, more manageable units. Filesets provide an administrative boundary that can be used to set quotas and be specified in a policy to control initial data placement or data migration. Data in a single fileset can reside in one or more storage pools. Where the file data resides and how it is migrated is based on a set of rules in a user defined policy. There are two types of user defined policies: file placement and file management. File placement policies direct file data as files are created to the appropriate storage pool. File placement rules are selected by attributes such as file name, the user name or the fileset. File management policies allow the file's data to be moved or replicated or files to be deleted. File management policies can be used to move data from one pool to another without changing the file's location in the directory structure. File management policies are determined by file attributes such as last access time, path name or size of the file. The policy processing engine is scalable and can be run on many nodes at once. This allows management policies to be applied to a single file system with billions of files and complete in a few hours.


See also


References


External links


IBM Spectrum Scale at Almaden

Spectrum Scale User Group


{{DEFAULTSORT:GPFS Shared disk file systems Distributed file systems supported by the Linux kernel IBM file systems Spectrum Scale Distributed file systems