High Performance Storage System (HPSS) is a flexible, scalable, policy-based Hierarchical Storage Management product developed by the HPSS Collaboration. It provides scalable hierarchical storage management (HSM), archive, and file system services using cluster, LAN and SAN technologies to aggregate the capacity and performance of many computers, disks, disk systems, tape drives and tape libraries.
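To make "policy-based hierarchical storage management" concrete, the following is a minimal sketch of the general HSM idea: files that have not been accessed for a configurable period are moved from a fast, costly tier to a slow, cheap one, and are staged back on access. It is an illustration only and does not represent HPSS's actual interfaces or policies; the names `FileRecord`, `migrate_by_policy`, `recall`, and the idle-time rule are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class FileRecord:
    """Hypothetical catalog entry; not an HPSS data structure."""
    name: str
    size_bytes: int
    last_access_day: int   # day number of the most recent access
    tier: str = "disk"     # "disk" (fast, costly) or "tape" (slow, cheap)


def migrate_by_policy(files, today, max_idle_days):
    """Move disk-resident files idle longer than max_idle_days to tape.

    Returns the names of the files that were migrated.
    """
    migrated = []
    for f in files:
        idle_days = today - f.last_access_day
        if f.tier == "disk" and idle_days > max_idle_days:
            f.tier = "tape"
            migrated.append(f.name)
    return migrated


def recall(f, today):
    """Stage a file back to the disk tier when it is accessed again."""
    f.tier = "disk"
    f.last_access_day = today


# Example catalog with one long-idle file and one recently used file.
catalog = [
    FileRecord("run-001.dat", 4_000_000_000, last_access_day=10),
    FileRecord("run-002.dat", 2_500_000_000, last_access_day=95),
]
moved = migrate_by_policy(catalog, today=100, max_idle_days=30)
print(moved)
```

In this sketch only the idle threshold drives placement; a real HSM policy would typically also weigh factors such as file size and free space on each tier.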
Architecture
HPSS supports a variety of methods for accessing and creating data. Among them are FTP, parallel FTP, FUSE (Linux), and a robust client API with support for parallel I/O.
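As a rough illustration of what access through a FUSE mount implies, the sketch below uses only ordinary file operations: once a storage system is exposed as a directory tree, applications need no special client library. The helper names `archive_file` and `read_back` are hypothetical, and a temporary directory stands in for the mount point so that the example runs anywhere; nothing here is taken from the HPSS client API.

```python
import tempfile
from pathlib import Path


def archive_file(mount_point, relative_name, payload):
    """Write payload under the mount point using plain file I/O."""
    target = Path(mount_point) / relative_name
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(payload)
    return target


def read_back(mount_point, relative_name):
    """Read a previously written file through the same mount point."""
    return (Path(mount_point) / relative_name).read_bytes()


# Assumption: a temporary directory stands in for a real FUSE mount point.
with tempfile.TemporaryDirectory() as stand_in_mount:
    archive_file(stand_in_mount, "project/results.bin", b"example payload")
    data = read_back(stand_in_mount, "project/results.bin")
    print(len(data), "bytes read back")
```

On a real deployment the only change would be passing the actual mount path, which is site-specific and not given in the text above.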
As of version 7.5, HPSS is fully supported on Linux. The HPSS client API is supported on AIX, Linux, and Solaris.
The implementation is built around IBM's Db2, a scalable relational database management system.
The HPSS Collaboration
The collaboration that produced HPSS began in the fall of 1992 and involved IBM's Houston Global Services and five United States Department of Energy (DOE) National Laboratories (Lawrence Berkeley, Lawrence Livermore, Los Alamos, Oak Ridge, and Sandia).
At that time, the DOE national laboratories and the IBM HPSS design team recognized that computing power rising to teraops and petaops would drive a data storage explosion: data stored in HSMs would rise to petabytes and beyond, data transfer rates to and from the HSM would rise to gigabytes per second and higher, and daily throughput would reach tens of terabytes per day. The collaboration therefore set out to design and deploy a system that would scale by a factor of 1,000 or more and evolve toward these expected targets and beyond.
[Largest HPSS Sites 1+ petabytes](_blank)
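The design targets above can be related to one another with simple arithmetic. The sketch below converts a daily throughput into the sustained rate it implies, and estimates how long a petabyte takes to move at one gigabyte per second. Decimal units (1 TB = 10^12 bytes) and the specific figures of 10 TB/day and 1 GB/s are assumptions chosen to match the orders of magnitude in the text, not measured HPSS numbers.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

# Assumption: decimal (SI) units.
GIGABYTE = 10**9
TERABYTE = 10**12
PETABYTE = 10**15


def sustained_rate_bytes_per_s(bytes_per_day):
    """Average rate needed to move bytes_per_day within one day."""
    return bytes_per_day / SECONDS_PER_DAY


def days_to_move(total_bytes, rate_bytes_per_s):
    """Days needed to move total_bytes at a constant rate."""
    return total_bytes / rate_bytes_per_s / SECONDS_PER_DAY


rate = sustained_rate_bytes_per_s(10 * TERABYTE)
days = days_to_move(PETABYTE, 1 * GIGABYTE)

print(f"10 TB/day needs about {rate / 10**6:.1f} MB/s sustained")  # 115.7 MB/s
print(f"1 PB at 1 GB/s takes about {days:.1f} days")               # 11.6 days
```

The second figure shows why petabyte-scale archives call for aggregate transfer rates well above a single gigabyte per second.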
The HPSS collaboration is based on the premise that no single organization has the experience and resources to meet all the challenges posed by the growing imbalance between computing power and data collection capabilities on one side and storage system I/O, capacity, and functionality on the other. Over twenty organizations worldwide, including industry, the US Department of Energy (DOE), other federal laboratories, universities, National Science Foundation (NSF) supercomputer centers, the French Commissariat à l'Énergie Atomique (CEA), and Gleicher Enterprises, have contributed to various aspects of this effort.
As of 2022, the primary HPSS development team consists of:
* IBM Global Business Services (Houston, TX)
* Los Alamos National Laboratory (Los Alamos, NM)
* Lawrence Livermore National Laboratory (Livermore, CA)
* Lawrence Berkeley National Energy Research Scientific Computing Center (Berkeley, CA)
* Oak Ridge National Laboratory (Oak Ridge, TN)
* Sandia National Laboratories (Albuquerque, NM)
Notable achievements
* Two of the larger HPSS sites, ECMWF and the UK Met Office, had 217 and 99 petabytes of data, respectively, each stored within a single HPSS instance and namespace as of December 7, 2016.
* On November 14, 2007, the San Diego Supercomputer Center, along with IBM, DataDirect, and Brocade, demonstrated a "Billion File" test that successfully backed up a billion files from GPFS into HPSS.[HPCWire Nov 15, 2007](_blank)
* In May 2013, a 380-petabyte HPSS installation entered service at the National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign.[NCSA puts world’s largest High Performance Storage System into production, 2013-05-30, accessed 2014-08-30](http://www.ncsa.illinois.edu/news/story/ncsa_puts_worlds_largest_high_performance_storage_system_into_production)