
High Performance Storage System (HPSS) is a flexible, scalable, policy-based hierarchical storage management product developed by the HPSS Collaboration. It provides scalable hierarchical storage management (HSM), archive, and file system services using cluster, LAN and SAN technologies to aggregate the capacity and performance of many computers, disks, disk systems, tape drives and tape libraries.
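As a rough illustration of what "policy-based" hierarchical storage management means in general, the sketch below models a catalog of files and a single migration rule that moves files from a fast tier to a cheap tier once they have been idle for too long. It is a toy model, not HPSS code: the `FileRecord` class, the `apply_migration_policy` function, the tier names, and the 30-day threshold are all assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass
class FileRecord:
    name: str
    size_bytes: int
    idle_days: int       # days since the file was last accessed
    tier: str = "disk"   # "disk" (fast, costly) or "tape" (slow, cheap)


def apply_migration_policy(files, max_idle_days=30):
    """Move disk-resident files idle longer than max_idle_days to tape.

    Returns the names of the files that were migrated.
    """
    migrated = []
    for f in files:
        if f.tier == "disk" and f.idle_days > max_idle_days:
            f.tier = "tape"
            migrated.append(f.name)
    return migrated


# Hypothetical catalog: one cold file and one recently used file.
catalog = [
    FileRecord("run-001.dat", 4_000_000_000, idle_days=90),
    FileRecord("run-002.dat", 2_500_000_000, idle_days=3),
]
moved = apply_migration_policy(catalog, max_idle_days=30)
print(moved)  # ['run-001.dat']
```

A real HSM would typically also weigh factors such as file size, free space on each tier, and the number of copies to keep; the single idle-time rule here only shows the basic shape of a policy.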


Architecture

HPSS supports a variety of methods for accessing and creating data. These include FTP, parallel FTP, FUSE (Linux), and a robust client API with support for parallel I/O. As of version 7.5, HPSS is fully supported on Linux. The HPSS client API is supported on AIX, Linux, and Solaris. The implementation is built around IBM's Db2, a scalable relational database management system.
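Because a FUSE mount presents a storage namespace as an ordinary directory tree, programs can usually reach it with standard file operations rather than a dedicated client library. The sketch below assumes such a mount exists; the `archive_file` helper and the directory layout are hypothetical, and a temporary directory stands in for the mount point so the example can run anywhere. It does not use the HPSS client API.

```python
import shutil
import tempfile
from pathlib import Path


def archive_file(src, mount_root, rel_dest):
    """Copy src into the mounted namespace at mount_root/rel_dest."""
    dest = Path(mount_root) / rel_dest
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(src, dest)
    return dest


# Demonstration: a temporary directory stands in for a real FUSE mount point.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "results.csv"
    src.write_text("step,value\n1,0.5\n")
    fake_mount = Path(tmp) / "hpss"  # hypothetical mount location
    stored = archive_file(src, fake_mount, "projects/demo/results.csv")
    copied_ok = stored.read_text() == src.read_text()

print(copied_ok)  # True
```

On a real system, `mount_root` would be wherever the administrator mounted the file system. Reads of data that has migrated to tape may take much longer than this local copy suggests.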


The HPSS Collaboration

The collaboration that produced HPSS began in the fall of 1992 and involved IBM's Houston Global Services and five United States Department of Energy (DOE) National Laboratories (Lawrence Berkeley, Lawrence Livermore, Los Alamos, Oak Ridge, and Sandia). At that time, the DOE national laboratory and IBM HPSS design team recognized that there would be a data storage explosion driven by computing power rising to teraops and petaops. This growth would require data stored in HSMs to rise to petabytes and beyond, data transfer rates with the HSM to rise to gigabytes per second and higher, and daily throughput with an HSM to reach tens of terabytes per day. Therefore, the collaboration set out to design and deploy a system that would scale by a factor of 1,000 or more and evolve from the base above toward these expected targets and beyond. ("Largest HPSS Sites 1+ petabytes")

The HPSS collaboration is based on the premise that no single organization has the experience and resources to meet all the challenges represented by the growing imbalance between computing power and data collection capabilities on the one hand, and storage system I/O, capacity, and functionality on the other. More than twenty organizations worldwide, including industry, the US Department of Energy (DOE), other federal laboratories, universities, National Science Foundation (NSF) supercomputer centers, the French Commissariat à l'Énergie Atomique (CEA), and Gleicher Enterprises, have contributed to various aspects of this effort.

As of 2022, the primary HPSS development team consists of:
* IBM Global Business Services (Houston, TX)
* Los Alamos National Laboratory (Los Alamos, NM)
* Lawrence Livermore National Laboratory (Livermore, CA)
* Lawrence Berkeley National Energy Research Scientific Computing Center (Berkeley, CA)
* Oak Ridge National Laboratory (Oak Ridge, TN)
* Sandia National Laboratories (Albuquerque, NM)


Notable achievements

* Two of the larger HPSS sites, ECMWF and the UK Met Office, had 217 and 99 petabytes of data, respectively, each stored within a single HPSS instance and namespace, as of December 7, 2016.
* On November 14, 2007, the San Diego Supercomputer Center, along with IBM, DataDirect, and Brocade, demonstrated a "Billion File" test that successfully backed up a billion files from GPFS into HPSS. (HPCWire, November 15, 2007)
* In May 2013, a 380-petabyte HPSS installation entered service at the National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign. (NCSA, "NCSA puts world's largest High Performance Storage System into production", May 30, 2013; accessed August 30, 2014: http://www.ncsa.illinois.edu/news/story/ncsa_puts_worlds_largest_high_performance_storage_system_into_production)

