Storage Management System
   HOME

TheInfoList



OR:

Hierarchical storage management (HSM), also known as Tiered storage, is a
data storage Data storage is the recording (storing) of information (data) in a storage medium. Handwriting, phonographic recording, magnetic tape, and optical discs are all examples of storage media. Biological molecules such as RNA and DNA are conside ...
and
Data management Data management comprises all disciplines related to handling data as a valuable resource. Concept The concept of data management arose in the 1980s as technology moved from sequential processing (first punched cards, then magnetic tape) to r ...
technique that automatically moves data between high-cost and low-cost
storage media Data storage is the recording (storing) of information (data) in a storage medium. Handwriting, phonographic recording, magnetic tape, and optical discs are all examples of storage media. Biological molecules such as RNA and DNA are consi ...
. HSM systems exist because high-speed storage devices, such as
solid state drive A solid-state drive (SSD) is a solid-state storage device that uses integrated circuit assemblies to store data persistently, typically using flash memory, and functioning as secondary storage in the hierarchy of computer storage. It is ...
arrays, are more expensive (per
byte The byte is a unit of digital information that most commonly consists of eight bits. Historically, the byte was the number of bits used to encode a single character of text in a computer and for this reason it is the smallest addressable unit ...
stored) than slower devices, such as
hard disk drive A hard disk drive (HDD), hard disk, hard drive, or fixed disk is an electro-mechanical data storage device that stores and retrieves digital data using magnetic storage with one or more rigid rapidly rotating platters coated with magnet ...
s,
optical disc In computing and optical disc recording technologies, an optical disc (OD) is a flat, usually circular disc that encodes binary data (bits) in the form of pits and lands on a special material, often aluminum, on one of its flat surfaces. ...
s and magnetic
tape drive A tape drive is a data storage device that reads and writes data on a magnetic tape. Magnetic tape data storage is typically used for offline, archival data storage. Tape media generally has a favorable unit cost and a long archival stability. ...
s. While it would be ideal to have all data available on high-speed devices all the time, this is prohibitively expensive for many organizations. Instead, HSM systems store the bulk of the enterprise's data on slower devices, and then copy data to faster disk drives when needed. The HSM system monitors the way data is used and makes best guesses as to which data can safely be moved to slower devices and which data should stay on the fast devices. HSM may also be used where more robust storage is available for long-term archiving, but this is slow to access. This may be as simple as an off-site backup, for protection against a building fire. HSM is a long-established concept, dating back to the beginnings of commercial data processing. The techniques used though have changed significantly as new technology becomes available, for both storage and for long-distance communication of large data sets. The scale of measures such as 'size' and 'access time' have changed dramatically. Despite this, many of the underlying concepts keep returning to favour years later, although at much larger or faster scales.


Implementation

In a typical HSM scenario, data which is frequently used are stored on warm storage device, such as solid state disk (SSD). Data that is infrequently accessed is, after some time ''migrated'' to a slower, high capacity cold storage tier. If a user does access data which is on the cold storage tier, it is automatically moved back to warm storage. The advantage is that the total amount of stored data can be much larger than the capacity of the warm storage device, but since only rarely used files are on cold storage, most users will usually not notice any slowdown. Conceptually, HSM is analogous to the
cache Cache, caching, or caché may refer to: Places United States * Cache, Idaho, an unincorporated community * Cache, Illinois, an unincorporated community * Cache, Oklahoma, a city in Comanche County * Cache, Utah, Cache County, Utah * Cache County ...
found in most computer CPUs, where small amounts of expensive SRAM memory running at very high speeds is used to store frequently used data, but the least recently used data is evicted to the slower but much larger main
DRAM Dynamic random-access memory (dynamic RAM or DRAM) is a type of random-access semiconductor memory that stores each bit of data in a memory cell, usually consisting of a tiny capacitor and a transistor, both typically based on metal-oxid ...
memory when new data has to be loaded. In practice, HSM is typically performed by dedicated software, such as
IBM Tivoli Storage Manager IBM Spectrum Protect (Tivoli Storage Manager) is a data protection platform that gives enterprises a single point of control and administration for backup and recovery. It is the flagship product in the IBM Spectrum Protect (Tivoli Storage Mana ...
, or Oracle's SAM-QFS. The deletion of files from a higher level of the hierarchy (e.g. magnetic disk) after they have been moved to a lower level (e.g. optical media) is sometimes called file grooming.


History

Hierarchical Storage Manager (HSM, then DFHSM and finally DFSMShsm) was first implemented by IBM on March 31, 1978 for
MVS Multiple Virtual Storage, more commonly called MVS, was the most commonly used operating system on the System/370 and System/390 IBM mainframe computers. IBM developed MVS, along with OS/VS1 and SVS, as a successor to OS/360. It is unrelated ...
to reduce the cost of data storage, and to simplify the retrieval of data from slower media. The user would not need to know where the data was stored and how to get it back; the computer would retrieve the data automatically. The only difference to the user was the speed at which data was returned. HSM could originally migrate datasets only to disk volumes and virtual volumes on a
IBM 3850 The IBM 3850 Mass Storage System was an online tape library used to hold large amounts of infrequently accessed data. It was one of the earliest examples of nearline storage. History Starting in the late-1960s IBM's lab in Boulder, Colorado b ...
Mass Storage Facility, but a latter release supported magnetic tape volumes for migration level 2 (ML2). Later, IBM ported HSM to its
AIX operating system AIX (Advanced Interactive eXecutive, pronounced , "ay-eye-ex") is a series of Proprietary software, proprietary Unix operating systems developed and sold by IBM for several of its computer platforms. Background Originally released for the ...
, and then to other
Unix-like A Unix-like (sometimes referred to as UN*X or *nix) operating system is one that behaves in a manner similar to a Unix system, although not necessarily conforming to or being certified to any version of the Single UNIX Specification. A Unix-li ...
operating systems such as
Solaris Solaris may refer to: Arts and entertainment Literature, television and film * ''Solaris'' (novel), a 1961 science fiction novel by Stanisław Lem ** ''Solaris'' (1968 film), directed by Boris Nirenburg ** ''Solaris'' (1972 film), directed by ...
,
HP-UX HP-UX (from "Hewlett Packard Unix") is Hewlett Packard Enterprise's proprietary implementation of the Unix operating system, based on Unix System V (initially System III) and first released in 1984. Current versions support HPE Integrity Ser ...
and
Linux Linux ( or ) is a family of open-source Unix-like operating systems based on the Linux kernel, an operating system kernel first released on September 17, 1991, by Linus Torvalds. Linux is typically packaged as a Linux distribution, which ...
. CSIRO Australia's Division of Computing Research implemented an HSM in its DAD (Drums and Display) operating system with its Document Region in the 1960s, with copies of documents being written to 7-track tape and automatic retrieval upon access to the documents. HSM was also implemented on the DEC
VAX/VMS OpenVMS, often referred to as just VMS, is a multi-user, multiprocessing and virtual memory-based operating system. It is designed to support time-sharing, batch processing, transaction processing and workstation applications. Customers using Ope ...
systems and the Alpha/VMS systems. The first implementation date should be readily determined from the VMS System Implementation Manuals or the VMS Product Description Brochures. More recently, the development of
Serial ATA SATA (Serial AT Attachment) is a computer bus interface that connects host bus adapters to mass storage devices such as hard disk drives, optical drives, and solid-state drives. Serial ATA succeeded the earlier Parallel ATA (PATA) standard t ...
(SATA) disks has created a significant market for three-stage HSM: files are migrated from high-performance
Fibre Channel Fibre Channel (FC) is a high-speed data transfer protocol providing in-order, lossless delivery of raw block data. Fibre Channel is primarily used to connect computer data storage to servers in storage area networks (SAN) in commercial data cen ...
storage area network A storage area network (SAN) or storage network is a computer network which provides access to consolidated, block-level data storage. SANs are primarily used to access data storage devices, such as disk arrays and tape libraries from serve ...
devices to somewhat slower but much cheaper SATA
disk array A disk array is a disk storage system which contains multiple disk drives. It is differentiated from a disk enclosure, in that an array has cache memory and advanced functionality, like RAID, deduplication, encryption and virtualization. Compo ...
s totaling several
terabyte The byte is a units of information, unit of digital information that most commonly consists of eight bits. Historically, the byte was the number of bits used to encode a single character (computing), character of text in a computer and for this ...
s or more, and then eventually from the SATA disks to tape.


Use cases

HSM is often used for deep archival storage of data to be held long term at low cost. Automated tape robots can silo large quantities of data efficiently with low power consumption. Some HSM software products allow the user to place portions of data files on high-speed disk cache and the rest on tape. This is used in applications that stream video over the internet—the initial portion of a video is delivered immediately from disk while a robot finds, mounts and streams the rest of the file to the end user. Such a system greatly reduces disk cost for large content provision systems. HSM software is today used also for tiering between
hard disk drive A hard disk drive (HDD), hard disk, hard drive, or fixed disk is an electro-mechanical data storage device that stores and retrieves digital data using magnetic storage with one or more rigid rapidly rotating platters coated with magnet ...
s and
flash memory Flash memory is an electronic non-volatile computer memory storage medium that can be electrically erased and reprogrammed. The two main types of flash memory, NOR flash and NAND flash, are named for the NOR and NAND logic gates. Both us ...
, with flash memory being over 30 times faster than magnetic disks, but disks being considerably cheaper.


Algorithms

The key factor behind HSM is a Data migration policy that controls the file transfers in the system. More precisely, the policy decides which tier a file should be stored in, so that the entire storage system can be well-organized and have a shortest response time to requests. There are several algorithms realizing this process, such as Least Recently Used replacement(LRU), Size-Temperature Replacement(STP), Heuristic Threshold(STEP) etc. In research of recent years, there are also some intelligent policies coming up by using machine learning technologies.


Tiering vs. Caching

While tiering solutions and caching may look the same on the surface, the fundamental differences lie in the way the faster storage is utilized and the algorithms used to detect and accelerate frequently accessed data. Caching operates by making a copy of frequently accessed blocks of data, and storing the copy in the faster storage device and use this copy instead of the original data source on the slower, high capacity backend storage. Every time a storage read occurs, the caching software look to see if a copy of this data already exists on the cache and uses that copy, if available. Otherwise, the data is read from the slower, high capacity storage. Tiering on the other hand operates very differently. Rather than making a ''copy'' of frequently accessed data into fast storage, tiering ''moves'' data across tiers, for example, by relocating
cold data In computer storage, cold data refers to data that is rarely accessed, therefore considered "cold". Cold data is the opposite of hot data, which is data that is frequently accessed. To optimize storage costs, cold data can be stored on lower perform ...
to low cost, high capacity nearline storage devices. The basic idea is, mission-critical and highly accesses or "hot" data is stored in expensive medium such as SSD to take advantage of high I/O performance, while
nearline Nearline storage (a portmanteau of "near" and "online storage") is a term used in computer science to describe an intermediate type of data storage that represents a compromise between online storage (supporting frequent, very rapid access to data ...
or rarely accessed or "cold" data is stored in nearline storage medium such as HHD and
tapes Tape or Tapes may refer to: Material A long, narrow, thin strip of material (see also Ribbon (disambiguation): Adhesive tapes * Adhesive tape, any of many varieties of backing materials coated with an adhesive * Athletic tape, pressure-sensiti ...
which are inexpensive. Thus, the "data temperature" or activity levels determines the primary storage hierarchy.


Implementations

*
Alluxio Alluxio is an open-source virtual distributed file system (VDFS). Initially as research project "Tachyon", Alluxio was created at the University of California, Berkeley's AMPLab as Haoyuan Li's Ph.D. Thesis, advised by Professor Scott Shenker & ...
*AMASS/DATAMGR from ADIC (Was available on SGI IRIX, Sun and HP-UX) *
IBM 3850 The IBM 3850 Mass Storage System was an online tape library used to hold large amounts of infrequently accessed data. It was one of the earliest examples of nearline storage. History Starting in the late-1960s IBM's lab in Boulder, Colorado b ...
IBM 3850 Mass Storage Facility *IBM DFSMS for z/VM *IBM DFSMShsm, originally Hierarchical Storage Manager (HSM), 5740-XRB, and later Data Facility Hierarchical Storage Manager Version 2 (DFHSM), 5665-329 *
IBM Tivoli Storage Manager IBM Spectrum Protect (Tivoli Storage Manager) is a data protection platform that gives enterprises a single point of control and administration for backup and recovery. It is the flagship product in the IBM Spectrum Protect (Tivoli Storage Mana ...
for Space Management (HSM available on
UNIX Unix (; trademarked as UNIX) is a family of multitasking, multiuser computer operating systems that derive from the original AT&T Unix, whose development started in 1969 at the Bell Labs research center by Ken Thompson, Dennis Ritchie, and ot ...
(
IBM AIX AIX (Advanced Interactive eXecutive, pronounced , "ay-eye-ex") is a series of Proprietary software, proprietary Unix operating systems developed and sold by IBM for several of its computer platforms. Background Originally released for the ...
,
HP UX HP-UX (from "Hewlett Packard Unix") is Hewlett Packard Enterprise's proprietary implementation of the Unix operating system, based on Unix System V (initially System III) and first released in 1984. Current versions support HPE Integrity Ser ...
,
Solaris Solaris may refer to: Arts and entertainment Literature, television and film * ''Solaris'' (novel), a 1961 science fiction novel by Stanisław Lem ** ''Solaris'' (1968 film), directed by Boris Nirenburg ** ''Solaris'' (1972 film), directed by ...
) &
Linux Linux ( or ) is a family of open-source Unix-like operating systems based on the Linux kernel, an operating system kernel first released on September 17, 1991, by Linus Torvalds. Linux is typically packaged as a Linux distribution, which ...
) * IBM Tivoli Storage Manager HSM for Windows formerly OpenStore for File Servers (OS4FS) (HSM available on Microsoft
Windows Server Windows Server (formerly Windows NT Server) is a group of operating systems (OS) for servers that Microsoft has been developing since July 27, 1993. The first OS that was released for this platform was Windows NT 3.1 Advanced Server. With the r ...
) * HPSS b
HPSS collaboration
* Infinite Disk, an early PC system (defunct) * EMC DiskXtender, formerly Legato DiskXtender, formerly OTG DiskXtender *Moonwalk for Windows, NetApp, OES Linux *
Oracle An oracle is a person or agency considered to provide wise and insightful counsel or prophetic predictions, most notably including precognition of the future, inspired by deities. As such, it is a form of divination. Description The word '' ...
SAM-QFS (Open source under Opensolaris, then proprietary) *
Oracle An oracle is a person or agency considered to provide wise and insightful counsel or prophetic predictions, most notably including precognition of the future, inspired by deities. As such, it is a form of divination. Description The word '' ...
HSM (Proprietary, renamed from SAM-QFS) * Versity Storage Manager for Linux,
open-core model The open-core model is a business model for the monetization of commercially produced open-source software. Coined by Andrew Lampitt in 2008, the open-core model primarily involves offering a "core" or feature-limited version of a software pro ...
license * Dell Compellent Data Progression * Zarafa Archiver (component of ZCP, application specific archiving solution marketed as a 'HSM' solution) * HPE Data Management Framework (DMF, formerly
SGI SGI may refer to: Companies *Saskatchewan Government Insurance *Scientific Games International, a gambling company *Silicon Graphics, Inc., a former manufacturer of high-performance computing products *Silicon Graphics International, formerly Rac ...
Data Migration Facility) for SLES and
RHEL Red Hat Enterprise Linux (RHEL) is a commercial open-source Linux distribution developed by Red Hat for the commercial market. Red Hat Enterprise Linux is released in server versions for x86-64, Power ISA, ARM64, and IBM Z and a desktop versio ...
* Quantum's
StorNext StorNext File System (SNFS), colloquially referred to as StorNext is a shared disk file system made by Quantum Corporation. StorNext enables multiple Windows, Linux and Apple workstations to access shared block storage over a Fibre Channel networ ...
*
Apple An apple is an edible fruit produced by an apple tree (''Malus domestica''). Apple fruit tree, trees are agriculture, cultivated worldwide and are the most widely grown species in the genus ''Malus''. The tree originated in Central Asia, wh ...
Fusion Drive Fusion Drive is a type of hybrid drive technology created by Apple Inc. It combines a hard disk drive with a NAND flash storage (solid-state drive of 24 GB or more) and presents it as a single Core Storage managed logical volume with the sp ...
for
macOS macOS (; previously OS X and originally Mac OS X) is a Unix operating system developed and marketed by Apple Inc. since 2001. It is the primary operating system for Apple's Mac computers. Within the market of desktop and lapt ...
*
Microsoft Microsoft Corporation is an American multinational technology corporation producing computer software, consumer electronics, personal computers, and related services headquartered at the Microsoft Redmond campus located in Redmond, Washing ...
Storage Spaces The transition from Windows 7 to Windows 8 introduced a number of new features across various aspects of the operating system. These include a greater focus on optimizing the operating system for touchscreen-based devices (such as tablet compute ...
since version shipped with
Windows Server 2012 R2 Windows Server 2012 R2, codenamed "Windows Server 8.1" or "Windows Server Blue", is the seventh version of the Windows Server operating system by Microsoft, as part of the Windows NT family of operating systems. It was unveiled on June 3, 2013 a ...
. An older Microsoft product was Remote Storage, included with
Windows 2000 Windows 2000 is a major release of the Windows NT operating system developed by Microsoft and oriented towards businesses. It was the direct successor to Windows NT 4.0, and was Software release life cycle#Release to manufacturing (RTM), releas ...
and
Windows 2003 Windows Server 2003 is the sixth version of Windows Server operating system produced by Microsoft. It is part of the Windows NT family of operating systems and was released to manufacturing on March 28, 2003 and generally available on April 24, 2 ...
.


See also

* Active Archive Alliance *
Archive An archive is an accumulation of historical records or materials – in any medium – or the physical facility in which they are located. Archives contain primary source documents that have accumulated over the course of an individual or ...
*
Backup In information technology, a backup, or data backup is a copy of computer data taken and stored elsewhere so that it may be used to restore the original after a data loss event. The verb form, referring to the process of doing so, is "back up", w ...
*
Hybrid cloud storage Hybrid cloud storage, in Computer data storage, data storage, is a term for a storage infrastructure that uses a combination of on-premises storage resources with a public cloud storage provider. The on-premises storage is usually managed by the org ...
*
Data proliferation Data proliferation refers to the prodigious amount of data, structured and unstructured, that businesses and governments continue to generate at an unprecedented rate and the usability problems that result from attempting to store and manage that d ...
*
Disk storage Disk storage (also sometimes called drive storage) is a general category of storage mechanisms where data is recorded by various electronic, magnetic, optical, or mechanical changes to a surface layer of one or more rotating disks. A disk drive is ...
* Information lifecycle management *
Information repository In information technology, an information repository or simply a repository is "a central place in which an aggregation of data is kept and maintained in an organized way, usually in computer storage." It "may be just the aggregation of data itse ...
*
Magnetic tape data storage Magnetic-tape data storage is a system for storing digital information on magnetic tape using digital recording. Tape was an important medium for primary data storage in early computers, typically using large open reels of IBM 7 track, 7-track, ...
* Memory hierarchy *
Storage virtualization In computer science, storage virtualization is "the process of presenting a logical view of the physical storage resources to" a host computer system, "treating all storage media (hard disk, optical disk, tape, etc.) in the enterprise as a singl ...
* Cloud storage gateway


References

* {{cite book, first1=Keith, last1=Winnard, first2=Josh, last2=Biondo, date=6 June 2016, publisher=
IBM Press IBM Press is IBM's official retail book publisher for professionals and academia. A collaboration between IBM and Pearson Education, books are distributed in print and on Safari Books Online. Published topics range from general information tech ...
, isbn=9780738455372, lang=en-US, title=DFSMS: From Storage Tears to Storage Tiers, url=https://www.redbooks.ibm.com/abstracts/redp5341.html Business software Computer data storage