HOME

TheInfoList



OR:

In-situ processing, also known as in-storage processing (ISP), is a computer science term that refers to processing data where it resides. ''
In-situ is a Latin phrase meaning 'in place' or 'on site', derived from ' ('in') and ' ( ablative of ''situs'', ). The term typically refers to the examination or occurrence of a process within its original context, without relocation. The term is use ...
'' means "situated in the original, natural, or existing place or position." An in-situ process processes data where it is stored, such as in
solid-state drives A solid-state drive (SSD) is a type of solid-state storage device that uses integrated circuits to store data persistently. It is sometimes called semiconductor storage device, solid-state device, or solid-state disk. SSDs rely on non-vo ...
(SSDs) or memory devices like NVDIMM, rather than sending the data to a computer's
central processing unit A central processing unit (CPU), also called a central processor, main processor, or just processor, is the primary Processor (computing), processor in a given computer. Its electronic circuitry executes Instruction (computing), instructions ...
(CPU). The technology utilizes embedded processing engines inside the storage devices to make them capable of running user applications in-place, so data does not need to leave the device to be processed. The technology is not new, but modern SSD architecture, as well as the availability of powerful embedded processors, make it more appealing to run user applications in-place. SSDs deliver higher data throughput in comparison to
hard disk drives A hard disk drive (HDD), hard disk, hard drive, or fixed disk is an electro-mechanical data storage device that stores and retrieves digital data using magnetic storage with one or more rigid rapidly rotating platters coated with magnet ...
(HDDs). Additionally, in contrast to the HDDs, the SSDs can handle multiple I/O commands at the same time. The SSDs contain a considerable amount of processing horsepower for managing
flash memory Flash memory is an Integrated circuit, electronic Non-volatile memory, non-volatile computer memory storage medium that can be electrically erased and reprogrammed. The two main types of flash memory, NOR flash and NAND flash, are named for t ...
array and providing a high-speed interface to host machines. These processing capabilities can provide an environment to run user applications in-place. The computational storage device (CSD) term refers to an SSD which is capable of running user applications in-place. In an efficient CSD architecture, the embedded in-storage processing subsystem has access to the data stored in flash memory array through a low-power and high-speed link. The deployment of such CSDs in clusters can increase the overall performance and efficiency of
big data Big data primarily refers to data sets that are too large or complex to be dealt with by traditional data processing, data-processing application software, software. Data with many entries (rows) offer greater statistical power, while data with ...
and
high-performance computing High-performance computing (HPC) is the use of supercomputers and computer clusters to solve advanced computation problems. Overview HPC integrates systems administration (including network and security knowledge) and parallel programming into ...
(HPC) applications.


Reducing data transfer bottlenecks

Webscale data center designers have been trying to develop storage architectures that favor high-capacity hosts. In the following figure (from ), such a storage system is shown where 64 SSDs are attached to a host. For the sake of simplicity, only the details of one SSD are demonstrated. Modern SSDs usually contain 16 or more flash memory channels which can be utilized concurrently for flash memory array I/O operations. Considering 512 MB/s bandwidth per channel, the internal bandwidth of an SSD with 16 flash memory channels is about 8 GB/s. This huge bandwidth decreases to about 1 GB/s due to the complexity of the host interface software and hardware architecture. In other words, the accumulated bandwidth of all internal channels of the 64 SSDs reaches the multiplication of the number of SSDs, the number of channels per SSD, and 512 MB/s (bandwidth of each channel) which is equal to 512 GB/s. While the accumulated bandwidth of the SSDs’ external interfaces is equal to 64 multiply by 1 GB/s (the host interface bandwidth of each SSD) which is 64 GB/s. However, In order to talk to the host, all SSDs required to be connected to a PCIe switch. Hence, the available bandwidth of the host is limited to 32 GB/s. Overall, there is a 16X gap between the accumulated internal bandwidth of all SSDs and the bandwidth available to the host. In other words, for reading 32 TB of data, the host needs 16 minutes while internal components of the SSDs can read the same amount of data in about 1 minute. Additionally, in such storage systems, data need to continuously move through the complex hardware and software stack between hosts and storage units, which imposes a considerable amount of energy consumption and dramatically decreases the energy efficiency of large data centers. Hence, storage architects need to develop techniques to decrease data movement, and ISP technology has been introduced to overcome the aforementioned challenges by moving the process to data.


Efficiency and utilization

The computational storage technology minimizes the data movements in a cluster and also increases the processing horsepower of the cluster by augmenting power-efficient processing engines to the whole system. This technology can potentially be applied to both HDDs and SSDs; however, modern SSD architecture provides better tools for developing such technologies. The SSDs which can run user application in-place are called computational storage devices (CSDs). These storage units are augmentable processing resources, which means they are not designed to replace the high-end processors of modern servers. Instead, they can collaborate with the host's CPU and augment their efficient processing horsepower to the system. The scientific article “Computational storage: an efficient and scalable platform for big data and HPC applications” which is published by
Springer Publishing Springer Publishing Company is an American publishing company of academic journals and books, focusing on the fields of nursing, gerontology, psychology, social work, counseling, public health, and rehabilitation (neuropsychology). It was estab ...
under open access policy (free for the public to access) shows the benefits of CSD utilization in the clusters. Examples of in-storage processing can be seen in fields like visualization efforts, biology and chemistry. This showcases how this technology allows for actions and results to be seen more efficiently than through data movement, regardless of the data being moved. The following figures (from ) show how CSDs can be utilized in an
Apache Hadoop Apache Hadoop () is a collection of open-source software utilities for reliable, scalable, distributed computing. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model. Hadoop wa ...
cluster and on a
Message Passing Interface The Message Passing Interface (MPI) is a portable message-passing standard designed to function on parallel computing architectures. The MPI standard defines the syntax and semantics of library routines that are useful to a wide range of use ...
-based distributed environment.


Industry

In the storage industry, implementations from several companies are now available, including from NGD Systems, ScaleFlux and Eideticom. Other companies have tried to do similar work in the past, including
Micron Technology Micron Technology, Inc. is an American producer of computer memory and computer data storage including dynamic random-access memory, flash memory, and solid-state drives (SSDs). It is headquartered in Boise, Idaho. Micron's consumer produc ...
and
Samsung Samsung Group (; stylised as SΛMSUNG) is a South Korean Multinational corporation, multinational manufacturing Conglomerate (company), conglomerate headquartered in the Samsung Town office complex in Seoul. The group consists of numerous a ...
. The approach from all of these are the same direction, managing or processing data where it resided. NGD Systems was the first company to create in-situ processing storage and has produced two versions of the device since 2017. The Catalina-1 was a standalone SSD that offered 24 TB of flash along with processing. A second product called Newport was released in 2018 that offered up to 32 TB of
flash memory Flash memory is an Integrated circuit, electronic Non-volatile memory, non-volatile computer memory storage medium that can be electrically erased and reprogrammed. The two main types of flash memory, NOR flash and NAND flash, are named for t ...
. ScaleFlux uses a CSS-1000
NVMe NVM Express (NVMe) or Non-Volatile Memory Host Controller Interface Specification (NVMHCIS) is an open, logical-device interface specification for accessing a computer's non-volatile storage media usually attached via the PCI Express bus. The in ...
device that uses host resourcing and kernel changes to address the device and use Host resources to manage up to 6.4 TB flash on the device, or base SSD. Eideticom utilizes a device called a No-Load
DRAM Dram, DRAM, or drams may refer to: Technology and engineering * Dram (unit), a unit of mass and volume, and an informal name for a small amount of liquor, especially whisky or whiskey * Dynamic random-access memory, a type of electronic semicondu ...
-only NVMe device as an accelerator with no actual flash storage for persistent data. Micron called their version ‘Scale In’ at a Flash Memory Summit (FMS) event in 2013 but was never able to productize it and was based on a
SATA SATA (Serial AT Attachment) is a computer bus interface that connects host bus adapters to mass storage devices such as hard disk drives, optical drives, and solid-state drives. Serial ATA succeeded the earlier Parallel ATA (PATA) standard ...
SSD in production. Samsung has worked on various versions of devices from KV Store and others.


References

{{reflist Data storage