In-memory Processing

In computer science, in-memory processing is an emerging technology for processing of data stored in an in-memory database. In-memory processing is one method of addressing the performance and power bottlenecks caused by the movement of data between the processor and the main memory. Older systems have been based on disk storage and relational databases using the SQL query language, but these are increasingly regarded as inadequate to meet business intelligence (BI) needs. Because stored data is accessed much more quickly when it is placed in random-access memory (RAM) or flash memory, in-memory processing allows data to be analysed in real time, enabling faster reporting and decision-making in business.
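
As a minimal, concrete illustration of the idea, the sketch below uses SQLite (chosen only because it ships with Python and has a built-in ":memory:" mode; the technique is not tied to any particular engine) to hold an entire relational database in RAM. The table and values are invented for the example.

```python
import sqlite3

# An in-memory database: every page lives in RAM, so queries never
# touch the disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 120.0), ("south", 80.0), ("north", 45.5)],
)

# The aggregation runs against memory-resident data, avoiding disk
# latency entirely.
for region, total in conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region"
):
    print(region, total)
```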


Disk-based Business Intelligence


Data structures

With disk-based technology, data is loaded onto the computer's hard disk in the form of multiple tables and multi-dimensional structures against which queries are run. Disk-based technologies are relational database management systems (RDBMS), often based on the structured query language (SQL), such as SQL Server, MySQL, Oracle and many others. RDBMS are designed for the requirements of transactional processing. In a database that supports insertions and updates as well as aggregations, joins (typical in BI solutions) are typically very slow. Another drawback is that SQL is designed to efficiently fetch rows of data, while BI queries usually involve fetching partial rows of data and heavy calculations. To improve query performance, multidimensional databases or OLAP cubes, also called multidimensional online analytical processing (MOLAP), are constructed. Designing a cube is an elaborate and lengthy process, and changing the cube's structure to adapt to dynamically changing business needs may be cumbersome. Cubes are pre-populated with data to answer specific queries and although they increase performance, they are still not suitable for answering ad-hoc queries. Information technology (IT) staff spend substantial development time on optimizing databases, constructing indexes and aggregates, designing cubes and star schemas, data modeling, and query analysis.
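
The trade-off between pre-built aggregates and ad-hoc queries can be sketched as follows; the star-schema fact table and its summary are invented for the example, with the summary table standing in for one pre-computed face of an OLAP cube.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE fact_sales (year INT, region TEXT, product TEXT, amount REAL)")
db.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)", [
    (2023, "north", "widget", 10.0),
    (2023, "south", "widget", 20.0),
    (2024, "north", "gadget", 15.0),
])

# Pre-populate a summary table, analogous to one face of a cube.
db.execute("""
    CREATE TABLE agg_sales_by_year_region AS
    SELECT year, region, SUM(amount) AS total
    FROM fact_sales GROUP BY year, region
""")

# The anticipated query is now a cheap lookup against the aggregate...
print(db.execute(
    "SELECT total FROM agg_sales_by_year_region WHERE year = 2023 AND region = 'north'"
).fetchall())

# ...but an ad-hoc question the cube designer did not anticipate still
# requires a full scan of the fact table.
print(db.execute(
    "SELECT product, SUM(amount) FROM fact_sales GROUP BY product"
).fetchall())
```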


Processing speed

Reading data from the hard disk is much slower (possibly hundreds of times) than reading the same data from RAM. Especially when analyzing large volumes of data, performance is severely degraded. Though SQL is a very powerful tool, complex queries take a relatively long time to execute and often degrade the performance of transactional processing. In order to obtain results within an acceptable response time, many data warehouses have been designed to pre-calculate summaries and answer specific queries only. Optimized aggregation algorithms are needed to increase performance.
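
The disk-versus-RAM gap can be demonstrated directly; the following sketch is illustrative only, since the exact ratio depends on the hardware and is flattened by the operating system's page cache on repeated reads.

```python
import os
import tempfile
import time

payload = os.urandom(64 * 1024 * 1024)   # 64 MB of test data

# Write the data to disk, then time reading it back.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(payload)
    path = f.name

t0 = time.perf_counter()
with open(path, "rb") as f:
    disk_copy = f.read()                 # read from disk
disk_s = time.perf_counter() - t0

t0 = time.perf_counter()
ram_copy = bytes(payload)                # copy the same data within RAM
ram_s = time.perf_counter() - t0

print(f"disk: {disk_s:.4f}s  ram: {ram_s:.4f}s  ({disk_s / ram_s:.0f}x)")
os.remove(path)
```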


In-memory processing tools

Memory processing can be accomplished via traditional databases such as Oracle, IBM Db2 or Microsoft SQL Server, or via NoSQL offerings such as in-memory data grids like Hazelcast, Infinispan, Oracle Coherence or ScaleOut Software. With both in-memory databases and data grids, all information is initially loaded into RAM or flash memory instead of hard disks. With a data grid, processing occurs up to three orders of magnitude faster than with relational databases that have advanced functionality such as ACID, which degrades performance in exchange for the additional functionality. The arrival of column-centric databases, which store similar information together, allows data to be stored more efficiently and with greater compression ratios. This allows huge amounts of data to be stored in the same physical space, reducing the amount of memory needed to perform a query and increasing processing speed. Many users and software vendors have integrated flash memory into their systems to allow systems to scale to larger data sets more economically. Oracle has been integrating flash memory into the Oracle Exadata products for increased performance. Microsoft SQL Server 2012 BI/Data Warehousing software has been coupled with Violin Memory flash memory arrays to enable in-memory processing of data sets greater than 20 TB.

Users query the data loaded into the system's memory, thereby avoiding slower database access and performance bottlenecks. This differs from caching, a very widely used method to speed up query performance, in that caches are subsets of very specific pre-defined organized data. With in-memory tools, the data available for analysis can be as large as a data mart or small data warehouse which is entirely in memory. This can be accessed quickly by multiple concurrent users or applications at a detailed level and offers the potential for enhanced analytics and for scaling and increasing the speed of an application. Theoretically, the improvement in data access speed is 10,000 to 1,000,000 times compared to the disk. It also minimizes the need for performance tuning by IT staff and provides faster service for end users.
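
The compression advantage of column-centric storage can be seen even with general-purpose tools; in this sketch the data, layouts and use of zlib are illustrative assumptions, not how any particular column store works internally.

```python
import json
import zlib

# Synthetic fact rows with repetitive per-column values.
rows = [
    {"region": ["north", "south", "east", "west"][i % 4],
     "year": 2020 + i % 5,
     "amount": round((i * 7919) % 1000 + 0.25, 2)}
    for i in range(10_000)
]

# Row-wise layout: each record's fields are interleaved.
row_blob = json.dumps(rows).encode()

# Column-wise layout: similar values sit next to each other,
# so the compressor finds much longer repeated runs.
columns = {
    "region": [r["region"] for r in rows],
    "year":   [r["year"] for r in rows],
    "amount": [r["amount"] for r in rows],
}
col_blob = json.dumps(columns).encode()

print("row-wise compressed:   ", len(zlib.compress(row_blob)))
print("column-wise compressed:", len(zlib.compress(col_blob)))
```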


Advantages of in-memory processing technology

Certain developments in computer technology and business needs have tended to increase the relative advantages of in-memory technology.

* ''Hardware'' becomes progressively cheaper and higher-performing, according to Moore's law: computing power doubles every two to three years while costs decline. CPU processing, memory and disk storage are all subject to some variation of this law. Hardware innovations such as multi-core architecture, NAND flash memory, parallel servers, and increased memory processing capability, together with software innovations such as column-centric databases, compression techniques and handling of aggregate tables, have all contributed to demand for in-memory products.
* The advent of ''64-bit operating systems'', which allow access to far more RAM (up to 100 GB or more) than the 2 or 4 GB accessible on 32-bit systems. By providing terabytes (1 TB = 1,024 GB) of space for storage and analysis, 64-bit operating systems make in-memory processing scalable, and the use of flash memory enables systems to scale to many terabytes more economically (see the sizing sketch after this list).
* Increasing ''volumes of data'' mean that traditional data warehouses are no longer able to process the data in a timely and accurate way. The extract, transform, load (ETL) process that periodically updates data warehouses with operational data can take anywhere from a few hours to weeks to complete, so at any given point in time the data is at least a day old. In-memory processing enables instant access to terabytes of data for real-time reporting.
* In-memory processing is available at a ''lower cost'' compared to traditional BI tools, and can be more easily deployed and maintained. According to a Gartner survey, deploying traditional BI tools can take as long as 17 months, and many data warehouse vendors are choosing in-memory technology over traditional BI to speed up implementation times.
* Decreases in power consumption and increases in throughput, due to lower access latency and greater memory bandwidth and hardware parallelism.
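
As a back-of-the-envelope illustration of the 64-bit scalability point, the following sketch estimates whether a large fact table fits in RAM. All figures (row count, bytes per row, compression ratio) are assumptions chosen for illustration, not measurements.

```python
# Rough sizing: does the data set fit in the RAM of a 64-bit server?
rows = 2_000_000_000        # two billion fact rows (assumed)
bytes_per_row = 40          # narrow columnar row (assumed)
compression_ratio = 4       # typical claim for column stores (assumed)

raw_gb = rows * bytes_per_row / 1024**3
resident_gb = raw_gb / compression_ratio
print(f"raw: {raw_gb:.0f} GB, resident after compression: {resident_gb:.0f} GB")
# ~75 GB raw, ~19 GB resident: impossible within the 2-4 GB address
# space of a 32-bit system, routine on a 64-bit server.
```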


Application in business

A range of in-memory products provide the ability to connect to existing data sources and access visually rich interactive dashboards. This allows business analysts and end users to create custom reports and queries without much training or expertise. Easy navigation and the ability to modify queries on the fly benefit many users. Since these dashboards can be populated with fresh data, users have access to real-time data and can create reports within minutes. In-memory processing may be of particular benefit in call centers and warehouse management. With in-memory processing, the source database is queried only once, instead of being accessed every time a query is run, thereby eliminating repetitive processing and reducing the burden on database servers. By scheduling the in-memory database to be populated overnight, the database servers can be used for operational purposes during peak hours, as sketched below.
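
The overnight-population pattern described above can be sketched with SQLite's backup API; the file name "operational.db", the schema and the scheduling are hypothetical, and a production data grid or in-memory database would replace SQLite here.

```python
import sqlite3

def refresh_in_memory_copy(source_path: str) -> sqlite3.Connection:
    """Run overnight: read the operational database once into RAM."""
    source = sqlite3.connect(source_path)
    cache = sqlite3.connect(":memory:")
    source.backup(cache)   # single bulk copy; the source is queried only once
    source.close()
    return cache

# During peak hours, every report or dashboard query hits only the RAM
# copy, leaving the operational server free for transactions, e.g.:
# cache = refresh_in_memory_copy("operational.db")
# cache.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
```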


Adoption of in-memory technology

With a large number of users, a large amount of RAM is needed for an in-memory configuration, which in turn affects the hardware costs. The investment is more likely to be suitable in situations where speed of query response is a high priority, and where there is significant growth in data volume and increase in demand for reporting facilities; it may still not be cost-effective where information is not subject to rapid change.
Security is another consideration, as in-memory tools expose huge amounts of data to end users. Makers advise ensuring that only authorized users are given access to the data.


See also

* System on a chip
* Network on a chip

