Online analytical processing, or OLAP, is an approach to answer multi-dimensional analytical (MDA) queries swiftly in computing. OLAP is part of the broader category of business intelligence, which also encompasses relational databases, report writing and data mining. Typical applications of OLAP include business reporting for sales, marketing, management reporting, business process management (BPM), budgeting and forecasting, financial reporting and similar areas, with new applications emerging, such as agriculture.
The term ''OLAP'' was created as a slight modification of the traditional database term online transaction processing (OLTP).
OLAP tools enable users to analyze multidimensional data interactively from multiple perspectives. OLAP consists of three basic analytical operations: consolidation (roll-up), drill-down, and slicing and dicing.[O'Brien, J. A., & Marakas, G. M. (2009). Management information systems (9th ed.). Boston, MA: McGraw-Hill/Irwin.] Consolidation involves the aggregation of data that can be accumulated and computed in one or more dimensions. For example, all sales offices are rolled up to the sales department or sales division to anticipate sales trends. By contrast, the drill-down is a technique that allows users to navigate through the details. For instance, users can view the sales by individual products that make up a region's sales. Slicing and dicing is a feature whereby users can take out (slicing) a specific set of data of the OLAP cube and view (dicing) the slices from different viewpoints. These viewpoints are sometimes called dimensions (such as looking at the same sales by salesperson, or by date, or by customer, or by product, or by region, etc.).
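The three operations can be illustrated on a toy data set. The following is a minimal sketch, not a real OLAP engine: the `SALES` rows and the helper names `aggregate` and `slice_rows` are hypothetical and exist only for this example.

```python
from collections import defaultdict

# Hypothetical sales records: dimension values plus one measure ("amount").
SALES = [
    {"region": "East", "office": "Boston", "product": "Widget", "amount": 100},
    {"region": "East", "office": "Boston", "product": "Gadget", "amount": 50},
    {"region": "East", "office": "Albany", "product": "Widget", "amount": 70},
    {"region": "West", "office": "Seattle", "product": "Widget", "amount": 30},
    {"region": "West", "office": "Seattle", "product": "Gadget", "amount": 90},
]


def aggregate(rows, by):
    """Sum the 'amount' measure, grouped by the given dimension names."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[dim] for dim in by)
        totals[key] += row["amount"]
    return dict(totals)


def slice_rows(rows, **fixed):
    """Keep only the rows whose dimensions match the fixed values."""
    return [r for r in rows if all(r[d] == v for d, v in fixed.items())]


# Roll-up: offices are consolidated into regions.
by_region = aggregate(SALES, ["region"])

# Drill-down: open one region and look at its offices.
east_by_office = aggregate(slice_rows(SALES, region="East"), ["office"])

# Slice on one product, then view that slice along another dimension.
widget_by_region = aggregate(slice_rows(SALES, product="Widget"), ["region"])
```

Here `by_region` holds one total per region, `east_by_office` breaks the East total into its offices, and `widget_by_region` shows the Widget slice viewed by region.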
Databases configured for OLAP use a multidimensional data model, allowing for complex analytical and ad hoc queries with a rapid execution time. They borrow aspects of navigational databases, hierarchical databases and relational databases.
OLAP is typically contrasted to OLTP (online transaction processing), which is generally characterized by much less complex queries, in a larger volume, to process transactions rather than for the purpose of business intelligence or reporting. Whereas OLAP systems are mostly optimized for read, OLTP has to process all kinds of queries (read, insert, update and delete).
Overview of OLAP systems
At the core of any OLAP system is an OLAP cube (also called a 'multidimensional cube' or a hypercube). It consists of numeric facts called ''measures'' that are categorized by ''dimensions''. The measures are placed at the intersections of the hypercube, which is spanned by the dimensions as a vector space. The usual interface to manipulate an OLAP cube is a matrix interface, like pivot tables in a spreadsheet program, which performs projection operations along the dimensions, such as aggregation or averaging.
The cube metadata is typically created from a star schema or snowflake schema or fact constellation of tables in a relational database. Measures are derived from the records in the fact table and dimensions are derived from the dimension tables.
Each ''measure'' can be thought of as having a set of ''labels'', or meta-data associated with it. A ''dimension'' is what describes these ''labels''; it provides information about the ''measure''.
A simple example would be a cube that contains a store's sales as a ''measure'', and Date/Time as a ''dimension''. Each Sale has a Date/Time ''label'' that describes more about that sale.
For example:
      Sales Fact Table
 +-------------+----------+
 | sale_amount | time_id  |
 +-------------+----------+              Time Dimension
 |     2008.10 |     1234 |----+     +---------+-------------------+
 +-------------+----------+    |     | time_id | timestamp         |
                               |     +---------+-------------------+
                               +---->|    1234 | 20080902 12:35:43 |
                                     +---------+-------------------+
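The same fact-to-dimension link can be reproduced with two relational tables. This is a minimal sketch using Python's built-in `sqlite3` module; the table names `sales_fact` and `time_dim` are assumptions chosen to mirror the diagram, not names taken from any particular product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension table: one row per label that can describe a measure.
    CREATE TABLE time_dim (time_id INTEGER PRIMARY KEY, timestamp TEXT);
    -- Fact table: the measure plus a key pointing at the dimension row.
    CREATE TABLE sales_fact (
        sale_amount REAL,
        time_id INTEGER REFERENCES time_dim(time_id)
    );
    INSERT INTO time_dim VALUES (1234, '20080902 12:35:43');
    INSERT INTO sales_fact VALUES (2008.10, 1234);
""")

# Joining on time_id attaches the Date/Time label to the sale.
row = conn.execute("""
    SELECT f.sale_amount, t.timestamp
    FROM sales_fact AS f
    JOIN time_dim AS t ON t.time_id = f.time_id
""").fetchone()
```

After the join, `row` pairs the measure with the label that describes it.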
Multidimensional databases
Multidimensional structure is defined as "a variation of the relational model that uses multidimensional structures to organize data and express the relationships between data". The structure is broken into cubes and the cubes are able to store and access data within the confines of each cube. "Each cell within a multidimensional structure contains aggregated data related to elements along each of its dimensions". Even when data is manipulated it remains easy to access and continues to constitute a compact database format. The data still remains interrelated.
Multidimensional structure is quite popular for analytical databases that use online analytical processing (OLAP) applications. Analytical databases use this structure because of its ability to deliver answers to complex business queries swiftly. Data can be viewed from different angles, which gives a broader perspective of a problem than other models do.
Aggregations
It has been claimed that for complex queries OLAP cubes can produce an answer in around 0.1% of the time required for the same query on OLTP relational data. The most important mechanism in OLAP which allows it to achieve such performance is the use of ''aggregations''. Aggregations are built from the fact table by changing the granularity on specific dimensions and aggregating up data along these dimensions, using an aggregate function (or ''aggregation function''). The number of possible aggregations is determined by every possible combination of dimension granularities.
The combination of all possible aggregations and the base data contains the answers to every query which can be answered from the data.
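To make the count concrete, the combinations can be enumerated directly. The sketch below assumes a hypothetical cube with three dimensions, each with a short list of granularity levels ending in "ALL" (fully aggregated); the level names are illustrative only.

```python
from itertools import product
from math import prod

# Hypothetical granularity levels per dimension, finest first.
LEVELS = {
    "time": ["day", "month", "year", "ALL"],
    "store": ["store", "region", "ALL"],
    "product": ["item", "category", "ALL"],
}

# Every combination of one level per dimension is a possible view.
combos = list(product(*LEVELS.values()))

# The count is simply the product of the number of levels per dimension.
total_views = prod(len(levels) for levels in LEVELS.values())

# The finest level on every dimension is the base data, not an aggregation.
base = tuple(levels[0] for levels in LEVELS.values())
aggregations = [c for c in combos if c != base]
```

With 4, 3 and 3 levels there are 36 combinations, 35 of which are true aggregations. Adding a dimension or a level multiplies the count, which is why the number grows quickly.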
Because usually there are many aggregations that can be calculated, often only a predetermined number are fully calculated; the remainder are solved on demand. The problem of deciding which aggregations (views) to calculate is known as the view selection problem. View selection can be constrained by the total size of the selected set of aggregations, the time to update them from changes in the base data, or both. The objective of view selection is typically to minimize the average time to answer OLAP queries, although some studies also minimize the update time. View selection is NP-complete. Many approaches to the problem have been explored, including greedy algorithms, randomized search, genetic algorithms and A* search algorithm.
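As an illustration of the greedy approach, the sketch below picks a fixed number of views one at a time, each time choosing the view that most reduces the total cost of answering all views. It rests on several assumptions not stated in the text: a view is identified by the set of dimensions it groups by, a view can be answered from any materialized view that groups by a superset of its dimensions, the cost of answering is the row count of the smallest such view, and the row counts in `SIZE` are invented for the example.

```python
# Hypothetical row counts per view (a view = the dimensions it groups by).
BASE = frozenset({"part", "supplier", "customer"})
SIZE = {
    BASE: 6_000_000,
    frozenset({"part", "customer"}): 6_000_000,
    frozenset({"part", "supplier"}): 800_000,
    frozenset({"supplier", "customer"}): 6_000_000,
    frozenset({"part"}): 200_000,
    frozenset({"supplier"}): 10_000,
    frozenset({"customer"}): 100_000,
    frozenset(): 1,
}


def answer_cost(view, materialized, size):
    """Rows scanned to answer `view` from its cheapest materialized superset."""
    return min(size[m] for m in materialized if view <= m)


def benefit(candidate, materialized, size):
    """Total rows saved across all views if `candidate` were materialized."""
    return sum(
        max(0, answer_cost(w, materialized, size) - size[candidate])
        for w in size
        if w <= candidate
    )


def greedy_select(size, base, k):
    """Pick up to k views, always taking the one with the largest benefit."""
    materialized = [base]  # the base data is always available
    for _ in range(k):
        candidates = [v for v in size if v not in materialized]
        if not candidates:
            break
        best = max(candidates, key=lambda v: benefit(v, materialized, size))
        if benefit(best, materialized, size) <= 0:
            break
        materialized.append(best)
    return materialized[1:]


picked = greedy_select(SIZE, BASE, 2)
```

With these numbers the first pick is the (part, supplier) view, because it is much smaller than the base data and serves four views; the second pick is the (customer) view. A greedy choice like this is a heuristic and is not guaranteed to find the optimal set.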
Some aggregation functions can be computed for the entire OLAP cube by precomputing values for each cell, and then computing the aggregation for a roll-up of cells by aggregating these aggregates, applying a divide and conquer algorithm to the multidimensional problem to compute them efficiently. For example, the overall sum of a roll-up is just the sum of the sub-sums in each cell. Functions that can be decomposed in this way are called decomposable aggregation functions, and include COUNT, MAX, MIN, and SUM, which can be computed for each cell and then directly aggregated; these are known as self-decomposable aggregation functions. In other cases the aggregate function can be computed by computing auxiliary numbers for cells, aggregating these auxiliary numbers, and finally computing the overall number at the end; examples include AVERAGE (tracking sum and count, dividing at the end) and RANGE (tracking max and min, subtracting at the end). In other cases the aggregate function cannot be computed without analyzing the entire set at once, though in some cases approximations can be computed; examples include DISTINCT COUNT, MEDIAN, and MODE; for example, the median of a set is not the median of medians of subsets. These latter are difficult to implement efficiently in OLAP, as they require computing the aggregate function on the base data, either computing them online (slow) or precomputing them for possible roll-ups (large space).
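The three cases can be checked on a small example. In this sketch the `cells` lists are made-up values standing in for the base data held in three cube cells.

```python
from statistics import median

# Hypothetical base values held in three cells of a cube.
cells = [[1, 2, 9], [3, 4], [5, 100, 101, 102]]
all_values = [x for cell in cells for x in cell]

# Self-decomposable: the sum of per-cell sums equals the overall sum.
total = sum(sum(cell) for cell in cells)

# AVERAGE: keep (sum, count) per cell, combine, divide once at the end.
partials = [(sum(cell), len(cell)) for cell in cells]
average = sum(s for s, _ in partials) / sum(n for _, n in partials)

# RANGE: keep (min, max) per cell, combine, subtract once at the end.
value_range = max(max(cell) for cell in cells) - min(min(cell) for cell in cells)

# MEDIAN: combining per-cell medians does not give the overall median.
median_of_medians = median(median(cell) for cell in cells)
true_median = median(all_values)
```

Here `total`, `average` and `value_range` agree with what a scan of `all_values` would give, while `median_of_medians` (3.5) differs from `true_median` (5).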
Types
OLAP systems have been traditionally categorized using the following taxonomy.
Multidimensional OLAP (MOLAP)
MOLAP (multi-dimensional online analytical processing) is the classic form of OLAP and is sometimes referred to as just OLAP. MOLAP stores this data in an optimized multi-dimensional array storage, rather than in a relational database.
Some MOLAP tools require the pre-computation and storage of derived data, such as consolidations – the operation known as processing. Such MOLAP tools generally utilize a pre-calculated data set referred to as a data cube. The data cube contains all the possible answers to a given range of questions. As a result, they have a very fast response to queries. On the other hand, updating can take a long time depending on the degree of pre-computation. Pre-computation can also lead to what is known as data explosion.
Other MOLAP tools, particularly those that implement the functional database model, do not pre-compute derived data but make all calculations on demand other than those that were previously requested and stored in a cache.
Advantages of MOLAP
* Fast query performance due to optimized storage, multidimensional indexing and caching.
* Smaller on-disk size of data compared to data stored in a relational database due to compression techniques.
* Automated computation of higher level aggregates of the data.
* It is very compact for low dimension data sets.
* Array models provide natural indexing.
* Effective data extraction achieved through the pre-structuring of aggregated data.
Disadvantages of MOLAP
* Within some MOLAP systems the processing step (data load) can be quite lengthy, especially on large data volumes. This is usually remedied by doing only incremental processing, i.e., processing only the data which have changed (usually new data) instead of reprocessing the entire data set.
* Some MOLAP methodologies introduce data redundancy.
Products
Examples of commercial products that use MOLAP are Cognos Powerplay, Oracle Database OLAP Option, MicroStrategy, Microsoft Analysis Services, Essbase, TM1, Jedox, and icCube.
Relational OLAP (ROLAP)
ROLAP works directly with relational databases and does not require pre-computation. The base data and the dimension tables are stored as relational tables and new tables are created to hold the aggregated information. It depends on a specialized schema design. This methodology relies on manipulating the data stored in the relational database to give the appearance of traditional OLAP's slicing and dicing functionality. In essence, each action of slicing and dicing is equivalent to adding a "WHERE" clause in the SQL statement. ROLAP tools do not use pre-calculated data cubes but instead pose the query to the standard relational database and its tables in order to bring back the data required to answer the question. ROLAP tools feature the ability to ask any question because the methodology is not limited to the contents of a cube. ROLAP also has the ability to drill down to the lowest level of detail in the database.
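The correspondence between slicing and a "WHERE" clause can be shown with a small relational table. This is a minimal sketch using Python's built-in `sqlite3` module; the `sales` table, its columns and the `totals_by` helper are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, product TEXT, year INTEGER, amount INTEGER);
    INSERT INTO sales VALUES
        ('East', 'Widget', 2023, 100),
        ('East', 'Gadget', 2023, 50),
        ('West', 'Widget', 2023, 30),
        ('West', 'Widget', 2024, 40);
""")

# Column names are interpolated into the SQL text, so only allow known ones.
DIMENSIONS = {"region", "product", "year"}


def totals_by(dimension, **slice_on):
    """Total amount per member of `dimension`, restricted to the given slice."""
    if dimension not in DIMENSIONS or not set(slice_on) <= DIMENSIONS:
        raise ValueError("unknown dimension")
    sql = f"SELECT {dimension}, SUM(amount) FROM sales"
    if slice_on:
        # Each slicing condition becomes one more term in the WHERE clause.
        sql += " WHERE " + " AND ".join(f"{col} = ?" for col in slice_on)
    sql += f" GROUP BY {dimension} ORDER BY {dimension}"
    return conn.execute(sql, tuple(slice_on.values())).fetchall()


all_regions = totals_by("region")
widgets = totals_by("region", product="Widget")
widgets_2023 = totals_by("region", product="Widget", year=2023)
```

Each additional keyword argument narrows the slice by adding a condition, while the query still runs against the ordinary relational table rather than a pre-calculated cube.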
While ROLAP uses a relational database source, generally the database must be carefully designed for ROLAP use. A database which was designed for OLTP will not function well as a ROLAP database. Therefore, ROLAP still involves creating an additional copy of the data. However, since it is a database, a variety of technologies can be used to populate the database.
Advantages of ROLAP
* ROLAP is considered to be more scalable in handling large data volumes, especially models with dimensions with very high cardinality (i.e., millions of members).
* With a variety of data loading tools available, and the ability to fine-tune the extract, transform, load (ETL) code to the particular data model, load times are generally much shorter than with the automated MOLAP loads.
* The data are stored in a standard relational database and can be accessed by any SQL reporting tool (the tool does not have to be an OLAP tool).
* ROLAP tools are better at handling ''non-aggregatable facts'' (e.g., textual descriptions). MOLAP tools tend to suffer from slow performance when querying these elements.
* By decoupling the data storage from the multi-dimensional model, it is possible to successfully model data that would not otherwise fit into a strict dimensional model.
* The ROLAP approach can leverage database authorization controls such as row-level security, whereby the query results are filtered depending on preset criteria applied, for example, to a given user or group of users (SQL WHERE clause).
Disadvantages of ROLAP
* There is a consensus in the industry that ROLAP tools have slower performance than MOLAP tools. However, see the discussion below about ROLAP performance.
* The loading of ''aggregate tables'' must be managed by custom ETL code. The ROLAP tools do not help with this task. This means additional development time and more code to support.
* When the step of creating aggregate tables is skipped, the query performance then suffers because the larger detailed tables must be queried. This can be partially remedied by adding additional aggregate tables, however it is still not practical to create aggregate tables for all combinations of dimensions/attributes.
* ROLAP relies on the general purpose database for querying and caching, and therefore several special techniques employed by MOLAP tools are not available (such as special hierarchical indexing). However, modern ROLAP tools take advantage of the latest improvements in the SQL language such as CUBE and ROLLUP operators, DB2 Cube Views, as well as other SQL OLAP extensions. These SQL improvements can narrow the advantage of the MOLAP tools.
* Since ROLAP tools rely on SQL for all of the computations, they are not suitable when the model is heavy on calculations which don't translate well into SQL. Examples of such models include budgeting, allocations, financial reporting and other scenarios.
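The CUBE and ROLLUP operators mentioned in the list above are shorthand for sets of GROUP BY column lists. The sketch below only enumerates those grouping sets in Python to show the difference between the two; it does not execute SQL, and the function names `rollup_sets` and `cube_sets` are hypothetical.

```python
from itertools import combinations


def rollup_sets(cols):
    """ROLLUP(a, b, c): drop columns from the right, down to the grand total."""
    return [tuple(cols[:i]) for i in range(len(cols), -1, -1)]


def cube_sets(cols):
    """CUBE(a, b, c): every subset of the columns, largest first."""
    return [
        combo
        for r in range(len(cols), -1, -1)
        for combo in combinations(cols, r)
    ]


rollup = rollup_sets(["year", "region"])
cube = cube_sets(["year", "region"])
```

For n columns ROLLUP yields n + 1 grouping sets, following one hierarchy, while CUBE yields 2^n, covering every combination. The empty tuple stands for the grand total.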
Performance of ROLAP
In the OLAP industry ROLAP is usually perceived as being able to scale for large data volumes, but suffering from slower query performance as opposed to MOLAP. The OLAP Survey, the largest independent survey across all major OLAP products, conducted for 6 years (2001 to 2006), has consistently found that companies using ROLAP report slower performance than those using MOLAP even when data volumes were taken into consideration.
However, as with any survey there are a number of subtle issues that must be taken into account when interpreting the results.
* The survey shows that ROLAP tools have 7 times more users than MOLAP tools within each company. Systems with more users will tend to suffer more performance problems at peak usage times.
* There is also a question about complexity of the model, measured both in number of dimensions and richness of calculations. The survey does not offer a good way to control for these variations in the data being analyzed.
Downside of flexibility
Some companies select ROLAP because they intend to re-use existing relational database tables; these tables will frequently not be optimally designed for OLAP use. The superior flexibility of ROLAP tools allows this less than optimal design to work, but performance suffers. MOLAP tools in contrast would force the data to be re-loaded into an optimal OLAP design.
Hybrid OLAP (HOLAP)
The undesirable trade-off between additional ETL cost and slow query performance has ensured that most commercial OLAP tools now use a "Hybrid OLAP" (HOLAP) approach, which allows the model designer to decide which portion of the data will be stored in MOLAP and which portion in ROLAP.
There is no clear agreement across the industry as to what constitutes "Hybrid OLAP", except that a database will divide data between relational and specialized storage. For example, for some vendors, a HOLAP database will use relational tables to hold the larger quantities of detailed data, and use specialized storage for at least some aspects of the smaller quantities of more-aggregate or less-detailed data. HOLAP addresses the shortcomings of MOLAP and ROLAP by combining the capabilities of both approaches. HOLAP tools can utilize both pre-calculated cubes and relational data sources.
Vertical partitioning
In this mode HOLAP stores ''aggregations'' in MOLAP for fast query performance, and detailed data in ROLAP to optimize time of cube ''processing''.
Horizontal partitioning
In this mode HOLAP stores a slice of the data, usually the most recent one (i.e. sliced by the Time dimension), in MOLAP for fast query performance, and older data in ROLAP. Moreover, some dices can be stored in MOLAP and others in ROLAP, leveraging the fact that a large cuboid will contain both dense and sparse subregions.
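A minimal sketch of time-based routing, assuming a hypothetical cutoff year, a hypothetical `sales_history` table for older data, and a dictionary keyed by `(year, region)` standing in for the MOLAP slice:

```python
import sqlite3
from collections import defaultdict

CUTOFF_YEAR = 2023  # hypothetical boundary between the two stores

# Older data stays in a relational table (the "ROLAP" side).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_history (year INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales_history VALUES (?, ?, ?)",
    [(2021, "EU", 4.0), (2022, "EU", 6.0), (2022, "US", 3.0)],
)

# The recent slice is held in an in-memory cube keyed by (year, region).
recent_cube = defaultdict(float)
for year, region, amount in [(2023, "EU", 8.0), (2024, "EU", 2.0), (2024, "US", 9.0)]:
    recent_cube[(year, region)] += amount


def sales(year, region):
    """Route the query to the store that holds the requested time slice."""
    if year >= CUTOFF_YEAR:
        return recent_cube.get((year, region), 0.0)
    row = conn.execute(
        "SELECT COALESCE(SUM(amount), 0.0) FROM sales_history "
        "WHERE year = ? AND region = ?",
        (year, region),
    ).fetchone()
    return row[0]
```

The caller sees a single `sales` function; which store answers depends only on the value of the Time dimension.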
Products
The first product to provide HOLAP storage was Holos, but the technology also became available in other commercial products such as Microsoft Analysis Services, Oracle Database OLAP Option, MicroStrategy and SAP AG BI Accelerator. The hybrid OLAP approach combines ROLAP and MOLAP technology, benefiting from the greater scalability of ROLAP and the faster computation of MOLAP. For example, a HOLAP server may store large volumes of detailed data in a relational database, while aggregations are kept in a separate MOLAP store. The Microsoft SQL Server 7.0 OLAP Services supports a hybrid OLAP server.
Comparison
Each type has certain benefits, although there is disagreement about the specifics of the benefits between providers.
* Some MOLAP implementations are prone to database explosion, a phenomenon causing vast amounts of storage space to be used by MOLAP databases when certain common conditions are met: high number of dimensions, pre-calculated results and sparse multidimensional data.
* MOLAP generally delivers better performance due to specialized indexing and storage optimizations. MOLAP also needs less storage space compared to ROLAP because the specialized storage typically includes compression techniques.
* ROLAP is generally more scalable. However, large volume pre-processing is difficult to implement efficiently so it is frequently skipped. ROLAP query performance can therefore suffer tremendously.
* Since ROLAP relies more on the database to perform calculations, it has more limitations in the specialized functions it can use.
* HOLAP attempts to mix the best of ROLAP and MOLAP. It can generally pre-process swiftly, scale well, and offer good function support.
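The database explosion mentioned in the list above can be made concrete with some arithmetic. The sketch below assumes a cube without dimension hierarchies, in which each dimension either keeps one of its members or is rolled up to "all"; the function names and the example cardinalities are hypothetical.

```python
from math import prod


def cuboid_count(cardinalities):
    """Number of distinct group-bys: each dimension is either kept or rolled up."""
    return 2 ** len(cardinalities)


def full_cube_cells(cardinalities):
    """Upper bound on the cells of a fully pre-calculated cube.

    Each dimension contributes its members plus one extra "all" member.
    """
    return prod(c + 1 for c in cardinalities)


# Hypothetical cardinalities: products, days, stores, promotions.
dims = [1000, 365, 50, 20]
potential_cells = full_cube_cells(dims)
group_bys = cuboid_count(dims)
```

Every added dimension doubles the number of group-bys and multiplies the potential cell count, so when the base data is sparse, pre-calculating every aggregate can occupy far more space than the facts themselves.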
Other types
The following acronyms are also sometimes used, although they are not as widespread as the ones above:
* WOLAP – Web-based OLAP
* DOLAP – Desktop OLAP
* RTOLAP – Real-Time OLAP
* GOLAP – Graph OLAP
* CaseOLAP – Context-aware Semantic OLAP, developed for biomedical applications. The CaseOLAP platform includes data preprocessing (e.g., downloading, extraction, and parsing text documents), indexing and searching with Elasticsearch, creating a functional document structure called Text-Cube, and quantifying user-defined phrase-category relationships using the core CaseOLAP algorithm.
APIs and query languages
Unlike relational databases, which had SQL as the standard query language, and widespread APIs such as ODBC, JDBC and OLEDB, there was no such unification in the OLAP world for a long time. The first real standard API was the OLE DB for OLAP specification from Microsoft, which appeared in 1997 and introduced the MDX query language. Several OLAP vendors, both server and client, adopted it. In 2001 Microsoft and Hyperion announced the XML for Analysis specification, which was endorsed by most of the OLAP vendors. Since this also used MDX as a query language, MDX became the de facto standard.
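For readers unfamiliar with MDX, the small Python helper below assembles a basic MDX `SELECT` statement as a string. It is only an illustration: the helper `mdx_select`, the cube name `Sales`, and the member names are hypothetical, and sending the query to a server (for example through XML for Analysis) is not shown.

```python
def mdx_select(cube, column_set, row_set):
    """Build a two-axis MDX SELECT statement as a plain string."""
    return (
        f"SELECT {{{column_set}}} ON COLUMNS, "
        f"{{{row_set}}} ON ROWS "
        f"FROM [{cube}]"
    )


# Hypothetical cube with an Amount measure and a Date dimension.
query = mdx_select("Sales", "[Measures].[Amount]", "[Date].[Year].Members")
```

Where SQL returns a flat result set, an MDX query places sets of members on named axes, which maps directly onto the rows and columns of a pivot table.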
Since September 2011, LINQ can be used to query SSAS OLAP cubes from Microsoft .NET.
Products
History
The first product that performed OLAP queries was ''Express'', which was released in 1970 (and acquired by Oracle in 1995 from Information Resources). However, the term did not appear until 1993 when it was coined by Edgar F. Codd, who has been described as "the father of the relational database". Codd's paper resulted from a short consulting assignment which Codd undertook for former Arbor Software (later Hyperion Solutions, and in 2007 acquired by Oracle), as a sort of marketing coup. The company had released its own OLAP product, ''Essbase'', a year earlier. As a result, Codd's "twelve laws of online analytical processing" were explicit in their reference to Essbase. There was some ensuing controversy and when Computerworld learned that Codd was paid by Arbor, it retracted the article. The OLAP market experienced strong growth in the late 1990s with dozens of commercial products going into market. In 1998, Microsoft released its first OLAP Server, Microsoft Analysis Services, which drove wide adoption of OLAP technology and moved it into the mainstream.
Product comparison
OLAP clients
OLAP clients include many spreadsheet programs like Excel, web applications, SQL, dashboard tools, etc. Many clients support interactive data exploration where users select dimensions and measures of interest. Some dimensions are used as filters (for slicing and dicing the data) while others are selected as the axes of a pivot table or pivot chart. Users can also vary the aggregation level of the displayed view (drilling down or rolling up). Clients can also offer a variety of graphical widgets such as sliders, geographic maps, heat maps and more, which can be grouped and coordinated as dashboards. An extensive list of clients appears in the visualization column of the comparison of OLAP servers table.
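The interaction described above (filter dimensions, axis dimensions, and a changeable aggregation level) can be sketched with a small pivot function. The fact rows, the dimension names, and the `pivot` helper are hypothetical and are not taken from any particular client.

```python
from collections import defaultdict

# Hypothetical fact rows: four dimensions and one measure.
facts = [
    {"year": 2023, "quarter": "Q1", "region": "EU", "product": "A", "sales": 10},
    {"year": 2023, "quarter": "Q2", "region": "EU", "product": "A", "sales": 5},
    {"year": 2023, "quarter": "Q1", "region": "US", "product": "A", "sales": 7},
    {"year": 2024, "quarter": "Q1", "region": "EU", "product": "B", "sales": 3},
    {"year": 2024, "quarter": "Q1", "region": "EU", "product": "A", "sales": 4},
]


def pivot(rows, row_dim, col_dim, measure, filters=None):
    """Sum `measure` over `rows`, laid out by two axis dimensions.

    `filters` maps a dimension name to the single value to keep (a slice).
    """
    filters = filters or {}
    table = defaultdict(lambda: defaultdict(int))
    for r in rows:
        if all(r[d] == v for d, v in filters.items()):
            table[r[row_dim]][r[col_dim]] += r[measure]
    return {k: dict(v) for k, v in table.items()}


# Rolled-up view: years on rows, regions on columns.
by_year = pivot(facts, "year", "region", "sales")

# Drill down into 2023 by switching the row axis to quarters.
by_quarter_2023 = pivot(facts, "quarter", "region", "sales", {"year": 2023})

# Slice on product A while keeping the year-by-region layout.
product_a = pivot(facts, "year", "region", "sales", {"product": "A"})
```

Drilling down and rolling up amount to changing which dimension level is placed on an axis, while slicing only changes the filter.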
Market structure
Below is a list of top OLAP vendors in 2006, with figures in millions of US dollars.
Open-source
* Apache Pinot is used at LinkedIn, Cisco, Uber, Slack, Stripe, DoorDash, Target, Walmart, Amazon, and Microsoft to deliver scalable real time analytics with low latency. It can ingest data from offline data sources (such as Hadoop and flat files) as well as online sources (such as Kafka). Pinot is designed to scale horizontally.
* Mondrian OLAP server is an open-source OLAP server written in Java. It supports the MDX query language, the XML for Analysis and the olap4j interface specifications.
* Apache Druid is a popular open-source distributed data store for OLAP queries that is used at scale in production by various organizations.
* Apache Kylin is a distributed data store for OLAP queries originally developed by eBay.
* Cubes (OLAP server) is another light-weight open-source toolkit implementation of OLAP functionality in the Python programming language with built-in ROLAP.
* ClickHouse is a fairly new column-oriented DBMS focusing on fast processing and response times.
* DuckDB is an in-process SQL OLAP database management system.
See also
* Comparison of OLAP servers
* Functional Database Model
Bibliography
* Ling Liu and Tamer M. Özsu (Eds.) (2009). ''Encyclopedia of Database Systems''. 4100 p. 60 illus.