Data cube
   HOME

TheInfoList



OR:

In computer programming contexts, a data cube (or datacube) is a multi-dimensional ("n-D") array of values. Typically, the term data cube is applied in contexts where these arrays are massively larger than the hosting computer's main memory; examples include multi-terabyte/petabyte data warehouses and
time series In mathematics, a time series is a series of data points indexed (or listed or graphed) in time order. Most commonly, a time series is a sequence taken at successive equally spaced points in time. Thus it is a sequence of discrete-time data. Ex ...
of image data. The data cube is used to represent data (sometimes called facts) along some dimensions of interest. For example, in online analytical processing (OLAP) such dimensions could be the subsidiaries a company has, the products the company offers, and time; in this setup, a fact would be a sales event where a particular product has been sold in a particular subsidiary at a particular time. In satellite image timeseries dimensions would be latitude and longitude coordinates and time; a fact (sometimes called measure) would be a pixel at a given space and time as taken by the satellite (following some processing that is not of concern here). Even though it is called a ''cube'' (and the examples provided above happen to be 3-dimensional for brevity), a data cube generally is a multi-dimensional concept which can be 1-dimensional, 2-dimensional, 3-dimensional, or higher-dimensional. In any case, every dimension divides data into groups of cells whereas each cell in the cube represents a single measure of interest. Sometimes cubes hold only few values with the rest being ''empty'', i.e. undefined, sometimes most or all cube coordinates hold a cell value. In the first case such data are called ''sparse'', in the second case they are called ''dense'', although there is no hard delineation between both.


History

Multi-dimensional arrays have long been familiar in programming languages. Fortran offers arbitrarily-indexed 1-D arrays and arrays of arrays, which allows the construction of higher-dimensional arrays, up to 15 dimensions. APL supports n-D arrays with a rich set of operations. All these have in common that arrays must fit into the main memory and are available only while the particular program maintaining them (such as image processing software) is running. A series of data exchange formats support storage and transmission of data cube-like data, often tailored towards particular application domains. Examples include MDX for statistical (in particular, business) data, Hierarchical Data Format for general scientific data, and TIFF for imagery. In 1992, Peter Baumann introduced management of massive data cubes with high-level user functionality combined with an efficient software architecture. Datacube operations include subset extraction, processing, fusion, and in general queries in the spirit of data manipulation languages like SQL. Some years after, the data cube concept was applied to describe time-varying business data as data cubes by Jim Gray, et al., and by
Venky Harinarayan Venky Harinarayan is an Indian entrepreneur. He is the co-founder of Cambrian Ventures and Kosmix. Harinarayan also co-founded Junglee Corp. and played a significant role at Amazon.com in the late 1990s. Originally from Bombay, India, Harinara ...
,
Anand Rajaraman Anand Rajaraman is a Web and technology entrepreneur. He is the co-founder of Cambrian Ventures and Kosmix. Rajaraman also co-founded former Junglee Corp. and played a significant role at Amazon.com in the late 1990s. Personal life and education ...
and Jeff Ullman which rank among the top 500 most cited computer science articles over a 25-year period. Around that time, a working group on Multi-Dimensional Databases ("Arbeitskreis Multi-Dimensionale Datenbanken") was established at German
Gesellschaft für Informatik The German Informatics Society (GI) (german: Gesellschaft für Informatik) is a German professional society for computer science, with around 20,000 personal and 250 corporate members. It is the biggest organized representation of its kind in the ...
. Datacube Inc. was an image processing company selling hardware and
software Software is a set of computer programs and associated documentation and data. This is in contrast to hardware, from which the system is built and which actually performs the work. At the lowest programming level, executable code consist ...
applications for the PC market in 1996, however without addressing data cubes as such. The EarthServer initiative has established geo data cube service requirements.


Standardization

In 2018, the ISO SQL database language was extended with data cube functionality as "SQL -- Part 15: Multi-dimensional arrays (SQL/MDA)". Web Coverage Processing Service is a geo data cube analytics language issued by the Open Geospatial Consortium in 2008. In addition to the common data cube operations, the language knows about the semantics of space and time and supports both regular and irregular grid data cubes, based on the concept of coverage data. An industry standard for querying business data cubes, originally developed by
Microsoft Microsoft Corporation is an American multinational technology corporation producing computer software, consumer electronics, personal computers, and related services headquartered at the Microsoft Redmond campus located in Redmond, Washi ...
, is MultiDimensional eXpressions.


Implementation

Many high-level computer languages treat data cubes and other large arrays as single entities distinct from their contents. These languages, of which Fortran, APL, IDL, NumPy,
PDL PDL is an initialism for: Politics * Democratic Liberal Party (''Partidul Democrat Liberal''), a former political party in Romania *Labour Democratic Party (''Partito Democratico del Lavoro''), a former political party in Italy *Pole of Freedom ...
, and S-Lang are examples, allow the programmer to manipulate complete
film A film also called a movie, motion picture, moving picture, picture, photoplay or (slang) flick is a work of visual art that simulates experiences and otherwise communicates ideas, stories, perceptions, feelings, beauty, or atmospher ...
clips and other data en masse with simple expressions derived from
linear algebra Linear algebra is the branch of mathematics concerning linear equations such as: :a_1x_1+\cdots +a_nx_n=b, linear maps such as: :(x_1, \ldots, x_n) \mapsto a_1x_1+\cdots +a_nx_n, and their representations in vector spaces and through matrice ...
and vector mathematics. Some languages (such as PDL) distinguish between a list of images and a data cube, while many (such as IDL) do not. Array DBMSs (Database Management Systems) offer a data model which generically supports definition, management, retrieval, and manipulation of n-dimensional data cubes. This database category has been pioneered by the rasdaman system since 1994.


Applications

Multi-dimensional arrays can meaningfully represent spatio-temporal sensor, image, and simulation data, but also statistics data where the semantics of dimensions is not necessarily of spatial or temporal nature. Generally, any kind of axis can be combined with any other into a data cube.


Mathematics

In mathematics, a one-dimensional array corresponds to a vector, a two-dimensional array resembles a
matrix Matrix most commonly refers to: * ''The Matrix'' (franchise), an American media franchise ** '' The Matrix'', a 1999 science-fiction action film ** "The Matrix", a fictional setting, a virtual reality environment, within ''The Matrix'' (franchi ...
; more generally, a tensor may be represented as an n-dimensional data cube.


Science and engineering

For a time sequence of color images, the array is generally four-dimensional, with the dimensions representing image X and Y coordinates, time, and RGB (or other
color space A color space is a specific organization of colors. In combination with color profiling supported by various physical devices, it supports reproducible representations of colorwhether such representation entails an analog or a digital represen ...
) color plane. For example, the EarthServer initiative unites data centers from different continents offering 3-D x/y/t satellite image timeseries and 4-D x/y/z/t weather data for retrieval and server-side processing through the Open Geospatial Consortium WCPS geo data cube query language standard. A data cube is also used in the field of
imaging spectroscopy In imaging spectroscopy (also hyperspectral imaging or spectral imaging) each pixel of an image acquires many bands of light intensity data from the spectrum, instead of just the three bands of the RGB color model. More precisely, it is the simu ...
, since a spectrally-resolved image is represented as a three-dimensional volume. Earth observation data cubes combine satellite imagery such as Landsat 8 and Sentinel-2 with Geographic information system analytics.


Business intelligence

In online analytical processing (OLAP), data cubes are a common arrangement of business data suitable for analysis from different perspectives through operations like slicing, dicing, pivoting, and aggregation.


See also

* Array DBMS * rasdaman * OLAP cube * Australian Geoscience Data Cube *
Graph (discrete mathematics) In discrete mathematics, and more specifically in graph theory, a graph is a structure amounting to a set of objects in which some pairs of the objects are in some sense "related". The objects correspond to mathematical abstractions called '' ...
* Abstract semantic graph *
Apache Kylin Apache Kylin is an open source distributed analytics engine designed to provide a SQL interface and multi-dimensional analysis (OLAP) on Hadoop and Alluxio supporting extremely large datasets. It was originally developed by eBay, and is now a ...


References

{{DEFAULTSORT:Data Cube Image processing Database theory