rasdaman ("raster data manager") is an Array DBMS, that is, a database management system which adds capabilities for storage and retrieval of massive multi-dimensional arrays, such as sensor, image, simulation, and statistics data. A frequently used synonym for arrays is ''raster data'', as in 2-D raster graphics; this in fact motivated the name ''rasdaman''. However, rasdaman places no limit on the number of dimensions: it can serve, for example, 1-D measurement data, 2-D satellite imagery, 3-D x/y/t image time series and x/y/z exploration data, 4-D ocean and climate data, and even data beyond spatio-temporal dimensions.


History

In 1989, Peter Baumann started research on database support for images, then at the Fraunhofer Computer Graphics Institute. Following an in-depth investigation of raster data formalizations in imaging, in particular the AFATL Image Algebra, he established a database model for multi-dimensional arrays, including a data model and a declarative query language, pioneering the field of Array Databases. Today, multi-dimensional arrays are also known as Data Cubes. At TU Munich, in the EU-funded basic research project ''RasDaMan'', a first prototype was established on top of the O2 object-oriented DBMS and tested in Earth and life science applications. Over further EU-funded projects, this system was completed and extended to support relational DBMSs. A dedicated research spin-off, rasdaman GmbH, was established to provide commercial support in addition to the research, which has subsequently been continued at Jacobs University. Since then, both entities have collaborated on the further development and use of the rasdaman technology.


Concepts


Data model

Based on an array algebra developed specifically for database purposes, rasdaman adds a new attribute type, array, to the relational model. Because this array definition is parametrized, it constitutes a second-order construct or template; this is reflected by the second-order functionals in the algebra and query language. For historical reasons, tables are called ''collections'', as the initial design emphasized an embedding into the object-oriented database standard, ODMG. Anticipating a full integration with SQL, a rasdaman collection represents a binary relation whose first attribute is an object identifier and whose second attribute is the array. This allows foreign key references to be established between arrays and regular relational tuples.
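
The binary-relation view of a collection can be sketched in a few lines of Python. This is only an illustrative model, not rasdaman's implementation: the class `Collection`, its `insert` method, and the `scenes` table are hypothetical names, and nested lists stand in for arrays.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Collection:
    """Hypothetical model of a collection: a binary relation (oid, array)."""
    name: str
    rows: dict = field(default_factory=dict)  # oid -> array
    _next_oid: count = field(default_factory=lambda: count(1), repr=False)

    def insert(self, array):
        """Store an array and return the object identifier assigned to it."""
        oid = next(self._next_oid)
        self.rows[oid] = array
        return oid


# An array collection plus an ordinary relational table that refers to it.
landsat = Collection("LandsatArchive")
oid = landsat.insert([[0, 1], [2, 3]])  # a tiny 2-D array

# Regular tuples reference arrays through the oid, like a foreign key.
scenes = [{"scene": "A", "acquired": "2001-05-17", "image_oid": oid}]

# Resolving the reference joins each relational tuple with its array.
joined = [(s["scene"], landsat.rows[s["image_oid"]]) for s in scenes]
```

Keeping the identifier separate from the array is what lets ordinary tables point at arrays without copying them.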


Raster Query Language

The rasdaman query language, rasql, embeds itself into standard SQL and its set-oriented processing. On the new attribute type, multi-dimensional arrays, it provides a set of extra operations, all of which are based on a minimal set of algebraically defined core operators: an ''array constructor'' (which establishes a new array and fills it with values) and an ''array condenser'' (which, similarly to SQL aggregates, derives scalar summary information from an array). The query language is declarative (and, hence, optimizable) and safe in evaluation, that is, every query is guaranteed to return after a finite number of processing steps.

The rasql query guide provides details; the following examples illustrate its use:

* "From all 4-D x/y/z/t climate simulation data cubes, a cutout which contains everything along x, a y extract between 100 and 200, everything available along z, and a slice at position 42 along t (effectively resulting in a 3-D x/y/z cube)":
    select c[ *:*, 100:200, *:*, 42 ] from ClimateSimulations as c
* "In all Landsat satellite images, suppress all non-green areas":
    select img * (img.green > 130) from LandsatArchive as img
  Note: this is a ''very'' naive phrasing of vegetation search; in practice one would use the NDVI formula, null values for cloud masking, and several more techniques.
* "All MRI images where, in some region defined by the bit masks, intensity exceeds a threshold of 250":
    select img from MRI as img, Masks as m where some_cells( img > 250 and m )
* "A 2-D x/y slice from all 4-D climate simulation data cubes, each one encoded in PNG format":
    select png( c[ *:*, *:*, 100, 42 ] ) from ClimateSimulations as c
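
The two core operators can be modelled in plain Python to show how other array operations reduce to them. This is a minimal sketch, not rasql itself: the functions `construct` and `condense` and the dict-based array representation are assumptions made for illustration.

```python
from functools import reduce
from itertools import product
import operator


def construct(domain, cell_fn):
    """Array constructor: build an array over `domain`, a list of inclusive
    (lo, hi) bounds per axis, filling each cell with cell_fn(point)."""
    points = product(*(range(lo, hi + 1) for lo, hi in domain))
    return {p: cell_fn(p) for p in points}


def condense(op, array, cell_fn=lambda v: v):
    """Array condenser: fold all cells into one scalar using `op`."""
    return reduce(op, (cell_fn(v) for v in array.values()))


# A 3 x 4 array whose cell values are x + 10*y.
a = construct([(0, 2), (0, 3)], lambda p: p[0] + 10 * p[1])

# The cell-wise comparison "a > 20" expressed through the constructor.
mask = construct([(0, 2), (0, 3)], lambda p: a[p] > 20)

# Aggregates expressed through the condenser.
total = condense(operator.add, a)        # sum over all cells
any_hit = condense(operator.or_, mask)   # analogue of some_cells( a > 20 )
```

Because both operators iterate over a finite domain, every expression built from them terminates, which mirrors the safe-evaluation property described above.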


Architecture


Storage management

Raster objects are maintained in a standard relational database, based on the partitioning of each raster object into ''tiles''. Aside from a regular subdivision, any user- or system-generated partitioning is possible. As tiles form the unit of disk access, it is critically important that the tiling pattern is adjusted to the query access patterns; several tiling strategies assist in establishing a well-performing tiling. A geo index is employed to quickly determine the tiles affected by a query. Optionally, tiles are compressed using one of several methods, including lossless and lossy (wavelet) algorithms; independently of that, query results can be compressed for transfer to the client. Both tiling strategy and compression are database tuning parameters.

Tiles and tile index are stored as BLOBs in a relational database, which also holds the data dictionary needed by rasdaman's dynamic type system. Adapters are available for several relational systems, among them the open-source PostgreSQL. For arrays larger than disk space, hierarchical storage management (HSM) support has been developed.
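
To make the role of tiling concrete, the following sketch determines which tiles a query box touches under a regular tiling. It is an assumption-laden illustration, not rasdaman's index: the function `tiles_for_query` is hypothetical, and for a regular grid simple integer arithmetic takes the place of an index structure, which irregular tilings would require.

```python
from itertools import product


def tiles_for_query(query, tile_shape):
    """Return the grid coordinates of all regular tiles that intersect
    `query`, a list of inclusive (lo, hi) bounds per axis."""
    ranges = [
        range(lo // size, hi // size + 1)
        for (lo, hi), size in zip(query, tile_shape)
    ]
    return list(product(*ranges))


# A 2-D array cut into 100 x 100 tiles; the query spans x 150..320, y 0..99.
hit = tiles_for_query([(150, 320), (0, 99)], (100, 100))
```

Here only three tiles need to be read from disk. A query running along y instead would favour tiles that are elongated in y, which is why the tiling pattern is treated as a tuning parameter.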


Query processing

Queries are parsed, optimised, and executed in the rasdaman server. The parser receives the query string and generates the operation tree. It then applies algebraic optimisation rules to the query tree where applicable; of the 150 algebraic rewriting rules, 110 are actually optimising while the other 40 serve to transform the query into canonical form. Parsing and optimisation together take less than a millisecond on a laptop. Execution follows a ''tile streaming'' paradigm: whenever possible, array tiles addressed by a query are fetched sequentially, and each tile is discarded after processing. This leads to an architecture scalable to data volumes exceeding server main memory by orders of magnitude.

Query execution is parallelised. First, rasdaman offers inter-query parallelism: a dispatcher schedules requests into a pool of server processes on a per-transaction basis. Second, intra-query parallelism transparently distributes query subtrees across available cores, GPUs, or cloud nodes.
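
The tile streaming idea can be sketched with a Python generator. This is a simplified illustration rather than the server's actual execution engine: `stream_tiles`, `streamed_max`, and the in-memory `fake_store` are hypothetical stand-ins, and each tile is reduced to a flat list of cell values.

```python
def stream_tiles(tile_ids, load_tile):
    """Yield tiles one at a time so that only one is resident in memory."""
    for tile_id in tile_ids:
        yield load_tile(tile_id)


def streamed_max(tile_ids, load_tile):
    """Compute the maximum over all cells without holding the whole array."""
    best = None
    for tile in stream_tiles(tile_ids, load_tile):
        tile_max = max(tile)
        best = tile_max if best is None else max(best, tile_max)
        # `tile` is rebound on the next iteration, so its memory can be freed.
    return best


# Stand-in for storage: each tile is a flat list of cell values.
fake_store = {0: [3, 9, 4], 1: [12, 1, 7], 2: [5, 5, 8]}
result = streamed_max(sorted(fake_store), fake_store.__getitem__)
```

Memory use is bounded by the size of one tile plus the running result, independent of the total array size.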


Client APIs

The primary interface to rasdaman is the query language. Embeddings into C++ and Java APIs allow invocation of queries and provide client-side convenience functions for array handling. Arrays per se are delivered in the main memory format of the client language and processor architecture, ready for further processing. Data format codecs allow arrays to be retrieved in common raster formats, such as CSV, PNG, and NetCDF. A Web design toolkit, raswct, simplifies the creation of Web query frontends; it includes graphical widgets for parametrized query handling, such as sliders for thresholds in queries.


Geo Web Services

A Java servlet, ''petascope'', running as a rasdaman client, offers Web service interfaces specifically for geo data access, processing, and filtering. The following OGC standards are supported: WMS, WCS, WCPS, and WPS. For WCS and WCPS, rasdaman is the reference implementation.
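
As an illustration of what a client request to such a service may look like, the sketch below assembles a WCS 2.0 GetCoverage URL in key-value-pair encoding. The endpoint `https://example.org/rasdaman/ows`, the coverage name `ExampleCoverage`, the axis labels, and the helper `wcs_get_coverage` are all assumptions for illustration; an actual deployment defines its own endpoint, coverages, and axis names.

```python
from urllib.parse import urlencode

# Hypothetical service endpoint, for illustration only.
ENDPOINT = "https://example.org/rasdaman/ows"


def wcs_get_coverage(coverage_id, subsets, fmt):
    """Build a WCS 2.0 key-value-pair GetCoverage URL."""
    params = [
        ("service", "WCS"),
        ("version", "2.0.1"),
        ("request", "GetCoverage"),
        ("coverageId", coverage_id),
    ]
    # One `subset` parameter per axis, e.g. subset=Lat(40,50).
    params += [
        ("subset", f"{axis}({lo},{hi})") for axis, (lo, hi) in subsets.items()
    ]
    params.append(("format", fmt))
    # urlencode percent-encodes parentheses, commas, and slashes.
    return ENDPOINT + "?" + urlencode(params)


url = wcs_get_coverage(
    "ExampleCoverage", {"Lat": (40, 50), "Long": (5, 15)}, "image/tiff"
)
```

The subset parameters play the same role as the trimming intervals in a rasql query, expressed in the coordinates of the coverage rather than in array indices.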


Status and license model

Today, rasdaman is a fully-fledged implementation offering select / insert / update / delete array query functionality. It is used in both research and commercial installations. In a collaboration between the original code owner, rasdaman GmbH, and Jacobs University, a code split was performed in 2008-2009, resulting in ''rasdaman community'', an open-source branch, and ''rasdaman enterprise'', the commercial branch. Since then, ''rasdaman community'' has been maintained by Jacobs University, whereas ''rasdaman enterprise'' remains proprietary to rasdaman GmbH. The difference between the two variants consists mainly of performance boosters (such as specific optimization techniques) intended to support particularly large databases, user numbers, and complex queries. Details are available on the ''rasdaman community'' website. The ''rasdaman community'' license releases the server under the GPL and all client parts under the LGPL, thereby allowing the use of the system in any kind of license environment.


Impact and use

Being the first Array DBMS shipped (first prototype available in 1996), rasdaman has shaped this recent database research domain. Concepts of its data and query model (declarativeness, sometimes the choice of operators) reappear in more recent approaches. In 2008, the Open Geospatial Consortium released the Web Coverage Processing Service standard, which defines a raster query language based on the concept of a coverage. Its operator semantics is influenced by the rasdaman array algebra. EarthLook, which is built on rasdaman, is a showcase for OGC coverage standards in action, offering 1-D through 4-D use cases of raster data access and ad-hoc processing. A sample large project in which rasdaman is used for large-scale services across the Earth sciences is EarthServer: six services with a volume of at least 100 terabytes each have been set up for integrated data/metadata retrieval and distributed query processing.

