HOME

TheInfoList



OR:

In
computing Computing is any goal-oriented activity requiring, benefiting from, or creating computing machinery. It includes the study and experimentation of algorithmic processes, and development of both hardware and software. Computing has scientific, e ...
, the term data warehouse appliance (DWA) was coined by Foster HinshawInfostor » Introducing 'data warehouse appliances'
/ref>TDWI » Still Another Data Warehouse Appliance Is Coming!
/ref> for a computer architecture for
data warehouse In computing, a data warehouse (DW or DWH), also known as an enterprise data warehouse (EDW), is a system used for Business reporting, reporting and data analysis and is considered a core component of business intelligence. DWs are central Repos ...
s (DW) specifically marketed for
big data Though used sometimes loosely partly because of a lack of formal definition, the interpretation that seems to best describe Big data is the one associated with large body of information that we could not comprehend when used only in smaller am ...
analysis Analysis ( : analyses) is the process of breaking a complex topic or substance into smaller parts in order to gain a better understanding of it. The technique has been applied in the study of mathematics and logic since before Aristotle (38 ...
and discovery that is simple to use (not a pre-configuration) and has a high performance for the workload. A DWA includes an integrated set of servers, storage,
operating system An operating system (OS) is system software that manages computer hardware, software resources, and provides common services for computer programs. Time-sharing operating systems schedule tasks for efficient use of the system and may also in ...
s, and
database In computing, a database is an organized collection of data stored and accessed electronically. Small databases can be stored on a file system, while large databases are hosted on computer clusters or cloud storage. The design of databases sp ...
s. In marketing, the term evolved to include pre-installed and pre-optimized hardware and software as well as similar software-only systems promoted as easy to install on specific recommended hardware configurations or preconfigured as a complete system. These are marketing uses of the term and do not reflect the technical definition. A DWA is designed specifically for high performance big data analytics and is delivered as an easy-to-use packaged system. DW appliances are marketed for data volumes in the
terabyte The byte is a units of information, unit of digital information that most commonly consists of eight bits. Historically, the byte was the number of bits used to encode a single character (computing), character of text in a computer and for this ...
to
petabyte The byte is a unit of digital information that most commonly consists of eight bits. Historically, the byte was the number of bits used to encode a single character of text in a computer and for this reason it is the smallest addressable unit ...
range.


Technology

The data warehouse appliance (DWA) has several characteristics which differentiate that architecture from similar machines in a
data center A data center (American English) or data centre (British English)See spelling differences. is a building, a dedicated space within a building, or a group of buildings used to house computer systems and associated components, such as telecommunic ...
, such as an enterprise data warehouse (EDW). # A DWA has a very tight integration of its internal components which are optimized for "data-centric" operations in contrast to "compute-centric" operations. The latter tend to emphasize number of CPU's, cores and network bandwidth. # A DWA is trivial to use and install. In contrast to a "pre-configuration" of components, a DWA has very few configuration switches or options. The elimination of such options significantly reduces configuration error – the number one cause for failure in large systems. # A DWA is optimized for analytics on
big data Though used sometimes loosely partly because of a lack of formal definition, the interpretation that seems to best describe Big data is the one associated with large body of information that we could not comprehend when used only in smaller am ...
. In contrast, preceding architectures (including parallel ones) focused on "enterprise data warehouse" being a general-purpose repository for data and supporting analytics as an ancillary task. Most DW appliances use
massively parallel Massively parallel is the term for using a large number of computer processors (or separate computers) to simultaneously perform a set of coordinated computations in parallel. GPUs are massively parallel architecture with tens of thousands of t ...
processing (MPP) architectures to provide high query performance and platform
scalability Scalability is the property of a system to handle a growing amount of work by adding resources to the system. In an economic context, a scalable business model implies that a company can increase sales given increased resources. For example, a ...
. MPP architectures consist of independent processors or servers executing in parallel. Most MPP architectures implement a "
shared-nothing architecture A shared-nothing architecture (SN) is a distributed computing architecture in which each update request is satisfied by a single node (processor/memory/storage unit) in a computer cluster. The intent is to eliminate contention among nodes. Nodes do ...
" where each server operates self-sufficiently and controls its own memory and disk. DW appliances distribute data onto dedicated disk storage units connected to each server in the appliance. This distribution allows DW appliances to resolve a
relational query A relational database is a (most commonly digital) database based on the relational model of data, as proposed by E. F. Codd in 1970. A system used to maintain relational databases is a relational database management system (RDBMS). Many relati ...
by scanning data on each server in parallel. The divide-and-conquer approach delivers high performance and scales linearly as new servers are added into the architecture.


History

"Data warehouse appliance" is a term coined by Foster Hinshaw, the founder of
Netezza IBM Netezza (pronounced ne-teez-a) is a subsidiary of American technology company IBM that designs and markets high-performance data warehouse appliances and advanced analytics applications for uses including enterprise data warehousing, busines ...
. In creating the first data warehouse appliance, Hinshaw and Netezza used the foundations developed by
Model 204 Model 204 (M204) is a database management system for IBM and compatible mainframe computers developed and commercialized by Computer Corporation of America. It was announced in 1965, and first deployed in 1972. It incorporates a programming langua ...
,
Teradata Teradata Corporation is an American software company that provides cloud database and analytics-related software, products, and services. The company was formed in 1979 in Brentwood, California, as a collaboration between researchers at Caltech a ...
, and others, to pioneer a new category to address consumer analytics efficiently by providing a modular, scalable, easy-to-manage database system that’s cost effective. MPP database architectures have a long pedigree. Some consider
Teradata Teradata Corporation is an American software company that provides cloud database and analytics-related software, products, and services. The company was formed in 1979 in Brentwood, California, as a collaboration between researchers at Caltech a ...
's initial product as the first DW appliance — or Britton-Lee's. Teradata acquired Britton Lee — renamed ShareBase — in June, 1990. Others disagree, considering appliances as a "disruptive technology" for Teradata Additional vendors, including
Tandem Computers Tandem Computers, Inc. was the dominant manufacturer of fault-tolerant computer systems for Automated teller machine, ATM networks, banks, stock exchanges, telephone switching centers, and other similar commercial transaction processing applicati ...
, and
Sequent Computer Systems Sequent Computer Systems was a computer company that designed and manufactured multiprocessing computer systems. They were among the pioneers in high-performance symmetric multiprocessing (SMP) open systems, innovating in both hardware (e.g., cach ...
also offered MPP architectures in the 1980s.
Open source Open source is source code that is made freely available for possible modification and redistribution. Products include permission to use the source code, design documents, or content of the product. The open-source model is a decentralized sof ...
and
commodity computing Commodity computing (also known as commodity cluster computing) involves the use of large numbers of already-available computing components for parallel computing, to get the greatest amount of useful computation at low cost. It is computing done i ...
components aided a re-emergence of MPP data warehouse appliances. Advances in technology reduced costs and improved performance in storage devices,
multi-core A multi-core processor is a microprocessor on a single integrated circuit with two or more separate processing units, called cores, each of which reads and executes program instructions. The instructions are ordinary CPU instructions (such a ...
CPUs and networking components. Open-source
RDBMS A relational database is a (most commonly digital) database based on the relational model of data, as proposed by E. F. Codd in 1970. A system used to maintain relational databases is a relational database management system (RDBMS). Many relation ...
products, such as
Ingres Jean-Auguste-Dominique Ingres ( , ; 29 August 1780 – 14 January 1867) was a French Neoclassicism, Neoclassical Painting, painter. Ingres was profoundly influenced by past artistic traditions and aspired to become the guardian of academic ...
and
PostgreSQL PostgreSQL (, ), also known as Postgres, is a free and open-source relational database management system (RDBMS) emphasizing extensibility and SQL compliance. It was originally named POSTGRES, referring to its origins as a successor to the In ...
, reduce software-license costs and allow DW-appliance vendors to focus on optimization rather than providing basic database functionality. Open-source
Linux Linux ( or ) is a family of open-source Unix-like operating systems based on the Linux kernel, an operating system kernel first released on September 17, 1991, by Linus Torvalds. Linux is typically packaged as a Linux distribution, which ...
became a common operating system for DW appliances. Other DW appliance vendors use specialized hardware and advanced software, instead of MPP architectures.http://www.tpc.org/tpch/results/tpch_price_perf_results.asp
Netezza IBM Netezza (pronounced ne-teez-a) is a subsidiary of American technology company IBM that designs and markets high-performance data warehouse appliances and advanced analytics applications for uses including enterprise data warehousing, busines ...
announced a "data appliance" in 2003, and used specialized
field-programmable gate array A field-programmable gate array (FPGA) is an integrated circuit designed to be configured by a customer or a designer after manufacturinghence the term '' field-programmable''. The FPGA configuration is generally specified using a hardware d ...
hardware. Kickfire followed in 2008 with what they called a
dataflow In computing, dataflow is a broad concept, which has various meanings depending on the application and context. In the context of software architecture, data flow relates to stream processing or reactive programming. Software architecture Dataf ...
"sql chip". In 2009 more DW appliances emerged. IBM integrated its
InfoSphere Infosphere (''information'' + -''sphere''), analogous to a biosphere, is a metaphysical realm of information, data, knowledge, and communication, populated by informational entities called ''inforgs'' (or, ''informational organisms''). Though on ...
warehouse (formerly DB2 Warehouse) with its own servers and storage to create the IBM InfoSphere Balanced Warehouse. Netezza introduced its TwinFin platform based on commodity IBM hardware. Other DW appliance vendors have also partnered with major hardware vendors. DATAllegro, prior to acquisition by
Microsoft Microsoft Corporation is an American multinational technology corporation producing computer software, consumer electronics, personal computers, and related services headquartered at the Microsoft Redmond campus located in Redmond, Washing ...
, partnered with
EMC Corporation Dell EMC (EMC Corporation until 2016) is an American multinational corporation headquartered in Hopkinton, Massachusetts and Round Rock, Texas, United States. Dell EMC sells data storage, information security, virtualization, analytics, cloud ...
and
Dell Dell is an American based technology company. It develops, sells, repairs, and supports computers and related products and services. Dell is owned by its parent company, Dell Technologies. Dell sells personal computers (PCs), servers, data ...
and implemented open-source Ingres on Linux.
Greenplum Greenplum is a big data technology based on MPP architecture and the Postgres open source database technology. The technology was created by a company of the same name headquartered in San Mateo, California around 2005. Greenplum was acquired ...
had a partnership with
Sun Microsystems Sun Microsystems, Inc. (Sun for short) was an American technology company that sold computers, computer components, software, and information technology services and created the Java programming language, the Solaris operating system, ZFS, the ...
and implements Greenplum Database (based on PostgreSQL) on
Solaris Solaris may refer to: Arts and entertainment Literature, television and film * ''Solaris'' (novel), a 1961 science fiction novel by Stanisław Lem ** ''Solaris'' (1968 film), directed by Boris Nirenburg ** ''Solaris'' (1972 film), directed by ...
using the
ZFS ZFS (previously: Zettabyte File System) is a file system with volume management capabilities. It began as part of the Sun Microsystems Solaris operating system in 2001. Large parts of Solaris – including ZFS – were published under an ope ...
file system. HP Neoview uses HP
NonStop SQL NonStop SQL is a commercial relational database management system that is designed for fault tolerance and scalability, currently offered by Hewlett Packard Enterprise. The latest version is SQL/MX 3.4. The product was originally developed by Tan ...
. The market has also seen the emergence of data-warehouse bundles where vendors combine their hardware and database software together as a data warehouse platform. The
Oracle An oracle is a person or agency considered to provide wise and insightful counsel or prophetic predictions, most notably including precognition of the future, inspired by deities. As such, it is a form of divination. Description The word '' ...
Optimized Warehouse Initiative combines the Oracle Database with hardware from various computer manufacturers (
Dell Dell is an American based technology company. It develops, sells, repairs, and supports computers and related products and services. Dell is owned by its parent company, Dell Technologies. Dell sells personal computers (PCs), servers, data ...
, EMC, HP, IBM,
SGI SGI may refer to: Companies *Saskatchewan Government Insurance *Scientific Games International, a gambling company *Silicon Graphics, Inc., a former manufacturer of high-performance computing products *Silicon Graphics International, formerly Rac ...
and
Sun Microsystems Sun Microsystems, Inc. (Sun for short) was an American technology company that sold computers, computer components, software, and information technology services and created the Java programming language, the Solaris operating system, ZFS, the ...
). Oracle's Optimized Warehouses offer pre-validated configurations and the database software comes pre-installed. In September 2008 Oracle began offering a more classic appliance offering, the HP Oracle Database Machine, a jointly developed and co-branded platform that Oracle sold and supported and HP built in configurations specifically for Oracle. In September 2009, Oracle released a second-generation
Exadata The Oracle Exadata Database Machine (Exadata) is a computing platform optimized for running Oracle Databases. Exadata is a combined hardware and software platform that includes scale-out Intel x86-64 compute and storage servers, RoCE or Infini ...
system, based on their acquired
Sun Microsystems Sun Microsystems, Inc. (Sun for short) was an American technology company that sold computers, computer components, software, and information technology services and created the Java programming language, the Solaris operating system, ZFS, the ...
hardware.


See also

*
Business intelligence Business intelligence (BI) comprises the strategies and technologies used by enterprises for the data analysis and management of business information. Common functions of business intelligence technologies include reporting, online analytical pr ...
(BI) * Data mining *
Data mart A data mart is a structure/access pattern specific to ''data warehouse'' environments, used to retrieve client-facing data. The data mart is a subset of the data warehouse and is usually oriented to a specific business line or team. Whereas data w ...
*
Data warehouse In computing, a data warehouse (DW or DWH), also known as an enterprise data warehouse (EDW), is a system used for Business reporting, reporting and data analysis and is considered a core component of business intelligence. DWs are central Repos ...


References


External links

*
DBMS2 - Positioning the data warehouse appliances
{{DEFAULTSORT:Data Warehouse Appliance Data warehousing