Actian Vector (formerly known as VectorWise) is an SQL relational database management system designed for high performance in analytical database applications. It published record-breaking results on the Transaction Processing Performance Council's TPC-H benchmark for database sizes of 100 GB, 300 GB, 1 TB and 3 TB on non-clustered hardware.

VectorWise originated from the X100 research project carried out within the Centrum Wiskunde & Informatica (CWI, the Dutch National Research Institute for Mathematics and Computer Science) between 2003 and 2008. It was spun off as a start-up company in 2008, and acquired by Ingres Corporation in 2011. It was released as a commercial product in June 2010, initially for the 64-bit Linux platform, and later also for Windows. Starting with the 3.5 release in April 2014, the product name was shortened to "Vector". In June 2014, Actian Vortex was announced as a clustered massively parallel processing (MPP) version of Vector that runs in Hadoop with storage in HDFS. Actian Vortex was later renamed Actian Vector in Hadoop.


Technology

The basic architecture and design principles of the X100 engine of the VectorWise database are described in the PhD theses of two VectorWise founders: "Balancing Vectorized Query Execution with Bandwidth-Optimized Storage" by Marcin Żukowski and "Updating Compressed Column Stores" by Sandor Héman, both supervised by another founder, Professor Peter Boncz. The X100 engine was integrated with the Ingres SQL front-end, allowing the database to use the Ingres SQL syntax and the Ingres set of client and database administration tools.

The query execution architecture uses "Vectorized Query Execution", which processes data in chunks of cache-fitting vectors. This makes it possible to apply the principles of vector processing and single instruction, multiple data (SIMD) to perform the same operation on multiple data items simultaneously and to exploit data-level parallelism on modern hardware. It also reduces the overheads of the traditional "row-at-a-time processing" found in most RDBMSes.

The database is stored in a compressed column-oriented format, with a scan-optimised buffer manager. Actian Vortex uses the same proprietary format in HDFS. Loading large amounts of data is supported through direct appends to stable storage, while small transactional updates are supported through patent-pending Positional Delta Trees (PDTs): specialized B-tree-like structures of indexed differences on top of stable storage, which are seamlessly patched in during scans and transparently propagated to stable storage by a background process. Storing differences in patch-like structures and rewriting the stable storage in bulk made it possible to work in a filesystem like HDFS, in which files are append-only.
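
The interplay of vector-at-a-time execution and scan-time patching of differences can be illustrated with a small Python sketch. It is not Vector's implementation: the names `scan_vectors`, `select_greater_than`, `sum_greater_than` and `VECTOR_SIZE` are hypothetical, the differences are held in a flat dictionary and set rather than in a Positional Delta Tree, only updates and deletes are modelled, and plain Python lists stand in for compressed columns and SIMD primitives.

```python
from typing import Dict, Iterator, List, Set

# Hypothetical vector length; a real engine would size vectors to fit the CPU cache.
VECTOR_SIZE = 4


def scan_vectors(stable: List[int],
                 modified: Dict[int, int],
                 deleted: Set[int],
                 vector_size: int = VECTOR_SIZE) -> Iterator[List[int]]:
    """Yield one column as fixed-size vectors, patching differences in on the fly.

    `stable` is never changed; `modified` maps a position to its new value and
    `deleted` holds positions removed since the stable image was written.
    (Assumption: a flat stand-in for a PDT, without positional inserts.)
    """
    vector: List[int] = []
    for pos, value in enumerate(stable):
        if pos in deleted:
            continue  # skip rows deleted after the stable image was written
        vector.append(modified.get(pos, value))  # use the updated value if present
        if len(vector) == vector_size:
            yield vector
            vector = []
    if vector:
        yield vector  # final, possibly shorter, vector


def select_greater_than(vector: List[int], bound: int) -> List[int]:
    """Primitive: filter a whole vector in one tight loop."""
    return [v for v in vector if v > bound]


def sum_greater_than(vectors: Iterator[List[int]], bound: int) -> int:
    """Operator: invoked once per vector instead of once per row."""
    total = 0
    for vector in vectors:
        total += sum(select_greater_than(vector, bound))
    return total


stable_column = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
modified = {2: 35}  # position 2 now holds 35 instead of 30
deleted = {7}       # position 7 (value 80) has been deleted

# Roughly: SELECT SUM(col) FROM t WHERE col > 40, evaluated over the patched column.
result = sum_greater_than(scan_vectors(stable_column, modified, deleted), 40)
print(result)  # 370
```

The stable column is read-only throughout, which mirrors why this style of update handling suits append-only storage: differences accumulate on the side and could later be folded into a freshly written stable image in bulk.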


History

A comparative Transaction Processing Performance Council TPC-H performance test of MonetDB, carried out by its original creator at Centrum Wiskunde & Informatica (CWI) in 2003, showed room for improvement in its performance as an analytical database. As a result, CWI researchers proposed a new architecture using pipelined query processing ("vectorised processing") to improve the performance of analytical queries. This led to the creation of the "X100" project, with the intention of designing a new kernel for MonetDB, to be called "MonetDB/X100".

The X100 project team won the 2007 DaMoN Best Paper Award for the paper "Vectorized Data Processing on the Cell Broadband Engine" as well as the 2008 DaMoN Best Paper Award for the paper "DSM vs. NSM: CPU Performance Tradeoffs in Block-Oriented Query Processing". In August 2009, the originators of the X100 project won the "Ten Year Best Paper Award" at the 35th International Conference on Very Large Data Bases (VLDB) for their 1999 paper "Database Architecture Optimized for the New Bottleneck: Memory Access". The VLDB recognised that the project team had made great progress in implementing the ideas contained in the paper over the previous 10 years.

The central premise of the paper is that traditional relational database systems were designed in the late 1970s and early 1980s, when database performance was dictated by the time required to read data from and write data to hard disk. At that time the available CPUs were relatively slow and main memory was relatively small, so that very little data could be loaded into memory at a time. Over time hardware improved, with CPU speed and memory size doubling roughly every two years in accordance with Moore's law, but the design of traditional relational database systems did not adapt. The CWI research team described improvements in database code and data structures to make best use of modern hardware.

In 2008 the X100 project was spun off from MonetDB as a separate project, with its own company, and renamed "VectorWise". Co-founders included Peter A. Boncz and Marcin Żukowski.

In June 2010, the VectorWise technology was officially announced by Ingres Corporation, with the release of Ingres VectorWise 1.0. In March 2011, VectorWise 1.5 was released, publishing a record-breaking result on the TPC-H 100 GB benchmark. New features included parallel query execution (a single query executed on multiple CPU cores), improved bulk loading and enhanced SQL support. In June 2011, VectorWise 1.6 was released, publishing record-breaking results on the TPC-H 100 GB, 300 GB and 1 TB non-clustered benchmarks.

In December 2011, VectorWise 2.0 was released with new SQL support for analytical functions such as rank and percentile, enhanced date, time and timestamp data types, and support for disk spilling in hash joins and aggregation. In June 2012, VectorWise 2.5 was released. In this release the storage format was reorganized to allow storing the database in multiple locations, the background update propagation mechanism from PDTs to stable storage was enhanced to rewrite only the changed blocks instead of performing full rewrites, and a new patented Predictive Buffer Manager (PBM) was introduced.

In March 2013, VectorWise 3.0 was released. New features included a more efficient storage engine, support for more data types and analytical SQL functions, enhanced DDL features, and improved monitoring and profiling accessibility. In March 2014, Actian Vector 3.5 was released under the new, shortened name. New features included support for partitioned tables, improved disk spilling, online backup capabilities and improved SQL support, such as MERGE/UPSERT DML operations and the FIRST_VALUE and LAST_VALUE window aggregation functions. In March 2015, Actian Vector 4 was released.

In June 2014, at Hadoop Summit 2014 in San Jose, Actian announced Actian Vortex, a clustered MPP version of Vector with the same level of SQL support, working in Hadoop with storage directly in HDFS. Actian Vortex was later renamed Actian Vector in Hadoop, and non-clustered Actian Vector releases are also updated to match. Actian Vector in Hadoop 4 was released in December 2015.

Actian Vector 5.0 was released in July 2016, and 5.1 in June 2018. Actian Vector in Hadoop 5.0 was released in October 2017, and 5.1 in November 2018. In April 2019, Actian Avalanche was released as the cloud option: Avalanche version 5.1 for Amazon Web Services (AWS) was released in April 2019, and version 5.1 for Microsoft Azure in October 2019.


See also

* Database management system
* Relational database
* MonetDB
* Ingres (database)



External links


Official website of Actian Vector