Vertica Systems is an analytic database management software company. Vertica was founded in 2005 by the database researcher Michael Stonebraker, with Andrew Palmer as the founding CEO. Ralph Breslauer and Christopher P. Lynch served as later CEOs.
Lynch joined as chairman and CEO in 2010 and was responsible for Vertica's acquisition by Hewlett Packard in March 2011. The acquisition expanded the HP Software portfolio for enterprise companies and the public sector group. As part of the merger of Micro Focus and the Software division of Hewlett Packard Enterprise, Vertica joined Micro Focus in September 2017.
Products
The column-oriented Vertica Analytics Platform was designed to manage large, fast-growing volumes of data and to provide fast query performance for data warehouses and other query-intensive applications. The product claims to greatly improve query performance over traditional relational database systems, and to provide high availability and exabyte scalability on commodity enterprise servers. Vertica runs on multiple cloud computing systems as well as on Hadoop nodes. Vertica's Eon Mode separates compute from storage, using S3 object storage and dynamic allocation of compute nodes.
Vertica's design features include:
* Column-oriented storage organization, which increases performance of sequential record access at the expense of common transactional operations such as single record retrieval, updates, and deletes.
* Massively parallel processing (MPP) architecture to distribute queries on independent nodes and scale performance linearly.
* Standard SQL interface with many analytics capabilities built in, such as time series gap filling/interpolation, event-based windowing and sessionization, pattern matching, event series joins, statistical computation (e.g., regression analysis), and geospatial analysis.
* In-database machine learning including categorization, fitting and prediction without down-sampling and data movement. Vertica offers a variety of in-database algorithms, including linear regression, logistic regression, k-means clustering, Naive Bayes classification, random forest decision trees, XGBoost, and support vector machine regression and classification. It also allows deployment of ML models to multiple clusters.
* High compression, possible because columns of homogeneous datatype are stored together and because updates to the main store are batched.
* Automated workload management, data replication, server recovery, query optimization, and storage optimization.
* Native integration with open source big data technologies like Apache Kafka and Apache Spark.
* Support for standard programming interfaces, including ODBC, JDBC, ADO.NET, and OLEDB.
* High-performance and parallel data transfer to statistical tools and built-in machine learning algorithms.
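The column-oriented storage and compression features listed above can be illustrated with a small sketch. This is a toy model in plain Python, not Vertica's storage engine: the table contents, the column names, and the choice of run-length encoding are all assumptions made for illustration. It shows why storing values of one column together tends to compress well and why an aggregate over one column does not need to read the others.

```python
from itertools import groupby

# Hypothetical toy table; the column names and values are illustrative only.
rows = [
    ("2021-07-01", "US", 10),
    ("2021-07-01", "US", 12),
    ("2021-07-01", "DE", 7),
    ("2021-07-02", "DE", 7),
    ("2021-07-02", "DE", 9),
]


def to_columns(rows, names):
    """Pivot row tuples into one list per column (a column-oriented layout)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def rle_encode(values):
    """Run-length encode a sequence as (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_decode(pairs):
    """Invert rle_encode, expanding each pair back into repeated values."""
    return [value for value, count in pairs for _ in range(count)]


columns = to_columns(rows, ["day", "country", "clicks"])

# Adjacent values in a single column often repeat, so runs collapse to pairs.
encoded_day = rle_encode(columns["day"])
encoded_country = rle_encode(columns["country"])

# An aggregate over one column reads only that column's values.
total_clicks = sum(columns["clicks"])
```

The same layout makes single-record operations more expensive, since reading or updating one row means touching every column list, which matches the trade-off described in the first bullet.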
Vertica's specialized approach aims to significantly increase query performance in data warehouses, while reducing hardware costs.
Since 2011, Vertica has offered a limited-capacity community edition for free.
In July 2021, Vertica announced a SaaS offering, Vertica Accelerator, running on Amazon AWS.
Optimizations
Vertica originated as the C-Store column-oriented database, an open source research project at MIT and other universities, published in 2005.
Vertica runs on clusters of commodity servers or on commercial clouds. It integrates with Hadoop, using HDFS.
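Running on a cluster means a query can be split across nodes, as in the massively parallel processing (MPP) design described under Products. The following is a minimal sketch of that scatter-and-gather idea, not Vertica's query executor: the function names are hypothetical, threads stand in for independent nodes, and the CRC32-based placement is an assumption chosen only to make the example deterministic.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partition(rows, node_count):
    """Assign each (key, value) row to a node by hashing its key."""
    nodes = [[] for _ in range(node_count)]
    for key, value in rows:
        index = zlib.crc32(key.encode("utf-8")) % node_count
        nodes[index].append((key, value))
    return nodes


def local_sum(node_rows):
    """Aggregate the rows held by one node into per-key partial sums."""
    partial = defaultdict(int)
    for key, value in node_rows:
        partial[key] += value
    return dict(partial)


def merge(partials):
    """Combine the per-node partial sums into the final result."""
    total = defaultdict(int)
    for partial in partials:
        for key, value in partial.items():
            total[key] += value
    return dict(total)


def parallel_group_sum(rows, node_count=3):
    """Run the equivalent of SELECT key, SUM(value) ... GROUP BY key."""
    nodes = partition(rows, node_count)
    # Threads simulate nodes working independently on their own partitions.
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        partials = list(pool.map(local_sum, nodes))
    return merge(partials)


sales = [("east", 5), ("west", 3), ("east", 2), ("north", 4), ("west", 1)]
result = parallel_group_sum(sales)
```

Each node only ever sees its own partition, so adding nodes reduces the work per node. How closely real performance follows that ideal depends on data distribution and coordination overhead, which this sketch does not model.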
In 2018, Vertica introduced Eon Mode, an architecture that separates compute from storage. Eon Mode allows compute capacity to be scaled up or down elastically as workloads change. It also allows multiple isolated sub-clusters to be dedicated to different workloads while sharing a single data repository. It operates on shared object storage in the cloud, and also runs on object-storage-compatible hardware on-premises for private cloud implementations.
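The separation of compute from storage can be pictured with a small conceptual model. The classes below are hypothetical and do not correspond to any Vertica API. They assume a single in-memory dictionary standing in for shared object storage, and they show only the structural idea: sub-clusters hold no data of their own, so they can be resized or added without moving data.

```python
class SharedStorage:
    """Stands in for a shared object store: object name -> list of rows."""

    def __init__(self):
        self._objects = {}

    def put(self, name, rows):
        self._objects[name] = list(rows)

    def get(self, name):
        return self._objects[name]


class Subcluster:
    """A named group of compute nodes that reads from shared storage."""

    def __init__(self, name, storage, node_count):
        self.name = name
        self.storage = storage
        self.node_count = node_count

    def resize(self, node_count):
        """Scale compute up or down; the shared data is not touched."""
        self.node_count = node_count

    def count_rows(self, object_name):
        """A trivial query that reads an object from shared storage."""
        return len(self.storage.get(object_name))


storage = SharedStorage()
storage.put("events", [1, 2, 3, 4])

# Two isolated sub-clusters serve different workloads over the same data.
etl = Subcluster("etl", storage, node_count=3)
dashboards = Subcluster("dashboards", storage, node_count=2)

# Scale one workload's compute without affecting the other or the data.
dashboards.resize(6)
```

In this model, resizing `dashboards` changes nothing for `etl`, and both still see the same `events` object, which is the property the paragraph above attributes to Eon Mode.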
Version 10.1.1 of Vertica introduced Docker and Kubernetes support.
Many BI, data visualization, and ETL tools work with the Vertica Analytics Platform. Vertica supports Apache Kafka for streaming data ingestion.
In 2021, Vertica released a connector for Apache Spark.
Vertica also integrates with Grafana, Helm, Go, and Distributed R.
Company events
In January 2008, Sybase filed a patent-infringement lawsuit against Vertica. In January 2010, Vertica prevailed in a preliminary hearing, and in June 2010, Sybase and Vertica resolved the suit, with the court dismissing all infringement claims.
[Vertica Press Release, "Vertica Resolves Sybase Patent Lawsuits" http://www.vertica.com/news/press/vertica-resolves-sybase-patent-lawsuits/]
Since 2013, Vertica has held an annual user conference, now called Vertica Unify.
External links
* Official website
* Unofficial Vertica User Google Group
* Vertica GitHub
* Vertica on DockerHub