Greenplum

Greenplum is a big data technology based on MPP architecture and the Postgres open source database technology. The technology was created by a company of the same name headquartered in San Mateo, California around 2005. Greenplum was acquired by EMC Corporation in July 2010. Starting in 2012, its database management system software became known as the Pivotal Greenplum Database, sold through Pivotal Software. Pivotal open sourced the core engine, and development continued through the Greenplum Database open source community and Pivotal. Pivotal was then acquired by VMware, and from 2020 VMware continued to sponsor the Greenplum Database open source community as well as commercialize the technology under the brand name VMware Tanzu Greenplum.


Company

Greenplum, the company, was founded in September 2003 by Scott Yara and Luke Lonergan. It was a merger of two smaller companies: Metapa (founded in August 2000 near Los Angeles) and Didera in Fairfax, Virginia. Investors included SoundView Ventures, Hudson Ventures and Royal Wulff Ventures, and funding was announced at the merger. Greenplum, based in San Mateo, California, released its database management system software based on PostgreSQL in April 2005, calling it Bizgres. Rounds of venture capital were invested in March 2006 and February 2007. In July 2006 a partnership with Sun Microsystems was announced. Sun, which had also acquired MySQL AB, participated in a round of investment in January 2009, led by Meritech Capital Partners.

The Bizgres project included a few other members and was supported through about 2008, by which time the product was simply called "Greenplum" as well. The Sun Fire X4500 was a reference architecture and was used by the majority of customers until a transition was made to Linux around that time.

Greenplum was acquired by EMC Corporation in July 2010, becoming the foundation of EMC's big data software division. EMC did not disclose the value of the acquisition, although estimates were reported. Greenplum's products at the time of acquisition were the Greenplum Database, Chorus (a management tool), and Data Science Labs. Greenplum had customers in several vertical markets; they included eBay. It became part of Pivotal Software in 2012. A variant called Hawq, which uses Apache Hadoop to store data in the Hadoop file system, was announced in 2013. In 2015 the GreenplumDB and Hawq open source software projects were announced.


Technology

Pivotal's Greenplum database product uses massively parallel processing (MPP) techniques. Each computer cluster consists of a master node, a standby master node, and segment nodes. All of the data resides on the segment nodes, and the catalog information is stored in the master nodes. Segment nodes run one or more segments, which are modified PostgreSQL database instances, each assigned a content identifier. For each table, the data is divided among the segment nodes based on the distribution column keys specified by the user in the data definition language. For each segment content identifier there is both a primary segment and a mirror segment, which do not run on the same physical host. When a query enters the master node, it is parsed, planned and dispatched to all of the segments, which execute the query plan and either return the requested data or insert the result of the query into a database table. The Structured Query Language, version SQL:2003, is used to present queries to the system. Transaction semantics comply with the constraints known as ACID.

Competitors include other MPP database management systems provided by major vendors such as Teradata, Amazon Redshift, Microsoft Azure, Alibaba AnalyticDB and, in the past, IBM Netezza. Additional competition comes from smaller competitors, column-oriented databases such as HP Vertica and Exasol, and data warehousing vendors with non-MPP architectures, such as Oracle Exadata, IBM Db2 and SAP HANA.
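
The segment layout and query dispatch described in this section can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not Greenplum's implementation: the host names, the segment count, the helper names and the use of CRC32 as the hash are all hypothetical choices made for illustration. It shows rows being assigned to content identifiers by hashing a distribution column, each content identifier receiving a primary and a mirror on different hosts, and a master-side step that combines per-segment partial results.

```python
from __future__ import annotations

import zlib
from collections import defaultdict

NUM_SEGMENTS = 4            # hypothetical number of segment content ids
HOSTS = ["sdw1", "sdw2"]    # hypothetical segment host names


def segment_for(key: str, num_segments: int = NUM_SEGMENTS) -> int:
    """Map a distribution-key value to a content id with a stable hash.

    CRC32 is a stand-in here; it is not claimed to be Greenplum's hash.
    """
    return zlib.crc32(key.encode("utf-8")) % num_segments


def place_segments(num_segments: int, hosts: list[str]) -> dict[int, tuple[str, str]]:
    """Give each content id a primary host and a mirror on a different host."""
    if len(hosts) < 2:
        raise ValueError("mirroring needs at least two hosts")
    layout = {}
    for content_id in range(num_segments):
        primary = hosts[content_id % len(hosts)]
        mirror = hosts[(content_id + 1) % len(hosts)]  # always differs from primary
        layout[content_id] = (primary, mirror)
    return layout


def distribute(rows: list[dict], key_column: str) -> dict[int, list[dict]]:
    """Split rows among segments by the value of the distribution column."""
    segments = defaultdict(list)
    for row in rows:
        segments[segment_for(str(row[key_column]))].append(row)
    return segments


def run_sum(segments: dict[int, list[dict]], column: str) -> int:
    """Master-side sketch: each segment computes a partial SUM, then combine."""
    partial_sums = [sum(r[column] for r in rows) for rows in segments.values()]
    return sum(partial_sums)


rows = [{"customer_id": i, "amount": i * 10} for i in range(1, 9)]
segments = distribute(rows, "customer_id")
layout = place_segments(NUM_SEGMENTS, HOSTS)
total = run_sum(segments, "amount")
print(total)  # 360
```

The total does not depend on how the hash spreads the rows, because every row lands on exactly one segment and the master only adds the partial sums. In a real cluster the choice of distribution column matters for balance: a column with few distinct values would leave some segments with most of the data.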


Greenplum Version 5

In September 2017, Greenplum Database Version 5 was released. Version 5 is the first iteration of the Greenplum project strategy of merging later PostgreSQL versions back into Greenplum, and it is based on PostgreSQL version 8.3, up from version 8.2 in the previous release. Version 5 also introduced the general availability of the GPORCA optimizer, a cost-based SQL optimizer designed for big data.


Greenplum Version 6

In September 2019, Greenplum Database Version 6 was released. Version 6 is based on PostgreSQL version 9.4 and features substantial gains in OLTP performance. Greenplum 6 was reviewed by several media sources, which noted its alignment with open source Postgres and its OLTP performance.

