
Clustrix, Inc. is a San Francisco-based private company founded in 2006 that develops a database management system marketed as NewSQL.


History

Clustrix was founded in November 2006, and is sometimes called "Sprout-Clustrix" as it formed with the help of Y Combinator. Founders include Paul Mikesell (formerly of EMC Isilon) and Sergei Tsarev. Some of its technology had been tested at customer sites since 2008. The product was called "Sierra" during development and was launched under the name "Clustered Database System (CDS)" at its official announcement in 2010. The company received $10 million in funding from Sequoia Capital, U.S. Venture Partners (USVP), and ATA Ventures in December 2010. Robin Purohit became chief executive in October 2011, and another round of $6.75 million was raised in July 2012. A further round of $16.5 million from the original backers was announced in May 2013, and a $10 million round of new funding led by HighBAR Ventures followed in August 2013. Purohit was replaced by Mike Azevedo in 2014. A round of over $23 million in debt financing was disclosed in February 2016. On September 20, 2018, it was announced that Clustrix had been acquired by MariaDB Corporation.


Technology

Clustrix supports workloads that involve scaling transactions and real-time analytics. The system is a drop-in replacement for MySQL, and is designed to overcome MySQL scalability issues with a minimum of disruption. It also has built-in fault-tolerance features for high availability within a cluster, as well as parallel backup and parallel replication among clusters for disaster recovery. Clustrix is a scale-out SQL database management system and part of what are often called the NewSQL database systems (modern relational database management systems), closely following the NoSQL movement.

The product was marketed as a hardware "appliance" using InfiniBand through about 2014. Clustrix's database was made available as downloadable software and from the Amazon Web Services Marketplace by 2013.

The primary competitors, such as Microsoft SQL Server and MySQL, supported online transaction processing and online analytical processing but were not distributed. Clustrix provides a distributed, relational, ACID database that scales transactions and supports real-time analytics. Other distributed relational databases are columnar (they do not support primary transaction workloads) and focus on offline analytics; these include EMC Greenplum, HP Vertica, Infobright, and Amazon Redshift. Notable players in the primary SQL database space are in-memory. These include VoltDB and MemSQL, which excel at low-latency transactions but do not target real-time analytics. NoSQL competitors, like MongoDB, are good at handling unstructured data and read-heavy workloads, but do not compete in the space for write-heavy workloads (no transactions, coarse-grained (DB-level) locking, and no SQL features such as joins), so the NewSQL and NoSQL databases are complementary.


Query evaluation

The Clustrix database operates on a distributed cluster of shared-nothing nodes using a query-to-data approach, in which each node typically owns a subset of the data. SQL queries are split into query fragments and sent to the nodes that own the data. This enables Clustrix to scale horizontally (scale out) as additional nodes are added.
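
The query-to-data idea can be illustrated with a minimal Python sketch. It is not Clustrix's implementation: the `Node` class, the modulo placement rule in `owner_index`, and the `distributed_sum` coordinator are all hypothetical names chosen for illustration. Each node evaluates the same fragment against the rows it owns, and only the small partial results travel back to be merged.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical shared-nothing node that owns a subset of the rows."""
    name: str
    rows: list = field(default_factory=list)

    def run_fragment(self, predicate):
        # Evaluate the fragment locally; return a partial (sum, count).
        matching = [r["amount"] for r in self.rows if predicate(r)]
        return sum(matching), len(matching)


def owner_index(key, node_count):
    # Toy placement rule (an assumption, not the product's real scheme).
    return key % node_count


def load(nodes, rows):
    # Store each row on the single node that owns its key.
    for row in rows:
        nodes[owner_index(row["id"], len(nodes))].rows.append(row)


def distributed_sum(nodes, predicate):
    # Send the fragment to every node that owns data, then merge partials.
    partials = [n.run_fragment(predicate) for n in nodes if n.rows]
    total = sum(p[0] for p in partials)
    count = sum(p[1] for p in partials)
    return total, count


nodes = [Node(f"node{i}") for i in range(3)]
orders = [{"id": i, "amount": 10 * i} for i in range(1, 10)]
load(nodes, orders)
result = distributed_sum(nodes, lambda r: r["amount"] >= 50)
```

In this toy model, adding a node shrinks the share of rows each node scans, which is the intuition behind scaling out; a real system also has to handle joins, ordering, and transactions across fragments.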


Data distribution

The Clustrix database automatically splits data into slices and distributes them evenly across nodes, with each slice having copies on other nodes. Uniform data distribution is maintained as nodes are added or removed, or when data is inserted unevenly. This automatic data distribution approach removes the need to shard and enables Clustrix to maintain database availability in the face of node loss.
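
A small Python sketch can make the slice-and-copy idea concrete. It assumes a simple round-robin placement with two copies per slice; the function names (`place_slices`, `copies_per_node`, `available_after_loss`) and the placement rule are illustrative assumptions, not the algorithm Clustrix uses.

```python
from collections import Counter


def place_slices(nodes, slice_count, copies=2):
    """Assign each slice to `copies` distinct nodes, round-robin (toy rule)."""
    if copies > len(nodes):
        raise ValueError("need at least as many nodes as copies")
    return {
        s: [nodes[(s + k) % len(nodes)] for k in range(copies)]
        for s in range(slice_count)
    }


def copies_per_node(placement):
    # Count how many slice copies each node holds.
    return Counter(node for holders in placement.values() for node in holders)


def available_after_loss(placement, failed_node):
    # Every slice must still have at least one copy on a surviving node.
    return all(
        any(node != failed_node for node in holders)
        for holders in placement.values()
    )


# Twelve slices on three nodes, then recomputed after a fourth node joins.
three = place_slices(["a", "b", "c"], slice_count=12)
four = place_slices(["a", "b", "c", "d"], slice_count=12)
load_three = copies_per_node(three)
load_four = copies_per_node(four)
survives = all(available_after_loss(three, n) for n in "abc")
```

With these inputs every node holds the same number of copies before and after the fourth node is added, and losing any single node leaves at least one copy of each slice. Recomputing the whole placement is a simplification; a production system would move as few slices as possible when rebalancing.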


Performance

In a performance test completed by Percona in 2011, a three-node cluster saw about a 73% increase in speed over a similarly equipped single MySQL server running tests with 1024 simultaneous threads. Additional nodes added to the Clustrix cluster provided roughly linear increases in speed. (Source: "Clustrix Delivers Software-Only Kit to Demo Shard-less MySQL Scaling".)



External links

* Sergei Tsarev's Blog, by Sergei Tsarev: http://sergei.clustrix.com/