MemSQL

SingleStore (formerly MemSQL) is a proprietary, cloud-native database designed for data-intensive applications. A distributed, relational SQL database management system (RDBMS) with ANSI SQL support, it is known for speed in data ingest, transaction processing, and query processing. SingleStore primarily stores relational data, though it can also store JSON data, graph data, and time series data. It supports blended workloads, commonly referred to as HTAP (hybrid transactional/analytical processing) workloads, as well as more traditional OLTP and OLAP use cases. For queries, it compiles Structured Query Language (SQL) into machine code. The SingleStore database engine can run in various Linux environments, including on-premises installations and public and private cloud providers. It can also run in containers via a Kubernetes operator, or as a hosted cloud service known as SingleStore Managed Service.


History


1999–2010 (First era)

Cloud data management consisted of retrofitting on-premises, general-purpose SQL databases.


2010–2017 (Second era)

To achieve speed and scale, the marketplace retreated from commonly used SQL databases, favoring instead the adoption of special-purpose NoSQL databases and cheap object storage.


2017–present (Third era)

Several trends drove disruption in the database market: faster internet connections; growing adoption of modern applications for streaming, gaming, IoT, and more; and the need of those applications for converged streaming, transactional, and analytical processing of multi-model data. The third era began in 2017 with real-time, hybrid multi-cloud, multi-model, and relational databases. This evolution in cloud data drove SingleStore to develop a unified, distributed SQL database for real-time, data-intensive applications.

On April 23, 2013, SingleStore launched the first generally available version of its database to the public under the name MemSQL. Early versions supported only row-oriented tables and were highly optimized for cases where all data fits in main memory. This design was based on the idea that the cost of RAM would keep decreasing exponentially, in a trend similar to Moore's law, eventually allowing most database use cases to store their data exclusively in memory. Shortly after launch, MemSQL added general support for an on-disk, column-based storage format to work alongside the in-memory rowstore. The decline in memory costs slowed over time, and the market for purely in-memory database systems largely failed to materialize, while demand for disk-based OLAP workloads grew. As a result, MemSQL's columnstore became a major focus and a crucial feature for customers. On October 27, 2020, MemSQL rebranded as SingleStore to reflect a shift in focus away from exclusively in-memory workloads; the new name highlights the goal of a universal storage format capable of supporting both transactional and analytical use cases. SingleStore described its version 7.5 release as the first database to combine separation of storage and compute with a system of record in a single platform.

SingleStore is headquartered in San Francisco, California. In June 2021 it opened an office in Raleigh, North Carolina, and as part of the opening launched Launch Pad, a center for innovation to incubate and prototype solutions. Its other offices are in Sunnyvale, California; Seattle, Washington; and Lisbon, Portugal.


Funding

In January 2013, SingleStore (then MemSQL) announced that it had raised $5 million. Since then, the company has raised $318.1 million from investors including Khosla Ventures, Accel, Google Ventures, Dell Capital, and HPE, among others.


Architecture


Row and column table formats

SingleStore can store data in either row-oriented tables ("rowstores") or column-oriented tables ("columnstores"); the user chooses the format when creating a table. Rowstore tables, as the name implies, store information in row format, the traditional data format used by relational database management systems. Rowstores are optimized for singleton or small insert, update, and delete queries and are most closely associated with OLTP (transactional) use cases. Rowstore data is held entirely in memory, which makes random reads fast, while snapshots and transaction logs are persisted to disk. Columnstores are optimized for complex SELECT queries, typically associated with OLAP (analytics) and data warehousing use cases. For example, a large clinical data set used for analysis is best stored in columnar format, since queries against it are typically ad hoc queries that compute aggregates over large numbers of similar data items. Columnstore data is stored on disk, which supports fast sequential reads and compression ratios that typically reach 5–10x.
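The trade-off between the two layouts can be illustrated with a toy Python model. The table, its column names (`patient_id`, `ward`, `temp_c`), and the helper functions are hypothetical, and the run-length encoding shown is just one simple compression scheme chosen to show why sorted columns of similar values compress well; it is not SingleStore's actual storage format.

```python
"""Contrast row-oriented and column-oriented layouts for one toy table.

Hypothetical data; this illustrates the general layouts only, not
SingleStore's storage format.
"""
from typing import Dict, List, Tuple

# Row-oriented: each record is stored contiguously.
Row = Tuple[int, str, float]
rows: List[Row] = [
    (1, "ward_a", 36.6),
    (2, "ward_b", 38.1),
    (3, "ward_a", 37.2),
    (4, "ward_c", 36.9),
]


def to_columns(records: List[Row]) -> Dict[str, list]:
    """Pivot row records into one list per column (column-oriented layout)."""
    ids, wards, temps = zip(*records)
    return {"patient_id": list(ids), "ward": list(wards), "temp_c": list(temps)}


columns = to_columns(rows)


def update_row(records: List[Row], patient_id: int, temp_c: float) -> None:
    """Point update: in the row layout, one whole record is replaced in place."""
    for i, (pid, ward, _) in enumerate(records):
        if pid == patient_id:
            records[i] = (pid, ward, temp_c)
            return
    raise KeyError(patient_id)


def avg_temp(cols: Dict[str, list]) -> float:
    """Aggregate: in the column layout, only the temp_c column is read."""
    temps = cols["temp_c"]
    return sum(temps) / len(temps)


def run_length_encode(values: list) -> List[Tuple[object, int]]:
    """Collapse runs of equal adjacent values into (value, count) pairs."""
    encoded: List[Tuple[object, int]] = []
    for v in values:
        if encoded and encoded[-1][0] == v:
            encoded[-1] = (v, encoded[-1][1] + 1)
        else:
            encoded.append((v, 1))
    return encoded
```

A point update touches a single row tuple, while the aggregate scans one contiguous list and ignores the other columns; sorting a low-cardinality column such as `ward` turns repeated values into runs that encode compactly.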


Indexing

Rather than the traditional B-tree index, SingleStore rowstores use skiplists optimized for fast, lock-free processing in memory. Columnstores store data indexed in sorted segments, in order to maximize on-disk compression and achieve fast ordered scans. SingleStore also supports using hash indexes as secondary indexes to speed up certain queries.
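To show what a skiplist index looks like, here is a minimal single-threaded sketch in Python. It illustrates only the data structure (sorted linked levels with randomized node heights); it is not lock-free, and `MAX_LEVEL`, `P`, and the class names are arbitrary assumptions rather than anything taken from SingleStore.

```python
"""A minimal single-threaded skiplist with ordered insert, lookup, and scan.

This only illustrates the data structure. It is NOT lock-free and is not
SingleStore's implementation; MAX_LEVEL and P are arbitrary choices.
"""
import random
from typing import Any, Iterator, List, Optional, Tuple

MAX_LEVEL = 16  # assumed cap on tower height
P = 0.5         # assumed probability of promoting a node one level up


class _Node:
    __slots__ = ("key", "value", "forward")

    def __init__(self, key: Any, value: Any, level: int) -> None:
        self.key = key
        self.value = value
        # forward[i] is the next node at level i
        self.forward: List[Optional["_Node"]] = [None] * level


class SkipList:
    def __init__(self, seed: Optional[int] = None) -> None:
        self._rng = random.Random(seed)
        self._head = _Node(None, None, MAX_LEVEL)  # sentinel, holds no key
        self._level = 1

    def _random_level(self) -> int:
        level = 1
        while level < MAX_LEVEL and self._rng.random() < P:
            level += 1
        return level

    def _seek(self, key: Any) -> List[_Node]:
        """Return, per level, the last node whose key is < key."""
        update: List[_Node] = [self._head] * MAX_LEVEL
        node = self._head
        for i in range(self._level - 1, -1, -1):
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
            update[i] = node
        return update

    def insert(self, key: Any, value: Any) -> None:
        """Insert or overwrite key, keeping every level sorted."""
        update = self._seek(key)
        candidate = update[0].forward[0]
        if candidate is not None and candidate.key == key:
            candidate.value = value
            return
        level = self._random_level()
        # Levels above the old height already point at the head sentinel.
        self._level = max(self._level, level)
        new = _Node(key, value, level)
        for i in range(level):
            new.forward[i] = update[i].forward[i]
            update[i].forward[i] = new

    def get(self, key: Any) -> Optional[Any]:
        node = self._seek(key)[0].forward[0]
        return node.value if node is not None and node.key == key else None

    def scan(self, lo: Any, hi: Any) -> Iterator[Tuple[Any, Any]]:
        """Yield (key, value) pairs with lo <= key < hi in key order."""
        node = self._seek(lo)[0].forward[0]
        while node is not None and node.key < hi:
            yield node.key, node.value
            node = node.forward[0]
```

Searches skip ahead on the sparse upper levels and drop down to the dense bottom level, giving expected O(log n) lookups; ordered range scans then just follow the bottom-level links.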


Distributed architecture

A SingleStore database is distributed across many commodity machines. Data is stored in partitions on leaf nodes, and users connect to aggregator nodes. A single piece of software is installed for SingleStore aggregator and leaf nodes; administrators designate each machine's role in the cluster during setup. An aggregator node is responsible for receiving SQL queries, breaking them up across leaf nodes, and aggregating the results back to the client. A leaf node stores SingleStore data and processes queries from the aggregator(s). All communication between aggregators and leaf nodes is done over the network using SQL. SingleStore uses hash partitioning to distribute data uniformly across the leaf nodes.
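The aggregator/leaf split and hash partitioning can be sketched as a toy in-process model. The partition count, the use of `zlib.crc32` as the hash function, the round-robin placement of partitions on leaves, and all class names are assumptions for illustration; SingleStore's real partitioning scheme and its SQL-over-the-network messaging are not modeled.

```python
"""Hash partitioning with an aggregator that fans a query out to leaves.

A toy model: the partition count, zlib.crc32 as the hash, and round-robin
partition-to-leaf placement are assumptions, not SingleStore's scheme.
"""
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

NUM_PARTITIONS = 8              # hypothetical
LEAVES = ["leaf-1", "leaf-2"]   # hypothetical leaf node names


def partition_for(shard_key: str) -> int:
    """Deterministically map a shard key to a partition."""
    return zlib.crc32(shard_key.encode("utf-8")) % NUM_PARTITIONS


def leaf_for(partition: int) -> str:
    """Spread partitions evenly over leaves (round-robin placement)."""
    return LEAVES[partition % len(LEAVES)]


class Leaf:
    def __init__(self) -> None:
        self.partitions: Dict[int, List[Tuple[str, int]]] = defaultdict(list)

    def insert(self, partition: int, row: Tuple[str, int]) -> None:
        self.partitions[partition].append(row)

    def partial_sum(self) -> Tuple[int, int]:
        """Local work: (sum, count) over every partition this leaf holds."""
        values = [v for rows in self.partitions.values() for _, v in rows]
        return sum(values), len(values)


class Aggregator:
    def __init__(self) -> None:
        self.leaves = {name: Leaf() for name in LEAVES}

    def insert(self, key: str, value: int) -> None:
        """Route a row to the leaf that owns its partition."""
        p = partition_for(key)
        self.leaves[leaf_for(p)].insert(p, (key, value))

    def avg(self) -> float:
        """Scatter the query to every leaf, then merge the partial results."""
        partials = [leaf.partial_sum() for leaf in self.leaves.values()]
        total = sum(s for s, _ in partials)
        count = sum(c for _, c in partials)
        return total / count
```

Note that the leaves return (sum, count) pairs rather than local averages: averages of unequally sized partitions cannot be merged correctly, whereas sums and counts can.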


Real-time streaming data ingestion

SingleStore Pipelines is a built-in integration technology that ingests streaming data in parallel from distributed data sources. It provides live de-duplication as data is ingested and exactly-once semantics from message brokers, and it simplifies architectures by reducing or eliminating the need for ETL middleware. Transformations and ML integration can be added through SingleStore Pipeline Transforms by embedding a binary. SingleStore Pipelines connect to data sources such as Apache Kafka, Apache Spark, Amazon S3 buckets, Microsoft Azure Blob Storage, Google Cloud Storage, HDFS, and files on disk, and support formats such as JSON, Parquet, Avro, and CSV. Because of the lock-free skiplists, queries can read data as soon as it lands and are not blocked while ingestion continues.
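One general technique for exactly-once ingestion with de-duplication is to commit the loaded rows and the last consumed broker offset in the same atomic step, and to upsert by primary key. The Python sketch below models that idea under stated assumptions: messages carry increasing offsets (as Kafka messages do within a partition), atomicity is simulated by swapping in a staged copy of the state, and `Loader` and `TableState` are hypothetical names. It is not a description of SingleStore Pipelines' internals.

```python
"""Exactly-once ingestion from an offset-based log, with key de-duplication.

Toy model under stated assumptions: the broker assigns each message an
increasing offset, and the loader commits the data and the last consumed
offset together in one atomic step. Not SingleStore Pipelines' internals.
"""
import copy
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Message = Tuple[int, str, int]  # (offset, primary_key, value)


@dataclass
class TableState:
    rows: Dict[str, int] = field(default_factory=dict)
    committed_offset: int = -1  # last offset whose effects are durable


class Loader:
    def __init__(self) -> None:
        self.state = TableState()

    def load_batch(self, batch: List[Message]) -> int:
        """Apply a batch; return how many messages were newly applied."""
        staged = copy.deepcopy(self.state)  # work on a private copy
        applied = 0
        for offset, key, value in batch:
            if offset <= staged.committed_offset:
                continue  # redelivered message: already applied, skip it
            staged.rows[key] = value  # upsert: one row per primary key
            staged.committed_offset = offset
            applied += 1
        # "Commit": data and offset become visible together, or not at all.
        self.state = staged
        return applied
```

If a consumer restarts and the broker redelivers messages it has already seen, the stored offset lets the loader skip them, so each message affects the table exactly once.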


Bottomless storage

Bottomless storage separates storage from compute in SingleStore. Data files are persisted asynchronously to S3 or comparable blob storage, or to NFS. The “blobs” are the compressed, encoded data structures that back the columnstore. High availability within the SingleStore cluster is maintained for the most recent data, while long-term storage moves to blob storage. Blobs that are not being queried are automatically deleted from a SingleStore node's local disk, allowing the cluster to hold more data than its available disk space; this is what makes the cluster's storage “bottomless.” New replicas do not need to download all blob files to come online, so partitions can be created and moved without a full download. Bottomless storage acts as a “continuous backup” that obviates traditional disaster-recovery and backup procedures in cloud operations. It also supports petabyte-scale datasets for historical analytics.
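The idea of local disk acting as a bounded cache over remote blob storage can be sketched in Python. The remote store is modeled as a plain dict, blobs are assumed immutable once written, uploads happen synchronously in the sketch, and eviction is least-recently-used; the eviction policy and the `BlobCache` name are assumptions, since the text above only says that blobs not being queried are removed from local disk.

```python
"""Local disk as a bounded cache over a remote blob store.

Assumptions for illustration: blobs are immutable once uploaded, the remote
store is a dict standing in for S3 or similar, and eviction is
least-recently-used. SingleStore's actual eviction policy is not modeled.
"""
from collections import OrderedDict
from typing import Dict


class BlobCache:
    def __init__(self, remote: Dict[str, bytes], capacity_bytes: int) -> None:
        self.remote = remote              # stands in for S3 or similar
        self.capacity = capacity_bytes    # local disk budget
        self.local: "OrderedDict[str, bytes]" = OrderedDict()
        self.fetches = 0                  # number of remote downloads

    def _used(self) -> int:
        return sum(len(b) for b in self.local.values())

    def write(self, name: str, data: bytes) -> None:
        """Persist to the remote store and keep a local copy for fast reads."""
        self.remote[name] = data  # synchronous here; asynchronous in the text
        self._cache(name, data)

    def read(self, name: str) -> bytes:
        if name in self.local:
            self.local.move_to_end(name)  # mark as recently used
            return self.local[name]
        self.fetches += 1
        data = self.remote[name]  # cache miss: pull from remote on demand
        self._cache(name, data)
        return data

    def _cache(self, name: str, data: bytes) -> None:
        self.local[name] = data
        self.local.move_to_end(name)
        # Evicting is safe here because write() has already uploaded the blob.
        while self._used() > self.capacity and len(self.local) > 1:
            self.local.popitem(last=False)  # drop least recently used blob
```

Because every evicted blob still exists remotely, the total data the node can serve is bounded by the remote store rather than by local disk; the cost of a cold blob is one download on first access.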


Durability

Durability for the in-memory rowstore is implemented with a write-ahead log and snapshots, which are similar to checkpoints. With default settings, as soon as a transaction is acknowledged in memory, the database asynchronously writes it to disk as fast as the disk allows. The on-disk columnstore is fronted by an in-memory, rowstore-like structure indexed with a skiplist, which has the same durability guarantees as the SingleStore rowstore. The rest of the columnstore is durable because its data is stored on disk.
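The write-ahead log plus snapshot scheme can be sketched for a toy in-memory key-value table. The file names, the JSON-lines log format, and the `DurableTable` class are hypothetical. For simplicity the sketch flushes each log record synchronously before applying it, whereas the text above describes SingleStore's default as writing to disk asynchronously after acknowledgment.

```python
"""Write-ahead log plus snapshots for an in-memory key-value table.

Toy sketch with hypothetical file names and a JSON-lines log format; not
SingleStore's on-disk format. Log writes are flushed synchronously here.
"""
import json
import os
from typing import Dict


class DurableTable:
    def __init__(self, directory: str) -> None:
        self.log_path = os.path.join(directory, "wal.jsonl")
        self.snap_path = os.path.join(directory, "snapshot.json")
        self.data: Dict[str, int] = {}
        self._recover()

    def _recover(self) -> None:
        """Rebuild memory: load the latest snapshot, then replay the log."""
        if os.path.exists(self.snap_path):
            with open(self.snap_path) as f:
                self.data = json.load(f)
        if os.path.exists(self.log_path):
            with open(self.log_path) as f:
                for line in f:
                    rec = json.loads(line)
                    self.data[rec["key"]] = rec["value"]

    def put(self, key: str, value: int) -> None:
        # Append to the log and force it to disk before changing memory.
        with open(self.log_path, "a") as f:
            f.write(json.dumps({"key": key, "value": value}) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self.data[key] = value

    def snapshot(self) -> None:
        """Write the full state, then truncate the log it now covers."""
        tmp = self.snap_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(self.data, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.snap_path)  # atomic rename over the old snapshot
        open(self.log_path, "w").close()  # start a fresh, empty log
```

Snapshots bound recovery time, since only the log written after the last snapshot must be replayed. If a crash happens between the rename and the log truncation, replaying the old log is harmless because each record simply overwrites a key with the value it already has.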


Replication

A SingleStore cluster can be configured in "High Availability" (HA) mode, where every data partition is automatically created with master and slave versions on two separate leaf nodes. In HA mode, aggregators send transactions to the master partitions, which then send logs to the slave partitions. In the event of an unexpected master failure, the slave partitions take over as master partitions, in a fully online operation with no downtime.
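The master/slave log shipping and failover described above can be modeled in a few lines of Python. This is a single-process toy: log shipping is synchronous, failure detection is reduced to an `alive` flag, and the class names are hypothetical. It is not SingleStore's replication protocol, but it shows why a slave that has received every log entry can take over without losing committed writes.

```python
"""Master/slave partition pair with log shipping and failover.

Toy, single-process sketch: log shipping is synchronous and failure
detection is a flag. Both are assumptions, not SingleStore's protocol.
"""
from typing import Dict, List, Optional, Tuple

LogEntry = Tuple[int, str, int]  # (log sequence number, key, value)


class PartitionReplica:
    def __init__(self, name: str) -> None:
        self.name = name
        self.data: Dict[str, int] = {}
        self.log: List[LogEntry] = []
        self.alive = True

    def apply(self, entry: LogEntry) -> None:
        _, key, value = entry
        self.log.append(entry)
        self.data[key] = value


class HAPartition:
    def __init__(self, master: PartitionReplica, slave: PartitionReplica) -> None:
        self.master = master
        self.slave: Optional[PartitionReplica] = slave
        self.next_lsn = 0

    def write(self, key: str, value: int) -> None:
        if not self.master.alive:
            self.failover()
        entry = (self.next_lsn, key, value)
        self.next_lsn += 1
        self.master.apply(entry)
        if self.slave is not None and self.slave.alive:
            self.slave.apply(entry)  # ship the same log entry to the slave

    def failover(self) -> None:
        """Promote the slave; it already holds every shipped log entry."""
        if self.slave is None or not self.slave.alive:
            raise RuntimeError("no live replica to promote")
        self.master, self.slave = self.slave, None

    def read(self, key: str) -> Optional[int]:
        if not self.master.alive:
            self.failover()
        return self.master.data.get(key)
```

After a promotion the partition runs without a replica until a new slave is provisioned; in a real cluster that step would copy state to another leaf node.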


Distribution formats

SingleStore can be downloaded for free and run on Linux for systems of up to 4 leaf nodes with 32 GB of RAM each; an Enterprise license is required for larger deployments and for official SingleStore support. SingleStore clusters can be managed in containers using the SingleStore Kubernetes Operator. SingleStore is also available as a managed service, SingleStore Managed Service, in various regions of Google Cloud and Amazon Web Services, with a Microsoft Azure implementation promised for the near future. The underlying engine and potential system performance are identical across all distribution formats. SingleStore ships with a set of installation, management, and monitoring tools called SingleStore Tools, which can be used during installation to set up the distributed database across machines. SingleStore also provides a browser-based query and management UI called SingleStore Studio, which runs queries, monitors the database, and shows health and informational details about the running cluster.


Recognition

In December 2021, SingleStore was recognized for the first time in the Magic Quadrant for Cloud Database Management Systems published by Gartner. In 2020, SingleStore was also included in Deloitte's Technology Fast 500 North America, the San Francisco Business Times Fast 100, the Dresner Industry Excellence awards, and the Inc. 5000. The company is a member of the Cloud Native Computing Foundation and the Bytecode Alliance.


See also

* Comparison of relational database management systems
* Comparison of object-relational database management systems
* Database management system
* List of relational database management systems
* List of column-oriented DBMSes
* List of in-memory databases
* List of databases using MVCC
* Hybrid transactional/analytical processing

