DuckDB is an open-source column-oriented relational database management system (RDBMS) originally developed by Mark Raasveldt and Hannes Mühleisen at the Centrum Wiskunde & Informatica (CWI) in the Netherlands and first released in 2019.
The project has over 6 million downloads per month.
It is designed to provide high performance on complex queries against large databases in an embedded configuration, such as combining tables with hundreds of columns and billions of rows. Unlike other embedded databases (for example, SQLite), DuckDB does not focus on transactional (OLTP) applications and is instead specialized for online analytical processing (OLAP) workloads.
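The difference between a row-oriented and a column-oriented layout can be illustrated with a minimal Python sketch. It is not DuckDB code and does not reflect DuckDB's actual storage format. The table, its column names, and the helper functions are hypothetical and exist only to show why an analytical aggregate needs to read just the columns it uses.

```python
from array import array

# Row-oriented layout: one complete record per entry, as a transactional
# (OLTP) engine would typically store it. All data here is made up.
rows = [
    {"order_id": 1, "region": "EU", "amount": 20.0},
    {"order_id": 2, "region": "US", "amount": 35.5},
    {"order_id": 3, "region": "EU", "amount": 12.5},
    {"order_id": 4, "region": "US", "amount": 8.0},
]

# Column-oriented layout: one contiguous sequence per column.
columns = {
    "order_id": array("q", [r["order_id"] for r in rows]),
    "region": [r["region"] for r in rows],
    "amount": array("d", [r["amount"] for r in rows]),
}


def total_amount_row_store(table):
    """Sum one field; every full record has to be visited."""
    return sum(record["amount"] for record in table)


def total_amount_column_store(table):
    """Sum one column; the other columns are never touched."""
    return sum(table["amount"])


def total_by_region(table):
    """Grouped aggregate that reads only two columns of the column store."""
    totals = {}
    for region, amount in zip(table["region"], table["amount"]):
        totals[region] = totals.get(region, 0.0) + amount
    return totals
```

Both layouts give the same answer. The column store reaches it by scanning a single typed array, which is the access pattern that OLAP queries over wide tables tend to favor.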
In its OLAP niche, DuckDB does not compete with traditional DBMSs such as MSSQL, PostgreSQL and Oracle Database. While it uses SQL for queries, DuckDB targets serverless applications and provides extremely fast responses using Apache Parquet files for storage. These attributes make it a popular choice for interactive analysis of large datasets, but a poor match for the requirements of enterprise data storage.
DuckDB uses a vectorized query processing engine. It is unusual among database management systems in that it has no external dependencies and can be built with just a C++11 compiler. DuckDB also deviates from the traditional client–server model by running inside a host process (it has bindings, for example, for a Python interpreter with the ability to directly place data into NumPy arrays).
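The idea behind vectorized query processing can be sketched in plain Python. This is a simplified illustration and not DuckDB's C++ implementation. The operator names and the batch size `VECTOR_SIZE` are assumptions made for the example. Each operator consumes and produces fixed-size batches ("vectors") of values instead of handling one row at a time, which spreads per-call overhead over many values.

```python
from itertools import islice

VECTOR_SIZE = 4  # hypothetical batch size, chosen small for readability


def scan(column, vector_size=VECTOR_SIZE):
    """Yield the column in fixed-size batches ("vectors")."""
    iterator = iter(column)
    while True:
        batch = list(islice(iterator, vector_size))
        if not batch:
            return
        yield batch


def filter_greater_than(batches, threshold):
    """Keep values above the threshold, one whole batch per call."""
    for batch in batches:
        selected = [value for value in batch if value > threshold]
        if selected:
            yield selected


def sum_aggregate(batches):
    """Fold each batch into a running total."""
    total = 0
    for batch in batches:
        total += sum(batch)
    return total


# Roughly: SELECT sum(v) FROM t WHERE v > 5, with t holding 1..10.
values = list(range(1, 11))
result = sum_aggregate(filter_greater_than(scan(values), 5))
```

The operators are chained as generators, so only one batch per operator is in flight at any time. A real engine would run tight compiled loops over each batch rather than Python list comprehensions.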
Commercial use
DuckDB is used at Facebook, Google, and Airbnb.
DuckDB co-author Mühleisen also runs a support and consultancy firm for the software, DuckDB Labs.
The company has chosen not to take venture capital funding, stating "We feel investment would force the project direction towards monetization, and we would much prefer keeping DuckDB open and available for as many people as possible".
Another company, MotherDuck, has received $100 million in funding for its data platform based on DuckDB, with investors including Andreessen Horowitz.
Language support
In addition to the native C and C++ APIs, DuckDB supports a range of programming languages.
External links
Official homepage of DuckDB