Cdb (software)
   HOME

TheInfoList



OR:

cdb, short for "constant database", refers to both a
library A library is a collection of materials, books or media that are accessible for use and not just for display purposes. A library provides physical (hard copies) or digital access (soft copies) materials, and may be a physical location or a vir ...
and data format created by Daniel J. Bernstein. cdb acts as an on-disk
associative array In computer science, an associative array, map, symbol table, or dictionary is an abstract data type that stores a collection of (key, value) pairs, such that each possible key appears at most once in the collection. In mathematical terms an ...
, mapping keys to values, and allows multiple values to be stored for a single key. A constant database allows only two operations: creation and reading. Both operations are designed to be very fast and highly reliable. Since the
database In computing, a database is an organized collection of data stored and accessed electronically. Small databases can be stored on a file system, while large databases are hosted on computer clusters or cloud storage. The design of databases s ...
does not change while it is in use, multiple processes can access a single database without locking. Additionally, since all modifications create a replacement database, it can take advantage of UNIX filesystem semantics to provide a guarantee of reliability. Record positions, key and value lengths, and hash values are 32-bit quantities and therefore must fit into 4 gigabytes. cdb is used by
djbdns The djbdns software package is a DNS implementation. It was created by Daniel J. Bernstein in response to his frustrations with repeated security holes in the widely used BIND DNS software. As a challenge, Bernstein offered a $1000 prize for th ...
, fastforward, mess822,
qmail qmail is a mail transfer agent (MTA) that runs on Unix. It was written, starting December 1995, by Daniel J. Bernstein as a more secure replacement for the popular Sendmail program. Originally license-free software, qmail's source code ...
and
ucspi-tcp ucspi-tcp is a public domain Unix TCP command-line tool for building TCP client-server applications. It consists of super-server ''tcpserver'' and ''tcpclient'' application. Fro"Life with qmail" Dave Sill, 2 January 200 ''ucspi-tcp'' is an a ...
to provide highly efficient, reliable, and simple data access.


Structure

A database contains an entire data set (e.g. a single associative array) in a single
computer file A computer file is a computer resource for recording data in a computer storage device, primarily identified by its file name. Just as words can be written to paper, so can data be written to a computer file. Files can be shared with and trans ...
. It consists of three parts: a fixed-size header, data, and a set of
hash table In computing, a hash table, also known as hash map, is a data structure that implements an associative array or dictionary. It is an abstract data type that maps keys to values. A hash table uses a hash function to compute an ''index'', ...
s. Lookups are designed for exact keys only, though other types of searches could be performed by scanning the entire database. Lookups are performed using the following
algorithm In mathematics and computer science, an algorithm () is a finite sequence of rigorous instructions, typically used to solve a class of specific problems or to perform a computation. Algorithms are used as specifications for performing ...
: * Hash the key. * Determine at which hash table and slot this record should be located. * Test the indicated slot in the hash table. ** If the slot is empty, the record does not exist. Abort the search. ** If the slot's hash matches the key's hash, seek to the record. Read and compare the key. If it matches, the data has been found, so end the search. ** The record is not in this slot. Proceed to the next slot, wrapping around to the beginning of the hash table if necessary. For lookups of keys with multiple values, additional values may be found by simply resuming the search at the next slot.


Format

All numbers—offsets, lengths, and hash values—are unsigned 32-
bit The bit is the most basic unit of information in computing and digital communications. The name is a portmanteau of binary digit. The bit represents a logical state with one of two possible values. These values are most commonly represente ...
integer An integer is the number zero (), a positive natural number (, , , etc.) or a negative integer with a minus sign ( −1, −2, −3, etc.). The negative numbers are the additive inverses of the corresponding positive numbers. In the languag ...
s, stored in
little endian In computing, endianness, also known as byte sex, is the order or sequence of bytes of a word of digital data in computer memory. Endianness is primarily expressed as big-endian (BE) or little-endian (LE). A big-endian system stores the most si ...
format. Keys and data are considered to be opaque byte strings, and have no special treatment. The fixed-size header at the beginning of the database describes 256 hash tables by listing their position within the file and their length in slots. Data is stored as a series of records, each storing key length, data length, key, and data. There are no alignment or sorting rules. The records are followed by a set of 256 hash tables of varying lengths. Since zero is a valid length, there may be fewer than 256 hash tables physically stored in the database, but there are nonetheless considered to be 256 tables. Hash tables contain a series of slots, each of which contains a hash value and a record offset. "Empty slots" have an offset of zero. Hashes are unsigned 32 bit integers, and start with a value of 5381. For each byte of the key, the current hash is multiplied by 33, then XOR'ed with the current byte of the key. Overflow bits are discarded. Slots and tables are trivially computed from hashes. The target table is simply the lowest eight bits of the hash (i.e. hash modulo 256), and the slot within the table is the remaining bits of the hash modulo the table length (i.e. hash divided by 256 modulo table length).


Library

The official cdb library code is
public domain The public domain (PD) consists of all the creative work to which no exclusive intellectual property rights apply. Those rights may have expired, been forfeited, expressly waived, or may be inapplicable. Because those rights have expired, ...
: the individual source files are marked as such, and are also available in the public domain
djbdns The djbdns software package is a DNS implementation. It was created by Daniel J. Bernstein in response to his frustrations with repeated security holes in the widely used BIND DNS software. As a challenge, Bernstein offered a $1000 prize for th ...
package. However, the rest of the cdb package used to be
license-free software License-free software is computer software that is not explicitly in the public domain, but the authors appear to intend free use, modification, distribution and distribution of the modified software, similar to the freedoms defined for free softwa ...
, meaning it must be distributed verbatim. The unusual licensing and simplicity of the format has prompted others to re-implement the library and release it under more common terms, such as Michael Tokarev's TinyCDB library, available under the public domain. In 2009, all of cdb was put in the public domain.{{Cite web, url=https://cr.yp.to/distributors.html, title=Frequently asked questions from distributors Notably, the creator of cdb does not intend for cdb to be used as a
shared library In computer science, a library is a collection of non-volatile resources used by computer programs, often for software development. These may include configuration data, documentation, help data, message templates, pre-written code and subro ...
. This differs from virtually all similar ''dbm''-like databases, such as
Berkeley DB Berkeley DB (BDB) is an unmaintained embedded database software library for key/value data, historically significant in open source software. Berkeley DB is written in C with API bindings for many other programming languages. BDB stores arbitr ...
.


References


External links


cdb
official cdb website

detailed format description
QDBM benchmark
comparing cdb against similar packages Database management systems