Gizzard was an open source sharding framework for creating custom fault-tolerant, distributed databases. It was initially used by Twitter and emerged from a wide variety of data storage problems. Gizzard operated as a middleware networking service that ran on the Java Virtual Machine. It managed the partitioning of data across arbitrary backend datastores, which allowed the data to be accessed efficiently. The partitioning rules were stored in a forwarding table that mapped key ranges to partitions. Each partition managed its own replication through a declarative replication tree. Gizzard handled both physical and logical shards. Physical shards pointed to a physical database backend, whereas logical shards were trees of other shards.
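The forwarding table and shard trees described above can be illustrated with a minimal sketch. It is not Gizzard's actual API (Gizzard itself ran on the JVM). The names `PhysicalShard`, `ReplicatingShard`, and `ForwardingTable` are hypothetical, the backends are stand-in in-memory dictionaries, and the table is assumed to identify each key range by its lower bound.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class PhysicalShard:
    """Leaf shard: stands in for one backend datastore (here, a dict)."""
    name: str
    rows: dict = field(default_factory=dict)

    def write(self, key, value):
        self.rows[key] = value

    def read(self, key):
        return self.rows.get(key)


@dataclass
class ReplicatingShard:
    """Logical shard: a tree node that replicates writes to its children."""
    children: list

    def write(self, key, value):
        for child in self.children:
            child.write(key, value)

    def read(self, key):
        # Return the first non-missing answer from any replica.
        for child in self.children:
            value = child.read(key)
            if value is not None:
                return value
        return None


class ForwardingTable:
    """Maps key ranges to shards; each range is named by its lower bound."""

    def __init__(self, entries):
        # entries: iterable of (lower_bound, shard) pairs (assumed layout).
        self._entries = sorted(entries, key=lambda entry: entry[0])
        self._bounds = [lower for lower, _ in self._entries]

    def shard_for(self, key):
        # Find the last range whose lower bound is <= key.
        index = bisect.bisect_right(self._bounds, key) - 1
        if index < 0:
            raise KeyError(f"no partition covers key {key!r}")
        return self._entries[index][1]


# Two partitions: keys 0..999 are replicated twice, keys >= 1000 once.
a1, a2, b1 = PhysicalShard("a1"), PhysicalShard("a2"), PhysicalShard("b1")
table = ForwardingTable([
    (0, ReplicatingShard([a1, a2])),
    (1000, ReplicatingShard([b1])),
])
table.shard_for(42).write(42, "hello")
table.shard_for(1500).write(1500, "world")
```

In this sketch a write to key 42 lands on both replicas of the first partition, while key 1500 is routed to the second partition only. Nesting further `ReplicatingShard` nodes inside `children` would give a deeper replication tree.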
In addition, Gizzard supported migrations and handled failures gracefully. The system was made eventually consistent by requiring that all write operations be idempotent and commutative. Operations that failed were retried at a later time. Gizzard is available on GitHub and licensed under the Apache License 2.0.
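The following sketch illustrates why idempotent and commutative writes make retries safe. It is an illustration under stated assumptions, not Gizzard's implementation: the `Store` class, the `deliver` function, and the timestamped last-writer-wins rule are all hypothetical choices made for this example.

```python
from collections import deque


class Store:
    """Hypothetical backend that keeps, per key, the newest (timestamp, value)."""

    def __init__(self):
        self.rows = {}

    def apply(self, key, value, timestamp):
        candidate = (timestamp, value)
        current = self.rows.get(key)
        # Taking the maximum is idempotent (re-applying changes nothing)
        # and commutative (the order of application does not matter).
        self.rows[key] = candidate if current is None else max(current, candidate)


def deliver(writes, store, failing):
    """Apply writes in order; writes in `failing` fail once and are retried later."""
    retry_queue = deque()
    for write in writes:
        if write in failing:
            retry_queue.append(write)  # simulated transient failure
        else:
            store.apply(*write)
    while retry_queue:
        store.apply(*retry_queue.popleft())


# Each write is a (key, value, timestamp) tuple.
older = ("user:1", "alice", 1)
newer = ("user:1", "alicia", 2)

in_order = Store()
deliver([older, newer], in_order, failing=set())

reordered = Store()
deliver([older, newer], reordered, failing={older})  # older write arrives last

# A duplicate delivery of an already applied write changes nothing.
reordered.apply(*newer)
```

Both stores end with the same row for `user:1`, even though the second one received the older write after the newer one and saw a duplicate. That convergence is what allows failed operations to be retried at any later time.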
See also
* Distributed hash table (DHT)
* Distributed database
* FlockDB
External links
Project Website