Apache Accumulo is a highly scalable, sorted, distributed key-value store based on Google's Bigtable. It is a system built on top of Apache Hadoop, Apache ZooKeeper, and Apache Thrift. Written in Java, Accumulo has cell-level access labels and server-side programming mechanisms. According to the DB-Engines ranking, Accumulo is the third most popular NoSQL wide column store, behind Apache Cassandra and HBase, and the 67th most popular database engine of any type in the complete ranking as of 2018.
History
Accumulo was created in 2008 by the US National Security Agency and contributed to the Apache Software Foundation as an incubator project in September 2011. ("NSA Submits Open Source, Secure Database To Apache - Government", Informationweek.com, 2011-09-06. Retrieved on 2013-09-18.)
On March 21, 2012, Accumulo graduated from incubation at Apache, making it a top-level project.
Controversy
In June 2012, the US Senate Armed Services Committee (SASC) released the Draft 2012 Department of Defense (DoD) Authorization Bill, which included references to Apache Accumulo. In the draft bill, SASC required DoD to evaluate whether Apache Accumulo could achieve commercial viability before implementing it throughout DoD. Specific criteria were not included in the draft language, but the establishment of commercial entities supporting Apache Accumulo could be considered a success factor. ("SASC Accumulo language pro-open source, say proponents", FierceGovernmentIT, 2012-06-14. Retrieved on 2013-09-18.)
Main features
Cell-level security
Apache Accumulo extends the Bigtable data model by adding a new element to the key, called Column Visibility. This element stores a logical combination of security labels that must be satisfied at query time for the key and value to be returned as part of a user request. This allows data with varying security requirements to be stored in the same table, and lets users see only those keys and values for which they are authorized.
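The query-time check described above can be illustrated with a small, self-contained sketch. It is not Accumulo's implementation or API: the function names `tokenize` and `is_visible` are hypothetical, labels are assumed to be simple alphanumeric words, and expressions are assumed to combine labels with `&` (and), `|` (or), and parentheses. The sketch rejects expressions that mix `&` and `|` at the same level without parentheses, and treats an empty expression as visible to everyone.

```python
import re

# Assumption: labels are alphanumeric/underscore words; operators are & | ( ).
TOKEN = re.compile(r"\s*([A-Za-z0-9_]+|[&|()])")


def tokenize(expression):
    """Split a visibility expression into labels and operator tokens."""
    tokens, pos = [], 0
    expression = expression.strip()
    while pos < len(expression):
        match = TOKEN.match(expression, pos)
        if not match:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def is_visible(expression, authorizations):
    """Return True if the user's authorizations satisfy the expression."""
    tokens = tokenize(expression)
    if not tokens:
        return True  # an unlabeled entry is visible to every user
    auths = set(authorizations)
    pos = 0

    def parse_term():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        token = tokens[pos]
        pos += 1
        if token == "(":
            result = parse_expr()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return result
        if token in ("&", "|", ")"):
            raise ValueError(f"unexpected token {token!r}")
        return token in auths  # a label is satisfied if the user holds it

    def parse_expr():
        nonlocal pos
        result = parse_term()
        operator = None
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            if operator is None:
                operator = tokens[pos]
            elif tokens[pos] != operator:
                raise ValueError("mixing & and | requires parentheses")
            pos += 1
            right = parse_term()
            result = (result and right) if operator == "&" else (result or right)
        return result

    result = parse_expr()
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return result


# Entries with different visibility requirements stored side by side.
entries = [
    ("row1", "public note", ""),
    ("row2", "audit log", "admin&(audit|ops)"),
    ("row3", "payroll", "hr|finance"),
]
user_auths = {"admin", "ops"}
visible = [(row, value) for row, value, vis in entries if is_visible(vis, user_auths)]
```

With the hypothetical authorizations `admin` and `ops`, the filter keeps `row1` and `row2` and hides `row3`, which mirrors how a single table can serve users with different clearances.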
Server-side programming
In addition to cell-level security, Apache Accumulo provides a server-side programming mechanism called Iterators that allows users to perform additional processing at the Tablet Server. The range of operations that can be applied is equivalent to those that can be implemented within a MapReduce Combiner function, which produces an aggregate value for several key-value pairs.
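To make the combiner analogy concrete, here is a minimal sketch of the idea: a stream of key-value pairs that is already sorted by key is collapsed so that each key yields one aggregate value. This is a conceptual model only. Accumulo's Iterators are written in Java and run inside the Tablet Server; the function name `summing_combiner` and the sample data are assumptions made for illustration.

```python
from itertools import groupby
from operator import itemgetter


def summing_combiner(sorted_entries):
    """Yield one (key, total) pair per key from a key-sorted stream.

    Assumes the input is already sorted by key, as it would be in a
    sorted store; groupby only merges adjacent entries with equal keys.
    """
    for key, group in groupby(sorted_entries, key=itemgetter(0)):
        yield key, sum(value for _, value in group)


# Hypothetical per-page hit counts written as separate entries.
raw_entries = [
    ("page/about", 1),
    ("page/about", 4),
    ("page/home", 2),
    ("page/home", 3),
    ("page/home", 5),
]
combined = list(summing_combiner(raw_entries))
```

Because the aggregation happens where the data lives, only the combined pairs need to be sent back to the client rather than every individual entry.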
User key ordering
Apache Accumulo sorts entries by user key and exposes an iterator over a key range. This allows locality of reference that is not available from some other distributed stores, including Cassandra and Voldemort, which order entries by a hash of the user key.
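The benefit of key ordering can be sketched with a toy in-memory store that keeps its keys sorted, so that a range scan is a binary search followed by a contiguous read. The class `SortedStore`, its methods, and the key layout are hypothetical and stand in for the general technique, not for Accumulo's scanner interface.

```python
import bisect


class SortedStore:
    """Toy key-value store that keeps keys in sorted order."""

    def __init__(self):
        self._keys = []    # sorted list of keys
        self._values = {}  # key -> value

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)  # keep keys sorted on insert
        self._values[key] = value

    def scan(self, start, end):
        """Yield (key, value) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        for key in self._keys[lo:hi]:
            yield key, self._values[key]


store = SortedStore()
store.put("user2:email", "b@example.com")
store.put("user1:name", "Ada")
store.put("user3:name", "Cy")
store.put("user1:email", "a@example.com")

# All keys with the prefix "user1:" sit next to each other, so one
# range scan returns them; ";" is the character right after ":".
user1_entries = list(store.scan("user1:", "user1;"))
```

In a store that places entries by a hash of the key, the two `user1:` entries would generally land in unrelated positions, and the same query would need one lookup per key instead of a single contiguous scan.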
Papers
* 201 YCSB++: Benchmarking and Performance Debugging Advanced Features in Scalable Table Stores, by Carnegie Mellon University and the National Security Agency.
* 201 Driving Big Data With Big Compute, by MIT Lincoln Laboratory.
* 201 D4M 2.0 Schema: A General Purpose High Performance Schema for the Accumulo Database, by MIT Lincoln Laboratory.
* 201 Spatio-temporal Indexing in Non-relational Distributed Databases, by CCRi.
See also
* Bigtable
* Apache Cassandra
* Column-oriented DBMS
* Hypertable
* HBase
* Hadoop
* sqrrl