HBase is an open-source non-relational distributed database modeled after Google's Bigtable and written in Java. It is developed as part of the Apache Software Foundation's Apache Hadoop project and runs on top of HDFS (Hadoop Distributed File System) or Alluxio, providing Bigtable-like capabilities for Hadoop. That is, it provides a fault-tolerant way of storing large quantities of sparse data (small amounts of information caught within a large collection of empty or unimportant data, such as finding the 50 largest items in a group of 2 billion records, or finding the non-zero items representing less than 0.1% of a huge collection).
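The idea of sparse storage can be illustrated with a small in-memory model. The following is a minimal sketch in plain Java, not the HBase client API: the class name `SparseTable`, its methods, and the `family:qualifier` column strings are hypothetical and chosen only for illustration. It assumes the Bigtable-style view of a table as a sorted map from row key to a map of columns, in which cells that were never written take up no space.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Toy in-memory model of a sparse, sorted table (illustrative only, not the HBase API). */
final class SparseTable {
    // row key -> (column name -> value); only cells that were written exist.
    private final NavigableMap<String, NavigableMap<String, String>> rows = new TreeMap<>();

    /** Stores one cell, creating the row on first write. */
    void put(String rowKey, String column, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(column, value);
    }

    /** Returns the cell value, or null if the cell was never written. */
    String get(String rowKey, String column) {
        Map<String, String> row = rows.get(rowKey);
        return row == null ? null : row.get(column);
    }

    /** Returns the rows whose keys fall in [startRow, stopRow), in sorted key order. */
    NavigableMap<String, NavigableMap<String, String>> scan(String startRow, String stopRow) {
        return rows.subMap(startRow, true, stopRow, false);
    }

    /** Counts the cells actually stored across all rows. */
    int cellCount() {
        int n = 0;
        for (Map<String, String> row : rows.values()) {
            n += row.size();
        }
        return n;
    }
}
```

In this sketch a row with one populated column out of thousands of possible columns stores exactly one entry, and a range scan over sorted row keys touches only the rows in the requested interval.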
HBase features compression, in-memory operation, and Bloom filters on a per-column basis as outlined in the original Bigtable paper. Tables in HBase can serve as the input and output for MapReduce jobs run in Hadoop, and may be accessed through the Java API but also through REST, Avro or Thrift gateway APIs. HBase is a wide-column store and has been widely adopted because of its lineage with Hadoop and HDFS. It is well-suited for fast read and write operations on large datasets with high throughput and low input/output latency.
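A Bloom filter answers the question "might this key be present?" with possible false positives but no false negatives, which lets a store skip data that definitely does not contain a requested key. The following is a minimal sketch of the general technique in plain Java; it is not HBase's implementation, and the class name `SimpleBloomFilter`, the choice of hash functions (`String.hashCode` combined with FNV-1a by double hashing), and the sizing parameters are assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

/** Minimal Bloom filter sketch (illustrative only, not HBase's implementation). */
final class SimpleBloomFilter {
    private final BitSet bits;
    private final int size;       // number of bits in the filter
    private final int hashCount;  // number of bit positions set per key

    SimpleBloomFilter(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    /** Records a key by setting hashCount bit positions. */
    void add(String key) {
        for (int i = 0; i < hashCount; i++) {
            bits.set(index(key, i));
        }
    }

    /** Returns false if the key was definitely never added; true means "possibly added". */
    boolean mightContain(String key) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(index(key, i))) {
                return false;
            }
        }
        return true;
    }

    // Double hashing: the i-th position is (h1 + i * h2) mod size.
    private int index(String key, int i) {
        int h1 = key.hashCode();
        int h2 = fnv1a(key);
        return Math.floorMod(h1 + i * h2, size);
    }

    // 32-bit FNV-1a hash over the UTF-8 bytes of the key.
    private static int fnv1a(String key) {
        int hash = 0x811C9DC5;
        for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
            hash ^= (b & 0xFF);
            hash *= 16777619;
        }
        return hash;
    }
}
```

A reader would consult `mightContain` before an expensive lookup and skip the lookup when it returns false. A true result still requires the real lookup, because unrelated keys can set the same bits.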
HBase is not a direct replacement for a classic SQL database; however, the Apache Phoenix project provides a SQL layer for HBase as well as a JDBC driver that can be integrated with various analytics and business intelligence applications. The Apache Trafodion project provides a SQL query engine with ODBC and JDBC drivers and distributed ACID transaction protection across multiple statements, tables and rows that use HBase as a storage engine.
HBase is now serving several data-driven websites, but Facebook's Messaging Platform migrated from HBase to MyRocks in 2018.
"Facebook: Why our 'next-gen' comms ditched MySQL". Retrieved 17 December 2010.
Unlike relational and traditional databases, HBase does not support SQL scripting; instead, the equivalent is written in Java, in a style similar to a MapReduce application.
In the parlance of Eric Brewer's CAP theorem, HBase is a CP type system: it favors consistency and partition tolerance over availability.
History
Apache HBase began as a project by the company Powerset out of a need to process massive amounts of data for the purposes of natural-language search. It has been a top-level Apache project since 2010.
Facebook elected to implement its new messaging platform using HBase in November 2010, but migrated away from HBase in 2018.
The 2.2.z series is the current stable release line; it supersedes earlier release lines.
Use cases & production deployments
Enterprises that use HBase
The following is a list of notable enterprises that have used or are using HBase:
* 23andMe
* Adobe
* Airbnb uses HBase as part of its AirStream realtime stream computation framework
* Alibaba Group
* Amadeus IT Group, as its main long-term storage DB.
* Bloomberg, for time series data storage
* Facebook used HBase for its messaging platform between 2010 and 2018
* Flipkart uses HBase for its search index and user insights.
* Flurry
* HubSpot
* Imgur uses HBase to power its notifications system
* Kakao
* Netflix
* Pinterest
* Quicken Loans
* Richrelevance
* Rocket Fuel
* Salesforce.com
* Sears
* Sophos, for some of their back-end systems.
* Spotify uses HBase as a base for Hadoop and machine learning jobs.
* Tuenti uses HBase for its messaging platform.
* Xiaomi
* Yahoo!
See also
* NoSQL
* Wide column store
* Bigtable
* Apache Cassandra
* Oracle NOSQL
* Hypertable
* Apache Accumulo
* MongoDB
* Project Voldemort
* Riak
* Sqoop
* Elasticsearch
* Apache Phoenix
External links
* Official Apache HBase homepage (hbase.apache.org)