HBase is an open-source, non-relational, distributed database modeled after Google's Bigtable and written in Java. It is developed as part of the Apache Software Foundation's Apache Hadoop project and runs on top of HDFS (Hadoop Distributed File System) or Alluxio, providing Bigtable-like capabilities for Hadoop. That is, it provides a fault-tolerant way of storing large quantities of sparse data (small amounts of information caught within a large collection of empty or unimportant data, such as finding the 50 largest items in a group of 2 billion records, or finding the non-zero items representing less than 0.1% of a huge collection).

HBase features compression, in-memory operation, and Bloom filters on a per-column basis, as outlined in the original Bigtable paper. Tables in HBase can serve as the input and output for MapReduce jobs run in Hadoop, and may be accessed through the Java API as well as through REST, Avro or Thrift gateway APIs. HBase is a wide-column store and has been widely adopted because of its lineage with Hadoop and HDFS. It is well suited to fast read and write operations on large datasets, with high throughput and low input/output latency.

HBase is not a direct replacement for a classic SQL database; however, the Apache Phoenix project provides a SQL layer for HBase as well as a JDBC driver that can be integrated with various analytics and business intelligence applications. The Apache Trafodion project provides a SQL query engine with ODBC and JDBC drivers and distributed ACID transaction protection across multiple statements, tables and rows, using HBase as a storage engine.

HBase serves several data-driven websites, although Facebook's messaging platform migrated from HBase to MyRocks in 2018. (Source: "Facebook: Why our 'next-gen' comms ditched MySQL", retrieved 17 December 2010.) Unlike relational and traditional databases, HBase does not support SQL scripting; the equivalent logic is written in Java, much like a MapReduce application. In terms of Eric Brewer's CAP theorem, HBase is a CP-type system.
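
The sparse, wide-column model described above can be illustrated with a toy in-memory structure. The sketch below is not the HBase client API; the class `SparseTableSketch` and all of its methods are hypothetical names invented for illustration. It assumes the Bigtable-style layout of a sorted map from row key to column ("family:qualifier") to timestamp to value, in which cells that were never written occupy no space.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableMap;
import java.util.Optional;
import java.util.TreeMap;

/**
 * Toy model of a sparse, sorted, multi-versioned table.
 * Hypothetical illustration only; this is NOT the HBase client API.
 */
class SparseTableSketch {
    // row key -> column ("family:qualifier") -> timestamp (newest first) -> value
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> rows =
            new TreeMap<>();

    /** Stores one version of one cell; only written cells are materialized. */
    void put(String row, String column, long timestamp, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(column,
                    c -> new TreeMap<Long, String>(Comparator.<Long>reverseOrder()))
            .put(timestamp, value);
    }

    /** Returns the newest version of a cell, or empty if it was never written. */
    Optional<String> get(String row, String column) {
        NavigableMap<String, NavigableMap<Long, String>> columns = rows.get(row);
        if (columns == null) {
            return Optional.empty();
        }
        NavigableMap<Long, String> versions = columns.get(column);
        if (versions == null || versions.isEmpty()) {
            return Optional.empty();
        }
        // Timestamps are sorted in descending order, so the first entry is the newest.
        return Optional.of(versions.firstEntry().getValue());
    }

    /** Returns the row keys in [startRow, stopRow), in sorted order. */
    List<String> scan(String startRow, String stopRow) {
        return new ArrayList<>(rows.subMap(startRow, true, stopRow, false).keySet());
    }

    /** Counts the stored cell versions across all rows and columns. */
    int cellCount() {
        int count = 0;
        for (NavigableMap<String, NavigableMap<Long, String>> columns : rows.values()) {
            for (NavigableMap<Long, String> versions : columns.values()) {
                count += versions.size();
            }
        }
        return count;
    }

    public static void main(String[] args) {
        SparseTableSketch table = new SparseTableSketch();
        table.put("user#001", "info:name", 1L, "Ada");
        table.put("user#001", "info:name", 2L, "Ada L.");   // newer version of the same cell
        table.put("user#002", "info:email", 1L, "bob@example.org");

        System.out.println(table.get("user#001", "info:name"));  // newest version wins
        System.out.println(table.get("user#002", "info:name"));  // never written, so empty
        System.out.println(table.scan("user#001", "user#002"));  // stop row is exclusive
        System.out.println(table.cellCount());
    }
}
```

Keeping row keys in a sorted map is what makes range scans cheap in this sketch, and storing only the cells that were written is what lets a table with many possible columns stay small when most of them are empty for any given row.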


History

Apache HBase began as a project by the company Powerset, out of a need to process massive amounts of data for natural-language search. It has been a top-level Apache project since 2010. Facebook elected to implement its new messaging platform using HBase in November 2010, but migrated away from HBase in 2018. The 2.2.z series is the current stable release line; it supersedes earlier release lines.


Use cases & production deployments


Enterprises that use HBase

The following is a list of notable enterprises that have used or are using HBase:
* 23andMe
* Adobe
* Airbnb uses HBase as part of its AirStream realtime stream computation framework.
* Alibaba Group
* Amadeus IT Group, as its main long-term storage DB.
* Bloomberg, for time series data storage.
* Facebook used HBase for its messaging platform between 2010 and 2018.
* Flipkart uses HBase for its search index and user insights.
* Flurry
* HubSpot
* Imgur uses HBase to power its notifications system.
* Kakao
* Netflix
* Pinterest
* Quicken Loans
* Richrelevance
* Rocket Fuel
* Salesforce.com
* Sears
* Sophos, for some of their back-end systems.
* Spotify uses HBase as a base for Hadoop and machine learning jobs.
* Tuenti uses HBase for its messaging platform.
* Xiaomi
* Yahoo!


See also

* NoSQL
* Wide column store
* Bigtable
* Apache Cassandra
* Oracle NOSQL
* Hypertable
* Apache Accumulo
* MongoDB
* Project Voldemort
* Riak
* Sqoop
* Elasticsearch
* Apache Phoenix


External links

* Official Apache HBase homepage (hbase.apache.org)