
HBase is an open-source, non-relational, distributed database modeled after Google's Bigtable and written in Java. It is developed as part of the Apache Software Foundation's Apache Hadoop project and runs on top of HDFS (Hadoop Distributed File System) or Alluxio, providing Bigtable-like capabilities for Hadoop. That is, it provides a fault-tolerant way of storing large quantities of sparse data: small amounts of meaningful information held within a large collection of empty or unimportant data, such as finding the 50 largest items in a group of 2 billion records, or finding the non-zero items that make up less than 0.1% of a huge collection.

HBase features compression, in-memory operation, and Bloom filters on a per-column basis, as outlined in the original Bigtable paper. Tables in HBase can serve as the input and output for MapReduce jobs run in Hadoop, and can be accessed through the Java API as well as through REST, Avro, or Thrift gateway APIs. HBase is a wide-column store and has been widely adopted because of its lineage with Hadoop and HDFS. It is well suited to fast read and write operations on large datasets, with high throughput and low input/output latency.

HBase is not a direct replacement for a classic SQL database; however, the Apache Phoenix project provides a SQL layer for HBase, as well as a JDBC driver that can be integrated with various analytics and business intelligence applications. The Apache Trafodion project provides a SQL query engine with ODBC and JDBC drivers and distributed ACID transaction protection across multiple statements, tables, and rows that use HBase as a storage engine.

HBase now serves several data-driven websites, although Facebook's messaging platform migrated from HBase to MyRocks in 2018. [Facebook: Why our 'next-gen' comms ditched MySQL. Retrieved 17 December 2010.]

Unlike relational databases, HBase does not natively support SQL; instead, the equivalent logic is written in Java, in a style similar to a MapReduce application. In terms of Eric Brewer's CAP theorem, HBase is a CP (consistent and partition-tolerant) system.
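The wide-column, sparse data model described above can be illustrated with a small in-memory sketch. This is not HBase's client API or storage engine; it is a minimal illustration, assuming the Bigtable-style view of a table as a sorted map from row key to column to timestamp to value, in which only cells that were actually written take up space. The class `SparseTable` and its methods are hypothetical, `String` keys stand in for HBase's byte-array keys, and a column is simplified to a single `"family:qualifier"` string.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/**
 * Hypothetical in-memory sketch of a Bigtable/HBase-style data model:
 * row key -> "family:qualifier" -> timestamp -> value.
 * Illustration only; not the HBase API.
 */
final class SparseTable {
    // Rows are kept sorted by key, so a row-range scan is a sorted sub-map view.
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> rows =
            new TreeMap<>();

    /** Writes one cell version. Cells that are never written are never stored. */
    void put(String row, String column, long timestamp, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            // Versions are ordered newest-first (descending timestamp).
            .computeIfAbsent(column, c -> new TreeMap<Long, String>(Collections.reverseOrder()))
            .put(timestamp, value);
    }

    /** Returns the newest version of a cell, or null if it was never written. */
    String getLatest(String row, String column) {
        Map<String, NavigableMap<Long, String>> columns = rows.get(row);
        if (columns == null) {
            return null;
        }
        NavigableMap<Long, String> versions = columns.get(column);
        if (versions == null || versions.isEmpty()) {
            return null;
        }
        return versions.firstEntry().getValue(); // first entry = highest timestamp
    }

    /** Returns row keys in [startRow, stopRow) in sorted order, like a range scan. */
    List<String> scanRowKeys(String startRow, String stopRow) {
        return new ArrayList<>(rows.subMap(startRow, true, stopRow, false).keySet());
    }

    /** Counts stored cell versions; empty cells cost nothing in this model. */
    int storedCells() {
        int n = 0;
        for (NavigableMap<String, NavigableMap<Long, String>> columns : rows.values()) {
            for (NavigableMap<Long, String> versions : columns.values()) {
                n += versions.size();
            }
        }
        return n;
    }

    public static void main(String[] args) {
        SparseTable t = new SparseTable();
        t.put("user#002", "info:name", 1L, "Bob");
        t.put("user#001", "info:name", 1L, "Alice");
        t.put("user#001", "info:email", 1L, "alice@example.com");
        t.put("user#001", "info:email", 2L, "alice@example.org"); // newer version
        t.put("user#003", "stats:logins", 5L, "42");

        System.out.println(t.getLatest("user#001", "info:email")); // alice@example.org
        System.out.println(t.getLatest("user#002", "info:email")); // null (never written)
        System.out.println(t.scanRowKeys("user#001", "user#003")); // [user#001, user#002]
        System.out.println(t.storedCells());                       // 5
    }
}
```

Keeping versions in descending timestamp order makes "read the newest version" a single `firstEntry()` call, and keeping rows in a `TreeMap` turns a row-range scan into a `subMap()` view. HBase likewise stores rows sorted by row key, which is why scans over adjacent keys are efficient; in this sketch the caller always supplies the timestamp explicitly.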


History

Apache HBase began as a project by the company Powerset, out of a need to process massive amounts of data for the purposes of natural-language search. It has been a top-level Apache project since 2010. Facebook elected to implement its new messaging platform using HBase in November 2010, but migrated away from HBase in 2018. The 2.2.z series is the current stable release line; it supersedes earlier release lines.


Use cases & production deployments


Enterprises that use HBase

The following is a list of notable enterprises that have used or are using HBase:
* 23andMe
* Adobe
* Airbnb, as part of its AirStream real-time stream computation framework
* Alibaba Group
* Amadeus IT Group, as its main long-term storage database
* Bloomberg, for time series data storage
* Facebook, which used HBase for its messaging platform between 2010 and 2018
* Flipkart, for its search index and user insights
* Flurry
* HubSpot
* Imgur, to power its notifications system
* Kakao
* Netflix
* Pinterest
* Quicken Loans
* Richrelevance
* Rocket Fuel
* Salesforce.com
* Sears
* Sophos, for some of its back-end systems
* Spotify, as the base for its Hadoop and machine learning jobs
* Tuenti, for its messaging platform
* Xiaomi
* Yahoo!


See also

* NoSQL
* Wide column store
* Bigtable
* Apache Cassandra
* Oracle NoSQL
* Hypertable
* Apache Accumulo
* MongoDB
* Project Voldemort
* Riak
* Sqoop
* Elasticsearch
* Apache Phoenix




External links

* hbase.apache.org: Official Apache HBase homepage