ColumnFamily

The standard column family is a NoSQL object that contains columns of related data. It is a tuple (pair) that consists of a key–value pair, where the key is mapped to a value that is a set of columns. In analogy with relational databases, a standard column family corresponds to a "table", with each key–value pair being a "row". Each column is a tuple (triplet) consisting of a column name, a value, and a timestamp. In a relational database table, this data would be grouped together within a table with other non-related data. Standard column families are containers of columns that are sorted by their names; the rows can be referenced and sorted by their row key.
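The structure described above can be sketched in Python. This is a minimal illustration, not the data model of any particular product: the class names `Column` and `ColumnFamily`, the sample row key, and the rule that the newest timestamp wins on conflicting writes are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class Column:
    """A column: the (name, value, timestamp) triplet."""
    name: str
    value: Any
    timestamp: int


class ColumnFamily:
    """Maps each row key to a set of columns (held here as a dict keyed by column name)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.rows: Dict[str, Dict[str, Column]] = {}

    def insert(self, row_key: str, column: Column) -> None:
        row = self.rows.setdefault(row_key, {})
        current = row.get(column.name)
        # Assumption for this sketch: the write with the newest timestamp wins.
        if current is None or column.timestamp >= current.timestamp:
            row[column.name] = column

    def get_row(self, row_key: str) -> List[Column]:
        """Return the columns of one row, sorted by column name."""
        return sorted(self.rows.get(row_key, {}).values(), key=lambda c: c.name)


# Hypothetical usage: one row key mapped to two columns.
users = ColumnFamily("users")
users.insert("u1", Column("surname", "Doe", timestamp=1))
users.insert("u1", Column("first_name", "Ann", timestamp=1))
```

Each row key maps only to the columns actually written for it, which is what distinguishes this layout from a fixed-width relational row.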


Benefits

Accessing the data in a distributed data store would be expensive (time-consuming) if it were stored in the form of a table. It would also be inefficient to read all the column families that make up a row of a relational table and assemble them into a row, because the data is distributed across a large number of nodes. Therefore, the user accesses only the related information required. As an example, a relational table could consist of the columns UID, first name, surname, birthdate, gender, etc. In a distributed data store, the same table would be implemented by creating column families for "UID, first name, surname", "birthdate, gender", etc. If one needs only the males who were born between 1950 and 1960, a query in the relational database has to read the whole table. In a distributed data store, it suffices to access only the second standard column family, as the rest of the information is irrelevant.
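The example can be sketched as follows, assuming two in-memory dictionaries stand in for the two column families. The sample records, the variable names, and the choice to treat the year range as inclusive are hypothetical.

```python
from datetime import date

# Hypothetical sample data: each column family is stored, and read, separately.
names_cf = {
    1: {"first_name": "Ann", "surname": "Doe"},
    2: {"first_name": "Bob", "surname": "Roe"},
    3: {"first_name": "Carl", "surname": "Poe"},
}
demographics_cf = {
    1: {"birthdate": date(1955, 3, 1), "gender": "female"},
    2: {"birthdate": date(1952, 7, 9), "gender": "male"},
    3: {"birthdate": date(1971, 1, 20), "gender": "male"},
}


def males_born_between(family, first_year, last_year):
    """Return the UIDs that match, reading only the column family passed in."""
    return [
        uid
        for uid, columns in family.items()
        if columns["gender"] == "male"
        and first_year <= columns["birthdate"].year <= last_year
    ]


# Only the "birthdate, gender" family is touched; names_cf is never read.
matching_uids = males_born_between(demographics_cf, 1950, 1960)
```

The name columns would be fetched afterwards, and only for the matching UIDs, if they were needed at all.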


Sorting and querying

In distributed data stores, there is no way to sort columns at query time, nor to run arbitrary queries. Instead, columns are sorted when they are added to the column family. The sort order is defined by an attribute. For instance, in Apache Cassandra this is done by the CompareWith attribute, which can have the following values:
* AsciiType
* BytesType
* LexicalUUIDType
* LongType
* TimeUUIDType
* UTF8Type
It is also possible to add user-defined sorting attributes. Because the columns are already kept in sorted order, reads need no separate sorting step, which makes the process very quick.
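Sorting at insertion time can be sketched with a small Python class. This is an illustration of the idea only, not Cassandra's implementation: `SortedRow`, `text_order`, and `long_order` are hypothetical names, and the two key functions are loose stand-ins for a text-based and an integer-based ordering.

```python
import bisect


class SortedRow:
    """Keeps columns ordered by name as they are inserted."""

    def __init__(self, sort_key):
        self._sort_key = sort_key  # stand-in for the configured ordering
        self._keys = []            # sort keys, always in ascending order
        self._columns = []         # (name, value) pairs, parallel to _keys

    def insert(self, name, value):
        key = self._sort_key(name)
        pos = bisect.bisect_left(self._keys, key)
        if pos < len(self._keys) and self._keys[pos] == key:
            self._columns[pos] = (name, value)  # same key: overwrite in place
        else:
            self._keys.insert(pos, key)
            self._columns.insert(pos, (name, value))

    def names(self):
        """Column names in stored order; no sorting happens on read."""
        return [name for name, _ in self._columns]


def text_order(name):
    return name        # compare column names as plain strings


def long_order(name):
    return int(name)   # compare column names as integers


as_text = SortedRow(text_order)
as_long = SortedRow(long_order)
for column_name in ["10", "9", "100"]:
    as_text.insert(column_name, "x")
    as_long.insert(column_name, "x")
```

With these inputs the text ordering stores the names as "10", "100", "9", while the integer ordering stores them as "9", "10", "100", so the choice of ordering has to match how the column names will be read.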


Standard column families vs. rows

Standard column families are schema-less: each of their "rows" can contain a different number of columns, and even the column names can differ from row to row. They are therefore a very different concept from the rows of a relational database management system (RDBMS). This is one of the reasons why the concept is not trivial even for an experienced RDBMS expert.


Examples

In JSON-like notation, a column family definition named UserProfile would use "Cassandra", "TerryCho", and "Cath" as row keys, and "emailAddress", "age", "gender", and "address" as column names.
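A sketch of such a column family as a Python dictionary is shown here. Only the row keys and column names come from the text; every value, and the choice of which columns appear in which row, is made up for illustration.

```python
# Row keys and column names follow the text; all values are hypothetical.
user_profile = {
    "Cassandra": {"emailAddress": "cassandra@example.org", "age": 20},
    "TerryCho": {"emailAddress": "terry@example.org", "gender": "male"},
    "Cath": {
        "emailAddress": "cath@example.org",
        "age": 20,
        "gender": "female",
        "address": "Seoul",
    },
}

row_keys = list(user_profile)
# Rows need not share the same columns, so collect the union of names.
column_names = sorted({name for row in user_profile.values() for name in row})
```

Note that the rows hold different sets of columns, which a fixed relational schema would not allow without null values.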


See also

* Column (data store)
* Column family
* Super column
* Super column family




External links


The Apache Cassandra data model