Join (relational algebra)
   HOME

TheInfoList



OR:

In
database theory Database theory encapsulates a broad range of topics related to the study and research of the theoretical realm of databases and database management systems. Theoretical aspects of data management include, among other areas, the foundations of q ...
, relational algebra is a theory that uses algebraic structures with a well-founded semantics for modeling data, and defining queries on it. The theory was introduced by
Edgar F. Codd Edgar Frank "Ted" Codd (19 August 1923 – 18 April 2003) was an English computer scientist who, while working for IBM, invented the relational model for database management, the theoretical basis for relational databases and relational databa ...
. The main application of relational algebra is to provide a theoretical foundation for relational databases, particularly
query language Query languages, data query languages or database query languages (DQL) are computer languages used to make queries in databases and information systems. A well known example is the Structured Query Language (SQL). Types Broadly, query language ...
s for such databases, chief among which is SQL. Relational databases store tabular data represented as relations. Queries over relational databases often likewise return tabular data represented as relations. The main purpose of the relational algebra is to define operators that transform one or more input relations to an output relation. Given that these operators accept relations as input and produce relations as output, they can be combined and used to express potentially complex queries that transform potentially many input relations (whose data are stored in the database) into a single output relation (the query results). Unary operators accept as input a single relation; examples include operators to filter certain attributes (columns) or tuples (rows) from an input relation. Binary operators accept as input two relations; such operators combine the two input relations into a single output relation by, for example, taking all tuples found in either relation, removing tuples from the first relation found in the second relation, extending the tuples of the first relation with tuples in the second relation matching certain conditions, and so forth. Other more advanced operators can also be included, where the inclusion or exclusion of certain operators gives rise to a family of algebras.


Introduction

Relational algebra received little attention outside of pure mathematics until the publication of E.F. Codd's relational model of data in 1970. Codd proposed such an algebra as a basis for database query languages. (See section Implementations.) Five primitive operators of Codd's algebra are the ''
selection Selection may refer to: Science * Selection (biology), also called natural selection, selection in evolution ** Sex selection, in genetics ** Mate selection, in mating ** Sexual selection in humans, in human sexuality ** Human mating strateg ...
'', the '' projection'', the '' Cartesian product'' (also called the ''cross product'' or ''cross join''), the '' set union'', and the '' set difference''.


Set operators

The relational algebra uses set union, set difference, and Cartesian product from
set theory Set theory is the branch of mathematical logic that studies sets, which can be informally described as collections of objects. Although objects of any kind can be collected into a set, set theory, as a branch of mathematics, is mostly conce ...
, but adds additional constraints to these operators. For set union and set difference, the two relations involved must be ''union-compatible''—that is, the two relations must have the same set of attributes. Because
set intersection In set theory, the intersection of two sets A and B, denoted by A \cap B, is the set containing all elements of A that also belong to B or equivalently, all elements of B that also belong to A. Notation and terminology Intersection is writt ...
is defined in terms of set union and set difference, the two relations involved in set intersection must also be union-compatible. For the Cartesian product to be defined, the two relations involved must have disjoint headers—that is, they must not have a common attribute name. In addition, the Cartesian product is defined differently from the one in set theory in the sense that tuples are considered to be "shallow" for the purposes of the operation. That is, the Cartesian product of a set of ''n''-tuples with a set of ''m''-tuples yields a set of "flattened" -tuples (whereas basic set theory would have prescribed a set of 2-tuples, each containing an ''n''-tuple and an ''m''-tuple). More formally, ''R'' × ''S'' is defined as follows: R\times S:=\ The cardinality of the Cartesian product is the product of the cardinalities of its factors, that is, , ''R'' × ''S'', = , ''R'', × , ''S'', .


Projection ()

A projection is a
unary operation In mathematics, an unary operation is an operation with only one operand, i.e. a single input. This is in contrast to binary operations, which use two operands. An example is any function , where is a set. The function is a unary operation o ...
written as \Pi_( R ) where a_1,\ldots,a_n is a set of attribute names. The result of such projection is defined as the set that is obtained when all
tuple In mathematics, a tuple is a finite ordered list (sequence) of elements. An -tuple is a sequence (or ordered list) of elements, where is a non-negative integer. There is only one 0-tuple, referred to as ''the empty tuple''. An -tuple is defi ...
s in ''R'' are restricted to the set \. Note: when implemented in SQL standard the "default projection" returns a
multiset In mathematics, a multiset (or bag, or mset) is a modification of the concept of a set that, unlike a set, allows for multiple instances for each of its elements. The number of instances given for each element is called the multiplicity of that e ...
instead of a set, and the projection to eliminate duplicate data is obtained by the addition of the DISTINCT keyword.


Selection (''σ'')

A generalized selection is a
unary operation In mathematics, an unary operation is an operation with only one operand, i.e. a single input. This is in contrast to binary operations, which use two operands. An example is any function , where is a set. The function is a unary operation o ...
written as \sigma_\varphi(R) where is a
propositional formula In propositional logic, a propositional formula is a type of syntactic formula which is well formed and has a truth value. If the values of all variables in a propositional formula are given, it determines a unique truth value. A propositional for ...
that consists of
atom Every atom is composed of a nucleus and one or more electrons bound to the nucleus. The nucleus is made of one or more protons and a number of neutrons. Only the most common variety of hydrogen has no neutrons. Every solid, liquid, gas, ...
s as allowed in the normal selection and the logical operators (
and or AND may refer to: Logic, grammar, and computing * Conjunction (grammar), connecting two words, phrases, or clauses * Logical conjunction in mathematical logic, notated as "∧", "⋅", "&", or simple juxtaposition * Bitwise AND, a boolea ...
), ( or) and ( negation). This selection selects all those
tuple In mathematics, a tuple is a finite ordered list (sequence) of elements. An -tuple is a sequence (or ordered list) of elements, where is a non-negative integer. There is only one 0-tuple, referred to as ''the empty tuple''. An -tuple is defi ...
s in ''R'' for which holds. To obtain a listing of all friends or business associates in an address book, the selection might be written as \sigma_( \text ). The result would be a relation containing every attribute of every unique record where is true or where is true.


Rename (''ρ'')

A rename is a
unary operation In mathematics, an unary operation is an operation with only one operand, i.e. a single input. This is in contrast to binary operations, which use two operands. An example is any function , where is a set. The function is a unary operation o ...
written as \rho_(R) where the result is identical to ''R'' except that the ''b'' attribute in all tuples is renamed to an ''a'' attribute. This is simply used to rename the attribute of a relation or the relation itself. To rename the "isFriend" attribute to "isBusinessContact" in a relation, \rho_ ( \text ) might be used. There is also the \rho_(R) notation, where ''R'' is renamed to ''x'' and the attributes \ are renamed to \.


Joins and join-like operators


()

Natural join (⋈) is a
binary operator In mathematics, a binary operation or dyadic operation is a rule for combining two elements (called operands) to produce another element. More formally, a binary operation is an operation of arity two. More specifically, an internal binary o ...
that is written as (''R'' ⋈ ''S'') where ''R'' and ''S'' are relations. The result of the natural join is the set of all combinations of tuples in ''R'' and ''S'' that are equal on their common attribute names. For an example consider the tables ''Employee'' and ''Dept'' and their natural join: Note that neither the employee named Mary nor the Production department appear in the result. This can also be used to define composition of relations. For example, the composition of ''Employee'' and ''Dept'' is their join as shown above, projected on all but the common attribute ''DeptName''. In category theory, the join is precisely the
fiber product In category theory, a branch of mathematics, a pullback (also called a fiber product, fibre product, fibered product or Cartesian square) is the limit of a diagram consisting of two morphisms and with a common codomain. The pullback is often w ...
. The natural join is arguably one of the most important operators since it is the relational counterpart of the logical AND operator. Note that if the same variable appears in each of two predicates that are connected by AND, then that variable stands for the same thing and both appearances must always be substituted by the same value (this is a consequence of the
idempotence Idempotence (, ) is the property of certain operations in mathematics and computer science whereby they can be applied multiple times without changing the result beyond the initial application. The concept of idempotence arises in a number of pl ...
of the logical AND). In particular, natural join allows the combination of relations that are associated by a foreign key. For example, in the above example a foreign key probably holds from ''Employee''.''DeptName'' to ''Dept''.''DeptName'' and then the natural join of ''Employee'' and ''Dept'' combines all employees with their departments. This works because the foreign key holds between attributes with the same name. If this is not the case such as in the foreign key from ''Dept''.''Manager'' to ''Employee''.''Name'' then these columns must be renamed before taking the natural join. Such a join is sometimes also referred to as an equijoin (see ''θ''-join). More formally the semantics of the natural join are defined as follows: where ''Fun(t)'' is a
predicate Predicate or predication may refer to: * Predicate (grammar), in linguistics * Predication (philosophy) * several closely related uses in mathematics and formal logic: **Predicate (mathematical logic) **Propositional function **Finitary relation, o ...
that is true for a relation ''t'' (in the mathematical sense)
iff In logic and related fields such as mathematics and philosophy, "if and only if" (shortened as "iff") is a biconditional logical connective between statements, where either both statements are true or both are false. The connective is bicon ...
''t'' is a function (that is, ''t'' does not map any attribute to multiple values). It is usually required that ''R'' and ''S'' must have at least one common attribute, but if this constraint is omitted, and ''R'' and ''S'' have no common attributes, then the natural join becomes exactly the Cartesian product. The natural join can be simulated with Codd's primitives as follows. Assume that ''c''1,...,''c''''m'' are the attribute names common to ''R'' and ''S'', ''r''1,...,''r''''n'' are the attribute names unique to ''R'' and ''s''1,...,''s''''k'' are the attribute names unique to ''S''. Furthermore, assume that the attribute names ''x''1,...,''x''''m'' are neither in ''R'' nor in ''S''. In a first step the common attribute names in ''S'' can be renamed: Then we take the Cartesian product and select the tuples that are to be joined: Finally we take a projection to get rid of the renamed attributes:


''θ''-join and equijoin

Consider tables ''Car'' and ''Boat'' which list models of cars and boats and their respective prices. Suppose a customer wants to buy a car and a boat, but she does not want to spend more money for the boat than for the car. The ''θ''-join (⋈''θ'') on the predicate ''CarPrice'' ≥ ''BoatPrice'' produces the flattened pairs of rows which satisfy the predicate. When using a condition where the attributes are equal, for example Price, then the condition may be specified as ''Price''=''Price'' or alternatively (''Price'') itself. In order to combine tuples from two relations where the combination condition is not simply the equality of shared attributes it is convenient to have a more general form of join operator, which is the ''θ''-join (or theta-join). The ''θ''-join is a binary operator that is written as or where ''a'' and ''b'' are attribute names, ''θ'' is a binary
relational operator In computer science, a relational operator is a programming language construct or operator that tests or defines some kind of relation between two entities. These include numerical equality (''e.g.'', ) and inequalities (''e.g.'', ). In pr ...
in the set , ''υ'' is a value constant, and ''R'' and ''S'' are relations. The result of this operation consists of all combinations of tuples in ''R'' and ''S'' that satisfy ''θ''. The result of the ''θ''-join is defined only if the headers of ''S'' and ''R'' are disjoint, that is, do not contain a common attribute. The simulation of this operation in the fundamental operations is therefore as follows: : ''R'' ⋈''θ'' ''S'' = ''σθ''(''R'' × ''S'') In case the operator ''θ'' is the equality operator (=) then this join is also called an equijoin. Note, however, that a computer language that supports the natural join and selection operators does not need ''θ''-join as well, as this can be achieved by selection from the result of a natural join (which degenerates to Cartesian product when there are no shared attributes). In SQL implementations, joining on a predicate is usually called an ''inner join'', and the ''on'' keyword allows one to specify the predicate used to filter the rows. It is important to note: forming the flattened Cartesian product then filtering the rows is conceptually correct, but an implementation would use more sophisticated data structures to speed up the join query.


(⋉ and ⋊)

The left semijoin is a joining similar to the natural join and written as R \ltimes S where R and S are relations. The result is the set of all tuples in R for which there is a tuple in S that is equal on their common attribute names. The difference from a natural join is that other columns of S do not appear. For example, consider the tables ''Employee'' and ''Dept'' and their semijoin: More formally the semantics of the semijoin can be defined as follows: R \ltimes S = \ where \operatorname(r) is as in the definition of natural join. The semijoin can be simulated using the natural join as follows. If a_1, \ldots, a_n are the attribute names of R, then R \ltimes S = \Pi_(R \bowtie S). Since we can simulate the natural join with the basic operators it follows that this also holds for the semijoin. In Codd's 1970 paper, semijoin is called restriction.


(▷)

The antijoin, written as ''R'' ▷ ''S'' where ''R'' and ''S'' are relations, is similar to the semijoin, but the result of an antijoin is only those tuples in ''R'' for which there is ''no'' tuple in ''S'' that is equal on their common attribute names. For an example consider the tables ''Employee'' and ''Dept'' and their antijoin: The antijoin is formally defined as follows: : ''R'' ▷ ''S'' = or : ''R'' ▷ ''S'' = where is as in the definition of natural join. The antijoin can also be defined as the
complement A complement is something that completes something else. Complement may refer specifically to: The arts * Complement (music), an interval that, when added to another, spans an octave ** Aggregate complementation, the separation of pitch-clas ...
of the semijoin, as follows: Given this, the antijoin is sometimes called the anti-semijoin, and the antijoin operator is sometimes written as semijoin symbol with a bar above it, instead of ▷.


(÷)

The division is a binary operation that is written as ''R'' ÷ ''S''. Division is not implemented directly in SQL. The result consists of the restrictions of tuples in ''R'' to the attribute names unique to ''R'', i.e., in the header of ''R'' but not in the header of ''S'', for which it holds that all their combinations with tuples in ''S'' are present in ''R''. For an example see the tables ''Completed'', ''DBProject'' and their division: If ''DBProject'' contains all the tasks of the Database project, then the result of the division above contains exactly the students who have completed both of the tasks in the Database project. More formally the semantics of the division is defined as follows:where is the set of attribute names unique to ''R'' and ''t'' 'a''1,...,''a''''n''is the restriction of ''t'' to this set. It is usually required that the attribute names in the header of ''S'' are a subset of those of ''R'' because otherwise the result of the operation will always be empty. The simulation of the division with the basic operations is as follows. We assume that ''a''1,...,''a''''n'' are the attribute names unique to ''R'' and ''b''1,...,''b''''m'' are the attribute names of ''S''. In the first step we project ''R'' on its unique attribute names and construct all combinations with tuples in ''S'': : ''T'' := π''a''1,...,''a''''n''(''R'') × ''S'' In the prior example, T would represent a table such that every Student (because Student is the unique key / attribute of the Completed table) is combined with every given Task. So Eugene, for instance, would have two rows, Eugene → Database1 and Eugene → Database2 in T. :: EG: First, let's pretend that "Completed" has a third attribute called "grade". It's unwanted baggage here, so we must project it off always. In fact in this step we can drop "Task" from R as well; the multiply puts it back on. :: ''T'' := πStudent(''R'') × ''S'' // This gives us every possible desired combination, including those that don't actually exist in R, and excluding others (eg Fred , compiler1, which is not a desired combination) In the next step we subtract ''R'' from ''T'' relation: : ''U'' := ''T'' − ''R'' In ''U'' we have the possible combinations that "could have" been in ''R'', but weren't. :: EG: Again with projections — ''T'' and ''R'' need to have identical attribute names/headers. :: ''U'' := ''T'' − πStudent,Task(''R'') // This gives us a "what's missing" list. So if we now take the projection on the attribute names unique to ''R'' then we have the restrictions of the tuples in ''R'' for which not all combinations with tuples in ''S'' were present in ''R'': : ''V'' := π''a''1,...,''a''''n''(''U'') :: EG: Project ''U'' down to just the attribute(s) in question (Student) :: ''V'' := πStudent(''U'') So what remains to be done is take the projection of ''R'' on its unique attribute names and subtract those in ''V'': : ''W'' := π''a''1,...,''a''''n''(''R'') − ''V'' :: EG: ''W'' := πStudent(''R'') − ''V''.


Common extensions

In practice the classical relational algebra described above is extended with various operations such as outer joins, aggregate functions and even transitive closure.


Outer joins

Whereas the result of a join (or inner join) consists of tuples formed by combining matching tuples in the two operands, an outer join contains those tuples and additionally some tuples formed by extending an unmatched tuple in one of the operands by "fill" values for each of the attributes of the other operand. Outer joins are not considered part of the classical relational algebra discussed so far. The operators defined in this section assume the existence of a ''null'' value, ''ω'', which we do not define, to be used for the fill values; in practice this corresponds to the
NULL Null may refer to: Science, technology, and mathematics Computing * Null (SQL) (or NULL), a special marker and keyword in SQL indicating that something has no value * Null character, the zero-valued ASCII character, also designated by , often use ...
in SQL. In order to make subsequent selection operations on the resulting table meaningful, a semantic meaning needs to be assigned to nulls; in Codd's approach the propositional logic used by the selection is extended to a three-valued logic, although we elide those details in this article. Three outer join operators are defined: left outer join, right outer join, and full outer join. (The word "outer" is sometimes omitted.)


(⟕)

The left outer join is written as ''R'' ⟕ ''S'' where ''R'' and ''S'' are relations. The result of the left outer join is the set of all combinations of tuples in ''R'' and ''S'' that are equal on their common attribute names, in addition (loosely speaking) to tuples in ''R'' that have no matching tuples in ''S''. For an example consider the tables ''Employee'' and ''Dept'' and their left outer join: In the resulting relation, tuples in ''S'' which have no common values in common attribute names with tuples in ''R'' take a ''null'' value, ''ω''. Since there are no tuples in ''Dept'' with a ''DeptName'' of ''Finance'' or ''Executive'', ''ω''s occur in the resulting relation where tuples in ''Employee'' have a ''DeptName'' of ''Finance'' or ''Executive''. Let ''r''1, ''r''2, ..., ''r''''n'' be the attributes of the relation ''R'' and let be the singleton relation on the attributes that are ''unique'' to the relation ''S'' (those that are not attributes of ''R''). Then the left outer join can be described in terms of the natural join (and hence using basic operators) as follows: :(R \bowtie S) \cup ((R - \pi_(R \bowtie S)) \times \)


(⟖)

The right outer join behaves almost identically to the left outer join, but the roles of the tables are switched. The right outer join of relations ''R'' and ''S'' is written as ''R'' ⟖ ''S''. The result of the right outer join is the set of all combinations of tuples in ''R'' and ''S'' that are equal on their common attribute names, in addition to tuples in ''S'' that have no matching tuples in ''R''. For example, consider the tables ''Employee'' and ''Dept'' and their right outer join: In the resulting relation, tuples in ''R'' which have no common values in common attribute names with tuples in ''S'' take a ''null'' value, ''ω''. Since there are no tuples in ''Employee'' with a ''DeptName'' of ''Production'', ''ω''s occur in the Name and EmpId attributes of the resulting relation where tuples in ''Dept'' had ''DeptName'' of ''Production''. Let ''s''1, ''s''2, ..., ''s''''n'' be the attributes of the relation ''S'' and let be the singleton relation on the attributes that are ''unique'' to the relation ''R'' (those that are not attributes of ''S''). Then, as with the left outer join, the right outer join can be simulated using the natural join as follows: :(R \bowtie S) \cup (\ \times (S - \pi_(R \bowtie S)))


(⟗)

The outer join or full outer join in effect combines the results of the left and right outer joins. The full outer join is written as ''R'' ⟗ ''S'' where ''R'' and ''S'' are relations. The result of the full outer join is the set of all combinations of tuples in ''R'' and ''S'' that are equal on their common attribute names, in addition to tuples in ''S'' that have no matching tuples in ''R'' and tuples in ''R'' that have no matching tuples in ''S'' in their common attribute names. For an example consider the tables ''Employee'' and ''Dept'' and their full outer join: In the resulting relation, tuples in ''R'' which have no common values in common attribute names with tuples in ''S'' take a ''null'' value, ''ω''. Tuples in ''S'' which have no common values in common attribute names with tuples in ''R'' also take a ''null'' value, ''ω''. The full outer join can be simulated using the left and right outer joins (and hence the natural join and set union) as follows: :''R'' ⟗ ''S'' = (''R'' ⟕ ''S'') ∪ (''R'' ⟖ ''S'')


Operations for domain computations

There is nothing in relational algebra introduced so far that would allow computations on the data domains (other than evaluation of propositional expressions involving equality). For example, it is not possible using only the algebra introduced so far to write an expression that would multiply the numbers from two columns, e.g. a unit price with a quantity to obtain a total price. Practical query languages have such facilities, e.g. the SQL SELECT allows arithmetic operations to define new columns in the result SELECT unit_price * quantity AS total_price FROM t, and a similar facility is provided more explicitly by Tutorial D's EXTEND keyword. In database theory, this is called extended projection.


Aggregation

Furthermore, computing various functions on a column, like the summing up of its elements, is also not possible using the relational algebra introduced so far. There are five
aggregate function In database management, an aggregate function or aggregation function is a function where the values of multiple rows are grouped together to form a single summary value. Common aggregate functions include: * Average (i.e., arithmetic mean) * ...
s that are included with most relational database systems. These operations are Sum, Count, Average, Maximum and Minimum. In relational algebra the aggregation operation over a schema (''A''1, ''A''2, ... ''A''''n'') is written as follows: :G_1, G_2, \ldots, G_m\ g_\ (r) where each ''A''''j''', 1 ≤ ''j'' ≤ ''k'', is one of the original attributes ''A''''i'', 1 ≤ ''i'' ≤ ''n''. The attributes preceding the ''g'' are grouping attributes, which function like a "group by" clause in SQL. Then there are an arbitrary number of aggregation functions applied to individual attributes. The operation is applied to an arbitrary relation ''r''. The grouping attributes are optional, and if they are not supplied, the aggregation functions are applied across the entire relation to which the operation is applied. Let's assume that we have a table named with three columns, namely and . We wish to find the maximum balance of each branch. This is accomplished by ''G''Max()(). To find the highest balance of all accounts regardless of branch, we could simply write ''G''Max()(). Grouping is often written as ''ɣ''Max()() instead.


Transitive closure

Although relational algebra seems powerful enough for most practical purposes, there are some simple and natural operators on relations that cannot be expressed by relational algebra. One of them is the
transitive closure In mathematics, the transitive closure of a binary relation on a set is the smallest relation on that contains and is transitive. For finite sets, "smallest" can be taken in its usual sense, of having the fewest related pairs; for infinite ...
of a binary relation. Given a domain ''D'', let binary relation ''R'' be a subset of ''D''×''D''. The transitive closure ''R+'' of ''R'' is the smallest subset of ''D''×''D'' that contains ''R'' and satisfies the following condition: :\forall x \forall y \forall z \left( (x,y) \in R^+ \wedge (y,z) \in R^+ \Rightarrow (x,z) \in R^+ \right) It can be proved using the fact that there is no relational algebra expression ''E''(''R'') taking ''R'' as a variable argument that produces ''R''+. SQL however officially supports such fixpoint queries since 1999, and it had vendor-specific extensions in this direction well before that.


Use of algebraic properties for query optimization

Queries can be represented as a
tree In botany, a tree is a perennial plant with an elongated stem, or trunk, usually supporting branches and leaves. In some usages, the definition of a tree may be narrower, including only woody plants with secondary growth, plants that are ...
, where * the internal nodes are operators, * leaves are relations, * subtrees are subexpressions. The primary goal is to transform expression trees into equivalent expression trees, where the average size of the relations yielded by subexpressions in the tree is smaller than it was before the
optimization Mathematical optimization (alternatively spelled ''optimisation'') or mathematical programming is the selection of a best element, with regard to some criterion, from some set of available alternatives. It is generally divided into two subfi ...
. The secondary goal is to try to form common subexpressions within a single query, or if there is more than one query being evaluated at the same time, in all of those queries. The rationale behind the second goal is that it is enough to compute common subexpressions once, and the results can be used in all queries that contain that subexpression. Here are a set of rules that can be used in such transformations.


Selection

Rules about selection operators play the most important role in query optimization. Selection is an operator that very effectively decreases the number of rows in its operand, so if the selections in an expression tree are moved towards the leaves, the internal relations (yielded by subexpressions) will likely shrink.


Basic selection properties

Selection is
idempotent Idempotence (, ) is the property of certain operations in mathematics and computer science whereby they can be applied multiple times without changing the result beyond the initial application. The concept of idempotence arises in a number of pl ...
(multiple applications of the same selection have no additional effect beyond the first one), and
commutative In mathematics, a binary operation is commutative if changing the order of the operands does not change the result. It is a fundamental property of many binary operations, and many mathematical proofs depend on it. Most familiar as the name of ...
(the order selections are applied in has no effect on the eventual result). #\sigma_(R)=\sigma_\sigma_(R)\,\! #\sigma_\sigma_(R)=\sigma_\sigma_(R)\,\!


Breaking up selections with complex conditions

A selection whose condition is a
conjunction Conjunction may refer to: * Conjunction (grammar), a part of speech * Logical conjunction, a mathematical operator ** Conjunction introduction, a rule of inference of propositional logic * Conjunction (astronomy), in which two astronomical bodies ...
of simpler conditions is equivalent to a sequence of selections with those same individual conditions, and selection whose condition is a
disjunction In logic, disjunction is a logical connective typically notated as \lor and read aloud as "or". For instance, the English language sentence "it is raining or it is snowing" can be represented in logic using the disjunctive formula R \lor S ...
is equivalent to a union of selections. These identities can be used to merge selections so that fewer selections need to be evaluated, or to split them so that the component selections may be moved or optimized separately. #\sigma_(R)=\sigma_(\sigma_(R))=\sigma_(\sigma_(R)) #\sigma_(R)=\sigma_(R)\cup\sigma_(R)


Selection and cross product

Cross product is the costliest operator to evaluate. If the input relations have ''N'' and ''M'' rows, the result will contain NM rows. Therefore, it is important to decrease the size of both operands before applying the cross product operator. This can be effectively done if the cross product is followed by a selection operator, e.g. \sigma_(R \times P). Considering the definition of join, this is the most likely case. If the cross product is not followed by a selection operator, we can try to push down a selection from higher levels of the expression tree using the other selection rules. In the above case the condition ''A'' is broken up in to conditions ''B'', ''C'' and ''D'' using the split rules about complex selection conditions, so that A = B \wedge C \wedge D and ''B'' contains attributes only from ''R'', ''C'' contains attributes only from ''P'', and ''D'' contains the part of ''A'' that contains attributes from both ''R'' and ''P''. Note, that ''B'', ''C'' or ''D'' are possibly empty. Then the following holds: :\sigma_(R \times P) = \sigma_(R \times P) = \sigma_(\sigma_(R) \times \sigma_(P))


Selection and set operators

Selection is distributive over the set difference, intersection, and union operators. The following three rules are used to push selection below set operations in the expression tree. For the set difference and the intersection operators, it is possible to apply the selection operator to just one of the operands following the transformation. This can be beneficial where one of the operands is small, and the overhead of evaluating the selection operator outweighs the benefits of using a smaller relation as an operand. #\sigma_(R\setminus P)=\sigma_(R)\setminus \sigma_(P) =\sigma_(R)\setminus P #\sigma_(R\cup P)=\sigma_(R)\cup\sigma_(P) #\sigma_(R\cap P)=\sigma_(R)\cap\sigma_(P)=\sigma_(R)\cap P=R\cap \sigma_(P)


Selection and projection

Selection commutes with projection if and only if the fields referenced in the selection condition are a subset of the fields in the projection. Performing selection before projection may be useful if the operand is a cross product or join. In other cases, if the selection condition is relatively expensive to compute, moving selection outside the projection may reduce the number of tuples which must be tested (since projection may produce fewer tuples due to the elimination of duplicates resulting from omitted fields). :\pi_(\sigma_A( R )) = \sigma_A(\pi_( R ))\textA \subseteq \


Projection


Basic projection properties

Projection is idempotent, so that a series of (valid) projections is equivalent to the outermost projection. :\pi_(\pi_(R)) = \pi_(R)\text\ \subseteq \


Projection and set operators

Projection is distributive over set union. :\pi_(R \cup P) = \pi_(R) \cup \pi_(P). \, Projection does not distribute over intersection and set difference. Counterexamples are given by: :\pi_A(\ \cap \) = \emptyset :\pi_A(\) \cap \pi_A(\) = \ and :\pi_A(\ \setminus \) = \ :\pi_A(\) \setminus \pi_A(\) = \emptyset\,, where ''b'' is assumed to be distinct from .


Rename


Basic rename properties

Successive renames of a variable can be collapsed into a single rename. Rename operations which have no variables in common can be arbitrarily reordered with respect to one another, which can be exploited to make successive renames adjacent so that they can be collapsed. #\rho_(\rho_(R)) = \rho_(R)\,\! #\rho_(\rho_(R)) = \rho_(\rho_(R))\,\!


Rename and set operators

Rename is distributive over set difference, union, and intersection. #\rho_(R \setminus P) = \rho_(R) \setminus \rho_(P) #\rho_(R \cup P) = \rho_(R) \cup \rho_(P) #\rho_(R \cap P) = \rho_(R) \cap \rho_(P)


Product and union

Cartesian product is distributive over union. #(A \times B) \cup (A \times C) = A \times (B \cup C)


Implementations

The first query language to be based on Codd's algebra was Alpha, developed by Dr. Codd himself. Subsequently, ISBL was created, and this pioneering work has been acclaimed by many authorities as having shown the way to make Codd's idea into a useful language. Business System 12 was a short-lived industry-strength relational DBMS that followed the ISBL example. In 1998
Chris Date Chris Date (born 1941) is an independent author, lecturer, researcher, and consultant, specializing in relational database theory. Biography Chris Date attended High Wycombe Royal Grammar School (U.K.) from 1951 to 1958 and received his BA i ...
and
Hugh Darwen Hugh Darwen is a computer scientist who was an employee of IBM United Kingdom from 1967. to 2004, and has been involved in the development of the relational model. Work From 1978 to 1982 he was a chief architect on Business System 12, a dat ...
proposed a language called Tutorial D intended for use in teaching relational database theory, and its query language also draws on ISBL's ideas.
Rel Rel or REL may mean: __NOTOC__ Science and technology * REL, a human gene * the rel descriptor of stereochemistry, see Relative configuration *REL (''Rassemblement Européen pour la Liberté''), European Rally for Liberty, a defunct French far-ri ...
is an implementation of Tutorial D. Even the query language of SQL is loosely based on a relational algebra, though the operands in SQL (
table Table may refer to: * Table (furniture), a piece of furniture with a flat surface and one or more legs * Table (landform), a flat area of land * Table (information), a data arrangement with rows and columns * Table (database), how the table data ...
s) are not exactly relations and several useful theorems about the relational algebra do not hold in the SQL counterpart (arguably to the detriment of optimisers and/or users). The SQL table model is a bag (
multiset In mathematics, a multiset (or bag, or mset) is a modification of the concept of a set that, unlike a set, allows for multiple instances for each of its elements. The number of instances given for each element is called the multiplicity of that e ...
), rather than a set. For example, the expression (R \cup S) \setminus T = (R \setminus T) \cup (S \setminus T) is a theorem for relational algebra on sets, but not for relational algebra on bags; for a treatment of relational algebra on bags see chapter 5 of the "Complete" textbook by Garcia-Molina, Ullman and Widom.


See also

* Cartesian product *
D4 (programming language) Dataphor is an open-source truly- relational database management system ( RDBMS) and its accompanying user interface technologies, which together are designed to provide highly declarative software application development. The Dataphor Server has ...
(an implementation of D) *
Database In computing, a database is an organized collection of data stored and accessed electronically. Small databases can be stored on a file system, while large databases are hosted on computer clusters or cloud storage. The design of databases s ...
* Logic of relatives *
Object-role modeling Object-role modeling (ORM) is used to model the semantics of a universe of discourse. ORM is often used for data modeling and software engineering. An object-role model uses graphical symbols that are based on first order predicate logic and se ...
*
Projection (mathematics) In mathematics, a projection is a mapping of a set (or other mathematical structure) into a subset (or sub-structure), which is equal to its square for mapping composition, i.e., which is idempotent. The restriction to a subspace of a projecti ...
*
Projection (relational algebra) In relational algebra, a projection is a unary operation written as \Pi_( R ), where R is a relation and a_1,...,a_n are attribute names. Its result is defined as the set obtained when the components of the tuples in R are restricted to the set \ ...
*
Projection (set theory) In set theory, a projection is one of two closely related types of functions or operations, namely: * A set-theoretic operation typified by the ''j''th projection map, written \mathrm_, that takes an element \vec = (x_1,\ \ldots,\ x_j,\ \ldots,\ x ...
* Relation *
Relation (database) In database theory, a relation, as originally defined by E. F. Codd, is a set of tuples (d1, d2, ..., dn), where each element dj is a member of Dj, a data domain. Codd's original definition notwithstanding, and contrary to the usua ...
* Relation algebra *
Relation composition In the mathematics of binary relations, the composition of relations is the forming of a new binary relation from two given binary relations ''R'' and ''S''. In the calculus of relations, the composition of relations is called relative multiplic ...
* Relation construction *
Relational calculus The relational calculus consists of two calculi, the tuple relational calculus and the domain relational calculus, that are part of the relational model for databases and provide a declarative way to specify database queries. The raison d'être ...
* Relational database * Relational model *
Theory of relations In mathematics, a finitary relation over sets is a subset of the Cartesian product ; that is, it is a set of ''n''-tuples consisting of elements ''x'i'' in ''X'i''. Typically, the relation describes a possible connection between the elem ...
*
Triadic relation In mathematics, a ternary relation or triadic relation is a finitary relation in which the number of places in the relation is three. Ternary relations may also be referred to as 3-adic, 3-ary, 3-dimensional, or 3-place. Just as a binary relat ...
*
Tuple relational calculus Tuple calculus is a calculus that was created and introduced by Edgar F. Codd as part of the relational model, in order to provide a declarative database-query language for data manipulation in this data model. It formed the inspiration for the d ...
* SQL *
Datalog Datalog is a declarative logic programming language. While it is syntactically a subset of Prolog, Datalog generally uses a bottom-up rather than top-down evaluation model. This difference yields significantly different behavior and properties ...
*
Codd's theorem Codd's theorem states that relational algebra and the domain-independent relational calculus queries, two well-known foundational query languages for the relational model, are precisely equivalent in expressive power. That is, a database query can ...


Notes


References


Further reading

Practically any academic textbook on databases has a detailed treatment of the classic relational algebra. * (For relationship with cylindric algebras).


External links


RAT. Software Relational Algebra Translator to SQL


- An introduction to how database systems process relational algebra

– A quick tutorial to adapt SQL queries into relational algebra

* ttp://www-db.stanford.edu/~widom/cs346/ioannidis.pdf Query Optimization(Page deleted; Closest alternatives
Standford Query Optimization 2Microsoft research Query Optimization in relational systemsStanford paper: Query Optimization
This paper is an introduction into the use of the relational algebra in optimizing queries, and includes numerous citations for more in-depth study.
Relational Algebra System for Oracle and Microsoft SQL Server


* ttp://des.sourceforge.net DES – An educational tool for working with Relational Algebra and other formal languages
RelaX - Relational Algebra Calculator
(open-source software available as an online service without registration)
RA: A Relational Algebra Interpreter

Translating SQL to Relational Algebra
{{Databases Relational model