The SQL SELECT statement returns a result set of records, from one or more tables.
A SELECT statement retrieves zero or more rows from one or more database tables or database views. In most applications, SELECT is the most commonly used data manipulation language (DML) command. As SQL is a declarative programming language, SELECT queries specify a result set, but do not specify how to calculate it. The database translates the query into a "query plan" which may vary between executions, database versions and database software. This functionality is called the "query optimizer" as it is responsible for finding the best possible execution plan for the query, within applicable constraints.
The SELECT statement has many optional clauses:
* SELECT clause is the list of columns or SQL expressions that must be returned by the query. This is approximately the relational algebra projection operation.
* AS optionally provides an alias for each column or expression in the SELECT clause. This is the relational algebra rename operation.
* FROM specifies from which table to get the data.
* WHERE specifies which rows to retrieve. This is approximately the relational algebra selection operation.
* GROUP BY groups rows sharing a property so that an aggregate function can be applied to each group.
* HAVING selects among the groups defined by the GROUP BY clause.
* ORDER BY specifies how to order the returned rows.
Overview
SELECT is the most common operation in SQL, called "the query". SELECT retrieves data from one or more tables, or expressions. Standard SELECT statements have no persistent effects on the database. Some non-standard implementations of SELECT can have persistent effects, such as the SELECT INTO syntax provided in some databases.
Queries allow the user to describe desired data, leaving the database management system (DBMS) to carry out planning, optimizing, and performing the physical operations necessary to produce that result as it chooses.
A query includes a list of columns to include in the final result, normally immediately following the SELECT keyword. An asterisk ("*") can be used to specify that the query should return all columns of the queried tables.
SELECT is the most complex statement in SQL, with optional keywords and clauses that include:
* The FROM clause, which indicates the table(s) to retrieve data from. The FROM clause can include optional JOIN subclauses to specify the rules for joining tables.
* The WHERE clause includes a comparison predicate, which restricts the rows returned by the query. The WHERE clause eliminates all rows from the result set where the comparison predicate does not evaluate to True.
* The GROUP BY clause projects rows having common values into a smaller set of rows. GROUP BY is often used in conjunction with SQL aggregation functions or to eliminate duplicate rows from a result set. The WHERE clause is applied before the GROUP BY clause.
* The HAVING clause includes a predicate used to filter rows resulting from the GROUP BY clause. Because it acts on the results of the GROUP BY clause, aggregation functions can be used in the HAVING clause predicate.
* The ORDER BY clause identifies which columns to use to sort the resulting data, and in which direction to sort them (ascending or descending). Without an ORDER BY clause, the order of rows returned by an SQL query is undefined.
* The DISTINCT keyword eliminates duplicate data.
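The way these clauses interact can be sketched with a small runnable example. It uses Python's standard `sqlite3` module and an in-memory database; the `publisher` column and the sample rows are assumptions made for illustration and are not part of the article's ''Book'' table.

```python
import sqlite3

# Hypothetical sample data: the publisher column and all rows are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Book (title TEXT, publisher TEXT, price REAL);
    INSERT INTO Book VALUES
        ('A', 'North', 120.0),
        ('B', 'North', 150.0),
        ('C', 'North',  20.0),
        ('D', 'South', 110.0),
        ('E', 'South',  30.0),
        ('F', 'West',   40.0);
""")

# WHERE filters individual rows first, GROUP BY then forms one row per
# publisher, HAVING filters those groups, and ORDER BY sorts what is left.
query = """
    SELECT publisher, COUNT(*) AS expensive_books
    FROM Book
    WHERE price > 100.00
    GROUP BY publisher
    HAVING COUNT(*) >= 2
    ORDER BY publisher
"""
result = conn.execute(query).fetchall()
print(result)
```

Only publishers with at least two books above 100.00 remain, because the HAVING predicate sees the groups that survive the WHERE filter.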
The following example of a SELECT query returns a list of expensive books. The query retrieves all rows from the ''Book'' table in which the ''price'' column contains a value greater than 100.00. The result is sorted in ascending order by ''title''. The asterisk (*) in the ''select list'' indicates that all columns of the ''Book'' table should be included in the result set.
SELECT *
FROM Book
WHERE price > 100.00
ORDER BY title;
The example below demonstrates a query of multiple tables, grouping, and aggregation, by returning a list of books and the number of authors associated with each book.
SELECT Book.title AS Title,
count(*) AS Authors
FROM Book
JOIN Book_author
ON Book.isbn = Book_author.isbn
GROUP BY Book.title;
Example output might resemble the following:
Title                  Authors
---------------------- -------
SQL Examples and Guide 4
The Joy of SQL         1
An Introduction to SQL 2
Pitfalls of SQL        1
Under the precondition that ''isbn'' is the only common column name of the two tables and that a column named ''title'' only exists in the ''Book'' table, one could re-write the query above in the following form:
SELECT title,
count(*) AS Authors
FROM Book
NATURAL JOIN Book_author
GROUP BY title;
However, many vendors either do not support this approach, or require certain column-naming conventions for natural joins to work effectively.
SQL includes operators and functions for calculating values on stored values. SQL allows the use of expressions in the ''select list'' to project data, as in the following example, which returns a list of books that cost more than 100.00 with an additional ''sales_tax'' column containing a sales tax figure calculated at 6% of the ''price''.
SELECT isbn,
title,
price,
price * 0.06 AS sales_tax
FROM Book
WHERE price > 100.00
ORDER BY title;
Subqueries
Queries can be nested so that the results of one query can be used in another query via a relational operator or aggregation function. A nested query is also known as a ''subquery''. While joins and other table operations provide computationally superior (i.e. faster) alternatives in many cases, the use of subqueries introduces a hierarchy in execution that can be useful or necessary. In the following example, the subquery computes the average price with the aggregation function AVG, and the outer query compares each book's price against that result:
SELECT isbn,
title,
price
FROM Book
WHERE price < (SELECT AVG(price) FROM Book)
ORDER BY title;
A subquery can use values from the outer query, in which case it is known as a correlated subquery.
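A correlated subquery can be illustrated with the following sketch, which runs against an in-memory SQLite database through Python's `sqlite3` module. The `publisher` column, the aliases `outer_book` and `inner_book`, and the sample rows are hypothetical and exist only for this example.

```python
import sqlite3

# Hypothetical sample data: the publisher column is not in the article's schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Book (isbn TEXT, title TEXT, publisher TEXT, price REAL);
    INSERT INTO Book VALUES
        ('1', 'Alpha', 'North', 10.0),
        ('2', 'Beta',  'North', 30.0),
        ('3', 'Gamma', 'South', 50.0),
        ('4', 'Delta', 'South', 70.0);
""")

# The inner query refers to outer_book.publisher, so it is logically
# re-evaluated for every row of the outer query.
query = """
    SELECT title
    FROM Book AS outer_book
    WHERE price > (SELECT AVG(price)
                   FROM Book AS inner_book
                   WHERE inner_book.publisher = outer_book.publisher)
    ORDER BY title
"""
titles = [row[0] for row in conn.execute(query)]
print(titles)
```

Each book is compared with the average price of its own publisher rather than with the overall average, which is what distinguishes this query from the uncorrelated example.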
Since 1999 the SQL standard allows named subqueries called common table expressions (named and designed after the IBM DB2 version 2 implementation; Oracle calls these subquery factoring). CTEs can also be recursive by referring to themselves; the resulting mechanism allows tree or graph traversals (when represented as relations), and more generally fixpoint computations.
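A tree traversal with a recursive CTE might look like the sketch below, again using SQLite through Python's `sqlite3` module. The `Category` table, its columns, and the CTE name `subtree` are assumptions introduced for illustration.

```python
import sqlite3

# Hypothetical hierarchy stored as a relation: each row points at its parent.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Category (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT);
    INSERT INTO Category VALUES
        (1, NULL, 'Books'),
        (2, 1,    'Databases'),
        (3, 2,    'SQL'),
        (4, 1,    'Cooking');
""")

# The CTE refers to itself: the first SELECT seeds it with the root row,
# and the second SELECT repeatedly adds the children of rows found so far.
query = """
    WITH RECURSIVE subtree(id, name, depth) AS (
        SELECT id, name, 0 FROM Category WHERE id = 1
        UNION ALL
        SELECT c.id, c.name, s.depth + 1
        FROM Category AS c
        JOIN subtree AS s ON c.parent_id = s.id
    )
    SELECT name, depth FROM subtree ORDER BY depth, name
"""
tree = conn.execute(query).fetchall()
print(tree)
```

The recursion stops when a step adds no new rows, which is the fixpoint mentioned above.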
Derived table
A derived table is a subquery that is referenced in a FROM clause. Essentially, the derived table is a subquery that can be selected from or joined to, so the user can reference the subquery as if it were a table. The derived table is also referred to as an ''inline view'' or a ''select in from list''.
In the following example, the SQL statement joins the ''Book'' table to the derived table ''sales''. This derived table captures associated book sales information and is joined to the ''Book'' table on the ISBN. As a result, the derived table provides the result set with additional columns (the number of items sold and the company that sold the books):
SELECT b.isbn, b.title, b.price, sales.items_sold, sales.company_nm
FROM Book b
JOIN (SELECT SUM(Items_Sold) Items_Sold, Company_Nm, ISBN
FROM Book_Sales
GROUP BY Company_Nm, ISBN) sales
ON sales.isbn = b.isbn
Examples
Given a table T, the query SELECT * FROM T will result in all the elements of all the rows of the table being shown.
With the same table, the query SELECT C1 FROM T will result in the elements from the column C1 of all the rows of the table being shown. This is similar to a ''projection'' in relational algebra, except that in the general case, the result may contain duplicate rows. This is also known as a Vertical Partition in some database terms, restricting query output to view only specified fields or columns.
With the same table, the query SELECT * FROM T WHERE C1 = 1 will result in all the elements of all the rows where the value of column C1 is '1' being shown. In relational algebra terms, a ''selection'' will be performed, because of the WHERE clause. This is also known as a Horizontal Partition, restricting rows output by a query according to specified conditions.
With more than one table, the result set will be every combination of rows. So if the two tables are T1 and T2, the query SELECT * FROM T1, T2 will result in every combination of a T1 row with a T2 row. E.g., if T1 has 3 rows and T2 has 5 rows, then 15 rows will result.
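The row count of such a cross product can be checked with a short sketch using Python's `sqlite3` module. The column names `a` and `b` and the inserted values are assumptions chosen only to give T1 three rows and T2 five rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (a INTEGER);
    CREATE TABLE T2 (b INTEGER);
    INSERT INTO T1 VALUES (1), (2), (3);
    INSERT INTO T2 VALUES (10), (20), (30), (40), (50);
""")

# Listing two tables without a join condition pairs every T1 row
# with every T2 row.
pairs = conn.execute("SELECT * FROM T1, T2").fetchall()
row_count = len(pairs)
print(row_count)
```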
Although it is not in the standard, most DBMSs allow a SELECT clause to be used without a table by pretending that an imaginary table with one row is used. This is mainly used to perform calculations where a table is not needed.
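As a minimal sketch, SQLite (used here through Python's `sqlite3` module) is one system that accepts a SELECT without any FROM clause. The alias `answer` is an illustrative name, and other systems may require a different form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# No table is referenced: the expression is evaluated once,
# as if against an imaginary one-row table.
value = conn.execute("SELECT 6 * 7 AS answer").fetchone()[0]
print(value)
```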
The SELECT clause specifies a list of properties (columns) by name, or the wildcard character (“*”) to mean “all properties”.
Limiting result rows
Often it is convenient to indicate a maximum number of rows that are returned. This can be used for testing or to prevent consuming excessive resources if the query returns more information than expected. The approach to do this often varies per vendor.
In ISO SQL:2003, result sets may be limited by using
* cursors, or
* by adding a SQL window function to the SELECT-statement
ISO SQL:2008 introduced the FETCH FIRST clause.
According to PostgreSQL v.9 documentation, an SQL window function "performs a calculation across a set of table rows that are somehow related to the current row", in a way similar to aggregate functions.
The name recalls signal processing window functions. A window function call always contains an OVER clause.
ROW_NUMBER() window function
ROW_NUMBER() OVER may be used to number the rows returned from a ''simple table'' and then filter on that number, e.g. to return no more than ten rows:
SELECT * FROM
( SELECT
ROW_NUMBER() OVER (ORDER BY sort_key ASC) AS row_number,
columns
FROM tablename
) AS foo
WHERE row_number <= 10
ROW_NUMBER can be non-deterministic: if ''sort_key'' is not unique, each time you run the query it is possible to get different row numbers assigned to any rows where ''sort_key'' is the same. When ''sort_key'' is unique, each row will always get a unique row number.
RANK() window function
The RANK() OVER window function acts like ROW_NUMBER, but may return more or fewer than ''n'' rows in case of tie conditions, e.g. to return the top-10 youngest persons:
SELECT * FROM (
SELECT
RANK() OVER (ORDER BY age ASC) AS ranking,
person_id,
person_name,
age
FROM person
) AS foo
WHERE ranking <= 10
The above code could return more than ten rows: for example, if two people tied at rank ten have the same age, it returns eleven rows.
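The difference between the two functions under ties can be sketched as follows. The example assumes an SQLite library with window function support (version 3.25 or later) behind Python's `sqlite3` module; the sample rows, the helper `youngest`, and the alias `pos` are hypothetical.

```python
import sqlite3

# Hypothetical sample data: Bob and Cy share the same age.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, person_name TEXT, age INTEGER);
    INSERT INTO person VALUES
        (1, 'Ann', 20), (2, 'Bob', 21), (3, 'Cy', 21), (4, 'Dee', 30);
""")

def youngest(function_name, limit):
    """Return the names whose ROW_NUMBER or RANK position is within the limit."""
    query = f"""
        SELECT person_name FROM (
            SELECT person_name,
                   {function_name}() OVER (ORDER BY age ASC) AS pos
            FROM person
        ) AS numbered
        WHERE pos <= ?
        ORDER BY person_name
    """
    return [row[0] for row in conn.execute(query, (limit,))]

by_row_number = youngest("ROW_NUMBER", 2)  # exactly two rows; which tied row is kept may vary
by_rank = youngest("RANK", 2)              # both tied rows share rank 2, so three rows
print(by_row_number, by_rank)
```

With a limit of two, ROW_NUMBER always yields two rows but chooses arbitrarily between the tied ones, while RANK keeps both tied rows and so exceeds the limit.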
FETCH FIRST clause
Since ISO SQL:2008, result limits can be specified as in the following example using the FETCH FIRST clause.
SELECT * FROM T
FETCH FIRST 10 ROWS ONLY
This clause is currently supported by CA DATACOM/DB 11, IBM DB2, SAP SQL Anywhere, PostgreSQL, EffiProz, H2, HSQLDB version 2.0, Oracle 12c and Mimer SQL.
Microsoft SQL Server 2012 and higher supports FETCH FIRST, but it is considered part of the ORDER BY clause. The ORDER BY, OFFSET, and FETCH FIRST clauses are all required for this usage.
SELECT * FROM T
ORDER BY acolumn DESC OFFSET 0 ROWS FETCH FIRST 10 ROWS ONLY
Non-standard syntax
Some DBMSs offer non-standard syntax either instead of or in addition to SQL standard syntax. Below, variants of the ''simple limit'' query for different DBMSes are listed:
Rows Pagination
Rows pagination is an approach used to limit and display only a part of the total data of a query in the database. Instead of showing hundreds or thousands of rows at the same time, the client requests only one page from the server (a limited set of rows, for example only 10 rows), and the user navigates by requesting the next page, and then the next one, and so on. It is very useful, especially in web systems, where there is no dedicated connection between the client and the server, so the client does not have to wait to read and display all the rows of the server.
Data in Pagination approach
* ''rows'' = Number of rows in a page
* ''page_number'' = Number of the current page
* Starting offset of the page = Number of the row - 1 where the page starts = (page_number-1) * rows
Simplest method (but very inefficient)
# Select all rows from the database, ordered by a unique sort key
# Read all rows but send to display only when the row_number of the rows read is between (page_number-1) * rows + 1 and page_number * rows
Other simple method (a little more efficient than reading all rows)
# Select all the rows from the beginning of the table to the last row to display (row page_number * rows)
# Read those page_number * rows rows but send to display only when the row_number of the rows read is greater than (page_number-1) * rows
Method with positioning
# Select only ''rows'' rows starting from the next row to display (row (page_number-1) * rows + 1)
# Read and send to display all the rows read from the database
Method with filter (it is more sophisticated but necessary for very big datasets)
# Select only ''rows'' rows with a filter:
## First Page: select only the first ''rows'' rows, depending on the type of database
## Next Page: select only the first ''rows'' rows, depending on the type of database, where the unique sort key is greater than the value of that key in the last row of the current page
## Previous Page: sort the data in the reverse order, select only the first ''rows'' rows, where the unique sort key is less than the value of that key in the first row of the current page, and sort the result in the correct order
# Read and send to display all the rows read from the database
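The method with positioning and the method with filter can be compared in a short sketch. It uses SQLite's non-standard LIMIT and OFFSET syntax through Python's `sqlite3` module; the `item` table, its columns, and the variable names are assumptions made for this example.

```python
import sqlite3

# Hypothetical table with 25 rows and a unique, ordered key.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, label TEXT)")
conn.executemany("INSERT INTO item VALUES (?, ?)",
                 [(i, f"item-{i}") for i in range(1, 26)])

rows = 10          # number of rows in a page
page_number = 2    # number of the current page (1-based)

# Method with positioning: skip (page_number - 1) * rows rows, then read one page.
offset_page = conn.execute(
    "SELECT id FROM item ORDER BY id LIMIT ? OFFSET ?",
    (rows, (page_number - 1) * rows),
).fetchall()

# Method with filter: remember the key of the last row already shown
# and ask only for rows whose key is greater.
first_page = conn.execute(
    "SELECT id FROM item ORDER BY id LIMIT ?", (rows,)
).fetchall()
last_key = first_page[-1][0]
keyset_page = conn.execute(
    "SELECT id FROM item WHERE id > ? ORDER BY id LIMIT ?",
    (last_key, rows),
).fetchall()
print(offset_page == keyset_page)
```

Both queries return the same second page here. The filter variant lets the database seek directly to the key instead of counting past skipped rows, which is why it tends to suit very large datasets.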
Hierarchical query
Some databases provide specialised syntax for hierarchical data.
A window function in SQL:2003 is an aggregate function applied to a partition of the result set.
For example, SUM(population) OVER (PARTITION BY city) calculates the sum of the populations of all rows having the same ''city'' value as the current row.
Partitions are specified using the OVER clause, which modifies the aggregate. The OVER clause can partition (PARTITION BY) and order (ORDER BY) the result set. Ordering is used for order-relative functions such as row_number.
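A partitioned aggregate of this kind can be sketched as follows, assuming an SQLite library with window function support (version 3.25 or later) behind Python's `sqlite3` module. The `district` table, its columns, and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical sample data: two districts in one city, one in another.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE district (city TEXT, district_name TEXT, population INTEGER);
    INSERT INTO district VALUES
        ('Springfield', 'North', 100),
        ('Springfield', 'South', 250),
        ('Shelbyville', 'Center', 80);
""")

# Unlike GROUP BY, the window keeps every input row and attaches
# the per-city total to each of them.
query = """
    SELECT city, district_name, population,
           SUM(population) OVER (PARTITION BY city) AS city_population
    FROM district
    ORDER BY city, district_name
"""
totals = conn.execute(query).fetchall()
print(totals)
```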
Query evaluation ANSI
The processing of a SELECT statement according to ANSI SQL would be the following:
[Inside Microsoft SQL Server 2005: T-SQL Querying by Itzik Ben-Gan, Lubor Kollar, and Dejan Sarka]
Window function support by RDBMS vendors
The implementation of window function features by vendors of relational databases and SQL engines varies widely. Most databases support at least some flavour of window functions. However, a closer look shows that most vendors implement only a subset of the standard. Take the powerful RANGE clause as an example: only Oracle, DB2, Spark/Hive, and Google BigQuery fully implement this feature. More recently, vendors have added new extensions to the standard, e.g. array aggregation functions. These are particularly useful in the context of running SQL against a distributed file system (Hadoop, Spark, Google BigQuery) where there are weaker data co-locality guarantees than on a distributed relational database (MPP). Rather than evenly distributing the data across all nodes, SQL engines running queries against a distributed filesystem can achieve data co-locality guarantees by nesting data and thus avoiding potentially expensive joins involving heavy shuffling across the network. User-defined aggregate functions that can be used in window functions are another extremely powerful feature.
Generating data in T-SQL
Method to generate data based on the union all
select 1 a, 1 b union all
select 1, 2 union all
select 1, 3 union all
select 2, 1 union all
select 5, 1
SQL Server 2008 supports the "row constructor" specified in the SQL3 ("SQL:1999") standard
select *
from (values (1, 1), (1, 2), (1, 3), (2, 1), (5, 1)) as x(a, b)
Sources
* Horizontal & Vertical Partitioning, Microsoft SQL Server 2000 Books Online.
External links
Windowed Tables and Window function in SQL Stefan Deßloch