In database management, an aggregate function or aggregation function is a function where multiple values are processed together to form a single summary statistic.

Common aggregate functions include:
* Average (i.e., arithmetic mean)
* Count
* Maximum
* Median
* Minimum
* Mode
* Range
* Sum
Others include:
* Nanmean (mean ignoring NaN values, also known as "nil" or "null")
* Stddev
Formally, an aggregate function takes as input a set, a multiset (bag), or a list from some input domain and outputs an element of an output domain. The input and output domains may be the same, such as for SUM, or may be different, such as for COUNT.
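
As a rough illustration of input and output domains, the following Python sketch uses a list as the input collection. The names `agg_sum` and `agg_count` are hypothetical and exist only for this example.

```python
from typing import Iterable


def agg_sum(values: Iterable[float]) -> float:
    """SUM: numbers in, a number out (input and output domains coincide)."""
    total = 0.0
    for v in values:
        total += v
    return total


def agg_count(values: Iterable[object]) -> int:
    """COUNT: values of any type in, a non-negative integer out (domains differ)."""
    return sum(1 for _ in values)


prices = [2.5, 4.0, 3.5]
names = ["ann", "bob", "ann"]  # a multiset (bag): duplicates are kept

total_price = agg_sum(prices)   # a float, like the inputs
name_count = agg_count(names)   # an int, although the inputs are strings
```
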
Aggregate functions occur commonly in numerous programming languages, in spreadsheets, and in relational algebra.
The listagg function, as defined in the SQL:2016 standard, aggregates data from multiple rows into a single concatenated string.
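
The behaviour of such a string aggregation can be approximated in plain Python. The sketch below is not SQL and not any database's API: the helper `listagg`, its parameters, and the sample rows are assumptions made for illustration, and sorting the values within each group is only one possible ordering choice.

```python
from collections import defaultdict


def listagg(rows, key, value, sep=", "):
    """Concatenate `value` over all rows that share the same `key`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(str(row[value]))
    # Sort within each group so the concatenated string is deterministic.
    return {k: sep.join(sorted(vs)) for k, vs in groups.items()}


rows = [
    {"dept": "ops", "name": "Kim"},
    {"dept": "dev", "name": "Lee"},
    {"dept": "ops", "name": "Ada"},
]
result = listagg(rows, key="dept", value="name")
# result maps each department to one concatenated string of names
```
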
In the entity relationship diagram, aggregation is represented as seen in Figure 1 with a rectangle around the relationship and its entities to indicate that it is being treated as an aggregate entity.
Decomposable aggregate functions
Aggregate functions present a bottleneck, because they potentially require having all input values at once. In distributed computing, it is desirable to divide such computations into smaller pieces, and distribute the work, usually computing in parallel, via a divide and conquer algorithm.
Some aggregate functions can be computed by computing the aggregate for subsets, and then aggregating these aggregates; examples include COUNT, MAX, MIN, and SUM. In other cases the aggregate can be computed by computing auxiliary numbers for subsets, aggregating these auxiliary numbers, and finally computing the overall number at the end; examples include AVERAGE (tracking sum and count, dividing at the end) and RANGE (tracking max and min, subtracting at the end). In other cases the aggregate cannot be computed without analyzing the entire set at once, though in some cases approximations can be distributed; examples include DISTINCT COUNT (see Count-distinct problem), MEDIAN, and MODE.
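
The contrast between the first and the last kind can be checked on a small example. The sketch below uses only Python's standard library; the two partitions are made-up data chosen so that the difference is visible.

```python
from statistics import median

part_a = [1, 2, 9]
part_b = [3, 4, 5, 6, 7]
whole = part_a + part_b

# MAX decomposes: the maximum of the partial maxima is the global maximum.
max_of_parts = max(max(part_a), max(part_b))
true_max = max(whole)

# MEDIAN does not: combining the partial medians gives a different value
# from the median of the whole data set.
median_of_parts = median([median(part_a), median(part_b)])
true_median = median(whole)
```

Here the partial medians alone do not carry enough information to recover the median of the combined data, whereas the partial maxima do for MAX.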
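The aggregate functions that can be computed from subset results, either directly or via auxiliary values, are called decomposable aggregation functions or decomposable aggregate functions. The simplest may be referred to as self-decomposable aggregation functions, which are defined as those functions $f$ for which there is a ''merge operator'' $\diamond$ such that
: $f(X \uplus Y) = f(X) \diamond f(Y)$
where $\uplus$ is the union of multisets (see monoid homomorphism).
For example,
SUM:
: $\operatorname{SUM}(\{x\}) = x$, for a singleton;
: $\operatorname{SUM}(X \uplus Y) = \operatorname{SUM}(X) + \operatorname{SUM}(Y)$, meaning that merge is simply addition.
COUNT:
: $\operatorname{COUNT}(\{x\}) = 1$,
: $\operatorname{COUNT}(X \uplus Y) = \operatorname{COUNT}(X) + \operatorname{COUNT}(Y)$.
MAX:
: $\operatorname{MAX}(\{x\}) = x$,
: $\operatorname{MAX}(X \uplus Y) = \max\bigl(\operatorname{MAX}(X), \operatorname{MAX}(Y)\bigr)$.
MIN:
: $\operatorname{MIN}(\{x\}) = x$,
: $\operatorname{MIN}(X \uplus Y) = \min\bigl(\operatorname{MIN}(X), \operatorname{MIN}(Y)\bigr)$.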
Note that self-decomposable aggregation functions can be combined (formally, taking the product) by applying them separately, so for instance one can compute both the SUM and COUNT at the same time, by tracking two numbers.
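
A minimal sketch of this product construction, assuming the pair `(SUM, COUNT)` as the tracked state; the names `initial`, `merge`, and `aggregate` are hypothetical, and `(0, 0)` is taken as the aggregate of the empty collection.

```python
from functools import reduce


def initial(x):
    """Aggregate of the singleton {x}: the pair (SUM, COUNT)."""
    return (x, 1)


def merge(a, b):
    """Merge operator for the product of SUM and COUNT: add componentwise."""
    return (a[0] + b[0], a[1] + b[1])


def aggregate(values):
    """Fold the merge operator over the singleton aggregates."""
    return reduce(merge, (initial(v) for v in values), (0, 0))


x = [3, 1, 4]
y = [1, 5]

# Aggregating the combined data equals merging the two partial aggregates.
assert aggregate(x + y) == merge(aggregate(x), aggregate(y))
```

Because `merge` is associative, the partial aggregates can be combined in any grouping, which is what allows the work to be split across subsets.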
More generally, one can define a decomposable aggregation function $f$ as one that can be expressed as the composition of a final function $g$ and a self-decomposable aggregation function $h$:
: $f = g \circ h$, that is, $f(X) = g(h(X))$.
For example, AVERAGE = SUM / COUNT and RANGE = MAX − MIN.
In the MapReduce framework, these steps are known as InitialReduce (value on individual record/singleton set), Combine (binary merge on two aggregations), and FinalReduce (final function on auxiliary values), and moving decomposable aggregation before the Shuffle phase is known as an InitialReduce step.
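
The three steps can be sketched for RANGE in plain Python. This is not the API of any MapReduce implementation: the function names simply mirror the step names above, and the list `partitions` stands in for data held by separate workers.

```python
from functools import reduce


def initial_reduce(x):
    """Value on a single record: (max, min) of the singleton {x}."""
    return (x, x)


def combine(a, b):
    """Binary merge of two partial aggregations."""
    return (max(a[0], b[0]), min(a[1], b[1]))


def final_reduce(state):
    """Final function on the auxiliary values: max minus min."""
    return state[0] - state[1]


# Hypothetical partitions, one per worker.
partitions = [[7, 2, 9], [4], [11, 3]]

# Each worker reduces its own partition before any data is exchanged.
partials = [reduce(combine, map(initial_reduce, p)) for p in partitions]

# Only the small partial states are merged centrally.
value_range = final_reduce(reduce(combine, partials))
```
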
Decomposable aggregation functions are important in online analytical processing (OLAP), as they allow aggregation queries to be computed on the pre-computed results in the OLAP cube, rather than on the base data. For example, it is easy to support COUNT, MAX, MIN, and SUM in OLAP, since these can be computed for each cell of the OLAP cube and then summarized ("rolled up"), but it is difficult to support MEDIAN, as that must be computed for every view separately.
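
A roll-up over pre-computed cells might look like the following sketch. The cube layout, the dimension names, and the figures are invented for illustration; each cell is assumed to store `(SUM, COUNT, MAX)` for its slice of the base data.

```python
# Hypothetical pre-computed cells: (region, quarter) -> (SUM, COUNT, MAX)
cells = {
    ("north", "Q1"): (120, 3, 60),
    ("north", "Q2"): (80, 2, 50),
    ("south", "Q1"): (200, 4, 90),
}


def roll_up(cells):
    """Summarize over the quarter dimension, keeping only the region."""
    out = {}
    for (region, _quarter), (s, c, m) in cells.items():
        if region in out:
            prev_s, prev_c, prev_m = out[region]
            # SUM and COUNT merge by addition, MAX by taking the maximum.
            out[region] = (prev_s + s, prev_c + c, max(prev_m, m))
        else:
            out[region] = (s, c, m)
    return out


by_region = roll_up(cells)
```

The base records are never consulted: the per-region results come entirely from the stored cells, which would not be possible if the cells held only medians.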
Other decomposable aggregate functions
In order to calculate the average and standard deviation from aggregate data, it is necessary to have available for each group: the total of the values ($\sum x_i = \operatorname{SUM}(x)$), the number of values ($N = \operatorname{COUNT}(x)$), and the total of the squares of the values ($\sum x_i^2 = \operatorname{SUM}(x^2)$).
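AVG:
: For two groups $X$ and $Y$ combined into the multiset $X \uplus Y$,
: $\operatorname{AVG}(X \uplus Y) = \dfrac{\operatorname{SUM}(X) + \operatorname{SUM}(Y)}{\operatorname{COUNT}(X) + \operatorname{COUNT}(Y)}$
or
: $\operatorname{AVG}(X \uplus Y) = \dfrac{\operatorname{AVG}(X)\,\operatorname{COUNT}(X) + \operatorname{AVG}(Y)\,\operatorname{COUNT}(Y)}{\operatorname{COUNT}(X) + \operatorname{COUNT}(Y)}$
or, only if COUNT(X) = COUNT(Y),
: $\operatorname{AVG}(X \uplus Y) = \dfrac{\operatorname{AVG}(X) + \operatorname{AVG}(Y)}{2}$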
SUM($x^2$):
: The sum of the squares of the values is needed in order to calculate the standard deviation of groups.
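STDDEV:
: For a finite population with equal probabilities at all points, we have (see Standard deviation § Identities and mathematical properties)
: $\sigma = \sqrt{\dfrac{1}{N}\sum_{i=1}^{N} x_i^2 - \left(\dfrac{1}{N}\sum_{i=1}^{N} x_i\right)^2} = \sqrt{\dfrac{\operatorname{SUM}(x^2)}{\operatorname{COUNT}(x)} - \left(\dfrac{\operatorname{SUM}(x)}{\operatorname{COUNT}(x)}\right)^2}$
This means that the standard deviation is equal to the square root of the difference between the average of the squares of the values and the square of the average value.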
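
The identity can be exercised with a short Python sketch that keeps only the three per-group totals. The helper names (`summarize`, `merge`, `avg`, `stddev_pop`) and the sample groups are hypothetical; the result is the population standard deviation, matching the finite-population formula above.

```python
import math


def summarize(values):
    """Per-group aggregates: (COUNT(x), SUM(x), SUM(x^2))."""
    return (len(values), sum(values), sum(v * v for v in values))


def merge(a, b):
    """Combine two group summaries by adding them componentwise."""
    return tuple(p + q for p, q in zip(a, b))


def avg(summary):
    n, s, _ = summary
    return s / n


def stddev_pop(summary):
    """Square root of (mean of squares minus square of mean)."""
    n, s, sq = summary
    return math.sqrt(sq / n - (s / n) ** 2)


group_x = [2, 4, 4, 4]
group_y = [5, 5, 7, 9]

# The combined statistics are derived from the summaries alone.
combined = merge(summarize(group_x), summarize(group_y))
overall_avg = avg(combined)
overall_std = stddev_pop(combined)
```

In floating-point arithmetic this formula can lose precision when the mean is large relative to the spread, so the sketch is best read as a statement of the identity rather than as a numerically robust routine.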
See also
* Cross-tabulation a.k.a. Contingency table
* Data drilling
* Data mining
* Data processing
* Extract, transform, load
* Fold (higher-order function)
* Group by (SQL), SQL clause
* OLAP cube
* Online analytical processing
* Pivot table
* Relational algebra
* Utility functions on indivisible goods § Aggregates of utility functions
* XML for Analysis
* AggregateIQ
* MapReduce
References
Literature
* Oracle Aggregate Functions: MAX, MIN, COUNT, SUM, AVG Examples
External links
Aggregate Functions (Transact-SQL)