In computer science, a set is an abstract data type that can store unique values, without any particular order. It is a computer implementation of the mathematical concept of a finite set. Unlike most other collection types, rather than retrieving a specific element from a set, one typically tests a value for membership in a set.
Some set data structures are designed for static or frozen sets that do not change after they are constructed. Static sets allow only query operations on their elements, such as checking whether a given value is in the set, or enumerating the values in some arbitrary order. Other variants, called dynamic or mutable sets, also allow the insertion and deletion of elements from the set.
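The distinction between static and dynamic sets can be illustrated with Python's built-in types. This is only a sketch: it assumes that `frozenset` stands in for a static set and `set` for a dynamic one, and the variable names are illustrative.

```python
# frozenset plays the role of a static set: only query operations are available.
static_set = frozenset({3, 1, 2})
has_two = 2 in static_set            # membership query
listed = sorted(static_set)          # enumeration (sorted only to get a stable order)

# set plays the role of a dynamic set: it also supports insertion and deletion.
dynamic_set = {3, 1, 2}
dynamic_set.add(4)                   # insertion
dynamic_set.discard(1)               # deletion (no error if the element is absent)

# A frozenset has no add method, so attempting to mutate it fails.
try:
    static_set.add(4)
    mutation_allowed = True
except AttributeError:
    mutation_allowed = False
```

Because a `frozenset` never changes, it is hashable and can itself be stored in another set, which a mutable `set` cannot.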
A multiset is a generalization of a set in which an element can appear multiple times.
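As a rough illustration of the difference, the sketch below uses Python's `collections.Counter` as a stand-in for a multiset. That choice is an assumption made for the example; other representations are possible.

```python
from collections import Counter

values = ["a", "b", "a", "c", "a"]

bag = Counter(values)        # multiset: each element keeps its multiplicity
plain = set(values)          # set: repeated values collapse into one

multiplicity_of_a = bag["a"]           # how many times "a" occurs
multiset_size = sum(bag.values())      # size counting repetitions
set_size = len(plain)                  # size counting distinct values only
```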
Type theory
In type theory, sets are generally identified with their indicator function (characteristic function): accordingly, a set of values of type $A$ may be denoted by $2^{A}$ or $\mathcal{P}(A)$. (Subtypes and subsets may be modeled by refinement types, and quotient sets may be replaced by setoids.) The characteristic function $F$ of a set $S$ is defined as:
:$F(x) = \begin{cases} 1, & \text{if } x \in S \\ 0, & \text{if } x \notin S \end{cases}$
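A characteristic function maps members of a set to 1 and all other values to 0. The following sketch shows one way to treat such functions as sets in Python. The helper names `characteristic`, `is_even`, and `intersection` are hypothetical and introduced only for this example.

```python
from typing import Callable, Iterable, TypeVar

A = TypeVar("A")

def characteristic(elements: Iterable[A]) -> Callable[[A], int]:
    """Build the characteristic function of the finite set of `elements`."""
    members = frozenset(elements)

    def indicator(x: A) -> int:
        return 1 if x in members else 0

    return indicator

def is_even(x: int) -> int:
    """A set given only by its characteristic function (the even integers)."""
    return 1 if x % 2 == 0 else 0

def intersection(f: Callable[[A], int], g: Callable[[A], int]) -> Callable[[A], int]:
    """Intersection of two sets represented as characteristic functions."""
    return lambda x: f(x) * g(x)   # the product is 1 only when both are 1

is_small_prime = characteristic([2, 3, 5, 7])
even_small_prime = intersection(is_small_prime, is_even)
```

Under this representation membership testing is just function application, while operations such as size or enumeration are not available in general.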
In theory, many other abstract data structures can be viewed as set structures with additional operations and/or additional axioms imposed on the standard operations. For example, an abstract heap can be viewed as a set structure with a min(''S'') operation that returns the element of smallest value.
Operations
Core set-theoretical operations
One may define the operations of the algebra of sets:
* union(''S'',''T''): returns the union of sets ''S'' and ''T''.
* intersection(''S'',''T''): returns the intersection of sets ''S'' and ''T''.
* difference(''S'',''T''): returns the difference of sets ''S'' and ''T''.
* subset(''S'',''T''): a predicate that tests whether the set ''S'' is a subset of set ''T''.
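For illustration, these four operations correspond to operators on Python's built-in `set` type. The example values are arbitrary.

```python
S = {1, 2, 3}
T = {3, 4}

union_st = S | T             # union(S, T): elements in S or in T
intersection_st = S & T      # intersection(S, T): elements in both
difference_st = S - T        # difference(S, T): elements of S that are not in T

s_subset_of_union = S <= union_st   # subset(S, S ∪ T) always holds
s_subset_of_t = S <= T              # subset(S, T) fails here: 1 and 2 are not in T
```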
Static sets
Typical operations that may be provided by a static set structure ''S'' are:
* is_element_of(''x'',''S''): checks whether the value ''x'' is in the set ''S''.
* is_empty(''S''): checks whether the set ''S'' is empty.
* size(''S'') or cardinality(''S''): returns the number of elements in ''S''.
* iterate(''S''): returns a function that returns one more value of ''S'' at each call, in some arbitrary order.
* enumerate(''S''): returns a list containing the elements of ''S'' in some arbitrary order.
* build(''x''_1,''x''_2,…,''x''_''n''): creates a set structure with values ''x''_1, ''x''_2, …, ''x''_''n''.
* create_from(''collection''): creates a new set structure containing all the elements of the given collection or all the elements returned by the given iterator.
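The static operations listed above map fairly directly onto Python's `frozenset`. The sketch below is one possible correspondence and not a definitive one. In particular, Python's iterator protocol is used here in place of "a function that returns one more value at each call".

```python
S = frozenset([10, 20, 30])                      # build / create_from a collection

contains_20 = 20 in S                            # is_element_of(20, S)
is_empty = len(S) == 0                           # is_empty(S)
size = len(S)                                    # size(S)

it = iter(S)                                     # iterate(S)
first = next(it)                                 # one value per call, arbitrary order

as_list = list(S)                                # enumerate(S), arbitrary order

from_iterator = frozenset(x * x for x in range(4))   # create_from an iterator
```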
Dynamic sets
Dynamic set structures typically add:
* create(): creates a new, initially empty set structure.
** create_with_capacity(''n''): creates a new set structure, initially empty but capable of holding up to ''n'' elements.
* add(''S'',''x''): adds the element ''x'' to ''S'', if it is not present already.
* remove(''S'', ''x''): removes the element ''x'' from ''S'', if it is present.
* capacity(''S''): returns the maximum number of values that ''S'' can hold.
Some set structures may allow only some of these operations. The cost of each operation will depend on the implementation, and possibly also on the particular values stored in the set, and the order in which they are inserted.
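Python's built-in `set` has no notion of capacity, so the sketch below wraps it in a hypothetical `BoundedSet` class to show how the dynamic operations might fit together. The class name, its methods, and the choice to raise `OverflowError` when full are all assumptions made for illustration.

```python
class BoundedSet:
    """Hypothetical dynamic set that holds at most `capacity` elements."""

    def __init__(self, capacity: int) -> None:      # create_with_capacity(n)
        self._capacity = capacity
        self._items: set = set()

    def add(self, x) -> None:
        """Add x if it is not already present; fail if the set is full."""
        if x in self._items:
            return
        if len(self._items) >= self._capacity:
            raise OverflowError("set is full")
        self._items.add(x)

    def remove(self, x) -> None:
        """Remove x if it is present; do nothing otherwise."""
        self._items.discard(x)

    def capacity(self) -> int:
        return self._capacity

    def __len__(self) -> int:
        return len(self._items)

    def __contains__(self, x) -> bool:
        return x in self._items


s = BoundedSet(2)
s.add("a")
s.add("a")          # already present, so nothing changes
s.add("b")
s.remove("zzz")     # absent, so nothing changes
try:
    s.add("c")      # a third distinct element exceeds the capacity
    overflowed = False
except OverflowError:
    overflowed = True
```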
Additional operations
There are many other operations that can (in principle) be defined in terms of the above, such as:
* pop(''S''): returns an arbitrary element of ''S'', deleting it from ''S''.
* pick(''S''): returns an arbitrary element of ''S''. Functionally, the mutator pop can be interpreted as the pair of selectors (pick, rest), where rest returns the set consisting of all elements except for the arbitrary element. pick can be interpreted in terms of iterate.
* map(''F'',''S''): returns the set of distinct values resulting from applying function ''F'' to each element of ''S''.
* filter(''P'',''S''): returns the subset containing all elements of ''S'' that satisfy a given predicate ''P''.
* fold(''A''_0,''F'',''S''): returns the value ''A''_{|''S''|} after applying ''A''_{''i''+1} := ''F''(''A''_''i'', ''e'') for each element ''e'' of ''S'', for some binary operation ''F''. ''F'' must be associative and commutative for this to be well-defined.
* clear(''S''): deletes all elements of ''S''.
* equal(''S''_1, ''S''_2): checks whether the two given sets are equal (i.e. contain all and only the same elements).
* hash(''S''): returns a hash value for the static set ''S'' such that if equal(''S''_1, ''S''_2) then hash(''S''_1) = hash(''S''_2).
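Several of these derived operations can be sketched with Python's built-ins. The helper `pick` is a hypothetical name defined here in terms of iteration, and `functools.reduce` is used as one possible rendering of fold.

```python
from functools import reduce

S = {1, 2, 3, 4}

def pick(s):
    """Return an arbitrary element of s without removing it (via iteration)."""
    return next(iter(s))

picked = pick(S)

mapped = {x % 2 for x in S}                        # map: only distinct results remain
filtered = {x for x in S if x > 2}                 # filter by the predicate x > 2
total = reduce(lambda acc, e: acc + e, S, 0)       # fold(0, +, S)

work = set(S)                                      # copy, so that S stays intact
popped = work.pop()                                # pop: arbitrary element, removed

a = frozenset([1, 2, 3])
b = frozenset([3, 2, 1])
sets_equal = a == b                                # equal(a, b)
hashes_equal = hash(a) == hash(b)                  # equal sets have equal hashes
```

Addition is associative and commutative, so the fold result does not depend on the arbitrary iteration order.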
Other operations can be defined for sets with elements of a special type:
* sum(''S''): returns the sum of all elements of ''S'' for some definition of "sum". For example, over integers or reals, it may be defined as fold(0, add, ''S'').
* collapse(''S''): given a set of sets, returns the union. For example, collapse({{1}, {2, 3}}) == {1, 2, 3}. May be considered a kind of sum.
* flatten(''S''): given a set consisting of sets and atomic elements (elements that are not sets), returns a set whose elements are the atomic elements of the original top-level set or elements of the sets it contains. In other words, it removes a level of nesting, like collapse, but allows atoms. This can be done a single time, or recursively to obtain a set of only atomic elements. For example, flatten({1, {2, 3}}) == {1, 2, 3}.
* nearest(''S'',''x''): returns the element of ''S'' that is closest in value to ''x'' (by some metric).
* min(''S''), max(''S''): returns the minimum/maximum element of ''S''.
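The operations collapse, flatten, and nearest could be sketched in Python as follows. The function names mirror the list above but are otherwise hypothetical. The sketch assumes that nested sets are `frozenset` values (Python sets can only contain hashable elements), that flatten is applied recursively, and that nearest uses absolute difference as its metric.

```python
def collapse(sets):
    """Union of a set of sets."""
    return frozenset().union(*sets)

def flatten(s):
    """Recursively flatten a set whose elements are atoms or nested sets."""
    out = set()
    for e in s:
        if isinstance(e, (set, frozenset)):
            out |= flatten(e)      # descend into the nested set
        else:
            out.add(e)             # atoms are kept as they are
    return frozenset(out)

def nearest(s, x):
    """Element of s closest to x, using |e - x| as the (assumed) metric."""
    return min(s, key=lambda e: abs(e - x))

collapsed = collapse({frozenset({1}), frozenset({2, 3})})
flattened = flatten({1, frozenset({2, frozenset({3})})})
closest = nearest({1, 5, 9}, 6)
```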
Implementations
Sets can be implemented using various data structures, which provide different time and space trade-offs for various operations. Some implementations are designed to improve the efficiency of very specialized operations, such as nearest or union. Implementations described as "general use" typically strive to optimize the is_element_of, add, and remove operations. A simple implementation is to use a list, ignoring the order of the elements and taking care to avoid repeated values. This is simple but inefficient, as operations like set membership or element deletion are ''O''(''n''), since they may require scanning the entire list. Sets are often instead implemented using more efficient data structures, particularly various flavors of trees, tries, or hash tables.
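The list-based approach can be sketched as below. `ListSet` is a hypothetical class written for this example. Each operation scans the underlying list, which is what makes membership, insertion, and deletion linear in the number of elements.

```python
class ListSet:
    """Hypothetical set backed by a plain list; every operation is O(n)."""

    def __init__(self, items=()):
        self._items = []
        for x in items:
            self.add(x)

    def __contains__(self, x):
        # Linear scan: in the worst case every element is compared with x.
        return any(x == y for y in self._items)

    def add(self, x):
        # The membership check is what keeps repeated values out.
        if x not in self:
            self._items.append(x)

    def remove(self, x):
        if x in self:
            self._items.remove(x)

    def __len__(self):
        return len(self._items)


s = ListSet([1, 2, 2, 3])
s.remove(2)

# Only equality is required of the elements, so unhashable values work too.
lists = ListSet([[1], [1], [2]])
```

One design consequence is that a list-backed set needs only an equality test on its elements, whereas hash tables need a hash function and search trees need an ordering.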
As sets can be interpreted as a kind of map (by the indicator function), sets are commonly implemented in the same way as (partial) maps (associative arrays), in which case the value of each key-value pair has the unit type or a sentinel value (like 1). Typical choices are a self-balancing binary search tree for sorted sets (which has ''O''(log ''n'') for most operations) or a hash table for unsorted sets (which has ''O''(1) average-case, but ''O''(''n'') worst-case, for most operations). A sorted linear hash table may be used to provide deterministically ordered sets.
Further, in languages that support maps but not sets, sets can be implemented in terms of maps. For example, a common programming idiom in Perl that converts an array to a hash whose values are the sentinel value 1, for use as a set, is:
my %elements = map { $_ => 1 } @elements;
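The same map-as-set idea can be sketched in Python, purely for illustration since Python already has a built-in set type. A `dict` whose values are all the sentinel 1 plays the role of the set here.

```python
elements = ["a", "b", "a", "c"]

# Keys are the set members; every value is the sentinel 1.
as_map = dict.fromkeys(elements, 1)

has_b = "b" in as_map            # membership is a key lookup
members = list(as_map)           # dict preserves insertion order (Python 3.7+)

as_map["d"] = 1                  # add an element
del as_map["a"]                  # remove an element
```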
Other popular methods include arrays. In particular a subset of the integers 1..''n'' can be implemented efficiently as an ''n''-bit bit array, which also supports very efficient union and intersection operations. A Bloom filter implements a set probabilistically, using a very compact representation but risking a small chance of false positives on queries.
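Both ideas can be sketched briefly. `BitSet` and `BloomFilter` are hypothetical classes written for this example. The bit array is stored in a single Python integer, and the Bloom filter derives its bit positions from SHA-256 digests, which is an assumption of the sketch and not a standard construction.

```python
import hashlib

class BitSet:
    """Subset of 1..n stored as the bits of one Python int."""

    def __init__(self, n, items=()):
        self.n = n
        self.bits = 0
        for x in items:
            self.add(x)

    def add(self, x):
        if not 1 <= x <= self.n:
            raise ValueError("element out of range")
        self.bits |= 1 << (x - 1)

    def __contains__(self, x):
        return 1 <= x <= self.n and ((self.bits >> (x - 1)) & 1) == 1

    def union(self, other):
        result = BitSet(max(self.n, other.n))
        result.bits = self.bits | other.bits     # one bitwise OR
        return result

    def intersection(self, other):
        result = BitSet(max(self.n, other.n))
        result.bits = self.bits & other.bits     # one bitwise AND
        return result

    def to_list(self):
        return [i for i in range(1, self.n + 1) if i in self]


class BloomFilter:
    """Probabilistic set: no false negatives, occasional false positives."""

    def __init__(self, num_bits=256, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, item):
        # Derive num_hashes bit positions by salting the item with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for p in self._positions(item):
            self.bits |= 1 << p

    def might_contain(self, item):
        return all((self.bits >> p) & 1 for p in self._positions(item))


a = BitSet(8, [1, 3, 5])
b = BitSet(8, [3, 4, 5])

bloom = BloomFilter()
bloom.add("apple")
bloom.add("pear")
```

A `might_contain` result of `False` is definitive, while `True` only means the item was probably added.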
The Boolean set operations can be implemented in terms of more elementary operations (pop, clear, and add), but specialized algorithms may yield lower asymptotic time bounds. If sets are implemented as sorted lists, for example, the naive algorithm for union(''S'',''T'') will take time proportional to the length ''m'' of ''S'' times the length ''n'' of ''T'', whereas a variant of the list merging algorithm will do the job in time proportional to ''m''+''n''. Moreover, there are specialized set data structures (such as the union-find data structure) that are optimized for one or more of these operations, at the expense of others.
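The merge-based union of two sorted lists can be sketched as follows. `merge_union` is a hypothetical helper that assumes both inputs are sorted in ascending order and free of duplicates. Each step advances at least one index, so the running time is proportional to ''m''+''n''.

```python
def merge_union(s, t):
    """Union of two sorted, duplicate-free lists in O(m + n) time."""
    result = []
    i = j = 0
    while i < len(s) and j < len(t):
        if s[i] < t[j]:
            result.append(s[i])
            i += 1
        elif t[j] < s[i]:
            result.append(t[j])
            j += 1
        else:
            # Equal heads: emit the value once and advance both lists.
            result.append(s[i])
            i += 1
            j += 1
    # At most one of the two lists still has elements left.
    result.extend(s[i:])
    result.extend(t[j:])
    return result


merged = merge_union([1, 3, 5], [2, 3, 6, 7])
```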
Language support
One of the earliest languages to support sets was Pascal; many languages now include them, whether in the core language or in a standard library.
* In C++, the Standard Template Library (STL) provides the set
template class, which is typically impleme