In computer science, a set is an abstract data type that can store unique values, without any particular order. It is a computer implementation of the mathematical concept of a finite set. Unlike most other collection types, rather than retrieving a specific element from a set, one typically tests a value for membership in a set.
Some set data structures are designed for static or frozen sets that do not change after they are constructed. Static sets allow only query operations on their elements, such as checking whether a given value is in the set or enumerating the values in some arbitrary order. Other variants, called dynamic or mutable sets, also allow the insertion and deletion of elements.
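In Python, the built-in frozenset and set types correspond roughly to static and dynamic sets. The following minimal illustration uses only built-in behaviour:

```python
# A static (frozen) set: after construction, only queries are allowed.
primes = frozenset({2, 3, 5, 7})
print(5 in primes)        # membership test: True
print(sorted(primes))     # enumeration (sorted only for a stable display)

# A dynamic (mutable) set: insertion and deletion are also allowed.
seen = set()
seen.add("a")
seen.add("b")
seen.discard("a")         # delete "a" if present; no error if absent
print(seen)               # {'b'}

# frozenset deliberately exposes no mutating methods.
print(hasattr(primes, "add"))  # False
```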
A multiset is a variant of a set in which an element can appear multiple times.
Type theory
In type theory, sets are generally identified with their indicator function (characteristic function). Accordingly, a set of values of type $A$ may be denoted by $2^A$ or $\mathcal{P}(A)$. (Subtypes and subsets may be modeled by refinement types, and quotient sets may be replaced by setoids.) The characteristic function $F$ of a set $S$ is defined as:
$$F(x) = \begin{cases} 1, & \text{if } x \in S \\ 0, & \text{if } x \notin S \end{cases}$$
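The correspondence between a set and its characteristic function can be sketched in Python. The helper name indicator is hypothetical. The second half shows the converse view, where a predicate stands in for a set that may be infinite.

```python
from typing import Callable, Hashable, Iterable


def indicator(s: Iterable[Hashable]) -> Callable[[Hashable], int]:
    """Return the characteristic function F of the finite set s:
    F(x) = 1 if x is in s, else 0."""
    members = frozenset(s)
    return lambda x: 1 if x in members else 0


F = indicator({"red", "green"})
print(F("red"), F("blue"))   # 1 0

# Conversely, a predicate can represent a (possibly infinite) set.
evens: Callable[[int], bool] = lambda n: n % 2 == 0
print(evens(10), evens(7))   # True False
```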
In theory, many other abstract data structures can be viewed as set structures with additional operations and/or additional axioms imposed on the standard operations. For example, an abstract heap can be viewed as a set structure with a min(''S'') operation that returns the element of smallest value.
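The following is a minimal sketch of that view. It assumes Python's heapq module for the heap and adds a hash set so that duplicates are rejected as set semantics require. The class name MinSet is hypothetical.

```python
import heapq


class MinSet:
    """Hypothetical sketch: a set structure extended with min(S),
    backed by a binary heap (heapq) plus a hash set for membership."""

    def __init__(self):
        self._heap = []
        self._members = set()

    def add(self, x):
        if x not in self._members:          # keep set semantics: no duplicates
            self._members.add(x)
            heapq.heappush(self._heap, x)   # O(log n)

    def __contains__(self, x):
        return x in self._members

    def min(self):
        return self._heap[0]                # smallest element, O(1)

    def pop_min(self):
        x = heapq.heappop(self._heap)       # remove smallest, O(log n)
        self._members.remove(x)
        return x


s = MinSet()
for v in (5, 1, 4, 1):
    s.add(v)
print(s.min())      # 1
print(s.pop_min())  # 1
print(s.min())      # 4
```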
Operations
Core set-theoretical operations
One may define the operations of the algebra of sets:
* union(''S'',''T''): returns the union of sets ''S'' and ''T''.
* intersection(''S'',''T''): returns the intersection of sets ''S'' and ''T''.
* difference(''S'',''T''): returns the difference of sets ''S'' and ''T''.
* subset(''S'',''T''): a predicate that tests whether the set ''S'' is a subset of set ''T''.
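In Python, for example, these four operations map directly onto operators of the built-in set type. The named methods union, intersection, difference and issubset are equivalent.

```python
S = {1, 2, 3}
T = {3, 4}

print(S | T)          # union(S, T): {1, 2, 3, 4}
print(S & T)          # intersection(S, T): {3}
print(S - T)          # difference(S, T): {1, 2}
print({1, 2} <= S)    # subset({1, 2}, S): True

# The method forms give the same results.
print(S.union(T) == S | T, S.issubset(S | T))  # True True
```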
Static sets
Typical operations that may be provided by a static set structure ''S'' are:
* is_element_of(''x'',''S''): checks whether the value ''x'' is in the set ''S''.
* is_empty(''S''): checks whether the set ''S'' is empty.
* size(''S'') or cardinality(''S''): returns the number of elements in ''S''.
* iterate(''S''): returns a function that returns one more value of ''S'' at each call, in some arbitrary order.
* enumerate(''S''): returns a list containing the elements of ''S'' in some arbitrary order.
* build(''x''1,''x''2,…,''x''''n''): creates a set structure with values ''x''1,''x''2,…,''x''''n''.
* create_from(''collection''): creates a new set structure containing all the elements of the given collection or all the elements returned by the given iterator.
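These static-set operations can be approximated in Python with the immutable frozenset type. The mapping is informal. In particular, Python's iterate(''S'') counterpart returns an iterator object, whose next() call plays the role of the "function that returns one more value".

```python
# Static-set operations approximated with Python's immutable frozenset.
S = frozenset(["x", "y", "z"])      # build / create_from(collection)

print("y" in S)                     # is_element_of("y", S) -> True
print(len(S) == 0)                  # is_empty(S) -> False
print(len(S))                       # size(S) -> 3

it = iter(S)                        # iterate(S): each next(it) yields one more value
first = next(it)
print(first in S)                   # True (which element comes first is arbitrary)

as_list = list(S)                   # enumerate(S): arbitrary order
print(sorted(as_list))              # sorted only for display

# create_from also accepts any iterator, e.g. a generator expression.
squares = frozenset(n * n for n in range(-2, 3))
print(sorted(squares))              # [0, 1, 4]
```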
Dynamic sets
Dynamic set structures typically add:
* create(): creates a new, initially empty set structure.
** create_with_capacity(''n''): creates a new set structure, initially empty but capable of holding up to ''n'' elements.
* add(''S'',''x''): adds the element ''x'' to ''S'', if it is not present already.
* remove(''S'',''x''): removes the element ''x'' from ''S'', if it is present.
* capacity(''S''): returns the maximum number of values that ''S'' can hold.
Some set structures may allow only some of these operations. The cost of each operation will depend on the implementation, and possibly also on the particular values stored in the set, and the order in which they are inserted.
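Python's built-in set supports create, add and "remove if present" (set.discard; set.remove instead raises KeyError when the element is absent). It has no notion of capacity. The sketch below wraps it in a hypothetical BoundedSet class to show all of the operations listed above, including a fixed capacity.

```python
class BoundedSet:
    """Hypothetical dynamic set with a fixed capacity, wrapping Python's set."""

    def __init__(self, capacity):        # create_with_capacity(n)
        self._items = set()
        self._capacity = capacity

    def capacity(self):
        return self._capacity

    def add(self, x):
        """Add x if it is not already present; raise if the set is full."""
        if x in self._items:
            return
        if len(self._items) >= self._capacity:
            raise OverflowError("set is at capacity")
        self._items.add(x)

    def remove(self, x):
        """Remove x if it is present; do nothing otherwise."""
        self._items.discard(x)

    def __contains__(self, x):
        return x in self._items

    def __len__(self):
        return len(self._items)


b = BoundedSet(2)
b.add("a")
b.add("a")            # duplicate: ignored
b.add("b")
b.remove("zzz")       # absent: no error
print(len(b), "a" in b)   # 2 True
```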
Additional operations
There are many other operations that can (in principle) be defined in terms of the above, such as:
* pop(''S''): returns an arbitrary element of ''S'', deleting it from ''S''.
* pick(''S''): returns an arbitrary element of ''S''. Functionally, the mutator pop can be interpreted as the pair of selectors (pick, rest), where rest returns the set consisting of all elements except for the arbitrary element. pick can itself be implemented in terms of iterate.
* map(''F'',''S''): returns the set of distinct values resulting from applying function ''F'' to each element of ''S''.
* filter(''P'',''S''): returns the subset containing all elements of ''S'' that satisfy a given predicate ''P''.
* fold(''A''0,''F'',''S''): starting from the initial value $A_0$, returns the value $A_{|S|}$ after applying $A_{i+1} := F(A_i, e)$ for each element ''e'' of ''S'', for some binary operation ''F''. ''F'' must be associative and commutative for this to be well-defined.
* clear(''S''): deletes all elements of ''S''.
* equal(''S''1,''S''2): checks whether the two given sets are equal (i.e. contain all and only the same elements).
* hash(''S''): returns a hash value for the static set ''S'' such that if equal(''S''1,''S''2) then hash(''S''1) = hash(''S''2).
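Several of these operations can be sketched in Python. The helpers pick and rest are hypothetical names that mirror the operations above, since Python's set has no such methods. The built-in set.pop, set comprehensions, functools.reduce, set.clear, == and hash stand in for pop, map/filter, fold, clear, equal and hash.

```python
from functools import reduce

S = {1, 2, 3, 4}


def pick(s):
    """Hypothetical pick(S): an arbitrary element (s must be non-empty)."""
    return next(iter(s))          # defined in terms of iterate


def rest(s):
    """Hypothetical rest(S): every element except the one pick returns."""
    return s - {pick(s)}


# (pick, rest) together carry the same information as pop.
print(rest(S) | {pick(S)} == S)          # True

print({n % 3 for n in S})                # map: distinct values {0, 1, 2}
print({n for n in S if n % 2 == 0})      # filter: {2, 4}

# fold(0, +, S): + is associative and commutative, so order does not matter.
print(reduce(lambda acc, e: acc + e, S, 0))   # 10

# Only immutable (static) sets are hashable in Python.
S1, S2 = frozenset({1, 2}), frozenset({2, 1})
print(S1 == S2, hash(S1) == hash(S2))    # True True

T = set(S)
y = T.pop()                              # pop(T): remove and return an arbitrary element
print(y not in T)                        # True
T.clear()                                # clear(T)
print(T)                                 # set()
```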
Other operations can be defined for sets with elements of a special type:
* sum(''S''): returns the sum of all elements of ''S'' for some definition of "sum". For example, over integers or reals, it may be defined as fold(0, add, ''S'').
* collapse(''S''): given a set of sets, returns the union. For example, collapse({{1}, {2, 3}}) = {1, 2, 3}. It may be considered a kind of sum.
* flatten(''S''): given a set consisting of sets and atomic elements (elements that are not sets), returns a set whose elements are the atomic elements of the original top-level set or elements of the sets it contains. In other words, it removes a level of nesting, like collapse, but allows atoms. This can be done a single time, or recursively to obtain a set of only atomic elements. For example, flatten({1, {2, 3}}) = {1, 2, 3}.
* nearest(''S'',''x''): returns the element of ''S'' that is closest in value to ''x'' (by some metric).
* min(''S''), max(''S''): return the minimum/maximum element of ''S''.
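The operations for special element types can be sketched in Python as follows. The helpers collapse, flatten and nearest are hypothetical. Nested sets must be frozensets because Python set elements have to be hashable. nearest uses absolute difference as its metric.

```python
import operator
from functools import reduce

nums = {3, 10, 7}

# sum(S) as fold(0, add, S); the built-in sum(nums) gives the same result.
print(reduce(operator.add, nums, 0))     # 20


def collapse(sets):
    """Union of a set of (frozen) sets."""
    return frozenset().union(*sets)


def flatten(s):
    """Remove one level of nesting, keeping atomic elements."""
    out = set()
    for e in s:
        if isinstance(e, frozenset):
            out |= e          # element is a set: take its members
        else:
            out.add(e)        # atomic element: keep it
    return frozenset(out)


def nearest(s, x):
    """Element of s closest to x under |e - x|; ties broken arbitrarily."""
    return min(s, key=lambda e: abs(e - x))


print(collapse({frozenset({1}), frozenset({2, 3})}) == {1, 2, 3})  # True
print(flatten({1, frozenset({2, 3})}) == {1, 2, 3})                # True
print(nearest(nums, 8))                  # 7
print(min(nums), max(nums))              # 3 10
```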
Implementations
Sets can be implemented using various data structures, which provide different time and space trade-offs for various operations. Some implementations are designed to improve the efficiency of very specialized operations, such as nearest or union. Implementations described as "general use" typically strive to optimize the is_element_of, add, and remove operations. A simple implementation is to use a list, ignoring the order of the elements and taking care to avoid repeated values. This is simple but inefficient, as operations like set membership or element deletion are ''O''(''n''), since they require scanning the entire list. Sets are often instead implemented using more efficient data structures, particularly various flavors of trees, tries, or hash tables.
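A minimal sketch of the simple list-backed set in Python (the class name ListSet is hypothetical). Every operation scans the list, which is where the ''O''(''n'') cost comes from. It only needs elements to support equality, not hashing or ordering.

```python
class ListSet:
    """List-backed set: order ignored, duplicates avoided.
    Membership, insertion and deletion each scan the list, so they are O(n)."""

    def __init__(self, items=()):
        self._items = []
        for x in items:
            self.add(x)

    def __contains__(self, x):
        return any(e == x for e in self._items)   # linear scan

    def add(self, x):
        if x not in self:                          # O(n) duplicate check
            self._items.append(x)

    def remove(self, x):
        for i, e in enumerate(self._items):        # linear scan
            if e == x:
                # Order is irrelevant, so overwrite with the last item and pop.
                self._items[i] = self._items[-1]
                self._items.pop()
                return

    def __len__(self):
        return len(self._items)


s = ListSet([3, 1, 3])
print(len(s))        # 2
s.remove(3)
print(3 in s, 1 in s)  # False True
```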
Because a set can be interpreted as a kind of map (via its indicator function), sets are commonly implemented in the same way as (partial) maps (associative arrays), with the value of each key-value pair being of the unit type or a sentinel value (like 1). Typical choices are a self-balancing binary search tree for sorted sets, which has ''O''(log ''n'') cost for most operations, or a hash table for unsorted sets, which has ''O''(1) average-case but ''O''(''n'') worst-case cost for most operations. A sorted linear hash table may be used to provide deterministically ordered sets.
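To illustrate the hash-table approach, here is a minimal sketch of an unsorted set as a hash table with separate chaining. It stores only keys, which is equivalent to a map whose values carry no information. The class name, the initial bucket count and the resize threshold are arbitrary assumptions.

```python
class ChainedHashSet:
    """Unsorted set as a hash table with separate chaining.
    Average O(1) per operation; O(n) worst case if all keys share a bucket."""

    def __init__(self, buckets=8):
        self._buckets = [[] for _ in range(buckets)]
        self._size = 0

    def _bucket(self, x):
        return self._buckets[hash(x) % len(self._buckets)]

    def __contains__(self, x):
        return x in self._bucket(x)

    def add(self, x):
        b = self._bucket(x)
        if x not in b:
            b.append(x)
            self._size += 1
            if self._size > 2 * len(self._buckets):   # keep chains short
                self._resize(2 * len(self._buckets))

    def remove(self, x):
        b = self._bucket(x)
        if x in b:
            b.remove(x)
            self._size -= 1

    def _resize(self, n):
        old = [x for b in self._buckets for x in b]
        self._buckets = [[] for _ in range(n)]
        for x in old:                                 # rehash every element
            self._bucket(x).append(x)

    def __len__(self):
        return self._size


h = ChainedHashSet(2)
for i in range(20):
    h.add(i)
h.add(5)              # duplicate: ignored
print(len(h), 5 in h)  # 20 True
```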
Further, in languages that support maps but not sets, sets can be implemented in terms of maps. For example, a common programming idiom in Perl that converts an array to a hash whose values are the sentinel value 1, for use as a set, is:
my %elements = map { $_ => 1 } @elements;
Other popular methods include arrays. In particular, a subset of the integers 1..''n'' can be implemented efficiently as an ''n''-bit bit array, which also supports very efficient union and intersection operations. A Bloom filter implements a set probabilistically, using a very compact representation but risking a small chance of false positives on queries.
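The sketch below uses a Python int as the bit array. For convenience it uses the range 0..''n''−1 instead of 1..''n'', which is an assumption. Union and intersection become a single bitwise OR and AND. It also includes a small Bloom filter. Its bit-array size, hash count and use of salted SHA-256 digests as hash functions are illustrative choices, not a standard design.

```python
import hashlib


def to_bits(elems):
    """Encode a subset of {0, ..., n-1} as bits of an int."""
    bits = 0
    for e in elems:
        bits |= 1 << e                 # set bit e
    return bits


def from_bits(bits):
    return {i for i in range(bits.bit_length()) if (bits >> i) & 1}


a, b = to_bits({1, 3, 5}), to_bits({3, 4})
print(from_bits(a | b))    # union via bitwise OR: {1, 3, 4, 5}
print(from_bits(a & b))    # intersection via bitwise AND: {3}


class BloomFilter:
    """Probabilistic set: may give false positives, never false negatives."""

    def __init__(self, m=1024, k=3):
        self.m, self.k, self.bits = m, k, 0

    def _positions(self, item):
        # Derive k bit positions from salted SHA-256 digests of the item.
        for i in range(self.k):
            d = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(d[:8], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits |= 1 << p

    def might_contain(self, item):
        return all((self.bits >> p) & 1 for p in self._positions(item))


bf = BloomFilter()
bf.add("apple")
print(bf.might_contain("apple"))   # True (added items are always found)
```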
The Boolean set operations can be implemented in terms of more elementary operations (pop, clear, and add), but specialized algorithms may yield lower asymptotic time bounds. If sets are implemented as sorted lists, for example, the naive algorithm for union(''S'',''T'') will take time proportional to the length ''m'' of ''S'' times the length ''n'' of ''T'', whereas a variant of the list merging algorithm will do the job in time proportional to ''m''+''n''. Moreover, there are specialized set data structures (such as the union-find data structure) that are optimized for one or more of these operations, at the expense of others.
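A minimal Python sketch of the merge-based union follows. It assumes both inputs are sorted and free of duplicates. Each step advances at least one index, so the loop runs at most ''m''+''n'' times. The function name sorted_union is hypothetical.

```python
def sorted_union(s, t):
    """Union of two sorted, duplicate-free lists in O(m + n) time."""
    i, j, out = 0, 0, []
    while i < len(s) and j < len(t):
        if s[i] < t[j]:
            out.append(s[i])
            i += 1
        elif t[j] < s[i]:
            out.append(t[j])
            j += 1
        else:                  # equal: emit once, advance both
            out.append(s[i])
            i += 1
            j += 1
    out.extend(s[i:])          # at most one of these tails is non-empty
    out.extend(t[j:])
    return out


print(sorted_union([1, 3, 5, 7], [2, 3, 8]))   # [1, 2, 3, 5, 7, 8]
```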
Language support
One of the earliest languages to support sets was Pascal; many languages now support them, whether in the core language or in a standard library.
* In C++, the Standard Template Library (STL) provides the set template class, which is typically implemented using a binary search tree (e.g. a red–black tree); SGI's STL also provides the hash_set template class, which implements a set using a hash table. C++11 has support for the unordered_set template class, which is implemented using a hash table. In sets, the elements themselves are the keys, in contrast to sequenced containers, where elements are accessed using their (relative or absolute) position. Elements of set must have a strict weak ordering, while unordered_set instead requires a hash function and an equality predicate.
* The Rust standard library provides generic set types, including HashSet.
* Java offers an interface to support sets (with a class implementing it using a hash table), and a sub-interface to support sorted sets (with a class implementing it using a binary search tree).
* Apple's Foundation framework (part of Cocoa) provides the Objective-C classes NSSet, NSMutableSet, and NSCountedSet, among others. The CoreFoundation APIs provide the CFSet and CFMutableSet types for use in C.
* Python has had built-in set and frozenset types since 2.4. Since Python 3.0 and 2.7, it supports non-empty set literals using a curly-bracket syntax, e.g. {1, 2, 3}; empty sets must be created using set(), because Python uses {} to represent the empty dictionary.
* The .NET Framework provides the generic HashSet and SortedSet classes that implement the generic ISet interface.
* Smalltalk's class library includes Set and IdentitySet, using equality and identity for inclusion test respectively. Many dialects provide variations for compressed storage (NumberSet, CharacterSet), for ordering (OrderedSet, SortedSet, etc.) or for weak references (WeakIdentitySet).
* Ruby's standard library includes a set module which contains Set and SortedSet classes that implement sets using hash tables, the latter allowing iteration in sorted order.
* OCaml's standard library contains a Set module, which implements a functional set data structure using binary search trees.
* The GHC implementation of Haskell provides a Data.Set module, which implements immutable sets using binary search trees.
* The Tcl Tcllib package provides a set module which implements a set data structure based upon Tcl lists.
* The Swift standard library contains a Set type, since Swift 1.2.
* JavaScript introduced Set as a standard built-in object with the ECMAScript 2015 standard.
* Erlang's standard library has a sets module.
* Clojure has literal syntax for hashed sets, and also implements sorted sets.
* LabVIEW has native support for sets, from version 2019.
* Ada provides the Ada.Containers.Hashed_Sets package, among other set packages.
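The Python behaviour described in the list above can be checked directly. This short illustration also shows why frozenset exists: an immutable set is hashable, so it can itself be a set element.

```python
s = {1, 2, 3}        # non-empty set literal (Python 2.7 and 3.0+)
e = set()            # the only way to write an empty set
d = {}               # an empty dict, not a set
print(type(s).__name__, type(e).__name__, type(d).__name__)  # set set dict

f = frozenset(s)     # immutable counterpart of set; hashable
print(f in {f})      # True: a frozenset can be an element of another set
```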
As noted in the previous section, in languages which do not directly support sets but do support associative arrays, sets can be emulated using associative arrays, by using the elements as keys, and using a dummy value as the values, which are ignored.
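Python already has a set type, so the following only illustrates the emulation technique. It uses a dict whose keys are the elements and whose values are an ignored dummy (True here). dict.fromkeys plays the same role as the Perl map idiom.

```python
# Emulating a set with a dict: elements are keys, values are an ignored dummy.
elements = ["pear", "apple", "pear"]
as_set = dict.fromkeys(elements, True)   # duplicates collapse onto one key

print("apple" in as_set)                 # membership: True
as_set["fig"] = True                     # add
as_set.pop("pear", None)                 # remove if present, no error otherwise
print(sorted(as_set))                    # ['apple', 'fig']
```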
Multiset
A generalization of the notion of a set is that of a multiset or bag, which is similar to a set but allows repeated ("equal") values (duplicates). This is used in two distinct senses: either equal values are considered ''identical,'' and are simply counted, or equal values are considered ''equivalent,'' and are stored as distinct items. For example, given a list of people (by name) and ages (in years), one could construct a multiset of ages, which simply counts the number of people of a given age. Alternatively, one can construct a multiset of people, where two people are considered equivalent if their ages are the same (but may be different people and have different names), in which case each pair (name, age) must be stored, and selecting on a given age gives all the people of a given age.
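The two senses in the ages example can be sketched in Python with hypothetical sample data. collections.Counter gives the counting sense. A dict of lists, keyed by the equivalence (age), keeps each equivalent item as a distinct entry.

```python
from collections import Counter, defaultdict

people = [("Ann", 31), ("Bob", 25), ("Cem", 31)]   # hypothetical sample data

# Sense 1: equal values are identical and simply counted.
ages = Counter(age for _, age in people)
print(ages[31])                  # 2

# Sense 2: equivalent values (same age) are stored as distinct items.
by_age = defaultdict(list)
for name, age in people:
    by_age[age].append((name, age))
print(by_age[31])                # [('Ann', 31), ('Cem', 31)]
```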
Formally, it is possible for objects in computer science to be considered "equal" under some equivalence relation but still distinct under another relation. Some types of multiset implementations will store distinct equal objects as separate items in the data structure, while others will collapse them down to one version (the first one encountered) and keep a positive integer count of the multiplicity of the element.
As with sets, multisets can naturally be implemented using hash tables or trees, which yield different performance characteristics.
The set of all bags over type T is given by the expression bag T. If by multiset one considers equal items identical and simply counts them, then a multiset can be interpreted as a function from the input domain to the non-negative integers (natural numbers), generalizing the identification of a set with its indicator function. In some cases a multiset in this counting sense may be generalized to allow negative values, as with Python's collections.Counter.
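In Python's collections.Counter, a multiset in the counting sense behaves like a function from elements to counts, with absent elements mapping to 0. The elements with a positive count form the underlying set. Counter.subtract can leave zero or negative counts, whereas the binary - operator discards non-positive results.

```python
from collections import Counter

bag = Counter("abracadabra")          # multiplicity of each letter
print(bag["a"], bag["z"])             # 5 0  (absent elements map to 0)

# Elements with a positive count form the underlying set.
support = {x for x, n in bag.items() if n > 0}
print(sorted(support))                # ['a', 'b', 'c', 'd', 'r']

# subtract() allows counts to go negative ...
bag.subtract(Counter({"z": 2}))
print(bag["z"])                       # -2
# ... while the binary '-' operator drops non-positive results.
print(Counter(a=1) - Counter(a=3))    # Counter()
```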
* C++'s Standard Template Library implements both sorted and unsorted multisets. It provides the multiset class for the sorted multiset, as a kind of associative container, which implements this multiset using a self-balancing binary search tree. It provides the unordered_multiset class for the unsorted multiset, as a kind of unordered associative container, which implements this multiset using a hash table. The unsorted multiset is standard as of C++11; previously, SGI's STL provided the hash_multiset class, which was copied and eventually standardized.
* For Java, third-party libraries provide multiset functionality:
** Apache Commons Collections provides the Bag and SortedBag interfaces, with implementing classes like HashBag and TreeBag.
** Google Guava provides the Multiset interface, with several implementing classes.
* Apple provides the NSCountedSet class as part of Cocoa, and the CFBag and CFMutableBag types as part of CoreFoundation.
* Python's standard library includes collections.Counter, which is similar to a multiset.
* Smalltalk includes the Bag class, which can be instantiated to use either identity or equality as the predicate for the inclusion test.
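Python's standard library has no sorted container comparable to C++'s multiset. As a rough sketch of what a sorted multiset provides, the hypothetical SortedMultiset class below keeps its elements in a sorted list maintained with the bisect module. This is a swapped-in technique, not the self-balancing binary search tree used by the C++ container: lookups are O(log n), but insertion and removal cost O(n) because list elements must shift.

```python
import bisect


class SortedMultiset:
    """Hypothetical sorted multiset backed by a sorted Python list."""

    def __init__(self, items=()):
        self._items = sorted(items)

    def add(self, x):
        # insort places x after any equal elements, so duplicates are kept.
        bisect.insort(self._items, x)

    def count(self, x):
        # Equal elements are contiguous in a sorted list, so the width of
        # their run is the multiplicity of x.
        return bisect.bisect_right(self._items, x) - bisect.bisect_left(self._items, x)

    def discard_one(self, x):
        # Remove a single occurrence of x, if present.
        i = bisect.bisect_left(self._items, x)
        if i < len(self._items) and self._items[i] == x:
            del self._items[i]

    def __iter__(self):
        # Iteration yields elements in sorted order, duplicates included.
        return iter(self._items)

    def __len__(self):
        return len(self._items)
```

For example, SortedMultiset([3, 1, 3, 2]) followed by add(3) iterates as 1, 2, 3, 3, 3, and count(3) returns 3.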
Where a multiset data structure is not available, one workaround is to use a regular set but override the equality predicate of its items to always return "not equal" on distinct objects; such a set still cannot store multiple occurrences of the same object. Another is to use an associative array mapping each value to its integer multiplicity; this cannot distinguish between equal elements at all.
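Both workarounds described above can be sketched in Python. The Item wrapper and make_bag helper are hypothetical names for illustration. Item relies on the fact that Python objects inherit identity-based equality and hashing by default, so distinct wrappers never compare equal. make_bag is a plain dict of counts; because 1, 1.0 and True compare equal in Python, it merges them under one key and cannot tell them apart.

```python
# Workaround 1: a regular set of wrappers whose equality is object identity.
class Item:
    """Hypothetical wrapper; inherits object's identity-based __eq__ and
    __hash__, so two distinct Item objects never compare equal."""

    def __init__(self, value):
        self.value = value


a, b = Item("x"), Item("x")
wrapped = {a, b, a}          # a and b are both kept, but a is stored only once
wrapped_size = len(wrapped)  # 2


# Workaround 2: an associative array from values to multiplicities.
def make_bag(values):
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    return counts


bag = make_bag([1, 1.0, True, "x"])
# 1, 1.0 and True are equal, so they share the single key 1: {1: 3, 'x': 1}
```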
Typical operations on bags:
* contains(''B'', ''x'')
: checks whether the element ''x'' is present (at least once) in the bag ''B''
* is_sub_bag(''B''₁, ''B''₂)
: checks whether every element occurs in ''B''₁ no more often than it occurs in ''B''₂; sometimes denoted as ''B''₁ ⊑ ''B''₂.
* count(''B'', ''x'')
: returns the number of times that the element ''x'' occurs in the bag ''B''; sometimes denoted as ''B'' # ''x''.
* scaled_by(''B'', ''n'')
: given a natural number ''n'', returns a bag which contains the same elements as the bag ''B'', except that every element that occurs ''m'' times in ''B'' occurs ''n'' * ''m'' times in the resulting bag; sometimes denoted as ''n'' ⊗ ''B''.
* union(''B''₁, ''B''₂)
: returns a bag containing just those values that occur in the bag ''B''₁, the bag ''B''₂, or both, where the number of times a value ''x'' occurs in the resulting bag is equal to (''B''₁ # ''x'') + (''B''₂ # ''x''); sometimes denoted as ''B''₁ ⊎ ''B''₂.
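The operations listed above map closely onto Python's collections.Counter. The following is one possible sketch, assuming a bag is represented as a Counter; the function names simply mirror the list and are not a library API. The listed union adds multiplicities, so it corresponds to Counter's + operator. Counter's | operator instead takes the maximum of the two counts.

```python
from collections import Counter

# A bag is modelled as a Counter mapping each element to its multiplicity.


def contains(b: Counter, x) -> bool:
    """True if x occurs at least once in b."""
    return b[x] > 0


def count(b: Counter, x) -> int:
    """Number of occurrences of x in b (Counter returns 0 for absent keys)."""
    return b[x]


def is_sub_bag(b1: Counter, b2: Counter) -> bool:
    """True if every element occurs in b1 no more often than in b2."""
    return all(n <= b2[x] for x, n in b1.items())


def scaled_by(b: Counter, n: int) -> Counter:
    """Every element occurring m times in b occurs n * m times in the result."""
    return Counter({x: n * m for x, m in b.items() if n * m > 0})


def union(b1: Counter, b2: Counter) -> Counter:
    """Additive union: multiplicities are summed."""
    return b1 + b2


b1 = Counter("abb")    # a: 1, b: 2
b2 = Counter("abbbc")  # a: 1, b: 3, c: 1
```

With these bags, is_sub_bag(b1, b2) is true, while union(b1, b2) gives a count of 5 for "b".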
Multisets in SQL
In relational databases, a table can be a (mathematical) set or a multiset, depending on the presence of uniqueness constraints on some columns (which turn those columns into a candidate key).
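This distinction can be demonstrated with Python's built-in sqlite3 module and an in-memory database; the table names are hypothetical. The constrained column is declared NOT NULL as well as UNIQUE, because SQLite, like many databases, allows several NULLs in a UNIQUE column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Without a uniqueness constraint, duplicate rows are accepted: a multiset.
conn.execute("CREATE TABLE tags_bag (tag TEXT)")
conn.executemany("INSERT INTO tags_bag VALUES (?)", [("red",), ("red",)])
bag_rows = conn.execute("SELECT COUNT(*) FROM tags_bag").fetchone()[0]  # 2

# With NOT NULL UNIQUE, the column is a candidate key and the rows form a set.
conn.execute("CREATE TABLE tags_set (tag TEXT NOT NULL UNIQUE)")
conn.execute("INSERT INTO tags_set VALUES ('red')")
try:
    conn.execute("INSERT INTO tags_set VALUES ('red')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True  # the second 'red' violates the constraint
```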
Selecting rows from a relational table in SQL will in general yield a multiset, unless the keyword DISTINCT is used to force the rows to be all different, or the selection includes the primary key (or another candidate key).
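A small sqlite3 sketch shows the effect of DISTINCT. The orders table and its contents are hypothetical. ORDER BY is included only so that the results are deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, item TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ann", "pen"), ("bob", "pen"), ("ann", "ink")],
)

# Projecting onto a non-key column yields a multiset of values.
items = [row[0] for row in conn.execute("SELECT item FROM orders ORDER BY item")]
# ['ink', 'pen', 'pen']

# DISTINCT removes the duplicates, leaving a set.
distinct_items = [
    row[0] for row in conn.execute("SELECT DISTINCT item FROM orders ORDER BY item")
]
# ['ink', 'pen']
```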
In ANSI SQL the MULTISET
keyword can be used to transform a subquery into a collection expression:
SELECT expression1, expression2... FROM table_name...
is a general select that can be used as a ''subquery expression'' of another, more general query, while
MULTISET(SELECT expression1, expression2... FROM table_name...)
transforms the subquery into a ''collection expression'' that can be used in another query, or in assignment to a column of appropriate collection type.
See also
*Bloom filter
*Disjoint set
*Set (mathematics)