In
computer science
Computer science is the study of computation, information, and automation. Computer science spans Theoretical computer science, theoretical disciplines (such as algorithms, theory of computation, and information theory) to Applied science, ...
, an e-graph is a
data structure
In computer science, a data structure is a data organization and storage format that is usually chosen for Efficiency, efficient Data access, access to data. More precisely, a data structure is a collection of data values, the relationships amo ...
that stores an
equivalence relation
In mathematics, an equivalence relation is a binary relation that is reflexive, symmetric, and transitive. The equipollence relation between line segments in geometry is a common example of an equivalence relation. A simpler example is equ ...
over
terms of some language.
Definition and operations
Let
be a set of
uninterpreted functions, where
is the subset of
consisting of functions of arity
. Let
be a countable set of opaque identifiers that may be compared for equality, called e-class IDs. The application of
to e-class IDs
is denoted
and called an e-node.
The e-graph then represents equivalence classes of e-nodes, using the following data structures:
* A
union-find structure
representing equivalence classes of e-class IDs, with the usual operations
,
and
. An e-class ID
is canonical if
; an e-node
is canonical if each
is canonical (
in
).
* An association of e-class IDs with sets of e-nodes, called e-classes. This consists of
** a
hashcons (i.e. a mapping) from canonical e-nodes to e-class IDs, and
** an e-class map
that maps e-class IDs to e-classes, such that
maps equivalent IDs to the same set of e-nodes:
Invariants
In addition to the above structure, a valid e-graph conforms to several
data structure invariants. Two e-nodes are equivalent if they are in the same e-class. The congruence invariant states that an e-graph must ensure that equivalence is closed under congruence, where two e-nodes
are congruent when
. The hashcons invariant states that the hashcons maps canonical e-nodes to their e-class ID.
Operations
E-graphs expose wrappers around the
,
, and
operations from the union-find that preserve the e-graph invariants. The last operation, e-matching, is described below.
Equivalent formulations
An e-graph can also be formulated as a
bipartite graph
In the mathematics, mathematical field of graph theory, a bipartite graph (or bigraph) is a Graph (discrete mathematics), graph whose vertex (graph theory), vertices can be divided into two disjoint sets, disjoint and Independent set (graph theo ...
where
*
is the set of e-class IDs (as above),
*
is the set of e-nodes, and
*
is a set of directed edges.
There is a directed edge from each e-class to each of its members, and from each e-node to each of its children.
E-matching
Let
be a set of variables and let
be the smallest set that includes the 0-arity function symbols (also called constants), includes the variables, and is closed under application of the function symbols. In other words,
is the smallest set such that
,
, and when
and
, then
. A term containing variables is called a pattern, a term without variables is called ground.
An e-graph
represents a ground term
if one of its e-classes represents
. An e-class
represents
if some e-node
does. An e-node
represents a term
if
and each e-class