Regular Path Query
   HOME

TheInfoList



OR:

In
database In computing, a database is an organized collection of data or a type of data store based on the use of a database management system (DBMS), the software that interacts with end users, applications, and the database itself to capture and a ...
s and specifically in
graph database A graph database (GDB) is a database that uses graph structures for semantic queries with nodes, edges, and properties to represent and store data. A key concept of the system is the graph (or edge or relationship). The graph relates the dat ...
s, a regular path query or RPQ is a query asking for pairs of endpoints in the database that are connected by a path satisfying a certain
regular expression A regular expression (shortened as regex or regexp), sometimes referred to as rational expression, is a sequence of characters that specifies a match pattern in text. Usually such patterns are used by string-searching algorithms for "find" ...
. A similar feature exists in the
SPARQL SPARQL (pronounced ":wikt:sparkle, sparkle", a recursive acronym for SPARQL Protocol and RDF Query Language) is an RDF query language—that is, a Semantic Query, semantic query language for databases—able to retrieve and manipulate data sto ...
query language as "property paths".


Definition

A graph database consists of a directed
graph Graph may refer to: Mathematics *Graph (discrete mathematics), a structure made of vertices and edges **Graph theory, the study of such graphs and their properties *Graph (topology), a topological space resembling a graph in the sense of discret ...
whose edges carry a label. A regular path query is just a
regular expression A regular expression (shortened as regex or regexp), sometimes referred to as rational expression, is a sequence of characters that specifies a match pattern in text. Usually such patterns are used by string-searching algorithms for "find" ...
over the set of labels. For instance, in a graph database where vertices represent users and there is an edge label "parent" for edges from a parent to a child, the regular path query \text \text^* would select pairs of a node ''x'' and a descendant ''y'' of ''x'', with a path from ''x'' to ''y'' of "parent" edges having length 1 or more.


Semantics

The answers to RPQs can consist of ''endpoint pairs'', i.e., pairs of nodes ''x'' and ''y'' that are connected by some path satisfying the regular expression; or it can consist of the ''list of all paths'' satisfying the regular expression. However, this set of paths is generally infinite. To ensure that the number of results is not infinite, the semantics of RPQs is sometimes defined to return only the simple paths, i.e., the paths that do not go twice via the same vertex; or the
trails A trail, also known as a path or track, is an unpaved lane or a small paved road (though it can also be a route along a navigable waterways) generally not intended for usage by motorized vehicles, usually passing through a natural area. Ho ...
, i.e., the paths that do not go twice through the same edge.


Complexity

The evaluation of regular path queries (RPQ), in the sense of returning all endpoint pairs, can be performed in
polynomial time In theoretical computer science, the time complexity is the computational complexity that describes the amount of computer time it takes to run an algorithm. Time complexity is commonly estimated by counting the number of elementary operations p ...
. To do this, for every endpoint pair, we can see the graph database as a
finite automaton A finite-state machine (FSM) or finite-state automaton (FSA, plural: ''automata''), finite automaton, or simply a state machine, is a mathematical model of computation. It is an abstract machine that can be in exactly one of a finite number o ...
, also represent the regular path query as a finite automaton, and check if a suitable path exists by checking that the
intersection In mathematics, the intersection of two or more objects is another object consisting of everything that is contained in all of the objects simultaneously. For example, in Euclidean geometry, when two lines in a plane are not parallel, their ...
of both languages is nonempty (i.e., solving the
emptiness problem In theoretical computer science and formal language theory, a formal language is empty if its set of valid sentences is the empty set. The emptiness problem is the question of determining whether a language is empty given some representation of it, ...
), for instance via the
product automaton Product may refer to: Business * Product (business), an item that can be offered to a market to satisfy the desire or need of a customer. * Product (project management), a deliverable or set of deliverables that contribute to a business solution ...
construction.


Other problems

Several classical problems about queries have been studied for regular path queries, such as
query containment In general, a query is a form of questioning, in a line of inquiry. Query may also refer to: Computing and technology * Query, a precise request for information retrieval made to a database, data structure or information system ** Query language ...
and
query rewriting Query rewriting is a typically automatic transformation that takes a set of database tables, views, and/or queries, usually indices, often gathered data and query statistics, and other metadata, and yields a set of different queries, which produc ...
.


Extensions

Database theory Database theory encapsulates a broad range of topics related to the study and research of the theoretical realm of databases and database management systems. Theoretical aspects of data management include, among other areas, the foundations of q ...
research has investigated more expressive variants of RPQs: * Two-way RPQs aka 2RPQs, which can also traverse edges in the reverse direction. More precisely, a 2RPQ is a regular expression that uses the labels of the graph together with labels corresponding to reverse edges. For instance, the RPQ \text^- \text selects pairs of nodes ''x'' and ''y'' with a path from ''x'' to ''y'' going first backward on a parent edge, then forward on a parent edge, i.e., ''x'' and ''y'' are siblings. * Conjunctive regular path queries aka CRPQ, which are conjunctive queries whose atoms are RPQs. Such queries make it possible to test for more complex patterns than just paths, however they are intractable to evaluate. * A further extension allowing both
disjunction In logic, disjunction (also known as logical disjunction, logical or, logical addition, or inclusive disjunction) is a logical connective typically notated as \lor and read aloud as "or". For instance, the English language sentence "it is ...
s (like
union of conjunctive queries In database theory, a conjunctive query is a restricted form of first-order queries using the logical conjunction operator. Many first-order queries can be written as conjunctive queries. In particular, a large part of queries issued on relational ...
) and two-way expressions are UC2RPQs.{{Cite book , last1=Figueira , first1=Diego , url=https://hal.science/hal-04311290 , title=Semantic Tree-Width and Path-Width of Conjunctive Regular Path Queries , last2=Morvan , first2=Rémi , date=November 2023


References

Graph databases Query languages Database theory