Property Graphs
   HOME

TheInfoList



OR:

A property graph, labeled property graph, or attributed graph is a
data model A data model is an abstract model that organizes elements of data and standardizes how they relate to one another and to the properties of real-world entities. For instance, a data model may specify that the data element representing a car be ...
of various graph-oriented databases, where pairs of entities are associated by directed relationships, and entities and relationships can have properties. In
graph theory In mathematics and computer science, graph theory is the study of ''graph (discrete mathematics), graphs'', which are mathematical structures used to model pairwise relations between objects. A graph in this context is made up of ''Vertex (graph ...
terms, a property graph is a
directed Direct may refer to: Mathematics * Directed set, in order theory * Direct limit of (pre), sheaves * Direct sum of modules, a construction in abstract algebra which combines several vector spaces Computing * Direct access (disambiguation), a ...
multigraph In mathematics, and more specifically in graph theory, a multigraph is a graph which is permitted to have multiple edges (also called ''parallel edges''), that is, edges that have the same end nodes. Thus two vertices may be connected by mor ...
, whose vertices represent entities and arcs represent relationships. Each arc has an identifier, a source node and a target node, and may have properties. Properties are key-value pairs where keys are character strings and values are numbers or character strings. They are analogous to
attributes Attribute may refer to: * Attribute (philosophy), a characteristic of an object * Attribute (research), a quality of an object * Grammatical modifier In linguistics, a modifier is an optional element in phrase structure or clause structure whic ...
in entity-attribute-value and
object-oriented Object-oriented programming (OOP) is a programming paradigm based on the concept of '' objects''. Objects can contain data (called fields, attributes or properties) and have actions they can perform (called procedures or methods and impleme ...
modeling. By contrast, in RDF graphs, "properties" is the term for the arcs. This is why a clearer name is ''attributed graphs,'' or graphs'' with'' properties. This data model emerged in the early 2000s.


Formal definition

Building upon widely adopted definitions, a property graph/attributed graph can be defined by a 7-tuple (N, A, K, V, α, \kappa , π), where * N is the set of nodes/ vertices of the graph * A is the set of arcs (directed edges) of the graph * K is a set of keys, taken from a countable set, defining the nature of attributes/properties * V is a set of values, to be associated with these keys in order to define full-fledged attributes * \alpha\colon A \to N\times N is a ''
total function In mathematics, a partial function from a set to a set is a function from a subset of (possibly the whole itself) to . The subset , that is, the '' domain'' of viewed as a function, is called the domain of definition or natural domain o ...
'', defining the multigraph proper. For a ∈ A, u∈ N, v ∈ N, α (a) = (u, v) means that a is an arc of the graph having node u for origin and node v for target * \kappa is a ''
binary relation In mathematics, a binary relation associates some elements of one Set (mathematics), set called the ''domain'' with some elements of another set called the ''codomain''. Precisely, a binary relation over sets X and Y is a set of ordered pairs ...
'' over (A∪N) and K (formally defined as a subset of the cartesian product (A∪N)×K ), associating zero, one or several keys to each arc and node of the graph * \pi\colon \kappa \to V is a ''
partial function In mathematics, a partial function from a set to a set is a function from a subset of (possibly the whole itself) to . The subset , that is, the '' domain'' of viewed as a function, is called the domain of definition or natural domain ...
'', providing values for the properties of the nodes and the arcs which include them. For u ∈ N, a ∈ A and k ∈ K, π (u, k) (respectively π (a, k)) is the value associated with the property key k for the node u, (respectively the arc a), if the corresponding attribute property is defined there. A complementary construct, used in several implementations of property graphs with commercial graph databases, is that of '
labels
', which can be associated both with nodes and arcs of the graph. Labels have a practical rather than theoretical justification, as they were originally intended for users of Entity-Relationship models and
relational database A relational database (RDB) is a database based on the relational model of data, as proposed by E. F. Codd in 1970. A Relational Database Management System (RDBMS) is a type of database management system that stores data in a structured for ...
s, to facilitate the import of their legacy data sets into
graph database A graph database (GDB) is a database that uses graph structures for semantic queries with nodes, edges, and properties to represent and store data. A key concept of the system is the graph (or edge or relationship). The graph relates the dat ...
s :. labels make it possible to associate the same identifier (that of the relational table, or of the ER entity) to all graph nodes which would correspond to the different rows of this relational table, or to instances of the same generic entity / class. With the proposed definition, these labels could in fact be viewed as attributes defined only by a key, without an associated value (this is why \kappa is defined separately as a binary relation, and π as a partial function). The basic definition thus becomes much clearer, simpler, and satisfies a principle of parsimony. Alternatively, and more consistently, labels can be defined through type graphs, as special types associated with nodes and arcs.


Relations with other models


Graph theory and classical graph algorithms

Attributed graphs are especially useful and relevant in that they are an "umbrella" hypernymic concept ( i.e. a generalization) for several key graph-theoretic models, which have long been widely used in classical
graph algorithms An algorithm is fundamentally a set of rules or defined procedures that is typically designed and used to solve a specific problem or a broad set of problems. Broadly, algorithms define process(es), sets of rules, or methodologies that are to be f ...
* Labeled graphs associate labels to each vertex and/or edge of a graph. Matched with attributed graphs, these labels correspond to ''attributes comprising only a key'', taken from a
countable In mathematics, a Set (mathematics), set is countable if either it is finite set, finite or it can be made in one to one correspondence with the set of natural numbers. Equivalently, a set is ''countable'' if there exists an injective function fro ...
set (typically a character string, or an integer) ** Colored graphs, as used in classical
graph coloring In graph theory, graph coloring is a methodic assignment of labels traditionally called "colors" to elements of a Graph (discrete mathematics), graph. The assignment is subject to certain constraints, such as that no two adjacent elements have th ...
problems, are special cases of labeled graphs, whose labels are defined on a ''finite'' set of keys, matched to colors. * Weighted graphs associate a ''numerical value'' to arcs/edges, and, when relevant, to the vertices of a directed or undirected graph. These weights correspond to the values of a ''set of attributes with the same key''. For example, for a model of a road network, where each segment has a length and a capacity (number of vehicles per unit time) can be represented by an edge with two weights. **
Flow network In graph theory, a flow network (also known as a transportation network) is a directed graph where each edge has a capacity and each edge receives a flow. The amount of flow on an edge cannot exceed the capacity of the edge. Often in operations re ...
s are weighted graphs whose weights are interpreted as a ''capacities''. They are used in all kinds of very classical models of transport networks, used e.g. with
maximum flow In optimization theory, maximum flow problems involve finding a feasible flow through a flow network that obtains the maximum possible flow rate. The maximum flow problem can be seen as a special case of more complex network flow problems, such ...
algorithms. **
Shortest path problem In graph theory, the shortest path problem is the problem of finding a path between two vertices (or nodes) in a graph such that the sum of the weights of its constituent edges is minimized. The problem of finding the shortest path between t ...
s, as solved by very classical algorithms (like
Dijkstra's algorithm Dijkstra's algorithm ( ) is an algorithm for finding the shortest paths between nodes in a weighted graph, which may represent, for example, a road network. It was conceived by computer scientist Edsger W. Dijkstra in 1956 and published three ...
), operate on weighted graphs for which the weights correspond to ''distances'', real or virtual.


Knowledge graphs and RDF graphs

Knowledge graph In knowledge representation and reasoning, a knowledge graph is a knowledge base that uses a Graph (discrete mathematics), graph-structured data model or topology to represent and operate on data. Knowledge graphs are often used to store interl ...
s, usually represented in RDF, are hybrid labeled graphs, whose node labels correspond to instance identifiers (
IRI IRI or I.R.I. refers to: Businesses and organizations * Iringa Airport, an airport in Tanzania serving Iringa and the surrounding Iringa Region by IATA airport code * India Rejuvenation Initiative, an Indian anti-corruption organization form ...
)s or literals, and edge labels identify types (not instances) of predicates. They have now acquired a visibility which tends to obscure the longer-established use of graphs as ''direct'' model for systems of all kinds.Privat, Gilles; Abbas, Abdullah
“Cyber-Physical Graphs” vs. RDF graphs
''W3C Workshop on Web Standardization for Graph Data'', Berlin, March 2019
They are less versatile and expressive than attributed graphs. Кnowledge graphs capture weakly structured information'' about'' a physical system. They mix structural relationships with attached properties, and category information with instances, drowning out the structure. By contrast, graphs whose connections capture the'' structure ''of a physical system can be called cyber-physical. Also, RDF graphs can only
express Express, The Expresss or EXPRESS may refer to: Arts, entertainment and media Film * ''Express: Aisle to Glory'', a 1998 comedy short film featuring Kal Penn * ''The Express: The Ernie Davis Story'', a 2008 film starring Dennis Quaid * The Expre ...
first order logic First-order logic, also called predicate logic, predicate calculus, or quantificational logic, is a collection of formal systems used in mathematics, philosophy, linguistics, and computer science. First-order logic uses quantified variables over ...
, while attributed graphs can express
higher order logic In mathematics and logic, a higher-order logic (abbreviated HOL) is a form of logic that is distinguished from first-order logic by additional quantifiers and, sometimes, stronger semantics. Higher-order logics with their standard semantics are mo ...
. Represening relationship properties in, RDF requires a cumbersome reification process.


Standardization


NGSI-LD

The NGSI-LD data model specified by
ETSI The European Telecommunications Standards Institute (ETSI) is an independent, not-for-profit, standardization organization operating in the field of Information and communications technology, information and communications. ETSI supports the de ...
has been the first attempt to standardize property graphs under a
de jure In law and government, ''de jure'' (; ; ) describes practices that are officially recognized by laws or other formal norms, regardless of whether the practice exists in reality. The phrase is often used in contrast with '' de facto'' ('from fa ...
standards body. Compared to the basic model defined here, the NGSI-LD meta-model adds a formal definition of basic categories (entity, relation, property) on the basis of
semantic web The Semantic Web, sometimes known as Web 3.0, is an extension of the World Wide Web through standards set by the World Wide Web Consortium (W3C). The goal of the Semantic Web is to make Internet data machine-readable. To enable the encoding o ...
standards (
OWL Owls are birds from the order Strigiformes (), which includes over 200 species of mostly solitary and nocturnal birds of prey typified by an upright stance, a large, broad head, binocular vision, binaural hearing, sharp talons, and feathers a ...
,
RDFS RDF Schema (Resource Description Framework Schema, variously abbreviated as RDFS, , RDF-S, or RDF/S) is a set of classes with certain properties using the RDF extensible knowledge representation data model, providing basic elements for the descr ...
, RDF), which makes it possible to convert all data represented in NGSI-LD into RDF datasets, through
JSON-LD JSON-LD (JavaScript Object Notation for Linked Data) is a method of encoding linked data using JSON. One goal for JSON-LD was to require as little effort as possible from developers to transform their existing JSON to JSON-LD. JSON-LD allows data ...
serialization. NGSI-LD entities, relations and properties are thus defined by reference to types which can themselves be defined by reference to
ontologies In information science, an ontology encompasses a representation, formal naming, and definitions of the categories, properties, and relations between the concepts, data, or entities that pertain to one, many, or all domains of discourse. More ...
,
thesauri A thesaurus (: thesauri or thesauruses), sometimes called a synonym dictionary or dictionary of synonyms, is a reference work which arranges words by their meanings (or in simpler terms, a book where one can find different words with similar me ...
,
taxonomies image:Hierarchical clustering diagram.png, 280px, Generalized scheme of taxonomy Taxonomy is a practice and science concerned with classification or categorization. Typically, there are two parts to it: the development of an underlying scheme o ...
or microdata vocabularies, for the purpose of ensuring the
semantic interoperability Semantic interoperability is the ability of computer systems to exchange data with unambiguous, shared meaning. Semantic interoperability is a requirement to enable machine computable Logic in computer science, logic, inferencing, knowledge discove ...
of the corresponding information.


GQL

The ISO/IEC JTC1/SC32/WG3 group of
ISO The International Organization for Standardization (ISO ; ; ) is an independent, non-governmental, international standard development organization composed of representatives from the national standards organizations of member countries. Me ...
, which established the
SQL Structured Query Language (SQL) (pronounced ''S-Q-L''; or alternatively as "sequel") is a domain-specific language used to manage data, especially in a relational database management system (RDBMS). It is particularly useful in handling s ...
standard, is in the process of specifying a new
query language A query language, also known as data query language or database query language (DQL), is a computer language used to make queries in databases and information systems. In database systems, query languages rely on strict theory to retrieve informa ...
suitable for graph-oriented databases, called GQL (Graph Query Language). This standard will include the specification of a property graph data model, which should be along the lines of the basic model described here, possibly adding notions of labels, types, and schemas .


Type graphs and schemas

Graph-oriented databases are, compared to
relational database A relational database (RDB) is a database based on the relational model of data, as proposed by E. F. Codd in 1970. A Relational Database Management System (RDBMS) is a type of database management system that stores data in a structured for ...
s, touted for ''not'' requiring the prior definition of a
schema Schema may refer to: Science and technology * SCHEMA (bioinformatics), an algorithm used in protein engineering * Schema (genetic algorithms), a set of programs or bit strings that have some genotypic similarity * Schema.org, a web markup vocab ...
to start populating the base. This is desirable and suitable for environments and applications where one operates under an
open world assumption The closed-world assumption (CWA), in a formal system of logic used for knowledge representation, is the presumption that a statement that is true is also known to be true. Therefore, conversely, what is not currently known to be true, is false. T ...
, such as the description of
complex systems A complex system is a system composed of many components that may interact with one another. Examples of complex systems are Earth's global climate, organisms, the human brain, infrastructure such as power grid, transportation or communication s ...
and
systems of systems The term system of systems refers to a collection of task-oriented or dedicated systems that pool their resources and capabilities together to create a new, more complex system which offers more functionality and performance than simply the sum of ...
, characterized by bottom-up organization and evolution, not control of a single stakeholder. However, even in such environments, it may be needed to constrain the representation of specific subsets of the information entered into the database, in a way that may resemble a traditional database schema, while keeping the openness of the overall graph for addition of unforeseen data or configurations. For example, the description of a
smart city A smart city is an urban area that uses digital technology to collect data and operate services. Data is collected from citizens, devices, buildings, or cameras. Applications include traffic and transportation systems, power plants, utilities ...
falls under the open world assumption and will be described by the upper level of a graph database, without a schema. However, specific technical sub-systems of this city remain top-down closed-world systems managed by a single operator, who may impose a stronger structuring of information, as customarily represented by a schema. The notions of "type graphs" and schemas make it possible to meet this need, with types playing a role similar to that of ''labels'' in classical graph databases, but with the added possibility of specifying relations between these types and constraining them by keys and properties. The type graph is itself a property graph, linked by a relation of
graph homomorphism In the mathematics, mathematical field of graph theory, a graph homomorphism is a mapping between two graph (discrete mathematics), graphs that respects their structure. More concretely, it is a function between the vertex sets of two graphs tha ...
with the graphs of instances that use the types it defines, playing a role similar to that of a schema in a
data definition language In the context of SQL, data definition or data description language (DDL) is a syntax for creating and modifying database objects such as tables, indices, and users. DDL statements are similar to a computer programming language for defining d ...
. The
ontologies In information science, an ontology encompasses a representation, formal naming, and definitions of the categories, properties, and relations between the concepts, data, or entities that pertain to one, many, or all domains of discourse. More ...
,
thesauri A thesaurus (: thesauri or thesauruses), sometimes called a synonym dictionary or dictionary of synonyms, is a reference work which arranges words by their meanings (or in simpler terms, a book where one can find different words with similar me ...
or
taxonomies image:Hierarchical clustering diagram.png, 280px, Generalized scheme of taxonomy Taxonomy is a practice and science concerned with classification or categorization. Typically, there are two parts to it: the development of an underlying scheme o ...
used to reference NGSI-LD types are also defined by graphs, but these are RDF graphs rather than property graphs, and they typically have broader scopes than database schemas. The complementary use, possible with NGSI-LD types, of type graphs ''and'' referencing of external ontologies, makes it possible to enforce strong data structuration and consistency, while affording semantic grounding and
interoperability Interoperability is a characteristic of a product or system to work with other products or systems. While the term was initially defined for information technology or systems engineering services to allow for information exchange, a broader de ...
.


References

Graph databases Extensions and generalizations of graphs