Apache TinkerPop
   HOME

TheInfoList



OR:

Gremlin is a
graph traversal In computer science, graph traversal (also known as graph search) refers to the process of visiting (checking and/or updating) each vertex in a graph. Such traversals are classified by the order in which the vertices are visited. Tree traversa ...
language and
virtual machine In computing, a virtual machine (VM) is the virtualization or emulator, emulation of a computer system. Virtual machines are based on computer architectures and provide the functionality of a physical computer. Their implementations may involve ...
developed by Apache TinkerPop of the
Apache Software Foundation The Apache Software Foundation ( ; ASF) is an American nonprofit corporation (classified as a 501(c)(3) organization in the United States) to support a number of open-source software projects. The ASF was formed from a group of developers of the ...
. Gremlin works for both
OLTP Online transaction processing (OLTP) is a type of database system used in transaction-oriented applications, such as many operational systems. "Online" refers to the fact that such systems are expected to respond to user requests and process them i ...
-based graph databases as well as
OLAP In computing, online analytical processing (OLAP) (), is an approach to quickly answer multi-dimensional analytical (MDA) queries. The term ''OLAP'' was created as a slight modification of the traditional database term online transaction processi ...
-based graph processors. Gremlin's
automata An automaton (; : automata or automatons) is a relatively self-operating machine, or control mechanism designed to automatically follow a sequence of operations, or respond to predetermined instructions. Some automata, such as bellstrikers i ...
and
functional language In computer science, functional programming is a programming paradigm where programs are constructed by applying and composing functions. It is a declarative programming paradigm in which function definitions are trees of expressions that map ...
foundation enable Gremlin to naturally support: imperative and declarative querying; host language agnosticism; user-defined domain specific languages; an extensible compiler/optimizer, single- and multi-machine execution models; hybrid depth- and breadth-first evaluation with
Turing completeness In computability theory, a system of data-manipulation rules (such as a model of computation, a computer's instruction set, a programming language, or a cellular automaton) is said to be Turing-complete or computationally universal if it can b ...
. As an explanatory analogy, Apache TinkerPop and Gremlin are to
graph databases A graph database (GDB) is a database that uses graph structures for semantic queries with nodes, edges, and properties to represent and store data. A key concept of the system is the graph (or edge or relationship). The graph relates the data ...
what the
JDBC Java Database Connectivity (JDBC) is an application programming interface (API) for the Java (programming language), Java programming language which defines how a client may access a database. It is a Java-based data access technology used for Java ...
and
SQL Structured Query Language (SQL) (pronounced ''S-Q-L''; or alternatively as "sequel") is a domain-specific language used to manage data, especially in a relational database management system (RDBMS). It is particularly useful in handling s ...
are to
relational databases A relational database (RDB) is a database based on the relational model of data, as proposed by E. F. Codd in 1970. A Relational Database Management System (RDBMS) is a type of database management system that stores data in a structured form ...
. Likewise, the Gremlin traversal machine is to graph computing as what the
Java virtual machine A Java virtual machine (JVM) is a virtual machine that enables a computer to run Java programs as well as programs written in other languages that are also compiled to Java bytecode. The JVM is detailed by a specification that formally descr ...
is to general purpose computing.


History

* 2009-10-30 the project is born, and immediately named "TinkerPop" * 2009-12-25 v0.1 is the first release * 2011-05-21 v1.0 is released * 2012-05-24 v2.0 is released * 2015-01-16 TinkerPop becomes an Apache Incubator project * 2015-07-09 v3.0.0-incubating is released * 2016-05-23 Apache TinkerPop becomes a top-level project * 2016-07-18 v3.1.3 and v3.2.1 are first releases as Apache TinkerPop * 2017-12-17 v3.3.1 is released * 2018-05-08 v3.3.3 is released * 2019-08-05 v3.4.3 is released * 2020-02-20 v3.4.6 is released


Vendor integration

Gremlin is an Apache2-licensed graph traversal language that can be used by graph system vendors. There are typically two types of graph system vendors: OLTP
graph databases A graph database (GDB) is a database that uses graph structures for semantic queries with nodes, edges, and properties to represent and store data. A key concept of the system is the graph (or edge or relationship). The graph relates the data ...
and OLAP graph processors. The table below outlines those graph vendors that support Gremlin.


Traversal examples

The following examples of Gremlin queries and responses in a Gremlin-Groovy environment are relative to a graph representation of th
MovieLens
dataset. The dataset includes users who rate movies. Users each have one occupation, and each movie has one or more categories associated with it. The MovieLens graph schema is detailed below. user--rated tars:0-5->movie user--occupation-->occupation movie--category-->category


Simple traversals

gremlin> g.V().label().groupCount()


> ccupation:21, movie:3883, category:18, user:6040 gremlin> g.V().hasLabel('movie').values('year').min()


>1919 gremlin> g.V().has('movie','name','Die Hard').inE('rated').values('stars').mean()


>4.121848739495798


Projection traversals

gremlin> g.V().hasLabel('category').as('a','b'). select('a','b'). by('name'). by(inE('category').count())


> :Animation, b:105h1>

> :Children's, b:251h1>

> :Comedy, b:1200h1>

> :Adventure, b:283h1>

> :Fantasy, b:68h1>

> :Romance, b:471h1>

> :Drama, b:1603h1>

> :Action, b:503h1>

> :Crime, b:211h1>

> :Thriller, b:492h1>

> :Horror, b:343h1>

> :Sci-Fi, b:276h1>

> :Documentary, b:127h1>

> :War, b:143h1>

> :Musical, b:114h1>

> :Mystery, b:106h1>

> :Film-Noir, b:44h1>

> :Western, b:68 gremlin> g.V().hasLabel('movie').as('a','b'). where(inE('rated').count().is(gt(10))). select('a','b'). by('name'). by(inE('rated').values('stars').mean()). order().by(select('b'),decr). limit(10)


> :Sanjuro, b:4.608695652173913h1>

> :Seven Samurai (The Magnificent Seven), b:4.560509554140127h1>

> :Shawshank Redemption, The, b:4.554557700942973h1>

> :Godfather, The, b:4.524966261808367h1>

> :Close Shave, A, b:4.52054794520548h1>

> :Usual Suspects, The, b:4.517106001121705h1>

> :Schindler's List, b:4.510416666666667h1>

> :Wrong Trousers, The, b:4.507936507936508h1>

> :Sunset Blvd. (a.k.a. Sunset Boulevard), b:4.491489361702127h1>

> :Raiders of the Lost Ark, b:4.47772


Declarative pattern matching traversals

Gremlin supports declarative graph pattern matching similar to
SPARQL SPARQL (pronounced ":wikt:sparkle, sparkle", a recursive acronym for SPARQL Protocol and RDF Query Language) is an RDF query language—that is, a Semantic Query, semantic query language for databases—able to retrieve and manipulate data sto ...
. For instance, the following query below uses Gremlin's ''match()''-step. gremlin> g.V(). match( __.as('a').hasLabel('movie'), __.as('a').out('category').has('name','Action'), __.as('a').has('year',between(1980,1990)), __.as('a').inE('rated').as('b'), __.as('b').has('stars',5), __.as('b').outV().as('c'), __.as('c').out('occupation').has('name','programmer'), __.as('c').has('age',between(30,40))). select('a').groupCount().by('name'). order(local).by(valueDecr). limit(local,10)


>Raiders of the Lost Ark=26


>Star Wars Episode V - The Empire Strikes Back=26


>Terminator, The=23


>Star Wars Episode VI - Return of the Jedi=22


>Princess Bride, The=19


>Aliens=18


>Boat, The (Das Boot)=11


>Indiana Jones and the Last Crusade=11


>Star Trek The Wrath of Khan=10


>Abyss, The=9


OLAP traversal

gremlin> g = graph.traversal(computer(SparkGraphComputer))


>graphtraversalsource adoopgraph[gryoinputformat->gryooutputformat sparkgraphcomputer">ryoinputformat->gryooutputformat.html" ;"title="adoopgraph[gryoinputformat->gryooutputformat">adoopgraph[gryoinputformat->gryooutputformat sparkgraphcomputergremlin> g.V().repeat(outE('rated').has('stars', 5).inV(). groupCount('m').by('name'). inE('rated').has('stars', 5).outV()). times(4).cap('m')


>Star Wars Episode IV - A New Hope 35405394353105332


>American Beauty 31943228282020585


>Raiders of the Lost Ark 31224779793238499


>Star Wars Episode V - The Empire Strikes Back 30434677119726223


>Godfather, The 30258518523013057


>Shawshank Redemption, The 28297717387901031


>Schindler's List 27539336654199309


>Silence of the Lambs, The 26736276376806173


>Fargo 26531050311325270


>Matrix, The 26395118239203191


Gremlin graph traversal machine

Gremlin is a
virtual machine In computing, a virtual machine (VM) is the virtualization or emulator, emulation of a computer system. Virtual machines are based on computer architectures and provide the functionality of a physical computer. Their implementations may involve ...
composed of an
instruction set In computer science, an instruction set architecture (ISA) is an abstract model that generally defines how software controls the CPU in a computer or a family of computers. A device or program that executes instructions described by that ISA, s ...
as well as an execution engine. An analogy is drawn between Gremlin and Java (programming language), Java.


Gremlin steps (instruction set)

The following traversal is a Gremlin traversal in the Gremlin-Java8 dialect. g.V().as("a").out("knows").as("b"). select("a","b"). by("name"). by("age") The Gremlin language (i.e. the fluent-style of expressing a graph traversal) can be represented in any host language that supports
function composition In mathematics, the composition operator \circ takes two function (mathematics), functions, f and g, and returns a new function h(x) := (g \circ f) (x) = g(f(x)). Thus, the function is function application, applied after applying to . (g \c ...
and function nesting. Due to this simple requirement, there exists various Gremlin dialects including Gremlin-Groovy, Gremlin-Scala, Gremlin-Clojure, etc. The above Gremlin-Java8 traversal is ultimately compiled down to a step sequence called a ''traversal''. A string representation of the traversal above provided below. raphStep([vertex)@[a">html" ;"title="raphStep([">raphStep([vertex)@[a VertexStep(OUT,[knows">quot;>raphStep([<_a>vertex)@ raphStep([vertex)@[a">html" ;"title="raphStep([">raphStep([vertex)@[a VertexStep(OUT,[knowsvertex)@[b">.html" ;"title="html" ;"title="raphStep([">raphStep([vertex)@[a">html" ;"title="raphStep([">raphStep([vertex)@[a VertexStep(OUT,[knowsvertex)@[b SelectStep([a, b],[value(name), value(age)])] The ''steps'' are the primitives of the Gremlin graph traversal machine. They are the parameterized instructions that the machine ultimately executes. The Gremlin
instruction set In computer science, an instruction set architecture (ISA) is an abstract model that generally defines how software controls the CPU in a computer or a family of computers. A device or program that executes instructions described by that ISA, s ...
is approximately 30 steps. These steps are sufficient to provide general purpose computing and what is typically required to express the common motifs of any graph traversal query. Given that Gremlin is a language, an instruction set, and a virtual machine, it is possible to design another traversal language that compiles to the Gremlin traversal machine (analogous to how Scala compiles to the
JVM A Java virtual machine (JVM) is a virtual machine that enables a computer to run Java programs as well as programs written in other languages that are also compiled to Java bytecode. The JVM is detailed by a specification that formally descri ...
). For instance, the popular
SPARQL SPARQL (pronounced ":wikt:sparkle, sparkle", a recursive acronym for SPARQL Protocol and RDF Query Language) is an RDF query language—that is, a Semantic Query, semantic query language for databases—able to retrieve and manipulate data sto ...
graph pattern match language can be compiled to execute on the Gremlin machine. The following SPARQL query SELECT ?a ?b ?c WHERE would compile to raphStep([vertex), MatchStep(AND,MatchStartStep(a), LabelStep, IsStep(eq(Person)), MatchEndStep">html" ;"title="raphStep([">raphStep([vertex), MatchStep(AND,MatchStartStep(a), LabelStep, IsStep(eq(Person)), MatchEndStep [MatchStartStep(a), VertexStep(OUT,[knows],vertex), MatchEndStep(b)], [MatchStartStep(a), VertexStep(OUT,[created],vertex), MatchEndStep(c)], [MatchStartStep(b), VertexStep(OUT,[created],vertex), MatchEndStep(c)], [MatchStartStep(b), PropertiesStep( gevalue), MatchEndStep(d)], atchStartStep(d), IsStep(gt(30)), MatchEndStep), SelectStep( , b, c]. In Gremlin-Java8, the SPARQL query above would be represented as below and compile to the identical Gremlin step sequence (i.e. traversal). g.V().match( as("a").label().is("person"), as("a").out("knows").as("b"), as("a").out("created").as("c"), as("b").out("created").as("c"), as("b").values("age").as("d"), as("d").is(gt(30))). select("a","b","c")


Gremlin Machine (virtual machine)

The Gremlin graph traversal machine can execute on a single machine or across a multi-machine compute cluster. Execution agnosticism allows Gremlin to run over both
graph databases A graph database (GDB) is a database that uses graph structures for semantic queries with nodes, edges, and properties to represent and store data. A key concept of the system is the graph (or edge or relationship). The graph relates the data ...
(OLTP) and graph processors (OLAP).


See also

*
Cypher Query Language Cypher is a declarative graph query language that allows for expressive and efficient data querying in a property graph. Cypher was largely an invention of Andrés Taylor while working for Neo4j, Inc. (formerly Neo Technology) in 2011. Cypher wa ...
, another query language on graph data *
SPARQL SPARQL (pronounced ":wikt:sparkle, sparkle", a recursive acronym for SPARQL Protocol and RDF Query Language) is an RDF query language—that is, a Semantic Query, semantic query language for databases—able to retrieve and manipulate data sto ...
, another query language on graph data *
Regular path query In databases and specifically in graph databases, a regular path query or RPQ is a query asking for pairs of endpoints in the database that are connected by a path satisfying a certain regular expression. A similar feature exists in the SPARQL quer ...
, a theoretical query language on graph data *
Graph query language GQL (Graph Query Language) is a standardized query language for property graphs first described in ISO/IEC 39075, released in April 2024 by ISO/IEC. History The GQL project is the culmination of converging initiatives dating back to 2016, pa ...
, a proposed standardization of a query language on graph data


References


External links


Apache TinkerPop Homepage

sql2gremlin.com (TinkerPop2)
# Rodriguez, M.A.,
The Gremlin Graph Traversal Machine and Language
" Proceedings of the ACM Database Programming Languages Conference, October, 2015. {{Query languages Cluster computing Data mining and machine learning software Hadoop Apache Software Foundation Software using the Apache license Java platform Free software programmed in Scala Declarative programming languages Query languages