Gremlin is a
graph traversal
In computer science, graph traversal (also known as graph search) refers to the process of visiting (checking and/or updating) each vertex in a graph. Such traversals are classified by the order in which the vertices are visited. Tree traversal ...
language and
virtual machine
In computing, a virtual machine (VM) is the virtualization/emulation of a computer system. Virtual machines are based on computer architectures and provide functionality of a physical computer. Their implementations may involve specialized hardw ...
developed by Apache TinkerPop of the
Apache Software Foundation
The Apache Software Foundation (ASF) is an American nonprofit corporation (classified as a 501(c)(3) organization in the United States) to support a number of open source software projects. The ASF was formed from a group of developers of the A ...
. Gremlin works for both
OLTP In online transaction processing (OLTP), information systems typically facilitate and manage transaction-oriented applications. This is contrasted with online analytical processing.
The term "transaction" can have two different meanings, both of wh ...
-based graph databases as well as
OLAP
Online analytical processing, or OLAP (), is an approach to answer multi-dimensional analytical (MDA) queries swiftly in computing. OLAP is part of the broader category of business intelligence, which also encompasses relational databases, repor ...
-based graph processors. Gremlin's
automata
An automaton (; plural: automata or automatons) is a relatively self-operating machine, or control mechanism designed to automatically follow a sequence of operations, or respond to predetermined instructions.Automaton – Definition and More ...
and
functional language
In computer science, functional programming is a programming paradigm where programs are constructed by applying and composing functions. It is a declarative programming paradigm in which function definitions are trees of expressions that m ...
foundation enable Gremlin to naturally support:
imperative and
declarative querying; host language agnosticism; user-defined
domain specific languages
A domain-specific language (DSL) is a computer language specialized to a particular application domain. This is in contrast to a general-purpose language (GPL), which is broadly applicable across domains. There are a wide variety of DSLs, ranging ...
; an extensible compiler/optimizer, single- and multi-machine execution models; hybrid depth- and breadth-first evaluation with
Turing Complete
Alan Mathison Turing (; 23 June 1912 – 7 June 1954) was an English mathematician, computer scientist, logician, cryptanalyst, philosopher, and theoretical biologist. Turing was highly influential in the development of theoretical com ...
ness.
As an explanatory analogy, Apache TinkerPop and Gremlin are to
graph databases
A graph database (GDB) is a database that uses graph structures for semantic queries with nodes, edges, and properties to represent and store data. A key concept of the system is the ''graph'' (or ''edge'' or ''relationship''). The graph relat ...
what the
JDBC
Java Database Connectivity (JDBC) is an application programming interface (API) for the programming language Java, which defines how a client may access a database. It is a Java-based data access technology used for Java database connectivity. I ...
and
SQL are to
relational databases
A relational database is a (most commonly digital) database based on the relational model of data, as proposed by E. F. Codd in 1970. A system used to maintain relational databases is a relational database management system (RDBMS). Many relation ...
. Likewise, the Gremlin traversal machine is to graph computing as what the
Java virtual machine
A Java virtual machine (JVM) is a virtual machine that enables a computer to run Java programs as well as programs written in other languages that are also compiled to Java bytecode. The JVM is detailed by a specification that formally describes ...
is to general purpose computing.
History
* 2009-10-30 the project is born, and immediately named "TinkerPop"
* 2009-12-25 v0.1 is the first release
* 2011-05-21 v1.0 is released
* 2012-05-24 v2.0 is released
* 2015-01-16 TinkerPop becomes an Apache Incubator project
* 2015-07-09 v3.0.0-incubating is released
* 2016-05-23 Apache TinkerPop becomes a top-level project
* 2016-07-18 v3.1.3 and v3.2.1 are first releases as Apache TinkerPop
* 2017-12-17 v3.3.1 is released
* 2018-05-08 v3.3.3 is released
* 2019-08-05 v3.4.3 is released
* 2020-02-20 v3.4.6 is released
Vendor integration
Gremlin is an
Apache2-licensed graph traversal language that can be used by graph system vendors. There are typically two types of graph system vendors: OLTP
graph databases
A graph database (GDB) is a database that uses graph structures for semantic queries with nodes, edges, and properties to represent and store data. A key concept of the system is the ''graph'' (or ''edge'' or ''relationship''). The graph relat ...
and OLAP graph processors. The table below outlines those graph vendors that support Gremlin.
Traversal examples
The following examples of Gremlin queries and responses in a Gremlin-Groovy environment are relative to a graph representation of th
MovieLensdataset.
The dataset includes users who rate movies. Users each have one occupation, and each movie has one or more categories associated with it. The MovieLens graph schema is detailed below.
user--rated tars:0-5->movie
user--occupation-->occupation
movie--category-->category
Simple traversals
gremlin> g.V().label().groupCount()
> ccupation:21, movie:3883, category:18, user:6040
gremlin> g.V().hasLabel('movie').values('year').min()
>1919
gremlin> g.V().has('movie','name','Die Hard').inE('rated').values('stars').mean()
>4.121848739495798
Projection traversals
gremlin> g.V().hasLabel('category').as('a','b').
select('a','b').
by('name').
by(inE('category').count())
> :Animation, b:105h1>
> :Children's, b:251h1>
> :Comedy, b:1200h1>
> :Adventure, b:283h1>
> :Fantasy, b:68h1>
> :Romance, b:471h1>
> :Drama, b:1603h1>
> :Action, b:503h1>
> :Crime, b:211h1>
> :Thriller, b:492h1>
> :Horror, b:343h1>
> :Sci-Fi, b:276h1>
> :Documentary, b:127h1>
> :War, b:143h1>
> :Musical, b:114h1>
> :Mystery, b:106h1>
> :Film-Noir, b:44h1>
> :Western, b:68
gremlin> g.V().hasLabel('movie').as('a','b').
where(inE('rated').count().is(gt(10))).
select('a','b').
by('name').
by(inE('rated').values('stars').mean()).
order().by(select('b'),decr).
limit(10)
> :Sanjuro, b:4.608695652173913h1>
> :Seven Samurai (The Magnificent Seven), b:4.560509554140127h1>
> :Shawshank Redemption, The, b:4.554557700942973h1>
> :Godfather, The, b:4.524966261808367h1>
> :Close Shave, A, b:4.52054794520548h1>
> :Usual Suspects, The, b:4.517106001121705h1>
> :Schindler's List, b:4.510416666666667h1>
> :Wrong Trousers, The, b:4.507936507936508h1>
> :Sunset Blvd. (a.k.a. Sunset Boulevard), b:4.491489361702127h1>
> :Raiders of the Lost Ark, b:4.47772
Declarative pattern matching traversals
Gremlin supports declarative graph pattern matching similar to
SPARQL
SPARQL (pronounced "sparkle" , a recursive acronym for SPARQL Protocol and RDF Query Language) is an RDF query language—that is, a semantic query language for databases—able to retrieve and manipulate data stored in Resource Description F ...
. For instance, the following query below uses Gremlin's ''match()''-step.
gremlin> g.V().
match(
__.as('a').hasLabel('movie'),
__.as('a').out('category').has('name','Action'),
__.as('a').has('year',between(1980,1990)),
__.as('a').inE('rated').as('b'),
__.as('b').has('stars',5),
__.as('b').outV().as('c'),
__.as('c').out('occupation').has('name','programmer'),
__.as('c').has('age',between(30,40))).
select('a').groupCount().by('name').
order(local).by(valueDecr).
limit(local,10)
>Raiders of the Lost Ark=26
>Star Wars Episode V - The Empire Strikes Back=26
>Terminator, The=23
>Star Wars Episode VI - Return of the Jedi=22
>Princess Bride, The=19
>Aliens=18
>Boat, The (Das Boot)=11
>Indiana Jones and the Last Crusade=11
>Star Trek The Wrath of Khan=10
>Abyss, The=9
OLAP traversal
gremlin> g = graph.traversal(computer(SparkGraphComputer))
>graphtraversalsource gryooutputformat">adoopgraph[gryoinputformat->gryooutputformat_sparkgraphcomputer.html" ;"title="ryoinputformat->gryooutputformat.html" ;"title="adoopgraph[gryoinputformat->gryooutputformat">adoopgraph[gryoinputformat->gryooutputformat sparkgraphcomputer">ryoinputformat->gryooutputformat.html" ;"title="adoopgraph[gryoinputformat->gryooutputformat">adoopgraph[gryoinputformat->gryooutputformat sparkgraphcomputergremlin> g.V().repeat(outE('rated').has('stars', 5).inV().
groupCount('m').by('name').
inE('rated').has('stars', 5).outV()).
times(4).cap('m')
>Star Wars Episode IV - A New Hope 35405394353105332
>American Beauty 31943228282020585
>Raiders of the Lost Ark 31224779793238499
>Star Wars Episode V - The Empire Strikes Back 30434677119726223
>Godfather, The 30258518523013057
>Shawshank Redemption, The 28297717387901031
>Schindler's List 27539336654199309
>Silence of the Lambs, The 26736276376806173
>Fargo 26531050311325270
>Matrix, The 26395118239203191
Gremlin graph traversal machine
Gremlin is a
virtual machine
In computing, a virtual machine (VM) is the virtualization/emulation of a computer system. Virtual machines are based on computer architectures and provide functionality of a physical computer. Their implementations may involve specialized hardw ...
composed of an
instruction set
In computer science, an instruction set architecture (ISA), also called computer architecture, is an abstract model of a computer. A device that executes instructions described by that ISA, such as a central processing unit (CPU), is called an ' ...
as well as an execution engine. An analogy is drawn between Gremlin and Java (programming language), Java.
Gremlin steps (instruction set)
The following traversal is a Gremlin traversal in the Gremlin-Java8 dialect.
g.V().as("a").out("knows").as("b").
select("a","b").
by("name").
by("age")
The Gremlin language (i.e. the
fluent-style of expressing a graph traversal) can be represented in any host language that supports
function composition
In mathematics, function composition is an operation that takes two functions and , and produces a function such that . In this operation, the function is applied to the result of applying the function to . That is, the functions and ...
and
function nesting. Due to this simple requirement, there exists various Gremlin dialects including Gremlin-Groovy, Gremlin-Scala, Gremlin-Clojure, etc. The above Gremlin-Java8 traversal is ultimately compiled down to a step sequence called a ''traversal''. A string representation of the traversal above provided below.
raphStep([vertex)@[a">html"_;"title="raphStep([">raphStep([vertex)@[a_VertexStep(OUT,[knows.html" ;"title=">raphStep([vertex)@[a.html" ;"title="html" ;"title="raphStep([">raphStep([vertex)@[a">html" ;"title="raphStep([">raphStep([vertex)@[a VertexStep(OUT,[knows">>raphStep([vertex)@ raphStep([vertex)@[a">html"_;"title="raphStep([">raphStep([vertex)@[a_VertexStep(OUT,[knowsvertex)@[b.html" ;"title=".html" ;"title="html" ;"title="raphStep([">raphStep([vertex)@[a">html" ;"title="raphStep([">raphStep([vertex)@[a VertexStep(OUT,[knowsvertex)@[b">.html" ;"title="html" ;"title="raphStep([">raphStep([vertex)@[a">html" ;"title="raphStep([">raphStep([vertex)@[a VertexStep(OUT,[knowsvertex)@[b SelectStep([a, b],[value(name), value(age)])]
The ''steps'' are the primitives of the Gremlin graph traversal machine. They are the parameterized instructions that the machine ultimately executes. The Gremlin
instruction set
In computer science, an instruction set architecture (ISA), also called computer architecture, is an abstract model of a computer. A device that executes instructions described by that ISA, such as a central processing unit (CPU), is called an ' ...
is approximately 30 steps. These steps are sufficient to provide general purpose computing and what is typically required to express the common motifs of any graph traversal query.
Given that Gremlin is a language, an instruction set, and a virtual machine, it is possible to design another traversal language that compiles to the Gremlin traversal machine (analogous to how Scala compiles to the
JVM
A Java virtual machine (JVM) is a virtual machine that enables a computer to run Java programs as well as programs written in other languages that are also compiled to Java bytecode. The JVM is detailed by a specification that formally describes ...
). For instance, the popular
SPARQL
SPARQL (pronounced "sparkle" , a recursive acronym for SPARQL Protocol and RDF Query Language) is an RDF query language—that is, a semantic query language for databases—able to retrieve and manipulate data stored in Resource Description F ...
graph pattern match language can be compiled to execute on the Gremlin machine. The following SPARQL query
SELECT ?a ?b ?c
WHERE
would compile to
raphStep([vertex),_MatchStep(AND,MatchStartStep(a),_LabelStep,_IsStep(eq(Person)),_MatchEndStep.html" ;"title="html" ;"title="raphStep([">raphStep([vertex), MatchStep(AND,MatchStartStep(a), LabelStep, IsStep(eq(Person)), MatchEndStep">html" ;"title="raphStep([">raphStep([vertex), MatchStep(AND,MatchStartStep(a), LabelStep, IsStep(eq(Person)), MatchEndStep [MatchStartStep(a), VertexStep(OUT,[knows],vertex), MatchEndStep(b)], [MatchStartStep(a), VertexStep(OUT,[created],vertex), MatchEndStep(c)], [MatchStartStep(b), VertexStep(OUT,[created],vertex), MatchEndStep(c)], [MatchStartStep(b), PropertiesStep( gevalue), MatchEndStep(d)], atchStartStep(d), IsStep(gt(30)), MatchEndStep), SelectStep( , b, c].
In Gremlin-Java8, the SPARQL query above would be represented as below and compile to the identical Gremlin step sequence (i.e. traversal).
g.V().match(
as("a").label().is("person"),
as("a").out("knows").as("b"),
as("a").out("created").as("c"),
as("b").out("created").as("c"),
as("b").values("age").as("d"),
as("d").is(gt(30))).
select("a","b","c")
Gremlin Machine (virtual machine)
The Gremlin graph traversal machine can execute on a single machine or across a multi-machine compute cluster. Execution agnosticism allows Gremlin to run over both
graph databases
A graph database (GDB) is a database that uses graph structures for semantic queries with nodes, edges, and properties to represent and store data. A key concept of the system is the ''graph'' (or ''edge'' or ''relationship''). The graph relat ...
(OLTP) and graph processors (OLAP).
See also
*
Cypher Query Language
Cypher is a declarative graph query language that allows for expressive and efficient data querying in a property graph.
Cypher was largely an invention of Andrés Taylor while working for Neo4j, Inc. (formerly Neo Technology) in 2011. Cypher was ...
, another query language on graph data
*
SPARQL
SPARQL (pronounced "sparkle" , a recursive acronym for SPARQL Protocol and RDF Query Language) is an RDF query language—that is, a semantic query language for databases—able to retrieve and manipulate data stored in Resource Description F ...
, another query language on graph data
References
External links
Apache TinkerPop Homepagesql2gremlin.com (TinkerPop2)# Rodriguez, M.A.,
The Gremlin Graph Traversal Machine and Language" Proceedings of the ACM Database Programming Languages Conference, October, 2015.
{{Query languages
Cluster computing
Data mining and machine learning software
Hadoop
Apache Software Foundation
Software using the Apache license
Java platform
Free software programmed in Scala
Declarative programming languages
Query languages