HOME

TheInfoList



OR:

Gremlin is a
graph traversal In computer science, graph traversal (also known as graph search) refers to the process of visiting (checking and/or updating) each vertex in a graph. Such traversals are classified by the order in which the vertices are visited. Tree traversal ...
language and
virtual machine In computing, a virtual machine (VM) is the virtualization/emulation of a computer system. Virtual machines are based on computer architectures and provide functionality of a physical computer. Their implementations may involve specialized hardw ...
developed by Apache TinkerPop of the
Apache Software Foundation The Apache Software Foundation (ASF) is an American nonprofit corporation (classified as a 501(c)(3) organization in the United States) to support a number of open source software projects. The ASF was formed from a group of developers of the A ...
. Gremlin works for both
OLTP In online transaction processing (OLTP), information systems typically facilitate and manage transaction-oriented applications. This is contrasted with online analytical processing. The term "transaction" can have two different meanings, both of wh ...
-based graph databases as well as
OLAP Online analytical processing, or OLAP (), is an approach to answer multi-dimensional analytical (MDA) queries swiftly in computing. OLAP is part of the broader category of business intelligence, which also encompasses relational databases, repor ...
-based graph processors. Gremlin's
automata An automaton (; plural: automata or automatons) is a relatively self-operating machine, or control mechanism designed to automatically follow a sequence of operations, or respond to predetermined instructions.Automaton – Definition and More ...
and
functional language In computer science, functional programming is a programming paradigm where programs are constructed by applying and composing functions. It is a declarative programming paradigm in which function definitions are trees of expressions that m ...
foundation enable Gremlin to naturally support: imperative and declarative querying; host language agnosticism; user-defined
domain specific languages A domain-specific language (DSL) is a computer language specialized to a particular application domain. This is in contrast to a general-purpose language (GPL), which is broadly applicable across domains. There are a wide variety of DSLs, ranging ...
; an extensible compiler/optimizer, single- and multi-machine execution models; hybrid depth- and breadth-first evaluation with
Turing Complete Alan Mathison Turing (; 23 June 1912 – 7 June 1954) was an English mathematician, computer scientist, logician, cryptanalyst, philosopher, and theoretical biologist. Turing was highly influential in the development of theoretical com ...
ness. As an explanatory analogy, Apache TinkerPop and Gremlin are to
graph databases A graph database (GDB) is a database that uses graph structures for semantic queries with nodes, edges, and properties to represent and store data. A key concept of the system is the ''graph'' (or ''edge'' or ''relationship''). The graph relat ...
what the
JDBC Java Database Connectivity (JDBC) is an application programming interface (API) for the programming language Java, which defines how a client may access a database. It is a Java-based data access technology used for Java database connectivity. I ...
and SQL are to
relational databases A relational database is a (most commonly digital) database based on the relational model of data, as proposed by E. F. Codd in 1970. A system used to maintain relational databases is a relational database management system (RDBMS). Many relation ...
. Likewise, the Gremlin traversal machine is to graph computing as what the
Java virtual machine A Java virtual machine (JVM) is a virtual machine that enables a computer to run Java programs as well as programs written in other languages that are also compiled to Java bytecode. The JVM is detailed by a specification that formally describes ...
is to general purpose computing.


History

* 2009-10-30 the project is born, and immediately named "TinkerPop" * 2009-12-25 v0.1 is the first release * 2011-05-21 v1.0 is released * 2012-05-24 v2.0 is released * 2015-01-16 TinkerPop becomes an Apache Incubator project * 2015-07-09 v3.0.0-incubating is released * 2016-05-23 Apache TinkerPop becomes a top-level project * 2016-07-18 v3.1.3 and v3.2.1 are first releases as Apache TinkerPop * 2017-12-17 v3.3.1 is released * 2018-05-08 v3.3.3 is released * 2019-08-05 v3.4.3 is released * 2020-02-20 v3.4.6 is released


Vendor integration

Gremlin is an Apache2-licensed graph traversal language that can be used by graph system vendors. There are typically two types of graph system vendors: OLTP
graph databases A graph database (GDB) is a database that uses graph structures for semantic queries with nodes, edges, and properties to represent and store data. A key concept of the system is the ''graph'' (or ''edge'' or ''relationship''). The graph relat ...
and OLAP graph processors. The table below outlines those graph vendors that support Gremlin.


Traversal examples

The following examples of Gremlin queries and responses in a Gremlin-Groovy environment are relative to a graph representation of th
MovieLens
dataset. The dataset includes users who rate movies. Users each have one occupation, and each movie has one or more categories associated with it. The MovieLens graph schema is detailed below. user--rated tars:0-5->movie user--occupation-->occupation movie--category-->category


Simple traversals

gremlin> g.V().label().groupCount()


> ccupation:21, movie:3883, category:18, user:6040 gremlin> g.V().hasLabel('movie').values('year').min()


>1919 gremlin> g.V().has('movie','name','Die Hard').inE('rated').values('stars').mean()


>4.121848739495798


Projection traversals

gremlin> g.V().hasLabel('category').as('a','b'). select('a','b'). by('name'). by(inE('category').count())


> :Animation, b:105h1>

> :Children's, b:251h1>

> :Comedy, b:1200h1>

> :Adventure, b:283h1>

> :Fantasy, b:68h1>

> :Romance, b:471h1>

> :Drama, b:1603h1>

> :Action, b:503h1>

> :Crime, b:211h1>

> :Thriller, b:492h1>

> :Horror, b:343h1>

> :Sci-Fi, b:276h1>

> :Documentary, b:127h1>

> :War, b:143h1>

> :Musical, b:114h1>

> :Mystery, b:106h1>

> :Film-Noir, b:44h1>

> :Western, b:68 gremlin> g.V().hasLabel('movie').as('a','b'). where(inE('rated').count().is(gt(10))). select('a','b'). by('name'). by(inE('rated').values('stars').mean()). order().by(select('b'),decr). limit(10)


> :Sanjuro, b:4.608695652173913h1>

> :Seven Samurai (The Magnificent Seven), b:4.560509554140127h1>

> :Shawshank Redemption, The, b:4.554557700942973h1>

> :Godfather, The, b:4.524966261808367h1>

> :Close Shave, A, b:4.52054794520548h1>

> :Usual Suspects, The, b:4.517106001121705h1>

> :Schindler's List, b:4.510416666666667h1>

> :Wrong Trousers, The, b:4.507936507936508h1>

> :Sunset Blvd. (a.k.a. Sunset Boulevard), b:4.491489361702127h1>

> :Raiders of the Lost Ark, b:4.47772


Declarative pattern matching traversals

Gremlin supports declarative graph pattern matching similar to
SPARQL SPARQL (pronounced "sparkle" , a recursive acronym for SPARQL Protocol and RDF Query Language) is an RDF query language—that is, a semantic query language for databases—able to retrieve and manipulate data stored in Resource Description F ...
. For instance, the following query below uses Gremlin's ''match()''-step. gremlin> g.V(). match( __.as('a').hasLabel('movie'), __.as('a').out('category').has('name','Action'), __.as('a').has('year',between(1980,1990)), __.as('a').inE('rated').as('b'), __.as('b').has('stars',5), __.as('b').outV().as('c'), __.as('c').out('occupation').has('name','programmer'), __.as('c').has('age',between(30,40))). select('a').groupCount().by('name'). order(local).by(valueDecr). limit(local,10)


>Raiders of the Lost Ark=26


>Star Wars Episode V - The Empire Strikes Back=26


>Terminator, The=23


>Star Wars Episode VI - Return of the Jedi=22


>Princess Bride, The=19


>Aliens=18


>Boat, The (Das Boot)=11


>Indiana Jones and the Last Crusade=11


>Star Trek The Wrath of Khan=10


>Abyss, The=9


OLAP traversal

gremlin> g = graph.traversal(computer(SparkGraphComputer))


>graphtraversalsource adoopgraph[gryoinputformat->gryooutputformat_sparkgraphcomputer.html" ;"title="ryoinputformat->gryooutputformat.html" ;"title="adoopgraph[gryoinputformat->gryooutputformat">adoopgraph[gryoinputformat->gryooutputformat sparkgraphcomputer">ryoinputformat->gryooutputformat.html" ;"title="adoopgraph[gryoinputformat->gryooutputformat">adoopgraph[gryoinputformat->gryooutputformat sparkgraphcomputergremlin> g.V().repeat(outE('rated').has('stars', 5).inV(). groupCount('m').by('name'). inE('rated').has('stars', 5).outV()). times(4).cap('m')


>Star Wars Episode IV - A New Hope 35405394353105332


>American Beauty 31943228282020585


>Raiders of the Lost Ark 31224779793238499


>Star Wars Episode V - The Empire Strikes Back 30434677119726223


>Godfather, The 30258518523013057


>Shawshank Redemption, The 28297717387901031


>Schindler's List 27539336654199309


>Silence of the Lambs, The 26736276376806173


>Fargo 26531050311325270


>Matrix, The 26395118239203191


Gremlin graph traversal machine

Gremlin is a
virtual machine In computing, a virtual machine (VM) is the virtualization/emulation of a computer system. Virtual machines are based on computer architectures and provide functionality of a physical computer. Their implementations may involve specialized hardw ...
composed of an
instruction set In computer science, an instruction set architecture (ISA), also called computer architecture, is an abstract model of a computer. A device that executes instructions described by that ISA, such as a central processing unit (CPU), is called an ' ...
as well as an execution engine. An analogy is drawn between Gremlin and Java (programming language), Java.


Gremlin steps (instruction set)

The following traversal is a Gremlin traversal in the Gremlin-Java8 dialect. g.V().as("a").out("knows").as("b"). select("a","b"). by("name"). by("age") The Gremlin language (i.e. the fluent-style of expressing a graph traversal) can be represented in any host language that supports
function composition In mathematics, function composition is an operation that takes two functions and , and produces a function such that . In this operation, the function is applied to the result of applying the function to . That is, the functions and ...
and function nesting. Due to this simple requirement, there exists various Gremlin dialects including Gremlin-Groovy, Gremlin-Scala, Gremlin-Clojure, etc. The above Gremlin-Java8 traversal is ultimately compiled down to a step sequence called a ''traversal''. A string representation of the traversal above provided below. raphStep([vertex)@[a">html"_;"title="raphStep([">raphStep([vertex)@[a_VertexStep(OUT,[knows.html" ;"title=">raphStep([vertex)@[a.html" ;"title="html" ;"title="raphStep([">raphStep([vertex)@[a">html" ;"title="raphStep([">raphStep([vertex)@[a VertexStep(OUT,[knows">>raphStep([vertex)@ raphStep([vertex)@[a">html"_;"title="raphStep([">raphStep([vertex)@[a_VertexStep(OUT,[knowsvertex)@[b.html" ;"title=".html" ;"title="html" ;"title="raphStep([">raphStep([vertex)@[a">html" ;"title="raphStep([">raphStep([vertex)@[a VertexStep(OUT,[knowsvertex)@[b">.html" ;"title="html" ;"title="raphStep([">raphStep([vertex)@[a">html" ;"title="raphStep([">raphStep([vertex)@[a VertexStep(OUT,[knowsvertex)@[b SelectStep([a, b],[value(name), value(age)])] The ''steps'' are the primitives of the Gremlin graph traversal machine. They are the parameterized instructions that the machine ultimately executes. The Gremlin
instruction set In computer science, an instruction set architecture (ISA), also called computer architecture, is an abstract model of a computer. A device that executes instructions described by that ISA, such as a central processing unit (CPU), is called an ' ...
is approximately 30 steps. These steps are sufficient to provide general purpose computing and what is typically required to express the common motifs of any graph traversal query. Given that Gremlin is a language, an instruction set, and a virtual machine, it is possible to design another traversal language that compiles to the Gremlin traversal machine (analogous to how Scala compiles to the
JVM A Java virtual machine (JVM) is a virtual machine that enables a computer to run Java programs as well as programs written in other languages that are also compiled to Java bytecode. The JVM is detailed by a specification that formally describes ...
). For instance, the popular
SPARQL SPARQL (pronounced "sparkle" , a recursive acronym for SPARQL Protocol and RDF Query Language) is an RDF query language—that is, a semantic query language for databases—able to retrieve and manipulate data stored in Resource Description F ...
graph pattern match language can be compiled to execute on the Gremlin machine. The following SPARQL query SELECT ?a ?b ?c WHERE would compile to raphStep([vertex),_MatchStep(AND,MatchStartStep(a),_LabelStep,_IsStep(eq(Person)),_MatchEndStep.html" ;"title="html" ;"title="raphStep([">raphStep([vertex), MatchStep(AND,MatchStartStep(a), LabelStep, IsStep(eq(Person)), MatchEndStep">html" ;"title="raphStep([">raphStep([vertex), MatchStep(AND,MatchStartStep(a), LabelStep, IsStep(eq(Person)), MatchEndStep [MatchStartStep(a), VertexStep(OUT,[knows],vertex), MatchEndStep(b)], [MatchStartStep(a), VertexStep(OUT,[created],vertex), MatchEndStep(c)], [MatchStartStep(b), VertexStep(OUT,[created],vertex), MatchEndStep(c)], [MatchStartStep(b), PropertiesStep( gevalue), MatchEndStep(d)], atchStartStep(d), IsStep(gt(30)), MatchEndStep), SelectStep( , b, c]. In Gremlin-Java8, the SPARQL query above would be represented as below and compile to the identical Gremlin step sequence (i.e. traversal). g.V().match( as("a").label().is("person"), as("a").out("knows").as("b"), as("a").out("created").as("c"), as("b").out("created").as("c"), as("b").values("age").as("d"), as("d").is(gt(30))). select("a","b","c")


Gremlin Machine (virtual machine)

The Gremlin graph traversal machine can execute on a single machine or across a multi-machine compute cluster. Execution agnosticism allows Gremlin to run over both
graph databases A graph database (GDB) is a database that uses graph structures for semantic queries with nodes, edges, and properties to represent and store data. A key concept of the system is the ''graph'' (or ''edge'' or ''relationship''). The graph relat ...
(OLTP) and graph processors (OLAP).


See also

*
Cypher Query Language Cypher is a declarative graph query language that allows for expressive and efficient data querying in a property graph. Cypher was largely an invention of Andrés Taylor while working for Neo4j, Inc. (formerly Neo Technology) in 2011. Cypher was ...
, another query language on graph data *
SPARQL SPARQL (pronounced "sparkle" , a recursive acronym for SPARQL Protocol and RDF Query Language) is an RDF query language—that is, a semantic query language for databases—able to retrieve and manipulate data stored in Resource Description F ...
, another query language on graph data


References


External links


Apache TinkerPop Homepage

sql2gremlin.com (TinkerPop2)
# Rodriguez, M.A.,
The Gremlin Graph Traversal Machine and Language
" Proceedings of the ACM Database Programming Languages Conference, October, 2015. {{Query languages Cluster computing Data mining and machine learning software Hadoop Apache Software Foundation Software using the Apache license Java platform Free software programmed in Scala Declarative programming languages Query languages