| = Using Graph Databases with Groovy |
| Paul King |
| :revdate: 2024-09-02T22:18:00+00:00 |
| :updated: 2024-12-11T10:19:00+00:00 |
| :keywords: tugraph, apache tinkerpop, gremlin, neo4j, apache age, graph databases, apache hugegraph, arcadedb, orientdb, groovy |
| :description: This post illustrates using graph databases with Groovy. |
| |
| In this blog post, we look at using property graph databases with Groovy. |
| We'll look at: |
| |
| * Some advantages of property graph database technologies |
| * Some features of Groovy which make using such databases a little nicer |
| * Code examples for a common case study across 7 interesting graph databases |
| |
| == Case Study |
| |
| The Olympics is over for another 4 years. For sports fans, there were many exciting moments. |
| Let's look at just one event where the Olympic record was broken several times over the |
| last two Games. We'll look at the women's 100m backstroke and model the results using |
| graph databases. |
| |
| Why the women's 100m backstroke? Well, that was a particularly exciting event |
| in terms of broken records. In Heat 4 of the Tokyo 2021 Olympics, Kylie Masse broke the record previously |
| held by Emily Seebohm from the London 2012 Olympics. A few minutes later in Heat 5, Regan Smith |
| broke the record again. Then, a few minutes after that in Heat 6, Kaylee McKeown broke it yet again. |
| On the following day in Semifinal 1, Regan took back the record. Then, on the following |
| day in the final, Kaylee reclaimed the record. At the Paris 2024 Olympics, |
| Kaylee bettered her own record in the final. Then a few days later, |
| Regan led off the 4 x 100m medley relay and broke the backstroke record swimming the first leg. |
| That makes 7 times the record was broken across the last 2 games! |
| |
| image:img/BackstrokeRecord.png[Result of Semifinal1,70%] |
| |
| We'll have vertices in our graph database corresponding to the swimmers and the swims. |
| We'll use the labels `Swimmer` and `Swim` for these vertices. We'll have relationships |
| such as `swam` and `supersedes` between vertices. |
| We'll explore modelling and querying the event |
| information using several graph database technologies. |
| |
| The examples in this post can be found on |
| https://github.com/paulk-asert/groovy-graphdb/[GitHub]. |
| |
| == Why graph databases? |
| |
| RDBMS systems are many times more popular than graph databases, but there is a |
| range of scenarios where graph databases are often used. |
| Which scenarios? Usually, it boils down to relationships. |
| If there are important relationships between data in your system, |
| graph databases might make sense. |
| Typical usage scenarios include fraud detection, knowledge graphs, recommendation engines, |
| social networks, and supply chain management. |
| |
| This blog post doesn't aim to convert everyone to use graph databases all the time, |
| but we'll show you some examples of when it might make sense and let you make up your own mind. |
| Graph databases certainly represent a very useful tool to have in your toolbox should the need arise. |
| |
| Graph databases are known for more succinct queries |
| and vastly more efficient queries in some scenarios. |
| As a first example, do you prefer this Cypher query (it's from the TuGraph code we'll see later |
| but other technologies are similar): |
| |
| [source,sql] |
| ---- |
| MATCH (sr:Swimmer)-[:swam]->(sm:Swim {at: 'Paris 2024'}) |
| RETURN DISTINCT sr.country AS country |
| ---- |
| |
| Or the equivalent SQL query assuming we were storing |
| the information in relational tables: |
| |
| [source,sql] |
| ---- |
| SELECT DISTINCT country FROM Swimmer |
| LEFT JOIN Swimmer_Swim |
| ON Swimmer.swimmerId = Swimmer_Swim.fkSwimmer |
| LEFT JOIN Swim |
| ON Swim.swimId = Swimmer_Swim.fkSwim |
| WHERE Swim.at = 'Paris 2024' |
| ---- |
| |
| This SQL query is typical of what is required when we have a many-to-many relationship |
| between our entities, in this case _swimmers_ and _swims_. Many-to-many is required to |
| correctly model relay swims like the last record swim (though for brevity, we haven't |
| included the other relay swimmers in our dataset). The multiple joins in that query |
| can also be notoriously slow for large datasets. |
| |
| We'll see other examples later too, one being a query involving traversal of relationships. |
| Here is the Cypher (again from TuGraph): |
| |
| [source,sql] |
| ---- |
| MATCH (s1:Swim)-[:supersedes*1..10]->(s2:Swim {at: 'London 2012'}) |
| RETURN s1.at as at, s1.event as event |
| ---- |
| |
| And the equivalent SQL: |
| |
| [source,sql] |
| ---- |
| WITH RECURSIVE traversed(swimId) AS ( |
| SELECT fkNew FROM Supersedes |
| WHERE fkOld IN ( |
| SELECT swimId FROM Swim |
| WHERE event = 'Heat 4' AND at = 'London 2012' |
| ) |
| UNION ALL |
| SELECT Supersedes.fkNew as swimId |
| FROM traversed as t |
| JOIN Supersedes |
| ON t.swimId = Supersedes.fkOld |
| WHERE t.swimId = swimId |
| ) |
| SELECT at, event FROM Swim |
| WHERE swimId IN (SELECT * FROM traversed) |
| ---- |
| |
| Here we have a `Supersedes` table and a recursive common table expression (CTE), `traversed`. |
| The details aren't important, but it shows the kind of complexity typically |
| required for the kind of relationship traversal we are looking at. |
| There are certainly far more complex SQL examples for different kinds of |
| traversals like shortest path. |
| |
| This example used TuGraph's Cypher variant as the query language. Not all the |
| databases we'll look at support Cypher, but they all have some kind of query |
| language or API that makes such queries shorter. |
| |
| Several of the other databases do support a variant of https://www.iso.org/standard/76120.html[Cypher]. |
| Others support different SQL-like query languages. |
| We'll also see several JVM-based databases which support TinkerPop/Gremlin. |
| It's a Groovy-based technology and will be our first technology to explore. |
| Recently, ISO published an international standard, https://www.iso.org/standard/76120.html[GQL], |
| for property graph databases. We expect to see databases supporting that standard |
| in the not too distant future. |
| |
| Now, it's time to explore the case study using our different database technologies. |
| We tried to pick technologies that seemed reasonably well maintained, had reasonable |
| JVM support, and had features that seemed worth showing off. Several we |
| selected because they have TinkerPop/Gremlin support. |
| |
| == Apache TinkerPop |
| |
| Our first technology to examine is https://tinkerpop.apache.org/[Apache TinkerPop™]. |
| |
| image:https://tinkerpop.apache.org/img/tinkerpop-splash.png[tinkerpop logo,70%] |
| |
| TinkerPop is an open source computing framework for graph databases. It provides |
| a common abstraction layer, and a graph query language, called Gremlin. |
| This allows you to work with numerous graph database implementations in a consistent way. |
| TinkerPop also provides its own graph engine implementation, called TinkerGraph, |
| which is what we'll use initially. TinkerPop/Gremlin will be a technology we revisit |
| for other databases later. |
| |
| We'll look at the swims for the medalists and record breakers at the Tokyo 2021 and Paris 2024 Olympics |
| in the women's 100m backstroke. For reference purposes, we'll also include the previous swim that |
| set an Olympic record. |
| |
| We'll start by creating a new in-memory graph database and |
| create a helper object for traversing the graph: |
| |
| [source,groovy] |
| ---- |
| var graph = TinkerGraph.open() |
| var g = traversal().withEmbedded(graph) |
| ---- |
| |
| Next, let's create the information relevant for the previous Olympic record which was set |
| at the London 2012 Olympics. Emily Seebohm set that record in Heat 4: |
| |
| [source,groovy] |
| ---- |
| var es = g.addV('Swimmer').property(name: 'Emily Seebohm', country: '🇦🇺').next() |
| swim1 = g.addV('Swim').property(at: 'London 2012', event: 'Heat 4', time: 58.23, result: 'First').next() |
| es.addEdge('swam', swim1) |
| ---- |
| |
| We can print out some information from our newly created nodes (vertices) |
| by querying the properties of two nodes respectively: |
| |
| [source,groovy] |
| ---- |
| var (name, country) = ['name', 'country'].collect { es.value(it) } |
| var (at, event, time) = ['at', 'event', 'time'].collect { swim1.value(it) } |
| println "$name from $country swam a time of $time in $event at the $at Olympics" |
| ---- |
| |
| Which has this output: |
| |
| ---- |
| Emily Seebohm from 🇦🇺 swam a time of 58.23 in Heat 4 at the London 2012 Olympics |
| ---- |
| |
| So far, we've just been using the Java API from TinkerPop. |
| It also provides some additional syntactic sugar for Groovy. |
| We can enable the syntactic sugar with: |
| |
| [source,groovy] |
| ---- |
| SugarLoader.load() |
| ---- |
| |
| Which then lets us write (instead of the three earlier lines) the slightly shorter: |
| |
| [source,groovy] |
| ---- |
| println "$es.name from $es.country swam a time of $swim1.time in $swim1.event at the $swim1.at Olympics" |
| ---- |
| |
| This uses Groovy's normal property access syntax and has the same output when executed. |
| |
| Let's create some helper methods to simplify creation of the remaining information. |
| |
| [source,groovy] |
| ---- |
| def insertSwimmer(TraversalSource g, name, country) { |
| g.addV('Swimmer').property(name: name, country: country).next() |
| } |
| |
| def insertSwim(TraversalSource g, at, event, time, result, swimmer) { |
| var swim = g.addV('Swim').property(at: at, event: event, time: time, result: result).next() |
| swimmer.addEdge('swam', swim) |
| swim |
| } |
| ---- |
| |
| Now we can create the remaining swim information: |
| |
| [source,groovy] |
| ---- |
| var km = insertSwimmer(g, 'Kylie Masse', '🇨🇦') |
| var swim2 = insertSwim(g, 'Tokyo 2021', 'Heat 4', 58.17, 'First', km) |
| swim2.addEdge('supersedes', swim1) |
| var swim3 = insertSwim(g, 'Tokyo 2021', 'Final', 57.72, '🥈', km) |
| |
| var rs = insertSwimmer(g, 'Regan Smith', '🇺🇸') |
| var swim4 = insertSwim(g, 'Tokyo 2021', 'Heat 5', 57.96, 'First', rs) |
| swim4.addEdge('supersedes', swim2) |
| var swim5 = insertSwim(g, 'Tokyo 2021', 'Semifinal 1', 57.86, 'First', rs) |
| var swim6 = insertSwim(g, 'Tokyo 2021', 'Final', 58.05, '🥉', rs) |
| var swim7 = insertSwim(g, 'Paris 2024', 'Final', 57.66, '🥈', rs) |
| var swim8 = insertSwim(g, 'Paris 2024', 'Relay leg1', 57.28, 'First', rs) |
| |
| var kmk = insertSwimmer(g, 'Kaylee McKeown', '🇦🇺') |
| var swim9 = insertSwim(g, 'Tokyo 2021', 'Heat 6', 57.88, 'First', kmk) |
| swim9.addEdge('supersedes', swim4) |
| swim5.addEdge('supersedes', swim9) |
| var swim10 = insertSwim(g, 'Tokyo 2021', 'Final', 57.47, '🥇', kmk) |
| swim10.addEdge('supersedes', swim5) |
| var swim11 = insertSwim(g, 'Paris 2024', 'Final', 57.33, '🥇', kmk) |
| swim11.addEdge('supersedes', swim10) |
| swim8.addEdge('supersedes', swim11) |
| |
| var kb = insertSwimmer(g, 'Katharine Berkoff', '🇺🇸') |
| var swim12 = insertSwim(g, 'Paris 2024', 'Final', 57.98, '🥉', kb) |
| ---- |
| |
| Note that we just entered the swims where medals were won or |
| where Olympic records were broken. We could easily have added |
| more swimmers, other strokes and distances, relay events, |
| and even other sports if we wanted to. |
| |
| Let's have a look at what our graph now looks like: |
| |
| image:https://raw.githubusercontent.com/paulk-asert/groovy-graphdb/main/docs/images/BackstrokeRecords.png[network of swim and swimmer vertices and relationship edges] |
| |
| We now might want to query the graph in numerous ways. |
| For instance, what countries had success at the Paris 2024 Olympics, |
| where success is defined, for the purposes of this query, as |
| winning a medal or breaking a record. Of course, just having |
| a swimmer make the Olympic team is a great success - but let's |
| keep our example simple for now. |
| |
| [source,groovy] |
| ---- |
| var successInParis = g.V().out('swam').has('at', 'Paris 2024').in() |
| .values('country').toSet() |
| assert successInParis == ['🇺🇸', '🇦🇺'] as Set |
| ---- |
| |
| By way of explanation, we find all nodes with an outgoing `swam` edge |
| pointing to a swim that was at the Paris 2024 Olympics, i.e. |
| all the swimmers from Paris 2024. We then find the set of countries |
| represented. We are using sets here to remove duplicates, and also |
| we aren't imposing an ordering on the returned results so we compare |
| sets on both sides. |
| |
| Similarly, we can find the Olympics at which records were set during heat swims: |
| |
| [source,groovy] |
| ---- |
| var recordSetInHeat = g.V().has('Swim','event', startingWith('Heat')).values('at').toSet() |
| assert recordSetInHeat == ['London 2012', 'Tokyo 2021'] as Set |
| ---- |
| |
| Or, we can find the times of the records set during finals: |
| |
| [source,groovy] |
| ---- |
| var recordTimesInFinals = g.V().has('event', 'Final').as('ev').out('supersedes') |
| .select('ev').values('time').toSet() |
| assert recordTimesInFinals == [57.47, 57.33] as Set |
| ---- |
| |
| Making use of the Groovy syntactic sugar gives simpler versions: |
| |
| [source,groovy] |
| ---- |
| var successInParis = g.V.out('swam').has('at', 'Paris 2024').in.country.toSet |
| assert successInParis == ['🇺🇸', '🇦🇺'] as Set |
| |
| var recordSetInHeat = g.V.has('Swim','event', startingWith('Heat')).at.toSet |
| assert recordSetInHeat == ['London 2012', 'Tokyo 2021'] as Set |
| |
| var recordTimesInFinals = g.V.has('event', 'Final').as('ev').out('supersedes').select('ev').time.toSet |
| assert recordTimesInFinals == [57.47, 57.33] as Set |
| ---- |
| |
| Groovy happens to be very good at allowing you to add syntactic sugar |
| for your own programs or existing classes. TinkerPop's special Groovy support |
| is just one example of this. Your vendor could certainly supply such a feature |
| for your favorite graph database (why not ask them?) but we'll look shortly at |
| how you could write such syntactic sugar yourself when we explore Neo4j. |
| |
| Our examples so far are all interesting, |
| but graph databases really excel when performing queries |
| involving multiple edge traversals. Let's look |
| at all the Olympic records set in 2021 and 2024, |
| i.e. all records set after London 2012 (`swim1` from earlier): |
| |
| [source,groovy] |
| ---- |
| println "Olympic records after ${g.V(swim1).values('at', 'event').toList().join(' ')}: " |
| println g.V(swim1).repeat(in('supersedes')).as('sw').emit() |
| .values('at').concat(' ') |
| .concat(select('sw').values('event')).toList().join('\n') |
| ---- |
| |
| Or after using the Groovy syntactic sugar, the query becomes: |
| |
| [source,groovy] |
| ---- |
| println g.V(swim1).repeat(in('supersedes')).as('sw').emit |
| .at.concat(' ').concat(select('sw').event).toList.join('\n') |
| ---- |
| |
| Both have this output: |
| |
| ---- |
| Olympic records after London 2012 Heat 4: |
| Tokyo 2021 Heat 4 |
| Tokyo 2021 Heat 5 |
| Tokyo 2021 Heat 6 |
| Tokyo 2021 Semifinal 1 |
| Tokyo 2021 Final |
| Paris 2024 Final |
| Paris 2024 Relay leg1 |
| ---- |
| |
| NOTE: While not important for our examples, TinkerPop has a `GraphMLWriter` class which can write out our |
| graph in _GraphML_, which is how the earlier image of Graphs and Nodes was initially generated. |
| |
| == Neo4j |
| |
| Our next technology to examine is |
| https://neo4j.com/product/neo4j-graph-database/[Neo4j]. Neo4j is a graph |
| database storing nodes and edges. Nodes and edges may have a label and properties (or attributes). |
| |
| image:https://dist.neo4j.com/wp-content/uploads/20230926084108/Logo_FullColor_RGB_TransBG.svg[neo4j logo,50%] |
| |
| Neo4j's embedded API commonly models relationship types using enums that implement `RelationshipType`. Let's create one for our example: |
| |
| [source,groovy] |
| ---- |
| enum SwimmingRelationships implements RelationshipType { |
| swam, supersedes, runnerup |
| } |
| ---- |
| |
| We'll use Neo4j in embedded mode and perform all of our operations |
| as part of a transaction: |
| |
| [source,groovy] |
| ---- |
| // ... set up managementService ... |
| var graphDb = managementService.database(DEFAULT_DATABASE_NAME) |
| |
| try (Transaction tx = graphDb.beginTx()) { |
| // ... other Neo4j code below here ... |
| } |
| ---- |
| |
| Let's create our nodes and edges using Neo4j. First the existing Olympic record: |
| |
| [source,groovy] |
| ---- |
| es = tx.createNode(label('Swimmer')) |
| es.setProperty('name', 'Emily Seebohm') |
| es.setProperty('country', '🇦🇺') |
| |
| swim1 = tx.createNode(label('Swim')) |
| swim1.setProperty('event', 'Heat 4') |
| swim1.setProperty('at', 'London 2012') |
| swim1.setProperty('result', 'First') |
| swim1.setProperty('time', 58.23d) |
| es.createRelationshipTo(swim1, swam) |
| |
| var name = es.getProperty('name') |
| var country = es.getProperty('country') |
| var at = swim1.getProperty('at') |
| var event = swim1.getProperty('event') |
| var time = swim1.getProperty('time') |
| println "$name from $country swam a time of $time in $event at the $at Olympics" |
| ---- |
| |
| While there is nothing wrong with this code, Groovy has many features for making code more succinct. |
| Let's use some dynamic metaprogramming to achieve just that. |
| |
| [source,groovy] |
| ---- |
| Node.metaClass { |
| propertyMissing { String name, val -> delegate.setProperty(name, val) } |
| propertyMissing { String name -> delegate.getProperty(name) } |
| methodMissing { String name, args -> |
| delegate.createRelationshipTo(args[0], SwimmingRelationships."$name") |
| } |
| } |
| ---- |
| |
| What does this do? The `propertyMissing` lines catch attempts to use Groovy's |
| normal property access and funnel them through the appropriate `getProperty` and `setProperty` methods. |
| The `methodMissing` line means any attempted method calls that we don't recognize |
| are intended to be relationship creation, so we funnel them through the appropriate |
| `createRelationshipTo` method call. |
| |
| Now we can use normal Groovy property access for setting the node properties. |
| It looks much cleaner. |
| We define an edge relationship simply by calling a method having the relationship name. |
| |
| [source,groovy] |
| ---- |
| km = tx.createNode(label('Swimmer')) |
| km.name = 'Kylie Masse' |
| km.country = '🇨🇦' |
| ---- |
| |
| The code is already a little cleaner, but we can tweak the metaprogramming a little |
| more to get rid of the noise associated with the `label` method: |
| |
| [source,groovy] |
| ---- |
| Transaction.metaClass { |
| createNode { String labelName -> delegate.createNode(label(labelName)) } |
| } |
| ---- |
| |
| This adds an overload for `createNode` that takes a `String`, and |
| node creation is improved again, as we can see here: |
| |
| [source,groovy] |
| ---- |
| swim2 = tx.createNode('Swim') |
| swim2.time = 58.17d |
| swim2.result = 'First' |
| swim2.event = 'Heat 4' |
| swim2.at = 'Tokyo 2021' |
| km.swam(swim2) |
| swim2.supersedes(swim1) |
| |
| swim3 = tx.createNode('Swim') |
| swim3.time = 57.72d |
| swim3.result = '🥈' |
| swim3.event = 'Final' |
| swim3.at = 'Tokyo 2021' |
| km.swam(swim3) |
| ---- |
| |
| The code for relationships is certainly a lot cleaner too, |
| and it was quite a minimal amount of work to define the necessary metaprogramming. |
| |
| With a little bit more work, we could use static metaprogramming techniques. |
| This would give us better IDE completion. |
| We'll have more to say about improved type checking at the end of this post. |
| For now though, let's continue with defining the rest of our graph. |
| |
| We can redefine our `insertSwimmer` and `insertSwim` methods using Neo4j implementation |
| calls, and then our earlier code could be used to create our graph. Now let's |
| investigate what the queries look like. We'll start with querying via |
| the API, and later look at using Cypher. |
| |
| First, the successful countries in Paris 2024: |
| |
| [source,groovy] |
| ---- |
| var swimmers = [es, km, rs, kmk, kb] |
| var successInParis = swimmers.findAll { swimmer -> |
| swimmer.getRelationships(swam).any { run -> |
| run.getOtherNode(swimmer).at == 'Paris 2024' |
| } |
| } |
| assert successInParis*.country.unique() == ['🇺🇸', '🇦🇺'] |
| ---- |
| |
| Then, at which Olympics were records broken in heats: |
| |
| [source,groovy] |
| ---- |
| var swims = [swim1, swim2, swim3, swim4, swim5, swim6, swim7, swim8, swim9, swim10, swim11, swim12] |
| var recordSetInHeat = swims.findAll { swim -> |
| swim.event.startsWith('Heat') |
| }*.at |
| assert recordSetInHeat.unique() == ['London 2012', 'Tokyo 2021'] |
| ---- |
| |
| Now, what were the times for records broken in finals: |
| |
| [source,groovy] |
| ---- |
| var recordTimesInFinals = swims.findAll { swim -> |
| swim.event == 'Final' && swim.hasRelationship(supersedes) |
| }*.time |
| assert recordTimesInFinals == [57.47d, 57.33d] |
| ---- |
| |
| To see traversal in action, Neo4j has a special API for doing such queries: |
| |
| [source,groovy] |
| ---- |
| var info = { s -> "$s.at $s.event" } |
| println "Olympic records following ${info(swim1)}:" |
| |
| for (Path p in tx.traversalDescription() |
| .breadthFirst() |
| .relationships(supersedes) |
| .evaluator(Evaluators.fromDepth(1)) |
| .uniqueness(Uniqueness.NONE) |
| .traverse(swim1)) { |
| println p.endNode().with(info) |
| } |
| ---- |
| |
| Earlier versions of Neo4j also supported Gremlin, so we could have written our queries in |
| the same way as we did for TinkerPop. That technology is deprecated in recent Neo4j versions, which |
| instead offer the Cypher query language. We can use that language for all of our previous queries |
| as shown here: |
| |
| [source,groovy] |
| ---- |
| assert tx.execute(''' |
| MATCH (s:Swim WHERE s.event STARTS WITH 'Heat') |
| WITH s.at as at |
| WITH DISTINCT at |
| RETURN at |
| ''')*.at == ['London 2012', 'Tokyo 2021'] |
| |
| assert tx.execute(''' |
| MATCH (s1:Swim {event: 'Final'})-[:supersedes]->(s2:Swim) |
| RETURN s1.time AS time |
| ''')*.time == [57.47d, 57.33d] |
| |
| tx.execute(''' |
| MATCH (s1:Swim)-[:supersedes]->{1,}(s2:Swim { at: $at }) |
| RETURN s1 |
| ''', [at: swim1.at])*.s1.each { s -> |
| println "$s.at $s.event" |
| } |
| ---- |
| |
| .An aside on graph design |
| **** |
| This blog post is definitely not meant to be an advanced course on graph database |
| design, but it is worth noting a few points. |
| |
| Deciding which information should be stored as node properties and which as relationships |
| still requires developer judgement. For example, we could have added a Boolean `olympicRecord` |
| property to our `Swim` nodes. Certain queries might now become simpler, or at least more familiar |
| to traditional RDBMS SQL developers, but other queries might become much harder to write |
| and potentially much less efficient. |
| This is the kind of thing which needs to be thought through and sometimes experimented with. |
| |
| Suppose, in the case where a record is broken, we wanted to see which other swimmers |
| (in our case medalists in the final) also broke the previous record. |
| We could write a query to find this as follows: |
| |
| [source,groovy] |
| ---- |
| assert tx.execute(''' |
| MATCH (sr1:Swimmer)-[:swam]->(sm1:Swim {event: 'Final'}), (sm2:Swim {event: 'Final'})-[:supersedes]->(sm3:Swim) |
| WHERE sm1.at = sm2.at AND sm1 <> sm2 AND sm1.time < sm3.time |
| RETURN sr1.name as name |
| ''')*.name == ['Kylie Masse'] |
| ---- |
| |
| It's not too bad, but if we had a much larger graph of data, it could be quite slow. |
| We could instead opt to use an additional relationship, called `runnerup`, in our graph. |
| |
| [source,groovy] |
| ---- |
| swim6.runnerup(swim3) |
| swim3.runnerup(swim10) |
| swim12.runnerup(swim7) |
| swim7.runnerup(swim11) |
| ---- |
| |
| The visualization is something like this: |
| |
| image:img/BackstrokeRecordsRunnerup.png[Additional runnerup relationship,60%] |
| |
| It essentially makes it easier to find the other medalists if we know any one of them. |
| |
| The resulting query becomes this: |
| |
| [source,groovy] |
| ---- |
| assert tx.execute(''' |
| MATCH (sr1:Swimmer)-[:swam]->(sm1:Swim {event: 'Final'})-[:runnerup]->{1,2}(sm2:Swim {event: 'Final'})-[:supersedes]->(sm3:Swim) |
| WHERE sm1.time < sm3.time |
| RETURN sr1.name as name |
| ''')*.name == ['Kylie Masse'] |
| ---- |
| |
| The _MATCH_ clause is similar in complexity, but the _WHERE_ clause is much simpler. |
| The query is probably faster too, but it is a tradeoff that should be weighed up. |
| **** |
| |
| == Apache AGE |
| |
| The next technology we'll look at is the https://age.apache.org/[Apache AGE™] graph database. |
| Apache AGE leverages https://www.postgresql.org[PostgreSQL] for storage. |
| |
| image:https://age.apache.org/age-manual/master/_static/logo.png[Apache AGE logo, 40%] |
| image:https://age.apache.org/img/logo-large-postgresql.jpg[PostgreSQL logo] |
| |
| We installed Apache AGE via a Docker Image as outlined in the Apache AGE |
| https://age.apache.org/age-manual/master/intro/setup.html#installing-via-docker-image[manual]. |
| |
| Since Apache AGE offers a SQL-inspired graph database experience, we use Groovy's |
| SQL facilities to interact with the database: |
| |
| [source,groovy] |
| ---- |
| Sql.withInstance(DB_URL, USER, PASS, 'org.postgresql.Driver') { sql -> |
| // enable Apache AGE extension, then use Sql connection ... |
| } |
| ---- |
| |
| For creating our nodes and subsequent querying, we use SQL statements |
| with embedded _cypher_ clauses. Here is the statement for creating |
| our nodes and edges: |
| |
| [source,groovy] |
| ---- |
| sql.execute''' |
| SELECT * FROM cypher('swimming_graph', $$ CREATE |
| (es:Swimmer {name: 'Emily Seebohm', country: '🇦🇺'}), |
| (swim1:Swim {event: 'Heat 4', result: 'First', time: 58.23, at: 'London 2012'}), |
| (es)-[:swam]->(swim1), |
| |
| (km:Swimmer {name: 'Kylie Masse', country: '🇨🇦'}), |
| (swim2:Swim {event: 'Heat 4', result: 'First', time: 58.17, at: 'Tokyo 2021'}), |
| (km)-[:swam]->(swim2), |
| (swim2)-[:supersedes]->(swim1), |
| (swim3:Swim {event: 'Final', result: '🥈', time: 57.72, at: 'Tokyo 2021'}), |
| (km)-[:swam]->(swim3), |
| |
| (rs:Swimmer {name: 'Regan Smith', country: '🇺🇸'}), |
| (swim4:Swim {event: 'Heat 5', result: 'First', time: 57.96, at: 'Tokyo 2021'}), |
| (rs)-[:swam]->(swim4), |
| (swim4)-[:supersedes]->(swim2), |
| (swim5:Swim {event: 'Semifinal 1', result: 'First', time: 57.86, at: 'Tokyo 2021'}), |
| (rs)-[:swam]->(swim5), |
| (swim6:Swim {event: 'Final', result: '🥉', time: 58.05, at: 'Tokyo 2021'}), |
| (rs)-[:swam]->(swim6), |
| (swim7:Swim {event: 'Final', result: '🥈', time: 57.66, at: 'Paris 2024'}), |
| (rs)-[:swam]->(swim7), |
| (swim8:Swim {event: 'Relay leg1', result: 'First', time: 57.28, at: 'Paris 2024'}), |
| (rs)-[:swam]->(swim8), |
| |
| (kmk:Swimmer {name: 'Kaylee McKeown', country: '🇦🇺'}), |
| (swim9:Swim {event: 'Heat 6', result: 'First', time: 57.88, at: 'Tokyo 2021'}), |
| (kmk)-[:swam]->(swim9), |
| (swim9)-[:supersedes]->(swim4), |
| (swim5)-[:supersedes]->(swim9), |
| (swim10:Swim {event: 'Final', result: '🥇', time: 57.47, at: 'Tokyo 2021'}), |
| (kmk)-[:swam]->(swim10), |
| (swim10)-[:supersedes]->(swim5), |
| (swim11:Swim {event: 'Final', result: '🥇', time: 57.33, at: 'Paris 2024'}), |
| (kmk)-[:swam]->(swim11), |
| (swim11)-[:supersedes]->(swim10), |
| (swim8)-[:supersedes]->(swim11), |
| |
| (kb:Swimmer {name: 'Katharine Berkoff', country: '🇺🇸'}), |
| (swim12:Swim {event: 'Final', result: '🥉', time: 57.98, at: 'Paris 2024'}), |
| (kb)-[:swam]->(swim12) |
| $$) AS (a agtype) |
| ''' |
| ---- |
| |
| To find at which Olympics records were set in heats, we |
| can use the following _cypher_ query: |
| |
| [source,groovy] |
| ---- |
| assert sql.rows(''' |
| SELECT * from cypher('swimming_graph', $$ |
| MATCH (s:Swim) |
| WHERE left(s.event, 4) = 'Heat' |
| RETURN s |
| $$) AS (a agtype) |
| ''').a*.map*.get('properties')*.at.toUnique() == ['London 2012', 'Tokyo 2021'] |
| ---- |
| |
| The results come back in a special JSON-like data type called `agtype`. |
| From that, we can query the properties and return the `at` property. |
| We select the unique ones to remove duplicates. |
| |
| Similarly, we can find the times of Olympic records set in finals |
| as follows: |
| |
| [source,groovy] |
| ---- |
| assert sql.rows(''' |
| SELECT * from cypher('swimming_graph', $$ |
| MATCH (s1:Swim {event: 'Final'})-[:supersedes]->(s2:Swim) |
| RETURN s1 |
| $$) AS (a agtype) |
| ''').a*.map*.get('properties')*.time == [57.47, 57.33] |
| ---- |
| |
| To print all the Olympic records set across Tokyo 2021 and Paris 2024, |
| we can use `eachRow` and the following query: |
| |
| [source,groovy] |
| ---- |
| sql.eachRow(''' |
| SELECT * from cypher('swimming_graph', $$ |
| MATCH (s1:Swim)-[:supersedes]->(swim1) |
| RETURN s1 |
| $$) AS (a agtype) |
| ''') { |
| println it.a*.map*.get('properties')[0].with{ "$it.at $it.event" } |
| } |
| ---- |
| |
| The output looks like this: |
| |
| ---- |
| Tokyo 2021 Heat 4 |
| Tokyo 2021 Heat 5 |
| Tokyo 2021 Heat 6 |
| Tokyo 2021 Final |
| Tokyo 2021 Semifinal 1 |
| Paris 2024 Final |
| Paris 2024 Relay leg1 |
| ---- |
| |
| The Apache AGE project also maintains a viewer tool offering a web-based |
| user interface for visualization of graph data stored in our database. |
| Instructions for installation are available on the |
| https://github.com/apache/age-viewer[GitHub site]. |
| The tool allows visualization of the results from any query. |
| For our database, a query returning all nodes and edges creates |
| a visualization like below (we chose to manually re-arrange the nodes): |
| |
| image:img/age-viewer.png[] |
| |
| == OrientDB |
| |
| image:https://www.orientdb.com/images/orientdb_logo_mid.png[orientdb logo,50%] |
| |
| The next graph database we'll look at is https://orientdb.org/[OrientDB]. |
| We used the open source Community edition. We used it in embedded mode but there are |
| https://orientdb.org/docs/3.0.x/gettingstarted/Tutorial-Installation.html[instructions] |
| for running a Docker image as well. |
| |
| The main claim to fame for OrientDB (and the closely related ArcadeDB we'll cover next) |
| is that they are multi-model databases, supporting graphs and documents |
| in the one database. |
| |
| Creating our database and setting up our vertex and edge classes (think mini-schema) |
| is done as follows: |
| |
| [source,groovy] |
| ---- |
| try (var db = context.open("swimming", "admin", "adminpwd")) { |
| db.createVertexClass('Swimmer') |
| db.createVertexClass('Swim') |
| db.createEdgeClass('swam') |
| db.createEdgeClass('supersedes') |
| // other code here |
| } |
| ---- |
| |
| See the https://github.com/paulk-asert/groovy-graphdb/tree/main/orientdb[GitHub repo] for further details. |
| |
| With initialization out of the way, we can start defining our nodes and edges: |
| |
| [source,groovy] |
| ---- |
| var es = db.newVertex('Swimmer') |
| es.setProperty('name', 'Emily Seebohm') |
| es.setProperty('country', '🇦🇺') |
| var swim1 = db.newVertex('Swim') |
| swim1.setProperty('at', 'London 2012') |
| swim1.setProperty('result', 'First') |
| swim1.setProperty('event', 'Heat 4') |
| swim1.setProperty('time', 58.23) |
| es.addEdge(swim1, 'swam') |
| ---- |
| |
| We can print out the details as before: |
| |
| [source,groovy] |
| ---- |
| var (name, country) = ['name', 'country'].collect { es.getProperty(it) } |
| var (at, event, time) = ['at', 'event', 'time'].collect { swim1.getProperty(it) } |
| println "$name from $country swam a time of $time in $event at the $at Olympics" |
| ---- |
| |
| At this point, we could apply some Groovy metaprogramming to make the code more succinct, |
| but we'll just flesh out our `insertSwimmer` and `insertSwim` helper methods like before. |
| We can use these to enter the remaining swim information. |
| |
| Queries are performed through the Multi-Model API using SQL-like queries. |
| The three queries we saw earlier look like this: |
| |
| [source,groovy] |
| ---- |
| var results = db.query("SELECT expand(out('supersedes').in('supersedes')) FROM Swim WHERE event = 'Final'") |
| assert results*.getProperty('time').toSet() == [57.47, 57.33] as Set |
| |
| results = db.query("SELECT expand(out('supersedes')) FROM Swim WHERE event.left(4) = 'Heat'") |
| assert results*.getProperty('at').toSet() == ['Tokyo 2021', 'London 2012'] as Set |
| |
| results = db.query("SELECT country FROM ( SELECT expand(in('swam')) FROM Swim WHERE at = 'Paris 2024' )") |
| assert results*.getProperty('country').toSet() == ['🇺🇸', '🇦🇺'] as Set |
| ---- |
| |
| Traversal looks like this: |
| |
| [source,groovy] |
| ---- |
| results = db.query("TRAVERSE in('supersedes') FROM :swim", swim1) |
| results.each { |
|     if (it.toElement() != swim1) { |
|         println "${it.getProperty('at')} ${it.getProperty('event')}" |
|     } |
| } |
| ---- |
| |
| OrientDB also supports Gremlin and a studio Web-UI. |
| Both of these features are very similar to the ArcadeDB counterparts. |
| We'll examine them next when we look at ArcadeDB. |
| |
| == ArcadeDB |
| |
| Now, we'll examine https://arcadedb.com/#getting-started[ArcadeDB]. |
| |
| image:https://arcadedb.com/assets/images/arcadedb-logo-mini.png[arcadedb logo] |
| |
| ArcadeDB is a rewrite/partial fork of OrientDB and carries over its Multi-Model nature. |
| We used it in embedded mode but there are |
| https://arcadedb.com/#getting-started[instructions] for running a docker image if you prefer. |
| |
| Not surprisingly, some usage of ArcadeDB is very similar to OrientDB. Initialization |
| changes slightly: |
| |
| [source,groovy] |
| ---- |
| var factory = new DatabaseFactory("swimming") |
| |
| try (var db = factory.create()) { |
|     db.transaction { -> |
|         db.schema.with { |
|             createVertexType('Swimmer') |
|             createVertexType('Swim') |
|             createEdgeType('swam') |
|             createEdgeType('supersedes') |
|         } |
|         // ... other code goes here ... |
|     } |
| } |
| ---- |
| |
| Defining the existing record information is done as follows: |
| |
| [source,groovy] |
| ---- |
| var es = db.newVertex('Swimmer') |
| es.set(name: 'Emily Seebohm', country: '🇦🇺').save() |
| |
| var swim1 = db.newVertex('Swim') |
| swim1.set(at: 'London 2012', result: 'First', event: 'Heat 4', time: 58.23).save() |
| swim1.newEdge('swam', es, false).save() |
| ---- |
| |
| Accessing the information can be done like this: |
| |
| [source,groovy] |
| ---- |
| var (name, country) = ['name', 'country'].collect { es.get(it) } |
| var (at, event, time) = ['at', 'event', 'time'].collect { swim1.get(it) } |
| println "$name from $country swam a time of $time in $event at the $at Olympics" |
| ---- |
| |
| ArcadeDB supports multiple query languages. The SQL-like language mirrors the OrientDB offering. |
| Here are our three now familiar queries: |
| |
| [source,groovy] |
| ---- |
| var results = db.query('SQL', ''' |
| SELECT expand(outV()) FROM (SELECT expand(outE('supersedes')) FROM Swim WHERE event = 'Final') |
| ''') |
| assert results*.toMap().time.toSet() == [57.47, 57.33] as Set |
| |
| results = db.query('SQL', "SELECT expand(outV()) FROM (SELECT expand(outE('supersedes')) FROM Swim WHERE event.left(4) = 'Heat')") |
| assert results*.toMap().at.toSet() == ['Tokyo 2021', 'London 2012'] as Set |
| |
| results = db.query('SQL', "SELECT country FROM ( SELECT expand(out('swam')) FROM Swim WHERE at = 'Paris 2024' )") |
| assert results*.toMap().country.toSet() == ['🇺🇸', '🇦🇺'] as Set |
| ---- |
| |
| Here is our traversal example: |
| |
| [source,groovy] |
| ---- |
| results = db.query('SQL', "TRAVERSE out('supersedes') FROM :swim", swim1) |
| results.each { |
|     if (it.toElement() != swim1) { |
|         var props = it.toMap() |
|         println "$props.at $props.event" |
|     } |
| } |
| ---- |
| |
| ArcadeDB also supports Cypher queries (like Neo4j). The query for record times in finals, |
| written in the Cypher dialect, looks like this: |
| |
| [source,groovy] |
| ---- |
| results = db.query('cypher', ''' |
| MATCH (s1:Swim {event: 'Final'})-[:supersedes]->(s2:Swim) |
| RETURN s1.time AS time |
| ''') |
| assert results*.toMap().time.toSet() == [57.47, 57.33] as Set |
| ---- |
| |
| ArcadeDB also supports Gremlin queries. The query for record times in finals, |
| written in the Gremlin dialect, looks like this: |
| |
| [source,groovy] |
| ---- |
| results = db.query('gremlin', ''' |
| g.V().has('event', 'Final').as('ev').out('supersedes').select('ev').values('time') |
| ''') |
| assert results*.toMap().result.toSet() == [57.47, 57.33] as Set |
| ---- |
| |
| Rather than just passing a Gremlin query as a String, we can get full access to the TinkerPop environment |
| as this example shows: |
| |
| [source,groovy] |
| ---- |
| try (final ArcadeGraph graph = ArcadeGraph.open("swimming")) { |
|     var recordTimesInFinals = graph.traversal().V().has('event', 'Final').as('ev').out('supersedes') |
|         .select('ev').values('time').toSet() |
|     assert recordTimesInFinals == [57.47, 57.33] as Set |
| } |
| ---- |
| |
| ArcadeDB also supports a Studio Web-UI. Here is an example of using Studio |
| with a query that looks at all nodes and edges associated with the Tokyo 2021 Olympics: |
| |
| image:img/ArcadeStudio.png[ArcadeStudio] |
| |
| |
| == TuGraph |
| |
| Next, we'll look at |
| https://tugraph.tech/[TuGraph]. |
| |
| image:https://mdn.alipayobjects.com/huamei_qcdryc/afts/img/A*AbamQ5lxv0IAAAAAAAAAAAAADgOBAQ/original[tugraph logo,width=40%] |
| |
| We used the Community Edition using a docker image as outlined in the |
| https://tugraph-db.readthedocs.io/en/latest/5.installation%26running/3.docker-deployment.html[documentation] and |
| https://blog.csdn.net/qq_35721299/article/details/128076604[here]. |
| TuGraph's claim to fame is high performance. Certainly, that isn't really |
| needed for this example, but let's have a play anyway. |
| |
| There are a few ways to talk to TuGraph. We'll use the recommended Neo4j |
| https://tugraph-db.readthedocs.io/en/latest/7.client-tools/5.bolt-client.html[Bolt client] |
| which uses the Bolt protocol to talk to the TuGraph server. |
| |
| We'll create a session using that client plus a helper `run` method to invoke our queries. |
| |
| [source,groovy] |
| ---- |
| var authToken = AuthTokens.basic("admin", "73@TuGraph") |
| var driver = GraphDatabase.driver("bolt://localhost:7687", authToken) |
| var session = driver.session(SessionConfig.forDatabase("default")) |
| var run = { String s -> session.run(s) } |
| ---- |
| |
| Next, we set up our database including providing a schema for our nodes, edges and properties. |
| One point of difference with earlier examples is that TuGraph needs a primary key for each vertex. |
| Hence, we added the `id` for our `Swim` vertex. |
| |
| [source,groovy] |
| ---- |
| ''' |
| CALL db.dropDB() |
| CALL db.createVertexLabel('Swimmer', 'name', 'name', 'STRING', false, 'country', 'STRING', false) |
| CALL db.createVertexLabel('Swim', 'id', 'id', 'INT32', false, 'event', 'STRING', false, 'result', 'STRING', false, 'at', 'STRING', false, 'time', 'FLOAT', false) |
| CALL db.createEdgeLabel('swam','[["Swimmer","Swim"]]') |
| CALL db.createEdgeLabel('supersedes','[["Swim","Swim"]]') |
| '''.trim().readLines().each{ run(it) } |
| ---- |
| |
| With these defined, we can create our swim information: |
| |
| [source,groovy] |
| ---- |
| run '''create |
| (es:Swimmer {name: 'Emily Seebohm', country: '🇦🇺'}), |
| (swim1:Swim {event: 'Heat 4', result: 'First', time: 58.23, at: 'London 2012', id:1}), |
| (es)-[:swam]->(swim1), |
| (km:Swimmer {name: 'Kylie Masse', country: '🇨🇦'}), |
| (swim2:Swim {event: 'Heat 4', result: 'First', time: 58.17, at: 'Tokyo 2021', id:2}), |
| (km)-[:swam]->(swim2), |
| (swim3:Swim {event: 'Final', result: '🥈', time: 57.72, at: 'Tokyo 2021', id:3}), |
| (km)-[:swam]->(swim3), |
| (swim2)-[:supersedes]->(swim1), |
| (rs:Swimmer {name: 'Regan Smith', country: '🇺🇸'}), |
| (swim4:Swim {event: 'Heat 5', result: 'First', time: 57.96, at: 'Tokyo 2021', id:4}), |
| (rs)-[:swam]->(swim4), |
| (swim5:Swim {event: 'Semifinal 1', result: 'First', time: 57.86, at: 'Tokyo 2021', id:5}), |
| (rs)-[:swam]->(swim5), |
| (swim6:Swim {event: 'Final', result: '🥉', time: 58.05, at: 'Tokyo 2021', id:6}), |
| (rs)-[:swam]->(swim6), |
| (swim7:Swim {event: 'Final', result: '🥈', time: 57.66, at: 'Paris 2024', id:7}), |
| (rs)-[:swam]->(swim7), |
| (swim8:Swim {event: 'Relay leg1', result: 'First', time: 57.28, at: 'Paris 2024', id:8}), |
| (rs)-[:swam]->(swim8), |
| (swim4)-[:supersedes]->(swim2), |
| (kmk:Swimmer {name: 'Kaylee McKeown', country: '🇦🇺'}), |
| (swim9:Swim {event: 'Heat 6', result: 'First', time: 57.88, at: 'Tokyo 2021', id:9}), |
| (kmk)-[:swam]->(swim9), |
| (swim9)-[:supersedes]->(swim4), |
| (swim5)-[:supersedes]->(swim9), |
| (swim10:Swim {event: 'Final', result: '🥇', time: 57.47, at: 'Tokyo 2021', id:10}), |
| (kmk)-[:swam]->(swim10), |
| (swim10)-[:supersedes]->(swim5), |
| (swim11:Swim {event: 'Final', result: '🥇', time: 57.33, at: 'Paris 2024', id:11}), |
| (kmk)-[:swam]->(swim11), |
| (swim11)-[:supersedes]->(swim10), |
| (swim8)-[:supersedes]->(swim11), |
| (kb:Swimmer {name: 'Katharine Berkoff', country: '🇺🇸'}), |
| (swim12:Swim {event: 'Final', result: '🥉', time: 57.98, at: 'Paris 2024', id:12}), |
| (kb)-[:swam]->(swim12) |
| ''' |
| ---- |
| |
| TuGraph uses Cypher style queries. Here are our three standard queries: |
| |
| [source,groovy] |
| ---- |
| assert run(''' |
| MATCH (sr:Swimmer)-[:swam]->(sm:Swim {at: 'Paris 2024'}) |
| RETURN DISTINCT sr.country AS country |
| ''')*.get('country')*.asString().toSet() == ['🇺🇸', '🇦🇺'] as Set |
| |
| assert run(''' |
| MATCH (s:Swim) |
| WHERE s.event STARTS WITH 'Heat' |
| RETURN DISTINCT s.at AS at |
| ''')*.get('at')*.asString().toSet() == ["London 2012", "Tokyo 2021"] as Set |
| |
| assert run(''' |
| MATCH (s1:Swim {event: 'Final'})-[:supersedes]->(s2:Swim) |
| RETURN s1.time as time |
| ''')*.get('time')*.asDouble().toSet() == [57.47d, 57.33d] as Set |
| ---- |
| |
| Here is our traversal query: |
| |
| [source,groovy] |
| ---- |
| run(''' |
| MATCH (s1:Swim)-[:supersedes*1..10]->(s2:Swim {at: 'London 2012'}) |
| RETURN s1.at as at, s1.event as event |
| ''')*.asMap().each{ println "$it.at $it.event" } |
| ---- |
| |
| == Apache HugeGraph |
| |
| Our final technology is Apache |
| https://hugegraph.apache.org/[HugeGraph]. |
| It is a project undergoing incubation at the ASF. |
| |
| image:https://www.apache.org/logos/res/hugegraph/hugegraph.png[hugegraph logo,50%] |
| |
| HugeGraph's claim to fame is the ability to support very large graph databases. |
| Again, not really needed for this example, but it should be fun to play with. |
| We used a docker image as described in the |
| https://hugegraph.apache.org/docs/quickstart/hugegraph-server/#31-use-docker-container-convenient-for-testdev[documentation]. |
| |
| Setup involved creating a client for talking to the server (running on the docker image): |
| |
| [source,groovy] |
| ---- |
| var client = HugeClient.builder("http://localhost:8080", "hugegraph").build() |
| ---- |
| |
| Next, we defined the schema for our graph database: |
| |
| [source,groovy] |
| ---- |
| var schema = client.schema() |
| schema.propertyKey("num").asInt().ifNotExist().create() |
| schema.propertyKey("name").asText().ifNotExist().create() |
| schema.propertyKey("country").asText().ifNotExist().create() |
| schema.propertyKey("at").asText().ifNotExist().create() |
| schema.propertyKey("event").asText().ifNotExist().create() |
| schema.propertyKey("result").asText().ifNotExist().create() |
| schema.propertyKey("time").asDouble().ifNotExist().create() |
| |
| schema.vertexLabel('Swimmer') |
|     .properties('name', 'country') |
|     .primaryKeys('name') |
|     .ifNotExist() |
|     .create() |
| |
| schema.vertexLabel('Swim') |
|     .properties('num', 'at', 'event', 'result', 'time') |
|     .primaryKeys('num') |
|     .ifNotExist() |
|     .create() |
| |
| schema.edgeLabel("swam") |
|     .sourceLabel("Swimmer") |
|     .targetLabel("Swim") |
|     .ifNotExist() |
|     .create() |
| |
| schema.edgeLabel("supersedes") |
|     .sourceLabel("Swim") |
|     .targetLabel("Swim") |
|     .ifNotExist() |
|     .create() |
| |
| schema.indexLabel("SwimByEvent") |
|     .onV("Swim") |
|     .by("event") |
|     .secondary() |
|     .ifNotExist() |
|     .create() |
| |
| schema.indexLabel("SwimByAt") |
|     .onV("Swim") |
|     .by("at") |
|     .secondary() |
|     .ifNotExist() |
|     .create() |
| ---- |
| |
| While, technically, HugeGraph supports composite keys, |
| it seemed to work better when the `Swim` vertex had a single primary key. |
| We used a `num` field which simply assigns a number to each swim. |
| |
| We use the graph API for creating nodes and edges: |
| |
| [source,groovy] |
| ---- |
| var g = client.graph() |
| |
| var es = g.addVertex(T.LABEL, 'Swimmer', 'name', 'Emily Seebohm', 'country', '🇦🇺') |
| var swim1 = g.addVertex(T.LABEL, 'Swim', 'at', 'London 2012', 'event', 'Heat 4', 'time', 58.23, 'result', 'First', 'num', NUM++) |
| es.addEdge('swam', swim1) |
| ---- |
| |
| Here is how to print out some node information: |
| |
| [source,groovy] |
| ---- |
| var (name, country) = ['name', 'country'].collect { es.property(it) } |
| var (at, event, time) = ['at', 'event', 'time'].collect { swim1.property(it) } |
| println "$name from $country swam a time of $time in $event at the $at Olympics" |
| ---- |
| |
| We now create the other swimmer and swim nodes and edges. |
| |
| Gremlin queries are invoked through a gremlin helper object. |
| Our three standard queries look like this: |
| |
| [source,groovy] |
| ---- |
| var gremlin = client.gremlin() |
| |
| var successInParis = gremlin.gremlin(''' |
| g.V().out('swam').has('Swim', 'at', 'Paris 2024').in().values('country').dedup().order() |
| ''').execute() |
| assert successInParis.data() == ['🇦🇺', '🇺🇸'] |
| |
| var recordSetInHeat = gremlin.gremlin(''' |
| g.V().hasLabel('Swim') |
| .filter { it.get().property('event').value().startsWith('Heat') } |
| .values('at').dedup().order() |
| ''').execute() |
| assert recordSetInHeat.data() == ['London 2012', 'Tokyo 2021'] |
| |
| var recordTimesInFinals = gremlin.gremlin(''' |
| g.V().has('Swim', 'event', 'Final').as('ev').out('supersedes').select('ev').values('time').order() |
| ''').execute() |
| assert recordTimesInFinals.data() == [57.33, 57.47] |
| ---- |
| |
| Here is our traversal example: |
| |
| [source,groovy] |
| ---- |
| println "Olympic records after ${swim1.properties().subMap(['at', 'event']).values().join(' ')}: " |
| gremlin.gremlin(''' |
| g.V().has('at', 'London 2012').repeat(__.in('supersedes')).emit().values('at', 'event') |
| ''').execute().data().collate(2).each { a, e -> |
|     println "$a $e" |
| } |
| ---- |
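| |
| The `collate(2)` call is plain Groovy rather than anything HugeGraph specific. |
| Since `values('at', 'event')` yields one flat list of alternating values, |
| `collate(2)` regroups it into pairs that the two-parameter closure can destructure. |
| Here is a small standalone sketch, using a hand-written list in place of the server response: |
| |
| [source,groovy] |
| ---- |
| // Hand-written stand-in for the flat list returned by the Gremlin query |
| var flat = ['Tokyo 2021', 'Heat 4', 'Tokyo 2021', 'Heat 5'] |
| |
| // collate(2) splits the list into sublists of two elements each |
| var pairs = flat.collate(2) |
| assert pairs == [['Tokyo 2021', 'Heat 4'], ['Tokyo 2021', 'Heat 5']] |
| |
| // A two-parameter closure receives each pair unpacked as (at, event) |
| var lines = pairs.collect { a, e -> "$a $e".toString() } |
| assert lines == ['Tokyo 2021 Heat 4', 'Tokyo 2021 Heat 5'] |
| ---- |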
| |
| == Static typing |
| |
| Another interesting topic is improving type checking for graph database code. |
| Groovy supports very dynamic styles of code through to "stronger-than-Java" type checking. |
| |
| Some graph database technologies offer only a schema-free experience |
| to allow your data models to _"adapt and change easily with your business"_. |
| Others allow a schema to be defined with varying degrees of information. |
| Groovy's dynamic capabilities make it particularly suited for writing code |
| that will work easily even if you change your data model on the fly. |
| However, if you prefer to add further type checking into your code, Groovy has |
| options for that too. |
| |
| Let's recap on what schema-like capabilities our examples made use of: |
| |
| * Apache TinkerPop: used dynamic vertex labels and edges |
| * Neo4j: used dynamic vertex labels but required edges to be defined by an enum |
| * Apache AGE: although not shown in this post, defined vertex labels, edges were dynamic |
| * OrientDB: defined vertex and edge classes |
| * ArcadeDB: defined vertex and edge types |
| * TuGraph: defined vertex and edge labels, vertex labels had typed properties, edge labels typed with from/to vertex labels |
| * Apache HugeGraph: defined vertex and edge labels, vertex labels had typed properties, edge labels typed with from/to vertex labels |
| |
| The good news is that, where we chose very dynamic options, we could easily add new |
| vertices and edges, e.g.: |
| |
| [source,groovy] |
| ---- |
| var mb = g.addV('Coach').property(name: 'Michael Bohl').next() |
| mb.coaches(kmk) |
| ---- |
| |
| For the examples which used schema-like capabilities, we'd need to declare the additional |
| vertex type `Coach` and edge `coaches` before we could define the new node and edge. |
| Let's explore just a few options where Groovy capabilities could make it easier to deal |
| with typing. |
| |
| We previously used `insertSwimmer` and `insertSwim` helper methods. We could supply types |
| for those parameters even where our underlying database technology wasn't using them. |
| That would at least capture typing errors when inserting information into our graph. |
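| |
| As a minimal sketch of that idea, the helpers below declare their parameter types. |
| The `SwimGraph` class and its in-memory lists are hypothetical stand-ins for a real |
| graph database, used only so the example runs on its own. In dynamic Groovy, a mismatched |
| argument fails when the call is made; annotating the calling code with `@TypeChecked` |
| would report the same mistake at compile time instead. |
| |
| [source,groovy] |
| ---- |
| class SwimGraph { |
|     // Hypothetical stand-in for a graph database: vertices and edges are plain maps |
|     List<Map> vertices = [] |
|     List<Map> edges = [] |
| |
|     Map insertSwimmer(String name, String country) { |
|         var swimmer = [label: 'Swimmer', name: name, country: country] |
|         vertices << swimmer |
|         swimmer |
|     } |
| |
|     Map insertSwim(String at, String event, BigDecimal time, String result, Map swimmer) { |
|         var swim = [label: 'Swim', at: at, event: event, time: time, result: result] |
|         vertices << swim |
|         edges << [from: swimmer, type: 'swam', to: swim] |
|         swim |
|     } |
| } |
| |
| var graph = new SwimGraph() |
| var km = graph.insertSwimmer('Kylie Masse', '🇨🇦') |
| var swim2 = graph.insertSwim('Tokyo 2021', 'Heat 4', 58.17, 'First', km) |
| assert swim2.time == 58.17 |
| |
| // Passing the time as a String does not match the typed signature |
| try { |
|     graph.insertSwim('Tokyo 2021', 'Final', 'fast', '🥈', km) |
|     assert false, 'expected a MissingMethodException' |
| } catch (MissingMethodException expected) { |
|     assert expected.method == 'insertSwim' |
| } |
| ---- |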
| |
| We could use a richly-typed domain using Groovy classes or records. We could generate |
| the necessary method calls to create the schema/labels and then populate the database. |
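| |
| Here is one way that could look, as a sketch only. The `Swimmer` and `Swim` records and the |
| `toCypherNode` helper are hypothetical names introduced for illustration. The helper relies on |
| the `toMap()` method Groovy provides for records and derives the label from the record's class name. |
| It assumes property values contain no single quotes, so it does no escaping. |
| |
| [source,groovy] |
| ---- |
| // Hypothetical domain records mirroring the properties used in this post |
| record Swimmer(String name, String country) {} |
| record Swim(int id, String at, String event, String result, double time) {} |
| |
| // Render a record as a Cypher node pattern: (:Label {key: value, ...}) |
| String toCypherNode(r) { |
|     var props = r.toMap().collect { k, v -> |
|         // Quote text values, leave numbers as they are |
|         var rendered = v instanceof CharSequence ? "'" + v + "'" : v.toString() |
|         k + ': ' + rendered |
|     }.join(', ') |
|     '(:' + r.getClass().simpleName + ' {' + props + '})' |
| } |
| |
| var es = new Swimmer('Emily Seebohm', '🇦🇺') |
| var swim1 = new Swim(1, 'London 2012', 'Heat 4', 'First', 58.23d) |
| |
| assert toCypherNode(es) == "(:Swimmer {name: 'Emily Seebohm', country: '🇦🇺'})" |
| assert toCypherNode(swim1) == |
|     "(:Swim {id: 1, at: 'London 2012', event: 'Heat 4', result: 'First', time: 58.23})" |
| ---- |
| |
| The generated text could be passed to a Cypher-speaking database in a `create` statement, |
| while the record constructors give us typed checks on the data before it gets that far. |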
| |
| Alternatively, we can leave the code in its dynamic form and make use of Groovy's |
| extensible type checking system. We could write an extension which |
| fails compilation if any invalid edge or vertex definitions were detected. |
| For our `coaches` example above, the statement `mb.coaches(kmk)` would pass compilation, |
| but if we had incorrect vertices for that edge relationship, compilation would fail, |
| e.g. for the statement `swim1.coaches(mb)`, we'd get the following error: |
| |
| ---- |
| [Static type checking] - Invalid edge - expected: <Coach>.coaches(<Swimmer>) |
| but found: <Swim>.coaches(<Coach>) |
| @ line 20, column 5. |
| swim1.coaches(mb) |
| ^ |
| |
| 1 error |
| ---- |
| |
| We won't show the code for this; it's in the GitHub repo. It is hard-coded to |
| know about the `coaches` relationship. Ideally, we'd combine extensible type checking |
| with the previously mentioned richly-typed model, and we could populate both the |
| information that our type checker needs and any label/schema information our |
| graph database would need. |
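| |
| To give a feel for the rule being enforced, here is a runtime sketch of a table-driven edge check. |
| It is not the type checking extension from the repo; the `edgeRules` map and `checkEdge` closure |
| are hypothetical names, and the check runs when called rather than during compilation. |
| The rules for `swam` and `supersedes` follow the edge constraints used earlier, and the `coaches` |
| rule follows the error message above. |
| |
| [source,groovy] |
| ---- |
| // Hypothetical rule table: edge name -> [source label, target label] |
| var edgeRules = [ |
|     swam      : ['Swimmer', 'Swim'], |
|     supersedes: ['Swim', 'Swim'], |
|     coaches   : ['Coach', 'Swimmer'] |
| ] |
| |
| // Returns null for a valid edge, otherwise a description of the problem |
| var checkEdge = { String from, String edge, String to -> |
|     var expected = edgeRules[edge] |
|     if (expected == null) { |
|         return 'Unknown edge - ' + edge |
|     } |
|     if (expected != [from, to]) { |
|         return 'Invalid edge - expected: <' + expected[0] + '>.' + edge + '(<' + expected[1] + '>)' + |
|             ' but found: <' + from + '>.' + edge + '(<' + to + '>)' |
|     } |
|     null |
| } |
| |
| assert checkEdge('Coach', 'coaches', 'Swimmer') == null |
| assert checkEdge('Swim', 'coaches', 'Coach') == |
|     'Invalid edge - expected: <Coach>.coaches(<Swimmer>) but found: <Swim>.coaches(<Coach>)' |
| ---- |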
| |
| Anyway, these are just a few of the options Groovy gives you. Why not have fun trying out some |
| ideas yourself! |
| |
| .Update history |
| **** |
| *02/Sep/2024*: Initial version. + |
| *18/Sep/2024*: Updated for: latest Groovy 5 version, TuGraph 4.5.0 with thanks to Florian (GitHub: fanzhidongyzby) and Richard Bian (x: @RichSFO), TinkerPop tweaks with thanks to Stephen Mallette (ASF: spmallette). + |
| *11/Dec/2024*: Updated for: latest Groovy 5 version, TuGraph 4.5.1, HugeGraph 1.5.0, ArcadeDB 24.11.1, Gremlin 3.7.3, Neo4J 5.26.0, OrientDB 3.2.36. + |
| **** |