Why invest in a native graph database?

5 min read · Feb 11, 2024

Disclaimer: I spend a lot of my time working for Neo4j. This post is however my own opinion and does not reflect that of Neo4j (or maybe it does but I didn’t check what the Neo4j opinion is).

Nativeness

Question — What is a native graph database?
Answer — It is a database that is implemented as a graph (a connected structure) both in memory and when persisted. The translation step between how it’s used and how it’s stored is minimal.

Yggdrasil. Tree of life. As above, as below.

Let nobody tell you graph databases are a recent invention. They are almost as old as the concept of a database itself. I worked as an IDMS administrator on a mainframe in the early 1990s. IDMS is a network-oriented (= graph) database. Network databases can be traced back to the CODASYL workgroup and the year 1969. That’s the year we first put a man on the moon! That wasn’t me though, I am not that old.

Use case

Question — What is the quintessential graph use case?
Answer — Reacting in real time to a complex problem where connections matter.

That’s it. No more, no less. That’s quite a lot though:

Recommendations. Given the attention span of the average customer, you had better be fast and accurate.
Fraud detection. Claiming things back after the fact is a tedious process. You want to act in the moment or even prevent it from happening at all.
Routing and rerouting. While the transport is moving. No room for error and no time either.

Just three lines — and I’m sure I’m missing a few generic use cases — but they represent hundreds if not thousands of specific use cases. Allow me to walk you through one example.

I can work out a traveling salesman problem with a pencil and a piece of paper. Given enough time I don’t even need a calculator let alone a computer with a database running on top of it.

And now we have our salesman happily sipping his complimentary drink in first class on one of the journey’s legs. All is well.

Except for that freak snow storm that has just moved in from over the Atlantic and is now covering New York and all of the surrounding airports. All possible flight plans are, quite literally, up in the air.

If the salesman had to wait for me to work out the solution as I did before, that drink would be his last. That plane needs rerouting now. The number of variables and possible paths is enormous. Several of them will end in disaster.
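The rerouting itself boils down to a shortest-path search that ignores whatever has just become unavailable. Below is a minimal sketch in plain Python, assuming Dijkstra's algorithm and a toy network; the airports were picked for illustration and the durations in `LEGS` are made up, as is the `quickest_route` helper. A real system would deal with far more variables than flight time.

```python
import heapq

# Hypothetical flight legs with made-up durations in minutes (illustration only).
LEGS = {
    "ORD": {"JFK": 130, "PHL": 125},
    "JFK": {"BOS": 60},
    "PHL": {"BOS": 80},
    "BOS": {},
}


def quickest_route(legs, start, goal, closed=frozenset()):
    """Dijkstra's shortest path that never enters an airport listed in `closed`."""
    queue = [(0, start, [start])]  # (minutes so far, current airport, path taken)
    visited = set()
    while queue:
        minutes, airport, path = heapq.heappop(queue)
        if airport == goal:
            return minutes, path
        if airport in visited:
            continue
        visited.add(airport)
        for nxt, leg_minutes in legs.get(airport, {}).items():
            if nxt not in closed and nxt not in visited:
                heapq.heappush(queue, (minutes + leg_minutes, nxt, path + [nxt]))
    return None  # every route is blocked
```

Calling `quickest_route(LEGS, "ORD", "BOS")` picks the connection through JFK. Passing `closed={"JFK"}` makes the same call fall back to the slower connection through PHL, and closing both returns `None`.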

Analytics

Question — What about analytical use cases then?
Answer — There are analytical use cases that have a graphy nature. Algorithms such as Louvain, PageRank and many others are based on graph theory and benefit from running on a native graph database. But given enough time you can also solve these problems by other means.

This goes back to what Turing completeness means. I’m often asked if something else running on top of something else could do the same as I’m doing with some Cypher query on top of a native graph database. Yes. Of course. How much time do you have? I can write you a mainframe PL/I program that does it, PL/I is Turing complete. Is that what you want?


To me the main benefit of running analytical graph use cases on a native graph database is the feedback loop. You can write the results straight into your operational graph, enhancing it with new information! A recommendation based on what others have bought is fast but if I know through analytics who is similar to you not only is the recommendation even faster, it will also be more accurate.
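The feedback loop can be illustrated with a minimal sketch in plain Python. It assumes a toy in-memory graph (the `graph` dictionary, its `follows` and `props` keys and the node names are all hypothetical) and a textbook power-iteration PageRank rather than any particular database's implementation. The point is the last loop: the analytical result is stored on the same nodes the operational queries read.

```python
def pagerank(out_links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict of node -> list of target nodes."""
    nodes = list(out_links)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for node, targets in out_links.items():
            # A node without outgoing links spreads its rank over every node.
            receivers = targets or nodes
            share = damping * rank[node] / len(receivers)
            for target in receivers:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical operational graph: each node has properties and outgoing links.
graph = {
    "alice": {"props": {}, "follows": ["bob", "carol"]},
    "bob": {"props": {}, "follows": ["carol"]},
    "carol": {"props": {}, "follows": ["alice"]},
}

scores = pagerank({name: node["follows"] for name, node in graph.items()})

# The feedback loop: write the analytical result back onto the operational nodes.
for name, score in scores.items():
    graph[name]["props"]["pagerank"] = score
```

After this runs, a later operational query can read `graph["carol"]["props"]["pagerank"]` directly instead of recomputing anything.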

Spatial — Full Text — Vector

Question — Is a vector index going to bring in Skynet?
Answer — No.

An efficient graph query can be broken down in two parts:

Find your starting points
Walk pointers from there
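The two steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the `Node` class, the `index` dictionary and the `co_actors` helper are hypothetical names, and a real native graph database stores and traverses its records quite differently under the hood.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    name: str
    out: list = field(default_factory=list)  # direct references to neighbouring nodes


# Toy data for illustration.
tom = Node("Actor", "Tom Hanks")
meg = Node("Actor", "Meg Ryan")
movie = Node("Movie", "Sleepless in Seattle")
tom.out.append(movie)
meg.out.append(movie)
movie.out.extend([tom, meg])

# Step 1 relies on an index: (label, name) -> node, so the start is one lookup.
index = {(n.label, n.name): n for n in (tom, meg, movie)}


def co_actors(index, actor_name):
    """Find the starting node through the index, then only follow references."""
    start = index[("Actor", actor_name)]  # find your starting point
    found = set()
    for film in start.out:  # walk pointers: actor -> movie
        for other in film.out:  # walk pointers: movie -> actor
            if other is not start:
                found.add(other.name)
    return found
```

Once the starting node is found, `co_actors(index, "Tom Hanks")` never consults the index again. Every further hop is a direct reference from one node to the next.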

This is super efficient as computers and pointers are a match made in a computer manufacturing plant but it stands or falls with the first part. How quickly can you find your starting points? How relevant are they? It’s always about indexes:

A label in a property graph is the most basic index you can imagine, it classifies the nodes. Find me all the Actor nodes.
A regular index combines a label with a property value. Find me the Actor with name Tom Hanks.
A spatial index finds neighbouring nodes based on coordinates. Find me actors that were born near where Tom Hanks was born (37.9775° N, 122.0312° W).
A full text index can find nodes based on exact information. Given we’ve indexed the bio of Tom Hanks, it could find him with — for example — the query oscar best actor 1995.
A vector index is not unlike a spatial index. It finds neighbouring nodes. The coordinates are not latitude and longitude, however, but an embedding created from unstructured text (the same thing that feeds a full text index). It has way more context than a full text index but may struggle with exact information. It might find Bill Murray, Robin Williams, Jimmy Stewart and Jeff Bridges to be like Tom Hanks.
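To make the "neighbouring nodes" idea of a vector index concrete, here is a minimal sketch in plain Python that ranks nodes by cosine similarity. Everything in it is an assumption for illustration: the three-dimensional vectors in `EMBEDDINGS` are made up (real embeddings come from a model and have hundreds of dimensions or more), "Hypothetical Actor X" is a placeholder, and the brute-force scan in `nearest` stands in for the approximate nearest-neighbour structures that real vector indexes typically use.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up embeddings, for illustration only.
EMBEDDINGS = {
    "Tom Hanks": [0.9, 0.8, 0.1],
    "Jimmy Stewart": [0.85, 0.75, 0.2],
    "Bill Murray": [0.7, 0.9, 0.3],
    "Hypothetical Actor X": [0.1, 0.2, 0.95],
}


def nearest(embeddings, name, k=2):
    """Brute-force scan: score every other node, return the k most similar names."""
    query = embeddings[name]
    scored = [
        (cosine_similarity(query, vector), other)
        for other, vector in embeddings.items()
        if other != name
    ]
    return [other for _, other in sorted(scored, reverse=True)[:k]]
```

With these made-up vectors, `nearest(EMBEDDINGS, "Tom Hanks")` ranks Jimmy Stewart ahead of Bill Murray and leaves the placeholder actor out. Swapping the vectors for latitude and longitude and the similarity for a distance gives the spatial variant of the same lookup.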


At the end of the day however, these find you starting points. If there’s nothing connected to the results or if the query isn’t complex, there is not really a point in using a native graph database.

Conclusion

If you have one of the thousands of use cases where a native graph database can help, invest in one.

Finding a use case is not hard. Almost every organisation on the planet has an IT network. Often a complex one. We already established that network and graph are synonyms. While these networks may not be the core business of said organisation, the organisation often depends on them.
Use case found.

By all means then extend it to your core business. Add some analytics on the side. Sprinkle with a touch of AI. Bon appétit!