Breaking the Limits of Traditional Cyber Threat Detection with Memgraph

By Josip Mrden

9 min read | February 1, 2023

Using a relational database on a network-like representation of data isn't the most performant choice when the number of data points in your system runs into the millions. The tabular form is useful in many use cases, but unfortunately not for the multiple chains of events that form an attack, from its source (a file downloaded from an e-mail or via a URL, transferred on a USB stick, etc.) to its target (the execution of an .exe file, which can be a trojan horse, virus, ransomware, etc.).

In this article, we'll take a closer look at why Memgraph is a great choice for powering your cybersecurity solutions. Compared to other graph databases, it is more performant, scalable, intuitive, and easy to use.

Stop wasting time waiting for query results

Analyzing chains of events by hopping between data points has a huge impact on database performance. The more hops a query makes across the security dataset, the more joins a relational database has to perform. Graph databases mitigate this issue by representing data as a graph, which is well suited to exploring interactions between entities.
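To make the hop-versus-join argument concrete, here is a minimal Python sketch (plain Python, not Memgraph code) on a hypothetical edge list of security events. The relational-style function treats the edges as a table and performs one join (here, a full scan of the table) per hop. The graph-style function builds an adjacency list once, at ingestion time, so each hop only follows the neighbours stored with the current nodes.

```python
from collections import defaultdict

# Hypothetical security events stored as (source, target) rows.
EDGES = [
    ("email_1", "zip_1"),
    ("zip_1", "invoice.exe"),
    ("invoice.exe", "process_42"),
    ("usb_stick", "setup.exe"),
]

def k_hops_relational(edges, start, k):
    """One join per hop: scan the whole edge table for rows whose source is in the frontier."""
    frontier = {start}
    for _ in range(k):
        # Roughly: SELECT target FROM edges JOIN frontier ON edges.source = frontier.node
        frontier = {dst for (src, dst) in edges if src in frontier}
    return frontier

def build_adjacency(edges):
    """Done once, at ingestion time: every node keeps a list of its outgoing neighbours."""
    adjacency = defaultdict(list)
    for src, dst in edges:
        adjacency[src].append(dst)
    return adjacency

def k_hops_graph(adjacency, start, k):
    """Each hop only touches the neighbours of the current frontier, with no table scan."""
    frontier = {start}
    for _ in range(k):
        frontier = {dst for node in frontier for dst in adjacency.get(node, [])}
    return frontier

adjacency = build_adjacency(EDGES)
print(k_hops_relational(EDGES, "email_1", 3))  # {'process_42'}
print(k_hops_graph(adjacency, "email_1", 3))   # {'process_42'}
```

Both functions return the same answer. The difference is in cost: the relational version rescans the entire table at every hop, while the adjacency version only does work proportional to the neighbours it actually visits.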

However, some solutions are more performant than others, and some architectural decisions made early on (such as the choice of programming language or storage engine, or running the database on a single machine vs. multiple machines) may come back to haunt you in the years that follow.

In Memgraph, we set out to build a solution founded on performance, so we chose to implement our database in C++ and to keep the data in memory, ensuring performant queries and a low memory footprint. Multi-hop queries and joins are no longer a problem: Memgraph performs well when benchmarked against other databases, and it is continuously being improved.


The end result is a database core that executes queries hopping across the file structure, ZIPs, malware, and emails with ease, without endless waiting for results. Making sure your threat analysts can work efficiently is one of the crucial factors in keeping your system secure.

Explore your cyber security graph storage and answer your deepest questions from the data

Relational databases and the tabular form are good for aggregation, while wide-column databases transpose rows of data into columns and apply compression algorithms to reduce their size. But they all fall short when data needs to be connected and JOINed.

When highly connected data is stored as a graph, the JOINs are effectively done on the fly, during ingestion, because relationships are stored together with the data. The Cypher query language, which Memgraph also uses, enables the ingestion and manipulation of graph storage and makes queries with a high number of hops through the data network easy to express.

In cybersecurity, this translates to tracking a cyber attack from its source to its target without compromising performance. Moreover, thanks to variable-length path matching and pathfinding algorithms, Cypher lets you reach the data you need even without knowing the graph schema. This allows for greater expressiveness and simpler queries than explicitly chaining a number of JOINs to get to the target resource in the database.

Let's look at the following two queries, which give an overview of how much simpler it is to query graphs when doing path traversals.

If you are wondering about dependencies in your code that have vulnerabilities, you can run the following query in Cypher:

MATCH p=(n {name: "root"})-[*]-(m)-->(v:Vulnerability) RETURN m, v;

The SQL query would look something like this:

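WITH RECURSIVE dependencies_table AS (
-- Anchor: the direct dependencies of the root package
SELECT dp.name, dp.dependency FROM dependencies dp WHERE dp.name = 'root'
UNION
-- Recursive step: dependencies of packages found so far (same two columns as the anchor)
SELECT dp2.name, dp2.dependency FROM dependencies dp2
JOIN dependencies_table dp ON dp2.name = dp.dependency
)
-- Match every transitive dependency against the known vulnerabilities
SELECT *
FROM dependencies_table dp
JOIN vulnerabilities v
ON v.package_name = dp.dependency;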

The Cypher query is better for several reasons:

Query performance - In-memory storage with a minimal memory footprint optimizes the latency, throughput, and memory usage of the query.
Query simplicity - Cypher is designed to find a starting point in the graph and traverse all the nodes connected to it, which makes writing queries more intuitive, as the length of the two queries shows.
No recursion needed - Cypher automatically "does the recursion" by traversing nodes that are already connected.
Schema is optional - In a strict-schema solution, you need to know every entity on the path from the source data point to the target data point. Without a schema, you don't have to know what's in the network, and your queries stay simple while still returning correct results.
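For readers who want to see what the dependency query computes, here is a plain-Python sketch of the same idea. It assumes a hypothetical in-memory dependency map and vulnerability table; the package names and advisory IDs are made up. A breadth-first walk over the stored dependency links takes the place of the recursive CTE, so there is no recursion to write, only neighbours to visit.

```python
from collections import deque

# Hypothetical dependency graph: package -> packages it directly depends on.
DEPENDENCIES = {
    "root": ["web_framework", "logger"],
    "web_framework": ["template_engine", "logger"],
    "template_engine": ["parser"],
    "logger": [],
    "parser": [],
}

# Hypothetical vulnerability records: package -> advisory IDs (made-up identifiers).
VULNERABILITIES = {
    "parser": ["VULN-1"],
    "logger": ["VULN-2"],
}

def vulnerable_dependencies(deps, vulns, start="root"):
    """Return {package: advisories} for every transitive dependency of `start` with known issues."""
    seen = {start}
    queue = deque([start])
    found = {}
    while queue:
        package = queue.popleft()
        for dependency in deps.get(package, []):
            if dependency in seen:
                continue  # shared dependencies (like "logger") are visited only once
            seen.add(dependency)
            queue.append(dependency)
            if dependency in vulns:
                found[dependency] = vulns[dependency]
    return found

print(vulnerable_dependencies(DEPENDENCIES, VULNERABILITIES))
# {'logger': ['VULN-2'], 'parser': ['VULN-1']}
```

The `seen` set plays the same role as `UNION` (rather than `UNION ALL`) in the SQL version: it stops a package that is reachable along several paths from being processed twice.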

But graph databases are more than just appropriate storage. They let you ask all kinds of questions of your data, and the answers can strengthen your company's cybersecurity.

Let’s look at some of those questions and see how Memgraph can provide an answer:

What are similar patterns and courses of action that attackers keep on repeating when executing an attack?

MATCH p=(n)-[r *BFS]->(m)
WITH nodes(p) AS nodes
WITH extract(n IN nodes | labels(n)[0]) AS labels
RETURN labels;
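The query above returns one list of labels per path. To turn those lists into "patterns that keep repeating", they still have to be counted. Here is a minimal Python sketch of that post-processing step. It assumes the label sequences have already been fetched from the database, and the sample sequences are hypothetical.

```python
from collections import Counter

# Hypothetical label sequences, one per attack path, as a query like the one above might return.
label_sequences = [
    ["EMAIL", "ATTACHMENT", "ZIP", "FILE", "CODE_EXECUTION"],
    ["URL", "DOWNLOAD", "FILE", "CODE_EXECUTION"],
    ["EMAIL", "ATTACHMENT", "ZIP", "FILE", "CODE_EXECUTION"],
    ["USB", "FILE", "CODE_EXECUTION"],
    ["EMAIL", "ATTACHMENT", "ZIP", "FILE", "CODE_EXECUTION"],
]

def most_common_patterns(sequences, top=3):
    """Count identical label sequences. Lists are unhashable, so each one becomes a tuple."""
    counts = Counter(tuple(sequence) for sequence in sequences)
    return counts.most_common(top)

for pattern, frequency in most_common_patterns(label_sequences):
    print(frequency, " -> ".join(pattern))
```

Exact-match counting is the simplest choice. Grouping paths that share a prefix or a subsequence would need a different comparison.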

Which IP address is the origin of a particular trojan horse on my computer?

MATCH p=(i:IP_ADDRESS)-[*BFS]->(ce:CODE_EXECUTION {id: 1}) RETURN i;
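At its core, answering this question is a reachability search from the code execution back to its possible sources. Here is a minimal Python sketch of that idea. It is not how Memgraph implements BFS, and it assumes a hypothetical directed event graph whose node IDs carry a type prefix. The search walks the edges backwards from the target with BFS and keeps every IP address it reaches, together with the shortest chain connecting that address to the target.

```python
from collections import deque

# Hypothetical directed events: (from_node, to_node). IPs use documentation address ranges.
EVENTS = [
    ("ip:203.0.113.7", "url:download"),
    ("url:download", "file:invoice.zip"),
    ("file:invoice.zip", "file:invoice.exe"),
    ("file:invoice.exe", "exec:1"),
    ("ip:198.51.100.2", "email:42"),
    ("email:42", "file:notes.txt"),
]

def origin_ips(events, target):
    """BFS over reversed edges from `target`; return {ip: shortest path from that ip to target}."""
    predecessors = {}
    for src, dst in events:
        predecessors.setdefault(dst, []).append(src)

    # parent[x] is the next node on the shortest path from x towards the target.
    parent = {target: None}
    queue = deque([target])
    origins = {}
    while queue:
        node = queue.popleft()
        if node.startswith("ip:"):
            # Rebuild the chain ip -> ... -> target by following parent links.
            path, current = [], node
            while current is not None:
                path.append(current)
                current = parent[current]
            origins[node] = path
        for previous in predecessors.get(node, []):
            if previous not in parent:
                parent[previous] = node
                queue.append(previous)
    return origins

print(origin_ips(EVENTS, "exec:1"))
```

Only `203.0.113.7` is reported. The other address sent an email that never leads to the execution, so the backwards search never reaches it.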

Are there any non-malicious files or compressed images that are exchanged regularly, making them easy targets for a potential threat?

MATCH (n)
RETURN n, degree(n) AS node_degree
ORDER BY node_degree DESC
LIMIT 10;

CALL betweenness_centrality.get()
YIELD node, betweenness_centrality
RETURN node, betweenness_centrality
ORDER BY betweenness_centrality DESC;

The last query, which finds influential files in the system (such as common images or other possibly harmless elements that can serve as a heuristic for a detailed overview of a potential attacker's actions), uses betweenness centrality, a centrality algorithm from graph theory. This shows that any algorithm that operates on graphs can be leveraged as part of a graph database.
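Betweenness centrality scores a node by how many shortest paths between other pairs of nodes pass through it. To show what such a score means, here is a minimal pure-Python sketch of Brandes' algorithm for unweighted, undirected graphs. It is an illustration, not MAGE's implementation, and the small file-exchange graph is hypothetical.

```python
from collections import deque

def betweenness_centrality(graph):
    """Brandes' algorithm for an unweighted, undirected graph given as {node: [neighbours]}."""
    centrality = {v: 0.0 for v in graph}
    for source in graph:
        # Phase 1: BFS from `source`, counting shortest paths (sigma) and recording predecessors.
        stack = []
        predecessors = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}
        sigma[source] = 1
        distance = {v: -1 for v in graph}
        distance[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if distance[w] < 0:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    sigma[w] += sigma[v]
                    predecessors[w].append(v)
        # Phase 2: walk back from the farthest nodes, accumulating each node's dependency.
        delta = {v: 0.0 for v in graph}
        while stack:
            w = stack.pop()
            for v in predecessors[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != source:
                centrality[w] += delta[w]
    # In an undirected graph every pair is counted once from each endpoint.
    return {v: score / 2 for v, score in centrality.items()}

# Hypothetical exchange graph: hosts linked to the files they share (edges listed both ways).
EXCHANGES = {
    "host_a": ["shared_logo.png"],
    "host_b": ["shared_logo.png"],
    "host_c": ["shared_logo.png", "report.pdf"],
    "shared_logo.png": ["host_a", "host_b", "host_c"],
    "report.pdf": ["host_c"],
}

scores = betweenness_centrality(EXCHANGES)
for node, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{node}: {score}")
```

Here `shared_logo.png` scores 5.0 and `host_c` scores 3.0, while the leaf nodes score 0.0. The commonly exchanged file sits on most shortest paths between the other nodes, and that is the kind of chokepoint the query above is looking for.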

Memgraph developed its own graph algorithm library, MAGE (Memgraph Advanced Graph Extensions), which can be used to efficiently extract meaningful insights from your data. In cybersecurity, its purpose is to uncover even more hidden knowledge about the connected threats in the company. Graph algorithms like PageRank and ML algorithms like node classification provide advanced techniques for exploring the graph space and improving your analysts' pattern recognition, so they can make better decisions.
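As a rough illustration of one of the algorithms mentioned above, here is a power-iteration PageRank in plain Python. It is not MAGE's implementation. The damping factor of 0.85 is the conventional default, nodes without outgoing edges spread their rank evenly, and the small interaction graph is hypothetical.

```python
def pagerank(graph, damping=0.85, iterations=100, tolerance=1e-10):
    """Power-iteration PageRank for a directed graph {node: [out-neighbours]}."""
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread evenly over all nodes.
        dangling = sum(rank[v] for v in nodes if not graph[v])
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        # Every node passes a damped, equal share of its rank to each out-neighbour.
        for v in nodes:
            for w in graph[v]:
                new_rank[w] += damping * rank[v] / len(graph[v])
        converged = sum(abs(new_rank[v] - rank[v]) for v in nodes) < tolerance
        rank = new_rank
        if converged:
            break
    return rank

# Hypothetical interactions: node -> nodes it links to (e.g. a host loads a file, a file drops onto a host).
FILE_FLOW = {
    "host_a": ["payload.dll"],
    "host_b": ["payload.dll"],
    "host_c": ["payload.dll", "host_a"],
    "payload.dll": ["host_c"],
}

ranks = pagerank(FILE_FLOW)
for node, score in sorted(ranks.items(), key=lambda item: item[1], reverse=True):
    print(f"{node}: {score:.3f}")
```

The scores always sum to 1. `payload.dll` ranks highest because every host links to it, which is how PageRank surfaces entities that many others point to.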


MAGE can also be extended with your own graph analytics over network data. Some data points can be analyzed with general-purpose algorithms, but when things get custom, MAGE makes it simple to add new modules in different programming languages (C++, C, Rust, Python).

Display your data in the most optimal way to make better business decisions

Attack sequences mostly take the form of directed acyclic graphs, or chains of events that lead to a vulnerability exploit. Even if we got an output from a relational database, it would be hard to draw any conclusions from a tabular display of query results. To show interactions between data, results need a graph structure, and a network-like visualization is a perfect choice for rendering subgraphs and showing end users how a cyber attack was executed or what potential threats exist in a code project.
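Because such attack sequences are usually DAGs, their events can be put back into a causal order with a topological sort before they are displayed. Here is a minimal sketch using Python's standard-library `graphlib` (Python 3.9+) on a hypothetical attack DAG, where each event maps to the events that directly precede it.

```python
from graphlib import TopologicalSorter

# Hypothetical attack DAG: event -> set of events that directly precede it.
ATTACK = {
    "attachment_opened": {"phishing_email"},
    "macro_executed": {"attachment_opened"},
    "payload_downloaded": {"macro_executed"},
    "credentials_dumped": {"payload_downloaded"},
    "lateral_movement": {"credentials_dumped", "payload_downloaded"},
}

def attack_timeline(dag):
    """Return one causal ordering of the events (graphlib raises CycleError if the data is not a DAG)."""
    return list(TopologicalSorter(dag).static_order())

print(attack_timeline(ATTACK))
```

In this example the order is unique because the events form a single chain. When branches are independent, any valid topological order is acceptable, and a visual graph shows those branches more clearly than a flat list.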

Memgraph offers its own graph visualization tool, Memgraph Lab. It was designed to increase the visibility of your system and display query results as connected data, which makes it as easy as possible to track the threat actions that lead to a data compromise.


The screenshot below, taken in Memgraph Lab, shows how it helps analyze retrieved results efficiently.

3-in-1 tool for lower cost and maintenance

With a graph database as its core product, the Memgraph Lab visualization tool, and the extendable graph algorithm library MAGE, Memgraph offers a complete 3-in-1 solution for analyzing cybersecurity use cases.

Coupling the three components in one tool boosts efficiency while minimizing maintenance costs, saving you the time you would otherwise spend connecting and fixing incompatible, highly specialized applications. It also improves the user experience, as every tool and feature you might need for your graph storage is in one place, without the need to switch between views or write your own custom glue scripts.

Conclusion

To sum up, since cybersecurity data tends to behave like a network, Memgraph is an excellent fit for tackling threats in your system. You might not see a difference at first, while the number of data points is small and every tool or database does its job quickly and without bottlenecks. However, if your dataset is growing rapidly day by day, Memgraph will prove to be a performant solution regardless of your data size.