Horizontally Scalable Distributed Compute
Graph workloads can quickly overwhelm a single database instance. A fraud detection query traversing billions of transactions, a recommendation engine computing similarities across millions of users, or an analytics pipeline calculating PageRank on a web-scale graph—these operations demand more memory, CPU, and I/O than any one node can provide.
MemGQL solves this by partitioning queries across a cluster of workers, materializing intermediate results in distributed storage, and aggregating final results at the coordinator. This allows horizontal scaling: add more nodes to handle larger graphs or faster query throughput.
The Problem
Single-node graph databases hit three fundamental limits:
-
Memory Constraints: Large graphs exceed RAM, forcing slow disk-based operations or requiring expensive high-memory instances.
-
CPU Bottlenecks: Graph algorithms like community detection, centrality measures, and pathfinding are computationally intensive. A single CPU cannot exploit the parallelism inherent in graph structure.
-
Query Latency: Complex multi-hop traversals on large datasets can take minutes or hours on a single node, making interactive analysis impossible.
Traditional approaches—sharding the graph across nodes or replicating the entire dataset—introduce their own problems. Sharding requires knowing the query pattern upfront to avoid cross-node chatter. Replication is expensive and doesn’t help with write-heavy workloads.
The Solution
MemGQL takes a query-centric approach to distribution. Rather than partitioning the data permanently, it partitions the work of executing a query across a pool of stateless workers.