
Cognee + Memgraph: How To Build An Intelligent Knowledge Graph Using Hacker News Data
What if your app could understand Hacker News threads, not just read them?
With the rise of AI agents and intelligent systems, the ability to convert raw discussions into semantically searchable knowledge graphs has become a game changer.
This blog showcases how the new integration of Cognee, an AI memory engine, with Memgraph’s high-performance in-memory graph database can be used to convert Hacker News content into a live, queryable knowledge graph.
This demo’s aim is to combine natural language processing with graph storage to generate real-time semantic insights from one of the internet’s most dynamic tech communities.
What is Cognee?
Cognee is an open-source AI memory layer that helps AI applications deliver accurate, context-aware responses.
It does this through cognitive search and graph-based knowledge representation. Unlike traditional text processing tools, Cognee leverages LLMs to break natural language down into structured concepts and relationships, enabling intelligent search and reasoning over textual data.
It’s used to power context-aware AI agents in a variety of use cases, including research tools, knowledge management systems and customer support automation.
Why This Integration Matters
By combining Cognee's AI-driven semantic processing with Memgraph's real-time graph database, you can:
- Turn unstructured text into structured graphs by automatically identifying entities, relationships, and key concepts.
- Store and query semantic knowledge graphs in Memgraph using Cypher, with support for fast traversal and real-time insights.
- Perform natural language querying by pairing LLM-generated graphs with Memgraph’s powerful property graph engine.
- Visualize evolving knowledge networks to uncover topic clusters, relationships, and trends.
- Process high-volume datasets in real time, making it possible to monitor, analyze, and update graphs continuously.
While this demo uses Hacker News as a sample dataset, the integration of Cognee and Memgraph is broadly useful in any context where you need to analyze and make sense of unstructured content at scale.
Memgraph and Cognee together offer the perfect foundation for building graph-powered, semantically aware AI applications.
Why Hacker News?
Hacker News offers a rich stream of real-time content. With its mix of structured metadata (posts, comments, users) and unstructured insights (opinions, debates, trends), it’s a perfect candidate for building a semantically searchable knowledge graph.
Transforming Hacker News data into a graph lets us:
- Track trending topics and technologies
- Identify conversation clusters
- Map conceptual relationships over time
All accessible via natural language queries.
Project Overview: Hacker News Knowledge Graph
In this tutorial, we'll help you build a complete pipeline that:
- Extracts data from Hacker News using its public API (posts, comments, users, etc.)
- Processes text content using Cognee's LLM capabilities
- Stores the knowledge graph in Memgraph
- Enables semantic search over the entire dataset
- Provides visualization tools for exploration
Prerequisites
Before we begin, ensure you have:
- Docker: For running Memgraph
- Python 3.10+: For our pipeline
- OpenAI API Key: For LLM processing
- Basic knowledge of Python and graph databases
Environment Setup
1. Install Memgraph
The easiest way to run Memgraph is using Docker:
# For Linux/macOS:
curl https://install.memgraph.com | sh
# For Windows:
iwr https://windows.memgraph.com | iex
This installs and starts Memgraph together with Memgraph Lab, which you can open in your browser at localhost:3000 or through the Memgraph Lab desktop application.
2. Install Dependencies
# Initialize your project
poetry init
poetry add cognee dlt requests python-dateutil neo4j
# Or using pip
pip install cognee dlt requests python-dateutil neo4j
3. Environment Configuration
Create a .env file in your project root:
# LLM Configuration
LLM_API_KEY=sk-your-openai-api-key
LLM_MODEL=openai/gpt-4o-mini
LLM_PROVIDER=openai
EMBEDDING_PROVIDER=openai
EMBEDDING_MODEL=openai/text-embedding-3-large
# Memgraph Configuration
GRAPH_DATABASE_PROVIDER=memgraph
GRAPH_DATABASE_URL=bolt://localhost:7687
GRAPH_DATABASE_USERNAME=""
GRAPH_DATABASE_PASSWORD=""
# Hacker News API
HN_API_BASE=https://hacker-news.firebaseio.com/v0
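Before moving on, it is worth confirming that Memgraph is reachable over Bolt with the settings from your .env file. Here is a minimal sketch using the neo4j driver installed above; it assumes the default, passwordless Memgraph setup configured in this demo:
import os
from dotenv import load_dotenv
from neo4j import GraphDatabase

load_dotenv()

# Read the connection settings configured in .env
url = os.getenv("GRAPH_DATABASE_URL", "bolt://localhost:7687")
username = os.getenv("GRAPH_DATABASE_USERNAME", "")
password = os.getenv("GRAPH_DATABASE_PASSWORD", "")

# Open a Bolt connection and fail fast if Memgraph isn't running
with GraphDatabase.driver(url, auth=(username, password)) as driver:
    driver.verify_connectivity()
    print(f"Connected to Memgraph at {url}")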
Building the Pipeline
Step 1: Data Extraction from Hacker News
Our pipeline starts by extracting data from the Hacker News API:
import dlt
import requests
from typing import Iterator, Dict, Any
from datetime import datetime
import time

HN_API_BASE = "https://hacker-news.firebaseio.com/v0"

@dlt.resource(table_name="posts", write_disposition="merge", primary_key="id")
def get_posts_incremental(
    updated_at=dlt.sources.incremental("time", initial_value=0)
) -> Iterator[Dict[str, Any]]:
    """Extract posts from the Hacker News API with incremental loading."""
    # Get the latest top and new stories
    top_stories_response = requests.get(f"{HN_API_BASE}/topstories.json")
    top_story_ids = top_stories_response.json()

    new_stories_response = requests.get(f"{HN_API_BASE}/newstories.json")
    new_story_ids = new_stories_response.json()

    all_story_ids = list(set(top_story_ids + new_story_ids))

    for story_id in all_story_ids:
        try:
            item_response = requests.get(f"{HN_API_BASE}/item/{story_id}.json")
            if item_response.status_code == 200:
                item = item_response.json()
                if item and item.get('type') == 'story':
                    item_time = item.get('time', 0)
                    if item_time > updated_at.last_value:
                        # Prepare data for Cognee processing
                        item['created_at'] = datetime.fromtimestamp(item['time'])
                        item['extracted_at'] = datetime.now()
                        item['content_for_cognee'] = f"Title: {item.get('title', '')}. {item.get('text', '')}"
                        yield item
            time.sleep(0.1)  # Rate limiting
        except Exception as e:
            print(f"Error fetching story {story_id}: {e}")
            continue
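The resource above only defines how items are fetched; dlt executes it as part of a pipeline. Below is a rough sketch of wiring it up. The pipeline and dataset names are arbitrary, and duckdb is used purely as a local staging destination (it assumes the dlt duckdb extra is installed):
# Run the resource through a dlt pipeline and collect the rows
pipeline = dlt.pipeline(
    pipeline_name="hackernews",   # arbitrary name for this pipeline
    destination="duckdb",         # local staging store; requires dlt[duckdb]
    dataset_name="hackernews_raw",
)

load_info = pipeline.run(get_posts_incremental())
print(load_info)

# For quick inspection, the resource can also be iterated directly
posts = list(get_posts_incremental())
print(f"Fetched {len(posts)} new or updated stories")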
Step 2: Cognee Integration for Knowledge Graph Generation
Next, integrate Cognee to process the extracted text and build your knowledge graph:
import cognee
import asyncio
from dotenv import load_dotenv

load_dotenv()

class CogneeMemgraphProcessor:
    def __init__(self):
        pass

    async def process_posts(self, posts_data):
        """Process posts through Cognee to build the knowledge graph."""
        for post in posts_data:
            try:
                # Add post content to Cognee
                content = post.get('content_for_cognee', '')
                if content:
                    await cognee.add(content, dataset_name="hackernews_posts")

                # Add metadata as structured data
                metadata = {
                    "post_id": post.get('id'),
                    "author": post.get('by'),
                    "score": post.get('score', 0),
                    "url": post.get('url'),
                    "created_at": post.get('created_at')
                }
                await cognee.add(metadata, dataset_name="hackernews_metadata")
            except Exception as e:
                print(f"Error processing post {post.get('id')}: {e}")

        # Build the knowledge graph
        print("Building knowledge graph with Cognee...")
        await cognee.cognify()
        print("Knowledge graph construction completed!")

    async def search_knowledge_graph(self, query: str):
        """Perform semantic search on the knowledge graph."""
        results = await cognee.search(query_text=query)
        return results
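To tie the two steps together, feed the extracted posts into the processor and then run a sample search. Here is a rough sketch; the query string is just an example:
async def main():
    # Pull fresh stories from Hacker News (Step 1)
    posts = list(get_posts_incremental())

    # Build the knowledge graph in Memgraph via Cognee (Step 2)
    processor = CogneeMemgraphProcessor()
    await processor.process_posts(posts)

    # Ask a natural language question against the graph
    results = await processor.search_knowledge_graph(
        "What are people saying about open-source AI models?"
    )
    for result in results:
        print(result)

if __name__ == "__main__":
    asyncio.run(main())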
Visualizing Your Knowledge Graph
Once your pipeline runs successfully, you can explore the generated knowledge graph in Memgraph's web interface:
1. Access Memgraph Lab
Navigate to http://localhost:3000 in your browser to access Memgraph Lab.
2. Explore the Graph
Use Cypher queries to explore your knowledge graph:
// View the entire graph structure
MATCH p=()-[]-() RETURN p LIMIT 100;

// Find all entities related to "AI" or "artificial intelligence"
MATCH (n)
WHERE n.name CONTAINS "AI" OR n.name CONTAINS "artificial intelligence"
RETURN n;

// Discover relationships between programming languages
MATCH (lang1)-[r]-(lang2)
WHERE lang1.type = "programming_language" AND lang2.type = "programming_language"
RETURN lang1, r, lang2;

// Find the most connected entities (high centrality)
MATCH (n)-[r]-()
RETURN n.name, count(r) AS connections
ORDER BY connections DESC
LIMIT 10;
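You can also run the same Cypher queries programmatically, for example with the neo4j driver already in your dependencies. A small sketch follows; the property names mirror the queries above and may differ depending on how Cognee models the graph:
from neo4j import GraphDatabase

# Connect with the same Bolt settings used by Cognee
with GraphDatabase.driver("bolt://localhost:7687", auth=("", "")) as driver:
    records, _, _ = driver.execute_query(
        """
        MATCH (n)-[r]-()
        RETURN n.name AS name, count(r) AS connections
        ORDER BY connections DESC
        LIMIT 10
        """
    )
    for record in records:
        print(record["name"], record["connections"])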
Conclusion
This integration demonstrates how fast-moving online discussions, like those on Hacker News, can be transformed into queryable knowledge graphs using Cognee and Memgraph.
By combining the Hacker News API with AI-based semantic understanding and high-performance graph database technology, you’ve created a system that can:
- Automatically understand the semantic content of discussions
- Discover hidden relationships between concepts and entities
- Enable smart search that goes beyond keyword matching
- Provide visual insights through graph exploration
- Scale to handle large volumes of real-time data
The future of knowledge management lies in systems that can think, reason, and discover insights the way humans do. With this integration, you’re one step closer to that reality!
Further Reading
- Docs: Cognee Integration
- Webinar: How NASA is Using Graph Technology and LLMs to Build a People Knowledge Graph
- Blog: Why Knowledge Graphs Are the Ideal Structure for LLM Personalization
- Webinar: Cedars-Sinai: Using Graph Databases for Knowledge-Aware Automated Machine Learning
- Blog: LLM Limitations: Why Can’t You Query Your Enterprise Knowledge with Just an LLM?
- Webinar: Microchip Optimizes LLM Chatbot with RAG and a Knowledge Graph
- Webinar: Vector Search in Memgraph: Turn Unstructured Text into Queryable Knowledge