
Cognee + Memgraph: How To Build An Intelligent Knowledge Graph Using Hacker News Data
What if your app could understand Hacker News threads, not just read them?
With the rise of AI agents and intelligent systems, the ability to convert raw discussions into semantically searchable knowledge graphs has become a game changer.
This blog showcases how the new integration of Cognee, an AI memory engine, with Memgraph’s high-performance in-memory graph database can be used to convert Hacker News content into a live, queryable knowledge graph.
This demo’s aim is to combine natural language processing with graph storage to generate real-time semantic insights from one of the internet’s most dynamic tech communities.
What is Cognee?
Cognee is an open-source AI memory layer that helps AI applications deliver accurate, context-aware responses.
It does this through cognitive search and graph-based knowledge representation. Unlike traditional text processing tools, Cognee leverages LLMs to break natural language down into structured concepts and relationships, enabling intelligent search and reasoning over textual data.
It’s used to power context-aware AI agents in a variety of use cases, including research tools, knowledge management systems and customer support automation.
Why This Integration Matters
By combining Cognee's AI-driven semantic processing with Memgraph's real-time graph database, you can:
- Turn unstructured text into structured graphs by automatically identifying entities, relationships, and key concepts.
- Store and query semantic knowledge graphs in Memgraph using Cypher, with support for fast traversal and real-time insights.
- Perform natural language querying by pairing LLM-generated graphs with Memgraph’s powerful property graph engine.
- Visualize evolving knowledge networks to uncover topic clusters, relationships, and trends.
- Process high-volume datasets in real time, making it possible to monitor, analyze, and update graphs continuously.
While this demo uses Hacker News as a sample dataset, the integration of Cognee and Memgraph is broadly useful in any context where you need to analyze and make sense of unstructured content at scale.
Memgraph and Cognee together offer the perfect foundation for building graph-powered, semantically aware AI applications.
Why Hacker News?
Hacker News offers a rich stream of real-time content. With its mix of structured metadata (posts, comments, users) and unstructured insights (opinions, debates, trends), it’s a perfect candidate for building a semantically searchable knowledge graph.
Transforming Hacker News data into a graph lets us:
- Track trending topics and technologies
- Identify conversation clusters
- Map conceptual relationships over time
All accessible via natural language queries.
Project Overview: Hacker News Knowledge Graph
In this tutorial, we'll help you build a complete pipeline that:
- Extracts data from Hacker News using its public API (posts, comments, users, etc.)
- Processes text content using Cognee's LLM capabilities
- Stores the knowledge graph in Memgraph
- Enables semantic search over the entire dataset
- Provides visualization tools for exploration
Prerequisites
Before we begin, ensure you have:
- Docker: For running Memgraph
- Python 3.10+: For our pipeline
- OpenAI API Key: For LLM processing
- Basic knowledge of Python and graph databases
Environment Setup
1. Install Memgraph
The easiest way to run Memgraph is using Docker:
# For Linux/macOS:
curl https://install.memgraph.com | sh
# For Windows:
iwr https://windows.memgraph.com | iex
This installs and starts Memgraph together with Memgraph Lab, which you can open in your browser at localhost:3000 or through the Memgraph Lab desktop application.
2. Install Dependencies
# Initialize your project
poetry init
poetry add cognee dlt requests python-dateutil neo4j
# Or using pip
pip install cognee dlt requests python-dateutil neo4j
3. Environment Configuration
Create a .env file in your project root:
# LLM Configuration
LLM_API_KEY=sk-your-openai-api-key
LLM_MODEL=openai/gpt-4o-mini
LLM_PROVIDER=openai
EMBEDDING_PROVIDER=openai
EMBEDDING_MODEL=openai/text-embedding-3-large
# Memgraph Configuration
GRAPH_DATABASE_PROVIDER=memgraph
GRAPH_DATABASE_URL=bolt://localhost:7687
GRAPH_DATABASE_USERNAME=""
GRAPH_DATABASE_PASSWORD=""
# Hacker News API
HN_API_BASE=https://hacker-news.firebaseio.com/v0
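Before moving on, it is worth confirming that Memgraph is reachable over Bolt with the settings from your .env file. Here is a minimal sketch using the neo4j driver installed above; it assumes the default, passwordless Memgraph setup configured in this demo:
import os
from dotenv import load_dotenv
from neo4j import GraphDatabase

load_dotenv()

# Read the connection settings configured in .env
url = os.getenv("GRAPH_DATABASE_URL", "bolt://localhost:7687")
username = os.getenv("GRAPH_DATABASE_USERNAME", "")
password = os.getenv("GRAPH_DATABASE_PASSWORD", "")

# Open a Bolt connection and fail fast if Memgraph isn't running
with GraphDatabase.driver(url, auth=(username, password)) as driver:
    driver.verify_connectivity()
    print(f"Connected to Memgraph at {url}")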
Building the Pipeline
Step 1: Data Extraction from Hacker News
Our pipeline starts by extracting data from the Hacker News API:
import dlt
import requests
from typing import Iterator, Dict, Any
from datetime import datetime
import time

HN_API_BASE = "https://hacker-news.firebaseio.com/v0"

@dlt.resource(table_name="posts", write_disposition="merge", primary_key="id")
def get_posts_incremental(
    updated_at=dlt.sources.incremental("time", initial_value=0)
) -> Iterator[Dict[str, Any]]:
    """Extract posts from the Hacker News API with incremental loading."""
    # Get the latest top and new stories
    top_stories_response = requests.get(f"{HN_API_BASE}/topstories.json")
    top_story_ids = top_stories_response.json()

    new_stories_response = requests.get(f"{HN_API_BASE}/newstories.json")
    new_story_ids = new_stories_response.json()

    all_story_ids = list(set(top_story_ids + new_story_ids))

    for story_id in all_story_ids:
        try:
            item_response = requests.get(f"{HN_API_BASE}/item/{story_id}.json")
            if item_response.status_code == 200:
                item = item_response.json()
                if item and item.get('type') == 'story':
                    item_time = item.get('time', 0)
                    if item_time > updated_at.last_value:
                        # Prepare data for Cognee processing
                        item['created_at'] = datetime.fromtimestamp(item['time'])
                        item['extracted_at'] = datetime.now()
                        item['content_for_cognee'] = f"Title: {item.get('title', '')}. {item.get('text', '')}"
                        yield item
            time.sleep(0.1)  # Rate limiting
        except Exception as e:
            print(f"Error fetching story {story_id}: {e}")
            continue
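The resource above only defines how items are fetched; dlt executes it as part of a pipeline. Below is a rough sketch of wiring it up. The pipeline and dataset names are arbitrary, and duckdb is used purely as a local staging destination (it assumes the dlt duckdb extra is installed):
# Run the resource through a dlt pipeline and collect the rows
pipeline = dlt.pipeline(
    pipeline_name="hackernews",   # arbitrary name for this pipeline
    destination="duckdb",         # local staging store; requires dlt[duckdb]
    dataset_name="hackernews_raw",
)

load_info = pipeline.run(get_posts_incremental())
print(load_info)

# For quick inspection, the resource can also be iterated directly
posts = list(get_posts_incremental())
print(f"Fetched {len(posts)} new or updated stories")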
Step 2: Cognee Integration for Knowledge Graph Generation
Next, integrate Cognee to process the extracted text and build your knowledge graph:
import cognee
import asyncio
from dotenv import load_dotenv

load_dotenv()

class CogneeMemgraphProcessor:
    def __init__(self):
        pass

    async def process_posts(self, posts_data):
        """Process posts through Cognee to build the knowledge graph."""
        for post in posts_data:
            try:
                # Add post content to Cognee
                content = post.get('content_for_cognee', '')
                if content:
                    await cognee.add(content, dataset_name="hackernews_posts")

                # Add metadata as structured data
                metadata = {
                    "post_id": post.get('id'),
                    "author": post.get('by'),
                    "score": post.get('score', 0),
                    "url": post.get('url'),
                    "created_at": post.get('created_at')
                }
                await cognee.add(metadata, dataset_name="hackernews_metadata")
            except Exception as e:
                print(f"Error processing post {post.get('id')}: {e}")

        # Build the knowledge graph
        print("Building knowledge graph with Cognee...")
        await cognee.cognify()
        print("Knowledge graph construction completed!")

    async def search_knowledge_graph(self, query: str):
        """Perform semantic search on the knowledge graph."""
        results = await cognee.search(query_text=query)
        return results
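To tie the two steps together, feed the extracted posts into the processor and then run a sample search. Here is a rough sketch; the query string is just an example:
async def main():
    # Pull fresh stories from Hacker News (Step 1)
    posts = list(get_posts_incremental())

    # Build the knowledge graph in Memgraph via Cognee (Step 2)
    processor = CogneeMemgraphProcessor()
    await processor.process_posts(posts)

    # Ask a natural language question against the graph
    results = await processor.search_knowledge_graph(
        "What are people saying about open-source AI models?"
    )
    for result in results:
        print(result)

if __name__ == "__main__":
    asyncio.run(main())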
Visualizing Your Knowledge Graph
Once your pipeline runs successfully, you can explore the generated knowledge graph in Memgraph's web interface:
1. Access Memgraph Lab
Navigate to http://localhost:3000 in your browser to access Memgraph Lab.
2. Explore the Graph
Use Cypher queries to explore your knowledge graph:
// View the entire graph structure
MATCH p=()-[]-() RETURN p LIMIT 100;

// Find all entities related to "AI" or "artificial intelligence"
MATCH (n)
WHERE n.name CONTAINS "AI" OR n.name CONTAINS "artificial intelligence"
RETURN n;

// Discover relationships between programming languages
MATCH (lang1)-[r]-(lang2)
WHERE lang1.type = "programming_language" AND lang2.type = "programming_language"
RETURN lang1, r, lang2;

// Find the most connected entities (high centrality)
MATCH (n)-[r]-()
RETURN n.name, count(r) AS connections
ORDER BY connections DESC
LIMIT 10;
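You can also run the same Cypher queries programmatically, for example with the neo4j driver already in your dependencies. A small sketch follows; the property names mirror the queries above and may differ depending on how Cognee models the graph:
from neo4j import GraphDatabase

# Connect with the same Bolt settings used by Cognee
with GraphDatabase.driver("bolt://localhost:7687", auth=("", "")) as driver:
    records, _, _ = driver.execute_query(
        """
        MATCH (n)-[r]-()
        RETURN n.name AS name, count(r) AS connections
        ORDER BY connections DESC
        LIMIT 10
        """
    )
    for record in records:
        print(record["name"], record["connections"])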
Conclusion
This integration demonstrates how fast-moving online discussions, like those on Hacker News, can be transformed into queryable knowledge graphs using Cognee and Memgraph.
By combining the Hacker News API with AI-based semantic understanding and high-performance graph database technology, you’ve created a system that can:
- Automatically understand the semantic content of discussions
- Discover hidden relationships between concepts and entities
- Enable smart search that goes beyond keyword matching
- Provide visual insights through graph exploration
- Scale to handle large volumes of real-time data
The future of knowledge management lies in systems that can think, reason, and discover insights the way humans do. With this integration, you’re one step closer to that reality!
Further Reading
- Docs: Cognee Integration
- Webinar: How NASA is Using Graph Technology and LLMs to Build a People Knowledge Graph
- Blog: Why Knowledge Graphs Are the Ideal Structure for LLM Personalization
- Webinar: Cedars-Sinai: Using Graph Databases for Knowledge-Aware Automated Machine Learning
- Blog: LLM Limitations: Why Can’t You Query Your Enterprise Knowledge with Just an LLM?
- Webinar: Microchip Optimizes LLM Chatbot with RAG and a Knowledge Graph
- Webinar: Vector Search in Memgraph: Turn Unstructured Text into Queryable Knowledge