How to migrate from Neo4j to Memgraph

by
Katarina Supe
How to migrate from Neo4j to Memgraph

Introduction

Through this tutorial, you’ll learn how to migrate the movies dataset from Neo4j to Memgraph. If you used Neo4j, you are probably familiar with their example graph that helps you learn the basics of the Cypher query language. Neo4j is an ACID-compliant transactional native graph database, while Memgraph is a platform designed for graph computations on streaming data. You can read more about their differences in the Neo4j vs Memgraph article.

Prerequisites

  • Docker (Linux) or Docker Desktop (macOS/Windows) - you will run Memgraph and Neo4j in a Docker container
  • Neo4j with APOC - in this tutorial, you’ll use the apoc.export.csv.all() procedure.
  • Memgraph - you are going to import your data using the LOAD CSV clause.

Run Neo4j and Memgraph

You need to start Neo4j with APOC, and you can do that by running:

docker run -p 7474:7474 -p 7687:7687 \
-v data:/data -v  plugins:/plugins --name neo4j-apoc \ 
-e NEO4J_apoc_export_file_enabled=true \ 
-e NEO4J_apoc_import_file_enabled=true \ 
-e NEO4J_apoc_import_file_use__neo4j__config=true \ 
-e NEO4JLABS_PLUGINS=\[\"apoc\"\] \  
-e NEO4J_AUTH=neo4j/password neo4j:latest

If you’re having trouble running this on the new Apple M1 chip, try adding --platform linux/arm64/v8 after the run command. Also, you can set NEO4J_AUTH however you like, here neo4j is username and password is password.

The Neo4j Browser is available at http://localhost:7474, and you can use the credentials from the NEO4J_AUTH environment variable. The ports 7474 and 7687 are now exposed, and since Memgraph also uses port 7687, you can run it at the next available port, that is, 7688:

docker run -it -p 7688:7687 -p 3000:3000 -v mg_lib:/var/lib/memgraph memgraph/memgraph-platform

Since you started Memgraph Platform, the command-line tool mgconsole should be open in your terminal, and the visual user interface Memgraph Lab is available at http://localhost:3000.

Great, now you have both databases running and you’re ready to play with the movies dataset!

1. Load the movies dataset into Neo4j

The movies dataset contains actors and directors that are related through the movies they’ve collaborated on.

movies-dataset-graph-model

This is a simple dataset and a good example of how you can migrate the whole graph database from Neo4j to Memgraph. You can import it to Neo4j by running the Movie Graph example from Example Graphs in the Neo4j Browser or Desktop app. Once the movie dataset is loaded into Neo4j, you can jump to exporting the whole database.

2. Export the movies dataset from Neo4j

You are going to export the whole database from Neo4j to a CSV file movies.csv by using the apoc.export.csv.all() procedure. Through the Neo4j Browser (or Desktop app), run the following query:

CALL apoc.export.csv.all("movies.csv", {})

Now you have exported the whole database to the movies.csv file. But where is this file located? Since you ran both Neo4j and Memgraph with Docker, files you are exporting or importing are located inside the Docker container. To enter the container, you first must find out the container ID of the container where Neo4j is running. Run docker ps to check all the running containers. There you can find the Neo4j container ID and copy it. After that, run:

docker exec -it <container_ID> bash

After you run the above command, you are inside the container where Neo4j is running. All exported data is located at /var/lib/neo4j/import/. Hence, run cd /var/lib/neo4j/import/ to check whether the movies.csv file is there. Next, you are going to copy the movies.csv file to your local file system. To do that, you need to run:

docker cp <container_ID>:/var/lib/neo4j/import/movies.csv /path_to_local_folder/movies.csv

The <container_ID> is the ID of the Neo4j container, which you already found before. On the right, you can give the path to one of your local folders where you want the movies.csv file to be copied to.

3. Import the movies dataset into Memgraph

After you have exported the whole database from Neo4j into the movies.csv file and copied it to your local file system, you are ready to import the movies dataset into Memgraph. First, you have to copy the movies.csv file to the Docker container where Memgraph is running. Here’s what you have to do:

  1. Run docker ps to find out Memgraph’s container ID
  2. Run docker cp /path_to_local_folder/movies.csv <container_ID>:/usr/lib/memgraph/movies.csv

Now the movies.csv file is ready to be imported to Memgraph with the LOAD CSV clause. First, you need to import all nodes, and after that, relationships. In order to create the relationships, start and end nodes have to already be in the database, and that’s why you are importing in that order.

You can write the queries in the mgconsole, a command-line tool, or in the query editor in the Memgraph Lab. Memgraph Lab is more user-friendly, so I would recommend this way of querying. To connect to Memgraph from Memgraph Lab, you will have to choose the Connect manually option, since the port on which Memgraph is running is not the default one (7687). Once you choose Connect manually, you just have to change the port to 7688 and click Connect.

To create all Person nodes, copy the following query in the query editor:

LOAD CSV FROM "/usr/lib/memgraph/movies.csv" WITH HEADER as row
WITH row WHERE row._labels = ':Person'
CREATE (:Person {id: row._id, name: row.name, born: row.born});

In the movies.csv file, the column _labels is empty if the data in the row represents a relationship. If the row is a node, then the column _labels contains the label of that node. Hence, you can create the Person nodes by going through the _labels column. Every person has the properties id, name and born.

In a similar way, you can create the Movie nodes:

LOAD CSV FROM "/usr/lib/memgraph/movies.csv" WITH HEADER as row
WITH row WHERE row._labels = ':Movie'
CREATE (:Movie {id: row._id, relased: row.released, tagline: row.tagline, title: row.title});

You can check whether you created all nodes correctly. First, in the Overview tab, you can see that the number of nodes is 171, and it matches the number of nodes in Neo4j.

memgraph-lab-overview-node

Next, you can import the relationships. There are 6 different types of relationships, some have properties and some don’t. It’s best to import them one by one:

  • Import ACTED_IN relationship
LOAD CSV FROM "/usr/lib/memgraph/movies.csv" WITH HEADER as row
WITH row WHERE row._type = 'ACTED_IN'
MATCH (n {id: row._start}), (m {id: row._end})
CREATE (n)-[:ACTED_IN {roles: row.roles}]->(m);
  • Import DIRECTED relationship
LOAD CSV FROM "/usr/lib/memgraph/movies.csv" WITH HEADER as row
WITH row WHERE row._type = 'DIRECTED'
MATCH (n {id: row._start}), (m {id: row._end})
CREATE (n)-[:DIRECTED]->(m);
  • Import PRODUCED relationship
LOAD CSV FROM "/usr/lib/memgraph/movies.csv" WITH HEADER as row
WITH row WHERE row._type = 'PRODUCED'
MATCH (n {id: row._start}), (m {id: row._end})
CREATE (n)-[:PRODUCED]->(m);
  • Import WROTE relationship
LOAD CSV FROM "/usr/lib/memgraph/movies.csv" WITH HEADER as row
WITH row WHERE row._type = 'WROTE'
MATCH (n {id: row._start}), (m {id: row._end})
CREATE (n)-[:WROTE]->(m);
  • Import FOLLOWS relationship
LOAD CSV FROM "/usr/lib/memgraph/movies.csv" WITH HEADER as row
WITH row WHERE row._type = 'FOLLOWS'
MATCH (n {id: row._start}), (m {id: row._end})
CREATE (n)-[:FOLLOWS]->(m);
  • Import REVIEWED relationship
LOAD CSV FROM "/usr/lib/memgraph/movies.csv" WITH HEADER as row
WITH row WHERE row._type = 'REVIEWED'
MATCH (n {id: row._start}), (m {id: row._end})
CREATE (n)-[:REVIEWED {rating: row.rating, summary: row.summary}]->(m);

Now you can check again that the number of relationships in the Overview tab in Memgraph Lab is the same as in Neo4j.

memgraph-lab-overview-relationships

And that’s it! You have migrated the whole movies dataset from Neo4j to Memgraph.

4. Explore the data

You can run a couple of example queries from the Neo4j and compare them with the results in Memgraph, to make sure you did everything right.

For example, run the following query:

MATCH (tom {name: "Tom Hanks"}) RETURN tom;

and check the results:

first-query-neo4j

Results in Neo4j

first-query-memgraph

Results in Memgraph

Next, you can run:

MATCH (cloudAtlas {title: "Cloud Atlas"}) RETURN cloudAtlas;

and check the results:

second-query-neo4j

Results in Neo4j

second-query-memgraph

Results in Memgraph

Feel free to try out a different set of queries and make sure that the results are the same in Neo4j and Memgraph.

Conclusion

That’s it for now! You’ve learned how to migrate the whole database from Neo4j to Memgraph. In this tutorial, you used the apoc.export.csv.all() procedure to export the data from Neo4j and the LOAD CSV clause to import the data to Memgraph. This is not the only way to migrate your data from Neo4j to Memgraph. If you feel creative and find some easier way of doing this, join our Discord server and share it with the Memgraph team and community.

Table of Contents

Get a personalized Memgraph demo
and get your questions answered.

Continue Reading

memgraph-vs-neo4j-performance-benchmark-comparison
Neo4j
Comparison
Under the Hood
Memgraph vs. Neo4j: A Performance Comparison

Memgraph delivers results up to 120 times faster than Neo4j while consuming one quarter of the memory!

by
Ante Javor
November 30, 2022
modeling-the-data-a-key-step-in-using-a-graph-database
Graph Database 101
Modeling the Data: A Key Step in Using a Graph Database

Did you ever fall down some bottomless pit of bad data modeling? Our inter Adrian sure did, but he learned a lot from it - how to recognize the pitfalls and how to avoid them in the future! Hope his experience helps you… but let’s be honest, we never really learn from other people’s mistakes, so if you fall be sure to yell for help!

by
Adrian Cvijanovic
October 10, 2022
in-memory-database-python
Python
Graph Database 101
Comparison
In-Memory Databases that Work Great with Python

An in-memory database is a database that is kept in the main memory (RAM) of a computer and controlled by an in-memory database management system. When analyzing information in an in-memory database, only the RAM is used.

by
Memgraph
May 30, 2022