Import data from CSV files

If your data is in CSV format, you can import it into a running Memgraph database from designated CSV files using the LOAD CSV Cypher clause or directly from Memgraph Lab using a CSV file import method.

💡

If the data imports slower than expected, you can speed up the import by creating indexes on nodes to improve the import speed of relationships or switching the storage mode to analytical.

If the import speed remains unsatisfactory, don't hesitate to reach out on Discord (opens in a new tab).

LOAD CSV Cypher clause

The LOAD CSV clause reads row by row from a CSV file, binds the contents of the parsed row to the specified variable, and populates the database if it is empty, or appends new data to an existing dataset. Memgraph supports the Excel CSV dialect, as it's the most common one.

LOAD CSV clause syntax

Memgraph Cloud instances do not have direct access to files stored on a user's local machine or private network. To use LOAD CSV within Memgraph Cloud, upload your CSV file to a public URL and import data directly from there.

Here is an example of the Cypher query:

LOAD CSV FROM "https://example.com/path/to/your-data.csv" WITH HEADER AS row

The syntax of the LOAD CSV clause is:

LOAD CSV FROM <csv-location> ( WITH | NO ) HEADER [IGNORE BAD] [DELIMITER <delimiter-string>] [QUOTE <quote-string>] [NULLIF <nullif-string>] AS <variable-name>
  • <csv-location> is a string of the location of the CSV file.
    Without a URL protocol, it refers to a file path. There are no restrictions on where in your file system the file can be located, as long as the path is valid (i.e., the file exists). If you are using Docker to run Memgraph, you will need to copy the files from your local directory into Docker container where Memgraph can access them.
    If using http://, https://, or ftp:// the CSV file will be fetched over the network.

  • ( WITH | NO ) HEADER flag specifies whether the CSV file has a header, in which case it will be parsed as a map, or it doesn't have a header, in which case it will be parsed as a list.

    If the WITH HEADER option is set, the very first line in the file will be parsed as the header, and any remaining rows will be parsed as regular rows. The value bound to the row variable will be a map of the form:

    { ( "header_field" : "row_value" )? ( , "header_field" : "row_value" )* }

    If the NO HEADER option is set, then each row is parsed as a list of values. The contents of the row can be accessed using the list index syntax. Note that in this mode, there are no restrictions on the number of values a row contains. This isn't recommended, as the user must manually handle the varying number of values in a row.

  • IGNORE BAD flag specifies whether rows containing errors should be ignored or not. If it's set, the parser attempts to return the first valid row from the CSV file. If it isn't set, an exception will be thrown on the first invalid row encountered.

  • DELIMITER <delimiter-string> option enables the user to specify the CSV delimiter character. If it isn't set, the default delimiter character , is assumed.

  • QUOTE <quote-string> option enables the user to specify the CSV quote character. If it isn't set, the default quote character " is assumed.

  • NULLIF <nullif-string> option enables you to specify a sequence of characters that will be parsed as null. By default, all empty columns in Memgraph are treated as empty strings, so if this option is not used, no values will be treated as null.

  • <variable-name> is a symbolic name representing the variable to which the contents of the parsed row will be bound to, enabling access to the row contents later in the query. The variable doesn't have to be used in any subsequent clause.

LOAD CSV clause specificities

When using the LOAD CSV clause please keep in mind:

  • The parser parses the values as strings so it's up to the user to convert the parsed row values to the appropriate type. This can be done using the built-in conversion functions such as ToInteger, ToFloat, ToBoolean etc. Consult the documentation on the available conversion functions.

    If all values are indeed strings and the file has a header, you can import data using the following query:

    LOAD CSV FROM "/people.csv" WITH HEADER AS row
    CREATE (p:People) SET p += row;
  • The LOAD CSV clause is not a standalone clause, meaning a valid query must contain at least one more clause, for example:

    LOAD CSV FROM "/people.csv" WITH HEADER AS row
    CREATE (p:People) SET p += row;

    In this regard, the following query will throw an exception:

    LOAD CSV FROM "/file.csv" WITH HEADER AS row;

    Adding a MATCH or MERGE clause before LOAD CSV allows you to match certain entities in the graph before running LOAD CSV, optimizing the process as matched entities do not need to be searched for every row in the CSV file.

    But, the MATCH or MERGE clause can be used prior the LOAD CSV clause only if the clause returns only one row. Returning multiple rows before calling the LOAD CSV clause will cause a Memgraph runtime error.

  • The LOAD CSV clause can be used at most once per query, so queries like the one below will throw an exception:

    LOAD CSV FROM "/x.csv" WITH HEADER as x
    LOAD CSV FROM "/y.csv" WITH HEADER as y
    CREATE (n:A {p1 : x, p2 : y});

Increase import speed

The LOAD CSV clause will create relationships much faster and consequently speed up data import if you create indexes on nodes or node properties once you import them:

  CREATE INDEX ON :Node(id);

If the LOAD CSV clause is merging data instead of creating it, create indexes before running the LOAD CSV clause.

You can also speed up import if you switch Memgraph to analytical storage mode. In the analytical storage mode there are no ACID guarantees besides manually created snapshots but it does increase the import speed up to 6 times with 6 times less memory consumption. After import you can switch the storage mode back to transactional and enable ACID guarantees.

You can switch between modes within the session using the following query:

STORAGE MODE IN_MEMORY_{TRANSACTIONAL|ANALYTICAL};

When in the analytical storage mode, don't import data using multiple threads.

The LOAD CSV clause can handle CSVs that are compressed with gzip or bzip2. This can speed up the time it takes to fetch and/or load the file.

If you are using on-disk storage mode, consider using Edge import mode to get the best import performance.

Import files WITH and NO HEADER

The goal of this example is to import two CSV files.

One file contains data we will use to create nodes labeled :Person, and the other file will be used to connect those nodes with the :IS_FRIENDS_WITH relationship.

There are also two variations of the files: files with a header and files without a header.

Download the files

Check the location of the CSV files

If you are working with Docker, copy the files from your local directory into the Docker container so that Memgraph can access them.

Import nodes

These CSV files have a header, which means the HEADER option of the LOAD CSV clause needs to be set to WITH. Each row will be parsed as a map, and the fields can be accessed using the property lookup syntax (e.g. id:row.id).

The following query will load row by row from the people_nodes_wh CSV file, and create a new node for each row with properties based on the parsed row values:

LOAD CSV FROM "/path-to/people_nodes_wh.csv" WITH HEADER AS row
CREATE (p:Person {id: row.id, name: row.name});

If successful, you should receive an Empty set (0.014 sec) message.

Create indexes

Creating an index on a property used to connect nodes with relationships, in this case, the id property of the :Person nodes, will speed up the import of relationships, especially with large datasets:

CREATE INDEX ON :Person(id);

Import relationships

Import the data from the people_relationships_wh.csv file:

LOAD CSV FROM "/path-to/people_relationships.csv" WITH HEADER AS row
MATCH (p1:Person {id: row.id_from}), (p2:Person {id: row.id_to})
CREATE (p1)-[:IS_FRIENDS_WITH]->(p2);

If successful, you should receive an Empty set (0.014 sec) message.

Final result

Run the following query to see how the imported data looks as a graph:

MATCH p=()-[]-() RETURN p;


Import data from a single CSV file containing both nodes and relationships

This example will show how to use set of Cypher queries to import data into Memgraph from a single CSV file containing both nodes and relationships. This file has header row.

The people_friendship.csv file is structured to contain both people and their relationships:

  • type: Indicates the type of record. It has two possible values: PERSON for individual records and IS_FRIEND for relationships.
  • id: Unique identifier for a person. This is filled only for PERSON records.
  • name: Name of the person. This is also filled only for PERSON records.
  • id_from and id_to: Used to denote relationships. For IS_FRIEND records, id_from is the ID of the person from whom the friendship originates, and id_to is the ID of the friend.

This structure allows representation of both nodes (persons) and their relationships (friendships) within a single CSV file.

Download the file

type,id,name,id_from,id_to
PERSON,100,Daniel,,
PERSON,101,Alex,,
PERSON,102,Sarah,,
PERSON,103,Mia,,
PERSON,104,Lucy,,
IS_FRIEND,,,100,101
IS_FRIEND,,,100,102
IS_FRIEND,,,100,103
IS_FRIEND,,,101,103
IS_FRIEND,,,102,104

Import nodes

First, import the individual records (people) from the CSV file. Only rows with non-empty id and name columns are processed.

LOAD CSV FROM "/path-to/people_friendship.csv" WITH HEADER AS row
WITH row WHERE row.type = 'PERSON'
CREATE (:Person {id: row.id, name: row.name});

Create indexes

Create an index on the id property of :Person nodes to enhance the performance of the relationship import process.

CREATE INDEX ON :Person(id);

Import relationships

Finally, import the relationships from the CSV file, focusing on rows where id_from and id_to are not empty.

LOAD CSV FROM "/path-to/people_friendship.csv" WITH HEADER AS row
WITH row WHERE row.type = 'IS_FRIEND'
MATCH (p1:Person {id: row.id_from}), (p2:Person {id: row.id_to})
CREATE (p1)-[:IS_FRIENDS_WITH]->(p2);

Final result

Run the following query to see how the imported data looks as a graph:

MATCH p=()-[]-() RETURN p;

Import multiple CSV files with distinct graph objects

In this example, the data is split across four files, each file contains nodes of a single label or relationships of a single type. All files have a header.

Download the files

  • people_nodes.csv (opens in a new tab) is used to create nodes labeled :Person.
    The file contains the following data:

    id,name,age,city
    100,Daniel,30,London
    101,Alex,15,Paris
    102,Sarah,17,London
    103,Mia,25,Zagreb
    104,Lucy,21,Paris
  • restaurants_nodes.csv (opens in a new tab) is used to create nodes labeled :Restaurants.
    The file contains the following data:

    id,name,menu
    200,Mc Donalds,Fries;BigMac;McChicken;Apple Pie
    201,KFC,Fried Chicken;Fries;Chicken Bucket
    202,Subway,Ham Sandwich;Turkey Sandwich;Foot-long
    203,Dominos,Pepperoni Pizza;Double Dish Pizza;Cheese filled Crust
  • people_relationships.csv (opens in a new tab) is used to connect people with the :IS_FRIENDS_WITH relationship.
    The file contains the following data:

    first_person,second_person,met_in
    100,102,2014
    103,101,2021
    102,103,2005
    101,104,2005
    104,100,2018
    101,102,2017
    100,103,2001
  • restaurants_relationships.csv (opens in a new tab) is used to connect people with restaurants using the :ATE_AT relationship.
    The file contains the following data:

    PERSON_ID,REST_ID,liked
    100,200,true
    103,201,false
    104,200,true
    101,202,false
    101,203,false
    101,200,true
    102,201,true

Check the location of the CSV files

If you are working with Docker, copy the files from your local directory into the Docker container so that Memgraph can access them.

Import nodes

The CSV files have a header, which means the HEADER option of the LOAD CSV clause needs to be set to WITH. Each row will be parsed as a map, and the fields can be accessed using the property lookup syntax (e.g. id: row.id).

The following query will load row by row from the file, and create a new node for each row with properties based on the parsed row values:

LOAD CSV FROM "/path-to/people_nodes.csv" WITH HEADER AS row
CREATE (n:Person {id: row.id, name: row.name, age: ToInteger(row.age), city: row.city});

In the same manner, the following query will create new nodes for each restaurant:

LOAD CSV FROM "/path-to/restaurants_nodes.csv" WITH HEADER AS row
CREATE (n:Restaurant {id: row.id, name: row.name, menu: row.menu});

Create indexes

Creating an index on a property used to connect nodes with relationships, in this case, the id property of the :Person nodes, will speed up the import of relationships, especially with large datasets:

CREATE INDEX ON :Person(id);

Import relationships

The following query will create relationships between the people nodes:

LOAD CSV FROM "/path-to/people_relationships.csv" WITH HEADER AS row
MATCH (p1:Person {id: row.first_person})
MATCH (p2:Person {id: row.second_person})
CREATE (p1)-[f:IS_FRIENDS_WITH]->(p2)
SET f.met_in = row.met_in;

The following query will create relationships between people and restaurants where they ate:

LOAD CSV FROM "/path-to/restaurants_relationships.csv" WITH HEADER AS row
MATCH (p1:Person {id: row.PERSON_ID})
MATCH (re:Restaurant {id: row.REST_ID})
CREATE (p1)-[ate:ATE_AT]->(re)
SET ate.liked = ToBoolean(row.liked);

Final result

Run the following query to see how the imported data looks as a graph:

MATCH p=()-[]-() RETURN p;

CSV file import in Memgraph Lab

The CSV file import tool reads row by row from loaded CSV files, each containing separate data for a specific node label or a relationship type. It binds the contents of the parsed row to the specified variable and either merges the data with the current dataset or drops the database and starts from scratch, depending on the user's choice.

💡

The CSV file import tool is convenient for importing smaller amounts of data. However, for larger datasets, it is still recommended to use LOAD CSV as the best practice for importing.


Import files WITH and NO HEADER

The goal of this example is to import two CSV files.

One file contains data we will use to create nodes labeled :Person and the other file will be used to connect those nodes with the :IS_FRIENDS_WITH relationship.

There are also two variations of the files, files with a header and files without a header.

Download the files

Upload files

Upload your files by selecting them from your local filesystem or using a drag and drop method. Once all your files are uploaded, proceed to configure each one separately.

Let's start by configuring the nodes by clicking on the people_nodes.csv file.

Configure nodes

After clicking on the + Add node button for our CSV file with a header, make sure you select the checkbox on the left side of the interface indicating that the file has a header row. Just below that checkbox, the preview of the detected properties and the first row of your file will be displayed.

During configuration, start by adding a Node label and choose whether to include an index with the label.

💡

Index data carefully as indexes do speed up queries but also take up memory.

Let's add the label :Person and leave the index checkbox empty since we currently only have one type of node and using an index won't impact performance.

Next, add properties. In the dropdown, you can choose from the properties identified by the import tool from the header of the file. Select the ones you need and customize each property by renaming it, defining the type and specifying whether it should be Unique, Required or Indexed.

Finally, choose how to handle duplicates: Import everything, Merge or Skip.

The image below illustrates a possible configuration:

  • Since nodes are identified by the id property, the selected checkboxes indicate uniqueness, requirement, and indexing of that property.
  • For the configuration of the name property, not all nodes need unique names, but having that property is necessary, so the required checkbox is marked.


Configure relationships

Let's now start with configuring the relationships by clicking on the people_relationships.csv file.

After clicking on the + Add edge button for our CSV file with a header, make sure you select the checkbox on the left side of the interface indicating that the file has a header row. Again, the preview of the detected properties and the first row of your file will be displayed.

During configuration, start by adding an Edge type, in our case IS_FRIENDS_WITH.

Moving on, let's add the Node labels for both the starting and ending nodes of the relationship. Since all our nodes have the label :Person, we'll select the :Person label for both. It's important to understand the schema of your data at this point to choose the appropriate labels.

After selecting labels, add the extra property condition that specifies which node property corresponds to the selected property from the CSV file containing relationships.

There's an option to add edge properties, but since our relationships don't have any properties, we can skip that step.

Once again, finalize your configuration by choosing how to handle duplicates: Import everything, Merge or Skip.

The image below illustrates a possible relationship configuration:


Import data

Once you've configured your files, Memgraph proceeds through four steps to import data:

  1. Validating files: Ensuring the consistency of data format across all CSV files.
  2. Uploading to Memgraph: Uploading files to prepare them for validation.
  3. Import progress: Setting up the database, creating constraints (selected in the configuration, such as uniqueness and existence), establishing indexes and importing nodes and relationships.
  4. Database cleanup: Removing the files from Memgraph.

The import process is complete! The data has been successfully imported into Memgraph and now it's time to head over to the Query execution window to explore your dataset and execute queries.

Final result

Run the following query to see how the imported data looks as a graph:

MATCH p=()-[]-() RETURN p;



Import data from a single CSV file

In this example, we'll use the same data from the previous section, but imported from the same CSV file.

Download the file

The file contains four columns, two containing properties id and name for nodes labeled with :Person and two containing id property of nodes the relationship :IS_FRIENDS_WITH is connecting.


Configure nodes

After uploading the file, we need to configure it.

Let's start by configuring nodes. Select the + Add node button and assign the label :Person to the nodes. After adding the label, choose properties from the dropdown from the CSV related to the nodes, in this case, id and name.

The image below illustrates a possible configuration:


  • Since id is the key property for identifying nodes, it needs to be unique, required and optionally indexed.
  • The name property is a string property required for a node, but not every node needs to have a unique name property.

Finally, choose how to handle duplicates: Import everything, Merge or Skip.


Configure relationships

Let's now configure edges connecting the nodes, with relationship type IS_FRIENDS_WITH.

After selecting + Add Edge, assign the IS_FRIENDS_WITH relationship type. We now need to choose the Start node and the End node label. Since relationship type IS_FRIENDS_WITH only connects nodes with :Person label, both our Start node label and End node label is going to be :Person.

Select the properties from the CSV file that corresponds to the id of the person relationship is connecting and add edge properties if they exists.

The image below illustrates a possible configuration:

Finally, choose how to handle duplicates: Import everything, Merge or Skip.


Final result

After configuring files and successfully importing data into Memgraph, navigate to the Query execution window and execute the following query to visualize the imported data as a graph:

MATCH p=()-[]-() RETURN p;



Import data from multiple CSV files

In this example, the same data from the previous example is split across four files, each file containing nodes of a single label or relationships of a single type. All files have a header.

Download the files

  • people_nodes.csv (opens in a new tab) is used to create nodes labeled :Person.
    The file contains the following data:

    id,name,age,city
    100,Daniel,30,London
    101,Alex,15,Paris
    102,Sarah,17,London
    103,Mia,25,Zagreb
    104,Lucy,21,Paris
  • restaurants_nodes.csv (opens in a new tab) is used to create nodes labeled :Restaurants.
    The file contains the following data:

    id,name,menu
    200,Mc Donalds,Fries;BigMac;McChicken;Apple Pie
    201,KFC,Fried Chicken;Fries;Chicken Bucket
    202,Subway,Ham Sandwich;Turkey Sandwich;Foot-long
    203,Dominos,Pepperoni Pizza;Double Dish Pizza;Cheese filled Crust
  • people_relationships.csv (opens in a new tab) is used to connect people with the :IS_FRIENDS_WITH relationship.
    The file contains the following data:

    first_person,second_person,met_in
    100,102,2014
    103,101,2021
    102,103,2005
    101,104,2005
    104,100,2018
    101,102,2017
    100,103,2001
  • restaurants_relationships.csv (opens in a new tab) is used to connect people with restaurants using the :ATE_AT relationship.
    The file contains the following data:

    PERSON_ID,REST_ID,liked
    100,200,true
    103,201,false
    104,200,true
    101,202,false
    101,203,false
    101,200,true
    102,201,true

Data schema

When configuring files, it's important to understand the schema of the data being imported.

The image below illustrates the structure of the data schema:

Configure nodes

After uploading all four files in the import window, we need to configure each one individually. Let's start by configuring the files containing nodes, people_nodes.csv and restaurants_nodes.csv.

For the people_nodes.csv file, we'll assign the :People label. Since we now have more than one type of node labels, we'll add an index to enhance the query speed.

Next, let's add properties. In the dropdown, select all available properties: id, name, age and city.

The image below illustrates a possible configuration:


  • Since id is the key property for identifying nodes, it needs to be unique, required and optionally indexed.
  • The name property is a string property required for a node, but not every node needs to have a unique name property.
  • The age property is an optional integer property that neither has to exist nor be unique.
  • The city is an optional string property that neither has to exist nor be unique as well.

Finally, choose how to handle duplicates: Import everything, Merge or Skip.

For the restaurant_nodes.csv file, we'll assign the :Restaurants label and again add an index to the label.

Next, let's add properties. In the dropdown, select all available properties: id, name and menu.

The image below illustrates a possible configuration:


  • Since id is the key property for identifying nodes, it needs to be unique, required and optionally indexed.
  • The name property is a string property required for a node, but not every node needs to have a unique name property.
  • The menu is an optional string property that neither has to exist nor be unique.

Finally, choose how to handle duplicates: Import everything, Merge or Skip.


Configure relationships

Let's now configure the files containing relationships, people_relationships.csv and restaurants_relationships.csv.

For the people_relationships.csv file, we'll assign the IS_FRIENDS_WITH relationship type. We also need to choose the Start node and the End node label. For this step, it's important to know the schema of the data. Since relationship type IS_FRIENDS_WITH only connects nodes with :People label, both our Start node label and End node label is going to be :People.

Next, let's add properties. In the dropdown, select the property met_in.

The image below illustrates a possible configuration:


Finally, choose how to handle duplicates: Import everything, Merge or Skip.

For the restaurants_relationships.csv file, we'll assign the ATE_AT relationship type. We also need to choose the Start node and the End node label. From our data schema, relationship type ATE_AT connects nodes with :People label to nodes with :Restaurants label.

Next, let's add properties. In the dropdown, select the property liked. Since values of that property are either true or false, the type of the property can be defined as boolean.

The image below illustrates a possible configuration:

Finally, choose how to handle duplicates: Import everything, Merge or Skip.


Final result

After configuring files and successfully importing data into Memgraph, navigate to the Query execution window and execute the following query to visualize the imported data as a graph:

MATCH p=()-[]-() RETURN p;