LOAD PARQUET clause
The LOAD PARQUET clause lets you load data from a Parquet file of your
choosing and use it row by row within a query.
The syntax of the clause is:
LOAD PARQUET FROM <parquet-location> ( WITH CONFIG configs=configMap ) ? AS <variable-name>
- <parquet-location> is a string that specifies where the Parquet file is located.
  If the path does not start with s3://, it is treated as a local file path. If it does start with s3://, Memgraph retrieves the file from the S3-compatible storage using the provided URI. There are no restrictions on the file’s location within your local file system, as long as the path is valid and the file exists. If you are running Memgraph in Docker, you need to copy the files from your local directory into the Docker container, where Memgraph can access them.
- <configs> is an optional configuration map through which you can specify the configuration options aws_region, aws_access_key, aws_secret_key and aws_endpoint_url.
  - <aws_region>: The region in which your S3 service is located.
  - <aws_access_key>: The access key used to connect to the S3 service.
  - <aws_secret_key>: The secret key used to connect to the S3 service.
  - <aws_endpoint_url>: Optional configuration parameter. Can be used to set the URL of the S3-compatible storage.
- <variable-name> is a symbolic name for the variable to which the contents of each parsed row are bound, enabling access to the row contents later in the query. The variable doesn’t have to be used in any subsequent clause.
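As an illustration of the parameters above, the following sketch loads a file from S3-compatible storage. The bucket name, region, endpoint and credential placeholders are hypothetical. Writing the configuration as a map literal with quoted keys is also an assumption about how configMap is expressed, so check it against your Memgraph version.

```cypher
// Hypothetical bucket, region and credentials; replace with your own values.
// The map-literal form of the configuration is an assumption, not taken from the syntax line above.
LOAD PARQUET FROM "s3://my-bucket/people.parquet"
WITH CONFIG {
  'aws_region': 'eu-west-1',
  'aws_access_key': '<your-access-key>',
  'aws_secret_key': '<your-secret-key>',
  'aws_endpoint_url': 'http://localhost:4566'
} AS row
RETURN row;
```

Because aws_endpoint_url is optional, it can be left out when the default S3 endpoint is used.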
The clause reads the Parquet file row by row and binds the contents of each parsed row to the variable you specified.
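A minimal sketch of the row binding, assuming a local file with hypothetical columns id and name, and assuming each row's values can be accessed by column name on the bound variable:

```cypher
// Hypothetical path and column names (id, name).
// Each row of the file is bound to `row`, and one node is created per row.
LOAD PARQUET FROM "/path/to/people.parquet" AS row
CREATE (p:Person {id: row.id, name: row.name});
```

When running in Docker, the path must point to a location inside the container.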
Adding a MATCH or MERGE clause before LOAD PARQUET allows you to match
certain entities in the graph before running LOAD PARQUET. This is an optimization,
because the matched entities do not need to be searched for on every row of the Parquet file.
However, a MATCH or MERGE clause can be used before the LOAD PARQUET clause only
if it returns exactly one row. Returning multiple rows before calling the
LOAD PARQUET clause will cause a Memgraph runtime error.
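The pattern described above might look like the following sketch. The labels, property names, relationship type and file path are hypothetical, and the MATCH is assumed to return exactly one Company node:

```cypher
// The MATCH must return a single row; otherwise Memgraph raises a runtime error.
MATCH (c:Company {name: "Acme"})
// `c` is matched once and reused for every row read from the file.
LOAD PARQUET FROM "/path/to/employees.parquet" AS row
CREATE (p:Person {id: row.id, name: row.name})
CREATE (p)-[:WORKS_AT]->(c);
```

If the MATCH could return more than one node, narrow it with a more specific property filter before the LOAD PARQUET clause.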
Type handling:
The parser reads each value using its native Parquet type, so you should
receive the same data type inside Memgraph. The following types are supported:
BOOL, INT8, INT16, INT32, INT64, UINT8, UINT16, UINT32, UINT64, HALF_FLOAT,
FLOAT, DOUBLE, STRING, LARGE_STRING, STRING_VIEW, DATE32, DATE64, TIME32,
TIME64, TIMESTAMP, DURATION, DECIMAL128, DECIMAL256, BINARY, LARGE_BINARY,
FIXED_SIZE_BINARY, LIST, MAP.
Any unsupported types are automatically stored as strings. Note that
UINT64 values are cast to INT64 because Memgraph does not support
unsigned 64-bit integers, and the Cypher standard only defines 64-bit signed
integers.