xml_module
The XML module enhances graph database capabilities by providing support for loading and parsing XML data.
Trait | Value |
---|---|
Module type | util |
Implementation | Python |
Parallelism | sequential |
Functions
You can execute this algorithm on graph projections, subgraphs or portions of the graph.
parse()
Parses an XML string or file into a map.
In XML file format, every element is represented as a map. For every element,
its children elements are represented as a key-value pair inside that map, the
key being _children
, and the value an array of children elements. But, when
simple
is True
, the key of children elements is not _children
, but rather
the name of the parent element.
Consider a root element named catalog
. When parsing this element, if simple
is False
, the key-value pair of children elements will look like this:
_children: [child_element_1, child_element_2, ....]
. But, when simple
is
True
, the key-value pair will look like this _catalog: [child_element_1, child_element_2, ....]
. Using simple mode makes nested XML elements accessible
via an element name prefixed with an _
.
Input:
-
subgraph: Graph
(OPTIONAL) ➡ A specific subgraph, which is an object of type Graph returned by theproject()
function, on which the algorithm is run. If subgraph is not specified, the algorithm is computed on the entire graph by default. -
xml_input: string
➡ input XML string. -
simple: bool (default = false)
➡ configuration bool used for specifying whether simple mode should be used. Simple configuration explanation. -
path: string (default = "")
➡ path to the XML file that needs to be parsed. If the path is not empty, thexml_input
string is ignored, and only the file is parsed.
Output:
The output of this function is a parsed XML map.
Usage:
Parsing XML from string
WITH '<catalog><book id="1"><title>Book 1</title><author>Author 1</author><publication><year>2022</year><publisher>Publisher A</publisher></publication></book><book id="2"><title>Book 2</title><author>Author 2</author><publication><year>2023</year><publisher>Publisher B</publisher></publication></book></catalog>' AS xmlString
RETURN xml_module.parse(xmlString) AS value;
Output:
{
"_children": [
{
"_children": [
{
"_text": "Book 1",
"_type": "title"
},
{
"_text": "Author 1",
"_type": "author"
},
{
"_children": [
{
"_text": "2022",
"_type": "year"
},
{
"_text": "Publisher A",
"_type": "publisher"
}
],
"_type": "publication"
}
],
"_type": "book",
"id": "1"
},
{
"_children": [
{
"_text": "Book 2",
"_type": "title"
},
{
"_text": "Author 2",
"_type": "author"
},
{
"_children": [
{
"_text": "2023",
"_type": "year"
},
{
"_text": "Publisher B",
"_type": "publisher"
}
],
"_type": "publication"
}
],
"_type": "book",
"id": "2"
}
],
"_type": "catalog"
}
Parsing with simple configuration
WITH '<catalog><book id="1"><title>Book 1</title><author>Author 1</author><publication><year>2022</year><publisher>Publisher A</publisher></publication></book><book id="2"><title>Book 2</title><author>Author 2</author><publication><year>2023</year><publisher>Publisher B</publisher></publication></book></catalog>' AS xmlString
RETURN xml_module.parse(xmlString, true) AS value;
Output:
{
"_catalog": [
{
"_book": [
{
"_text": "Book 1",
"_type": "title"
},
{
"_text": "Author 1",
"_type": "author"
},
{
"_publication": [
{
"_text": "2022",
"_type": "year"
},
{
"_text": "Publisher A",
"_type": "publisher"
}
],
"_type": "publication"
}
],
"_type": "book",
"id": "1"
},
{
"_book": [
{
"_text": "Book 2",
"_type": "title"
},
{
"_text": "Author 2",
"_type": "author"
},
{
"_publication": [
{
"_text": "2023",
"_type": "year"
},
{
"_text": "Publisher B",
"_type": "publisher"
}
],
"_type": "publication"
}
],
"_type": "book",
"id": "2"
}
],
"_type": "catalog"
}
Parsing from a file, with simple configuration
The following example shows how to parse this file:
file.xml
<catalog>
<book id="1">
<title>Book 1</title>
<author>Author 1</author>
<publication>
<year>2022</year>
<publisher>Publisher A</publisher>
</publication>
</book>
<book id="2">
<title>Book 2</title>
<author>Author 2</author>
<publication>
<year>2023</year>
<publisher>Publisher B</publisher>
</publication>
</book>
</catalog>
Cypher:
RETURN xml_module.parse("", true,"/home/demonstration/Documents/file.xml") AS value;
Output:
{
"_catalog": [
{
"_book": [
{
"_text": "Book 1",
"_type": "title"
},
{
"_text": "Author 1",
"_type": "author"
},
{
"_publication": [
{
"_text": "2022",
"_type": "year"
},
{
"_text": "Publisher A",
"_type": "publisher"
}
],
"_type": "publication"
}
],
"_type": "book",
"id": "1"
},
{
"_book": [
{
"_text": "Book 2",
"_type": "title"
},
{
"_text": "Author 2",
"_type": "author"
},
{
"_publication": [
{
"_text": "2023",
"_type": "year"
},
{
"_text": "Publisher B",
"_type": "publisher"
}
],
"_type": "publication"
}
],
"_type": "book",
"id": "2"
}
],
"_type": "catalog"
}
Procedures
load()
Loads and parses an XML file from a URL or a local file. Supports simple mode, and XPath expressions. You can choose to execute the procedure in simple mode => Simple configuration explanation.
This procedure supports the usage of XPath expressions. Since the module is
implemented in Python, XPath expressions should follow, and are limited to the
XPath syntax explained here XML python
docs.
XPath implemented this way cannot use absolute paths, so one of these 3 prefixes
must be used to avoid errors: . .. *
. The current node is the root node.
Input:
-
subgraph: Graph
(OPTIONAL) ➡ A specific subgraph, which is an object of type Graph returned by theproject()
function, on which the algorithm is run. If subgraph is not specified, the algorithm is computed on the entire graph by default. -
xml_url: string
➡ The input URL where the XML file is located. -
simple: bool (default = false)
➡ A bool used for specifying whether simple mode should be used. -
path: string (default = "")
➡ A path to the XML file that needs to be parsed. If the path is not empty,xml_input
string is ignored, and only the file is parsed. -
xpath: string (default = "")
➡ XPath expression. If the expression is not empty, the result of the procedure is all elements satisfying the expression. -
headers: Map (default = {})
➡ A map of additional HTTP headers used when fetching a file from URL.
Output:
output_map: Map
➡ parsed XML map.
If the XPath expression is not empty, the output is all elements that satisfy the expression.
Usage:
This section shows the usage of the procedure on the folllowing XML file
Parse an XML file from URL
WITH "https://www.w3schools.com/xml/note.xml" AS xmlUrl
CALL xml_module.load(xmlUrl, false, "", "", {}) YIELD output_map RETURN output_map;
Output:
{
"_children": [
{
"_text": "Tove",
"_type": "to"
},
{
"_text": "Jani",
"_type": "from"
},
{
"_text": "Reminder",
"_type": "heading"
},
{
"_text": "Don't forget me this weekend!",
"_type": "body"
}
],
"_type": "note"
}
Parse with simple configuration
WITH "https://www.w3schools.com/xml/note.xml" AS xmlUrl
CALL xml_module.load(xmlUrl, true, "", "", {}) YIELD output_map RETURN output_map;
Output:
{
"_note": [
{
"_text": "Tove",
"_type": "to"
},
{
"_text": "Jani",
"_type": "from"
},
{
"_text": "Reminder",
"_type": "heading"
},
{
"_text": "Don't forget me this weekend!",
"_type": "body"
}
],
"_type": "note"
}
Parse XML from a file
Example of the file:
file.xml
<catalog>
<book id="1">
<title>Book 1</title>
<author>Author 1</author>
<publication>
<year>2022</year>
<publisher>Publisher A</publisher>
</publication>
</book>
<book id="2">
<title>Book 2</title>
<author>Author 2</author>
<publication>
<year>2023</year>
<publisher>Publisher B</publisher>
</publication>
</book>
</catalog>
Cypher:
CALL xml_module.load("", true, "/home/demonstration/Documents/file.xml", "", {}) YIELD output_map RETURN output_map;
Output:
{
"_catalog": [
{
"_book": [
{
"_text": "Book 1",
"_type": "title"
},
{
"_text": "Author 1",
"_type": "author"
},
{
"_publication": [
{
"_text": "2022",
"_type": "year"
},
{
"_text": "Publisher A",
"_type": "publisher"
}
],
"_type": "publication"
}
],
"_type": "book",
"id": "1"
},
{
"_book": [
{
"_text": "Book 2",
"_type": "title"
},
{
"_text": "Author 2",
"_type": "author"
},
{
"_publication": [
{
"_text": "2023",
"_type": "year"
},
{
"_text": "Publisher B",
"_type": "publisher"
}
],
"_type": "publication"
}
],
"_type": "book",
"id": "2"
}
],
"_type": "catalog"
}
Use XPath
For the XPath demonstration, cd_catalog.xml file will be used.
The XPath expression is going to be ".//CD[YEAR='1988']"
, which will return
all CD elements with attribute year equaling 1988. Note that XPath expressions
cannot be absolute paths because of the Python implementation of XPath, so .
is used as an XPath prefix for this example, meaning the search will start from
the current (root) element.
WITH "https://www.w3schools.com/xml/cd_catalog.xml" AS xmlUrl
CALL xml_module.load(xmlUrl, false, "", ".//CD[YEAR='1988']", {}) YIELD output_map RETURN output_map;
Result:
{
"_children": [
{
"_text": "Hide your heart",
"_type": "TITLE"
},
{
"_text": "Bonnie Tyler",
"_type": "ARTIST"
},
{
"_text": "UK",
"_type": "COUNTRY"
},
{
"_text": "CBS Records",
"_type": "COMPANY"
},
{
"_text": "9.90",
"_type": "PRICE"
},
{
"_text": "1988",
"_type": "YEAR"
}
],
"_type": "CD"
}
{
"_children": [
{
"_text": "Stop",
"_type": "TITLE"
},
{
"_text": "Sam Brown",
"_type": "ARTIST"
},
{
"_text": "UK",
"_type": "COUNTRY"
},
{
"_text": "A and M",
"_type": "COMPANY"
},
{
"_text": "8.90",
"_type": "PRICE"
},
{
"_text": "1988",
"_type": "YEAR"
}
],
"_type": "CD"
}