How to Build a Graph Web Application With Python, Flask, Docker & Memgraph - Part 1
The goal is straightforward (or at least it seems simple enough). Let's build a web application in Python that can visualize a graph and run some cool graph algorithms out of the box. Maybe it's not your flavor, but I prefer the Flask web framework for such occasions, so bear with me through this tutorial.
Now, I am going to show you an example of how to accomplish this. You can also take a look at the finished app on GitHub if you want to see the complete code.
The general outline of the tutorial is:
- Create a Flask server
- Dockerize your application
- Import the data into Memgraph
- Query the database
Graph visualizations will be covered in part two of the tutorial so stay tuned! Spoiler alert, we are going to use D3.js to draw our graph.
Prerequisites
For this tutorial, you will need to install:
- Docker
- Docker Compose (which is included with Docker on Windows and macOS)
With Docker, we don't need to worry about installing Python, Flask, Memgraph... essentially anything. Everything will be installed automatically and run smoothly inside Docker containers!
Disclaimer: Docker fanboy alert
1. Create a Flask server
I included comments in the code to make it more understandable, but if at any point you feel like something is unclear, join our Discord Server and share your thoughts. First, create the file app.py with the following contents:
import json
import logging
import os
from argparse import ArgumentParser

from flask import Flask, Response, render_template
from gqlalchemy import Memgraph

log = logging.getLogger(__name__)


def init_log():
    logging.basicConfig(level=logging.DEBUG)
    log.info("Logging enabled")
    # Set the log level for werkzeug to WARNING because it will print out too much info otherwise
    logging.getLogger("werkzeug").setLevel(logging.WARNING)
Other than the imports, the first few lines focus on setting up the logging. No web application is complete without logging, so we will add the bare minimum and disable the pesky werkzeug logger, which sometimes prints too much info.
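To see what raising a logger's level actually does, here is a minimal standard-library sketch. It repeats the setup from init_log() and then checks which severities each logger lets through. The `demo` logger name is made up for illustration, and `force=True` is only there so the sketch behaves the same if logging was already configured.

```python
import logging

# Same idea as init_log(): DEBUG for everything, WARNING for werkzeug.
# force=True (Python 3.8+) resets any earlier logging configuration.
logging.basicConfig(level=logging.DEBUG, force=True)
logging.getLogger("werkzeug").setLevel(logging.WARNING)

werkzeug_log = logging.getLogger("werkzeug")
demo_log = logging.getLogger("demo")  # hypothetical application logger

# werkzeug now drops INFO records (such as per-request lines) but keeps warnings
werkzeug_info_enabled = werkzeug_log.isEnabledFor(logging.INFO)
werkzeug_warning_enabled = werkzeug_log.isEnabledFor(logging.WARNING)

# Loggers without their own level inherit DEBUG from the root logger
demo_debug_enabled = demo_log.isEnabledFor(logging.DEBUG)

print(werkzeug_info_enabled, werkzeug_warning_enabled, demo_debug_enabled)
```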
Now, let's create an argument parser. This will enable you to easily change the behavior of the app on startup using arguments.
# Parse the input arguments for the app
def parse_args():
    """
    Parse command line arguments.
    """
    parser = ArgumentParser(description=__doc__)
    parser.add_argument("--host", default="0.0.0.0", help="Host address.")
    parser.add_argument("--port", default=5000, type=int, help="App port.")
    parser.add_argument("--template-folder", default="public/template", help="Flask templates.")
    parser.add_argument("--static-folder", default="public", help="Flask static files.")
    parser.add_argument("--path-to-input-file", default="graph.cypherl", help="Graph input file.")
    parser.add_argument("--debug", default=True, action="store_true", help="Web server in debug mode.")
    print(__doc__)
    return parser.parse_args()


args = parse_args()
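If you want to see how such a parser reacts to different inputs without starting the server, you can pass it a list of arguments directly. The following is a small sketch with only two of the options; `build_parser` is a helper name I made up for this example.

```python
from argparse import ArgumentParser


def build_parser():
    """Build a parser with two of the options used in app.py."""
    parser = ArgumentParser(description="Demo parser")
    parser.add_argument("--host", default="0.0.0.0", help="Host address.")
    parser.add_argument("--port", default=5000, type=int, help="App port.")
    return parser


# Passing a list instead of reading sys.argv makes the behavior easy to inspect
defaults = build_parser().parse_args([])
overridden = build_parser().parse_args(["--port", "8080"])

print(defaults.host, defaults.port)      # 0.0.0.0 5000
print(overridden.host, overridden.port)  # 0.0.0.0 8080
```

On the command line, the second case would correspond to running python3 app.py --port 8080.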
It’s time to create your server instance:
# Create the Flask server instance
app = Flask(
    __name__,
    template_folder=args.template_folder,
    static_folder=args.static_folder,
    static_url_path="",
)
You can finally create the view functions that will be invoked from the browser via HTTP requests. The first one serves the home page:
# Retrieve the home page for the app
@app.route("/", methods=["GET"])
def index():
    return render_template("index.html")
The only thing that’s left is to implement and call the main() function:
# Entrypoint for the app that will be executed first
def main():
    # Code that should only be run once
    if os.environ.get("WERKZEUG_RUN_MAIN") == "true":
        init_log()
    app.run(host=args.host, port=args.port, debug=args.debug)


if __name__ == "__main__":
    main()
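The somewhat strange statement os.environ.get("WERKZEUG_RUN_MAIN") == "true" will make sure that this code is only executed once. Confused? A problem arises when working with Flask in development mode: the reloader runs your script in two processes, a parent that watches for code changes and a child that actually serves requests, and it restarts the child on every change. Without the check, parts of your code (for example, the setup in main()) would execute in both processes. Werkzeug sets WERKZEUG_RUN_MAIN to "true" only in the child, so the guarded code runs only there.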
So, if you need to execute something only once in Flask at the beginning like loading data, this is the perfect place for it.
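The guard can be tried out in isolation. Below is a minimal sketch that takes the environment as a parameter, so both cases can be simulated without starting Flask. The helper names `is_reloader_child` and `run_once` are my own and are not part of Flask or Werkzeug.

```python
import os


def is_reloader_child(environ=os.environ):
    """Return True in the process that serves requests under the reloader."""
    return environ.get("WERKZEUG_RUN_MAIN") == "true"


def run_once(setup, environ=os.environ):
    """Call setup() only in the reloader's child process; report whether it ran."""
    if is_reloader_child(environ):
        setup()
        return True
    return False


calls = []
# Simulated parent (watcher) process: the variable is not set, so setup is skipped
ran_in_parent = run_once(lambda: calls.append("setup"), environ={})
# Simulated child process: the variable is set, so setup runs
ran_in_child = run_once(lambda: calls.append("setup"), environ={"WERKZEUG_RUN_MAIN": "true"})

print(ran_in_parent, ran_in_child, calls)  # False True ['setup']
```

Keep in mind that the variable is only set when the reloader is active, so with debug mode turned off the guarded code would not run at all.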
The next step is to create the following files, which we will work on in the next tutorial:
- index.html in public/template
- index.js in public/js
- style.css in public/css
One more file is needed, and this one will specify all the Python dependencies that need to be installed. Create requirements.txt with the following contents:
gqlalchemy==1.0.6
Flask==2.0.2
Your current project structure should look like this:
app
├── public
│   ├── css
│   │   └── style.css
│   ├── js
│   │   └── index.js
│   └── template
│       └── index.html
├── app.py
└── requirements.txt
2. Dockerize your application
This is much simpler than you might think. Most often, you will need a Dockerfile in which you will specify how your Docker image should be created. Let's take a look at our Dockerfile:
FROM python:3.9
# Install CMake
RUN apt-get update && \
    apt-get --yes install cmake && \
    rm -rf /var/lib/apt/lists/*
# Install Python packages
COPY requirements.txt ./
RUN pip3 install -r requirements.txt
# Copy the source code
COPY public /app/public
COPY app.py /app/app.py
WORKDIR /app
# Set the environment variables
ENV FLASK_ENV=development
ENV LC_ALL=C.UTF-8
ENV LANG=C.UTF-8
# Start the web application
ENTRYPOINT ["python3", "app.py"]
The first line indicates that we are basing our image on a Linux image that has Python 3.9 preinstalled. The next step is to install CMake (which is needed for the Memgraph Python driver) with a RUN instruction and the standard Linux installation command apt-get.
We copy the requirements.txt file and install the Python packages with pip. The source code also needs to be copied to the image in order for us to start the web application. The ENTRYPOINT instruction is responsible for starting the desired process inside the container.
But we are not finished with Docker yet. We need to create a docker-compose.yml file that will tell Docker which containers to start.
version: "3"
services:
  server:
    build: .
    volumes:
      - .:/app
    ports:
      - "5000:5000"
    environment:
      MEMGRAPH_HOST: memgraph
      MEMGRAPH_PORT: "7687"
    depends_on:
      - memgraph
  memgraph:
    image: "memgraph/memgraph"
    ports:
      - "7687:7687"
There are two services/containers in our app:
- Server: Uses the Dockerfile to build a Docker image and run it.
- Memgraph: This is our database. Docker will automatically download the image and start it.
Because we are supplying environment variables, let's load them in app.py right after the imports:
MEMGRAPH_HOST = os.getenv("MEMGRAPH_HOST", "memgraph")
MEMGRAPH_PORT = int(os.getenv("MEMGRAPH_PORT", "7687"))
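As a quick illustration of how these lookups behave, the sketch below reads the same two variables from a plain dictionary instead of the real environment. `read_memgraph_config` is a hypothetical helper, and the `localhost` value is just an example of what you might set when running the app outside Docker.

```python
import os


def read_memgraph_config(environ=os.environ):
    """Return (host, port), falling back to the defaults used in app.py."""
    host = environ.get("MEMGRAPH_HOST", "memgraph")
    # Environment variables are always strings, so the port is converted explicitly
    port = int(environ.get("MEMGRAPH_PORT", "7687"))
    return host, port


from_compose = read_memgraph_config({"MEMGRAPH_HOST": "memgraph", "MEMGRAPH_PORT": "7687"})
from_local = read_memgraph_config({"MEMGRAPH_HOST": "localhost"})

print(from_compose)  # ('memgraph', 7687)
print(from_local)    # ('localhost', 7687)
```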
Your current project structure should look like this:
app
├── public
│   ├── css
│   │   └── style.css
│   ├── js
│   │   └── index.js
│   └── template
│       └── index.html
├── app.py
├── docker-compose.yml
├── Dockerfile
└── requirements.txt
Now, we can even start our app with the following commands:
docker-compose build
docker-compose up
3. Import the data into Memgraph
This task will be done inside the main() function because it only needs to be executed once:
memgraph = None


def main():
    if os.environ.get("WERKZEUG_RUN_MAIN") == "true":
        init_log()
        global memgraph
        memgraph = Memgraph(MEMGRAPH_HOST, MEMGRAPH_PORT)
        load_data(args.path_to_input_file)
    app.run(host=args.host, port=args.port, debug=args.debug)
How do we import the data into Memgraph? I prepared a file with the Cypher queries that need to be executed in order to populate the database. You just need to download the file to your project's root directory and add the following load_data() function:
def load_data(path_to_input_file):
    """Load data into the database."""
    try:
        memgraph.drop_database()
        with open(path_to_input_file, "r") as file:
            for line in file:
                memgraph.execute(line)
    except Exception as e:
        log.info(f"Data loading error: {e}")
First, we clear everything in the database, and then we go over each line in the file graph.cypherl and execute it as a query. And that's it. Once we start the web application, Memgraph will import the dataset.
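The line-by-line idea can be tested without a running database. Here is a rough sketch in which a plain callable stands in for the database call, so in the app you would pass memgraph.execute instead. The `load_queries` helper, the blank-line handling, and the sample queries are my own additions for illustration.

```python
import io


def load_queries(lines, execute):
    """Run every non-empty line through execute() and return how many were run."""
    executed = 0
    for line in lines:
        query = line.strip()
        if not query:
            continue  # skip blank lines instead of sending empty queries
        execute(query)
        executed += 1
    return executed


# Stand-in for a .cypherl file: one Cypher statement per line (made-up data)
sample = io.StringIO(
    'CREATE (:Person {name: "Ana"});\n'
    "\n"
    'CREATE (:Person {name: "Ivan"});\n'
)
received = []
count = load_queries(sample, received.append)

print(count)  # 2
```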
4. Query the database
We will create a function that executes a Cypher query and returns the results. The query matches the whole graph, but the LIMIT clause caps the result at 100 rows, where each row is a pair of connected nodes:
def get_graph():
    results = memgraph.execute_and_fetch(
        """MATCH (n)-[]-(m)
        RETURN n AS from, m AS to
        LIMIT 100;"""
    )
    return list(results)
The view function get_data() fetches the nodes and relationships from the database, filters out the most important information, and returns it in JSON format for visualization. To keep the network load at a minimum, you will send a list with every node id (no other information about the nodes) and a list that specifies how they are connected to each other.
@app.route("/get-graph", methods=["GET"])
def get_data():
    """Load everything from the database."""
    try:
        results = get_graph()

        # Sets for quickly checking if we have already added the node or edge
        # We don't want to send duplicates to the frontend
        nodes_set = set()
        links_set = set()
        for result in results:
            source_id = result["from"].properties["name"]
            target_id = result["to"].properties["name"]

            nodes_set.add(source_id)
            nodes_set.add(target_id)

            if ((source_id, target_id) not in links_set and
                    (target_id, source_id) not in links_set):
                links_set.add((source_id, target_id))

        nodes = [{"id": node_id} for node_id in nodes_set]
        links = [{"source": n_id, "target": m_id} for (n_id, m_id) in links_set]

        response = {"nodes": nodes, "links": links}
        return Response(json.dumps(response), status=200, mimetype="application/json")
    except Exception as e:
        log.info(f"Data loading error: {e}")
        return ("", 500)
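The deduplication step is easier to follow on a tiny example. The sketch below applies the same set-based logic to plain name pairs instead of database results. `build_payload` and the sample pairs are made up for this demo, and the sorting is only there to make the output deterministic.

```python
import json


def build_payload(pairs):
    """Turn (source, target) name pairs into a nodes/links dict without duplicates."""
    nodes_set = set()
    links_set = set()
    for source_id, target_id in pairs:
        nodes_set.add(source_id)
        nodes_set.add(target_id)
        # An undirected match yields both (a, b) and (b, a); keep only the first seen
        if (source_id, target_id) not in links_set and (target_id, source_id) not in links_set:
            links_set.add((source_id, target_id))
    # Sorting is only here to make the demo output deterministic
    nodes = [{"id": node_id} for node_id in sorted(nodes_set)]
    links = [{"source": s, "target": t} for s, t in sorted(links_set)]
    return {"nodes": nodes, "links": links}


pairs = [("A", "B"), ("B", "A"), ("B", "C"), ("C", "B")]  # made-up rows
payload = build_payload(pairs)

print(json.dumps(payload, indent=2))
```

Four rows go in, but only three nodes and two links come out, because each undirected relationship shows up once per direction in the query results.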
What’s next?
As you can see, it’s very easy to connect to Memgraph and query a graph, even from a web application. While this part of the tutorial focused on the backend, the next one covers graph visualizations with the D3.js library.
Did you hear about the Memgraph App Challenge?
The rules are simple: you just have to create something that enriches the world of graphs! It could be a web application, a Memgraph driver, an integration for a graph library, or an implementation of a graph algorithm in MAGE. You could even just create a Python script or Jupyter Notebook for graph analysis.
Good luck with your coding, and don’t forget to register for the Challenge!