Debugging Memgraph by yourself
Why do we give users the capability of debugging Memgraph?
In the process of enhancing Memgraph’s performance and reliability, user-driven debugging plays a crucial role. Given the complex nature of software environments, it is often challenging for our team to address all issues simultaneously. Moreover, bugs can exhibit behaviors that are difficult to replicate on different systems.
By conducting initial debugging, users can provide us with valuable diagnostic data from their specific environment, which is essential for reproducing and pinpointing the root cause of issues. To facilitate this, we have equipped our container with a suite of user-friendly tools designed to assist in the debugging process.
These tools not only empower users to identify and report problems more accurately but also expedite the resolution process, enhancing overall system stability and performance.
Which Memgraph image can I use to debug Memgraph?
Memgraph offers Docker images specifically tailored for debugging, with Memgraph built in the RelWithDebInfo mode. These containers are equipped with crucial debugging tools such as perf, gdb, and pgrep, along with other useful apt packages.
This setup is intended to empower users to conduct thorough investigations into issues such as slow performance, system hangs, or unusually high memory usage. Although the RelWithDebInfo build of Memgraph comes with some performance degradation (~10% slower overall), this approach contributes to a more stable and efficient system in the long run.
After all issues have been identified and you move to production, you are encouraged to switch back to Release mode with the default Memgraph images, so you regain the extra performance from the now fast and reliable system.
Memgraph in the RelWithDebInfo mode can be downloaded from DockerHub with the following command:
docker image pull memgraph/memgraph:<memgraph_version>-relwithdebinfo
If you are using Memgraph MAGE, the following command should do:
docker image pull memgraph/memgraph-mage:<MAGE_version>-memgraph-<memgraph_version>-relwithdebinfo
where memgraph_version and MAGE_version are the respective versions of the image. All images built in the RelWithDebInfo mode have the suffix -relwithdebinfo.
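For example, to pull the same debug image that is used later in this guide (tag availability depends on which versions are published), the command would be:
docker image pull memgraph/memgraph:3.2.0-relwithdebinfo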
I have the Memgraph image, is there anything specific I should do to run Memgraph?
Yes, the Memgraph container needs to be run in --privileged mode. Privileged mode is an option of the Docker ecosystem, not Memgraph, and it allows the gdb and perf tools to run properly.
Below, we can see an example command of how to run a Memgraph container in the privileged mode:
docker container run --name mg --privileged -p 7687:7687 -p 9091:9091 memgraph --log-level=TRACE --also-log-to-stderr
I have run the Memgraph container, where should I perform the debugging?
All debugging is performed inside the container. To enter the container, you need to execute the following command.
docker container exec -it -u root mg bash
The -u root option enables root privileges inside the container, which are necessary for running the debugging tools.
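As a quick sanity check, you can verify that the debugging tools are available inside the container:
gdb --version
perf --version
pgrep memgraph   # should print the PID of the Memgraph process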
What are all the debugging capabilities I am equipped with?
Memgraph supports the following debug capabilities:
- Attaching GDB to Memgraph and inspecting threads
- Generating a core dump after Memgraph crashes
- Using perf to identify performance bottlenecks

We will go through each of them in detail.
1. Attaching GDB to Memgraph
GDB and pgrep are already installed in the Memgraph container that ships with the debug symbols.
Since Memgraph is already running in the container on port 7687, you can attach GDB to the running Memgraph process with the following command:
gdb -p $(pgrep memgraph)
Most likely, the Memgraph process will have PID 1, but to be certain, we use pgrep.
List of useful commands when in GDB
| Name | Description |
|---|---|
| CTRL + C | Pauses execution. |
| c | Continues execution. |
| info thread | Lists all executing threads. |
| t <x> | Switches to the thread with number x. |
| bt | Prints the backtrace of a thread. |
| bt full | Prints the backtrace with extra information. |
| frame <x> | Switches to the frame with number x in the backtrace. |
| list | Prints the 10 source lines above and below the current line of the frame. |
| up | Goes 1 frame up in the backtrace. |
| down | Goes 1 frame down in the backtrace. |
| info locals | Prints local variables of the current frame. |
| info args | Prints function arguments of the current frame. |
| print $local_var | Prints the value of the variable $local_var. |
Most commonly used commands.
In Memgraph, we usually first want to see all the threads currently running. We do that by issuing:
info thread
After identifying a thread whose code could belong to the Memgraph repository, we can switch to it with the command
t <x>
where x is the specific thread number.
The backtrace of that thread can then be printed with the command
bt
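Putting it together, a typical inspection session might look like this (the thread and frame numbers are placeholders for whatever shows up in your own output):
(gdb) info thread    # list all threads
(gdb) t 2            # switch to a thread that runs Memgraph code
(gdb) bt             # print its backtrace
(gdb) frame 3        # jump to a frame of interest
(gdb) info locals    # inspect local variables in that frame
(gdb) c              # let Memgraph continue running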
2. Generating a core dump with Memgraph
Generating a core dump with a plain Docker image
In order to generate a core dump, you need to perform a few steps both on the host and inside the container.
- Setting no size limit on the core dump
First, remove any limit on the size of the core dump that can be generated. This is done with the following command:
ulimit -c unlimited
- Mounting the correct volume
When Memgraph crashes, we want the resulting core dump file to end up on our host system, so when starting the container we will provide the appropriate volume. Additionally, don't forget to set the --privileged flag as noted in the previous sections.
docker container run --name mg --privileged -v /home/user/cores:/tmp/cores -p 7687:7687 -p 9091:9091 memgraph:2.16.0_17_050d5c985 --log-level=TRACE --also-log-to-stderr
- Setting up the container for core dump generation
Additionally, the following commands will need to be executed inside the container after it has started, so that a correct core dump can be generated.
ulimit -c unlimited
mkdir -p /tmp/cores
chmod a+rwx /tmp/cores
echo "/tmp/cores/core.%e.%p.%h.%t" > /proc/sys/kernel/core_pattern
When Memgraph crashes, a core dump will be generated, and you will see it on the host system if you have mounted the volume correctly.
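As a quick sanity check before waiting for a crash, you can confirm inside the container that the settings took effect:
ulimit -c                            # should print "unlimited"
cat /proc/sys/kernel/core_pattern    # should print /tmp/cores/core.%e.%p.%h.%t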
- Starting Memgraph again and inspecting the core dump with GDB
The container will need to be started again, since we want the same debug symbols to be present, and using an identical image is the cleanest way to achieve that. However, we don't need the Memgraph process at port 7687 this time, so we will ignore it.
You will need to copy the core dump file into the container with the docker cp command.
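For example, assuming the dump landed in the mounted directory on the host and the container is named mg (the file name below is hypothetical; use whatever name was actually generated):
docker cp /home/user/cores/core.memgraph.1.myhost.1700000000 mg:/core.memgraph.file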
After logging into the container with the root credentials:
docker container exec -it -u root mg bash
we will execute GDB with the core dump file provided:
gdb /usr/lib/memgraph/memgraph --core=/core.memgraph.file
where core.memgraph.file is the name of your core dump file. You may need to set the appropriate permissions on the core dump file. You can check the list of useful GDB commands in the sections above.
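If GDB complains that it cannot read the core dump, relaxing the file permissions inside the container is usually enough, for example:
chmod a+r /core.memgraph.file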
To find out more about setting core dumps, you can check this article.
Generating a core dump with Docker Compose
The setup with Docker Compose is similar to Docker. You will need to bind the volume, run Memgraph in privileged mode, and make sure you set no size limit on the generated core dump.
Below we can see an example Docker Compose file which can generate a core dump:
services:
memgraph:
image: memgraph:2.16.0_17_050d5c985
container_name: mg
privileged: true
ports:
- "7687:7687"
- "7444:7444"
- "9091:9091"
volumes:
      - /home/user/cores:/tmp/cores
command: ["--log-level=TRACE", "--also-log-to-stderr=true"]
ulimits:
core:
hard: -1
soft: -1
lab:
image: memgraph/lab:latest
container_name: memgraph-lab
ports:
- "3000:3000"
depends_on:
- memgraph
environment:
- QUICK_CONNECT_MG_HOST=memgraph
- QUICK_CONNECT_MG_PORT=7687
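Assuming the Compose file above is saved as docker-compose.yml, the stack can be brought up with:
docker compose up -d
After a crash, the generated core dump should appear in the directory mounted on the host.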
3. Using perf to identify performance bottlenecks
Profiling with perf is the most common operation when Memgraph is hanging or performing slowly.
The next steps provide instructions on how to check which parts of Memgraph are stalling during query execution, so the information can be used to improve the system.
Prior to running the perf instructions, you will need to bind the Memgraph binary to the local filesystem. You can start Memgraph with the volume bound like this:
docker container run --name mg -p 7687:7687 --privileged -v memgraph-binary:/usr/lib/memgraph <memgraph_image> --log-level=TRACE --also-log-to-stderr
- Perfing Memgraph
Inside the container, we will need to profile the system with perf. That is done using the following command:
perf record -p $(pgrep memgraph) --call-graph dwarf sleep 5
The command will profile the Memgraph process for 5 seconds; of course, you can tweak the number yourself. It will generate a file called perf.data in the directory inside the container where you ran the command.
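If you only want a quick text-based summary without leaving the container, perf can also display the recorded data directly; run this in the same directory that contains perf.data:
perf report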
- Installing hotspot as the GUI tool on the host
On the host, we will need to install a GUI tool called hotspot that will help us generate a flamegraph. We can install hotspot on the host machine by issuing the command:
apt install hotspot
If your machine does not support APT, please check the hotspot repo and follow the installation steps.
- Transferring perf.data to the host
After we have recorded the perf data, we need to get it from the container to the host. That is done with the following command on the host:
docker cp <container_id>:<path_to_perf.data> .
where container_id is the Memgraph container ID, and path_to_perf.data is the absolute path in the container to the generated perf.data file.
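For example, if the container is named mg and perf.data was recorded in /usr/lib/memgraph inside it (both assumptions based on the commands above), the copy could look like this:
docker cp mg:/usr/lib/memgraph/perf.data .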
- Linking Memgraph's debug symbols in the container to the host
For hotspot to be able to identify the debug symbols and draw the flamegraph, the path to the debug symbols on the host must be exactly the same as in the container. In the container it is the /usr/lib/memgraph/memgraph binary, so we will need to make a symbolic link from the container volume to the host system:
ln -s <path_to_docker_volume_debug_symbols_binary> /usr/lib/memgraph/memgraph
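The host path of the memgraph-binary volume used above can be found with docker volume inspect; a minimal sketch of the linking step, assuming the default Docker data root, could look like this:
docker volume inspect memgraph-binary   # note the "Mountpoint" field
sudo mkdir -p /usr/lib/memgraph
sudo ln -s /var/lib/docker/volumes/memgraph-binary/_data/memgraph /usr/lib/memgraph/memgraph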
- Starting hotspot
If you did everything correctly, starting hotspot with
hotspot perf.data
should show you a flamegraph similar to the one in the picture below.
Debugging Memgraph under Kubernetes (k8s)
General commands
To begin with, the master of all kubectl commands is:
kubectl get all
Managing nodes:
kubectl get nodes --show-labels # Show all nodes and their labels.
kubectl get nodes -o wide # Show additional information about the nodes.
kubectl top nodes # Get the current memory usage.
Managing pods:
kubectl get pods --show-labels # Show all pods and their labels.
kubectl get pods -o wide # Inspect how pods get scheduled.
kubectl describe pod <pod-name> # Inspect pod config (args, envs, ...).
kubectl get pod <pod-name> -o yaml # Get pod yaml config.
kubectl exec -it <pod-name> -- /bin/bash # Log in to a running pod.
kubectl logs <pod-name> # Get logs for a running pod.
kubectl logs memgraph-data-0-0 | tail -n 100 # Filter last logs from a running pod.
kubectl logs --previous <pod-name> # Get logs from a crashed pod.
kubectl logs <pod-name> -c <container-name> # Get logs from a specific pod, e.g., debugging init containers.
kubectl cp <pod-name>:<pod-path> . # Copy files (e.g., logs) from a running pod.
kubectl get events --all-namespaces --sort-by='.metadata.creationTimestamp' # List all events by creation time.
kubectl get events --namespace <namespace-name> # List all events in the given namespace.
kubectl port-forward <pod-name> <host-port>:<pod-port> # Forward/connect port on host to the pod port.
kubectl cluster-info dump # Dump current cluster state to stdout.
kubectl get statefulsets # Show all StatefulSets.
kubectl get pvc # Get all PersistentVolumeClaims.
kubectl get pvc -l app=<statefulset-name> # Get the PersistentVolumeClaims for the StatefulSet.
Debugging Memgraph pods
To use gdb inside a Kubernetes pod, the container must run in privileged mode. To run any given container in privileged mode, the k8s cluster itself needs to have an appropriate configuration.
Below is an example of how to start a privileged kind cluster.
Create a privileged kind cluster
First, create a new debug-cluster.yaml config file with allow-privileged enabled.
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
image: kindest/node:v1.31.0
extraPortMappings:
- containerPort: 80
hostPort: 8080
protocol: TCP
kubeadmConfigPatches:
- |
kind: ClusterConfiguration
kubeletConfiguration:
extraArgs:
allow-privileged: "true"
# To inspect the cluster run `kubectl get pods -n kube-system`.
# If any of the pods is in the CrashLoopBackOff status, try running `kubectl
# logs <pod-name> -n kube-system` to get the error message.
To start the cluster, execute the following command:
kind create cluster --name <cluster-name> --config debug-cluster.yaml
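To verify that the cluster came up, you can query it through the kubectl context that kind creates (kind prefixes cluster names with kind-):
kubectl cluster-info --context kind-<cluster-name>
kubectl get nodes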
Deploy a debug pod
Once the cluster is up and running, create a new debug-pod.yaml file with the following content:
apiVersion: v1
kind: Pod
metadata:
name: debug-pod
spec:
containers:
- name: my-container
image: memgraph/memgraph:3.2.0-relwithdebinfo # Use the latest, but make sure it's the relwithdebinfo one!
securityContext:
runAsUser: 0 # Runs the container as root.
privileged: true
capabilities:
add: ["SYS_PTRACE"]
allowPrivilegeEscalation: true
command: ["sleep"]
args: ["infinity"]
stdin: true
tty: true
To get the pod up and running and open a shell inside it run:
kubectl apply -f debug-pod.yaml
kubectl exec -it debug-pod -- bash
Once you are in the pod execute:
apt-get update && apt-get install -y gdb
su memgraph
gdb --args ./memgraph <memgraph-flags>
run
Once you have Memgraph up and running under gdb, run your workload (insert data, write queries, etc.). When you manage to recreate the issue, use the gdb commands to pinpoint the exact problem.
Delete the debug pod
To delete the debug pod run:
kubectl delete pod debug-pod
The official k8s documentation on how to debug running pods is quite detailed.
Handling core dumps
When Memgraph crashes, for example due to a segmentation fault (SIGSEGV), core dumps can provide invaluable insight for debugging. The Memgraph Helm charts provide an easy way to enable persistent core dump storage using the createCoreDumpsClaim option.
To enable core dumps, create a values.yaml file with at least the following setting:
createCoreDumpsClaim: true
Feel free to copy the values file from the helm-charts repository as a base, since additional required fields may be missing from a minimal config.
This instructs the Helm chart to create a PersistentVolumeClaim (PVC) to store core dumps generated by the Memgraph process.
Important configuration notes
By default, the storage size is 10GiB. Core dumps can be as large as your node's total RAM, so it's recommended to set this explicitly by adjusting coreDumpsStorageSize in the values.yaml file.
Make sure to use the relwithdebinfo image of Memgraph by setting image.tag, also in the values.yaml file.
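Putting these notes together, a minimal values.yaml could be created like this (the storage size and image tag below are only examples; pick values that match your node and the Memgraph version you run):
cat > values.yaml <<'EOF'
createCoreDumpsClaim: true
coreDumpsStorageSize: 32Gi
image:
  tag: 3.2.0-relwithdebinfo
EOF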
Run the following command to install Memgraph with the debugging configuration:
helm install my-release memgraph/memgraph -f values.yaml
The core dumps are written to a mounted volume inside the container (the default is /var/core/memgraph; it's possible to tweak that by changing coreDumpsMountPath in values.yaml). You can use kubectl exec or kubectl cp to access the files for post-mortem analysis.
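For example, assuming the default mount path, copying a dump from the Memgraph pod to the current directory could look like this (the pod and core file names are placeholders; list the directory first to find the actual file name):
kubectl exec <pod-name> -- ls /var/core/memgraph
kubectl cp <pod-name>:/var/core/memgraph/<core-file> ./<core-file>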
If your k8s cluster runs under a major cloud provider and you want to store the dumps in S3, probably the best repo to check out is the core-dump-handler.
Specific cloud provider instructions
The k8s quick reference is an amazing set of commands!