Monitoring

Monitoring applications and databases is essential to track the load and resources used by the system.

Memgraph currently supports:

  1. Real-time logs tracking via WebSocket server: Log tracking is helpful for debugging purposes and monitoring the database operations.
  2. Metrics tracking via HTTP server (Enterprise Edition): In the Enterprise Edition, besides log tracking, Memgraph allows tracking information about transactions, query latencies, snapshot recovery latencies, triggers, Bolt messages, indexes, streams, memory, operators and sessions, all by using an HTTP server.

Real-time logs tracking via WebSocket server

Memgraph’s logging server forwards logs over WebSocket to all connected clients. Connect to it to receive the logs in real time.

Log messages

Each log that is written to the log file is forwarded to the connected clients in the following format:

{
  event: "log",
  level: "trace"|"debug"|"info"|"warning"|"error"|"critical",
  message: "<log-message>"
}
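As a minimal sketch of consuming this format, the snippet below decodes one forwarded message and keeps only entries at or above a chosen severity. It assumes each message arrives as JSON text with the three fields shown above. The helper names parse_log and is_at_least, and the sample message, are illustrative and not part of Memgraph.

```python
import json

# Severity order taken from the "level" values listed in the format above.
LEVELS = ["trace", "debug", "info", "warning", "error", "critical"]


def parse_log(raw):
    """Decode one forwarded message into a (level, message) tuple."""
    data = json.loads(raw)
    if data.get("event") != "log":
        raise ValueError(f"unexpected event: {data.get('event')!r}")
    return data["level"], data["message"]


def is_at_least(level, threshold):
    """Return True when `level` is as severe as `threshold` or more."""
    return LEVELS.index(level) >= LEVELS.index(threshold)


# Hypothetical sample message, written by hand for illustration.
sample = '{"event": "log", "level": "warning", "message": "disk almost full"}'
level, message = parse_log(sample)
if is_at_least(level, "info"):
    print(f"[{level}] {message}")
```

Filtering on the client side like this keeps the connection simple: the server forwards everything, and each client decides which levels it cares about.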

Connection

To connect to Memgraph’s WebSocket server, use the following URL:

ws://host:port

The default host is 0.0.0.0, but it can be changed using the --monitoring-address configuration flag.

The default port is 7444, but it can be changed using the --monitoring-port configuration flag.

To connect to Memgraph’s WebSocket server using the default configuration and Python client, you can use the following code snippet:

import asyncio
import websockets
 
HOST = 'localhost'
PORT = '7444'
 
# The WebSocket URL
ws_url = f'ws://{HOST}:{PORT}'
 
async def read_logs():
    async with websockets.connect(ws_url) as websocket:
        print(f"Connected to Memgraph at {ws_url}")
        try:
            # Keep reading messages (logs) from the server
            while True:
                log_message = await websocket.recv()
                print(log_message)
        except websockets.exceptions.ConnectionClosed:
            print("Connection to Memgraph closed.")
 
asyncio.run(read_logs())

The code structure is similar in other languages, depending on the WebSocket library used.

If you want to connect with a WebSocket Secure (WSS) connection, set the --bolt-cert-file and --bolt-key-file configuration flags to enable an SSL connection. Refer to the configuration page to learn how to update the configuration and to see the list of all available configuration flags.
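On the client side, a secure connection mostly means switching the URL scheme and supplying a TLS context. The sketch below shows one way to prepare both using only the standard library. The helper names and the ca_file parameter are assumptions for illustration, and whether you need a custom CA file depends on how the server certificate was issued.

```python
import ssl


def monitoring_url(host="localhost", port=7444, secure=False):
    """Build the WebSocket URL; secure=True selects the wss:// scheme."""
    scheme = "wss" if secure else "ws"
    return f"{scheme}://{host}:{port}"


def client_ssl_context(ca_file=None):
    """Create a client-side TLS context.

    `ca_file` is a hypothetical path to the CA (or self-signed) certificate
    to trust; with None, the system trust store is used.
    """
    return ssl.create_default_context(cafile=ca_file)


url = monitoring_url(secure=True)
context = client_ssl_context()
print(url)
# With the websockets library used above, the pair would be passed as
# websockets.connect(url, ssl=context).
```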

Authentication

When Memgraph has no users, authentication is not used: no authentication message is expected, and no response is returned.

When authentication is used, Memgraph won’t send any messages to a connection until that connection is authenticated.

To authenticate, create a JSON with the credentials in the following format:

{
  "username": "<username>",
  "password": "<password>"
}

If the credentials are valid, the connection is authenticated and the client starts receiving log messages. The client first receives the following response:

{
  "success": true,
  "message": "User has been successfully authenticated!"
}

If the credentials are invalid, or the first message is not in the expected format, Memgraph sends the following response and drops the connection:

{
  "success": false,
  "message": "<error-message>"
}

To authenticate, send the JSON message to the WebSocket server as the first message on the connection:

import asyncio
import json
import websockets
 
HOST = "localhost"
PORT = "7444"
USERNAME = "memgraph"
PASSWORD = "memgraph"
 
ws_url = f"ws://{HOST}:{PORT}"
 
 
async def authenticate_and_read_logs(websocket):
    # Send authentication credentials
    credentials = json.dumps({"username": USERNAME, "password": PASSWORD})
    await websocket.send(credentials)
 
    # Wait for authentication response
    response = await websocket.recv()
    response_data = json.loads(response)
    if response_data.get("success"):
        print("Authentication successful!")
    else:
        print(f"Authentication failed: {response_data.get('message')}")
        return  # Stop if authentication fails
 
    # After successful authentication, start reading logs
    try:
        while True:
            log_message = await websocket.recv()
            print(log_message)
    except websockets.exceptions.ConnectionClosed:
        print("Connection to Memgraph closed.")
 
 
async def connect_and_authenticate():
    async with websockets.connect(ws_url) as websocket:
        print(f"Connected to Memgraph at {ws_url}")
        await authenticate_and_read_logs(websocket)
 
 
asyncio.run(connect_and_authenticate())

Authorization (Enterprise)

Permission for connecting through WebSocket is controlled by the WEBSOCKET privilege.

Metrics tracking via HTTP server (Enterprise Edition)

In the Enterprise Edition, Memgraph allows tracking information about transactions, query latencies, snapshot recovery latencies, triggers, Bolt messages, indexes, streams, memory, operators and sessions, all by using an HTTP server.

To retrieve data from the HTTP server, enter a valid Memgraph Enterprise license key.

The default address and port for the metrics server are 0.0.0.0:9091, and they can be configured using the --metrics-address and --metrics-port configuration flags.

To retrieve the metrics, send a GET request to the following URL:

http://host:port

Here is an example of how to retrieve metrics using Python:

import requests
import json
 
 
def fetch_memgraph_metrics():
    metrics_url = "http://0.0.0.0:9091/metrics"
 
    try:
        response = requests.get(metrics_url, timeout=5)
 
        if response.status_code == 200:
            try:
                metrics_data = json.loads(response.text)
                pretty_metrics = json.dumps(metrics_data, indent=4)
                print("Memgraph Metrics:\n", pretty_metrics)
            except json.JSONDecodeError:
                print("Memgraph Metrics:\n", response.text)
        else:
            print(f"Failed to fetch metrics. Status code: {response.status_code}")
    except requests.exceptions.RequestException as e:
        print(f"An error occurred: {e}")
 
 
fetch_memgraph_metrics()

The example above prints metrics similar to the example response below.

System metrics

All system metrics measuring different parts of the system can be divided into three different types:

  • Gauge - a single value of some variable in the system (e.g. memory usage)
  • Counter (uint64_t) - a value that can be incremented or decremented (e.g. number of active transactions in the system)
  • Histogram (uint64_t) - distribution of measured values (e.g. certain percentile of query latency on N measured queries)
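To make the histogram type more concrete, here is a small sketch of a nearest-rank percentile over a list of latency samples. It only illustrates what a 50th, 90th, or 99th percentile means. It is not Memgraph's internal implementation, and the function name and sample values are assumptions.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical query latencies in microseconds.
latencies_us = [120, 95, 400, 150, 3000]
for p in (50, 90, 99):
    print(f"p{p}: {percentile(latencies_us, p)} us")
```

With this definition, a single sample makes every percentile collapse to that sample, and one slow outlier dominates the high percentiles while leaving the median untouched.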

General metrics

| Name | Type | Description |
| --- | --- | --- |
| vertex_count | Gauge (uint64_t) | Number of nodes stored in the system. |
| edge_count | Gauge (uint64_t) | Number of relationships stored in the system. |
| average_degree | Gauge (double) | Average number of relationships of a single node. |
| memory_usage | Gauge (uint64_t) | Amount of RAM used reported by the OS (in bytes). |
| disk_usage | Gauge (uint64_t) | Amount of disk space used by the data directory (in bytes). |
| peak_memory_usage | Gauge (uint64_t) | Peak RAM memory usage in the system during the whole run (in bytes). |

Index metrics

| Name | Type | Description |
| --- | --- | --- |
| ActiveLabelIndices | Counter | Number of active label indexes in the system. |
| ActiveLabelPropertyIndices | Counter | Number of active label property indexes in the system. |
| ActiveTextIndices | Counter | Number of active text indexes in the system. |

Memory metrics

| Name | Type | Description |
| --- | --- | --- |
| PeakMemoryRes | Gauge (uint64_t) | Peak RAM memory usage in the system during the whole run (in bytes). |

Operator metrics

Before a Cypher query is executed, it is converted into an internal form suitable for execution, known as a query plan. A query plan is a tree-like data structure describing a pipeline of operations that will be performed on the database in order to yield the results for a given query. Every node within a plan is known as a logical operator and describes a particular operation.

| Name | Type | Description |
| --- | --- | --- |
| OnceOperator | Counter | Number of times Once operator was used. |
| CreateNodeOperator | Counter | Number of times CreateNode operator was used. |
| CreateExpandOperator | Counter | Number of times CreateExpand operator was used. |
| ScanAllOperator | Counter | Number of times ScanAll operator was used. |
| ScanAllByLabelOperator | Counter | Number of times ScanAllByLabel operator was used. |
| ScanAllByLabelPropertyRangeOperator | Counter | Number of times ScanAllByLabelPropertyRange operator was used. |
| ScanAllByLabelPropertyValueOperator | Counter | Number of times ScanAllByLabelPropertyValue operator was used. |
| ScanAllByLabelPropertyOperator | Counter | Number of times ScanAllByLabelProperty operator was used. |
| ScanAllByLabelIdOperator | Counter | Number of times ScanAllByLabelId operator was used. |
| ExpandOperator | Counter | Number of times Expand operator was used. |
| ExpandVariableOperator | Counter | Number of times ExpandVariable operator was used. |
| ConstructNamedPathOperator | Counter | Number of times ConstructNamedPath operator was used. |
| FilterOperator | Counter | Number of times Filter operator was used. |
| ProduceOperator | Counter | Number of times Produce operator was used. |
| DeleteOperator | Counter | Number of times Delete operator was used. |
| SetPropertyOperator | Counter | Number of times SetProperty operator was used. |
| SetPropertiesOperator | Counter | Number of times SetProperties operator was used. |
| SetLabelsOperator | Counter | Number of times SetLabels operator was used. |
| RemovePropertyOperator | Counter | Number of times RemoveProperty operator was used. |
| RemoveLabelsOperator | Counter | Number of times RemoveLabels operator was used. |
| EdgeUniquenessFilterOperator | Counter | Number of times EdgeUniquenessFilter operator was used. |
| EmptyResultOperator | Counter | Number of times EmptyResult operator was used. |
| AccumulateOperator | Counter | Number of times Accumulate operator was used. |
| AggregateOperator | Counter | Number of times Aggregate operator was used. |
| SkipOperator | Counter | Number of times Skip operator was used. |
| LimitOperator | Counter | Number of times Limit operator was used. |
| OrderByOperator | Counter | Number of times OrderBy operator was used. |
| MergeOperator | Counter | Number of times Merge operator was used. |
| OptionalOperator | Counter | Number of times Optional operator was used. |
| UnwindOperator | Counter | Number of times Unwind operator was used. |
| DistinctOperator | Counter | Number of times Distinct operator was used. |
| UnionOperator | Counter | Number of times Union operator was used. |
| CartesianOperator | Counter | Number of times Cartesian operator was used. |
| CallProcedureOperator | Counter | Number of times CallProcedure operator was used. |
| ForeachOperator | Counter | Number of times Foreach operator was used. |
| EvaluatePatternFilterOperator | Counter | Number of times EvaluatePatternFilter operator was used. |
| ApplyOperator | Counter | Number of times Apply operator was used. |
| HashJoin | Counter | Number of times HashJoin operator was used. |
| IndexedJoin | Counter | Number of times IndexedJoin operator was used. |
| PeriodicCommit | Counter | Number of times PeriodicCommit operator was used. |
| PeriodicSubquery | Counter | Number of times PeriodicSubquery operator was used. |
| RollUpApplyOperator | Counter | Number of times RollUpApply operator was used. |
| ScanAllByEdgeIdOperator | Counter | Number of times ScanAllByEdgeId operator was used. |
| ScanAllByEdgeTypeOperator | Counter | Number of times ScanAllByEdgeType operator was used. |
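Operator counters are easiest to read when ranked. The sketch below picks the most frequently used operators out of a dictionary shaped like the "Operator" section of the metrics response. The function name and the counter values are made up for illustration.

```python
def top_operators(operator_metrics, limit=3):
    """Return the `limit` most used operators as (name, count) pairs, skipping zeros."""
    used = [(name, count) for name, count in operator_metrics.items() if count > 0]
    # Highest count first; ties are broken alphabetically by name.
    used.sort(key=lambda pair: (-pair[1], pair[0]))
    return used[:limit]


# Hypothetical counter values; real ones come from the "Operator" section.
sample = {
    "ScanAllOperator": 12,
    "FilterOperator": 9,
    "ProduceOperator": 21,
    "ExpandOperator": 0,
}
print(top_operators(sample, limit=2))
```

A high ScanAllOperator count relative to the indexed scan operators can be a hint that queries are running without a suitable index, although that interpretation depends on the workload.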

Query metrics

| Name | Type | Description |
| --- | --- | --- |
| QueryExecutionLatency_us_50p | Histogram | Query execution latency in microseconds (50th percentile). |
| QueryExecutionLatency_us_90p | Histogram | Query execution latency in microseconds (90th percentile). |
| QueryExecutionLatency_us_99p | Histogram | Query execution latency in microseconds (99th percentile). |

Query type metrics

| Name | Type | Description |
| --- | --- | --- |
| ReadQuery | Counter | Number of read-only queries executed. |
| WriteQuery | Counter | Number of write-only queries executed. |
| ReadWriteQuery | Counter | Number of read-write queries executed. |

Session metrics

| Name | Type | Description |
| --- | --- | --- |
| ActiveSessions | Counter | Number of active connections. |
| ActiveBoltSessions | Counter | Number of active Bolt connections. |
| ActiveTCPSessions | Counter | Number of active TCP connections. |
| ActiveSSLSessions | Counter | Number of active SSL connections. |
| ActiveWebSocketSessions | Counter | Number of active WebSocket connections. |
| BoltMessages | Counter | Number of Bolt messages sent. |

Snapshot metrics

| Name | Type | Description |
| --- | --- | --- |
| SnapshotCreationLatency_us_50p | Histogram | Snapshot creation latency in microseconds (50th percentile). |
| SnapshotCreationLatency_us_90p | Histogram | Snapshot creation latency in microseconds (90th percentile). |
| SnapshotCreationLatency_us_99p | Histogram | Snapshot creation latency in microseconds (99th percentile). |
| SnapshotRecoveryLatency_us_50p | Histogram | Snapshot recovery latency in microseconds (50th percentile). |
| SnapshotRecoveryLatency_us_90p | Histogram | Snapshot recovery latency in microseconds (90th percentile). |
| SnapshotRecoveryLatency_us_99p | Histogram | Snapshot recovery latency in microseconds (99th percentile). |

Stream metrics

| Name | Type | Description |
| --- | --- | --- |
| StreamsCreated | Counter | Number of streams created. |
| MessagesConsumed | Counter | Number of consumed streamed messages. |

Transaction metrics

| Name | Type | Description |
| --- | --- | --- |
| ActiveTransactions | Counter | Number of active transactions. |
| CommitedTransactions | Counter | Number of committed transactions. |
| RollbackedTransactions | Counter | Number of rolled-back transactions. |
| FailedQuery | Counter | Number of times executing a query failed (either during parse time or runtime). |
| FailedPrepare | Counter | Number of times preparing a query failed. |
| FailedPull | Counter | Number of times pulling a query failed. |
| SuccessfulQuery | Counter | Number of successful queries. |
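Counters like these are often more useful combined than read one by one. As a rough sketch, the function below derives a failed-query ratio from the FailedQuery and SuccessfulQuery counters. It assumes the two counters cover disjoint sets of queries, which the descriptions above suggest but do not state explicitly. The function name and sample numbers are illustrative.

```python
def failed_query_ratio(transaction_metrics):
    """Share of failed queries among failed plus successful ones, or None if none ran."""
    failed = transaction_metrics.get("FailedQuery", 0)
    successful = transaction_metrics.get("SuccessfulQuery", 0)
    total = failed + successful
    return failed / total if total else None


# Hypothetical values shaped like the "Transaction" section of the response.
sample = {"FailedQuery": 3, "SuccessfulQuery": 97}
print(failed_query_ratio(sample))
```

Returning None for an idle server avoids a division by zero and lets the caller distinguish "no queries yet" from "no failures".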

Trigger metrics

| Name | Type | Description |
| --- | --- | --- |
| TriggersCreated | Counter | Number of triggers created. |
| TriggersExecuted | Counter | Number of triggers executed. |

Example response

With the default configuration, sending a GET request to localhost:9091 on a local Memgraph build returns a response similar to the one below.

{
    "General": {
        "average_degree": 0.0,
        "disk_usage": 1417846,
        "edge_count": 0,
        "memory_usage": 36937728,
        "peak_memory_usage": 100265984,
        "vertex_count": 0
    },
    "Index": {
        "ActiveLabelIndices": 0,
        "ActiveLabelPropertyIndices": 0,
        "ActiveTextIndices": 0
    },
    "Memory": {
        "PeakMemoryRes": 100265984
    },
    "Operator": {
        "AccumulateOperator": 0,
        "AggregateOperator": 0,
        "ApplyOperator": 0,
        "CallProcedureOperator": 0,
        "CartesianOperator": 0,
        "ConstructNamedPathOperator": 0,
        "CreateExpandOperator": 0,
        "CreateNodeOperator": 0,
        "DeleteOperator": 0,
        "DistinctOperator": 0,
        "EdgeUniquenessFilterOperator": 0,
        "EmptyResultOperator": 0,
        "EvaluatePatternFilterOperator": 0,
        "ExpandOperator": 0,
        "ExpandVariableOperator": 0,
        "FilterOperator": 0,
        "ForeachOperator": 0,
        "HashJoinOperator": 0,
        "IndexedJoinOperator": 0,
        "LimitOperator": 0,
        "MergeOperator": 0,
        "OnceOperator": 0,
        "OptionalOperator": 0,
        "OrderByOperator": 0,
        "ProduceOperator": 0,
        "PeriodicCommit": 0,
        "PeriodicSubquery": 0,
        "RemoveLabelsOperator": 0,
        "RemovePropertyOperator": 0,
        "RollUpApplyOperator": 0,
        "ScanAllByEdgeIdOperator": 0,
        "ScanAllByEdgeTypeOperator": 0,
        "ScanAllByIdOperator": 0,
        "ScanAllByLabelOperator": 0,
        "ScanAllByLabelPropertyOperator": 0,
        "ScanAllByLabelPropertyRangeOperator": 0,
        "ScanAllByLabelPropertyValueOperator": 0,
        "ScanAllOperator": 0,
        "SetLabelsOperator": 0,
        "SetPropertiesOperator": 0,
        "SetPropertyOperator": 0,
        "SkipOperator": 0,
        "UnionOperator": 0,
        "UnwindOperator": 0
    },
    "Query": {
        "QueryExecutionLatency_us_50p": 0,
        "QueryExecutionLatency_us_90p": 0,
        "QueryExecutionLatency_us_99p": 0
    },
    "QueryType": {
        "ReadQuery": 0,
        "ReadWriteQuery": 0,
        "WriteQuery": 0
    },
    "Session": {
        "ActiveBoltSessions": 0,
        "ActiveSSLSessions": 0,
        "ActiveSessions": 0,
        "ActiveTCPSessions": 0,
        "ActiveWebSocketSessions": 0,
        "BoltMessages": 0
    },
    "Snapshot": {
        "SnapshotCreationLatency_us_50p": 4860,
        "SnapshotCreationLatency_us_90p": 4860,
        "SnapshotCreationLatency_us_99p": 4860,
        "SnapshotRecoveryLatency_us_50p": 628,
        "SnapshotRecoveryLatency_us_90p": 628,
        "SnapshotRecoveryLatency_us_99p": 628
    },
    "Stream": {
        "MessagesConsumed": 0,
        "StreamsCreated": 0
    },
    "Transaction": {
        "ActiveTransactions": 0,
        "CommitedTransactions": 0,
        "FailedPrepare": 0,
        "FailedPull": 0,
        "FailedQuery": 0,
        "RollbackedTransactions": 0,
        "SuccessfulQuery": 0
    },
    "Trigger": {
        "TriggersCreated": 0,
        "TriggersExecuted": 0
    }
}
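Because the response is grouped by section, it can be convenient to flatten it before forwarding values to a dashboard or comparing two polls. The sketch below flattens a response shaped like the one above into "Section.Name" keys and keeps only the non-zero values. The function name is an assumption, and the sample is a hand-copied excerpt of the response.

```python
def flatten_metrics(metrics):
    """Turn {"Section": {"Name": value}} into {"Section.Name": value}."""
    return {
        f"{section}.{name}": value
        for section, values in metrics.items()
        for name, value in values.items()
    }


# A small excerpt of the response above.
sample = {
    "General": {"edge_count": 0, "vertex_count": 0},
    "Snapshot": {"SnapshotCreationLatency_us_50p": 4860},
}
flat = flatten_metrics(sample)
nonzero = {key: value for key, value in flat.items() if value}
print(nonzero)
```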

Prometheus exporter

Memgraph also supports exporting metrics in the Prometheus format. For more information on how to set up the Prometheus exporter, refer to the Prometheus exporter repository.

Session trace

A session refers to a temporary, interactive connection from a client (either a user or an application) to the database server. It is used to execute a series of transactions and queries. Session trace is a feature in Memgraph that allows you to profile the execution of all queries within a session, providing detailed information on query planning and execution. This can be invaluable for understanding performance bottlenecks and optimizing database interactions.

Before enabling session trace, you need to define a directory where the session logs will be stored. This is done using the --query-log-directory configuration flag, which can be set either at Memgraph startup via the command line, or during an active session. By default, --query-log-directory is set to /var/log/memgraph/session_trace.

To set the query log directory while launching from the command line using Docker, do the following:

docker run -p 7687:7687 -p 7444:7444 --name memgraph memgraph/memgraph --query-log-directory {directory}

To set the query log directory interactively, use the following command:

SET DATABASE SETTING "query-log-directory" TO "{directory}";

Replace {directory} with the desired path inside the Memgraph environment, either native or within a container.

If this directory is not set, you will receive the following error when attempting to enable session trace:

The flag --query-log-directory has to be present in order to enable session trace.

Once the query log directory is configured, you can enable session trace for your current session using the following command:

SET SESSION TRACE ON;

This command initiates logging for the session, and the result will include the unique session UUID being tracked:

session uuid
"de9c907b-6675-40bd-bf09-d4ce7b24f22d"

This UUID is used to identify the session log file.

Understanding the Session Trace Log

The session trace log captures detailed information for every query executed during the session, including:

  • Session UUID: Unique identifier for the session.
  • Transaction ID: ID of the current query transaction.
  • User Name: The user executing the query.
  • Query Name and Explain Plan: The name and execution plan of the query.
  • Query Planning and Execution Timings: Start and end timestamps, as well as execution times for query planning and execution.
  • Commit Timings: Start and end timestamps, and execution time for query commit.
  • Profile Plan and Metadata: Detailed profiling information and metadata for the query.

This detailed logging helps in profiling the entire session workload, increasing visibility into potential performance bottlenecks without congesting the system log.

Accessing the Log File

The session trace log is stored at {--query-log-directory}/{session uuid}.log, where --query-log-directory is the directory you defined earlier, and session uuid is the UUID obtained when session trace was enabled.
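If you collect session traces from a script, the path can be assembled from the two parts described above. This is a minimal sketch: the function name is hypothetical, the directory is the default mentioned earlier, and the UUID is the one from the example output.

```python
from pathlib import PurePosixPath


def session_trace_log_path(query_log_directory, session_uuid):
    """Join the log directory and session UUID into the log file path."""
    return PurePosixPath(query_log_directory) / f"{session_uuid}.log"


path = session_trace_log_path(
    "/var/log/memgraph/session_trace",  # default --query-log-directory
    "de9c907b-6675-40bd-bf09-d4ce7b24f22d",  # UUID from the example above
)
print(path)
```

PurePosixPath only builds the path and does not touch the file system, so the same code works when the log lives inside a container and the script runs outside it.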

To stop logging additional information for the current session, use the following command:

SET SESSION TRACE OFF;

After executing this command, no further data will be written to the session trace log for the active session.