Monitoring

Monitoring applications and databases is essential to track the load and resources used by the system.

Memgraph currently supports:

Real-time logs tracking via WebSocket server: Log tracking is helpful for debugging purposes and monitoring the database operations.
Metrics tracking via HTTP server (Enterprise Edition): In the Enterprise edition, besides log tracking, Memgraph allows tracking information about transactions, query latencies, types of queries executed, snapshot recovery latencies, triggers, TTL data, Bolt messages, indexes, streams, memory, operators and sessions. There are also many metrics which are trying to describe the state of high availability. These include metrics which describe latencies of RPC messages, time needed to perform a failover and counters for events occurring throughout the lifetime of HA Memgraph.

Real-time logs tracking via WebSocket server

Connect to Memgraph’s logging server via WebSocket to forward logs to all the connected clients.

Log messages

Each log that is written to the log file is forwarded to the connected clients in the following format:

{
  event: "log",
  level: "trace"|"debug"|"info"|"warning"|"error"|"critical",
  message: "<log-message>"
}

Connection

To connect to Memgraph’s WebSocket server, use the following URL:

ws://host:port

The default host is 0.0.0.0, but it can be changed using the --monitoring-address= configuration flag.

The default port is 7444, but it can be changed using the --monitoring-port configuration flag.

To connect to Memgraph’s WebSocket server using the default configuration and Python client, you can use the following code snippet:

import asyncio
import websockets
 
HOST = 'localhost'
PORT = '7444'
 
# The WebSocket URL
ws_url = f'ws://{HOST}:{PORT}'
 
async def read_logs():
    async with websockets.connect(ws_url) as websocket:
        print(f"Connected to Memgraph at {ws_url}")
        try:
            # Keep reading messages (logs) from the server
            while True:
 log_message = await websocket.recv()
                print(log_message)
        except websockets.exceptions.ConnectionClosed:
            print("Connection to Memgraph closed.")
 
asyncio.run(read_logs())

The code structure is similar in other languages, depending on the WebSocket library used.

If you want to connect with a WebSocket Secure (WSS) connection, set the --bolt-cert-file and --bolt-key-file configuration flags to enable an SSL connection. Refer to the configuration page to learn how to update the configuration and to see the list of all available configuration flags.

Authentication

When authentication is not used due to no users present in Memgraph, no authentication message is expected, and no response will be returned.

When the authentication is used, Memgraph won’t send a message to a certain connection until it’s authenticated.

To authenticate, create a JSON with the credentials in the following format:

{
  "username": "<username>",
  "password": "<password>"
}

If the credentials are valid, the connection will be made, and the client will receive the messages. As a response, the client should receive the following message:

{
  "success": true,
  "message": "User has been successfully authenticated!"
}

If they are invalid or the first message is in an invalid format, the connection is dropped. As a response, the following message is sent:

{
  "success": false,
  "message": "<error-message>"
}

To use WebSocket to authenticate, send the JSON message to the WebSocket server:

import asyncio
import json
import websockets
 
HOST = "localhost"
PORT = "7444"
USERNAME = "memgraph"
PASSWORD = "memgraph"
 
ws_url = f"ws://{HOST}:{PORT}"
 
 
async def authenticate_and_read_logs(websocket):
    # Send authentication credentials
    credentials = json.dumps({"username": USERNAME, "password": PASSWORD})
    await websocket.send(credentials)
 
    # Wait for authentication response
    response = await websocket.recv()
    response_data = json.loads(response)
    if response_data.get("success"):
        print("Authentication successful!")
    else:
        print(f"Authentication failed: {response_data.get('message')}")
        return  # Stop if authentication fails
 
    # After successful authentication, start reading logs
    try:
        while True:
            log_message = await websocket.recv()
            print(log_message)
    except websockets.exceptions.ConnectionClosed:
        print("Connection to Memgraph closed.")
 
 
async def connect_and_authenticate():
    async with websockets.connect(ws_url) as websocket:
        print(f"Connected to Memgraph at {ws_url}")
        await authenticate_and_read_logs(websocket)
 
 
asyncio.run(connect_and_authenticate())

Authorization (Enterprise)

Permission for connecting through WebSocket is controlled by the WEBSOCKET privilege.

Metrics tracking via HTTP server (Enterprise Edition)

In the Enterprise Edition, Memgraph allows tracking information about high availability, transactions, query latencies, snapshot recovery latencies, triggers, Bolt messages, indexes, streams, memory, operators and sessions, all by using using an HTTP server.

To retrieve data from the HTTP server, enter a valid Memgraph Enterprise license key.

The default address and port for the metrics server is 0.0.0.0:9091, and can be configured using the --metrics-address and --metrics-port configuration flags.

To retrieve the metrics, send a GET request to the following URL:

http://host:port

Here is an example of how to retrieve metrics using Python:

import requests
import json
 
 
def fetch_memgraph_metrics():
    metrics_url = "http://0.0.0.0:9091/metrics"
 
    try:
    response = requests.get(metrics_url, timeout=5) 
 
        if response.status_code == 200:
            try:
                metrics_data = json.loads(response.text)
                pretty_metrics = json.dumps(metrics_data, indent=4)
                print("Memgraph Metrics:\n", pretty_metrics)
            except json.JSONDecodeError:
                print("Memgraph Metrics:\n", response.text)
        else:
            print(f"Failed to fetch metrics. Status code: {response.status_code}")
    except requests.exceptions.RequestException as e:
        print(f"An error occurred: {e}")
 
 
fetch_memgraph_metrics()

The example above will print the metric similar to the example response below.

System metrics

All system metrics measuring different parts of the system can be divided into three different types:

Gauge - a single value of some variable in the system (e.g. memory usage)
Counter (uint64_t) - a value that can be incremented or decremented (e.g. number of active transactions in the system)
Histogram (uint64_t) - distribution of measured values (e.g. certain percentile of query latency on N measured queries)

General metrics

Name	Type	Description
average_degree	Counter	Average number of relationships of a single node.
disk_usage	Gauge	Amount of disk space used by the data directory (in bytes).
edge_count	Counter	Number of relationships stored in the system.
memory_usage	Gauge	Amount of RAM used reported by the OS (in bytes).
peak_memory_usage	Gauge	Peak amount of RAM used reported by the OS (in bytes).
unreleased_delta_objects	Counter	Number of unreleased delta objects.
vertex_count	Counter	Number of nodes stored in the system.
SocketConnect_us_50p	Histogram	Latency of connecting to the socket, 50th percentile.
SocketConnect_us_90p	Histogram	Latency of connecting to the socket, 90th percentile.
SocketConnect_us_99p	Histogram	Latency of connecting to the socket, 99th percentile.

Index metrics

Name	Type	Description
ActiveLabelIndices	Counter	Number of active label indexes in the system.
ActiveLabelPropertyIndices	Counter	Number of active label property indexes in the system.
ActivePointIndices	Counter	Number of active point indices in the system.
ActiveTextIndices	Counter	Number of active text indexes in the system.

Memory metrics

Name	Type	Description
GCLatency_us_50p	Histogram	GC total cleanup time in microseconds (50th percentile).
GCLatency_us_90p	Histogram	GC total cleanup time in microseconds (90th percentile).
GCLatency_us_99p	Histogram	GC total cleanup time in microseconds (99th percentile).
GCSkiplistCleanupLatency_us_50p	Histogram	GC time spent for cleaning skiplists in indexes (50th percentile).
GCSkiplistCleanupLatency_us_90p	Histogram	GC time spent for cleaning skiplists in indexes (90th percentile).
GCSkiplistCleanupLatency_us_99p	Histogram	GC time spent for cleaning skiplists in indexes (99th percentile).

Operator metrics

Before a Cypher query is executed, it is converted into an internal form suitable for execution, known as a query plan. A query plan is a tree-like data structure describing a pipeline of operations that will be performed on the database in order to yield the results for a given query. Every node within a plan is known as a logical operator and describes a particular operation.

Name	Type	Description
OnceOperator	Counter	Number of times Once operator was used.
CreateNodeOperator	Counter	Number of times CreateNode operator was used.
CreateExpandOperator	Counter	Number of times CreateExpand operator was used.
ScanAllOperator	Counter	Number of times ScanAll operator was used.
ScanAllByLabelOperator	Counter	Number of times ScanAllByLabel operator was used.
ScanAllByLabelPropertiesOperator	Counter	Number of times ScanAllByLabelProperties operator was used.
ExpandOperator	Counter	Number of times Expand operator was used.
ExpandVariableOperator	Counter	Number of times ExpandVariable operator was used.
ConstructNamedPathOperator	Counter	Number of times ConstructNamedPath operator was used.
FilterOperator	Counter	Number of times Filter operator was used.
ProduceOperator	Counter	Number of times Produce operator was used.
DeleteOperator	Counter	Number of times Delete operator was used.
SetPropertyOperator	Counter	Number of times SetProperty operator was used.
SetPropertiesOperator	Counter	Number of times SetProperties operator was used.
SetLabelsOperator	Counter	Number of times SetLabels operator was used.
RemovePropertyOperator	Counter	Number of times RemoveProperty operator was used.
RemoveLabelsOperator	Counter	Number of times RemoveLabels operator was used.
EdgeUniquenessFilterOperator	Counter	Number of times EdgeUniquenessFilter operator was used.
EmptyResultOperator	Counter	Number of times EmptyResult operator was used.
AccumulateOperator	Counter	Number of times Accumulate operator was used.
AggregateOperator	Counter	Number of times Aggregate operator was used.
SkipOperator	Counter	Number of times Skip operator was used.
LimitOperator	Counter	Number of times Limit operator was used.
OrderByOperator	Counter	Number of times OrderBy operator was used.
MergeOperator	Counter	Number of times Merge operator was used.
OptionalOperator	Counter	Number of times Optional operator was used.
UnwindOperator	Counter	Number of times Unwind operator was used.
DistinctOperator	Counter	Number of times Distinct operator was used.
UnionOperator	Counter	Number of times Union operator was used.
CartesianOperator	Counter	Number of times Cartesian operator was used.
CallProcedureOperator	Counter	Number of times CallProcedureOperator operator was used.
ForeachOperator	Counter	Number of times Foreach operator was used.
EvaluatePatternFilterOperator	Counter	Number of times EvaluatePatternFilter operator was used.
ApplyOperator	Counter	Number of times Apply operator was used.
HashJoin	Counter	Number of times HashJoin operator was used.
IndexedJoin	Counter	Number of times IndexedJoin operator was used.
PeriodicCommit	Counter	Number of times PeriodicCommit operator was used.
PeriodicSubquery	Counter	Number of times PeriodicSubquery operator was used.
RollUpApplyOperator	Counter	Number of times RollUpApply operator was used.
ScanAllByEdgeIdOperator	Counter	Number of times ScanAllByEdgeId operator was used.
ScanAllByEdgeOperator	Counter	Number of times ScanAllByEdge operator was used.
ScanAllByEdgeTypeOperator	Counter	Number of times ScanAllByEdgeType operator was used.
ScanAllByEdgeTypePropertyOperator	Counter	Number of times ScanAllByEdgeTypeProperty operator was used.
ScanAllByEdgeTypePropertyRangeOperator	Counter	Number of times ScanAllByEdgeTypePropertyRange operator was used.
ScanAllByEdgeTypePropertyValueOperator	Counter	Number of times ScanAllByEdgeTypePropertyValue operator was used.
ScanAllByPointDistanceOperator	Counter	Number of times ScanAllByPointDistance operator was used.
ScanAllByPointWithinbboxOperator	Counter	Number of times ScanAllByPointWithinbbox operator was used.

Query metrics

Name	Type	Description
QueryExecutionLatency_us_50p	Histogram	Query execution latency in microseconds (50th percentile).
QueryExecutionLatency_us_90p	Histogram	Query execution latency in microseconds (90th percentile).
QueryExecutionLatency_us_99p	Histogram	Query execution latency in microseconds (99th percentile).

Query type metrics

Name	Type	Description
ReadQuery	Counter	Number of read-only queries executed.
WriteQuery	Counter	Number of write-only queries executed.
ReadWriteQuery	Counter	Number of read-write queries executed.

Session metrics

Name	Type	Description
ActiveSessions	Counter	Number of active connections.
ActiveBoltSessions	Counter	Number of active Bolt connections.
ActiveTCPSessions	Counter	Number of active TCP connections.
ActiveSSLSessions	Counter	Number of active SSL connections.
ActiveWebSocketSessions	Counter	Number of active websocket connections.
BoltMessages	Counter	Number of Bolt messages sent.

Snapshot metrics

Name	Type	Description
SnapshotCreationLatency_us_50p	Histogram	Snapshot creation latency in microseconds (50th percentile).
SnapshotCreationLatency_us_90p	Histogram	Snapshot creation latency in microseconds (90th percentile).
SnapshotCreationLatency_us_99p	Histogram	Snapshot creation latency in microseconds (99th percentile).
SnapshotRecoveryLatency_us_50p	Histogram	Snapshot recovery latency in microseconds (50th percentile).
SnapshotRecoveryLatency_us_90p	Histogram	Snapshot recovery latency in microseconds (90th percentile).
SnapshotRecoveryLatency_us_99p	Histogram	Snapshot recovery latency in microseconds (99th percentile).

Stream metrics

Name	Type	Description
StreamsCreated	Counter	Number of streams created.
MessagesConsumed	Counter	Number of consumed streamed messages.

Transaction metrics

Name	Type	Description
ActiveTransactions	Counter	Number of active transactions.
CommittedTransactions	Counter	Number of committed transactions.
RollbackedTransactions	Counter	Number of rollbacked transactions.
FailedQuery	Counter	Number of times executing a query failed (either during parse time or runtime).
FailedPrepare	Counter	Number of times preparing a query failed.
FailedPull	Counter	Number of times pulling a query failed.
SuccessfulQuery	Counter	Number of successful queries.
TransientErrors	Counter	Number of times a transient error happened. Transient errors are those errors which can be retried.
WriteWriteConflicts	Counter	Number of times a write-write conflict happened (2 transactions performing modification to the same node simultaneously).

Trigger metrics

Name	Type	Description
TriggersCreated	Counter	Number of Triggers created.
TriggersExecuted	Counter	Number of Triggers executed.

TTL metrics

Name	Type	Description
DeletedNodes	Counter	Number of deleted TTL edges.
DeletedEdges	Counter	Number of deleted TTL nodes.

HA metrics

Name	Type	Description
PrepareCommitRpc_us_50p	Histogram	PrepareCommitRpc latency in microseconds (50th percentile).
PrepareCommitRpc_us_90p	Histogram	PrepareCommitRpc latency in microseconds (90th percentile).
PrepareCommitRpc_us_99p	Histogram	PrepareCommitRpc latency in microseconds (99th percentile).
CurrentWalRpc_us_50p	Histogram	CurrentWalRpc latency in microseconds (50th percentile).
CurrentWalRpc_us_90p	Histogram	CurrentWalRpc latency in microseconds (90th percentile).
CurrentWalRpc_us_99p	Histogram	CurrentWalRpc latency in microseconds (99th percentile).
WalFilesRpc_us_50p	Histogram	WalFilesRpc latency in microseconds (50th percentile).
WalFilesRpc_us_90p	Histogram	WalFilesRpc latency in microseconds (90th percentile).
WalFilesRpc_us_99p	Histogram	WalFilesRpc latency in microseconds (99th percentile).
ReplicaStream_us_50p	Histogram	Time needed to construct PrepareCommitRpc stream (50th percentile).
ReplicaStream_us_90p	Histogram	Time needed to construct PrepareCommitRpc stream (90th percentile).
ReplicaStream_us_99p	Histogram	Time needed to construct PrepareCommitRpc stream (99th percentile).
SnapshotRpc_us_50p	Histogram	SnapshotRpc latency in microseconds (50th percentile).
SnapshotRpc_us_90p	Histogram	SnapshotRpc latency in microseconds (90th percentile).
SnapshotRpc_us_99p	Histogram	SnapshotRpc latency in microseconds (99th percentile).
FrequentHeartbeatRpc_us_50p	Histogram	FrequentHeartbeatRpc latency in microseconds (50th percentile).
FrequentHeartbeatRpc_us_90p	Histogram	FrequentHeartbeatRpc latency in microseconds (90th percentile).
FrequentHeartbeatRpc_us_99p	Histogram	FrequentHeartbeatRpc latency in microseconds (99th percentile).
HeartbeatRpc_us_50p	Histogram	HeartbeatRpc latency in microseconds (50th percentile).
HeartbeatRpc_us_90p	Histogram	HeartbeatRpc latency in microseconds (90th percentile).
HeartbeatRpc_us_99p	Histogram	HeartbeatRpc latency in microseconds (99th percentile).
SystemRecoveryRpc_us_50p	Histogram	SystemRecoveryRpc latency in microseconds (50th percentile).
SystemRecoveryRpc_us_90p	Histogram	SystemRecoveryRpc latency in microseconds (90th percentile).
SystemRecoveryRpc_us_99p	Histogram	SystemRecoveryRpc latency in microseconds (99th percentile).
ChooseMostUpToDateInstance_us_50p	Histogram	ChooseMostUpToDateInstance latency in microseconds (50th percentile).
ChooseMostUpToDateInstance_us_90p	Histogram	ChooseMostUpToDateInstance latency in microseconds (90th percentile).
ChooseMostUpToDateInstance_us_99p	Histogram	ChooseMostUpToDateInstance latency in microseconds (99th percentile).
GetHistories_us_50p	Histogram	GetHistories latency in microseconds (50th percentile).
GetHistories_us_90p	Histogram	GetHistories latency in microseconds (90th percentile).
GetHistories_us_99p	Histogram	GetHistories latency in microseconds (99th percentile).
InstanceFailCallback_us_50p	Histogram	InstanceFailCallback latency in microseconds (50th percentile).
InstanceFailCallback_us_90p	Histogram	InstanceFailCallback latency in microseconds (90th percentile).
InstanceFailCallback_us_99p	Histogram	InstanceFailCallback latency in microseconds (99th percentile).
InstanceSuccCallback_us_50p	Histogram	InstanceSuccCallback latency in microseconds (50th percentile).
InstanceSuccCallback_us_90p	Histogram	InstanceSuccCallback latency in microseconds (90th percentile).
InstanceSuccCallback_us_99p	Histogram	InstanceSuccCallback latency in microseconds (99th percentile).
DataFailover_us_50p	Histogram	DataFailover latency in microseconds (50th percentile).
DataFailover_us_90p	Histogram	DataFailover latency in microseconds (90th percentile).
DataFailover_us_99p	Histogram	DataFailover latency in microseconds (99th percentile).
StartTxnReplication_us_50p	Histogram	StartTxnReplication latency in microseconds (50th percentile).
StartTxnReplication_us_90p	Histogram	StartTxnReplication latency in microseconds (90th percentile).
StartTxnReplication_us_99p	Histogram	StartTxnReplication latency in microseconds (99th percentile).
FinalizeTxnReplication_us_50p	Histogram	FinalizeTxnReplication latency in microseconds (50th percentile).
FinalizeTxnReplication_us_90p	Histogram	FinalizeTxnReplication latency in microseconds (90th percentile).
FinalizeTxnReplication_us_99p	Histogram	FinalizeTxnReplication latency in microseconds (99th percentile).
DemoteMainToReplicaRpc_us_50p	Histogram	DemoteMainToReplicaRpc latency in microseconds (50th percentile).
DemoteMainToReplicaRpc_us_90p	Histogram	DemoteMainToReplicaRpc latency in microseconds (90th percentile).
DemoteMainToReplicaRpc_us_99p	Histogram	DemoteMainToReplicaRpc latency in microseconds (99th percentile).
EnableWritingOnMainRpc_us_50p	Histogram	EnableWritingOnMainRpc latency in microseconds (50th percentile).
EnableWritingOnMainRpc_us_90p	Histogram	EnableWritingOnMainRpc latency in microseconds (90th percentile).
EnableWritingOnMainRpc_us_99p	Histogram	EnableWritingOnMainRpc latency in microseconds (99th percentile).
GetDatabaseHistoriesRpc_us_50p	Histogram	GetDatabaseHistoriesRpc latency in microseconds (50th percentile).
GetDatabaseHistoriesRpc_us_90p	Histogram	GetDatabaseHistoriesRpc latency in microseconds (90th percentile).
GetDatabaseHistoriesRpc_us_99p	Histogram	GetDatabaseHistoriesRpc latency in microseconds (99th percentile).
PromoteToMainRpc_us_50p	Histogram	PromoteToMainRpc latency in microseconds (50th percentile).
PromoteToMainRpc_us_90p	Histogram	PromoteToMainRpc latency in microseconds (90th percentile).
PromoteToMainRpc_us_99p	Histogram	PromoteToMainRpc latency in microseconds (99th percentile).
RegisterReplicaOnMainRpc_us_50p	Histogram	RegisterReplicaOnMainRpc latency in microseconds (50th percentile).
RegisterReplicaOnMainRpc_us_90p	Histogram	RegisterReplicaOnMainRpc latency in microseconds (90th percentile).
RegisterReplicaOnMainRpc_us_99p	Histogram	RegisterReplicaOnMainRpc latency in microseconds (99th percentile).
StateCheckRpc_us_50p	Histogram	StateCheckRpc latency in microseconds (50th percentile).
StateCheckRpc_us_90p	Histogram	StateCheckRpc latency in microseconds (90th percentile).
StateCheckRpc_us_99p	Histogram	StateCheckRpc latency in microseconds (99th percentile).
UnregisterReplicaRpc_us_50p	Histogram	UnregisterReplicaRpc latency in microseconds (50th percentile).
UnregisterReplicaRpc_us_90p	Histogram	UnregisterReplicaRpc latency in microseconds (90th percentile).
UnregisterReplicaRpc_us_99p	Histogram	UnregisterReplicaRpc latency in microseconds (99th percentile).
BecomeLeaderSuccess	Counter	The number of times coordinators successfully became leaders.
FailedToBecomeLeader	Counter	The number of times coordinators failed to become leaders.
SuccessfulFailovers	Counter	The number of times failover was done successfully.
RaftFailedFailovers	Counter	The number of times failover failed because writing to Raft failed.
NoAliveInstanceFailedFailovers	Counter	The number of times failover failed because no instance was alive.
ShowInstance	Counter	The number of times `SHOW INSTANCE` query was called.
ShowInstances	Counter	The number of times `SHOW INSTANCES` query was called.
DemoteInstance	Counter	The number of times the user manually demoted instance.
UnregisterReplInstance	Counter	The number of times the user tried to unregister replication instance.
RemoveCoordInstance	Counter	The number of times the user tried to remove coordinator instance.
StateCheckRpcFail	Counter	The number of times coordinators received unsuccessful or no response to StateCheckRpc.
StateCheckRpcSuccess	Counter	The number of times coordinators received successful response to StateCheckRpc.
UnregisterReplicaRpcFail	Counter	The number of times coordinators received unsuccessful or no response to UnregisterReplicaRpc.
UnregisterReplicaRpcSuccess	Counter	The number of times coordinators received successful response to UnregisterReplicaRpc.
EnableWritingOnMainRpcFail	Counter	The number of times coordinators received unsuccessful or no response to EnableWritingOnMainRpc.
EnableWritingOnMainRpcSuccess	Counter	The number of times coordinators received successful response to EnableWritingOnMainRpc.
PromoteToMainRpcFail	Counter	The number of times coordinators received unsuccessful or no response to PromoteToMainRpc.
PromoteToMainRpcSuccess	Counter	The number of times coordinators received successful response to PromoteToMainRpc.
DemoteMainToReplicaRpcFail	Counter	The number of times coordinators received unsuccessful or no response to DemoteMainToReplicaRpc.
DemoteMainToReplicaRpcSuccess	Counter	The number of times coordinators received successful response to DemoteMainToReplicaRpc.
RegisterReplicaOnMainRpcFail	Counter	The number of times coordinators received unsuccessful or no response to RegisterReplicaOnMainRpc.
RegisterReplicaOnMainRpcSuccess	Counter	The number of times coordinators received successful response to RegisterReplicaOnMainRpc.
SwapMainUUIDRpcFail	Counter	The number of times coordinators received unsuccessful or no response to SwapMainUUIDRpc.
SwapMainUUIDRpcSuccess	Counter	The number of times coordinators received successful response to SwapMainUUIDRpc.
GetDatabaseHistoriesRpcFail	Counter	The number of times coordinators received unsuccessful or no response to GetDatabaseHistoriesRpc.
GetDatabaseHistoriesRpcSuccess	Counter	The number of times coordinators received successful response to GetDatabaseHistoriesRpc.
ReplicaRecoverySuccess	Counter	The number of times the replica recovery process finished successfully.
ReplicaRecoveryFail	Counter	The number of times the replica recovery process finished unsuccessfully.
ReplicaRecoverySkip	Counter	The number of times the replica recovery task was skipped.

All HA metrics with type Counter are aggregated for all coordinators. That makes it easier for users to track what is going on since they don’t need to aggregate by their own metrics specific to each coordinators. Also, after every pull, HA metrics with type Counter are in Memgraph reset to 0. This makes it possible to use Gauge on the client side without worrying that on restart we will lose all data.

Prometheus exporter

Memgraph also supports exporting metrics in the Prometheus format. For more information on how to set up the Prometheus exporter, refer to the Prometheus exporter repository.

Session trace

A session refers to a temporary, interactive connection from a client (either a user or an application) to the database server. It used to execute a series of transactions and queries. Session trace is a feature in Memgraph that allows you to profile the execution of all queries within a session, providing detailed information on query planning and execution. This can be invaluable for understanding performance bottlenecks and optimizing database interactions.

Before enabling session trace, you need to define a directory where the session logs will be stored. This is done using the --query-log-directory configuration flag, which can be set either at Memgraph startup via the command line, or during an active session. By default, --query-log-directory is set to /var/log/memgraph/session_trace.

To set the query log directory while launching from the command line using Docker, do the following:

docker run -p 7687:7687 -p 7444:7444 --name memgraph memgraph/memgraph --query-log-directory {directory}

To set the query log directory interactively, use the following command:

SET DATABASE SETTING "query-log-directory" TO "{directory}";

Replace {directory} with the desired path inside the Memgraph environment, either native or within a container.

If this directory is not set, you will receive the following error when attempting to enable session trace:

The flag --query-log-directory has to be present in order to enable session trace.

Once the query log directory is configured, you can enable session trace for your current session using the following command:

SET SESSION TRACE ON;

This command initiates logging for the session, and the result will include the unique session UUID being tracked:

session uuid
”de9c907b-6675-40bd-bf09-d4ce7b24f22d”

This UUID is used to identify the session log file.

Understanding the Session Trace Log

The session trace log captures detailed information for every query executed during the session, including:

Session UUID: Unique identifier for the session.
Transaction ID: ID of the current query transaction.
User Name: The user executing the query.
Query Name and Explain Plan: The name and execution plan of the query.
Query Planning and Execution Timings: Start and end timestamps, as well as execution times for query planning and execution.
Commit Timings: Start and end timestamps, and execution time for query commit.
Profile Plan and Metadata: Detailed profiling information and metadata for the query.

This detailed logging helps in profiling the entire session workload, increasing visibility into potential performance bottlenecks without congesting the system log.

Accessing the Log File

The session trace log is stored at {--query-log-directory}/{session uuid}.log, where --query-log-directory is the directory you defined earlier, and session uuid is the UUID obtained when session trace was enabled.

To stop logging additional information for the current session, use the following command:

SET SESSION TRACE OFF;

After executing this command, no further data will be written to the session trace log for the active session.

Logs Multi-tenancy