Monitoring
Monitoring applications and databases is essential to track the load and resources used by the system.
Memgraph currently supports:
- Real-time logs tracking via WebSocket server: Log tracking is helpful for debugging purposes and monitoring the database operations.
- Metrics tracking via HTTP server (Enterprise Edition): In the Enterprise Edition, besides log tracking, Memgraph allows tracking information about transactions, query latencies, snapshot recovery latencies, triggers, Bolt messages, indexes, streams, memory, operators and sessions, all by using an HTTP server.
Real-time logs tracking via WebSocket server
Memgraph runs a WebSocket logging server that forwards logs to all connected clients. Connect to it to receive the logs in real time.
Log messages
Each log that is written to the log file is forwarded to the connected clients in the following format:
{
event: "log",
level: "trace"|"debug"|"info"|"warning"|"error"|"critical",
message: "<log-message>"
}
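As a small illustration of consuming this format, the sketch below parses one forwarded message and keeps it only if its level is at or above a chosen threshold. It assumes each message arrives as JSON text with the event, level and message fields shown above. The helper name parse_log and the min_level parameter are hypothetical and not part of Memgraph.

```python
import json

# Severity order taken from the level values listed in the message format.
LEVELS = ["trace", "debug", "info", "warning", "error", "critical"]


def parse_log(raw, min_level="warning"):
    """Return (level, message) for a log event at or above min_level, else None."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None  # Not JSON, so not a log event this sketch understands
    if not isinstance(data, dict) or data.get("event") != "log":
        return None
    level = data.get("level")
    if level not in LEVELS:
        return None
    if LEVELS.index(level) < LEVELS.index(min_level):
        return None
    return level, data.get("message", "")


# Hand-written sample message, not real Memgraph output.
sample = '{"event": "log", "level": "error", "message": "Example message"}'
print(parse_log(sample))
```

A client could call such a helper on every received message to forward only warnings and errors to an alerting system.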
Connection
To connect to Memgraph's WebSocket server, use the following URL:
ws://host:port
The default host is 0.0.0.0, but it can be changed using the --monitoring-address configuration flag.
The default port is 7444, but it can be changed using the --monitoring-port configuration flag.
To connect to Memgraph's WebSocket server using the default configuration and Python client, you can use the following code snippet:
import asyncio
import websockets

HOST = 'localhost'
PORT = '7444'

# The WebSocket URL
ws_url = f'ws://{HOST}:{PORT}'


async def read_logs():
    async with websockets.connect(ws_url) as websocket:
        print(f"Connected to Memgraph at {ws_url}")
        try:
            # Keep reading messages (logs) from the server
            while True:
                log_message = await websocket.recv()
                print(log_message)
        except websockets.exceptions.ConnectionClosed:
            print("Connection to Memgraph closed.")


asyncio.run(read_logs())
The code structure is similar in other languages, depending on the WebSocket library used.
If you want to connect with a WebSocket Secure (WSS) connection, set the --bolt-cert-file and --bolt-key-file configuration flags to enable an SSL connection. Refer to the configuration page to learn how to update the configuration and to see the list of all available configuration flags.
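On the client side, a secure connection could look roughly like the sketch below. It assumes the server was started with the certificate flags above and that the Python websockets package is installed. The helper names and the ca_file parameter (a path to the certificate authority or self-signed certificate the client should trust) are hypothetical.

```python
import asyncio
import ssl


def build_url(host, port, secure=False):
    """Build the monitoring URL; the wss scheme is used for SSL connections."""
    scheme = "wss" if secure else "ws"
    return f"{scheme}://{host}:{port}"


def make_ssl_context(ca_file=None):
    """Create a client-side SSL context.

    ca_file is a hypothetical path to the certificate to trust;
    when omitted, the system trust store is used.
    """
    return ssl.create_default_context(cafile=ca_file)


async def read_logs_secure(host="localhost", port=7444, ca_file=None):
    import websockets  # third-party package: pip install websockets

    url = build_url(host, port, secure=True)
    async with websockets.connect(url, ssl=make_ssl_context(ca_file)) as ws:
        # Iteration ends when the server closes the connection normally.
        async for message in ws:
            print(message)


print(build_url("localhost", 7444, secure=True))
# To actually connect (requires a running server with SSL enabled):
# asyncio.run(read_logs_secure(ca_file="/path/to/ca.pem"))
```

The default context verifies both the certificate chain and the host name, so the certificate has to be valid for the host used in the URL.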
Authentication
When no users exist in Memgraph, authentication is not used: no authentication message is expected, and no response is returned.
When authentication is used, Memgraph won't send any messages to a connection until it is authenticated.
To authenticate, create a JSON with the credentials in the following format:
{
"username": "<username>",
"password": "<password>"
}
If the credentials are valid, the connection will be made, and the client will receive the messages. As a response, the client should receive the following message:
{
"success": true,
"message": "User has been successfully authenticated!"
}
If they are invalid or the first message is in an invalid format, the connection is dropped. As a response, the following message is sent:
{
"success": false,
"message": "<error-message>"
}
To use WebSocket to authenticate, send the JSON message to the WebSocket server:
import asyncio
import json
import websockets

HOST = "localhost"
PORT = "7444"
USERNAME = "memgraph"
PASSWORD = "memgraph"

ws_url = f"ws://{HOST}:{PORT}"


async def authenticate_and_read_logs(websocket):
    # Send authentication credentials
    credentials = json.dumps({"username": USERNAME, "password": PASSWORD})
    await websocket.send(credentials)

    # Wait for authentication response
    response = await websocket.recv()
    response_data = json.loads(response)
    if response_data.get("success"):
        print("Authentication successful!")
    else:
        print(f"Authentication failed: {response_data.get('message')}")
        return  # Stop if authentication fails

    # After successful authentication, start reading logs
    try:
        while True:
            log_message = await websocket.recv()
            print(log_message)
    except websockets.exceptions.ConnectionClosed:
        print("Connection to Memgraph closed.")


async def connect_and_authenticate():
    async with websockets.connect(ws_url) as websocket:
        print(f"Connected to Memgraph at {ws_url}")
        await authenticate_and_read_logs(websocket)


asyncio.run(connect_and_authenticate())
Authorization (Enterprise)
Permission for connecting through WebSocket is controlled by the WEBSOCKET privilege.
Metrics tracking via HTTP server (Enterprise Edition)
In the Enterprise Edition, Memgraph allows tracking information about transactions, query latencies, snapshot recovery latencies, triggers, Bolt messages, indexes, streams, memory, operators and sessions, all by using an HTTP server.
To retrieve data from the HTTP server, enter a valid Memgraph Enterprise license key.
The default address and port for the metrics server are 0.0.0.0:9091, and they can be configured using the --metrics-address and --metrics-port configuration flags.
To retrieve the metrics, send a GET request to the following URL:
http://host:port
Here is an example of how to retrieve metrics using Python:
import requests
import json


def fetch_memgraph_metrics():
    metrics_url = "http://0.0.0.0:9091/metrics"
    try:
        response = requests.get(metrics_url, timeout=5)
        if response.status_code == 200:
            try:
                # Pretty-print the response if it is valid JSON
                metrics_data = json.loads(response.text)
                pretty_metrics = json.dumps(metrics_data, indent=4)
                print("Memgraph Metrics:\n", pretty_metrics)
            except json.JSONDecodeError:
                # Fall back to the raw response body
                print("Memgraph Metrics:\n", response.text)
        else:
            print(f"Failed to fetch metrics. Status code: {response.status_code}")
    except requests.exceptions.RequestException as e:
        print(f"An error occurred: {e}")


fetch_memgraph_metrics()
The example above prints metrics similar to the example response shown later on this page.
System metrics
All system metrics measuring different parts of the system can be divided into three different types:
- Gauge - a single value of some variable in the system (e.g. memory usage)
- Counter (uint64_t) - a value that can be incremented or decremented (e.g. number of active transactions in the system)
- Histogram (uint64_t) - distribution of measured values (e.g. certain percentile of query latency on N measured queries)
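To make the histogram type more concrete, the sketch below shows how a percentile such as the 50th, 90th or 99th summarizes a set of latency measurements. It uses the nearest-rank method purely as an illustration. It is not a description of how Memgraph computes its percentiles, and the sample latencies are made up.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile: smallest value with at least p% of samples <= it."""
    if not values:
        raise ValueError("no measurements")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Made-up query latencies in microseconds.
latencies_us = [120, 95, 400, 110, 105, 2500, 130, 98, 101, 115]

for p in (50, 90, 99):
    print(f"{p}th percentile: {percentile(latencies_us, p)} us")
```

High percentiles expose slow outliers that an average would hide. When only one value has been measured, every percentile equals that value.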
General metrics
Name | Type | Description |
---|---|---|
vertex_count | Gauge (uint64_t) | Number of nodes stored in the system. |
edge_count | Gauge (uint64_t) | Number of relationships stored in the system. |
average_degree | Gauge (double) | Average number of relationships of a single node. |
memory_usage | Gauge (uint64_t) | Amount of RAM used reported by the OS (in bytes). |
disk_usage | Gauge (uint64_t) | Amount of disk space used by the data directory (in bytes). |
peak_memory_usage | Gauge (uint64_t) | Peak RAM memory usage in the system during the whole run (in bytes). |
Index metrics
Name | Type | Description |
---|---|---|
ActiveLabelIndices | Counter | Number of active label indexes in the system. |
ActiveLabelPropertyIndices | Counter | Number of active label property indexes in the system. |
ActiveTextIndices | Counter | Number of active text indexes in the system. |
Memory metrics
Name | Type | Description |
---|---|---|
PeakMemoryRes | Gauge (uint64_t) | Peak RAM memory usage in the system during the whole run (in bytes). |
Operator metrics
Before a Cypher query is executed, it is converted into an internal form suitable for execution, known as a query plan. A query plan is a tree-like data structure describing a pipeline of operations that will be performed on the database in order to yield the results for a given query. Every node within a plan is known as a logical operator and describes a particular operation.
Name | Type | Description |
---|---|---|
OnceOperator | Counter | Number of times Once operator was used. |
CreateNodeOperator | Counter | Number of times CreateNode operator was used. |
CreateExpandOperator | Counter | Number of times CreateExpand operator was used. |
ScanAllOperator | Counter | Number of times ScanAll operator was used. |
ScanAllByLabelOperator | Counter | Number of times ScanAllByLabel operator was used. |
ScanAllByLabelPropertyRangeOperator | Counter | Number of times ScanAllByLabelPropertyRange operator was used. |
ScanAllByLabelPropertyValueOperator | Counter | Number of times ScanAllByLabelPropertyValue operator was used. |
ScanAllByLabelPropertyOperator | Counter | Number of times ScanAllByLabelProperty operator was used. |
ScanAllByIdOperator | Counter | Number of times ScanAllById operator was used. |
ExpandOperator | Counter | Number of times Expand operator was used. |
ExpandVariableOperator | Counter | Number of times ExpandVariable operator was used. |
ConstructNamedPathOperator | Counter | Number of times ConstructNamedPath operator was used. |
FilterOperator | Counter | Number of times Filter operator was used. |
ProduceOperator | Counter | Number of times Produce operator was used. |
DeleteOperator | Counter | Number of times Delete operator was used. |
SetPropertyOperator | Counter | Number of times SetProperty operator was used. |
SetPropertiesOperator | Counter | Number of times SetProperties operator was used. |
SetLabelsOperator | Counter | Number of times SetLabels operator was used. |
RemovePropertyOperator | Counter | Number of times RemoveProperty operator was used. |
RemoveLabelsOperator | Counter | Number of times RemoveLabels operator was used. |
EdgeUniquenessFilterOperator | Counter | Number of times EdgeUniquenessFilter operator was used. |
EmptyResultOperator | Counter | Number of times EmptyResult operator was used. |
AccumulateOperator | Counter | Number of times Accumulate operator was used. |
AggregateOperator | Counter | Number of times Aggregate operator was used. |
SkipOperator | Counter | Number of times Skip operator was used. |
LimitOperator | Counter | Number of times Limit operator was used. |
OrderByOperator | Counter | Number of times OrderBy operator was used. |
MergeOperator | Counter | Number of times Merge operator was used. |
OptionalOperator | Counter | Number of times Optional operator was used. |
UnwindOperator | Counter | Number of times Unwind operator was used. |
DistinctOperator | Counter | Number of times Distinct operator was used. |
UnionOperator | Counter | Number of times Union operator was used. |
CartesianOperator | Counter | Number of times Cartesian operator was used. |
CallProcedureOperator | Counter | Number of times CallProcedure operator was used. |
ForeachOperator | Counter | Number of times Foreach operator was used. |
EvaluatePatternFilterOperator | Counter | Number of times EvaluatePatternFilter operator was used. |
ApplyOperator | Counter | Number of times Apply operator was used. |
HashJoinOperator | Counter | Number of times HashJoin operator was used. |
IndexedJoinOperator | Counter | Number of times IndexedJoin operator was used. |
PeriodicCommit | Counter | Number of times PeriodicCommit operator was used. |
PeriodicSubquery | Counter | Number of times PeriodicSubquery operator was used. |
RollUpApplyOperator | Counter | Number of times RollUpApply operator was used. |
ScanAllByEdgeIdOperator | Counter | Number of times ScanAllByEdgeId operator was used. |
ScanAllByEdgeTypeOperator | Counter | Number of times ScanAllByEdgeType operator was used. |
Query metrics
Name | Type | Description |
---|---|---|
QueryExecutionLatency_us_50p | Histogram | Query execution latency in microseconds (50th percentile). |
QueryExecutionLatency_us_90p | Histogram | Query execution latency in microseconds (90th percentile). |
QueryExecutionLatency_us_99p | Histogram | Query execution latency in microseconds (99th percentile). |
Query type metrics
Name | Type | Description |
---|---|---|
ReadQuery | Counter | Number of read-only queries executed. |
WriteQuery | Counter | Number of write-only queries executed. |
ReadWriteQuery | Counter | Number of read-write queries executed. |
Session metrics
Name | Type | Description |
---|---|---|
ActiveSessions | Counter | Number of active connections. |
ActiveBoltSessions | Counter | Number of active Bolt connections. |
ActiveTCPSessions | Counter | Number of active TCP connections. |
ActiveSSLSessions | Counter | Number of active SSL connections. |
ActiveWebSocketSessions | Counter | Number of active WebSocket connections. |
BoltMessages | Counter | Number of Bolt messages sent. |
Snapshot metrics
Name | Type | Description |
---|---|---|
SnapshotCreationLatency_us_50p | Histogram | Snapshot creation latency in microseconds (50th percentile). |
SnapshotCreationLatency_us_90p | Histogram | Snapshot creation latency in microseconds (90th percentile). |
SnapshotCreationLatency_us_99p | Histogram | Snapshot creation latency in microseconds (99th percentile). |
SnapshotRecoveryLatency_us_50p | Histogram | Snapshot recovery latency in microseconds (50th percentile). |
SnapshotRecoveryLatency_us_90p | Histogram | Snapshot recovery latency in microseconds (90th percentile). |
SnapshotRecoveryLatency_us_99p | Histogram | Snapshot recovery latency in microseconds (99th percentile). |
Stream metrics
Name | Type | Description |
---|---|---|
StreamsCreated | Counter | Number of streams created. |
MessagesConsumed | Counter | Number of consumed streamed messages. |
Transaction metrics
Name | Type | Description |
---|---|---|
ActiveTransactions | Counter | Number of active transactions. |
CommitedTransactions | Counter | Number of committed transactions. |
RollbackedTransactions | Counter | Number of rolled-back transactions. |
FailedQuery | Counter | Number of times executing a query failed (either during parse time or runtime). |
FailedPrepare | Counter | Number of times preparing a query failed. |
FailedPull | Counter | Number of times pulling a query failed. |
SuccessfulQuery | Counter | Number of successful queries. |
Trigger metrics
Name | Type | Description |
---|---|---|
TriggersCreated | Counter | Number of triggers created. |
TriggersExecuted | Counter | Number of triggers executed. |
Example response
With the default configuration, sending a GET request to localhost:9091 on a local Memgraph build returns a response similar to the one below.
{
"General": {
"average_degree": 0.0,
"disk_usage": 1417846,
"edge_count": 0,
"memory_usage": 36937728,
"peak_memory_usage": 100265984,
"vertex_count": 0
},
"Index": {
"ActiveLabelIndices": 0,
"ActiveLabelPropertyIndices": 0,
"ActiveTextIndices": 0
},
"Memory": {
"PeakMemoryRes": 100265984
},
"Operator": {
"AccumulateOperator": 0,
"AggregateOperator": 0,
"ApplyOperator": 0,
"CallProcedureOperator": 0,
"CartesianOperator": 0,
"ConstructNamedPathOperator": 0,
"CreateExpandOperator": 0,
"CreateNodeOperator": 0,
"DeleteOperator": 0,
"DistinctOperator": 0,
"EdgeUniquenessFilterOperator": 0,
"EmptyResultOperator": 0,
"EvaluatePatternFilterOperator": 0,
"ExpandOperator": 0,
"ExpandVariableOperator": 0,
"FilterOperator": 0,
"ForeachOperator": 0,
"HashJoinOperator": 0,
"IndexedJoinOperator": 0,
"LimitOperator": 0,
"MergeOperator": 0,
"OnceOperator": 0,
"OptionalOperator": 0,
"OrderByOperator": 0,
"ProduceOperator": 0,
"PeriodicCommit": 0,
"PeriodicSubquery": 0,
"RemoveLabelsOperator": 0,
"RemovePropertyOperator": 0,
"RollUpApplyOperator": 0,
"ScanAllByEdgeIdOperator": 0,
"ScanAllByEdgeTypeOperator": 0,
"ScanAllByIdOperator": 0,
"ScanAllByLabelOperator": 0,
"ScanAllByLabelPropertyOperator": 0,
"ScanAllByLabelPropertyRangeOperator": 0,
"ScanAllByLabelPropertyValueOperator": 0,
"ScanAllOperator": 0,
"SetLabelsOperator": 0,
"SetPropertiesOperator": 0,
"SetPropertyOperator": 0,
"SkipOperator": 0,
"UnionOperator": 0,
"UnwindOperator": 0
},
"Query": {
"QueryExecutionLatency_us_50p": 0,
"QueryExecutionLatency_us_90p": 0,
"QueryExecutionLatency_us_99p": 0
},
"QueryType": {
"ReadQuery": 0,
"ReadWriteQuery": 0,
"WriteQuery": 0
},
"Session": {
"ActiveBoltSessions": 0,
"ActiveSSLSessions": 0,
"ActiveSessions": 0,
"ActiveTCPSessions": 0,
"ActiveWebSocketSessions": 0,
"BoltMessages": 0
},
"Snapshot": {
"SnapshotCreationLatency_us_50p": 4860,
"SnapshotCreationLatency_us_90p": 4860,
"SnapshotCreationLatency_us_99p": 4860,
"SnapshotRecoveryLatency_us_50p": 628,
"SnapshotRecoveryLatency_us_90p": 628,
"SnapshotRecoveryLatency_us_99p": 628
},
"Stream": {
"MessagesConsumed": 0,
"StreamsCreated": 0
},
"Transaction": {
"ActiveTransactions": 0,
"CommitedTransactions": 0,
"FailedPrepare": 0,
"FailedPull": 0,
"FailedQuery": 0,
"RollbackedTransactions": 0,
"SuccessfulQuery": 0
},
"Trigger": {
"TriggersCreated": 0,
"TriggersExecuted": 0
}
}
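Because counters accumulate over time, it's often more useful to compare two responses than to read one in isolation. The sketch below flattens a response shaped like the one above into dotted names and reports which values changed between two snapshots. The function names are hypothetical, and the two snapshots are hand-written samples rather than real output.

```python
def flatten_metrics(metrics):
    """Turn {"Group": {"Name": value}} into {"Group.Name": value}."""
    flat = {}
    for group, values in metrics.items():
        for name, value in values.items():
            flat[f"{group}.{name}"] = value
    return flat


def counter_delta(before, after):
    """Difference of two flattened snapshots, keeping only values that changed."""
    return {
        key: after[key] - before[key]
        for key in after
        if key in before and after[key] != before[key]
    }


# Two hand-written snapshots shaped like the response above (values are made up).
first = {"Transaction": {"CommitedTransactions": 10, "FailedQuery": 1}}
second = {"Transaction": {"CommitedTransactions": 25, "FailedQuery": 1}}

print(counter_delta(flatten_metrics(first), flatten_metrics(second)))
```

In practice the two snapshots would come from two GET requests taken some interval apart.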
Prometheus exporter
Memgraph also supports exporting metrics in the Prometheus format. For more information on how to set up the Prometheus exporter, refer to the Prometheus exporter repository.
Session trace
A session refers to a temporary, interactive connection from a client (either a user or an application) to the database server. It is used to execute a series of transactions and queries. Session trace is a feature in Memgraph that allows you to profile the execution of all queries within a session, providing detailed information on query planning and execution. This can be invaluable for understanding performance bottlenecks and optimizing database interactions.
Before enabling session trace, you need to define a directory where the session logs will be stored. This is done using the --query-log-directory configuration flag, which can be set either at Memgraph startup via the command line, or during an active session. By default, --query-log-directory is set to /var/log/memgraph/session_trace.
To set the query log directory while launching from the command line using Docker, do the following:
docker run -p 7687:7687 -p 7444:7444 --name memgraph memgraph/memgraph --query-log-directory {directory}
To set the query log directory interactively, use the following command:
SET DATABASE SETTING "query-log-directory" TO "{directory}";
Replace {directory} with the desired path inside the Memgraph environment, either native or within a container.
If this directory is not set, you will receive the following error when attempting to enable session trace:
The flag --query-log-directory has to be present in order to enable session trace.
Once the query log directory is configured, you can enable session trace for your current session using the following command:
SET SESSION TRACE ON;
This command initiates logging for the session, and the result will include the unique session UUID being tracked:
session uuid |
---|
"de9c907b-6675-40bd-bf09-d4ce7b24f22d" |
This UUID is used to identify the session log file.
Understanding the Session Trace Log
The session trace log captures detailed information for every query executed during the session, including:
- Session UUID: Unique identifier for the session.
- Transaction ID: ID of the current query transaction.
- User Name: The user executing the query.
- Query Name and Explain Plan: The name and execution plan of the query.
- Query Planning and Execution Timings: Start and end timestamps, as well as execution times for query planning and execution.
- Commit Timings: Start and end timestamps, and execution time for query commit.
- Profile Plan and Metadata: Detailed profiling information and metadata for the query.
This detailed logging helps in profiling the entire session workload, increasing visibility into potential performance bottlenecks without congesting the system log.
Accessing the Log File
The session trace log is stored at {--query-log-directory}/{session uuid}.log, where --query-log-directory is the directory you defined earlier, and session uuid is the UUID obtained when session trace was enabled.
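The path can also be assembled programmatically, as in the sketch below, which builds the log location from the directory and the session UUID and reads the file if it exists. The helper names are hypothetical, and the sketch assumes the log is a line-oriented text file readable from where the script runs, for example through a mounted volume when Memgraph runs in a container.

```python
from pathlib import Path


def session_trace_path(query_log_directory, session_uuid):
    """Build {query-log-directory}/{session uuid}.log as described above."""
    return Path(query_log_directory) / f"{session_uuid}.log"


def read_trace(path):
    """Return the log's lines, or an empty list if nothing has been written yet."""
    if not path.exists():
        return []
    return path.read_text(encoding="utf-8").splitlines()


# Default directory and the example UUID from this page.
path = session_trace_path(
    "/var/log/memgraph/session_trace",
    "de9c907b-6675-40bd-bf09-d4ce7b24f22d",
)
print(path)
print(len(read_trace(path)), "lines read")
```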
To stop logging additional information for the current session, use the following command:
SET SESSION TRACE OFF;
After executing this command, no further data will be written to the session trace log for the active session.