Monitoring server

In production systems, monitoring applications is crucial, and that includes databases. Memgraph allows monitoring the database via a WebSocket server and an HTTP server.

Monitoring via WebSocket server

Connect to Memgraph's monitoring server via WebSocket to receive logs, which are forwarded to all connected clients.

Connection

To connect to Memgraph's WebSocket server use the following URL:

ws://host:port

The default host is localhost but it can be changed using the --monitoring-host configuration flag.
The default port is 7444 but it can be changed using the --monitoring-port configuration flag.

To connect over a WebSocket Secure (WSS) connection, enable SSL by setting the following configuration flags: --bolt-cert-file and --bolt-key-file.

These flags are also used to set up an SSL connection when connecting via Bolt protocol, hence the name.

If both of the configuration flags are set, use the following URL to connect to the WebSocket server:

wss://host:port
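The choice between the two URL schemes can be sketched as a small helper. This is only an illustration and not part of Memgraph: the function name `monitoring_url` and its parameters are hypothetical, and the caller is assumed to know whether the server was started with both --bolt-cert-file and --bolt-key-file.

```python
def monitoring_url(host: str = "localhost", port: int = 7444, secure: bool = False) -> str:
    """Build the URL of Memgraph's monitoring WebSocket server.

    Hypothetical helper: pass secure=True only when the server was started
    with both --bolt-cert-file and --bolt-key-file.
    """
    scheme = "wss" if secure else "ws"
    return f"{scheme}://{host}:{port}"
```

The resulting string can be handed to whichever WebSocket client library the application already uses.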

Authentication

If authentication is not used (there are no users present in Memgraph), no authentication message is expected, and no response will be returned.

If authentication is used, Memgraph won't send any messages over a connection until that connection is authenticated.

To authenticate, create a JSON with the credentials in the following format:

{
  "username": "<username>",
  "password": "<password>"
}

If the credentials are valid, the connection is authenticated and the client starts receiving messages. In response, the client receives the following message:

{
  "success": true,
  "message": "User has been successfully authenticated!"
}

If the credentials are invalid or the first message is in an invalid format, the connection is dropped. In response, the following message is sent:

{
  "success": false,
  "message": "<error-message>"
}
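The exchange above could be handled on the client side roughly as follows. This is a minimal sketch using only the standard library; the function names `auth_message` and `check_auth_response` are hypothetical, and sending and receiving the strings is left to the WebSocket client library in use.

```python
import json


def auth_message(username: str, password: str) -> str:
    """Serialize the credentials into the JSON message sent right after connecting."""
    return json.dumps({"username": username, "password": password})


def check_auth_response(raw: str) -> None:
    """Raise PermissionError if the response reports a failed authentication."""
    response = json.loads(raw)
    if not response.get("success", False):
        # The server's "message" field carries the error description.
        raise PermissionError(response.get("message", "authentication failed"))
```

A client would send `auth_message(...)` as its first message and pass the first reply to `check_auth_response` before reading any logs.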

Authorization (Enterprise)

Permission for connecting through WebSocket is controlled by the WEBSOCKET privilege.

Log messages

Each log that is written to the log file is forwarded to the connected clients in the following format:

{
  event: "log",
  level: "trace"|"debug"|"info"|"warning"|"error"|"critical",
  message: "<log-message>"
}

Read more about logs in Memgraph.
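A client will often want to keep only the more severe log events. The sketch below assumes each message arrives as a JSON string with the event, level, and message fields shown above; the `parse_log` function and the `min_level` parameter are hypothetical names, and the severity order is taken from the order in which the levels are listed in the format.

```python
import json

# Severity order, lowest to highest, as listed in the message format.
LEVELS = ["trace", "debug", "info", "warning", "error", "critical"]


def parse_log(raw: str, min_level: str = "warning"):
    """Return (level, message) for a log event at or above min_level, else None."""
    event = json.loads(raw)
    if event.get("event") != "log":
        return None  # not a log event
    level = event.get("level")
    if level not in LEVELS:
        return None  # unknown level; ignored in this sketch
    if LEVELS.index(level) < LEVELS.index(min_level):
        return None  # below the requested severity
    return level, event.get("message", "")
```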

Monitoring via HTTP server (Enterprise)

In the Enterprise Edition, Memgraph exposes metrics about transactions, query latencies, snapshot recovery latencies, triggers, Bolt messages, indexes, streams, and more through an HTTP server.

To start the metrics HTTP server, enter a valid Memgraph Enterprise license key and restart the database.

Configure the HTTP endpoint

By default, the metrics server listens on 0.0.0.0:9091. The address and port can be changed using the --metrics-address and --metrics-port configuration flags.
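Querying the endpoint from a client might look like the following sketch. It uses only the Python standard library; the helper names `metrics_url` and `fetch_metrics` are hypothetical. It assumes the server answers a plain HTTP GET with a JSON body and is reachable on localhost (0.0.0.0 is a bind address, so a client connects to a concrete host instead).

```python
import json
import urllib.request


def metrics_url(host: str = "localhost", port: int = 9091) -> str:
    """Build the URL of the metrics HTTP server (hypothetical helper)."""
    return f"http://{host}:{port}"


def fetch_metrics(url: str, timeout: float = 5.0) -> dict:
    """Send a GET request to the metrics endpoint and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_metrics(metrics_url())` against a running Enterprise instance would return the metrics as a nested dictionary.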

System metrics

System metrics, which measure different parts of the system, fall into three types:

  • Gauge - a single value of some variable in the system (e.g. memory usage)
  • Counter (uint64_t) - a value that can be incremented or decremented (e.g. number of active transactions in the system)
  • Histogram (uint64_t) - distribution of measured values (e.g. certain percentile of query latency on N measured queries)
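To make the histogram type more concrete, here is one common way to read a percentile from a set of measured values, the nearest-rank method. This is purely illustrative: it is not necessarily how Memgraph computes its percentiles, and the `percentile` function and the sample latencies are made up for the example.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("at least one sample is required")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank into the sorted samples
    return ordered[rank - 1]


# Made-up query latencies in microseconds.
latencies_us = [120, 80, 300, 95, 110, 2500, 105, 90, 130, 100]
p50 = percentile(latencies_us, 50)
p99 = percentile(latencies_us, 99)
```

With a single measured value, every percentile computed this way equals that value.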

General metrics

| Name | Type | Description |
| --- | --- | --- |
| vertex_count | Gauge (uint64_t) | Number of nodes stored in the system. |
| edge_count | Gauge (uint64_t) | Number of relationships stored in the system. |
| average_degree | Gauge (double) | Average number of relationships of a single node. |
| memory_usage | Gauge (uint64_t) | Amount of RAM used reported by the OS (in bytes). |
| disk_usage | Gauge (uint64_t) | Amount of disk space used by the data directory (in bytes). |

Index metrics

| Name | Type | Description |
| --- | --- | --- |
| ActiveLabelIndices | Counter | Number of active label indexes in the system. |
| ActiveLabelPropertyIndices | Counter | Number of active label property indexes in the system. |

Operator metrics

Before a Cypher query is executed, it is converted into an internal form suitable for execution, known as a query plan. A query plan is a tree-like data structure describing a pipeline of operations that will be performed on the database in order to yield the results for a given query. Every node within a plan is known as a logical operator and describes a particular operation.

| Name | Type | Description |
| --- | --- | --- |
| OnceOperator | Counter | Number of times Once operator was used. |
| CreateNodeOperator | Counter | Number of times CreateNode operator was used. |
| CreateExpandOperator | Counter | Number of times CreateExpand operator was used. |
| ScanAllOperator | Counter | Number of times ScanAll operator was used. |
| ScanAllByLabelOperator | Counter | Number of times ScanAllByLabel operator was used. |
| ScanAllByLabelPropertyRangeOperator | Counter | Number of times ScanAllByLabelPropertyRange operator was used. |
| ScanAllByLabelPropertyValueOperator | Counter | Number of times ScanAllByLabelPropertyValue operator was used. |
| ScanAllByLabelPropertyOperator | Counter | Number of times ScanAllByLabelProperty operator was used. |
| ScanAllByIdOperator | Counter | Number of times ScanAllById operator was used. |
| ExpandOperator | Counter | Number of times Expand operator was used. |
| ExpandVariableOperator | Counter | Number of times ExpandVariable operator was used. |
| ConstructNamedPathOperator | Counter | Number of times ConstructNamedPath operator was used. |
| FilterOperator | Counter | Number of times Filter operator was used. |
| ProduceOperator | Counter | Number of times Produce operator was used. |
| DeleteOperator | Counter | Number of times Delete operator was used. |
| SetPropertyOperator | Counter | Number of times SetProperty operator was used. |
| SetPropertiesOperator | Counter | Number of times SetProperties operator was used. |
| SetLabelsOperator | Counter | Number of times SetLabels operator was used. |
| RemovePropertyOperator | Counter | Number of times RemoveProperty operator was used. |
| RemoveLabelsOperator | Counter | Number of times RemoveLabels operator was used. |
| EdgeUniquenessFilterOperator | Counter | Number of times EdgeUniquenessFilter operator was used. |
| EmptyResultOperator | Counter | Number of times EmptyResult operator was used. |
| AccumulateOperator | Counter | Number of times Accumulate operator was used. |
| AggregateOperator | Counter | Number of times Aggregate operator was used. |
| SkipOperator | Counter | Number of times Skip operator was used. |
| LimitOperator | Counter | Number of times Limit operator was used. |
| OrderByOperator | Counter | Number of times OrderBy operator was used. |
| MergeOperator | Counter | Number of times Merge operator was used. |
| OptionalOperator | Counter | Number of times Optional operator was used. |
| UnwindOperator | Counter | Number of times Unwind operator was used. |
| DistinctOperator | Counter | Number of times Distinct operator was used. |
| UnionOperator | Counter | Number of times Union operator was used. |
| CartesianOperator | Counter | Number of times Cartesian operator was used. |
| CallProcedureOperator | Counter | Number of times CallProcedure operator was used. |
| ForeachOperator | Counter | Number of times Foreach operator was used. |
| EvaluatePatternFilterOperator | Counter | Number of times EvaluatePatternFilter operator was used. |
| ApplyOperator | Counter | Number of times Apply operator was used. |
| HashJoin | Counter | Number of times HashJoin operator was used. |
| IndexedJoin | Counter | Number of times IndexedJoin operator was used. |

Query metrics

| Name | Type | Description |
| --- | --- | --- |
| QueryExecutionLatency_us_50p | Histogram | Query execution latency in microseconds (50th percentile). |
| QueryExecutionLatency_us_90p | Histogram | Query execution latency in microseconds (90th percentile). |
| QueryExecutionLatency_us_99p | Histogram | Query execution latency in microseconds (99th percentile). |

Query type metrics

| Name | Type | Description |
| --- | --- | --- |
| ReadQuery | Counter | Number of read-only queries executed. |
| WriteQuery | Counter | Number of write-only queries executed. |
| ReadWriteQuery | Counter | Number of read-write queries executed. |

Session metrics

| Name | Type | Description |
| --- | --- | --- |
| ActiveSessions | Counter | Number of active connections. |
| ActiveBoltSessions | Counter | Number of active Bolt connections. |
| ActiveTCPSessions | Counter | Number of active TCP connections. |
| ActiveSSLSessions | Counter | Number of active SSL connections. |
| ActiveWebSocketSessions | Counter | Number of active WebSocket connections. |
| BoltMessages | Counter | Number of Bolt messages sent. |

Snapshot metrics

| Name | Type | Description |
| --- | --- | --- |
| SnapshotCreationLatency_us_50p | Histogram | Snapshot creation latency in microseconds (50th percentile). |
| SnapshotCreationLatency_us_90p | Histogram | Snapshot creation latency in microseconds (90th percentile). |
| SnapshotCreationLatency_us_99p | Histogram | Snapshot creation latency in microseconds (99th percentile). |
| SnapshotRecoveryLatency_us_50p | Histogram | Snapshot recovery latency in microseconds (50th percentile). |
| SnapshotRecoveryLatency_us_90p | Histogram | Snapshot recovery latency in microseconds (90th percentile). |
| SnapshotRecoveryLatency_us_99p | Histogram | Snapshot recovery latency in microseconds (99th percentile). |

Stream metrics

| Name | Type | Description |
| --- | --- | --- |
| StreamsCreated | Counter | Number of streams created. |
| MessagesConsumed | Counter | Number of consumed streamed messages. |

Transaction metrics

| Name | Type | Description |
| --- | --- | --- |
| ActiveTransactions | Counter | Number of active transactions. |
| CommitedTransactions | Counter | Number of committed transactions. |
| RollbackedTransactions | Counter | Number of rolled-back transactions. |
| FailedQuery | Counter | Number of times executing a query failed (either during parse time or runtime). |

Trigger metrics

| Name | Type | Description |
| --- | --- | --- |
| TriggersCreated | Counter | Number of triggers created. |
| TriggersExecuted | Counter | Number of triggers executed. |

Example response

With the default configuration, sending a GET request to localhost:9091 on a local Memgraph build returns a response similar to the one below.

{
    "General": {
        "average_degree": 0.0,
        "disk_usage": 1417846,
        "edge_count": 0,
        "memory_usage": 36937728,
        "vertex_count": 0
    },
    "Index": {
        "ActiveLabelIndices": 0,
        "ActiveLabelPropertyIndices": 0
    },
    "Operator": {
        "AccumulateOperator": 0,
        "AggregateOperator": 0,
        "ApplyOperator": 0,
        "CallProcedureOperator": 0,
        "CartesianOperator": 0,
        "ConstructNamedPathOperator": 0,
        "CreateExpandOperator": 0,
        "CreateNodeOperator": 0,
        "DeleteOperator": 0,
        "DistinctOperator": 0,
        "EdgeUniquenessFilterOperator": 0,
        "EmptyResultOperator": 0,
        "EvaluatePatternFilterOperator": 0,
        "ExpandOperator": 0,
        "ExpandVariableOperator": 0,
        "FilterOperator": 0,
        "ForeachOperator": 0,
        "LimitOperator": 0,
        "MergeOperator": 0,
        "OnceOperator": 0,
        "OptionalOperator": 0,
        "OrderByOperator": 0,
        "ProduceOperator": 0,
        "RemoveLabelsOperator": 0,
        "RemovePropertyOperator": 0,
        "ScanAllByIdOperator": 0,
        "ScanAllByLabelOperator": 0,
        "ScanAllByLabelPropertyOperator": 0,
        "ScanAllByLabelPropertyRangeOperator": 0,
        "ScanAllByLabelPropertyValueOperator": 0,
        "ScanAllOperator": 0,
        "SetLabelsOperator": 0,
        "SetPropertiesOperator": 0,
        "SetPropertyOperator": 0,
        "SkipOperator": 0,
        "UnionOperator": 0,
        "UnwindOperator": 0
    },
    "Query": {
        "QueryExecutionLatency_us_50p": 0,
        "QueryExecutionLatency_us_90p": 0,
        "QueryExecutionLatency_us_99p": 0
    },
    "QueryType": {
        "ReadQuery": 0,
        "ReadWriteQuery": 0,
        "WriteQuery": 0
    },
    "Session": {
        "ActiveBoltSessions": 0,
        "ActiveSSLSessions": 0,
        "ActiveSessions": 0,
        "ActiveTCPSessions": 0,
        "ActiveWebSocketSessions": 0,
        "BoltMessages": 0
    },
    "Snapshot": {
        "SnapshotCreationLatency_us_50p": 4860,
        "SnapshotCreationLatency_us_90p": 4860,
        "SnapshotCreationLatency_us_99p": 4860,
        "SnapshotRecoveryLatency_us_50p": 628,
        "SnapshotRecoveryLatency_us_90p": 628,
        "SnapshotRecoveryLatency_us_99p": 628
    },
    "Stream": {
        "MessagesConsumed": 0,
        "StreamsCreated": 0
    },
    "Transaction": {
        "ActiveTransactions": 0,
        "CommitedTransactions": 0,
        "FailedQuery": 0,
        "RollbackedTransactions": 0
    },
    "Trigger": {
        "TriggersCreated": 0,
        "TriggersExecuted": 0
    }
}
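Because the response groups metrics by category, a monitoring script may find it convenient to flatten the structure and compare two consecutive responses. The sketch below assumes the two-level layout shown above, with numeric values throughout; the function names `flatten_metrics` and `changed_metrics` are hypothetical.

```python
def flatten_metrics(metrics: dict) -> dict:
    """Turn {"Group": {"Name": value}} into {"Group.Name": value}."""
    return {
        f"{group}.{name}": value
        for group, values in metrics.items()
        for name, value in values.items()
    }


def changed_metrics(before: dict, after: dict) -> dict:
    """Return after - before for every metric whose value differs between two responses."""
    old, new = flatten_metrics(before), flatten_metrics(after)
    return {
        key: new[key] - old[key]
        for key in new
        if key in old and new[key] != old[key]
    }
```

Polling the endpoint at a fixed interval and passing successive responses to `changed_metrics` gives a compact view of what moved between two scrapes.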