Telemetry

The Vault server process collects various runtime metrics about the performance of different libraries and subsystems. These metrics are aggregated on a ten-second interval and are retained in memory for one minute. To monitor Vault and collect durable metrics, telemetry from Vault must be stored in metrics aggregation software.

To view the raw data, you must send a signal to the Vault process: on Unix-style operating systems, this is USR1 while on Windows it is BREAK. When the Vault process receives this signal it will dump the current telemetry information to the process's stderr.
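
As a minimal sketch of the signal step described above, the following Python helper sends USR1 to a process on a Unix-style system. The function name `request_telemetry_dump` is hypothetical, and the caller is assumed to already know the PID of the Vault server process; the Windows BREAK case is deliberately left out.

```python
import os
import signal
import sys


def request_telemetry_dump(pid: int) -> None:
    """Send SIGUSR1 to the process with the given PID (Unix-style systems only)."""
    if sys.platform == "win32":
        # The text above names BREAK as the Windows equivalent; delivering it
        # is not covered by this sketch.
        raise NotImplementedError("this sketch only covers Unix-style systems")
    # Vault reacts to USR1 by writing the current telemetry to its own stderr.
    os.kill(pid, signal.SIGUSR1)
```

The dump is written to the stderr of the Vault process, not returned to the caller, so the output has to be read wherever that stream is captured.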

This telemetry information can be used for debugging or otherwise getting a better view of what Vault is doing.

Telemetry information can also be streamed directly from Vault to a range of metrics aggregation solutions as described in the telemetry Stanza documentation.

The following is an example telemetry dump snippet:

[2017-12-19 20:37:50 +0000 UTC][G] 'vault.7f320e57f9fe.expire.num_leases': 5100.000
[2017-12-19 20:37:50 +0000 UTC][G] 'vault.7f320e57f9fe.runtime.num_goroutines': 39.000
[2017-12-19 20:37:50 +0000 UTC][G] 'vault.7f320e57f9fe.runtime.sys_bytes': 222746880.000
[2017-12-19 20:37:50 +0000 UTC][G] 'vault.7f320e57f9fe.runtime.malloc_count': 109189192.000
[2017-12-19 20:37:50 +0000 UTC][G] 'vault.7f320e57f9fe.runtime.free_count': 108408240.000
[2017-12-19 20:37:50 +0000 UTC][G] 'vault.7f320e57f9fe.runtime.heap_objects': 780953.000
[2017-12-19 20:37:50 +0000 UTC][G] 'vault.7f320e57f9fe.runtime.total_gc_runs': 232.000
[2017-12-19 20:37:50 +0000 UTC][G] 'vault.7f320e57f9fe.runtime.alloc_bytes': 72954392.000
[2017-12-19 20:37:50 +0000 UTC][G] 'vault.7f320e57f9fe.runtime.total_gc_pause_ns': 150293024.000
[2017-12-19 20:37:50 +0000 UTC][S] 'vault.merkle.flushDirty': Count: 100 Min: 0.008 Mean: 0.027 Max: 0.183 Stddev: 0.024 Sum: 2.681 LastUpdated: 2017-12-19 20:37:59.848733035 +0000 UTC m=+10463.692105920
[2017-12-19 20:37:50 +0000 UTC][S] 'vault.merkle.saveCheckpoint': Count: 4 Min: 0.021 Mean: 0.054 Max: 0.110 Stddev: 0.039 Sum: 0.217 LastUpdated: 2017-12-19 20:37:57.048458148 +0000 UTC m=+10460.891835029
[2017-12-19 20:38:00 +0000 UTC][G] 'vault.7f320e57f9fe.runtime.alloc_bytes': 73326136.000
[2017-12-19 20:38:00 +0000 UTC][G] 'vault.7f320e57f9fe.runtime.sys_bytes': 222746880.000
[2017-12-19 20:38:00 +0000 UTC][G] 'vault.7f320e57f9fe.runtime.malloc_count': 109195904.000
[2017-12-19 20:38:00 +0000 UTC][G] 'vault.7f320e57f9fe.runtime.free_count': 108409568.000
[2017-12-19 20:38:00 +0000 UTC][G] 'vault.7f320e57f9fe.runtime.heap_objects': 786342.000
[2017-12-19 20:38:00 +0000 UTC][G] 'vault.7f320e57f9fe.runtime.total_gc_pause_ns': 150293024.000
[2017-12-19 20:38:00 +0000 UTC][G] 'vault.7f320e57f9fe.expire.num_leases': 5100.000
[2017-12-19 20:38:00 +0000 UTC][G] 'vault.7f320e57f9fe.runtime.num_goroutines': 39.000
[2017-12-19 20:38:00 +0000 UTC][G] 'vault.7f320e57f9fe.runtime.total_gc_runs': 232.000
[2017-12-19 20:38:00 +0000 UTC][S] 'vault.route.rollback.consul-': Count: 1 Sum: 0.013 LastUpdated: 2017-12-19 20:38:01.968471579 +0000 UTC m=+10465.811842067
[2017-12-19 20:38:00 +0000 UTC][S] 'vault.rollback.attempt.consul-': Count: 1 Sum: 0.073 LastUpdated: 2017-12-19 20:38:01.968502743 +0000 UTC m=+10465.811873131
[2017-12-19 20:38:00 +0000 UTC][S] 'vault.rollback.attempt.pki-': Count: 1 Sum: 0.070 LastUpdated: 2017-12-19 20:38:01.96867005 +0000 UTC m=+10465.812041936
[2017-12-19 20:38:00 +0000 UTC][S] 'vault.route.rollback.auth-app-id-': Count: 1 Sum: 0.012 LastUpdated: 2017-12-19 20:38:01.969146401 +0000 UTC m=+10465.812516689
[2017-12-19 20:38:00 +0000 UTC][S] 'vault.rollback.attempt.identity-': Count: 1 Sum: 0.063 LastUpdated: 2017-12-19 20:38:01.968029888 +0000 UTC m=+10465.811400276
[2017-12-19 20:38:00 +0000 UTC][S] 'vault.rollback.attempt.database-': Count: 1 Sum: 0.066 LastUpdated: 2017-12-19 20:38:01.969394215 +0000 UTC m=+10465.812764603
[2017-12-19 20:38:00 +0000 UTC][S] 'vault.barrier.get': Count: 16 Min: 0.010 Mean: 0.015 Max: 0.031 Stddev: 0.005 Sum: 0.237 LastUpdated: 2017-12-19 20:38:01.983268118 +0000 UTC m=+10465.826637008
[2017-12-19 20:38:00 +0000 UTC][S] 'vault.merkle.flushDirty': Count: 100 Min: 0.006 Mean: 0.024 Max: 0.098 Stddev: 0.019 Sum: 2.386 LastUpdated: 2017-12-19 20:38:09.848158309 +0000 UTC m=+10473.691527099

You'll note that log entries are prefixed with the metric type as follows:

[C] is a counter. Counters are cumulative metrics that are incremented when some event occurs, and are reset at the end of reporting intervals. Vault retains counters and other metrics for one minute in-memory, so to see accurate and persistent counters over time an aggregation solution must be configured.
[G] is a gauge. Gauges provide measurements of current values.
[S] is a summary. Summaries provide sample observations of values. Vault commonly uses summaries for measuring timing duration of discrete events in the reporting interval.
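
The line layout shown in the dump above can be parsed with a small script. The following is a sketch only: it assumes every line follows the shape visible in the example (interval timestamp, type prefix, quoted metric name, then either a bare value or `Key: number` pairs), and the names `Sample` and `parse_line` are hypothetical.

```python
import re
from typing import Dict, NamedTuple

# [interval][type] 'metric.name': payload
LINE_RE = re.compile(
    r"^\[(?P<interval>[^\]]+)\]\[(?P<kind>[CGS])\] '(?P<name>[^']+)': (?P<body>.*)$"
)
# "Key: number" pairs such as "Count: 100" or "Mean: 0.027". The lookahead
# skips "LastUpdated: 2017-12-19 ...", whose value is not a plain number.
FIELD_RE = re.compile(r"(\w+): (-?\d+(?:\.\d+)?)(?=\s|$)")

KINDS = {"C": "counter", "G": "gauge", "S": "summary"}


class Sample(NamedTuple):
    interval: str
    kind: str
    name: str
    fields: Dict[str, float]


def parse_line(line: str) -> Sample:
    """Parse one line of the telemetry dump into a Sample."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized telemetry line: {line!r}")
    body = match["body"]
    try:
        # Gauge lines in the example carry a single bare number.
        fields = {"Value": float(body)}
    except ValueError:
        # Summary lines in the example carry "Key: number" pairs.
        fields = {key: float(value) for key, value in FIELD_RE.findall(body)}
    return Sample(match["interval"], KINDS[match["kind"]], match["name"], fields)
```

Applied to the first gauge line of the example dump, `parse_line` yields a `Sample` whose `kind` is `"gauge"` and whose `fields` hold the single entry `Value: 5100.0`.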

The following sections describe available Vault metrics. The metrics interval can be assumed to be 10 seconds when manually triggering metrics output using the signals described above. Some high-cardinality gauges, like vault.secret.kv.count, are emitted every 10 minutes, or at an interval configured in the telemetry stanza.

Some Vault metrics come with additional labels describing the measurement in more detail, such as the namespace in which an operation takes place, or the auth method used to create a token. In the in-memory telemetry, or other telemetry engines that do not support labels, this additional information is incorporated into the metric name. The metric name in the table below is followed by a list of labels supported, in the order in which they appear if flattened.

Audit Metrics

These metrics relate to auditing.

Metric	Description	Unit	Type
`vault.audit.log_request`	Duration of time taken by all audit log requests across all audit log devices	ms	summary
`vault.audit.log_response`	Duration of time taken by audit log responses across all audit log devices	ms	summary
`vault.audit.log_request_failure`	Number of audit log request failures. NOTE: This is a particularly important metric. Any non-zero value here indicates that there was a failure to make an audit log request to any of the configured audit log devices; when Vault cannot log to any of the configured audit log devices it ceases all user operations, and you should begin troubleshooting the audit log devices immediately if this metric continually increases.	failures	counter
`vault.audit.log_response_failure`	Number of audit log response failures. NOTE: This is a particularly important metric. Any non-zero value here indicates that there was a failure to receive a response to a request made to one of the configured audit log devices; when Vault cannot log to any of the configured audit log devices it ceases all user operations, and you should begin troubleshooting the audit log devices immediately if this metric continually increases.	failures	counter

NOTE: In addition, there are audit metrics for each enabled audit device, represented as vault.audit.<type>.log_request and vault.audit.<type>.log_response. For example, if a file audit device is enabled, its metrics would be vault.audit.file.log_request and vault.audit.file.log_response.
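
The naming rule and the failure guidance above could be checked along these lines. This is a sketch under stated assumptions: the helper names are hypothetical, the per-interval failure counts are assumed to come from an aggregation system, and the threshold of three consecutive intervals is an arbitrary illustration of "continually increases", not a Vault recommendation.

```python
from typing import List, Sequence


def device_metric_names(device_type: str) -> List[str]:
    """Build the per-device audit metric names for one audit device type."""
    return [
        f"vault.audit.{device_type}.{suffix}"
        for suffix in ("log_request", "log_response")
    ]


def sustained_failures(per_interval_counts: Sequence[float], min_intervals: int = 3) -> bool:
    """Return True when the most recent intervals all reported audit failures.

    per_interval_counts holds one failure count per reporting interval,
    oldest first; min_intervals is an assumed threshold.
    """
    if min_intervals <= 0:
        raise ValueError("min_intervals must be positive")
    recent = per_interval_counts[-min_intervals:]
    return len(recent) == min_intervals and all(count > 0 for count in recent)
```

For a file audit device, `device_metric_names("file")` produces the two names given in the note above.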

Core Metrics

These metrics represent operational aspects of the running Vault instance.

Metric	Description	Unit	Type
`vault.barrier.delete`	Duration of time taken by DELETE operations at the barrier	ms	summary
`vault.barrier.get`	Duration of time taken by GET operations at the barrier	ms	summary
`vault.barrier.put`	Duration of time taken by PUT operations at the barrier	ms	summary
`vault.barrier.list`	Duration of time taken by LIST operations at the barrier	ms	summary
`vault.cache.hit`	Number of times a value was retrieved from the LRU cache.	cache hit	counter
`vault.cache.miss`	Number of times a value was not in the LRU cache. This results in a read from the configured storage.	cache miss	counter
`vault.cache.write`	Number of times a value was written to the LRU cache.	cache write	counter
`vault.cache.delete`	Number of times a value was deleted from the LRU cache. This does not count cache expirations.	cache delete	counter
`vault.core.active`	Has value 1 when the vault node is active, and 0 when node is in standby.	bool	gauge
`vault.core.activity.fragment_size`	Number of entities or tokens (depending on the "type" label) observed by the local node.	tokens	counter
`vault.core.activity.segment_write`	Duration of time taken writing activity log segments to storage.	ms	summary
`vault.core.check_token`	Duration of time taken by token checks handled by Vault core	ms	summary
`vault.core.fetch_acl_and_token`	Duration of time taken by ACL and corresponding token entry fetches handled by Vault core	ms	summary
`vault.core.handle_request`	Duration of time taken by non-login requests handled by Vault core	ms	summary
`vault.core.handle_login_request`	Duration of time taken by login requests handled by Vault core	ms	summary
`vault.core.in_flight_requests`	Number of in-flight requests.	requests	gauge
`vault.core.leadership_setup_failed`	Duration of time taken by cluster leadership setup failures which have occurred in a highly available Vault cluster. This should be monitored and alerted on for overall cluster leadership status.	ms	summary
`vault.core.leadership_lost`	Duration of time taken by cluster leadership losses which have occurred in a highly available Vault cluster. This should be monitored and alerted on for overall cluster leadership status.	ms	summary
`vault.core.license.expiration_time_epoch`	Time as epoch (seconds since Jan 1 1970) at which license will expire.	seconds	gauge
`vault.core.mount_table.num_entries`	Number of mounts in a particular mount table. This metric is labeled by table type (auth or logical) and whether or not the table is replicated (local or not)	objects	gauge
`vault.core.mount_table.size`	Size of a particular mount table. This metric is labeled by table type (auth or logical) and whether or not the table is replicated (local or not)	bytes	gauge
`vault.core.post_unseal`	Duration of time taken by post-unseal operations handled by Vault core	ms	summary
`vault.core.pre_seal`	Duration of time taken by pre-seal operations	ms	summary
`vault.core.seal-with-request`	Duration of time taken by requested seal operations	ms	summary
`vault.core.seal`	Duration of time taken by seal operations	ms	summary
`vault.core.seal-internal`	Duration of time taken by internal seal operations	ms	summary
`vault.core.step_down`	Duration of time taken by cluster leadership step downs. This should be monitored and alerted on for overall cluster leadership status.	ms	summary
`vault.core.unseal`	Duration of time taken by unseal operations	ms	summary
`vault.core.unsealed`	Has value 1 when Vault is unsealed, and 0 when Vault is sealed.	bool	gauge
`vault.metrics.collection` (cluster,gauge)	Time taken to collect usage gauges, labeled by gauge type.		summary
`vault.metrics.collection.interval` (cluster,gauge)	Current value of usage gauge collection interval.		summary
`vault.metrics.collection.error` (cluster,gauge)	Errors while collecting usage gauges, labeled by gauge type.		counter
`vault.rollback.attempt.<mountpoint>`	Time taken to perform a rollback operation on the given mount point. The mount point name has its forward slashes `/` replaced by `-`. For example, a rollback operation on the `auth/token` backend would be reported as `vault.rollback.attempt.auth-token-`.	ms	summary
`vault.route.create.<mountpoint>`	Time taken to dispatch a create operation to a backend, and for that backend to process it. The mount point name has its forward slashes `/` replaced by `-`. For example, a create operation to `ns1/secret/` would have corresponding metric `vault.route.create.ns1-secret-`. The number of samples of this metric, and the corresponding ones for other operations below, indicates how many operations were performed per mount point.	ms	summary
`vault.route.delete.<mountpoint>`	Time taken to dispatch a delete operation to a backend, and for that backend to process it.	ms	summary
`vault.route.list.<mountpoint>`	Time taken to dispatch a list operation to a backend, and for that backend to process it.	ms	summary
`vault.route.read.<mountpoint>`	Time taken to dispatch a read operation to a backend, and for that backend to process it.	ms	summary
`vault.route.rollback.<mountpoint>`	Time taken to dispatch a rollback operation to a backend, and for that backend to process it. Rollback operations are automatically scheduled to clean up partial errors.	ms	summary
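
The mount point rule used by the `vault.rollback.attempt` and `vault.route` metrics above can be sketched as a small helper. The function name `mount_metric_name` is hypothetical, and appending a trailing slash to a path that lacks one is an assumption inferred from the `auth/token` example in the table.

```python
def mount_metric_name(prefix: str, mount_path: str) -> str:
    """Return the flattened metric name for a mount point.

    Forward slashes in the mount path are replaced by "-", so the
    trailing slash of the mount path becomes a trailing "-".
    """
    # Assumption: a path given without its trailing slash is normalized first.
    path = mount_path if mount_path.endswith("/") else mount_path + "/"
    return f"{prefix}.{path.replace('/', '-')}"
```

With this helper, `mount_metric_name("vault.route.create", "ns1/secret/")` gives `vault.route.create.ns1-secret-`, matching the example in the table.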

Runtime Metrics

These metrics collect information from Vault's Go runtime, such as memory usage information.

Metric	Description	Unit	Type
`vault.runtime.alloc_bytes`	Number of bytes allocated by the Vault process. This could burst from time to time, but should return to a steady state value.	bytes	gauge
`vault.runtime.free_count`	Number of freed objects	objects	gauge
`vault.runtime.heap_objects`	Number of objects on the heap. This is a good general memory pressure indicator worth establishing a baseline and thresholds for alerting.	objects	gauge
`vault.runtime.malloc_count`	Cumulative count of allocated heap objects	objects	gauge
`vault.runtime.num_goroutines`	Number of goroutines. This serves as a general system load indicator worth establishing a baseline and thresholds for alerting.	goroutines	gauge
`vault.runtime.sys_bytes`	Number of bytes allocated to Vault. This includes what is being used by Vault's heap and what has been reclaimed but not given back to the operating system.	bytes	gauge
`vault.runtime.total_gc_pause_ns`	The total garbage collector pause time since Vault was last started	ns	gauge
`vault.runtime.gc_pause_ns`	Total duration of the last garbage collection run	ns	summary
`vault.runtime.total_gc_runs`	Total number of garbage collection runs since Vault was last started	operations	gauge
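
Two of the runtime gauges above can be combined into a derived value. The following sketch computes an average garbage collection pause from `vault.runtime.total_gc_pause_ns` and `vault.runtime.total_gc_runs`; the helper name is hypothetical and the derived figure is an illustration, not a metric that Vault itself emits.

```python
def mean_gc_pause_ms(total_gc_pause_ns: float, total_gc_runs: float) -> float:
    """Average GC pause in milliseconds since Vault was last started."""
    if total_gc_runs <= 0:
        # No collections recorded yet, so there is nothing to average.
        return 0.0
    return total_gc_pause_ns / total_gc_runs / 1_000_000


# Values taken from the example telemetry dump earlier on this page:
# 150293024 ns of total pause over 232 runs is about 0.648 ms per run.
example = mean_gc_pause_ms(150293024.0, 232.0)
```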

Policy Metrics

These metrics report measurements of the time spent performing policy operations.

Metric	Description	Unit	Type
`vault.policy.get_policy`	Time taken to get a policy	ms	summary
`vault.policy.list_policies`	Time taken to list policies	ms	summary
`vault.policy.delete_policy`	Time taken to delete a policy	ms	summary
`vault.policy.set_policy`	Time taken to set a policy	ms	summary

Token, Identity, and Lease Metrics

These metrics cover measurement of token, identity, and lease operations, and counts of the number of such objects managed by Vault.

Metric	Description	Unit	Type
`vault.expire.fetch-lease-times`	Time taken to fetch lease times	ms	summary
`vault.expire.fetch-lease-times-by-token`	Time taken to fetch lease times by token	ms	summary
`vault.expire.num_leases`	Number of all leases which are eligible for eventual expiry	leases	gauge
`vault.expire.num_irrevocable_leases`	Number of leases that cannot be revoked automatically	leases	gauge
`vault.expire.leases.by_expiration` (cluster,gauge,expiring,namespace)	Number of leases set to expire, grouped by a time interval. This time interval and total number of time intervals are configurable via `lease_metrics_epsilon` and `num_lease_metrics_buckets` in the telemetry stanza of a Vault server configuration. The default values for these are `1hr` and `168` respectively, so the metric will report the number of leases that will expire each hour from the current time to a week from the current time. One can additionally group lease expiration by namespace by setting `add_lease_metrics_namespace_labels` to `true` in the config file (default is `false`).	leases	gauge
`vault.expire.job_manager.total_jobs`	Total pending revocation jobs	leases	summary
`vault.expire.job_manager.queue_length`	Total pending revocation jobs by auth method	leases	summary
`vault.expire.lease_expiration`	Count of lease expirations	leases	counter
`vault.expire.lease_expiration.time_in_queue`	Time taken for lease to get to the front of the revoke queue	ms	summary
`vault.expire.lease_expiration.error`	Count of lease expiration errors	errors	counter
`vault.expire.revoke`	Time taken to revoke a token	ms	summary
`vault.expire.revoke-force`	Time taken to forcibly revoke a token	ms	summary
`vault.expire.revoke-prefix`	Time taken to revoke tokens on a prefix	ms	summary
`vault.expire.revoke-by-token`	Time taken to revoke all secrets issued with a given token	ms	summary
`vault.expire.renew`	Time taken to renew a lease	ms	summary
`vault.expire.renew-token`	Time taken to renew a token which does not need to invoke a logical backend	ms	summary
`vault.expire.register`	Time taken for register operations	ms	summary
`vault.expire.register-auth`	Time taken for register authentication operations which create lease entries without lease ID	ms	summary
`vault.identity.num_entities`	Number of identity entities stored in Vault	entities	gauge
`vault.identity.entity.active.monthly` (cluster, namespace)	Number of distinct entities that created a token during the past month, per namespace. Only available if client count is enabled. Reported at the start of each month.	entities	gauge
`vault.identity.entity.active.partial_month` (cluster)	Total number of distinct entities that created a token during the current month. Only available if client count is enabled. Reported periodically within each month.	entities	gauge
`vault.identity.entity.active.reporting_period` (cluster, namespace)	Number of distinct entities that created a token in the past N months, as defined by the client count default reporting period. Only available if client count is enabled. Reported at the start of each month.	entities	gauge
`vault.identity.entity.alias.count` (cluster, namespace, auth_method, mount_point)	Number of identity entity aliases stored in Vault, grouped by the auth mount that created them. This gauge is computed every 10 minutes.	aliases	gauge
`vault.identity.entity.count` (cluster, namespace)	Number of identity entities stored in Vault, grouped by namespace.	entities	gauge
`vault.identity.entity.creation` (cluster, namespace, auth_method, mount_point)	Number of identity entities created, grouped by the auth mount that created them.	entities	counter
`vault.identity.upsert_entity_txn`	Time taken to insert a new or modified entity into the in-memory database, and persist it to storage.	ms	summary
`vault.identity.upsert_group_txn`	Time taken to insert a new or modified group into the in-memory database, and persist it to storage. This operation is performed on group membership changes.	ms	summary
`vault.token.count` (cluster, namespace)	Number of service tokens available for use; counts all un-expired and un-revoked tokens in Vault's token store. This measurement is performed every 10 minutes.	tokens	gauge
`vault.token.count.by_auth` (cluster, namespace, auth_method)	Number of service tokens that were created by a particular auth method.	tokens	gauge
`vault.token.count.by_policy` (cluster, namespace, policy)	Number of service tokens that have a particular policy attached. If a token has more than one policy, it is counted in each policy gauge.	tokens	gauge
`vault.token.count.by_ttl` (cluster, namespace, creation_ttl)	Number of service tokens, grouped by the TTL range they were assigned at creation.	tokens	gauge
`vault.token.create`	The time taken to create a token	ms	summary
`vault.token.create_root`	Number of created root tokens. Does not decrease on revocation.	tokens	counter
`vault.token.createAccessor`	The time taken to create a token accessor	ms	summary
`vault.token.creation` (cluster, namespace, auth_method, mount_point, creation_ttl, token_type)	Number of service or batch tokens created.	tokens	counter
`vault.token.lookup`	The time taken to look up a token	ms	summary
`vault.token.revoke`	Time taken to revoke a token	ms	summary
`vault.token.revoke-tree`	Time taken to revoke a token tree	ms	summary
`vault.token.store`	Time taken to store an updated token entry without writing to the secondary index	ms	summary
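
The time window covered by `vault.expire.leases.by_expiration` follows from the two settings described in the table above: the interval length multiplied by the number of intervals. A minimal sketch of that arithmetic, with a hypothetical helper name and the documented defaults as default arguments:

```python
from datetime import timedelta


def lease_metrics_window(
    epsilon: timedelta = timedelta(hours=1),
    num_buckets: int = 168,
) -> timedelta:
    """Total look-ahead window covered by the lease expiration buckets.

    epsilon corresponds to lease_metrics_epsilon and num_buckets to
    num_lease_metrics_buckets in the telemetry stanza.
    """
    return epsilon * num_buckets
```

With the defaults, 168 one-hour buckets cover 7 days, which is the "week from the current time" mentioned in the metric description.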

Resource Quota Metrics

These metrics relate to rate limit and lease count quotas. Each metric comes with a label "name" identifying the specific quota.

Metric	Description	Unit	Type
`vault.quota.rate_limit.violation`	Total number of rate limit quota violations	quota	counter
`vault.quota.lease_count.violation`	Total number of lease count quota violations	quota	counter
`vault.quota.lease_count.max`	Total maximum amount of leases allowed by the lease count quota	lease	gauge
`vault.quota.lease_count.counter`	Total current amount of leases generated by the lease count quota	lease	gauge
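
The two lease count gauges above can be combined to see how close a quota is to its limit. This is only a sketch: the helper name is hypothetical and the ratio is a derived value, not a metric emitted by Vault.

```python
def lease_quota_utilization(counter: float, maximum: float) -> float:
    """Fraction of a lease count quota currently in use.

    counter is the value of vault.quota.lease_count.counter and maximum
    the value of vault.quota.lease_count.max for the same quota name.
    """
    if maximum <= 0:
        raise ValueError("maximum must be positive")
    return counter / maximum
```

A result approaching 1.0 would suggest that `vault.quota.lease_count.violation` may start increasing for that quota.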

Merkle Tree and Write Ahead Log Metrics

These metrics relate to internal operations on Merkle Trees and Write Ahead Logs (WAL).

Metric	Description	Unit	Type
`vault.merkle.flushDirty`	Time taken to flush any dirty pages to cold storage	ms	summary
`vault.merkle.flushDirty.num_pages`	Number of pages flushed	pages	gauge
`vault.merkle.saveCheckpoint`	Time taken to save the checkpoint	ms	summary
`vault.merkle.saveCheckpoint.num_dirty`	Number of dirty pages at checkpoint	pages	gauge
`vault.wal.deleteWALs`	Time taken to delete a Write Ahead Log (WAL)	ms	summary
`vault.wal.gc.deleted`	Number of Write Ahead Logs (WAL) deleted during each garbage collection run	WAL	gauge
`vault.wal.gc.total`	Total number of Write Ahead Logs (WAL) on disk	WAL	gauge
`vault.wal.loadWAL`	Time taken to load a Write Ahead Log (WAL)	ms	summary
`vault.wal.persistWALs`	Time taken to persist a Write Ahead Log (WAL)	ms	summary
`vault.wal.flushReady`	Time taken to flush a ready Write Ahead Log (WAL) to storage	ms	summary
`vault.wal.flushReady.queue_len`	Size of the write queue in the WAL system	WAL	summary

HA Metrics

These metrics are emitted on standbys when talking to the active node, and in some cases by performance standbys as well.

Metric	Description	Unit	Type
`vault.ha.rpc.client.forward`	Time taken to forward a request from a standby to the active node	ms	summary
`vault.ha.rpc.client.forward.errors`	Number of standby request forwarding failures	errors	counter

Replication Metrics

These metrics relate to Vault Enterprise Replication. The following metrics are not available in telemetry unless replication is in an unhealthy state: vault.replication.fetchRemoteKeys, vault.replication.merkleDiff, and vault.replication.merkleSync.

Metric	Description	Unit	Type
`vault.core.replication.performance.primary`	Set to 1 if this is a performance primary, 0 if not	boolean	gauge
`vault.core.replication.performance.secondary`	Set to 1 if this is a performance secondary, 0 if not	boolean	gauge
`vault.core.replication.dr.primary`	Set to 1 if this is a DR primary, 0 if not	boolean	gauge
`vault.core.replication.dr.secondary`	Set to 1 if this is a DR secondary, 0 if not	boolean	gauge
`vault.core.performance_standby`	Set to 1 if this is a performance standby, 0 if not	boolean	gauge
`vault.logshipper.streamWALs.missing_guard`	Number of instances in which the starting Merkle Tree index used to begin streaming WAL entries is not matched/found	missing guards	counter
`vault.logshipper.streamWALs.guard_found`	Number of instances in which the starting Merkle Tree index used to begin streaming WAL entries is matched/found	found guards	counter
`vault.logshipper.streamWALs.scanned_entries`	Number of entries scanned in the buffer before the right one was found.	scanned entries	summary
`vault.logshipper.buffer.length`	Current length of the log shipper buffer	buffer entries	gauge
`vault.logshipper.buffer.size`	Current size in bytes of the log shipper buffer	bytes	gauge
`vault.logshipper.buffer.max_length`	Maximum length of the log shipper buffer	buffer entries	gauge
`vault.logshipper.buffer.max_size`	Maximum size in bytes of the log shipper buffer	bytes	gauge
`vault.replication.fetchRemoteKeys`	Time taken to fetch keys from a remote cluster participating in replication prior to Merkle Tree based delta generation	ms	summary
`vault.replication.merkleDiff`	Time taken to perform a Merkle Tree based delta generation between the clusters participating in replication	ms	summary
`vault.replication.merkleSync`	Time taken to perform a Merkle Tree based synchronization using the last delta generated between the clusters participating in replication	ms	summary
`vault.replication.merkle.commit_index`	The last committed index in the Merkle Tree.	sequence number	gauge
`vault.replication.wal.last_wal`	The index of the last WAL	sequence number	gauge
`vault.replication.wal.last_dr_wal`	The index of the last DR WAL	sequence number	gauge
`vault.replication.wal.last_performance_wal`	The index of the last Performance WAL	sequence number	gauge
`vault.replication.fsm.last_remote_wal`	The index of the last remote WAL	sequence number	gauge
`vault.replication.wal.gc`	Time taken to complete one run of the WAL garbage collection process	ms	summary
`vault.replication.rpc.server.auth_request`	Duration of time taken by auth request	ms	summary
`vault.replication.rpc.server.bootstrap_request`	Duration of time taken by bootstrap request	ms	summary
`vault.replication.rpc.server.conflicting_pages_request`	Duration of time taken by conflicting pages request	ms	summary
`vault.replication.rpc.server.echo`	Duration of time taken by echo	ms	summary
`vault.replication.rpc.server.save_mfa_response_auth`	Duration of time taken by saving MFA auth response	ms	summary
`vault.replication.rpc.server.forwarding_request`	Duration of time taken by forwarding request	ms	summary
`vault.replication.rpc.server.guard_hash_request`	Duration of time taken by guard hash request	ms	summary
`vault.replication.rpc.server.persist_alias_request`	Duration of time taken by persist alias request	ms	summary
`vault.replication.rpc.server.persist_persona_request`	Duration of time taken by persist persona request	ms	summary
`vault.replication.rpc.server.stream_wals_request`	Duration of time taken by stream wals request	ms	summary
`vault.replication.rpc.server.sub_page_hashes_request`	Duration of time taken by sub page hashes request	ms	summary
`vault.replication.rpc.server.sync_counter_request`	Duration of time taken by sync counter request	ms	summary
`vault.replication.rpc.server.upsert_group_request`	Duration of time taken by upsert group request	ms	summary
`vault.replication.rpc.client.conflicting_pages`	Duration of time taken by client conflicting pages request	ms	summary
`vault.replication.rpc.client.fetch_keys`	Duration of time taken by client fetch keys request	ms	summary
`vault.replication.rpc.client.forward`	Duration of time taken by client forward request	ms	summary
`vault.replication.rpc.client.guard_hash`	Duration of time taken by client guard hash request	ms	summary
`vault.replication.rpc.client.persist_alias`	Duration of time taken by client persist alias request	ms	summary
`vault.replication.rpc.client.register_auth`	Duration of time taken by client register auth request	ms	summary
`vault.replication.rpc.client.register_lease`	Duration of time taken by client register lease request	ms	summary
`vault.replication.rpc.client.stream_wals`	Duration of time taken by client stream wals request	ms	summary
`vault.replication.rpc.client.sub_page_hashes`	Duration of time taken by client sub page hashes request	ms	summary
`vault.replication.rpc.client.sync_counter`	Duration of time taken by client sync counter request	ms	summary
`vault.replication.rpc.client.upsert_group`	Duration of time taken by client upsert group request	ms	summary
`vault.replication.rpc.client.wrap_in_cubbyhole`	Duration of time taken by client wrap in cubbyhole request	ms	summary
`vault.replication.rpc.client.save_mfa_response_auth`	Duration of time taken by client saving MFA auth response	ms	summary
`vault.replication.rpc.dr.server.echo`	Duration of time taken by DR echo request	ms	summary
`vault.replication.rpc.dr.server.fetch_keys_request`	Duration of time taken by DR fetch keys request	ms	summary
`vault.replication.rpc.standby.server.echo`	Duration of time taken by standby echo request	ms	summary
`vault.replication.rpc.standby.server.register_auth_request`	Duration of time taken by standby register auth request	ms	summary
`vault.replication.rpc.standby.server.register_lease_request`	Duration of time taken by standby register lease request	ms	summary
`vault.replication.rpc.standby.server.wrap_token_request`	Duration of time taken by standby wrap token request	ms	summary

Secrets Engines Metrics

These metrics relate to the supported secrets engines.

Metric	Description	Unit	Type
`database.Initialize`	Time taken to initialize a database secret engine across all database secrets engines	ms	summary
`database.<name>.Initialize`	Time taken to initialize a database secret engine for the named database secrets engine `<name>`, for example: `database.postgresql-prod.Initialize`	ms	summary
`database.Initialize.error`	Number of database secrets engine initialization operation errors across all database secrets engines	errors	counter
`database.<name>.Initialize.error`	Number of database secrets engine initialization operation errors for the named database secrets engine `<name>`, for example: `database.postgresql-prod.Initialize.error`	errors	counter
`database.Close`	Time taken to close a database secret engine across all database secrets engines	ms	summary
`database.<name>.Close`	Time taken to close a database secret engine for the named database secrets engine `<name>`, for example: `database.postgresql-prod.Close`	ms	summary
`database.Close.error`	Number of database secrets engine close operation errors across all database secrets engines	errors	counter
`database.<name>.Close.error`	Number of database secrets engine close operation errors for the named database secrets engine `<name>`, for example: `database.postgresql-prod.Close.error`	errors	counter
`database.CreateUser`	Time taken to create a user across all database secrets engines	ms	summary
`database.<name>.CreateUser`	Time taken to create a user for the named database secrets engine `<name>`, for example: `database.postgresql-prod.CreateUser`	ms	summary
`database.CreateUser.error`	Number of user creation operation errors across all database secrets engines	errors	counter
`database.<name>.CreateUser.error`	Number of user creation operation errors for the named database secrets engine `<name>`, for example: `database.postgresql-prod.CreateUser.error`	errors	counter
`database.RenewUser`	Time taken to renew a user across all database secrets engines	ms	summary
`database.<name>.RenewUser`	Time taken to renew a user for the named database secrets engine `<name>`, for example: `database.postgresql-prod.RenewUser`	ms	summary
`database.RenewUser.error`	Number of user renewal operation errors across all database secrets engines	errors	counter
`database.<name>.RenewUser.error`	Number of user renewal operation errors for the named database secrets engine `<name>`, for example: `database.postgresql-prod.RenewUser.error`	errors	counter
`database.RevokeUser`	Time taken to revoke a user across all database secrets engines	ms	summary
`database.<name>.RevokeUser`	Time taken to revoke a user for the named database secrets engine `<name>`, for example: `database.postgresql-prod.RevokeUser`	ms	summary
`database.RevokeUser.error`	Number of user revocation operation errors across all database secrets engines	errors	counter
`database.<name>.RevokeUser.error`	Number of user revocation operation errors for the named database secrets engine `<name>`, for example: `database.postgresql-prod.RevokeUser.error`	errors	counter
`secrets.pki.tidy.cert_store_current_entry`	The index of the current entry in the certificate store being verified by the tidy operation	entry index	gauge
`secrets.pki.tidy.cert_store_deleted_count`	Number of entries deleted from the certificate store	entry	counter
`secrets.pki.tidy.cert_store_total_entries`	Number of entries in the certificate store to verify during the tidy operation	entry	gauge
`secrets.pki.tidy.duration`	Duration of time taken by the PKI tidy operation	ms	summary
`secrets.pki.tidy.failure`	Number of times the PKI tidy operation has not completed due to errors	operations	counter
`secrets.pki.tidy.revoked_cert_current_entry`	The index of the current revoked certificate entry in the certificate store being verified by the tidy operation	entry index	gauge
`secrets.pki.tidy.revoked_cert_deleted_count`	Number of entries deleted from the certificate store for revoked certificates	entry	counter
`secrets.pki.tidy.revoked_cert_total_entries`	Number of entries in the certificate store for revoked certificates to verify during the tidy operation	entry	gauge
`secrets.pki.tidy.start_time_epoch`	Start time (as seconds since Jan 1 1970) when the PKI tidy operation is active, 0 otherwise	seconds	gauge
`secrets.pki.tidy.success`	Number of times the PKI tidy operation has completed successfully	operations	counter
`vault.secret.kv.count` (cluster, namespace, mount_point)	Number of entries in each key-value secret engine.	paths	gauge
`vault.secret.lease.creation` (cluster, namespace, secret_engine, mount_point, creation_ttl)	Counts the number of leases created by secret engines.	leases	counter
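
As a rough illustration of how the PKI tidy gauges above might be combined on the monitoring side, the following Python sketch derives a progress fraction and an elapsed time. The function names are hypothetical, and the sketch assumes that `secrets.pki.tidy.cert_store_current_entry` reflects how many entries have been visited so far; Vault does not provide these helpers.

```python
from typing import Optional


def tidy_progress(current_entry: float, total_entries: float) -> Optional[float]:
    """Fraction of the certificate store visited so far, or None if unknown.

    Assumption: current_entry approximates the number of entries already
    visited (secrets.pki.tidy.cert_store_current_entry) and total_entries is
    secrets.pki.tidy.cert_store_total_entries.
    """
    if total_entries <= 0:
        return None  # nothing to verify, or tidy is not running
    return min(1.0, current_entry / total_entries)


def tidy_elapsed_seconds(start_time_epoch: float, now_epoch: float) -> Optional[float]:
    """Seconds since tidy started, or None when the gauge reads 0 (inactive)."""
    if start_time_epoch == 0:
        return None
    return max(0.0, now_epoch - start_time_epoch)
```

A dashboard could, for example, plot `tidy_progress` only while `tidy_elapsed_seconds` is not `None`.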

Storage Backend Metrics

These metrics relate to the supported storage backends.

Metric	Description	Unit	Type
`vault.azure.put`	Duration of a PUT operation against the Azure storage backend	ms	summary
`vault.azure.get`	Duration of a GET operation against the Azure storage backend	ms	summary
`vault.azure.delete`	Duration of a DELETE operation against the Azure storage backend	ms	summary
`vault.azure.list`	Duration of a LIST operation against the Azure storage backend	ms	summary
`vault.cassandra.put`	Duration of a PUT operation against the Cassandra storage backend	ms	summary
`vault.cassandra.get`	Duration of a GET operation against the Cassandra storage backend	ms	summary
`vault.cassandra.delete`	Duration of a DELETE operation against the Cassandra storage backend	ms	summary
`vault.cassandra.list`	Duration of a LIST operation against the Cassandra storage backend	ms	summary
`vault.cockroachdb.put`	Duration of a PUT operation against the CockroachDB storage backend	ms	summary
`vault.cockroachdb.get`	Duration of a GET operation against the CockroachDB storage backend	ms	summary
`vault.cockroachdb.delete`	Duration of a DELETE operation against the CockroachDB storage backend	ms	summary
`vault.cockroachdb.list`	Duration of a LIST operation against the CockroachDB storage backend	ms	summary
`vault.consul.put`	Duration of a PUT operation against the Consul storage backend	ms	summary
`vault.consul.transaction`	Duration of a Txn operation against the Consul storage backend	ms	summary
`vault.consul.get`	Duration of a GET operation against the Consul storage backend	ms	summary
`vault.consul.delete`	Duration of a DELETE operation against the Consul storage backend	ms	summary
`vault.consul.list`	Duration of a LIST operation against the Consul storage backend	ms	summary
`vault.couchdb.put`	Duration of a PUT operation against the CouchDB storage backend	ms	summary
`vault.couchdb.get`	Duration of a GET operation against the CouchDB storage backend	ms	summary
`vault.couchdb.delete`	Duration of a DELETE operation against the CouchDB storage backend	ms	summary
`vault.couchdb.list`	Duration of a LIST operation against the CouchDB storage backend	ms	summary
`vault.dynamodb.put`	Duration of a PUT operation against the DynamoDB storage backend	ms	summary
`vault.dynamodb.get`	Duration of a GET operation against the DynamoDB storage backend	ms	summary
`vault.dynamodb.delete`	Duration of a DELETE operation against the DynamoDB storage backend	ms	summary
`vault.dynamodb.list`	Duration of a LIST operation against the DynamoDB storage backend	ms	summary
`vault.etcd.put`	Duration of a PUT operation against the etcd storage backend	ms	summary
`vault.etcd.get`	Duration of a GET operation against the etcd storage backend	ms	summary
`vault.etcd.delete`	Duration of a DELETE operation against the etcd storage backend	ms	summary
`vault.etcd.list`	Duration of a LIST operation against the etcd storage backend	ms	summary
`vault.gcs.put`	Duration of a PUT operation against the Google Cloud Storage storage backend	ms	summary
`vault.gcs.get`	Duration of a GET operation against the Google Cloud Storage storage backend	ms	summary
`vault.gcs.delete`	Duration of a DELETE operation against the Google Cloud Storage storage backend	ms	summary
`vault.gcs.list`	Duration of a LIST operation against the Google Cloud Storage storage backend	ms	summary
`vault.gcs.lock.unlock`	Duration of an UNLOCK operation against the Google Cloud Storage storage backend in HA mode	ms	summary
`vault.gcs.lock.lock`	Duration of a LOCK operation against the Google Cloud Storage storage backend in HA mode	ms	summary
`vault.gcs.lock.value`	Duration of a VALUE operation against the Google Cloud Storage storage backend in HA mode	ms	summary
`vault.mssql.put`	Duration of a PUT operation against the MS-SQL storage backend	ms	summary
`vault.mssql.get`	Duration of a GET operation against the MS-SQL storage backend	ms	summary
`vault.mssql.delete`	Duration of a DELETE operation against the MS-SQL storage backend	ms	summary
`vault.mssql.list`	Duration of a LIST operation against the MS-SQL storage backend	ms	summary
`vault.mysql.put`	Duration of a PUT operation against the MySQL storage backend	ms	summary
`vault.mysql.get`	Duration of a GET operation against the MySQL storage backend	ms	summary
`vault.mysql.delete`	Duration of a DELETE operation against the MySQL storage backend	ms	summary
`vault.mysql.list`	Duration of a LIST operation against the MySQL storage backend	ms	summary
`vault.postgres.put`	Duration of a PUT operation against the PostgreSQL storage backend	ms	summary
`vault.postgres.get`	Duration of a GET operation against the PostgreSQL storage backend	ms	summary
`vault.postgres.delete`	Duration of a DELETE operation against the PostgreSQL storage backend	ms	summary
`vault.postgres.list`	Duration of a LIST operation against the PostgreSQL storage backend	ms	summary
`vault.s3.put`	Duration of a PUT operation against the Amazon S3 storage backend	ms	summary
`vault.s3.get`	Duration of a GET operation against the Amazon S3 storage backend	ms	summary
`vault.s3.delete`	Duration of a DELETE operation against the Amazon S3 storage backend	ms	summary
`vault.s3.list`	Duration of a LIST operation against the Amazon S3 storage backend	ms	summary
`vault.spanner.put`	Duration of a PUT operation against the Google Cloud Spanner storage backend	ms	summary
`vault.spanner.get`	Duration of a GET operation against the Google Cloud Spanner storage backend	ms	summary
`vault.spanner.delete`	Duration of a DELETE operation against the Google Cloud Spanner storage backend	ms	summary
`vault.spanner.list`	Duration of a LIST operation against the Google Cloud Spanner storage backend	ms	summary
`vault.spanner.lock.unlock`	Duration of an UNLOCK operation against the Google Cloud Spanner storage backend in HA mode	ms	summary
`vault.spanner.lock.lock`	Duration of a LOCK operation against the Google Cloud Spanner storage backend in HA mode	ms	summary
`vault.spanner.lock.value`	Duration of a VALUE operation against the Google Cloud Spanner storage backend in HA mode	ms	summary
`vault.swift.put`	Duration of a PUT operation against the Swift storage backend	ms	summary
`vault.swift.get`	Duration of a GET operation against the Swift storage backend	ms	summary
`vault.swift.delete`	Duration of a DELETE operation against the Swift storage backend	ms	summary
`vault.swift.list`	Duration of a LIST operation against the Swift storage backend	ms	summary
`vault.zookeeper.put`	Duration of a PUT operation against the ZooKeeper storage backend	ms	summary
`vault.zookeeper.get`	Duration of a GET operation against the ZooKeeper storage backend	ms	summary
`vault.zookeeper.delete`	Duration of a DELETE operation against the ZooKeeper storage backend	ms	summary
`vault.zookeeper.list`	Duration of a LIST operation against the ZooKeeper storage backend	ms	summary

Integrated Storage (Raft)

These metrics relate to Raft-based integrated storage.

Metric	Description	Unit	Type
`vault.raft.apply`	Number of Raft transactions occurring over the interval, which is a general indicator of the write load on the Raft servers.	raft transactions / interval	counter
`vault.raft.barrier`	Number of times the node has started the barrier, i.e., the number of times it has issued a blocking call to ensure that all pending operations that were queued have been applied to the node's FSM.	blocks / interval	counter
`vault.raft.candidate.electSelf`	Time to request a vote from a peer.	ms	summary
`vault.raft.commitNumLogs`	Number of logs processed for application to the FSM in a single batch.	logs	gauge
`vault.raft.commitTime`	Time to commit a new entry to the Raft log on the leader.	ms	timer
`vault.raft.compactLogs`	Time to trim the logs that are no longer needed.	ms	summary
`vault.raft.delete`	Time to delete a file from raft's underlying storage.	ms	summary
`vault.raft.delete_prefix`	Time to delete files under a prefix from raft's underlying storage.	ms	summary
`vault.raft.fsm.apply`	Number of logs committed since the last interval.	commit logs / interval	summary
`vault.raft.fsm.applyBatch`	Time to apply batch of logs.	ms	summary
`vault.raft.fsm.applyBatchNum`	Number of logs applied in a batch.	logs	summary
`vault.raft.fsm.enqueue`	Time to enqueue a batch of logs for the FSM to apply.	ms	timer
`vault.raft.fsm.restore`	Time taken by the FSM to restore its state from a snapshot.	ms	summary
`vault.raft.fsm.snapshot`	Time taken by the FSM to record the current state for the snapshot.	ms	summary
`vault.raft.fsm.store_config`	Time to store the configuration.	ms	summary
`vault.raft.get`	Time to retrieve a file from raft's underlying storage.	ms	summary
`vault.raft.leader.dispatchLog`	Time for the leader to write log entries to disk.	ms	timer
`vault.raft.leader.dispatchNumLogs`	Number of logs committed to disk in a batch.	logs	gauge
`vault.raft.list`	Time to retrieve a list of keys from raft's underlying storage.	ms	summary
`vault.raft.peers`	Number of peers in the raft cluster configuration.	peers	gauge
`vault.raft.put`	Time to persist a key in raft's underlying storage.	ms	summary
`vault.raft.replication.appendEntries.log`	Number of logs replicated to a node, to bring it up to speed with the leader's logs.	logs appended / interval	counter
`vault.raft.replication.appendEntries.rpc`	Time taken by the append entries RPC to replicate the log entries of a leader node onto its follower node(s).	ms	timer
`vault.raft.replication.heartbeat`	Time taken to invoke appendEntries on a peer, so that it does not time out on a periodic basis.	ms	timer
`vault.raft.replication.installSnapshot`	Time taken to process the installSnapshot RPC call. This metric should only be seen on nodes which are currently in the follower state.	ms	timer
`vault.raft.restore`	Number of times the restore operation has been performed by the node. Here, restore refers to the action of raft consuming an external snapshot to restore its state.	operation invoked / interval	counter
`vault.raft.restoreUserSnapshot`	Time taken by the node to restore the FSM state from a user's snapshot.	ms	timer
`vault.raft.rpc.appendEntries`	Time taken to process an append entries RPC call from a node.	ms	timer
`vault.raft.rpc.appendEntries.processLogs`	Time taken to process the outstanding log entries of a node.	ms	timer
`vault.raft.rpc.appendEntries.storeLogs`	Time taken to add any outstanding logs for a node, since the last appendEntries was invoked.	ms	timer
`vault.raft.rpc.installSnapshot`	Time taken to process the installSnapshot RPC call. This metric should only be seen on nodes which are currently in the follower state.	ms	timer
`vault.raft.rpc.processHeartbeat`	Time taken to process a heartbeat request.	ms	timer
`vault.raft.rpc.requestVote`	Time taken to complete requestVote RPC call.	ms	summary
`vault.raft.snapshot.create`	Time taken to initialize the snapshot process.	ms	timer
`vault.raft.snapshot.persist`	Time taken to dump the current snapshot taken by the node to the disk.	ms	timer
`vault.raft.snapshot.takeSnapshot`	Total time involved in taking the current snapshot (creating one and persisting it) by the node.	ms	timer
`vault.raft.state.follower`	Number of times the node has entered the follower mode. This happens when a new node joins the cluster or after the end of a leader election.	follower state entered / interval	counter
`vault.raft.transition.heartbeat_timeout`	Number of times the node has transitioned to the Candidate state after receiving no heartbeat messages from the last known leader.	timeouts / interval	counter
`vault.raft.transition.leader_lease_timeout`	Number of times a quorum of nodes could not be contacted.	contact failures	counter
`vault.raft.verify_leader`	Number of times the node checks whether it is still the leader.	checks / interval	counter
`vault.raft-storage.delete`	Time to insert log entry to delete path.	ms	timer
`vault.raft-storage.get`	Time to retrieve value for path from FSM.	ms	timer
`vault.raft-storage.put`	Time to insert log entry to persist path.	ms	timer
`vault.raft-storage.list`	Time to list all entries under the prefix from the FSM.	ms	timer
`vault.raft-storage.transaction`	Time to insert operations into a single log.	ms	timer
`vault.raft-storage.entry_size`	The total size of a Raft entry during log application in bytes.	bytes	summary
`vault.raft_storage.bolt.freelist.free_pages`	Number of free pages in the freelist.	pages	gauge
`vault.raft_storage.bolt.freelist.pending_pages`	Number of pending pages in the freelist.	pages	gauge
`vault.raft_storage.bolt.freelist.allocated_bytes`	Total bytes allocated in free pages.	bytes	gauge
`vault.raft_storage.bolt.freelist.used_bytes`	Total bytes used by the freelist.	bytes	gauge
`vault.raft_storage.bolt.transaction.started_read_transactions`	Number of started read transactions.	transactions	gauge
`vault.raft_storage.bolt.transaction.currently_open_read_transactions`	Number of currently open read transactions.	transactions	gauge
`vault.raft_storage.bolt.page.count`	Number of page allocations.	allocations	gauge
`vault.raft_storage.bolt.page.bytes_allocated`	Total bytes allocated.	bytes	gauge
`vault.raft_storage.bolt.cursor.count`	Number of cursors created.	cursors	gauge
`vault.raft_storage.bolt.node.count`	Number of node allocations.	nodes	gauge
`vault.raft_storage.bolt.node.dereferences`	Number of node dereferences.	dereferences	gauge
`vault.raft_storage.bolt.rebalance.count`	Number of node rebalances.	rebalances	gauge
`vault.raft_storage.bolt.rebalance.time`	Time taken rebalancing.	ms	summary
`vault.raft_storage.bolt.split.count`	Number of nodes split.	nodes	gauge
`vault.raft_storage.bolt.spill.count`	Number of nodes spilled.	nodes	gauge
`vault.raft_storage.bolt.spill.time`	Time taken spilling.	ms	summary
`vault.raft_storage.bolt.write.count`	Number of writes performed.	writes	gauge
`vault.raft_storage.bolt.write.time`	Time taken writing to disk.	ms	summary

Integrated Storage (Raft) Autopilot

Metric	Description	Unit	Type
`vault.autopilot.node.healthy`	Set to 1 if the node_id is deemed healthy by Autopilot, 0 if not	bool	gauge
`vault.autopilot.healthy`	Set to 1 if Autopilot considers all nodes healthy	bool	gauge
`vault.autopilot.failure_tolerance`	How many nodes can be lost while maintaining quorum, i.e. number of healthy nodes in excess of quorum	nodes	gauge

Since Autopilot runs only on the active node, these metrics are only emitted by the active node.
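
For intuition, the value of `vault.autopilot.failure_tolerance` can be worked out by hand from voter counts. The sketch below assumes a simple majority quorum over voting nodes and a floor of zero when the cluster is already below quorum; the helper names are illustrative and are not part of Vault.

```python
def quorum_size(total_voters: int) -> int:
    """Smallest strict majority of the voting nodes (assumed quorum rule)."""
    return total_voters // 2 + 1


def failure_tolerance(healthy_voters: int, total_voters: int) -> int:
    """Healthy voters in excess of quorum, never below zero (assumption)."""
    return max(0, healthy_voters - quorum_size(total_voters))
```

Under these assumptions, a five-node cluster with all voters healthy tolerates the loss of two nodes, while the same cluster with only three healthy voters has a tolerance of zero.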

Integrated Storage (Raft) Leadership Changes

Metric	Description	Unit	Type
`vault.raft.leader.lastContact`	Measures the time since the leader was last able to contact the follower nodes when checking its leader lease	ms	summary
`vault.raft.state.candidate`	Increments whenever raft server starts an election	Elections	counter
`vault.raft.state.leader`	Increments whenever raft server becomes a leader	Leaders	counter

Why they're important: Normally, your raft cluster should have a stable leader. If there are frequent elections or leadership changes, it would likely indicate network issues between the raft nodes, or that the raft servers themselves are unable to keep up with the load.

What to look for: For a healthy cluster, you're looking for a `vault.raft.leader.lastContact` lower than 200ms, `vault.raft.state.leader` > 0 and `vault.raft.state.candidate` == 0. Deviations from this might indicate flapping leadership.
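
The thresholds above could be turned into a simple check in a monitoring script. The following Python sketch is one possible shape for it: the `LeadershipSample` type and `leadership_warnings` function are hypothetical, and the sketch assumes the three values have already been read from your metrics aggregation software for a single interval.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LeadershipSample:
    """One interval of leadership metrics (hypothetical container)."""
    last_contact_ms: float  # e.g. mean of vault.raft.leader.lastContact
    leader_count: int       # vault.raft.state.leader
    candidate_count: int    # vault.raft.state.candidate


def leadership_warnings(sample: LeadershipSample,
                        max_last_contact_ms: float = 200.0) -> List[str]:
    """Return a warning for each guideline the sample deviates from."""
    warnings = []
    if sample.last_contact_ms >= max_last_contact_ms:
        warnings.append("lastContact is not below %.0f ms" % max_last_contact_ms)
    if sample.leader_count <= 0:
        warnings.append("state.leader is not greater than 0")
    if sample.candidate_count != 0:
        warnings.append("state.candidate is not 0")
    return warnings
```

An empty list means the sample matches the healthy-cluster guidance; any entries suggest looking for flapping leadership.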

Integrated Storage (Raft) Automated Snapshots

These metrics relate to the Enterprise feature Raft Automated Snapshots.

Metric	Description	Unit	Type
`vault.autosnapshots.total.snapshot.size`	For storage_type=local, space on disk used by saved snapshots	bytes	gauge
`vault.autosnapshots.percent.maxspace.used`	For storage_type=local, percent used of maximum allocated space	percentage	gauge
`vault.autosnapshots.save.errors`	Increments whenever an error occurs trying to save a snapshot	n/a	counter
`vault.autosnapshots.save.duration`	Measures the time taken saving a snapshot	ms	summary
`vault.autosnapshots.last.success.time`	Epoch time (seconds since 1970/01/01) of last successful snapshot save	n/a	gauge
`vault.autosnapshots.snapshot.size`	Measures the size in bytes of snapshots	bytes	summary
`vault.autosnapshots.rotate.duration`	Measures the time taken to rotate (i.e. delete) old snapshots to satisfy configured retention	ms	summary
`vault.autosnapshots.snapshots.in.storage`	Number of snapshots in storage	n/a	gauge

Metric Labels

Label	Description	Example
`auth_method`	Auth method type.	`userpass`
`cluster`	The cluster name from which the metric originated; set in the configuration file, or automatically generated when a cluster is created	`vault-cluster-d54ad07`
`creation_ttl`	Time-to-live value assigned to a token or lease at creation. This value is rounded up to the next-highest bucket; the available buckets are `1m`, `10m`, `20m`, `1h`, `2h`, `1d`, `2d`, `7d`, and `30d`. Any longer TTL is assigned the value `+Inf`.	`7d`
`mount_point`	Path at which an auth method or secret engine is mounted.	`auth/userpass/`
`namespace`	A namespace path, or `root` for the root namespace	`ns1`
`policy`	A single named policy	`default`
`secret_engine`	The secret engine type.	`aws`
`token_type`	Identifies whether the token is a batch token or a service token.	`service`
`peer_id`	Unique identifier of a raft peer.	`node-1`
`node_id`	Unique identifier of a raft peer, same as `peer_id`.	`node-1`
`snapshot_config_name`	For automated snapshots, the name of the configuration	`config1`
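
The `creation_ttl` rounding described above can be sketched as a lookup over the listed buckets. This Python example is illustrative only: the function name is hypothetical, and it assumes that a TTL exactly equal to a bucket boundary is assigned to that bucket rather than the next one.

```python
import bisect

# Bucket upper bounds in seconds, paired with the label values listed above.
_BUCKETS = [
    (60, "1m"), (600, "10m"), (1200, "20m"),
    (3600, "1h"), (7200, "2h"),
    (86400, "1d"), (172800, "2d"), (604800, "7d"), (2592000, "30d"),
]
_LIMITS = [seconds for seconds, _ in _BUCKETS]


def creation_ttl_label(ttl_seconds: float) -> str:
    """Round a TTL up to the smallest bucket that can hold it."""
    # bisect_left finds the first bucket whose bound is >= ttl_seconds.
    index = bisect.bisect_left(_LIMITS, ttl_seconds)
    if index == len(_BUCKETS):
        return "+Inf"  # longer than the largest bucket
    return _BUCKETS[index][1]
```

For example, a three-day TTL falls between the `2d` and `7d` buckets and is therefore reported as `7d`.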