Troubleshoot Vault

Troubleshooting Vault involves learning from available sources of observability and monitoring data, such as server or client error messages, audit devices, and telemetry metrics, to isolate the root cause of an issue.

Vault provides operators with a rich collection of data that helps the HashiCups team find the root cause of an issue during troubleshooting.

Diagram showing Vault observability data sources — Vault observability data includes server and client messages, audit device outputs, and telemetry metrics.

Server and client output: Oliver and Steve can use the Vault server's operational output or messages from the CLI or API to help troubleshoot use case and server issues.
Audit device output: The team can also enable audit devices to record the details of every request and response between applications and Vault. This data is also handy for troubleshooting issues with specific use cases and client applications.
Telemetry metrics: Since Steve's job is to ensure that Vault performs its best, they can enable and use telemetry metrics which measure Vault server performance in one of several popular formats. They can then use the team's aggregation solution to roll up Vault metrics and share them through dashboards and enable alerting on key metrics.

Scenario

Oliver, Steve, and Danielle are all involved in troubleshooting Vault in some form or another. Danielle uses Vault API warnings and errors to understand client issues when building applications or plugins, and sometimes needs to use audit device entries to troubleshoot tricky policy problems.

Oliver and Steve rely on Vault's server output, telemetry metrics and audit device entries to troubleshoot a range of issues with Vault use cases and issues with the servers.

Server output

Vault outputs server operational data to the operating system standard output and standard error devices, and the Linux systemd journal automatically gathers this output by default. This means the team can consume operational output from Vault in the same way they do with other systemd services.

Steve can troubleshoot Vault issues with the server output because it provides issue context, including timestamps and warning or error messages from the server. This information gives Steve more insight into the server's state during the time-frame of the incident under investigation.

Server output consists of single lines, which follow a consistent format shown and described in the following example.

2024-05-30T12:40:36.574-0400 [INFO]  events: Starting event system

Each log line has the format of: timestamp [log level] subsystem: message

In the example line, the events subsystem logs a message at the INFO log level, and the message text of the log line is "Starting event system".
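
As a minimal sketch of how the team could split these lines into fields for filtering, the following Python snippet applies a regular expression to the example line. The pattern and the parse_log_line name are illustrative assumptions and not part of Vault; lines that do not follow the format return None.

```python
import re
from typing import Optional

# Pattern for the "timestamp [log level] subsystem: message" layout described above.
LOG_LINE = re.compile(
    r"^(?P<timestamp>\S+)\s+\[(?P<level>[A-Z]+)\]\s+(?P<subsystem>[^:]+):\s*(?P<message>.*)$"
)


def parse_log_line(line: str) -> Optional[dict]:
    """Return the fields of one server log line, or None if the line does not match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


entry = parse_log_line("2024-05-30T12:40:36.574-0400 [INFO]  events: Starting event system")
print(entry)
```

With the fields separated, Steve could keep only the lines from one subsystem or above a given log level while reviewing an incident time-frame.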

Tip

Oliver can configure a production server to emit logs in different levels of detail from lowest to highest by specifying a log level. The available log levels are error, warn, info (default), debug, and trace, with the highest detail levels being most useful for troubleshooting. Similarly, Danielle can pass a -log-level flag to a development mode server while testing something they are building.

CLI and API output

Vault's CLI outputs warnings and errors to the system standard error. These messages begin with 'Warning' or 'Error', and if the terminal supports color output, warning messages appear in yellow and error messages appear in red.

Here is an error message example from the Vault CLI:

Error checking seal status: Get "https://127.0.0.1:8200/v1/sys/seal-status": dial tcp 127.0.0.1:8200: connect: connection refused

This message indicates that the CLI attempted a connection to the Vault server at https://127.0.0.1:8200, but the host refused the connection. Such an error is typical when the Vault service is not listening on the address and port of the request.

Vault's HTTP API returns JSON data containing error objects when the server encounters a problem handling the request.

Here is an example error object as part of the server's response.

{"errors":["no handler for route \"operations-secret/data/datacenter-west\". route entry not found."]}

This error is the result of the client attempting access on a path for which no handler is available.

The root cause might be a typo in the request path on the client side. In this example, the correct K/V secrets engine path begins with operations-secrets. The request is missing the pluralizing s in the path, so it fails with this error.
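
One way to spot this kind of typo is to compare the first segment of the failing path with the mount paths that exist. The sketch below is illustrative only: KNOWN_MOUNTS is a hypothetical, hand-written list (the project-newcup-secrets entry comes from later in this tutorial), and suggest_mount is a helper invented for this example, not a Vault feature.

```python
import difflib
import json

# Error body copied from the example above.
BODY = r'{"errors":["no handler for route \"operations-secret/data/datacenter-west\". route entry not found."]}'

# Hypothetical list of enabled mount paths, written by hand for this sketch.
KNOWN_MOUNTS = ["operations-secrets", "project-newcup-secrets"]


def suggest_mount(request_path, mounts):
    """Return the known mount closest to the first path segment, or None."""
    first_segment = request_path.split("/", 1)[0]
    matches = difflib.get_close_matches(first_segment, mounts, n=1)
    return matches[0] if matches else None


errors = json.loads(BODY)["errors"]
print(errors[0])
print(suggest_mount("operations-secret/data/datacenter-west", KNOWN_MOUNTS))
```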

Better together

Oliver can troubleshoot this error with Danielle, and cross-reference entries from an audit device to find Danielle's request and the corresponding Vault response for more troubleshooting context. Synthesizing data from many sources in this way often helps with issue root cause isolation and resolution.

UI messages

Vault's web UI emits warnings and error messages, which are often useful to include when reporting issues for troubleshooting purposes.

Here are some examples of warnings and errors in the Vault UI.

Screenshot of Vault UI showing a warning dialog — This dialog warns about logging into the Vault UI with a root token, which users should avoid doing.

Audit device output

Vault features audit devices, which record all client requests and server responses in a detailed way, and write the data to a configurable destination. Oliver can enable more than one audit device type, and configure one for writing to a file, and one to write to a network socket. Oliver can also configure a syslog audit device for writing to the syslog local agent on Unix systems.

Vault formats audit device entries as JSON objects representing request and response pairs. Each object holds non-sensitive and sensitive values. Vault hashes sensitive values with a salt and the HMAC-SHA256 algorithm so that sensitive content is not present as plaintext in the audit device output.

The audit device type determines where you can find the output. If it is a file audit device, you can typically find the output in a file on a local filesystem. Socket and syslog based audit devices typically forward output to a remote host for ingestion and processing. In such cases, you can find the audit device output in the tool that processes the entries and makes them available to filter, search, and display in dashboards.

The following example builds on the HTTP API error example from the earlier section, where a client requested a path containing a typo. The request entry appears first, followed by the matching response entry, so you can compare their structure and metadata contents.

{
  "auth": {
    "accessor": "hmac-sha256:ffc6fd47e5cc11e05747f741338a5b52cde14b526faf8298029b671f2192b488",
    "client_token": "hmac-sha256:397909dc14952b85de9dcd8310a16693bec341594c7f1b4a0a96f0eb7f49eb17",
    "display_name": "root",
    "policies": [
      "root"
    ],
    "policy_results": {
      "allowed": true,
      "granting_policies": [
        {
          "name": "root",
          "namespace_id": "root",
          "type": "acl"
        }
      ]
    },
    "token_policies": [
      "root"
    ],
    "token_issue_time": "2024-06-13T15:23:10Z",
    "token_type": "service"
  },
  "request": {
    "client_id": "0DHqvq2D77kL2/JTPSZkTMJbkFVmUu0TzMi0jiXcFy8=",
    "client_token": "hmac-sha256:397909dc14952b85de9dcd8310a16693bec341594c7f1b4a0a96f0eb7f49eb17",
    "client_token_accessor": "hmac-sha256:ffc6fd47e5cc11e05747f741338a5b52cde14b526faf8298029b671f2192b488",
    "id": "87634924-d0e3-90ed-d3ee-300a00d7b33a",
    "namespace": {
      "id": "root"
    },
    "operation": "read",
    "path": "operations-secret/data/datacenter-west",
    "remote_address": "192.168.65.1",
    "remote_port": 30591
  },
  "time": "2024-06-13T15:29:57.909649515Z",
  "type": "request"
}

The object type is request, and includes the timestamp of the request along with several related fields about the host, path, and token.

{
  "auth": {
    "accessor": "hmac-sha256:ffc6fd47e5cc11e05747f741338a5b52cde14b526faf8298029b671f2192b488",
    "client_token": "hmac-sha256:397909dc14952b85de9dcd8310a16693bec341594c7f1b4a0a96f0eb7f49eb17",
    "display_name": "root",
    "policies": [
      "root"
    ],
    "policy_results": {
      "allowed": true,
      "granting_policies": [
        {
          "name": "root",
          "namespace_id": "root",
          "type": "acl"
        }
      ]
    },
    "token_policies": [
      "root"
    ],
    "token_issue_time": "2024-06-13T15:23:10Z",
    "token_type": "service"
  },
  "error": "1 error occurred:\n\t* unsupported path\n\n",
  "time": "2024-06-13T15:29:57.911445665Z",
  "type": "response",
  "request": {
    "client_id": "0DHqvq2D77kL2/JTPSZkTMJbkFVmUu0TzMi0jiXcFy8=",
    "client_token": "hmac-sha256:397909dc14952b85de9dcd8310a16693bec341594c7f1b4a0a96f0eb7f49eb17",
    "client_token_accessor": "hmac-sha256:ffc6fd47e5cc11e05747f741338a5b52cde14b526faf8298029b671f2192b488",
    "id": "87634924-d0e3-90ed-d3ee-300a00d7b33a",
    "namespace": {
      "id": "root"
    },
    "operation": "read",
    "path": "operations-secret/data/datacenter-west",
    "remote_address": "192.168.65.1",
    "remote_port": 30591
  },
  "response": {
    "data": {
      "error": "hmac-sha256:7bbe77426c8ef1883b14a9a8a6c557524e728a95c43bed1970a9af2e9abd61a5"
    }
  }
}

The object type is response, and includes the timestamp of the response along with several related fields about the path, token, and any error messages including sensitive error message content.

Note the non-sensitive error field message from the server "1 error occurred:\n\t* unsupported path\n\n". This helps inform troubleshooting with more context.

That said, Vault hashes the actual error returned to the client in the response.data.error field because it holds potentially sensitive information.
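
The request and response entries above share the same request.id value, which makes it possible to pair them programmatically. The following Python sketch assumes the audit output is available as one JSON object per line and uses abbreviated copies of the two entries; pair_entries is a hypothetical helper for illustration, not a Vault tool.

```python
import json
from collections import defaultdict

# Abbreviated copies of the two entries shown above, one JSON object per line.
AUDIT_LINES = [
    r'{"type":"request","time":"2024-06-13T15:29:57.909649515Z",'
    r'"request":{"id":"87634924-d0e3-90ed-d3ee-300a00d7b33a","operation":"read",'
    r'"path":"operations-secret/data/datacenter-west"}}',
    r'{"type":"response","time":"2024-06-13T15:29:57.911445665Z",'
    r'"error":"1 error occurred:\n\t* unsupported path\n\n",'
    r'"request":{"id":"87634924-d0e3-90ed-d3ee-300a00d7b33a","operation":"read",'
    r'"path":"operations-secret/data/datacenter-west"}}',
]


def pair_entries(lines):
    """Group audit entries by request ID, keyed by entry type (request or response)."""
    pairs = defaultdict(dict)
    for line in lines:
        entry = json.loads(line)
        pairs[entry["request"]["id"]][entry["type"]] = entry
    return pairs


pairs = pair_entries(AUDIT_LINES)
for request_id, pair in pairs.items():
    response = pair.get("response", {})
    path = response.get("request", {}).get("path", "")
    print(request_id, path, response.get("error", "").strip())
```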

Telemetry metrics

Steve from the SRE team and Oliver in Operations sometimes work together on troubleshooting Vault performance issues. Vault telemetry metrics offer them key insights into cluster or server performance.

Vault emits telemetry metrics configurable for push or pull based consumption. HashiCups uses both solution types for ingesting Vault telemetry metrics depending on the project or use case.

The configuration and format of the metrics depends on the consumer, and output can take the form of tabular or JSON based data. Oliver enabled telemetry for pull metrics from Prometheus on the team's internal testing cluster. This means the team members with correct capabilities can query the /sys/metrics API endpoint for telemetry data.

Grafana dashboard screenshot — Grafana dashboard showing a range of Vault telemetry metrics for near real time updates.

A visual dashboard system like Grafana is the typical place for Oliver and Steve to interact with Vault telemetry metrics. The team can also manually query metrics when the situation requires immediate access outside the context of a dashboard.

Here are three raw telemetry data examples taken directly from the Vault CLI and HTTP API.

(Persona: Operations)

Oliver has a token with the capability to read from the /sys/metrics endpoint, so they use the CLI to read the endpoint:

$ vault read /sys/metrics

The output is in tabular format containing native Go map structures which are not easy to read. You can search the output for certain metric names to zero in on their values as part of troubleshooting, or use another output format.

The following is an abbreviated output example:

Key          Value
---          -----
Counters     [map[Count:1 Labels:map[] Max:0 Mean:0 Min:0 Name:vault.audit.log_request_failure Rate:0 Stddev:0 Sum:0] ...snip...]
Gauges       [map[Labels:map[] Name:vault.autopilot.failure_tolerance Value:2]...snip...]
Points       []
Samples      [map[Count:1 Labels:map[] Max:0.7014579772949219 Mean:0.7014579772949219 Min:0.7014579772949219 Name:vault.audit.file/.log_request Rate:0.07014579772949218 Stddev:0 Sum:0.7014579772949219]...snip...]
Timestamp    2024-08-01 16:38:20 +0000 UTC

The metrics named vault.audit.log_request_failure, vault.autopilot.failure_tolerance, and vault.audit.file/.log_request are present along with their values in this example.

(Persona: SRE)

Steve has a token with the capability to read from the /sys/metrics endpoint, so they use the HTTP API to read the endpoint to request JSON formatted metrics:

$ curl \
    --silent \
    --header "X-Vault-Token: $STEVE_TOKEN" \
    $VAULT_ADDR/v1/sys/metrics | jq

Vault responds with metrics in JSON format, as in this abbreviated output example:

{
  "request_id": "",
  "lease_id": "",
  "lease_duration": 0,
  "renewable": false,
  "data": {
    "Counters": [
      {
        "Count": 1,
        "Labels": {},
        "Max": 1,
        "Mean": 1,
        "Min": 1,
        "Name": "vault.cache.hit",
        "Rate": 0.1,
        "Stddev": 0,
        "Sum": 1
      }
    ],
    "Gauges": [
      {
        "Labels": {},
        "Name": "vault.autopilot.failure_tolerance",
        "Value": 2
      }
    ],
    "Points": [],
    "Samples": [
      {
        "Count": 1,
        "Labels": {},
        "Max": 0.013542000204324722,
        "Mean": 0.013542000204324722,
        "Min": 0.013542000204324722,
        "Name": "vault.barrier.get",
        "Rate": 0.0013542000204324722,
        "Stddev": 0,
        "Sum": 0.013542000204324722
      }
    ],
    "Timestamp": "2024-07-26 14:41:00 +0000 UTC"
  },
  "warnings": null
}

The metrics named vault.cache.hit, vault.autopilot.failure_tolerance, and vault.barrier.get are present along with their values in this example.
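
When the JSON response is long, a small script can pull out one metric by name. This is a minimal sketch that works on an abbreviated copy of the response above; find_metric is a hypothetical helper, and it assumes the metrics sit under data in the Counters, Gauges, and Samples lists as shown in the example.

```python
import json

# Abbreviated copy of the JSON response shown above.
RESPONSE = """
{
  "data": {
    "Counters": [{"Count": 1, "Labels": {}, "Name": "vault.cache.hit", "Sum": 1}],
    "Gauges": [{"Labels": {}, "Name": "vault.autopilot.failure_tolerance", "Value": 2}],
    "Points": [],
    "Samples": [{"Count": 1, "Labels": {}, "Mean": 0.013542000204324722, "Name": "vault.barrier.get"}],
    "Timestamp": "2024-07-26 14:41:00 +0000 UTC"
  }
}
"""


def find_metric(payload, name):
    """Return (kind, metric) for the first metric with the given name, or None."""
    for kind in ("Counters", "Gauges", "Samples"):
        for metric in payload["data"].get(kind, []):
            if metric["Name"] == name:
                return kind, metric
    return None


payload = json.loads(RESPONSE)
print(find_metric(payload, "vault.autopilot.failure_tolerance"))
```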

(Persona: SRE)

Steve has a token with the capability to read from the /sys/metrics endpoint, so they use the HTTP API to read the endpoint, and request Prometheus formatted metrics.

$ curl \
    --silent \
    --header "X-Vault-Token: $STEVE_TOKEN" \
    "$VAULT_ADDR/v1/sys/metrics?format=prometheus"

Vault responds with metrics in Prometheus format, as in this abbreviated output example:

...snip...
# HELP vault_raft_leader_lastContact vault_raft_leader_lastContact
# TYPE vault_raft_leader_lastContact summary
vault_raft_leader_lastContact{quantile="0.5"} 24
vault_raft_leader_lastContact{quantile="0.9"} 54
vault_raft_leader_lastContact{quantile="0.99"} 55
vault_raft_leader_lastContact_sum 166521
vault_raft_leader_lastContact_count 4352
...snip...
# HELP vault_raft_storage_bolt_write_count vault_raft_storage_bolt_write_count
# TYPE vault_raft_storage_bolt_write_count gauge
vault_raft_storage_bolt_write_count{cluster="vault-cluster-a4f76a03",database="fsm"} 429
vault_raft_storage_bolt_write_count{cluster="vault-cluster-a4f76a03",database="logstore"} 513
...snip...
# HELP vault_token_lookup vault_token_lookup
# TYPE vault_token_lookup summary
vault_token_lookup{quantile="0.5"} 0.26087498664855957
vault_token_lookup{quantile="0.9"} 0.26087498664855957
vault_token_lookup{quantile="0.99"} 0.26087498664855957
vault_token_lookup_sum 7.748914802446961
vault_token_lookup_count 12

The metrics named vault_raft_leader_lastContact, vault_raft_storage_bolt_write_count, and vault_token_lookup are present along with their values in this example.
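
For a summary metric, dividing the _sum series by the _count series gives the mean observation. The following Python sketch parses a few of the lines above and computes that mean for vault_raft_leader_lastContact. The parse_samples helper is an assumption made for illustration; it expects each sample line to end with a single numeric value and no trailing timestamp, as in this output.

```python
# A few lines copied from the Prometheus formatted output above.
PROM_TEXT = """\
# HELP vault_raft_leader_lastContact vault_raft_leader_lastContact
# TYPE vault_raft_leader_lastContact summary
vault_raft_leader_lastContact{quantile="0.5"} 24
vault_raft_leader_lastContact{quantile="0.9"} 54
vault_raft_leader_lastContact{quantile="0.99"} 55
vault_raft_leader_lastContact_sum 166521
vault_raft_leader_lastContact_count 4352
"""


def parse_samples(text):
    """Map each series (name plus labels) to its numeric value, skipping comments."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


samples = parse_samples(PROM_TEXT)
mean = samples["vault_raft_leader_lastContact_sum"] / samples["vault_raft_leader_lastContact_count"]
print(round(mean, 2))
```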

Troubleshoot a server issue

(Persona: Operations)

Oliver is starting a self-managed Vault server for a user acceptance testing cluster that they are preparing. They installed Vault on Linux with the official community edition package.

The configuration file Oliver created for the cluster servers looks like this example:

api_addr      = "https://127.0.0.1:8200"
ui = true
disable_mlock = true

storage "raft" {
  path     = "/opt/vault/data"
  node_id  = "uat-vault-1"
}

listener "tcp" {
  address       = "0.0.0.0:8200"
  tls_cert_file = "/opt/vault/tls/tls.crt"
  tls_key_file  = "/opt/vault/tls/tls.key"
}

When Oliver tries to start the 'vault' service, it fails to start and systemctl returns the following error message:

Job for vault.service failed because the control process exited with error code.
See "systemctl status vault.service" and "journalctl -xeu vault.service" for details.

This error message includes some actions Oliver can take to dive deeper into the error condition and find its cause.

For example, if Oliver checks the systemd journal, they'll learn more about the cause of this error.

$ sudo journalctl -u vault.service

Example abbreviated output:

Jun 24 21:27:57 ubuntu-jammy systemd[1]: Starting "HashiCorp Vault - A tool for managing secrets"...
Jun 24 21:27:57 ubuntu-jammy vault[3122]: time="2024-06-24T21:27:57Z" level=warning msg="DBUS_SESSION_BUS_ADDRESS envva>
Jun 24 21:27:57 ubuntu-jammy vault[3122]: Cluster address must be set when using raft storage
Jun 24 21:27:57 ubuntu-jammy vault[3122]: 2024-06-24T21:27:57.576Z [INFO]  proxy environment: http_proxy="" https_proxy>
Jun 24 21:27:57 ubuntu-jammy systemd[1]: vault.service: Main process exited, code=exited, status=1/FAILURE
Jun 24 21:27:57 ubuntu-jammy systemd[1]: vault.service: Failed with result 'exit-code'.
Jun 24 21:27:57 ubuntu-jammy systemd[1]: Failed to start "HashiCorp Vault - A tool for managing secrets".
...snip...

Notice the helpful error explanation text, "Cluster address must be set when using raft storage". This indicates that there is a missing server configuration option, and Oliver must add the option to the existing configuration to resolve the issue.

Oliver can research the Vault server configuration file documentation for the Integrated Storage (Raft) storage backend to learn more about the cluster address requirement. They can then update the configuration with the necessary option.

Here is an example of the working configuration file after Oliver's update:

api_addr      = "https://127.0.0.1:8200"
cluster_addr  = "https://127.0.0.1:8201"
ui = true
disable_mlock = true

storage "raft" {
  path     = "/opt/vault/data"
  node_id  = "uat-vault-1"
}

listener "tcp" {
  address       = "0.0.0.0:8200"
  tls_cert_file = "/opt/vault/tls/tls.crt"
  tls_key_file  = "/opt/vault/tls/tls.key"
}
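
A check like the one behind this startup error could also run before deployment. The sketch below is a deliberately naive text scan, not an HCL parser and not how Vault validates configuration: missing_cluster_addr is a hypothetical helper that only looks for a storage "raft" stanza and a cluster_addr assignment in the file text, and it does not account for settings supplied outside the file.

```python
import re

# The original configuration from this section, which fails to start.
BROKEN_CONFIG = '''
api_addr      = "https://127.0.0.1:8200"
ui = true
disable_mlock = true

storage "raft" {
  path     = "/opt/vault/data"
  node_id  = "uat-vault-1"
}
'''

# The same configuration with the cluster address added.
FIXED_CONFIG = 'cluster_addr  = "https://127.0.0.1:8201"\n' + BROKEN_CONFIG


def missing_cluster_addr(config_text: str) -> bool:
    """Return True when the text uses raft storage but never assigns cluster_addr."""
    uses_raft = re.search(r'^\s*storage\s+"raft"', config_text, re.MULTILINE) is not None
    has_cluster_addr = re.search(r"^\s*cluster_addr\s*=", config_text, re.MULTILINE) is not None
    return uses_raft and not has_cluster_addr


print(missing_cluster_addr(BROKEN_CONFIG))
print(missing_cluster_addr(FIXED_CONFIG))
```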

Troubleshoot a client issue

The Vault client CLI emits helpful warnings and errors when issues arise. Vault users can find the issue root cause and fix the problem with these messages. The following are some examples of CLI errors with causes and resolutions.

Client and server protocol mismatch

(Persona: Operations)

A common issue where the client message helps with troubleshooting is a protocol mismatch between the client and the server.

Users like Oliver, who make regular use of the CLI, can encounter this type of issue while operating a development mode server, for example.

Suppose Oliver starts Vault in development mode without specifying any flags:

$ vault server -dev

The development mode server starts with operational logging emitted to the standard output; an abbreviated example of that output follows.

==> Vault server configuration:

Administrative Namespace:
             Api Address: http://127.0.0.1:8200
                     Cgo: disabled
         Cluster Address: https://127.0.0.1:8201
...snip...
WARNING! dev mode is enabled! In this mode, Vault runs entirely in-memory
and starts unsealed with a single unseal key. The root token is already
authenticated to the CLI, so you can immediately begin using Vault.

You may need to set the following environment variables:

    $ export VAULT_ADDR='http://127.0.0.1:8200'

...snip...

The Vault server is ready. If Oliver attempts to access the server in another terminal session without exporting the proper VAULT_ADDR environment variable or passing the -address flag, commands can fail.

For example, Oliver checks the server status:

$ vault status

The command fails with the following useful error message:

WARNING! VAULT_ADDR and -address unset. Defaulting to https://127.0.0.1:8200.
Error checking seal status: Get "https://127.0.0.1:8200/v1/sys/seal-status": http: server gave HTTP response to HTTPS client

The client warns that both VAULT_ADDR and the flag -address have unset values, and mentions the value it will use instead, "https://127.0.0.1:8200". The client also returns an error message that includes "http: server gave HTTP response to HTTPS client".

The Vault CLI expects to use an HTTPS connection to the server by default.

Since Oliver started the development mode server without using the flag to enable built-in TLS, the server started with an insecure HTTP listener. The CLI needs HTTPS, but the server does not have a TLS enabled listener in this case, and the CLI exits with the error.

The solution to this issue is actually contained as a tip in the server output:

You may need to set the following environment variables:

    $ export VAULT_ADDR='http://127.0.0.1:8200'

Oliver can export the VAULT_ADDR environment variable, and specify that the CLI use an HTTP URL for the server address.

$ export VAULT_ADDR='http://127.0.0.1:8200'

Now the CLI commands work as expected.

$ vault status

Example output:

Key             Value
---             -----
Seal Type       shamir
Initialized     true
Sealed          false
Total Shares    1
Threshold       1
Version         1.16.1
Build Date      2024-04-03T12:35:53Z
Storage Type    inmem
Cluster Name    vault-cluster-ffe35744
Cluster ID      7afe2b22-02bc-5e2c-34a8-7c74706dd719
HA Enabled      false
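
The two CLI connection errors quoted in this tutorial each point to a different cause, and a support runbook could map them to hints. The following Python sketch is illustrative only: the HINTS table and diagnose function are assumptions made for this example, and the matched fragments are taken from the error messages shown earlier.

```python
# Message fragments from the CLI errors quoted in this tutorial, each with a short hint.
HINTS = [
    (
        "server gave HTTP response to HTTPS client",
        "The client used https:// but the listener speaks plain HTTP; "
        "check the scheme in VAULT_ADDR or -address.",
    ),
    (
        "connection refused",
        "Nothing is listening on that address and port; "
        "check that the Vault service is running.",
    ),
]


def diagnose(message: str) -> str:
    """Return the hint for the first known fragment found in a CLI error message."""
    for fragment, hint in HINTS:
        if fragment in message:
            return hint
    return "No hint available; read the full message and the server output."


print(diagnose(
    'Error checking seal status: Get "https://127.0.0.1:8200/v1/sys/seal-status": '
    "http: server gave HTTP response to HTTPS client"
))
```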

When running Vault in dev mode, how do you know what address to set for the VAULT_ADDR environment variable?

When the Vault startup process completes, it will print the VAULT_ADDR address to use.

Troubleshoot a use case issue

(Persona: Developer)

Danielle is leading the development team on NewCup, a brand new application prototype that enables customers to design their own cups using powerful AI models.

They should have access to manage a new collection of related secrets in the developer Vault cluster through the K/V secrets engine enabled at the path project-newcup-secrets. Their access should include the ability to list, create, read, and update all secrets at this path.

Danielle knows that one of the secrets they can manage is newcup-aggregator, which holds the aggregator API key that another member of the developer team wrote to the secrets engine.

They use the Vault API to read this secret:

$ curl \
    --silent \
    --header "X-Vault-Token: $VAULT_TOKEN" \
    $VAULT_ADDR/v1/project-newcup-secrets/data/newcup-aggregator \
    | jq

Vault responds with the secret contents, including the aggregator API key:

{
  "request_id": "c5f9ccdd-cd3f-75b4-c052-a81a52506b78",
  "lease_id": "",
  "renewable": false,
  "lease_duration": 0,
  "data": {
    "data": {
      "api-key": "3DC9E750-B2D0-48B5-9234-53B0237961FE",
      "project": "newcup-api"
    },
    "metadata": {
      "created_time": "2024-07-10T12:46:14.374772274Z",
      "custom_metadata": null,
      "deletion_time": "",
      "destroyed": false,
      "version": 1
    }
  },
  "wrap_info": null,
  "warnings": null,
  "auth": null,
  "mount_type": "kv"
}

Danielle checks to learn if the other project secrets are available by listing the contents of /project-newcup-secrets/metadata/:

$ curl \
    --silent \
    --request LIST \
    --header "X-Vault-Token: $VAULT_TOKEN" \
    $VAULT_ADDR/v1/project-newcup-secrets/metadata/

Example output:

{
  "errors": [
    "1 error occurred:\n\t* permission denied\n\n"
  ]
}

Danielle did not expect a permission denied error, and they should have the ability to list the secrets, so they start troubleshooting the issue.

Given that this issue involves permission to the secret, Danielle can begin troubleshooting by examining the ACL policies associated with their token through a token self-lookup.

$ curl \
    --silent \
    --header "X-Vault-Token: $VAULT_TOKEN" \
    $VAULT_ADDR/v1/auth/token/lookup-self \
    | jq

Example output:

{
  "request_id": "be564785-b2cc-6144-9b6f-9af048715aa8",
  "lease_id": "",
  "renewable": false,
  "lease_duration": 0,
  "data": {
    "accessor": "gdcI63UU8r82i4aWf3XEqFLt",
    "creation_time": 1720617312,
    "creation_ttl": 2764800,
    "display_name": "token",
    "entity_id": "",
    "expire_time": "2024-08-11T13:15:12.63603008Z",
    "explicit_max_ttl": 0,
    "id": "hvs.CAESIBT6b5vjwt-Xz7Os191-AIYpmXSj8Oro7ufL5xKDdisnGh4KHGh2cy5WZDJabXg5bEF6MG9haGVDSDlyd0ZPeWE",
    "issue_time": "2024-07-10T13:15:12.636051263Z",
    "meta": null,
    "num_uses": 0,
    "orphan": false,
    "path": "auth/token/create",
    "policies": [
      "default",
      "developers-base",
      "project-newcup-developers",
      "project-sip-developers"
    ],
    "renewable": true,
    "ttl": 2677460,
    "type": "service"
  },
  "wrap_info": null,
  "warnings": null,
  "auth": null,
  "mount_type": "token"
}

The results of Danielle's token lookup hold a list of policies associated with their token, and include the project-newcup-developers policy in that list.

Danielle tries to read the policy to check whether it is correct, but their token does not have the capability to read it.

$ curl \
    --silent \
    --header "X-Vault-Token: $VAULT_TOKEN" \
    $VAULT_ADDR/v1/sys/policies/acl/project-newcup-developers \
    | jq

Example output:

{
  "errors": [
    "1 error occurred:\n\t* permission denied\n\n"
  ]
}

Danielle reaches out to the operations team through a support request to confirm that the project-newcup-developers policy includes the capability to list secrets.

(Persona: Operations)

Oliver from operations can access the Vault audit device log content from the SIEM solution and find the corresponding request that Danielle made. Here is the raw JSON from that request as it appears in the audit device log.

{
  "auth": {
    "accessor": "hmac-sha256:737024561db4b427d21aebbabcfb8b50b4c1b4bca5586fdbdb19b40a7b8ba595",
    "client_token": "hmac-sha256:a066b2a77e3189a5a84cc850cd26968f1df2a7ec56bae4ef8159ccffc52d3b1e",
    "display_name": "token",
    "policies": [
      "default",
      "developers-base",
      "project-newcup-developers",
      "project-sip-developers"
    ],
    "policy_results": {
      "allowed": false
    },
    "token_policies": [
      "default",
      "project-newcup-developers"
    ],
    "token_issue_time": "2024-07-10T13:15:12Z",
    "token_ttl": 2764800,
    "token_type": "service"
  },
  "error": "1 error occurred:\n\t* permission denied\n\n",
  "request": {
    "client_id": "PRS5y/+w5WIHBHxxS5DxFnmE5cxIHcNRXa/ij/pbE0o=",
    "client_token": "hmac-sha256:3597e9e721e49fd337916dc52ef8db74e4245defe2bded55e1dfd2a1fb94a625",
    "client_token_accessor": "hmac-sha256:737024561db4b427d21aebbabcfb8b50b4c1b4bca5586fdbdb19b40a7b8ba595",
    "id": "9aa0a8f4-f9e1-72e6-d844-789dce22d708",
    "mount_class": "secret",
    "mount_point": "project-newcup-secrets/",
    "mount_running_version": "v0.19.0+builtin",
    "mount_type": "kv",
    "namespace": {
      "id": "root"
    },
    "operation": "list",
    "path": "project-newcup-secrets/metadata/",
    "remote_address": "192.168.65.1",
    "remote_port": 16452
  },
  "time": "2024-07-10T13:26:01.890660588Z",
  "type": "request"
}

In addition to several handy data points, the request information also holds the permission denied error, along with the path of the request operation and the type of operation requested.
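
When many entries are in play, filtering for denied requests narrows the search. The sketch below works on an abbreviated copy of the entry above and reads the auth.policy_results.allowed field it contains. The denied_summary helper is hypothetical, and treating entries that lack this field as not denied is an assumption made for this example.

```python
import json

# Abbreviated copy of the audit entry shown above.
ENTRY = (
    r'{"type":"request","auth":{"policies":["default","developers-base",'
    r'"project-newcup-developers","project-sip-developers"],'
    r'"policy_results":{"allowed":false}},'
    r'"error":"1 error occurred:\n\t* permission denied\n\n",'
    r'"request":{"operation":"list","path":"project-newcup-secrets/metadata/","mount_type":"kv"}}'
)


def denied_summary(line):
    """Return operation, path, and policies for a denied entry, or None otherwise."""
    entry = json.loads(line)
    auth = entry.get("auth", {})
    if auth.get("policy_results", {}).get("allowed", True):
        return None
    request = entry.get("request", {})
    return {
        "operation": request.get("operation"),
        "path": request.get("path"),
        "policies": auth.get("policies", []),
    }


print(denied_summary(ENTRY))
```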

Oliver agrees that this is possibly due to a policy issue, and reads the project-newcup-developers policy to confirm:

$ vault policy read project-newcup-developers

Example output:

path "project-newcup-secrets/+/*" {
  capabilities = ["create", "read", "update"]
}

The capabilities do not include "list". This is the root cause of the permission denied error that has Danielle and the NewCup project team blocked.
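
The reasoning can be sketched in a few lines of Python: the policy path matches Danielle's request path, but the requested capability is missing. The path_matches function below is a simplified approximation written for this example, treating + as exactly one path segment and a trailing * as a prefix match; it is not Vault's policy engine and ignores rules such as priority between overlapping paths.

```python
# Policy rule copied from the output above.
POLICY_PATH = "project-newcup-secrets/+/*"
POLICY_CAPABILITIES = ["create", "read", "update"]


def path_matches(pattern: str, request_path: str) -> bool:
    """Simplified matcher: '+' is one segment, a trailing '*' matches any suffix."""
    prefix_glob = pattern.endswith("*")
    if prefix_glob:
        pattern = pattern[:-1]
    pat_segs = pattern.split("/")
    req_segs = request_path.split("/")
    if not prefix_glob:
        return len(pat_segs) == len(req_segs) and all(
            p == "+" or p == r for p, r in zip(pat_segs, req_segs)
        )
    if len(req_segs) < len(pat_segs):
        return False
    head, last = pat_segs[:-1], pat_segs[-1]
    if any(p != "+" and p != r for p, r in zip(head, req_segs)):
        return False
    # Everything after the fixed segments must start with the final pattern piece.
    return "/".join(req_segs[len(head):]).startswith(last)


def is_allowed(request_path: str, capability: str) -> bool:
    """Check one request against the single policy rule above."""
    return path_matches(POLICY_PATH, request_path) and capability in POLICY_CAPABILITIES


print(is_allowed("project-newcup-secrets/data/newcup-aggregator", "read"))  # True
print(is_allowed("project-newcup-secrets/metadata/", "list"))  # False
```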

Oliver updates the policy to add the list capability. Since policy updates are not retroactive, they ask Danielle to authenticate to Vault for a new token with the updated policy attached.

(Persona: Developer)

Danielle authenticates to Vault, and gets a new token with the updated project-newcup-developers policy attached. They are able to list all secrets in the project-newcup-secrets secrets engine now:

$ curl \
    --silent \
    --request LIST \
    --header "X-Vault-Token: $VAULT_TOKEN" \
    $VAULT_ADDR/v1/project-newcup-secrets/metadata/

Example output:

{
  "request_id": "ca2f609f-a609-c72a-db9f-92e31e4dbc9b",
  "lease_id": "",
  "renewable": false,
  "lease_duration": 0,
  "data": {
    "keys": [
      "newcup-aggregator",
      "newcup-emitter",
      "newcup-team-backups",
      "newcup-team-credentials"
    ]
  },
  "wrap_info": null,
  "warnings": null,
  "auth": null,
  "mount_type": "kv"
}

Summary

Vault provides logs and metrics to help you identify and resolve issues in your Vault deployment. You can ship both logs and metrics to observability and SIEM tools to consume this critical information.
