Monitor Consul server health and performance with metrics and logs
Consul server metrics and logs give you detailed statistical and performance information about your Consul cluster. Metrics provide a general overview of system health and performance, while logs provide context and details used to diagnose issues and identify the root cause of problems. Once you enable these Consul observability features, Consul emits runtime metrics and operational logs of its subsystems.
In this tutorial, you will enable Consul server metrics and server logging for your Consul cluster. You will use Grafana to explore dashboards that provide information regarding health, performance, and operations for your Consul cluster. In the process, you will learn how using these features can provide you with deep insights into the operational health and performance of your Consul cluster.
Scenario overview
To begin this tutorial, you will use Terraform to deploy a self-managed Consul cluster and an observability suite on Elastic Kubernetes Service (EKS).
Each Consul server can emit server metrics and server logs that contain timings, protocols, and additional information for analyzing the health and performance of your Consul cluster. By configuring the Consul Helm chart, you can configure your Consul servers to emit this observability information so Prometheus and Promtail can scrape and store the data. You can then visualize the metrics and logs with Grafana.
In this tutorial, you will:
- Deploy the following resources with Terraform:
  - Elastic Kubernetes Service (EKS) cluster
  - A self-managed Consul datacenter on EKS
  - Grafana, Prometheus, and Loki on EKS
- Perform the following Consul control plane procedures:
  - Review and enable server metrics and server logging features
  - Explore dashboards with Grafana
Prerequisites
The tutorial assumes that you are familiar with Consul and its core functionality. If you are new to Consul, refer to the Consul Getting Started tutorials collection.
For this tutorial, you will need:
- An AWS account configured for use with Terraform
- (Optional) An HCP account
- aws-cli >= 2.0
- terraform >= 1.0
- consul >= 1.17.0
- consul-k8s >= 1.2.0
- helm >= 3.0
- git >= 2.0
- kubectl > 1.24
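Before you start, it can help to confirm that the required tools are on your PATH. The following is a minimal sketch: the missing_tools helper is a hypothetical name introduced here, and it only checks that each tool is present, not that its version meets the minimum listed above.

```shell
# Print the name of every tool in the argument list that is not on PATH.
# Prints nothing when all tools are present.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Check the CLI tools listed in the prerequisites.
missing_tools aws terraform consul consul-k8s helm git kubectl
```

Any name printed by the last command is a tool you still need to install.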
Clone GitHub repository
Clone the GitHub repository containing the configuration files and resources.
Change into the directory that contains the complete configuration files for this tutorial.
Review repository contents
This repository contains Terraform configuration to spin up the initial infrastructure and all files to deploy Consul, the demo application, and the observability suite resources.
The eks directory contains the following Terraform configuration files:
- aws-vpc.tf defines the AWS VPC resources
- eks-cluster.tf defines Amazon EKS cluster deployment resources
- eks-consul.tf defines the self-managed Consul deployment
- eks-observability.tf defines the Prometheus, Promtail, Loki, and Grafana resources
- outputs.tf defines outputs you will use to authenticate and connect to your Kubernetes cluster
- providers.tf defines AWS and Kubernetes provider definitions for Terraform
- variables.tf defines variables you can use to customize the tutorial
The directory also contains the following subdirectories:
- ../../dashboards contains the JSON configuration files for the example Grafana dashboards
- config contains the custom Consul ACL configuration file and the Consul synthetic load generator configuration file
- helm contains the Helm charts for Consul, Prometheus, Promtail, Loki, and Grafana
Deploy infrastructure and demo application
With these Terraform configuration files, you are ready to deploy your infrastructure. Initialize your Terraform configuration to download the necessary providers and modules.
Then, deploy the resources. Confirm the run by entering yes.
The Terraform deployment could take up to 15 minutes to complete.
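The two Terraform steps above can be sketched as a small helper. The deploy_infra function name is hypothetical and introduced only for illustration; the terraform init and terraform apply commands it wraps are the standard Terraform workflow.

```shell
# Initialize and then apply the Terraform configuration in a directory.
# The directory argument defaults to the current directory.
# "terraform apply" prompts for confirmation; enter "yes" to proceed.
deploy_infra() {
  tf_dir="${1:-.}"
  terraform -chdir="$tf_dir" init &&
    terraform -chdir="$tf_dir" apply
}
```

Call deploy_infra with no arguments from the directory that holds the Terraform files, or pass that directory as the first argument.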
Connect to your infrastructure
Now that you have deployed the Kubernetes cluster, configure kubectl to interact with it.
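One way to do this on EKS is the aws eks update-kubeconfig command, sketched below. The configure_kubectl helper is a hypothetical name, and the Terraform output names in the usage comment are assumptions; check outputs.tf for the names this repository actually defines.

```shell
# Add or update the kubeconfig entry for an EKS cluster.
# Arguments: AWS region, EKS cluster name.
configure_kubectl() {
  aws eks update-kubeconfig --region "$1" --name "$2"
}

# Usage (the output names "region" and "cluster_name" are hypothetical):
#   configure_kubectl "$(terraform output -raw region)" \
#     "$(terraform output -raw cluster_name)"
```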
Ensure all services are up and running successfully
Check the pods across all namespaces to confirm they are running successfully.
Configure your CLI to interact with Consul datacenter
In this section, you will set environment variables in your terminal so your Consul CLI can interact with your Consul datacenter. The Consul CLI reads these environment variables for behavior defaults and will reference these values when you run consul commands.
Set the Consul server destination address.
Retrieve the ACL bootstrap token from the respective Kubernetes secret and set it as an environment variable.
Remove SSL verification checks to simplify communication to your Consul datacenter.
In a production environment, we recommend keeping this SSL verification set to true. Only disable this verification for a Consul datacenter in a development or demonstration environment.
Verify that you can communicate with your Consul cluster by printing all known nodes and the metadata about them.
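The steps in this section can be sketched as one helper that you re-run in each new terminal. The environment variable names are the ones the Consul CLI reads. The consul_env function name is hypothetical, and the namespace (consul), service name (consul-ui), and secret name and key (consul-bootstrap-acl-token, token) are assumptions based on a Helm release named consul; adjust them to match your deployment.

```shell
# Export the variables the Consul CLI reads for its defaults.
# Assumed names: namespace "consul", service "consul-ui",
# secret "consul-bootstrap-acl-token" with key "token".
consul_env() {
  consul_ns="${1:-consul}"

  # Consul server destination address, taken from the UI load balancer.
  consul_host=$(kubectl get service consul-ui --namespace "$consul_ns" \
    --output jsonpath='{.status.loadBalancer.ingress[0].hostname}') || return 1
  export CONSUL_HTTP_ADDR="https://${consul_host}"

  # ACL bootstrap token, stored base64-encoded in a Kubernetes secret.
  CONSUL_HTTP_TOKEN=$(kubectl get secret consul-bootstrap-acl-token \
    --namespace "$consul_ns" --output jsonpath='{.data.token}' |
    base64 --decode) || return 1
  export CONSUL_HTTP_TOKEN

  # Development only: skip TLS certificate verification.
  export CONSUL_HTTP_SSL_VERIFY=false
}

# Usage, once the cluster is reachable:
#   consul_env && consul catalog nodes -detailed
```

The consul catalog nodes -detailed command in the usage comment prints the known nodes along with their metadata.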
Enable Consul server metrics and logging
Consul server metrics and logs provide you with detailed health and performance information for your Consul clusters. In this section, you will review the parameters that enable these features and update your Consul installation to apply the new configuration.
Review the Consul values file
Consul lets you expose metrics and logs for your server pods so they may be scraped by a Prometheus service that is outside of your service mesh. Review these snippets from the helm/consul-v2-telemetry.yaml configuration file to see the parameters that enable these features.
Consul metrics are only exposed on port 8500. Setting httpsOnly: false in the TLS block keeps this HTTP port available so Prometheus can scrape it for metrics.
The following block enables metrics for all agents in your Consul datacenter.
This block configures your Consul servers to emit server logs.
Refer to the Consul metrics for Kubernetes documentation and official Helm chart values to learn more about metrics configuration options and details.
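As a rough illustration of the three settings described above, the following sketch writes a values fragment to a scratch file and prints it. It is not the contents of helm/consul-v2-telemetry.yaml. The keys are taken from the official Consul Helm chart, while the specific values (the retention time and the TRACE log level) are assumptions chosen for illustration.

```shell
# Write an illustrative values fragment to a scratch file for reading.
# This is a sketch, not the repository's actual values file.
values_sketch="$(mktemp)"
cat > "$values_sketch" <<'EOF'
global:
  tls:
    enabled: true
    # Keep the plain HTTP port (8500) open so Prometheus can scrape it.
    httpsOnly: false
  metrics:
    # Enable metrics for the Consul agents in the datacenter.
    enabled: true
    enableAgentMetrics: true
    agentMetricsRetentionTime: "1m"
server:
  # Raise the server log level; "TRACE" is an assumed value.
  extraConfig: |
    {
      "log_level": "TRACE"
    }
EOF
cat "$values_sketch"
```

To compare the sketch against the real file, you could search it for the same keys, for example with grep -n -E 'httpsOnly|enableAgentMetrics|log_level' helm/consul-v2-telemetry.yaml.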
Update the Consul deployment
Update Consul in your Kubernetes cluster with the Consul K8S CLI to let Prometheus collect metrics from your Consul servers. Confirm the run by entering y.
Refer to the Consul K8S CLI documentation to learn more about additional settings.
The Consul update could take up to 5 minutes to complete.
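The update step can be sketched with the consul-k8s upgrade command and its -config-file flag. The upgrade_consul wrapper is a hypothetical helper added here so that a mistyped path fails early instead of reaching the cluster.

```shell
# Apply a values file to the existing installation with the Consul K8S CLI.
# Refuses to run when the values file does not exist.
# "consul-k8s upgrade" prompts for confirmation; enter "y" to proceed.
upgrade_consul() {
  values_file="$1"
  if [ ! -f "$values_file" ]; then
    echo "values file not found: $values_file" >&2
    return 1
  fi
  consul-k8s upgrade -config-file="$values_file"
}

# Usage, from the directory that contains the helm folder:
#   upgrade_consul helm/consul-v2-telemetry.yaml
```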
Review the official Helm chart values to learn more about these settings.
Configure the anonymous ACL policy
In addition to configuring Consul, you need to modify the anonymous ACL policy to allow agent:read permissions so Prometheus can scrape metrics from the secure Consul servers. Other permissions in the included file will allow the Consul load generator service to communicate with the respective Consul features.
Review the Consul ACL Policies documentation to learn more.
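To make the agent:read permission concrete, the sketch below writes a single ACL rule to a scratch file and prints it. It covers only the agent rule; the policy file shipped in the repository's config directory also carries the load generator's permissions, so apply that file rather than this fragment. POLICY_NAME and POLICY_FILE in the closing comment are placeholders, not names taken from the tutorial.

```shell
# Write an illustrative ACL rule that grants read access to agent endpoints,
# which include the metrics endpoint that Prometheus scrapes.
policy_sketch="$(mktemp)"
cat > "$policy_sketch" <<'EOF'
agent_prefix "" {
  policy = "read"
}
EOF
cat "$policy_sketch"

# Applying a policy file takes this general form. Set POLICY_NAME to the name
# of the anonymous policy in your datacenter and POLICY_FILE to the policy
# file from the repository's config directory.
#   consul acl policy update -name "$POLICY_NAME" -rules @"$POLICY_FILE"
```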
Note
In a production environment, we recommend using the Prometheus Consul Exporter for the most secure, restrictive access to Consul metrics on port 8501.
Deploy the Consul load generator
Deploy the Consul load generator to create synthetic loads for KV, service registration, and the ACL engine. This will create more realistic visualizations in your Grafana dashboards.
Explore Consul health and performance dashboards
Consul control plane metrics and logs provide you with detailed health and performance information for your Consul servers. In this section, you will use Grafana to examine how this information provides insights into your Consul control plane.
Explore Consul telemetry dashboard
Navigate to the control plane monitoring dashboard.
The example dashboards take a few minutes to populate with data after the telemetry metrics feature is enabled.
This dashboard provides several sections that give you a variety of information for your Consul control plane. These graphs can be useful to analyze the health of your Consul server pods to identify any anomalies in behavior.
Notice that the System Stats tab includes CPU usage and memory usage metrics. High values in these areas can cause long loading times, slow performance, and unexpected crashes.
Now, click on the Consul Server Behavior tab. This tab gives insight into the health of Consul's Raft protocol, with higher than average numbers indicating slowdowns in reaching a state of consensus between Consul servers.
Click on the Feature: Catalog tab.
This tab provides health information about the registration/deregistration of nodes, services, and checks in Consul. This can provide useful insight into the load pressure on each of your Consul servers.
Tip
Consul telemetry metrics contain a large set of statistics that you can use to create custom dashboards for monitoring your Consul clusters according to your production environment's unique requirements. Refer to the Consul telemetry overview for a complete list and description of available metrics.
Explore Consul server logs dashboard
Navigate to the control plane logs dashboard.
The Grafana dashboard may take a few moments to fully load in your browser.
Notice that the example dashboard panes provide detailed event and error insights for your Consul control plane.
For example, the RPC Server Call Request Type Distribution pie chart gives you the read/write ratio of RPC server calls in your Consul cluster during a specific time window.
Type request_type=write in the search field to look deeper into the server logs.
Notice how this action applies a filter to the respective visualizations and raw logs containing that value so you can zoom into error logs for further analysis and troubleshooting. Click on one of the raw logs to view the entire access log contents.
Notice that you can explore the other fields associated with your search terms to learn more information about a particular error or event.
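The same read/write breakdown can be approximated from the command line. The sketch below assumes each relevant server log line contains a request_type=<value> field, as the search term above suggests. The request_type_counts helper, the pod name, and the namespace are hypothetical.

```shell
# Count log lines per request_type value read from standard input.
# Output: one "request_type=<value> <count>" line per distinct value.
request_type_counts() {
  grep -o 'request_type=[a-z]*' | sort | uniq -c | awk '{print $2, $1}'
}

# Usage (pod and namespace names are assumptions):
#   kubectl logs consul-server-0 --namespace consul | request_type_counts
```

This reads the raw pod logs directly, so it reflects only what the pod currently retains rather than the time window selected in Grafana.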
(Optional) Enable HCP Consul Central Observability
HCP Consul Central observability provides you with detailed health and performance information for your self-managed or HCP Consul Dedicated clusters. HCP Consul Central provides a fast time-to-value for visualizing this information without the need to maintain or create your own observability suite. It also provides a centralized observability location for platform teams that manage, monitor, and observe services across entire organizations.
In this section, you will link your self-managed cluster to HCP and examine how these metrics provide insights into your Consul control plane.
Link your self-managed Consul cluster to HCP
Log in to the HCP cloud portal in your browser. Click Get Started with Consul.
Click Self-Managed Consul and select Link existing for the linking method. Click the Get Started button once complete.
Enter a name for your Consul cluster, select the Kubernetes runtime, and select Read/Write as the cluster access mode. We recommend using the cluster’s datacenter name as the cluster ID in this field. Click the Continue button once complete.
Select your preferred tool for updating your Consul deployment, deselect the Consul telemetry collector checkbox, then only perform the first step to set secrets to authenticate with HCP.
The Consul telemetry collector collects metrics from the Consul data plane. This tutorial focuses on Consul control plane. Follow the Consul proxy metrics tutorial to learn more about how HCP Consul Central can provide you with data plane metrics.
Confirm you set the Kubernetes secrets required for linking your self-managed Consul cluster to HCP Consul Central. You should find three secrets that start with consul-hcp.
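A quick way to run this check is to list secret names and filter on the prefix. The hcp_secrets helper is a hypothetical name, and the consul namespace default is an assumption.

```shell
# List the Kubernetes secrets whose names start with "consul-hcp".
# The namespace defaults to "consul", which is an assumption.
hcp_secrets() {
  kubectl get secrets --namespace "${1:-consul}" --output name |
    grep '^secret/consul-hcp'
}

# Usage: expect three lines of output after you set the HCP secrets.
#   hcp_secrets
```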
Review the Consul values file
Consul lets you connect your self-managed cluster with HCP Consul. Review the snippet in the values file below to see the parameters that enable this feature.
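As an illustration of what such a snippet might contain, the sketch below writes a values fragment to a scratch file and prints it. The global.cloud keys come from the official Consul Helm chart. The secret names and keys are assumptions that follow the consul-hcp prefix mentioned earlier, so compare them with the secrets you created.

```shell
# Write an illustrative HCP linking fragment to a scratch file for reading.
# Secret names and keys are assumptions; match them to your own secrets.
link_sketch="$(mktemp)"
cat > "$link_sketch" <<'EOF'
global:
  cloud:
    enabled: true
    resourceId:
      secretName: consul-hcp-resource-id
      secretKey: resource-id
    clientId:
      secretName: consul-hcp-client-id
      secretKey: client-id
    clientSecret:
      secretName: consul-hcp-client-secret
      secretKey: client-secret
EOF
cat "$link_sketch"
```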
Link your cluster to HCP Consul Central
Configure your Consul cluster to link to HCP Consul Central.
Update Consul in your Kubernetes cluster with the Consul K8S CLI. Confirm the run by entering y.
The Consul update could take up to 5 minutes to complete.
Review the official Helm chart values to learn more about these settings.
(Optional) Explore HCP Consul observability dashboard
HCP Consul control plane metrics provide you with detailed health and performance information for your self-managed or HCP Consul Dedicated clusters. In this section, you will examine how these metrics provide insights into your Consul control plane.
Return to the HCP dashboard page in your browser. It may take a moment to sync with your self-managed Consul cluster.
Click Observability on the navigation pane and scroll through the server health section to explore the observability insights.
HCP Consul contains a large set of statistics that you can utilize to monitor your control plane. Refer to the HCP Consul Central observability documentation for a complete list and description of available metrics.
Clean up self-managed HCP resources
Open the HCP Consul Central portal and unlink your self-managed cluster to clean up your HCP resources.
Clean up resources
Destroy the Terraform resources to clean up your environment. Confirm the destroy operation by entering yes.
Note
Due to race conditions with the cloud resources in this tutorial, you may need to run the destroy operation twice to remove all the resources.
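The retry described in this note can be sketched as a small loop around terraform destroy. The destroy_with_retry helper is a hypothetical name, and the default of two attempts simply mirrors the note above.

```shell
# Run "terraform destroy" up to max_attempts times, stopping at the first
# success. Each run prompts for confirmation; enter "yes" to proceed.
destroy_with_retry() {
  max_attempts="${1:-2}"
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    echo "terraform destroy: attempt $attempt of $max_attempts"
    if terraform destroy; then
      return 0
    fi
    attempt=$((attempt + 1))
  done
  return 1
}
```

Run destroy_with_retry from the directory that holds the Terraform files. A non-zero exit status means resources may remain, so review the Terraform output before retrying.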
Next steps
In this tutorial, you enabled Consul server metrics and logs to enhance the health and performance monitoring of your Consul cluster. This integration offers increased control plane understanding, reduced operational overhead, and faster incident resolution.
For more information about the topics covered in this tutorial, refer to the following resources: