Consul Multi-Cluster reference architecture
This guide applies to Consul versions 1.8 - 1.10.
This guide describes recommended best practices for infrastructure architects and operators deploying multiple federated Consul clusters in a production environment. It complements the single-cluster deployment guide, since many of the same recommendations apply. These recommendations have also been encoded into official Terraform modules.
Note
If you are deploying Consul to Kubernetes, please refer to the Consul on Kubernetes Reference Architecture.
WAN federation of Consul clusters allows you to deploy the same services in different datacenter locations or cloud regions and to discover services across them. The Consul clusters operate independently and communicate over the WAN on specific ports (the ports vary according to the approach used, as described below). Unless a request explicitly targets another datacenter via the CLI or API, Consul servers return results only from their local datacenter location or cloud region.
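As a minimal sketch of what "explicitly configured via API" can look like, the helpers below build a health-endpoint URL with the `dc` query parameter and a datacenter-qualified DNS name. The base URL, the service name `web`, and the datacenter name `dc2` are placeholder assumptions, and nothing is sent over the network.

```python
from urllib.parse import quote, urlencode


def health_service_url(base_url, service, datacenter=None):
    """Build a /v1/health/service URL, optionally targeting a remote datacenter."""
    url = f"{base_url.rstrip('/')}/v1/health/service/{quote(service, safe='')}"
    if datacenter is None:
        # No dc parameter: the local datacenter answers the request.
        return url
    return f"{url}?{urlencode({'dc': datacenter})}"


def service_dns_name(service, datacenter=None, domain="consul"):
    """Build the DNS name for a service, optionally qualified by datacenter."""
    if datacenter is None:
        return f"{service}.service.{domain}"
    return f"{service}.service.{datacenter}.{domain}"


if __name__ == "__main__":
    # Placeholder agent address and names, for illustration only.
    print(health_service_url("http://127.0.0.1:8500", "web"))
    print(health_service_url("http://127.0.0.1:8500", "web", datacenter="dc2"))
    print(service_dns_name("web", datacenter="dc2"))
```

Without the `dc` parameter (or the datacenter label in the DNS name), the request is answered from the local datacenter only.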
There are two primary approaches for federating multiple Consul clusters together:
- Basic WAN federation - allows for service discovery across all datacenter locations or cloud regions; requires that all Consul server agents in every datacenter location or cloud region can reach each other on TCP/UDP port 8302 (WAN gossip) and TCP port 8300 (remote RPC).
- Advanced WAN federation, with network areas - allows for service discovery and service mesh in defined datacenter locations; requires an Enterprise license; requires only TCP port 8300 (remote RPC).
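The port requirements of the two approaches can be captured in a small lookup, shown here as a sketch. The mode names `basic` and `advanced` and the helper names are illustrative assumptions. The TCP probe only checks that a connection can be opened, and UDP reachability is not tested.

```python
import socket

# Ports each approach needs between Consul servers, as listed above.
FEDERATION_PORTS = {
    "basic": [("tcp", 8300), ("tcp", 8302), ("udp", 8302)],
    "advanced": [("tcp", 8300)],
}


def required_ports(mode):
    """Return the (protocol, port) pairs required for a federation mode."""
    try:
        return list(FEDERATION_PORTS[mode])
    except KeyError:
        raise ValueError(f"unknown federation mode: {mode!r}") from None


def tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for mode in ("basic", "advanced"):
        print(mode, required_ports(mode))
```

A pre-flight check could call `tcp_port_open` for each TCP entry against every remote server address before attempting to federate.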
Recommended architecture
Basic WAN federation
There are several important considerations for basic WAN federation.
- All Consul clusters must be connected as a full mesh. This means that all Consul servers, in each datacenter location, must be able to communicate with each other via RPC and gossip.
- Each Consul cluster maintains its own separate LAN gossip pool.
- Secure Consul clusters require TLS encryption, gossip encryption, and ACL replication.
- All Consul servers in all federated datacenter locations must use RPC certificates signed by the same Certificate Authority (CA). All Consul datacenter names should be referenced in the certificates' Subject Alternative Name (SAN) field (e.g. server.dc1.consul, server.dc2.consul, etc.).
- The first Consul cluster created is designated as primary and is the authority for some global state (e.g. ACLs, Intentions, Connect CA).
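The considerations above map onto a handful of agent settings. The sketch below generates a JSON configuration for a server in a secondary datacenter, using the top-level TLS options that apply to the Consul versions this guide covers. The file paths, addresses, and gossip key are hypothetical placeholders, and a real secondary also needs an ACL replication token, which is deliberately left out here.

```python
import json


def server_san_names(datacenters, domain="consul"):
    """Return the server.<dc>.<domain> names to include in certificate SANs."""
    return [f"server.{dc}.{domain}" for dc in datacenters]


def secondary_server_config(datacenter, primary_datacenter,
                            primary_wan_addresses, gossip_key):
    """Build an agent config dict for a server in a secondary datacenter."""
    return {
        "server": True,
        "datacenter": datacenter,
        "primary_datacenter": primary_datacenter,
        # WAN-join the servers of the primary datacenter.
        "retry_join_wan": list(primary_wan_addresses),
        # Gossip encryption key, shared by all federated clusters.
        "encrypt": gossip_key,
        # TLS for RPC; the paths below are hypothetical.
        "verify_incoming": True,
        "verify_outgoing": True,
        "verify_server_hostname": True,
        "ca_file": "/etc/consul.d/tls/ca.pem",
        "cert_file": "/etc/consul.d/tls/server.pem",
        "key_file": "/etc/consul.d/tls/server-key.pem",
        "acl": {
            "enabled": True,
            "default_policy": "deny",
            "enable_token_replication": True,
        },
    }


if __name__ == "__main__":
    config = secondary_server_config(
        "dc2", "dc1", ["10.0.0.10", "10.0.0.11"], "<gossip-key-placeholder>"
    )
    print(json.dumps(config, indent=2))
    print(server_san_names(["dc1", "dc2"]))
```

Consul agents accept JSON configuration files, so the printed output could be written to a file in the agent's configuration directory once the placeholders are replaced.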
For organizations with large numbers of datacenter locations, it becomes increasingly difficult to support a fully connected mesh. It is often desirable to use topologies like hub-and-spoke, where one location acts as a central management location ("hub") and other locations ("spokes") can communicate with the hub but not with each other. Advanced WAN federation addresses this concern and is described in the next section.
Advanced WAN federation
There are several important considerations for Consul clusters leveraging advanced WAN federation.
- Advanced WAN federation supports hub-and-spoke Consul clusters, enabling partially connected network topologies.
- Only remote RPC connectivity is required as WAN gossip is not needed with this approach.
- Each Consul cluster maintains its own separate LAN gossip pools.
- Secure Consul clusters require TLS encryption and ACL replication.
- All Consul servers in federated Consul datacenters must use RPC certificates signed by the same Certificate Authority (CA). All Consul datacenter names should be referenced in the certificates' Subject Alternative Name (SAN) field (e.g. server.dc1.consul, server.dc2.consul, etc.).
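Network areas are managed through the Enterprise operator API. The sketch below only constructs the HTTP request that would create an area towards a peer datacenter and does not send it. The endpoint path and the `PeerDatacenter`, `RetryJoin`, and `UseTLS` field names follow the Consul Enterprise operator area API as best recalled and should be checked against the documentation for your version. The address and datacenter name are placeholders.

```python
import json
from urllib.request import Request


def create_area_request(base_url, peer_datacenter, retry_join=(),
                        use_tls=True, token=None):
    """Build (but do not send) a request that creates a network area."""
    body = {
        "PeerDatacenter": peer_datacenter,
        # Addresses of Consul servers in the peer datacenter to join.
        "RetryJoin": list(retry_join),
        "UseTLS": use_tls,
    }
    headers = {"Content-Type": "application/json"}
    if token:
        headers["X-Consul-Token"] = token
    return Request(
        url=f"{base_url.rstrip('/')}/v1/operator/area",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder hub agent address and spoke server address.
    req = create_area_request("http://127.0.0.1:8500", "dc2", ["10.1.2.3"])
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

In a hub-and-spoke layout, the hub would hold one such area per spoke, and each spoke would hold a single area pointing back at the hub.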
To better illustrate network areas, it helps to compare basic WAN federation with advanced WAN federation. In the diagram below, all Consul clusters are connected to each other to allow for both RPC and WAN gossip communication. For large enterprises, it is often not practical to maintain this many connections across complex network topologies, because the number of connections in a full mesh grows quadratically with the number of datacenter locations (n locations require n(n-1)/2 links).
With advanced WAN federation, you can enable a hub-and-spoke model, in which not every Consul cluster needs to communicate with every other Consul cluster. Advanced WAN federation allows for a more centralized approach in which communication between Consul clusters can be controlled. In the example below, Consul cluster 1 can communicate with the other Consul clusters, but Consul clusters 2, 3, and 4 cannot communicate directly with each other.
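To make the difference in connection counts concrete, the following sketch compares the two topologies. It counts one link per pair of connected clusters: a full mesh of n clusters needs n(n-1)/2 links, while a hub-and-spoke layout with a single hub needs n-1. The function names are illustrative.

```python
def full_mesh_links(clusters):
    """Links needed when every cluster connects to every other cluster."""
    if clusters < 1:
        raise ValueError("need at least one cluster")
    return clusters * (clusters - 1) // 2


def hub_and_spoke_links(clusters):
    """Links needed when one hub connects to each of the other clusters."""
    if clusters < 1:
        raise ValueError("need at least one cluster")
    return clusters - 1


if __name__ == "__main__":
    for n in (4, 10, 50):
        print(f"{n} clusters: full mesh={full_mesh_links(n)}, "
              f"hub-and-spoke={hub_and_spoke_links(n)}")
```

For the four-cluster example, a full mesh needs 6 links and hub-and-spoke needs 3. At 50 clusters the figures are 1225 and 49.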
Replication
In general, Consul does not replicate data between multiple Consul clusters. However, there are some special situations where a limited subset of data can be replicated.
Consul’s built-in ACL replication capability enables authentication for agents and services between Consul clusters. The secondary Consul cluster can provide failover for all ACL components created in the primary Consul cluster. Sharing policies reduces operational overhead for the operator. In addition to ACL replication, the following tools can be leveraged for replicating KV store data between Consul clusters:
- consul kv export / consul kv import - move entire directories (key prefixes) between Consul clusters
- consul-replicate - cross-cluster KV replication
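The output of `consul kv export` is a JSON array of entries with `key`, `flags`, and a base64-encoded `value`, which `consul kv import` reads back. As a sketch of how that file could be inspected or adjusted between the two steps, the helpers below decode the values and move entries to a different key prefix. The sample keys and helper names are invented for illustration.

```python
import base64
import json


def decode_export(export_json):
    """Map each exported key to its decoded value (as bytes)."""
    entries = json.loads(export_json)
    return {entry["key"]: base64.b64decode(entry["value"]) for entry in entries}


def reprefix_export(export_json, old_prefix, new_prefix):
    """Return export JSON with old_prefix replaced by new_prefix on matching keys."""
    rewritten = []
    for entry in json.loads(export_json):
        key = entry["key"]
        if key.startswith(old_prefix):
            key = new_prefix + key[len(old_prefix):]
        # Copy the entry so flags and value are carried over unchanged.
        rewritten.append({**entry, "key": key})
    return json.dumps(rewritten, indent=2)


if __name__ == "__main__":
    # Hand-written sample in the export format, not real exported data.
    sample = json.dumps([
        {"key": "app/config/db", "flags": 0,
         "value": base64.b64encode(b"postgres").decode("ascii")},
    ])
    print(decode_export(sample))
    print(reprefix_export(sample, "app/", "dc2/app/"))
```

The rewritten JSON could then be passed to `consul kv import` against the destination cluster.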
Note
As referenced in the Consul Security Considerations documentation, it is good practice to enable TLS server name-checking in order to avoid accidental cross-joining of Consul agents.
Failover options
Consul's prepared queries allow clients to fail over to another Consul cluster for service discovery. For example, if a service in the local Consul cluster goes down, a prepared query lets operators define a geographic fallback order so that the nearest datacenter locations are checked for healthy instances of the same service. Additionally, prepared queries support dynamic and hybrid failover policies that allow for greater flexibility.
Note
Consul clusters must be WAN linked for a prepared query to work across datacenter locations. For additional disaster recovery failover considerations, refer to the Consul Disaster Recovery Considerations documentation.
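A prepared query with a failover policy is defined as a JSON document and registered through the `/v1/query` endpoint. The sketch below only builds that document. The query name, service name, and datacenter names are placeholders. `NearestN` asks Consul to try that many of the nearest datacenters by network round-trip time, and `Datacenters` is an explicit fallback list tried afterwards, so setting both gives a hybrid policy.

```python
import json


def failover_query(name, service, nearest_n=0, datacenters=(),
                   only_passing=True):
    """Build a prepared query definition with a failover policy."""
    return {
        "Name": name,
        "Service": {
            "Service": service,
            # Only return instances whose health checks are passing.
            "OnlyPassing": only_passing,
            "Failover": {
                "NearestN": nearest_n,
                "Datacenters": list(datacenters),
            },
        },
    }


if __name__ == "__main__":
    # Placeholder names: try the 2 nearest datacenters, then dc3.
    definition = failover_query("web-failover", "web",
                                nearest_n=2, datacenters=["dc3"])
    print(json.dumps(definition, indent=2))
```

Once registered, such a query can be resolved by name, for example through DNS as `web-failover.query.consul`, and Consul applies the failover order when no healthy local instance is available.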
Next steps
- Consul Deployment Guide
- Production Readiness Checklist
- Consul Disaster Recovery Considerations
- Consul Security Considerations
- Federate Multiple Datacenters Using WAN Gossip
- Federate Multiple Datacenters with Network Areas
- ACL Replication for Multiple Datacenters