Scale Terraform Enterprise instances hosted on Docker
This topic describes how to migrate Terraform Enterprise instances hosted on Docker to active-active mode so that you can scale your deployment.
Introduction
When your organization requires increased reliability or performance from Terraform Enterprise that your current single application instance cannot provide, we recommend switching to active-active mode. In this mode, Terraform Enterprise connects to external systems that store and manage application and state data.
Operating Terraform Enterprise in active-active mode improves application scalability, but it also increases operational complexity. Consider the following aspects of operating Terraform Enterprise in active-active mode:
- Observability concerns when monitoring multiple instances
- Custom automation required to manage the lifecycle of application nodes
- CLI-based commands for administration
Note: Contact your Customer Success Manager before attempting to follow this guide. They can walk you through the process and make it as seamless as possible.
Prerequisite
The primary requirement for active-active mode is an auto-scaling group (ASG), or equivalent, with a single instance running Terraform Enterprise. The auto-scaling group should sit behind a load balancer, which you can expose to the public internet or keep internal, depending on your requirements.
Your Terraform Enterprise installation must be completely automated so that the auto-scaling group can scale down to zero and back up to one without human intervention.
Note: Operating Terraform Enterprise in active-active mode on VMware infrastructure requires configuring a load balancer to route traffic across Terraform Enterprise servers. This documentation does not cover that setup. Although auto-scaling groups are unavailable through native vCenter options, you must still configure a fully automated deployment, and you must reduce the available servers to one for upgrades, maintenance, and support.
Your Terraform Enterprise application must be configured to run in external operational mode, connecting to an external PostgreSQL database and object storage.
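For reference, the relevant environment settings in a Docker Compose file might look like the following minimal sketch. The image tag, hostnames, bucket name, and credentials are placeholders, and the exact set of variables depends on your database and storage backend:

```yaml
services:
  tfe:
    image: images.releases.hashicorp.com/hashicorp/terraform-enterprise:<version>
    environment:
      TFE_OPERATIONAL_MODE: "external"
      # External PostgreSQL (placeholder values)
      TFE_DATABASE_HOST: "postgres.example.com:5432"
      TFE_DATABASE_NAME: "tfe"
      TFE_DATABASE_USER: "tfe"
      TFE_DATABASE_PASSWORD: "<password>"
      # External object storage on S3 (placeholder values)
      TFE_OBJECT_STORAGE_TYPE: "s3"
      TFE_OBJECT_STORAGE_S3_BUCKET: "tfe-data"
      TFE_OBJECT_STORAGE_S3_REGION: "us-east-1"
```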
Step 1: Prepare to Externalize Redis
Before reconfiguring Terraform Enterprise, you must externalize Redis. Terraform Enterprise uses Redis to schedule background work across the nodes in an active-active installation.
Prepare Network
There are new access requirements involving ingress and egress:
- Port 6379 (or the port the external Redis uses) must be open between the nodes and the Redis service.
- Port 8201 must be open between the nodes to allow Vault to run in High Availability mode.
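As a sketch of these access requirements on AWS, hypothetical Terraform security group rules might look like the following; the security group resources and names are placeholders for your environment:

```hcl
# Hypothetical rule: allow the Terraform Enterprise nodes to reach Redis.
resource "aws_security_group_rule" "tfe_to_redis" {
  type                     = "ingress"
  from_port                = 6379
  to_port                  = 6379
  protocol                 = "tcp"
  security_group_id        = aws_security_group.redis.id
  source_security_group_id = aws_security_group.tfe.id
}

# Hypothetical rule: allow node-to-node traffic for Vault High Availability mode.
resource "aws_security_group_rule" "tfe_vault_ha" {
  type                     = "ingress"
  from_port                = 8201
  to_port                  = 8201
  protocol                 = "tcp"
  security_group_id        = aws_security_group.tfe.id
  source_security_group_id = aws_security_group.tfe.id
}
```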
Provision Redis
Externalizing Redis allows multiple active application nodes. For VMware installations, you can install Redis as a standard product on your own machines; Terraform Enterprise is also validated to work with the native Redis services from AWS, Azure, and GCP. Terraform Enterprise is not compatible with Redis Cluster or Redis Sentinel.
Refer to the cloud-specific configuration guides for more details.
Step 2: Update your Configuration File Templates
Before installing, you must change the templates for the configuration files mentioned in the prerequisites.
Update Application Settings
Your existing Terraform Enterprise application settings are still necessary, but must be expanded. Refer to Configuration Reference for a full list of configuration options.
Enable active-active mode
Update the Terraform Enterprise configuration to reflect the active-active operational mode:
Key | Required Value | Specific Format Required |
---|---|---|
TFE_OPERATIONAL_MODE | active-active | Yes, string. |
Configure External Redis
You must also expand your Terraform Enterprise application settings to support an external Redis instance:
Key | Required Value | Specific Format Required |
---|---|---|
TFE_REDIS_HOST | Hostname in host:port format of an external Redis instance. | Yes, string. |
TFE_REDIS_USE_AUTH* | Set to true if you are using a Redis service that requires a password. | Yes, boolean. |
TFE_REDIS_USER* | User used to authenticate to Redis. | Yes, string. |
TFE_REDIS_PASSWORD* | Password used to authenticate to Redis. | Yes, string. |
TFE_REDIS_USE_TLS* | Set to true if you are using a Redis service that requires TLS. | Yes, boolean. |
* Fields marked with an asterisk are only necessary if your particular external Redis instance requires them.
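Taken together, the settings above might appear in your Docker Compose environment block as follows. The hostname and password are placeholders, and the conditional settings only apply if your Redis service requires them:

```yaml
services:
  tfe:
    environment:
      TFE_OPERATIONAL_MODE: "active-active"
      TFE_REDIS_HOST: "redis.example.com:6379"
      # Only needed if your Redis service requires authentication:
      TFE_REDIS_USE_AUTH: "true"
      TFE_REDIS_USER: "tfe"
      TFE_REDIS_PASSWORD: "<password>"
      # Only needed if your Redis service requires TLS:
      TFE_REDIS_USE_TLS: "true"
```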
To use in-transit encryption with GCP Memorystore for Redis, you must download the CA certificate for your Redis instance and configure it within the ca_certs Terraform Enterprise application setting. Additionally, ensure that you configure the redis_port and redis_use_tls settings correctly.
Add Encryption Password
Add the encryption password value to your configuration. The password must be identical between node instances for the active-active architecture to function:
Key | Description | Value can change between deployments? | Specific Format Required |
---|---|---|---|
TFE_ENCRYPTION_PASSWORD | Used to encrypt sensitive data | No. Changing it makes decrypting existing data impossible. | No |
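One way to generate a suitably strong value is with openssl. This is only a sketch; any sufficiently random secret works, as long as every node receives the identical value:

```shell
# Generate a random 32-byte, base64-encoded encryption password once,
# then distribute the same value to every node's configuration.
TFE_ENCRYPTION_PASSWORD="$(openssl rand -base64 32)"
echo "${TFE_ENCRYPTION_PASSWORD}"
```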
Step 3: Connect to External Redis
Once you are ready to include the modified configuration options in your configuration files, connect a single node to your newly provisioned Redis service by rebuilding your node instance with the new settings.
Re-provision Terraform Enterprise Instance
Terminate the existing instance by scaling down to zero. Once terminated, you can scale back up to one instance using your revised configuration.
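With the AWS CLI, for example, the scale-down and scale-up might look like the following sketch; the group name is a placeholder for your environment:

```shell
# Scale the group to zero to terminate the existing instance
# (the group name "tfe-asg" is a placeholder).
aws autoscaling update-auto-scaling-group \
  --auto-scaling-group-name tfe-asg \
  --min-size 0 --max-size 0 --desired-capacity 0

# Once the instance is terminated, scale back up to one instance
# running the revised configuration.
aws autoscaling update-auto-scaling-group \
  --auto-scaling-group-name tfe-asg \
  --min-size 1 --max-size 1 --desired-capacity 1
```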
Wait for Terraform Enterprise to Install
It can take up to 15 minutes for the node to provision and install the Terraform Enterprise application. You can monitor the provisioning status by watching your auto-scaling group in your cloud provider's web console. To confirm that the Terraform Enterprise application installed successfully, use the tfectl CLI tool inside the Terraform Enterprise container to monitor the application status:
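For example, assuming your container is named terraform-enterprise (a placeholder), you can run a status command from the Docker host:

```shell
# Run tfectl inside the running container (container name is a placeholder)
# to check the application status.
docker exec terraform-enterprise tfectl app status
```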
Refer to the CLI reference for more status and troubleshooting commands.
Validate Application
With installation complete, validate the new Redis connection. Terraform Enterprise uses Redis both as a cache for API requests and as a queue for long-running jobs, such as Terraform runs. Test the queue by running real Terraform operations through the system.
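Before running Terraform operations, you can optionally confirm basic network connectivity from a node with redis-cli, if it is available in your environment; the hostname is a placeholder:

```shell
# A PONG reply confirms the node can reach the external Redis service.
redis-cli -h redis.example.com -p 6379 PING
```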
Once you are satisfied that the application is running as expected, move on to Step 4 to scale up to two nodes.
Step 4: Scale to Two Nodes
You can now safely change the number of instances in your auto scaling group (or equivalent) to two.
Scale Down to Zero Nodes
Scale down to zero nodes to fully disable the admin dashboard. Wait until the existing instance is terminated.
Scale Up to Two Nodes
Now that you have tested your external Redis connection, change the minimum and maximum instance counts of your auto-scaling group to two nodes.
Wait for Terraform Enterprise to Install
You must wait up to 15 minutes for the application to respond as healthy on both nodes. Monitor the status of the install using the same methods as before, and note that you must check each node independently.
Validate Application
Finally, confirm the application is functioning as expected when running multiple nodes by running Terraform plans and applying them through the system (and any other tests specific to your environment).
Confirm the general functionality of the Terraform Enterprise user interface to validate that the settings you added in Step 2 are configured correctly. Browse the Run interface and your organization's private registry to confirm that your application functions as expected.
Scaling Beyond Two Nodes
Terraform Enterprise supports scaling up to five nodes as part of the Active/Active deployment. When scaling beyond two nodes, you should also carefully evaluate and scale external services, particularly the database server. Regardless of the number of nodes, you must drain and scale down to a single node before upgrading.
PostgreSQL Server
The Terraform Enterprise PostgreSQL server typically reaches CPU capacity before other resources, so we recommend closely monitoring CPU in a two-node configuration before scaling up to three or more nodes. You may also need to manually increase the database's maximum connection count to accommodate the additional load. Defaults vary, so refer to the documentation for the cloud hosting your installation.
- AWS - RDS connection limits
- AWS - Aurora Scaling
- Azure - Azure Database Limits
- Google Cloud - Cloud SQL Quotas and Limits
- PostgreSQL 12 - Connection Documentation
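To gauge connection pressure, you can compare the current connection count against the configured maximum directly in PostgreSQL. This is a generic sketch, not a Terraform Enterprise-specific query:

```sql
-- Configured connection ceiling for this server.
SHOW max_connections;

-- Connections currently open, per pg_stat_activity.
SELECT count(*) FROM pg_stat_activity;
```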
Redis Server
Some workloads may occasionally cause spikes in the Redis server's CPU or memory usage. We recommend monitoring the Redis server and scaling it up as necessary.
- AWS - Monitoring ElastiCache for Redis with CloudWatch
- Azure - Monitor Azure Cache for Redis
- Google Cloud - Monitoring Redis Instances
Network Infrastructure/API Limits
As you scale Terraform Enterprise beyond two nodes, you may add stress to your network and dramatically increase the number of API calls made in your cloud account. Each cloud has its own default limits and processes by which those limits can be increased. Refer to the documentation for the cloud hosting your installation.
- AWS - EC2 instance network limits
- AWS - Request Throttling for the EC2 API
- Azure - Virtual Machine Network Limits
- Azure - Resource Manager Throttling
- Google Cloud - Network Quotas and Limits
- Google Cloud - API rate limits
Depending on your infrastructure and Terraform Enterprise configuration, you may need to configure your application gateway or load balancer for sticky sessions. Sticky sessions refer to the practice of enabling a setting on a load balancer or gateway device that ensures traffic is routed back to the system that served the original request. For example, an Active/Active deployment on Azure with SAML authentication requires sticky sessions so that authentication with the SAML server succeeds. The terminology varies across clouds; refer to the documentation for your infrastructure.
- AWS - Sticky Sessions for your Application Load Balancer
- AWS - Configure sticky sessions for your Classic Load Balancer
- Azure - Application Gateway Cookie-based affinity
- Azure - Load Balancer distribution modes: Session Persistence
- Google Cloud - Session affinity
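As a sketch of what this looks like in Terraform on AWS, an Application Load Balancer target group can enable cookie-based stickiness as follows; the resource names, ports, and duration are placeholders:

```hcl
# Hypothetical ALB target group with cookie-based sticky sessions enabled.
resource "aws_lb_target_group" "tfe" {
  name     = "tfe"
  port     = 443
  protocol = "HTTPS"
  vpc_id   = aws_vpc.main.id

  stickiness {
    type            = "lb_cookie"
    enabled         = true
    cookie_duration = 86400 # seconds
  }
}
```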
HCP Terraform agents - Alternative Solution
Instead of scaling Terraform Enterprise beyond two to five nodes, you can use HCP Terraform agents. Agents can run in other regions, other clouds, and even private clouds. Agents poll Terraform Enterprise for work, then run Terraform plans and applies on the target system where the agent executable is installed. This has a much smaller impact on the Terraform Enterprise servers than running Terraform locally.