Detecting Drift

One of the core challenges of infrastructure as code is keeping an up-to-date record of all deployed infrastructure and their properties. Terraform manages this by maintaining state information in a single file, called the state file.

Terraform uses declarative configuration files to define the infrastructure resources to provision. This configuration serves as the target source of truth for what exists on the backend API. Changes to Infrastructure outside of Terraform will be detected as deviation by Terraform and shown as a diff in future runs of terraform plan. This type of change is referred to as "drift", and its detection is an important responsibility of Terraform in order to inform users of changes in their infrastructure. Here are a few techniques for developers to ensure drift is detected.

Capture all state in READ

A provider's READ method is where state is synchronized from the remote API to Terraform state. It's essential that all attributes defined in the schema are recorded and kept up-to-date in state. Consider this provider code:

// resource_example_simple.go
package example

func resourceExampleSimple() *schema.Resource {
    return &schema.Resource{
        Read:   resourceExampleSimpleRead,
        Create: resourceExampleSimpleCreate,
        Schema: map[string]*schema.Schema{
            "name": {
                Type:     schema.TypeString,
                Required: true,
                ForceNew: true,
            },
            "type": {
                Type:     schema.TypeString,
                Optional: true,
            },
        },
    }
}

func resourceExampleSimpleRead(d *schema.ResourceData, meta any) error {
   client := meta.(*ProviderApi).client
   resource, _ := client.GetResource(d.Id())
   d.Set("name", resource.Name)
   d.Set("type", resource.Type)
   return nil
}

As defined in the schema, the type attribute is optional, now consider this config:

# config.tf
resource "simple" "ex" {
   name = "example"
}

Even though type is omitted from the config, it is vital that we record it into state in the READ function, as the backend API could set it to a default value. To illustrate the importance of capturing all state consider a configuration that interpolates the optional value into another resource:

resource "simple" "ex" {
   name = "example"
}

resource "another" "ex" {
  name = "${simple.ex.type}"
}

Update state after modification

A provider's CREATE and UPDATE functions will create or modify resources on the remote API. APIs might perform things like provide default values for unspecified attributes (as described in the above example config/provider code), or normalize inputs (lower or upper casing all characters in a string). The end result is a backend API containing modified versions of values that Terraform has in its state locally. Immediately after creation or updating of a resource, Terraform will have a stale state, which will result in a detected deviation on subsequent plan or applys, as Terraform refreshes its state and wants to reconcile the diff. Because of this, it is standard practice to call READ at the end of all modifications to synchronize immediately and avoid that diff.

func resourceExampleSimpleRead(d *schema.ResourceData, meta any) error {
   client := meta.(*ProviderApi).client
   resource, _ := client.GetResource(d.Id())
   d.Set("name", resource.Name)
   d.Set("type", resource.Type)
   return nil
}

func resourceExampleSimpleCreate(d *schema.ResourceData, meta any) error {
   client := meta.(*ProviderApi).client
   name := d.Get("name").(string)
   client.CreateResource(name)
   d.SetId(name)
   return resourceExampleSimpleRead(d, meta)
}

Error checking aggregate types

Terraform schema is defined using primitive types and aggregate types. The preceding examples featured primitive types which don't require error checking. Aggregate types on the other hand, schema.TypeList, schema.TypeSet, and schema.TypeMap, are converted to key/value pairs when set into state. As a result the Set method must be error checked, otherwise Terraform will think it's operation was successful despite having broken state. The same can be said for error checking API responses.

# config.tf
resource "simple" "ex" {
   name = "example"
   type = "simple"
   tags = {
      name = "example"
   }
}

// resource_example_simple.go
package example

func resourceExampleSimple() *schema.Resource {
    return &schema.Resource{
        Read:   resourceExampleSimpleRead,
        Create: resourceExampleSimpleCreate,
        Schema: map[string]*schema.Schema{
            "name": {
                Type:     schema.TypeString,
                Required: true,
                ForceNew: true,
            },
            "type": {
                Type:     schema.TypeString,
                Optional: true,
            },
            "tags": {
                Type:     schema.TypeMap,
                Optional: true,
            },
        },
    }
}

func resourceExampleSimpleRead(d *schema.ResourceData, meta any) error {
   client := meta.(*ProviderApi).client
   resource, err := client.GetResource(d.Id())
   if err != nil {
      return fmt.Errorf("error getting resource %s: %s", d.Id(), err)
   }
   d.Set("name", resource.Name)
   d.Set("type", resource.Type)
   if err := d.Set("tags", resource.TagMap); err != nil {
      return fmt.Errorf("error setting tags for resource %s: %s", d.Id(), err)
   }
   return nil
}

Use Schema Helper methods

As mentioned, remote APIs can often perform mutations to the attributes of a resource outside of Terraform's control. Common examples include data containing uppercase letters and being normalized to lowercase, or complex defaults being set for unset attributes. These situations expectedly result in drift, but can be reconciled by using Terraform's schema functions, such as DiffSuppressFunc or DefaultFunc.