How To Protect Sensitive Data in Terraform

Updated on November 2, 2021

By Savic and Kathryn Hancox

The author selected the Free and Open Source Fund to receive a donation as part of the Write for DOnations program.

Introduction

Terraform automates the provisioning of your infrastructure in the cloud. To do this, Terraform authenticates with cloud providers (and other providers) to deploy the resources and perform the planned actions. However, the information Terraform needs for authentication is sensitive, and you should always keep it secret because it unlocks access to your services. API keys and passwords for database users are examples of such sensitive data.

If a malicious third party were to acquire the sensitive information, they would be able to breach the security systems by presenting themselves as a known trusted user. In turn, they would be able to modify, delete, and replace the resources and services that are available under the scope of the obtained keys. To prevent this from happening, it is essential to properly secure your project and safeguard its state file, which stores all the project secrets.

By default, Terraform stores the state file locally in the form of unencrypted JSON, allowing anyone with access to the project files to read the secrets. While a solution to this is to restrict access to the files on disk, another option is to store the state remotely in a backend that encrypts the data automatically, such as DigitalOcean Spaces.

In this tutorial, you’ll hide sensitive data in outputs during execution and store your state in secure cloud object storage that encrypts data at rest, using DigitalOcean Spaces as the storage service. You’ll also learn how to mark variables as sensitive and explore tfmask, an open source program written in Go that dynamically censors values in the Terraform execution log output.

Prerequisites

A DigitalOcean Personal Access Token, which you can create via the DigitalOcean control panel. You can find instructions in the DigitalOcean product documents, How to Create a Personal Access Token.
Terraform installed on your local machine and a project set up with the DigitalOcean provider. Complete Step 1 and Step 2 of the How To Use Terraform with DigitalOcean tutorial, and be sure to name the project folder terraform-sensitive, instead of loadbalance. During Step 2, do not include the pvt_key variable and the SSH key resource.
A DigitalOcean Space with API keys (access and secret). To learn how to create a DigitalOcean Space and API keys, see the tutorial, How To Create a DigitalOcean Space and API Key.

Note: This tutorial has specifically been tested with Terraform 1.0.2.

Marking Outputs as `sensitive`

In this step, you’ll hide outputs in code by setting their sensitive parameter to true. This is useful when secret values are part of the Terraform output that you’re storing indefinitely, or if you need to share the output logs beyond your team for analysis.

Assuming you are in the terraform-sensitive directory, which you created as part of the prerequisites, you’ll define a Droplet and an output showing its IP address. You’ll store it in a file named droplets.tf, so create and open it for editing by running:

nano droplets.tf

Add the following lines:

terraform-sensitive/droplets.tf

resource "digitalocean_droplet" "web" {
  image  = "ubuntu-20-04-x64"
  name   = "web-1"
  region = "fra1"
  size   = "s-1vcpu-1gb"
}

output "droplet_ip_address" {
  value = digitalocean_droplet.web.ipv4_address
}

This code will deploy a Droplet called web-1 in the fra1 region, running Ubuntu 20.04 on 1GB RAM and one CPU core. Here you’ve given the droplet_ip_address output a value and you’ll receive this in the Terraform log.

To deploy this Droplet, execute the code by running the following command:

terraform apply -var "do_token=${DO_PAT}"

The actions Terraform will take will be the following:

Output
Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the
following symbols:
  + create

Terraform will perform the following actions:

  # digitalocean_droplet.web will be created
  + resource "digitalocean_droplet" "web" {
      + backups              = false
      + created_at           = (known after apply)
      + disk                 = (known after apply)
      + id                   = (known after apply)
      + image                = "ubuntu-20-04-x64"
      + ipv4_address         = (known after apply)
      + ipv4_address_private = (known after apply)
      + ipv6                 = false
      + ipv6_address         = (known after apply)
      + ipv6_address_private = (known after apply)
      + locked               = (known after apply)
      + memory               = (known after apply)
      + monitoring           = false
      + name                 = "web-1"
      + price_hourly         = (known after apply)
      + price_monthly        = (known after apply)
      + private_networking   = (known after apply)
      + region               = "fra1"
      + resize_disk          = true
      + size                 = "s-1vcpu-1gb"
      + status               = (known after apply)
      + urn                  = (known after apply)
      + vcpus                = (known after apply)
      + volume_ids           = (known after apply)
      + vpc_uuid             = (known after apply)
    }

Plan: 1 to add, 0 to change, 0 to destroy.
...

Enter yes when prompted. The output will look similar to this:

Output
digitalocean_droplet.web: Creating...
...
digitalocean_droplet.web: Creation complete after 40s [id=216255733]

Apply complete! Resources: 1 added, 0 changed, 0 destroyed.

Outputs:

droplet_ip_address = your_droplet_ip_address

The IP address appears in the output. If you share this output with others, or if automated deployment processes make it publicly available, you should take steps to hide this data.

To censor it, you’ll need to set the sensitive attribute of the droplet_ip_address output to true.

Open droplets.tf for editing:

nano droplets.tf

Add the sensitive = true line to the droplet_ip_address output:

terraform-sensitive/droplets.tf

resource "digitalocean_droplet" "web" {
  image  = "ubuntu-20-04-x64"
  name   = "web-1"
  region = "fra1"
  size   = "s-1vcpu-1gb"
}

output "droplet_ip_address" {
  value     = digitalocean_droplet.web.ipv4_address
  sensitive = true
}

Save and close the file when you’re done.

Apply the project again by running:

terraform apply -var "do_token=${DO_PAT}"

The output will be:

Output
digitalocean_droplet.web: Refreshing state... [id=216255733]
...
Apply complete! Resources: 0 added, 0 changed, 0 destroyed.

Outputs:

droplet_ip_address = <sensitive>

You’ve now explicitly censored the IP address, which is the value of the output. Censoring outputs is useful when the Terraform logs would be in a public space, or when you want the values to remain hidden without deleting the outputs from the code. You’ll also want to censor outputs that contain passwords and API tokens, as they are sensitive information as well.
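
Marking an output as sensitive only changes what Terraform prints in the summary after plan and apply; the value is still kept in the state and can be read on request. As a minimal sketch, assuming Terraform is installed and the project has been applied, the hypothetical helper read_output below wraps terraform output -raw, which prints the bare value of a single named output:

```shell
# Hypothetical helper: print the raw value of one named output.
# Requesting an output by name shows its value even when it is marked sensitive.
read_output() {
  terraform output -raw "$1"
}

# Example call (commented out so the sketch has no side effects):
# read_output droplet_ip_address
```

Because anyone who can run this command against the state can recover the value, sensitive protects the logs rather than the data itself.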

Next, you’ll see how to mark variables as sensitive.

Marking Variables as `sensitive`

Similar to outputs, variables can also be marked as sensitive. Since you have only one variable defined (do_token), open provider.tf for editing:

nano provider.tf

Modify the do_token variable to look like this:

terraform-sensitive/provider.tf

terraform {
  required_providers {
    digitalocean = {
      source  = "digitalocean/digitalocean"
      version = "~> 2.0"
    }
  }
}

variable "do_token" {
  sensitive = true
}

provider "digitalocean" {
  token = var.do_token
}

When you’re done, save and close the file. The do_token variable is now considered sensitive.
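
Marking the variable as sensitive hides it in Terraform’s own output, but a value passed with -var is still expanded by the shell into the command’s arguments, where process listings on the same machine may show it. One alternative, sketched here with the hypothetical helper name apply_with_env_token, is Terraform’s TF_VAR_ environment variable convention: Terraform reads TF_VAR_do_token as the value of the do_token variable. The sketch assumes DO_PAT holds your token, as in the commands above.

```shell
# Hypothetical helper: run apply with the token supplied through the environment.
# Terraform maps the environment variable TF_VAR_do_token to the variable "do_token".
apply_with_env_token() {
  TF_VAR_do_token="$DO_PAT" terraform apply "$@"
}

# Example call (commented out so the sketch has no side effects):
# apply_with_env_token
```

The rest of this tutorial keeps using -var, so treat this as an optional variation rather than a required change.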

To try outputting a sensitive variable, you’ll define a new output in droplets.tf:

nano droplets.tf

Add the following lines at the end:

terraform-sensitive/droplets.tf

output "dotoken" {
  value = var.do_token
}

Save and close the file. Then, try applying the configuration by running:

terraform apply -var "do_token=${DO_PAT}"

You’ll receive an error message similar to this:

Output
╷
│ Error: Output refers to sensitive values
│
│   on droplets.tf line 13:
│   13: output "dotoken" {
│
│ To reduce the risk of accidentally exporting sensitive data that was intended to be only internal, Terraform requires
│ that any root module output containing sensitive data be explicitly marked as sensitive, to confirm your intent.
│
│ If you do intend to export this data, annotate the output value as sensitive by adding the following argument:
│     sensitive = true
╵

This error means that, to prevent information leakage, sensitive variables cannot be shown in outputs that are not marked as sensitive. You can, however, force them to be shown by wrapping the output value in nonsensitive, like so:

terraform-sensitive/droplets.tf

...

output "dotoken" {
  value = nonsensitive(var.do_token)
}

The nonsensitive function removes the sensitive marking from the value, allowing it to be shown. Use it sparingly, and only when the output is a non-reversible derivative of the sensitive variable.

You’ve now seen how to mark variables as sensitive, and how to override that preference. In the next step, you’ll configure Terraform to store your project’s state in encrypted cloud storage, instead of locally.

Storing State in an Encrypted Remote Backend

The state file stores all information about your deployed infrastructure, including all its internal relationships and secrets. By default, it’s stored in plaintext, locally on the disk. Storing it remotely in the cloud provides a higher level of security. If the cloud storage service supports encryption at rest, it will store the state file in an encrypted form at all times, so that potential attackers won’t be able to gather information from it. Storing the state file encrypted remotely is different from marking outputs as sensitive: it changes how Terraform stores the data, not whether the data is displayed in the logs.

You’ll now configure your project to store the state file in a DigitalOcean Space. As a result, it will be encrypted at rest and protected with TLS in transit.

By default, the Terraform state file is called terraform.tfstate and is located in the root of every initialized directory. You can view its contents by running:

cat terraform.tfstate

The contents of the file will be similar to this:

terraform-sensitive/terraform.tfstate

{
  "version": 4,
  "terraform_version": "1.0.2",
  "serial": 3,
  "lineage": "16362bdb-2ff3-8ac7-49cc-260f3261d8eb",
  "outputs": {
    "droplet_ip_address": {
      "value": "...",
      "type": "string",
      "sensitive": true
    }
  },
  "resources": [
    {
      "mode": "managed",
      "type": "digitalocean_droplet",
      "name": "web",
      "provider": "provider[\"registry.terraform.io/digitalocean/digitalocean\"]",
      "instances": [
        {
          "schema_version": 1,
          "attributes": {
            "backups": false,
            "created_at": "2021-07-11T06:16:51Z",
            "disk": 25,
            "id": "254368889",
            "image": "ubuntu-20-04-x64",
            "ipv4_address": "...",
            "ipv4_address_private": "10.135.0.3",
            "ipv6": false,
            "ipv6_address": "",
            "locked": false,
            "memory": 1024,
            "monitoring": false,
            "name": "web-1",
            "price_hourly": 0.00744,
            "price_monthly": 5,
            "private_networking": true,
            "region": "fra1",
            "resize_disk": true,
            "size": "s-1vcpu-1gb",
            "ssh_keys": null,
            "status": "active",
            "tags": [],
            "urn": "do:droplet:254368889",
            "user_data": null,
            "vcpus": 1,
            "volume_ids": [],
            "vpc_uuid": "fc52519c-dc84-11e8-8b13-3cfdfea9f160"
          },
          "sensitive_attributes": [],
          "private": "..."
        }
      ]
    }
  ]
}

The state file contains all the resources you’ve deployed, as well as all outputs and their computed values. Gaining access to this file is enough to compromise the entire deployed infrastructure. To prevent that from happening, you can store it encrypted in the cloud.
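
While the state still lives on disk, tightening its file permissions is a stopgap that limits who can read it. The following is a minimal sketch, not part of the original project; lock_down_state is a hypothetical helper name, and the demonstration runs on a throwaway file rather than your real state.

```shell
# Hypothetical helper: allow only the file's owner to read and write it.
lock_down_state() {
  chmod 600 "$1"
}

# Demonstration on a temporary file so nothing in the project is touched.
demo_state="$(mktemp)"
lock_down_state "$demo_state"
ls -l "$demo_state"
rm -f "$demo_state"
```

In the project directory, the equivalent call would be lock_down_state terraform.tfstate. This does not encrypt anything; it only narrows which local users can open the file.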

Terraform supports multiple backends, which are storage and retrieval mechanisms for the state. Examples are: local for local storage, pg for the Postgres database, and s3 for S3-compatible storage, which you’ll use to connect to your Space.

The backend configuration is specified under the main terraform block, which is currently in provider.tf. Open it for editing by running:

nano provider.tf

Add the following lines:

terraform-sensitive/provider.tf

terraform {
  required_providers {
    digitalocean = {
      source  = "digitalocean/digitalocean"
      version = "~> 2.0"
    }
  }

  backend "s3" {
    key                         = "state/terraform.tfstate"
    bucket                      = "your_space_name"
    region                      = "us-west-1"
    endpoint                    = "https://spaces_endpoint"
    skip_region_validation      = true
    skip_credentials_validation = true
    skip_metadata_api_check     = true
  }
}

variable "do_token" {}

provider "digitalocean" {
  token = var.do_token
}

The s3 backend block first specifies the key, which is the location of the Terraform state file on the Space. Passing in state/terraform.tfstate means that you will store it as terraform.tfstate under the state directory.

The endpoint parameter tells Terraform where the Space is located, and bucket defines the exact Space to connect to. The skip_region_validation, skip_credentials_validation, and skip_metadata_api_check parameters disable checks that are specific to AWS and do not apply to DigitalOcean Spaces. Note that region must be set to a conforming value (such as us-west-1), although it has no bearing on Spaces.

Remember to put in your bucket name and the Spaces endpoint, including the region, which you can find in the Settings tab of your Space. Note that the do_token variable is no longer marked as sensitive. When you are done customizing the endpoint, save and close the file.

Next, put the access and secret keys for your Space in environment variables, so you’ll be able to reference them later. Run the following commands, replacing the placeholders with your key values:

export SPACE_ACCESS_KEY="your_space_access_key"
export SPACE_SECRET_KEY="your_space_secret_key"

Then, configure Terraform to use the Space as its backend by running:

terraform init -backend-config "access_key=$SPACE_ACCESS_KEY" -backend-config "secret_key=$SPACE_SECRET_KEY"

The -backend-config argument provides a way to set backend parameters at runtime, which you are using here to set the Space keys. You’ll be asked if you wish to copy the existing state to the cloud, or start anew:

Output
Initializing the backend...
Do you want to copy existing state to the new backend?
  Pre-existing state was found while migrating the previous "local" backend to the
  newly configured "s3" backend. No existing state was found in the newly
  configured "s3" backend. Do you want to copy this state to the new "s3"
  backend? Enter "yes" to copy and "no" to start with an empty state.

Enter yes when prompted. The rest of the output will look similar to the following:

Output
Successfully configured the backend "s3"! Terraform will automatically
use this backend unless the backend configuration changes.

Initializing provider plugins...
- Reusing previous version of digitalocean/digitalocean from the dependency lock file
- Using previously-installed digitalocean/digitalocean v2.10.1

Terraform has been successfully initialized!

You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.

If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.

Your project will now store its state in your Space. If you receive an error, double-check that you’ve provided the correct keys, endpoint, and bucket name.
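
Values passed with -backend-config are saved by Terraform in plain text inside the .terraform directory, so you may prefer to keep the keys out of the command entirely. As a hedged sketch: the s3 backend also reads credentials from the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables. The helper name init_with_env_keys is hypothetical, and the sketch assumes SPACE_ACCESS_KEY and SPACE_SECRET_KEY are exported as shown earlier.

```shell
# Hypothetical helper: initialize the backend with credentials taken from
# the environment instead of -backend-config arguments.
init_with_env_keys() {
  AWS_ACCESS_KEY_ID="$SPACE_ACCESS_KEY" \
  AWS_SECRET_ACCESS_KEY="$SPACE_SECRET_KEY" \
  terraform init "$@"
}

# Example call (commented out so the sketch has no side effects):
# init_with_env_keys
```

The same two environment variables would need to be present for later plan and apply runs, since the keys are then no longer stored in the local backend settings.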

The local state file has been emptied, which you can check by showing its contents:

cat terraform.tfstate

There will be no output, as expected.
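
An empty terraform.tfstate is not the only file worth checking. After a migration, Terraform typically leaves the previous contents in terraform.tfstate.backup next to it, and that file would still hold the unencrypted state. The hypothetical helper leftover_state_files below lists whichever of the two files is non-empty in a given directory:

```shell
# Hypothetical helper: print every non-empty local state file in a directory.
leftover_state_files() {
  dir="${1:-.}"
  for f in "$dir"/terraform.tfstate "$dir"/terraform.tfstate.backup; do
    # -s is true only when the file exists and is larger than zero bytes
    if [ -s "$f" ]; then
      echo "$f"
    fi
  done
}

# Check the current directory; prints nothing when no plaintext state is left.
leftover_state_files .
```

Any file it prints still contains plaintext state, so you would want to delete it or move it somewhere protected once you have confirmed the remote state is intact.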

You can try modifying the Droplet definition and applying it to check that the state is still being correctly managed.

Open droplets.tf for editing:

nano droplets.tf

Change the Droplet’s name to test-droplet and set sensitive to false in the output:

terraform-sensitive/droplets.tf

resource "digitalocean_droplet" "web" {
  image  = "ubuntu-20-04-x64"
  name   = "test-droplet"
  region = "fra1"
  size   = "s-1vcpu-1gb"
}

output "droplet_ip_address" {
  value     = digitalocean_droplet.web.ipv4_address
  sensitive = false
}

You can remove the dotoken output from before. Save and close the file, then apply the project by running:

terraform apply -var "do_token=${DO_PAT}"

The output will look similar to the following:

Output
Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  ~ update in-place

Terraform will perform the following actions:

  # digitalocean_droplet.web will be updated in-place
  ~ resource "digitalocean_droplet" "web" {
        id                   = "254368889"
      ~ name                 = "web-1" -> "test-droplet"
        tags                 = []
        # (21 unchanged attributes hidden)
    }

Plan: 0 to add, 1 to change, 0 to destroy.
...

Enter yes when prompted, and Terraform will apply the new configuration to the existing Droplet, which means that it’s correctly communicating with the Space its state is stored on:

Output
...
digitalocean_droplet.web: Modifying... [id=216419273]
digitalocean_droplet.web: Still modifying... [id=216419273, 10s elapsed]
digitalocean_droplet.web: Modifications complete after 12s [id=216419273]

Apply complete! Resources: 0 added, 1 changed, 0 destroyed.

Outputs:

droplet_ip_address = your_droplet_ip_address

You’ve configured the s3 backend for your project so that you’re storing the state encrypted in the cloud in a DigitalOcean Space. In the next step, you’ll use tfmask, a tool that will dynamically censor all sensitive outputs and information in Terraform logs.

Using `tfmask` in CI/CD Environments

In this section, you’ll download tfmask and use it to dynamically censor sensitive data from the whole output log Terraform generates when executing a command. It will censor the values of the variables and parameters whose names are matched by a regular expression that you provide.

Dynamically matching parameter and variable names is possible when they follow a pattern (for example, contain the word password or secret). The advantage of using tfmask over marking the outputs as sensitive is that it also censors matched parts of the resource declarations that Terraform prints out while executing. It’s imperative that you hide them when the execution logs may be public, such as in automated CI/CD environments, which often list execution logs publicly.

Compiled binaries of tfmask are available at its releases page on GitHub. For Linux, run the following command to download it:

sudo curl -L https://github.com/cloudposse/tfmask/releases/download/0.7.0/tfmask_linux_amd64 -o /usr/bin/tfmask

Mark it as executable by running:

sudo chmod +x /usr/bin/tfmask

tfmask works on the outputs of terraform plan and terraform apply by masking the values of all variables whose names are matched by a regular expression that you specify. You will use the environment variables TFMASK_VALUES_REGEX and TFMASK_CHAR to supply the regular expression and the character that replaces the actual values.

You’ll now use tfmask to censor the name and ipv4_address of the Droplet that Terraform would deploy. First, you’ll need to set the mentioned environment variables by running:

export TFMASK_CHAR="*"
export TFMASK_VALUES_REGEX="(?i)^.*(ipv4_address|name).*$"

This regular expression will match any string that contains ipv4_address or name (including those two strings themselves), and the match is not case sensitive.
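
To see which attribute names such a pattern catches, you can test candidates against it. The sketch below approximates the check with grep -E, which has no (?i) flag, so the -i option stands in for case insensitivity. The attribute list and the would_mask helper are illustrative, and the sketch assumes, as described above, that the pattern is tested against names.

```shell
# The tfmask pattern without its (?i) prefix; grep -i supplies case insensitivity.
pattern='^.*(ipv4_address|name).*$'

# Hypothetical helper: succeeds when the name in $1 matches the pattern.
would_mask() {
  printf '%s\n' "$1" | grep -qiE "$pattern"
}

# Report which of a few sample attribute names would be masked.
for attr in name ipv4_address ipv4_address_private Hostname region size; do
  if would_mask "$attr"; then
    echo "$attr: masked"
  else
    echo "$attr: shown"
  fi
done
```

Names such as Hostname are caught too, because the pattern only requires name to appear somewhere in the string. A broad keyword can therefore mask more than you intend.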

To make Terraform plan an action for your Droplet, modify its definition:

nano droplets.tf

Modify the Droplet’s name:

terraform-sensitive/droplets.tf

resource "digitalocean_droplet" "web" {
  image  = "ubuntu-20-04-x64"
  name   = "web"
  region = "fra1"
  size   = "s-1vcpu-1gb"
}

output "droplet_ip_address" {
  value     = digitalocean_droplet.web.ipv4_address
  sensitive = false
}

Save and close the file.

Because you’ve changed an attribute of the Droplet, Terraform will show its full definition in its output. Plan the configuration, but pipe it to tfmask to censor variables according to the regular expression:

terraform plan -var "do_token=${DO_PAT}" | tfmask

You’ll receive output similar to the following:

Output
digitalocean_droplet.web: Refreshing state... [id=216419273]

Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
  ~ update in-place

Terraform will perform the following actions:

  # digitalocean_droplet.web will be updated in-place
  ~ resource "digitalocean_droplet" "web" {
        id                   = "254368889"
      ~ name                 = "**********************************"
        tags                 = []
        # (21 unchanged attributes hidden)
    }

Plan: 0 to add, 1 to change, 0 to destroy.
...

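Note that tfmask has censored the value of name using the character you specified in the TFMASK_CHAR environment variable, because the attribute name matches the regular expression. The ipv4_address and ipv4_address_private attributes also match, so their values would be censored whenever Terraform prints them; in this plan they are among the unchanged attributes that Terraform hides.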

This way of censoring values in the Terraform logs is very useful for CI/CD, where the logs may be publicly available. The benefit of tfmask is that you have full control over which variables to censor (using the regular expression). You can also specify keywords that you want to censor, which may not currently exist, but which you anticipate using in the future.
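
One caveat when piping in CI: by default, a shell pipeline reports the exit status of its last command, so a failed terraform plan piped into tfmask could be reported as a success. In Bash, set -o pipefail makes the pipeline fail if any command in it fails. The sketch uses two stand-in functions (failing_plan and masker, both hypothetical) instead of the real tools so that it runs anywhere:

```shell
# Requires Bash: pipefail is not available in every POSIX shell.
# Make a pipeline fail when any command in it fails, not only the last one.
set -o pipefail

# Stand-in for a terraform command that prints output and then fails.
failing_plan() {
  echo 'name = "web"'
  return 1
}

# Stand-in for tfmask: replaces quoted values with asterisks.
masker() {
  sed -E 's/"[^"]*"/"***"/g'
}

# With pipefail set, the failure of the first command reaches the if statement.
if failing_plan | masker; then
  echo "pipeline reported success"
else
  echo "pipeline reported failure"
fi
```

In a real pipeline, the same setting would go before the terraform plan command that is piped to tfmask, so that a failed plan still fails the job.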

You can destroy the deployed resources by running the following command and entering yes when prompted:

terraform destroy -var "do_token=${DO_PAT}"

Conclusion

In this article, you’ve worked with a couple of ways to hide and secure sensitive data in your Terraform project. The first measure, using sensitive to hide values from the outputs and variables, is useful when only the logs are exposed, but the values themselves remain in plain text in the state stored on disk.

To remedy that, you can opt to store the state file remotely, which you’ve achieved with DigitalOcean Spaces. This allows you to make use of encryption at rest. You also used tfmask, a tool that censors the values of variables matched by a regular expression during terraform plan and terraform apply.

You can also check out HashiCorp Vault to store secrets and secret data. It can be integrated with Terraform to inject secrets in resource definitions, so you’ll be able to connect your project with your existing Vault workflow. You may want to check out our tutorial on How To Build a Hashicorp Vault Server Using Packer and Terraform on DigitalOcean.

This tutorial is part of the How To Manage Infrastructure with Terraform series. The series covers a number of Terraform topics, from installing Terraform for the first time to managing complex projects.
