Put order in your cloud resources with Terraform

Terraform is an open source CLI tool created by HashiCorp to simplify the task of creating and managing the underlying infrastructure of applications. The idea behind it is to describe the desired state of our infrastructure as code in plain-text files. The tool reads these files and updates the real infrastructure on our cloud providers to match what they declare.

This way of working makes managing the infrastructure of our projects very efficient. The Terraform files are saved alongside the source code in our version control system, so every version of the code comes with the description of the infrastructure it runs on.

Furthermore, as it is a CLI tool, we can also use it in automation scripts. Changes to the underlying infrastructure can then be made without direct human intervention.

In this article, we will look in detail at how Terraform works, its strengths, and the best practices for using the tool. By the end, you will have a high-level understanding of the tool and know why every software engineer should know how to use Infrastructure as Code tools for managing their cloud resources. Let’s dive in.

Why use Terraform

The problem

The traditional way of provisioning cloud infrastructure, such as that provided by AWS, Azure or GCP, is with point-and-click web consoles. This process is slow, inefficient and error-prone. To improve this, we can create a script to automatically provision all the resources. This can work for clean deployments, but as soon as we start making changes to already existing infrastructure, we realize the shortcomings of this solution. We need to update the script to handle different types of situations: for example, reading the current state before deciding on the changes, or ensuring that the script is idempotent.

In short, what should be a simple script quickly turns into an unwieldy mess. For this reason, it is clear that we need an automated tool for this process.

The solution

This is where Terraform comes in. As a developer, all we have to do is specify the resources we want and how they are connected. The tool then creates and updates the underlying infrastructure according to these files.

Because the language is declarative (we just specify the end result), there is much less code for the developer to update and maintain. It’s a more robust way of managing the infrastructure than traditional scripts written in imperative languages, because we don’t have to tell the tool how to do the provisioning.

Finally, because it is a CLI tool, it is very easy to build scripts on top of it. We can integrate it into CI/CD pipelines to provision changes in the infrastructure every time new code is pushed, all completely automatically. There is no need for human intervention in the provisioning of infrastructure resources.

Terraform in a project

Now that we know why Terraform can be a good solution for provisioning cloud resources, let’s see how we can make use of it in a real project.

The State file

Each time Terraform is executed, the final state of the run is stored in a special .tfstate file. This file contains the current state of all infrastructure managed by Terraform. Each time the tool is run, it reads this file and compares it with the desired configuration stated in the .tf files and the current state of the infrastructure. If they don’t match, it will plan the steps necessary to synchronize the state of the infrastructure with the desired configuration.

For this reason, it is very important that the state file is saved in a common location where everyone can access it. This way we have a single source of truth about the state of the infrastructure.

The provider of this shared location must also guarantee exclusive access to the file. In other words, while a user is reading or writing the file, it must remain in a LOCKED state, preventing anyone else from using it until the lock is released. This ensures that the state we read at the start of a run is still the current state of the infrastructure when the run finishes. By default, if we don’t configure the backend, the state file is saved locally in the directory from which Terraform was launched.
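
As a minimal sketch of that default, leaving the backend out is equivalent to declaring the local backend explicitly. The path shown is the default file name; it is spelled out here only for illustration.

```hcl
terraform {
  # Equivalent to configuring no backend at all: the state is written
  # to a file in the directory where Terraform is run.
  backend "local" {
    path = "terraform.tfstate"
  }
}
```

This is fine for experiments, but the file lives on a single machine, so it is not shared with the rest of the team.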

To set up the remote storage of the state file, we need to configure the backend at the beginning. For example, to configure the remote backend storage in an Azure Storage Container we write in main.tf:

terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.9.0"
    }
  }

  backend "azurerm" {
    resource_group_name  = "terraform-resource-group"
    storage_account_name = "terraformstorageaccount"
    container_name       = "terraform-storage-container"
    key                  = "terraform-state.tfstate"
  }
}

Make sure these resources already exist in your provider. In this case, Azure needs a Resource Group called terraform-resource-group containing a Storage Account called terraformstorageaccount (storage account names only allow lowercase letters and digits), and inside it a container called terraform-storage-container. There is no need to create the key: if it does not exist, Terraform will create a new empty file.

As soon as we run

$ terraform init

Terraform will read the state file terraform-state.tfstate from the remote container and initialize its local configuration. Subsequent runs of the tool will use this file as the source of truth to know what resources are being managed by Terraform.

Remember that, in order to make changes to remote infrastructure providers, you must first authenticate to the CLI tool that they provide. In the case of Azure you will need to install the Azure CLI. Check the documentation for the provider of your choice to see how to do this.
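
For Azure, the provider itself also has to be declared in the configuration. A minimal sketch, assuming the 3.x azurerm provider pinned above: the empty features block is required by that provider, and with no credentials in the block it falls back to the session opened with az login, or to the ARM_* environment variables when they are set.

```hcl
# Declares the Azure provider. No credentials are written here:
# they come from the Azure CLI session or from environment variables.
provider "azurerm" {
  features {}
}
```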

Terraform configuration

Once we have our backend configuration set up, we can start working with Terraform to manage our infrastructure from our text editor of choice.

Terraform files are written in a configuration language called HCL (HashiCorp Configuration Language). The basic building block of a Terraform script is the declaration of resources with their attributes. Each resource in the script corresponds to a real resource that will be deployed in the infrastructure provider.

As an example of how to declare a resource from Azure:

resource "azurerm_resource_group" "test_resource_group" {
  name     = "example_resource_group"
  location = "West Europe"
}

The first label, azurerm_resource_group, is the resource type; it corresponds to the “Resource group” resource in Azure. The second, test_resource_group, is its name within the Terraform script; this allows you to provision multiple resources of the same type while Terraform distinguishes between them. Do not confuse this with its name in Azure, which is an attribute of the resource, defined inside the block as name = "example_resource_group". You can check the documentation for each available resource in the Terraform Registry.

It is common not to hardcode attribute values directly in the script, as they may depend on other resources. We can reference an attribute of another resource by writing {resource-type}.{terraform-name}.{attribute}. For example, to reference the name of the resource group in another part of the file we write azurerm_resource_group.test_resource_group.name.
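
A short sketch of such a reference follows. The storage account and its name are hypothetical and only serve to show the syntax; the resource group is repeated so the snippet stands alone.

```hcl
# Same resource group as above, repeated so the snippet stands alone.
resource "azurerm_resource_group" "test_resource_group" {
  name     = "example_resource_group"
  location = "West Europe"
}

# Hypothetical storage account, added only to illustrate references.
# Storage account names must be globally unique, lowercase and alphanumeric.
resource "azurerm_storage_account" "test_storage_account" {
  name                     = "examplestorageacct"
  resource_group_name      = azurerm_resource_group.test_resource_group.name
  location                 = azurerm_resource_group.test_resource_group.location
  account_tier             = "Standard"
  account_replication_type = "LRS"
}
```

Because the storage account reads attributes of the resource group, Terraform knows it has to create the resource group first.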

There will also be some attributes that may change depending on various external factors. In this case you can define them as variables in a file called variables.tf, and reference them in the script with the scheme var.{var-name}.

In variables.tf we write:

variable "resource_group_name" {
  default = "example_resource_group"
  type    = string
}

variable "resource_group_location" {
  default = "West Europe"
  type    = string
}

And we reference them in main.tf:

resource "azurerm_resource_group" "test_resource_group" {
  name     = var.resource_group_name
  location = var.resource_group_location
}

Note that we have defined the variables with the default parameter. This means that the variable will take this value if nothing else is assigned to it. A common way to assign values is a .tfvars file. Usually there will be several .tfvars files, one for each environment we use Terraform for. Let’s create a variable file for the staging environment called staging.tfvars:

resource_group_name     = "rg-staging"
resource_group_location = "Switzerland North"

This way, each time we run the script in the staging environment, we can ensure that the resource group is named rg-staging and that it is deployed in the Switzerland North region. We can have other files with different values for other deployments, which makes this way of working flexible and robust.
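
For instance, a production environment could get its own file. The file name, resource group name and region below are illustrative assumptions, not values from a real deployment.

```hcl
# production.tfvars (hypothetical)
resource_group_name     = "rg-production"
resource_group_location = "West Europe"
```

Which file is used is decided when Terraform is run, through the -var-file argument covered in the next section.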

These are the basic building blocks of a Terraform script. Now it’s up to you to define the infrastructure you need and its dependencies, and to start generating the Terraform resources you want with the corresponding variables that depend on external factors. Once you have done this, it is time to run the scripts to create the infrastructure.

Use Terraform locally

Once we have created our infrastructure declaration files, we can start running Terraform. First, if we haven’t already done so, we need to initialize our backend configuration. We can do this by opening a terminal in the working directory of the Terraform files and executing:

$ terraform init

This downloads the providers declared in the configuration and connects Terraform to the remote backend that we configured earlier.

If the backend is already initialized (you can check this by looking for a .terraform directory in the root path of the Terraform files), you can always start clean again with:

$ terraform init -reconfigure

There is a useful command to check that the configuration files are syntactically valid and consistent. It can be run as a first check before running any remote infrastructure commands.

$ terraform validate

Once the validation passes, we can move forward with:

$ terraform plan -var-file=<path-to-tfvars-file>

This command performs a dry run: it compares the Terraform script with the current state file and the real infrastructure, and lists every resource that would be created, updated or destroyed. It is advisable to run this command first to check that the changes we are about to make are the ones we want.

If we have variables to fill, we can pass the .tfvars file to the tool as an argument with -var-file. It will then use the statements in that file to populate the appropriate variables in the Terraform script. Otherwise, the tool will ask you to manually enter each variable that needs a value.
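
To see when that prompt appears, consider a variable declared without a default. This is a sketch with a hypothetical variable name; it is not part of the configuration above.

```hcl
# No default: Terraform must get a value from a .tfvars file, a -var
# argument, or a TF_VAR_environment environment variable. If none is
# given in an interactive run, it prompts for one.
variable "environment" {
  description = "Name of the deployment environment, for example staging"
  type        = string
}
```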

Finally, we can commit the changes using:

$ terraform apply -var-file=<path-to-tfvars-file>

Once this is done, the real infrastructure is updated to the new state. To avoid unintentional execution, it will still require a manual confirmation step before committing the changes. If we want to use this command in an automated script, we need to bypass this confirmation with the -auto-approve flag.

Out of sync state with the current infrastructure

If changes are made to the infrastructure manually or with a third-party tool, the state file will not be updated and will therefore be out of sync with the real state of the infrastructure. terraform apply and terraform plan do an in-memory refresh of the real state to avoid any conflicts. If we only want to update the state file with the external changes, we can re-sync it with the following command:

$ terraform apply -refresh-only -var-file=<path-to-tfvars-file>

Note that the only thing this command does is update the state file to reflect the current infrastructure. If the changes are not also written into the .tf files, the next time you run Terraform it will plan to revert those “manual” changes.

You can make Terraform ignore changes to certain attributes of a managed resource by adding a lifecycle block with ignore_changes to that resource. For example, to keep a manual change to an attribute called value, we can use:

  lifecycle {
    ignore_changes = [
      value
    ]
  }

The value attribute will be ignored by Terraform if it changes.
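
In context, the lifecycle block goes inside the resource it applies to. A minimal sketch, assuming we want Terraform to leave alone any tags that were added by hand to the resource group declared earlier; tags is used here only as an example attribute.

```hcl
# Uses the variables declared in variables.tf.
resource "azurerm_resource_group" "test_resource_group" {
  name     = var.resource_group_name
  location = var.resource_group_location

  lifecycle {
    # Differences in tags between the configuration and the real
    # resource no longer produce a planned update.
    ignore_changes = [
      tags
    ]
  }
}
```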

Use Terraform in CI/CD pipelines

One of the best features of Terraform is its ability to integrate with CI/CD pipelines, so that the underlying infrastructure of a project is updated as the project is built. This makes working in an agile way much easier. If the developer needs to make a change to the infrastructure, they can update the Terraform script accordingly; once their changes are merged into the main development branch, the pipeline will automatically update the infrastructure.

It is important to note that Terraform needs to be authenticated in order to make changes to the infrastructure providers. In the case of a local execution, the developer usually authenticates the tool using their own user. In the case of pipelines or automated scripts, it is necessary to create some sort of service connection to the provider and use secrets to authenticate. Each provider has different ways of doing this, so it is advisable to check the documentation on how to do this.

In Azure, for example, this is done by creating a Service Principal, which is a special type of user that is used to authenticate non-interactive applications. We then need to add the necessary roles and permissions to the Service Principal to allow it to create, update and delete resources in the provider.

Once we have this, we add it to the pipeline, for example in GitHub Actions:

on:
  push:
    branches:
      - develop

jobs:
  apply-terraform:
    runs-on: ubuntu-latest
    # These environment variables are used to authenticate the Service Principal that terraform will use to update Azure
    env:
      ARM_CLIENT_ID: ${{ secrets.AZURE_CLIENT_ID }}
      ARM_CLIENT_SECRET: ${{ secrets.AZURE_CLIENT_SECRET }}
      ARM_SUBSCRIPTION_ID: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
      ARM_TENANT_ID: ${{ secrets.AZURE_TENANT_ID }}
    
    steps:
      - uses: actions/checkout@v2

      - name: Get Terraform
        uses: hashicorp/setup-terraform@v2
        with:
          terraform_version: 1.3.4
      
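      - name: Terraform init
        # The backend is fully configured in main.tf, so no extra flags are needed.
        # (The inputs context is not populated on push events.)
        run: terraform init

      - name: Terraform validate
        run: terraform validate

      - name: Terraform Apply
        # staging.tfvars is the variable file created earlier; use the one for your environment
        run: terraform apply -var-file=staging.tfvars -auto-approve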

Of course, we NEVER put sensitive items like secrets in the source code. For interactive executions we can store them locally in .env files that are kept out of version control; for pipelines, check the documentation of your CI provider on how to store secrets (in GitHub Actions they are read with ${{ secrets.<name> }}).

By running this pipeline first, we make sure that the infrastructure is updated, if necessary, every time there is a push to the development branch, typically right after a PR is merged into that branch.

The chicken or the egg?

Observant readers may have noticed that in order to manage our infrastructure with Terraform, we need some infrastructure already in place: the storage for the remote state file and the service connection that lets Terraform authenticate as an automated tool.

At the time of writing, it is not possible to provision everything from scratch with Terraform in a fully automated way. The way to solve this is to create these initial bootstrapping resources manually or with a traditional script. Once these are set up, Terraform can be run to provision the rest of the infrastructure.

Example

You can find a sample project where Terraform is used to deploy infrastructure on my GitHub.