Practical Terraform: You're Doing it Wrong (Part 1)
We've all written Terraform IaC that we're not proud of--it happens. I'm here today to talk about the Terraform you think you're proud of...until it outgrows your team, becomes hard to manage, and terrifies you every time you run terraform apply.
1. Monolithic Modules
A Terraform module is a reusable unit of configuration, similar to a class in object-oriented programming. I like to say that the module is the "cookie cutter"; anytime you use it, you are "instantiating" it.
When you create a module, try to keep it short and sweet. Users of your module don't want to worry about a heap of baggage, like 50+ variables, 4+ providers, and 100+ direct resources, spat out in the first plan.
Take this module for a web application, for example:
webapp_bad/
├── api_gateway.tf
├── cloudfront.tf
├── cloudwatch.tf
├── cognito.tf
├── iam.tf
├── lambda.tf
├── load_balancer.tf
├── outputs.tf
├── postgres.tf
├── providers.tf
├── variables.tf
└── vpc.tf
This is a typical stack for a web application on AWS, but I'll let your imagination fill in the myriad resources. It's a lot. The mistake here is that we've created a monolithic module that serves only one specific application stack.
A better approach would be to decouple these into smaller, more reusable units:
webapp_good/
├── backend/...
├── frontend/...
└── network/...
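Each of these can then be composed inside a deployment. Here's a minimal sketch of what that composition might look like; the variable and output names are hypothetical, not part of the original layout:
# file: main.tf (hypothetical composition of the smaller modules)
module "network" {
  source = "./webapp_good/network"

  vpc_cidr = "10.0.0.0/16" # hypothetical input
}

module "backend" {
  source = "./webapp_good/backend"

  subnet_ids = module.network.private_subnet_ids # hypothetical output of network
}

module "frontend" {
  source = "./webapp_good/frontend"

  api_url = module.backend.api_url # hypothetical output of backend
}
Each module stays small and reusable on its own; another project can pull in network/ without dragging along the rest of the stack.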
2. Useless Modules
Next on our pitfall hit list are the modules that simply aren't needed. If you find yourself writing a module with variables for almost every argument defined on the resources, ask yourself: Am I accomplishing something novel or simply repeating what's already easily handled by the direct resource(s) I'm creating? Likewise, we generally don't want a module containing only one or two resources, as those could just as easily be declared directly.
Remember, modules should accomplish one or more of the following tasks:
Automate standard configuration or resource best practices (e.g., tagging, security policies, naming conventions, etc.).
Organize configuration into reusable blocks of code, with some flexibility but not so much that we re-implement the individual resources.
Break down a complex infrastructure solution into smaller units that are easier to maintain.
For example, let's try not to write useless modules like this S3 bucket module:
# file: useless_s3/main.tf
variable "bucket" {
  type = string
}

variable "acl" {
  type = string
}

resource "aws_s3_bucket" "this" {
  bucket = var.bucket
  acl    = var.acl
}

# file: main.tf
module "useless_s3_bucket" {
  source = "./modules/useless-s3"
  bucket = "my-useless-bucket"
  acl    = "private"
}
This can be achieved without a module. The module here only adds another layer of nesting, and it will surely frustrate your team when they have to traverse the module tree just to add a simple argument to the underlying aws_s3_bucket resource.
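By contrast, a module that automates best practices (recall the list above) earns its place. Here's a minimal sketch of an S3 module that enforces encryption, blocks public access, and bakes in mandatory tags; the variable names and tagging scheme are illustrative assumptions, not a standard:
# file: modules/standard_s3/main.tf (hypothetical best-practices module)
variable "bucket" {
  type = string
}

variable "team" {
  type        = string
  description = "Owning team, baked into mandatory tags"
}

resource "aws_s3_bucket" "this" {
  bucket = var.bucket

  tags = {
    Team      = var.team
    ManagedBy = "terraform"
  }
}

# Enforce encryption at rest on every bucket created through this module.
resource "aws_s3_bucket_server_side_encryption_configuration" "this" {
  bucket = aws_s3_bucket.this.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "aws:kms"
    }
  }
}

# Block all public access by default.
resource "aws_s3_bucket_public_access_block" "this" {
  bucket                  = aws_s3_bucket.this.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}
Callers set two variables and get a compliant bucket every time; the module is doing real work rather than merely forwarding arguments.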
3. Folder Layout & Blast Radius
Next, let's talk about file structure when writing Terraform. This is crucial when scaffolding your Infrastructure-as-Code configuration for a brand-new project. You can be a whiz at HCL syntax and functions but still create inefficiencies and undue complexity if the project isn't laid out carefully.
Try to think about folders in terms of two types: modules and deployments. Modules are where you define reusable Terraform configuration and should represent components of your infrastructure, like ingredients in a recipe. I typically refer to the folder you terraform apply from as the deployment scope.
A common mistake is to create a monolithic module that defines the entire infrastructure solution but with variables, then create multiple deployments of this mono-module, such as a dev, test, and prod deployment:
├── deployments
│ ├── dev/
│ ├── test/
│ └── prod/
├── modules
│ ├── backend_app/
│ ├── ecs/
│ └── frontend_app/
This example could be better because it creates a huge blast radius for each of the deployments/environments. Any change within frontend_app will still require terraform apply to re-scan configuration for the resources in the instance of the ecs/ module too.
Consider breaking your deployments into more than one deployable scope. Ask yourself the following questions:
How often will these resources need to be changed/redeployed?
How critical are these resources? Are they more or less critical than other groups of resources?
Critical resources such as the VPC and ECS/EKS clusters are great candidates to segregate into their own deployable scope/folder; they do not change very often and are vital to the operation of application services. This ensures a safer, leaner, more agile Terraform project structure like so:
├── deployments
│ ├── dev
│ │ ├── app // <-- Instantiates backend_app/ and frontend_app/
│ │ └── ecs_cluster // <-- Instantiates ecs/ and VPC resources
│ ├── prod
│ │ ├── app
│ │ └── ecs_cluster
│ └── test
│ ├── app
│ └── ecs_cluster
└── modules
├── backend_app
├── ecs
└── frontend_app
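With this layout, the app scope can consume the cluster scope's outputs without sharing its state or blast radius. A common pattern is the terraform_remote_state data source; this sketch assumes an S3 backend, and the bucket, key, and output names are hypothetical:
# file: deployments/dev/app/main.tf (hypothetical; backend config and output names are assumptions)
data "terraform_remote_state" "ecs_cluster" {
  backend = "s3"

  config = {
    bucket = "my-terraform-state"      # hypothetical state bucket
    key    = "dev/ecs_cluster.tfstate" # hypothetical state key
    region = "us-east-1"
  }
}

module "backend_app" {
  source = "../../../modules/backend_app"

  # Consume outputs exported by the ecs_cluster deployment scope.
  cluster_arn = data.terraform_remote_state.ecs_cluster.outputs.cluster_arn
  vpc_id      = data.terraform_remote_state.ecs_cluster.outputs.vpc_id
}
Now a change to backend_app never touches the cluster's state, and an apply in deployments/dev/app plans only the app resources.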
4. Version Constraints
Terraform supports version constraints for required_providers {} as well as for the version of Terraform itself. We'll focus on providers for now. Let's examine how you might use version constraints, with increasingly better examples.
Worst
terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
    }
  }
}
The worst way to use version constraints is not to use them at all.
Bad
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "5.71.0"
    }
  }
}
I might spark some controversy by saying this, but yes, I think this is bad. Terraform providers like AWS, Azure, and GCP release new versions constantly, adding new features and fixing bugs.
By pinning your provider to a specific version, you will likely find yourself combing through dozens of Terraform modules every now and then to update every single version constraint. There's no forgiveness here--all your Terraform must agree on the exact same version of the provider--making it a colossal pain to perform upgrades or receive new features/fixes.
Better
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.71.0"
    }
  }
}
This is better because it uses the pessimistic constraint operator ~>, which in this example allows the last version number (the patch version) to change, but not the major or minor version numbers.
It allows updates only to patch versions of 5.71.x, meaning only versions that are >= 5.71.0 and < 5.72.0. Again, we will have to painstakingly update tons of these lines in our modules when we inevitably need to upgrade versions.
Best
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 5.71.0, < 6.0.0"
    }
  }
}
Actually, the best can be written with ~> as well:
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.71"
    }
  }
}
This is great because it allows the minor and patch version numbers to slide forward but not the major version. Most reputable providers, such as AWS, follow semantic versioning (SemVer) standards very well, and they typically do not introduce breaking changes until a new major version is released.
It's your choice whether to use ~> or the more verbose >= x.y.z, < a.b.c, but I've had people ask about the mysterious ~> syntax enough times that I favor the explicit form--everyone looks at >= 5.71.0, < 6.0.0 and understands it quite easily.
By using this constraint, we ensure that the configuration can gracefully receive new features and fixes (minor and patch version upgrades) without breaking changes. To upgrade the provider version within this constraint, simply run terraform init -upgrade.
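As mentioned earlier, the same constraint syntax also applies to Terraform itself via required_version. A quick sketch (the 1.5.0 floor here is just an example; pick the range your team actually supports):
terraform {
  required_version = ">= 1.5.0, < 2.0.0" # example constraint, not a recommendation

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 5.71.0, < 6.0.0"
    }
  }
}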
Conclusion
Let's wrap up. Terraform is powerful, but as Uncle Ben famously said, it comes with great responsibility.
We can make our lives, and our coworkers' lives, a lot easier by applying the principles from this guide to the Terraform we write. Consider those pain points the next time you find yourself modifying old Terraform code or doing a deployment and dreading the large, scary Terraform plan. I hope you will think back to these tips and find them useful!
10/14/2024: Part 2 is out now! Get even more practical Terraform tips in Part 2 here!