Terraform Mono-repo vs Poly-repo: Choosing the Best Structure

Managing enterprise infrastructure at scale requires more than just writing HCL code; it requires a strategic approach to repository architecture. As organizations grow, the friction between speed and safety increases. You are likely facing a choice: keep all Terraform configurations in one place for visibility or split them across dozens of repositories for isolation. This decision impacts your CI/CD speed, state file contention, and the blast radius of every terraform apply. In this guide, we evaluate the trade-offs between Terraform mono-repo and poly-repo structures to help you build a resilient DevOps foundation.

TL;DR — Choose a poly-repo structure if you have decentralized teams requiring strict isolation and a minimized blast radius. Opt for a mono-repo (ideally managed by Terragrunt) if you need a "single source of truth," unified versioning, and simpler dependency management across tightly coupled infrastructure layers.

Architectural Overview: Mono-repo vs. Poly-repo

💡 Analogy: A mono-repo is like a massive department store where everything is under one roof, sharing the same security and utilities, but a fire in one aisle might close the whole building. A poly-repo is a series of independent boutiques on a high street; if one boutique has an issue, the others remain open, but walking between them to find related items takes more effort.

A Terraform mono-repo houses all your environment configurations (VPCs, EKS clusters, databases, and IAM roles) within a single Git repository. This approach typically uses a directory-based structure to separate environments (e.g., /prod, /staging, /dev). As of Terraform 1.7, features like "removed" blocks and improved provider handling have made reorganizing large states within a single repository slightly easier, yet the fundamental challenge of state locking remains. Locks apply per state file, not per repository: when one developer is running an apply in the /network folder, anyone else targeting that same state is blocked, and everyone is blocked if the CI/CD pipeline serializes runs for the whole repository instead of per directory.
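As an illustration of the "removed" block mentioned above, here is a minimal sketch of moving one resource out of an oversized state and into another directory's state without destroying it. The resource label and bucket name are hypothetical, and the two halves live in different directories of the mono-repo.

```hcl
# Directory giving up the resource (requires Terraform 1.7 or later).
# Delete the original resource block from this directory, then add:
removed {
  from = aws_s3_bucket.logs

  lifecycle {
    destroy = false # forget the bucket in this state, leave it untouched in AWS
  }
}

# Directory taking ownership (import blocks require Terraform 1.5 or later).
import {
  to = aws_s3_bucket.logs
  id = "company-central-logs" # hypothetical bucket name
}

resource "aws_s3_bucket" "logs" {
  bucket = "company-central-logs"
}
```

Applying the first directory before the second avoids a window in which both states claim the same bucket.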

Conversely, a poly-repo architecture creates separate repositories for different infrastructure components or teams. A networking team owns the infra-network repo, while the applications team owns infra-apps-dev. This decentralization is the standard for organizations following a microservices or "You Build It, You Run It" philosophy. It inherently limits the "blast radius" (the potential damage caused by a misconfiguration or a corrupted state file) to a single repository rather than the entire organization's cloud footprint.

Technical Comparison Table

To choose the right path, you must evaluate how your team operates. The following table breaks down the critical metrics that define the success of an Infrastructure as Code (IaC) strategy.

Feature | Mono-repo (Terragrunt) | Poly-repo (Standard)
Blast Radius | Medium to High | Low (Isolated)
State Contention | High (Managed via directories) | Very Low
Code Reuse | Easy (Local paths) | Moderate (Git tags/Registry)
Visibility | Unified "Single Source" | Fragmented / Distributed
CI/CD Complexity | High (Needs path filtering) | Low (1 Pipeline per Repo)
Governance | Centralized Policies | Distributed / Harder to audit

The two most critical rows here are Blast Radius and CI/CD Complexity. In a mono-repo, a rogue PR might trigger an apply that affects resources in multiple environments if your CI/CD path filtering isn't perfect. However, in a poly-repo, managing common variables (like a central logging bucket ID) requires an external "Source of Truth" or heavy use of terraform_remote_state, which introduces its own complexity. I have found that while poly-repos are safer, they often lead to "dependency hell" if not managed with a private module registry.
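One way to provide that external source of truth, sketched here under the assumption that both repositories deploy to the same AWS account and region, is to publish shared values to SSM Parameter Store. The parameter path, bucket name, and resource labels are hypothetical.

```hcl
# In the repository that owns the logging bucket (producer).
resource "aws_s3_bucket" "central_logs" {
  bucket = "company-central-logs" # hypothetical bucket name
}

resource "aws_ssm_parameter" "central_logs_bucket" {
  name  = "/shared/logging/bucket_id" # hypothetical parameter path
  type  = "String"
  value = aws_s3_bucket.central_logs.id
}

# In any consuming repository: look the value up by path instead of
# reading the producer's state file.
data "aws_ssm_parameter" "central_logs_bucket" {
  name = "/shared/logging/bucket_id"
}

locals {
  central_logs_bucket_id = data.aws_ssm_parameter.central_logs_bucket.value
}
```

The consumer only needs read access to one parameter, whereas terraform_remote_state requires read access to the producer's entire state.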

When to Choose a Terraform Mono-repo

A mono-repo is ideal for smaller teams or projects where the infrastructure components are tightly coupled. It is particularly powerful when paired with Terragrunt, a wrapper that provides extra tools for keeping your configurations DRY (Don't Repeat Yourself). With Terragrunt, you can define your backend configuration once at the root and inherit it across dozens of child modules.
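A minimal sketch of that inheritance follows, assuming an S3 backend; the bucket and lock table names are hypothetical. The root file generates a backend for every child, and each child derives its own state key from its directory path.

```hcl
# terragrunt.hcl at the repository root
remote_state {
  backend = "s3"

  # Write a backend.tf into each child module's working directory.
  generate = {
    path      = "backend.tf"
    if_exists = "overwrite_terragrunt"
  }

  config = {
    bucket         = "company-terraform-state" # hypothetical bucket name
    key            = "${path_relative_to_include()}/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-locks" # hypothetical lock table
  }
}

# dev/vpc/terragrunt.hcl (child): inherit everything defined above.
include "root" {
  path = find_in_parent_folders()
}
```

Because the key comes from path_relative_to_include(), dev/vpc and prod/vpc get separate state files and therefore separate locks.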

In a mono-repo structure, you can easily track the history of an entire environment in a single Git timeline. This makes auditing easier for compliance teams. When you need to pass an output from your VPC module to your RDS module, you can reference the module output directly if both live in the same root module, or use Terragrunt's dependency blocks when they have separate states. Dependency blocks keep the wiring next to the code and make the apply order explicit, which is often easier to maintain than hand-written terraform_remote_state data sources.

# Example Terragrunt Mono-repo Structure
# root/terragrunt.hcl (Centralized Backend/Provider config)
# dev/
#   vpc/terragrunt.hcl
#   app/terragrunt.hcl (Depends on VPC)
# prod/
#   vpc/terragrunt.hcl
#   app/terragrunt.hcl

# dev/app/terragrunt.hcl
dependency "vpc" {
  config_path = "../vpc"
}

inputs = {
  vpc_id = dependency.vpc.outputs.vpc_id
}

The code above demonstrates how dependencies are handled within a mono-repo using Terragrunt. When you run `terragrunt run-all apply` (`apply-all` in older Terragrunt releases), Terragrunt builds the dependency graph and applies the VPC before the application layer. This level of orchestration is much harder to achieve across multiple independent repositories without complex CI/CD triggering logic.
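One practical wrinkle: a plan for the app layer fails if the VPC has never been applied, because there are no outputs to read yet. A hedged sketch of the usual workaround is to declare mock outputs on the dependency; the placeholder ID below is hypothetical.

```hcl
# dev/app/terragrunt.hcl
dependency "vpc" {
  config_path = "../vpc"

  # Placeholder values used only when the VPC has no real outputs yet.
  mock_outputs = {
    vpc_id = "vpc-00000000" # hypothetical placeholder
  }

  # Restrict the mocks to read-only commands so an apply never uses them.
  mock_outputs_allowed_terraform_commands = ["validate", "plan"]
}

inputs = {
  vpc_id = dependency.vpc.outputs.vpc_id
}
```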

Scaling with a Terraform Poly-repo

As you scale to hundreds of developers and thousands of cloud resources, the mono-repo can become a bottleneck. State lock contention becomes a daily occurrence, and the CI/CD pipeline slows dramatically if it has to evaluate the entire directory tree on every change. This is where the poly-repo shines. By splitting infrastructure into separate repositories, you allow teams to move at different speeds. The Security team can update IAM policies in their own repo without waiting for the App team to finish debugging a database migration.

To make poly-repos work, you must rely on versioned modules. Instead of referencing local paths, you reference modules via Git tags or a Terraform registry. This ensures that a change in the "Standard VPC Module" doesn't automatically break every environment. Each environment's repository must explicitly opt in to the new version.

# Example Poly-repo module call (infra-apps-prod repo)
module "app_server" {
  source  = "git::https://github.com/org/terraform-modules.git//app-cluster?ref=v2.4.1"
  
  # Fetching state from a different repository
  vpc_id  = data.terraform_remote_state.network.outputs.vpc_id
  region  = "us-east-1"
}

data "terraform_remote_state" "network" {
  backend = "s3"
  config = {
    bucket = "company-terraform-state"
    key    = "network/terraform.tfstate"
    region = "us-east-1"
  }
}

The trade-off here is the reliance on terraform_remote_state. If the networking team changes an output name in their repository, your infra-apps-prod repo will fail during the next plan. This "contract" between repositories requires strict communication and versioning policies, often enforced by a central DevOps or Platform Engineering team.
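One way to honor that contract during a rename, sketched here with hypothetical names, is for the networking repository to publish the new output alongside the old one until every consumer has migrated. The sketch assumes an aws_vpc.main resource already exists elsewhere in that repository.

```hcl
# outputs.tf in the infra-network repository

# New, preferred output name (hypothetical).
output "primary_vpc_id" {
  description = "ID of the primary VPC. Part of the cross-repo contract."
  value       = aws_vpc.main.id
}

# Old name kept temporarily so existing consumers keep planning cleanly.
# Remove it only after no repository reads outputs.vpc_id any more.
output "vpc_id" {
  description = "Deprecated alias of primary_vpc_id."
  value       = aws_vpc.main.id
}
```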

Decision Matrix: Making the Final Choice

Selecting between these two isn't a permanent "lock-in," but migrating between them is painful. When I worked with a mid-sized SaaS company, we started with a mono-repo. It worked perfectly until we hit 15 engineers. At that point, the PR review queue for that single repo became a massive bottleneck, and we had to pivot to a hybrid poly-repo approach.

📌 Key Takeaways:

  • Use Mono-repo if: You have a small, centralized team; your infrastructure is small enough to fit in one brain; you want to use Terragrunt to keep code DRY.
  • Use Poly-repo if: You have more than three distinct teams; you have strict compliance requirements for separation of duties; you need to minimize the risk of a single mistake taking down all environments.
  • The Hybrid Approach: Keep your Modules in individual repositories (Poly-repo) but keep your Live Environments (the calls to those modules) in a Mono-repo managed by Terragrunt. This provides the best of both worlds: versioned isolation for code and unified management for state.
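A minimal sketch of the hybrid layout: a live-environment file in the Terragrunt mono-repo that pins a module version from the separate modules repository. The file path is hypothetical, and find_in_parent_folders() assumes the root file is named terragrunt.hcl; newer Terragrunt releases recommend naming it root.hcl and passing that name explicitly.

```hcl
# prod/app/terragrunt.hcl in the live mono-repo
include "root" {
  path = find_in_parent_folders()
}

# Module code lives in its own repository and is pinned by Git tag.
terraform {
  source = "git::https://github.com/org/terraform-modules.git//app-cluster?ref=v2.4.1"
}

dependency "vpc" {
  config_path = "../vpc"
}

inputs = {
  vpc_id = dependency.vpc.outputs.vpc_id
  region = "us-east-1"
}
```

Promoting a module release then becomes a one-line change to the ref in each environment, reviewed in the same Git history as the rest of that environment.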

Frequently Asked Questions

Q. When should you switch from a mono-repo to a poly-repo?

A. You should consider switching when your CI/CD pipeline becomes a bottleneck, often indicated by frequent state-lock errors or when the time to run `terraform plan` across the repository exceeds 10–15 minutes. Additionally, if different teams require different access levels for compliance, a poly-repo is usually the simplest answer, because Git access is typically granted per repository rather than per directory.

Q. Is Terragrunt necessary for a Terraform mono-repo?

A. It is not strictly necessary, but it is highly recommended. Standard Terraform mono-repos often suffer from massive code duplication across environment folders. Terragrunt solves this by allowing you to define your backend and provider logic in a single parent file, making the mono-repo significantly easier to maintain at scale.

Q. How does a mono-repo handle CI/CD for multiple environments?

A. Mono-repos rely on "path filtering" in CI/CD tools (like GitHub Actions `on.push.paths` or GitLab `rules:changes`). This ensures that only the specific directory changed triggers a pipeline run. Without granular path filtering, every small change would trigger a plan for every environment, wasting resources and increasing the risk of accidental applies.
