
Infrastructure as Code, Infrastructure as Data

The industry spent a decade learning to treat infrastructure as code. The next step is to treat it as data: entities, relationships, attributes, and constraints that can be queried, validated, and reasoned about independently of any rendering target.

The promise and the ceiling

Infrastructure as Code was a revolution. It brought version control, peer review, and repeatability to a discipline that had relied on handcrafted configurations and tribal knowledge. Templates replaced ad-hoc typing. Repositories replaced wikis. CI/CD pipelines replaced hope.

But Infrastructure as Code has a ceiling, and at scale, you hit it hard.

The problem is in the name. Code is an imperative artefact. It describes how to reach a desired state. Even declarative variants (Terraform HCL, Kubernetes YAML) are ultimately rendering instructions: they describe what a particular target system should look like. They do not describe what the infrastructure is. They do not capture the relationships between components, the constraints that must hold, or the intent behind the configuration.

When the network is small, this distinction does not matter. When it spans continents and tens of thousands of devices, it is the difference between a system that can be automated and a system that can reason about itself.

Where code breaks down

Consider a simple scenario: provisioning a new point of presence. In the IaC paradigm, this means generating config files for routers, switches, load balancers, DNS, monitoring, and perhaps a dozen other systems. Each template renders a device-specific configuration. The templates are well-tested. The pipeline is solid.

Now ask a question the templates cannot answer: if the power feed in this PoP fails, which customer traffic is affected, and is there a redundant path?

The templates do not know. They rendered configs. They have no model of the topology, no concept of redundancy groups, no understanding of demand or capacity. To answer that question, someone must trace the path manually, or build a separate system that reverse-engineers the topology from configs. That separate system is the beginning of a domain model, and the fact that it is separate from the source of truth is the problem.
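To make the gap concrete, here is a minimal sketch of the kind of dependency model that can answer the power-feed question. Every component name and `depends_on` edge below is invented for illustration; a real model would be far richer.

```python
from collections import defaultdict

# Toy dependency graph: edges point from a component to what depends on it.
dependents = defaultdict(list)

def add_dependency(component, depends_on):
    dependents[depends_on].append(component)

# A tiny PoP: two power feeds, two routers, customer demands pinned to routers.
add_dependency("router-a", "power-feed-1")
add_dependency("router-b", "power-feed-2")
add_dependency("demand-cust-1", "router-a")
add_dependency("demand-cust-2", "router-b")

def blast_radius(failed_component):
    """Everything transitively dependent on the failed component."""
    affected, stack = set(), [failed_component]
    while stack:
        for dep in dependents[stack.pop()]:
            if dep not in affected:
                affected.add(dep)
                stack.append(dep)
    return affected

print(blast_radius("power-feed-1"))  # {'router-a', 'demand-cust-1'}
```

The point is not the traversal, which is trivial, but where the knowledge lives: the same question is unanswerable from rendered configs because the `depends_on` edges were never modelled.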

This is not a hypothetical. At scale, the questions that matter most are cross-cutting: what is the blast radius of this failure? Where is redundancy insufficient? Which maintenance action reduces the most risk? Config templates cannot answer these questions because they encode rendering logic, not infrastructure knowledge.

The temporal blindness of IaC

There is a second, subtler limitation. Infrastructure as Code describes a narrow slice of time: the immediate or near-term intent. A Terraform plan says "the system should look like this now." But infrastructure exists across a full time horizon, and the questions that matter span it entirely.

On the observed side, you need history: what did the infrastructure look like last quarter? When did this redundancy degrade? What was the failure trend over the past year? On the intended side, you need the full spectrum of the future: the just-in-time intent (the immediate mutation), committed plans (approved and scheduled), and speculative plans (aspirational forecasts for next year). The present is just the boundary between these two, and it is the least interesting point on the timeline.

[Timeline: history and the recent past (observed) lead up to NOW; beyond it lie JIT intent, committed plans, and speculative plans (intended). IaC lives at the NOW boundary.]

IaC has version control. Git commit logs record what changed in the code and when. But a commit history is not a queryable model of infrastructure state over time. You cannot ask a git log "what was the redundancy of this path last quarter" or "how has MTBF trended for this component type over the past year." Reconstructing historical state from diffs is archaeology, not engineering. And on the future side, IaC has no model of committed versus speculative plans, no way to represent the plan maturity spectrum from aspiration to execution. A domain model captures both: the full observed timeline (what was and what is) and the full intended timeline (what should be next, what is planned, and what is dreamed of). That temporal depth is what makes simulation, capacity planning, and trend-based risk assessment possible.
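A minimal sketch of what "queryable state over time" means in practice: an attribute stored as dated observations rather than as a single overwritten value. The history, dates, and redundancy numbers here are invented for illustration.

```python
from bisect import bisect_right
from datetime import date

# Illustrative: a path's observed redundancy recorded as (effective_date, value)
# snapshots, kept sorted by date. A real model would version every attribute.
redundancy_history = [
    (date(2024, 1, 1), 3),   # three diverse paths
    (date(2024, 5, 10), 2),  # one path decommissioned: redundancy degraded
    (date(2024, 9, 2), 3),   # restored
]

def redundancy_as_of(when):
    """Value of the attribute as observed on a given date."""
    dates = [d for d, _ in redundancy_history]
    i = bisect_right(dates, when)
    if i == 0:
        raise ValueError("no observation before this date")
    return redundancy_history[i - 1][1]

print(redundancy_as_of(date(2024, 6, 1)))  # 2 -- last quarter's redundancy
```

Answering the same question from a git log would mean checking out historical commits and re-deriving topology from diffs; here it is a single lookup.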

The shift: from code to data

The solution is to move the source of truth from rendered configuration to a structured domain model: a graph of entities, relationships, and attributes that represents what the infrastructure is, not how it is configured.

Generation 1 — Config Templates (where most started). Jinja, Go templates, ERB. Device-specific configs rendered from variables. The template is the source of truth. Knowledge is embedded in rendering logic.

Generation 2 — Infrastructure as Code (the industry standard). Terraform, Ansible, Pulumi. Declarative descriptions of desired state for specific target systems. Version-controlled, peer-reviewed, repeatable. But each file describes one system's view, not the whole.

Generation 3 — Infrastructure as Data (the domain shift). A structured domain model: entities (routers, links, PoPs, power feeds), relationships (connects-to, depends-on, serves), attributes (capacity, redundancy, SLO), and constraints (N+k, diversity, latency bounds). The model is the source of truth. Config is a rendering target, one of many.

Generation 4 — Intent-Driven Infrastructure (where it leads). Operators express intent (what should be), the system computes the delta against the model (what is), generates a plan (how to get there), and executes autonomously. The model enables reasoning. Intent replaces imperative instructions.

Each generation does not replace the previous one. Config templates still exist in Generation 4, but they are generated from the model, not maintained by hand. The shift is in where the authoritative knowledge lives.

Anatomy of a domain model

A domain model for infrastructure has four fundamental building blocks. They look deceptively simple, but getting them right is the hardest and most consequential design decision in any infrastructure automation programme.

[Diagram: an example schema — Device (role: router | switch; vendor, model, OS; lifecycle_state), Interface (capacity, speed; admin_state, oper_state; utilization), Link (type: fiber | copper; latency, distance; diversity_group), Site (location, tier; power_capacity, cooling_capacity), Demand (source → destination; bandwidth, SLO; priority, customer), Redundancy (policy: N+k; n, m designed, k observed; scope, members[]) — joined by relationships such as has, connects via, hosted at, serves, governed by, originates, and constrained by.]
1. Entities — The nouns of infrastructure. Devices, interfaces, links, sites, power feeds, cooling units, demands. Each entity has a type, a lifecycle state, and attributes specific to its role.

2. Relationships — The verbs. A device is hosted at a site. An interface connects via a link. A demand is served by a path. Relationships encode topology, dependency, and ownership: the structure that configs alone cannot capture.

3. Attributes — The measured and declared properties. Capacity, utilization, SLO targets, lifecycle state, vendor, firmware version. Attributes are typed, validated, and versioned. They are the data that drives every downstream consumer.

4. Constraints — The rules that must hold. Redundancy policies (N+k), diversity requirements (no two paths share a conduit), latency bounds, power budgets. Constraints are not comments in a config file. They are executable, enforceable invariants evaluated against the model.

A well-designed domain model is not a database schema. It is a computable representation of infrastructure that supports queries (what is the current redundancy of this path?), mutations (add a new link and recompute capacity), validation (does this change violate any constraint?), and projection (what will capacity look like in six months given demand forecasts?).
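The four building blocks can be sketched in a few lines. The shapes below are deliberately minimal and purely illustrative; a production schema (MALT's entity-kinds, for instance) is far richer, and the N+k semantics shown are a simplification.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Entity:                    # 1. entities: typed nouns
    kind: str                    # "device", "link", "site", ...
    name: str
    attrs: dict = field(default_factory=dict)  # 3. typed attributes

@dataclass(frozen=True)
class Relationship:              # 2. relationships: typed verbs
    kind: str                    # "hosted_at", "connects_via", "serves", ...
    source: str
    target: str

def n_plus_k(members, k):        # 4. constraints: executable invariants
    """N+k redundancy: the group must tolerate the loss of k members."""
    healthy = [m for m in members if m.attrs.get("oper_state") == "up"]
    return len(healthy) >= len(members) - k and len(members) > k

uplinks = [
    Entity("link", "uplink-1", {"oper_state": "up"}),
    Entity("link", "uplink-2", {"oper_state": "up"}),
    Entity("link", "uplink-3", {"oper_state": "down"}),
]
print(n_plus_k(uplinks, k=1))  # True: one failure budgeted, one occurred
```

The key property is that the constraint is code evaluated against data, not a comment beside a template variable.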

Multiple levels of abstraction

A single flat model is not enough. Infrastructure naturally exists at multiple levels of abstraction, and the model must express them explicitly. A backbone link at the design level is an abstract intent: "100G connectivity between Site A and Site B." At the realization level, it becomes concrete: a specific fiber path, specific transponders, specific wavelengths on specific physical media. The abstract entity and its concrete realization are both first-class objects in the model, connected by a realization relationship.

This layering is fundamental. Planning and capacity analysis operate on the abstract layer: you reason about logical topology without drowning in physical detail. Deployment and configuration operate on the concrete layer: you need the exact port, the exact optic, the exact config stanza. Validation bridges both: an abstract constraint ("this path must have N+2 diversity") is checked against the concrete realization ("do these three physical paths actually traverse independent conduits?").
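The abstract-versus-concrete validation described above can be sketched as follows. The link name, fiber paths, and conduit identifiers are all hypothetical, and the greedy disjointness check is a simplification of real diversity analysis.

```python
# Two-layer toy model: an abstract link with a diversity intent, realized by
# concrete fiber paths annotated with the conduits they traverse.
abstract_link = {
    "name": "siteA-siteB-100G",
    "constraint": {"diversity": 3},  # abstract intent: N+2 diversity
}

realizations = [  # concrete layer
    {"path": "fiber-path-1", "conduits": {"c-101", "c-102"}},
    {"path": "fiber-path-2", "conduits": {"c-201", "c-202"}},
    {"path": "fiber-path-3", "conduits": {"c-101", "c-301"}},  # shares c-101!
]

def diverse_paths(paths):
    """Greedily count paths whose conduit sets are pairwise disjoint."""
    used, count = set(), 0
    for p in paths:
        if used.isdisjoint(p["conduits"]):
            used |= p["conduits"]
            count += 1
    return count

ok = diverse_paths(realizations) >= abstract_link["constraint"]["diversity"]
print(ok)  # False: path 3 shares conduit c-101 with path 1
```

The constraint is stated once, at the abstract layer, and checked against whatever the concrete layer currently realizes.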

The published literature calls this pattern a multi-abstraction-layer topology. In practice, hyperscale network operators have found that three to five layers (from high-level design intent down to physical cabling) are sufficient to model the full lifecycle of infrastructure, from planning through deployment, operation, and decommission.

Three temporal views

A domain model captures infrastructure across two dimensions: abstraction layers (abstract to concrete) and time (past to future). The temporal dimension is where the model diverges most fundamentally from IaC. The present is just a boundary, a fleeting instant between what was and what will be. The value lives on either side.

Observed state spans the past. Not just what the infrastructure looks like right now, but what it looked like last month, last quarter, last year. History is essential: failure trends (is MTBF degrading for this component type?), capacity evolution (how fast is demand growing on this path?), and drift patterns (which systems diverge most from intent?) are all questions that require temporal depth. The recent past feeds real-time operations. The deeper history feeds simulation, capacity planning, and risk assessment.

Intended state spans the future, the full spectrum from immediate to aspirational. It is the target that automation converges toward. When an operator declares that a site should have N+2 redundancy on its uplinks, that is intent. When a capacity planner says a backbone link should carry no more than 60% utilization, that is intent. Intended state is a contract, and it exists at every time horizon.

Planned state is what the infrastructure will look like after changes are executed. But planned state is not a single snapshot. It is a spectrum of commitment. Plans start speculative and aspirational: a capacity forecast that says "we will need 40% more backbone bandwidth in EMEA by Q3." At that stage the plan is directional, not actionable. It identifies a need, not a solution.

As the time horizon shrinks, plans gain resolution. The speculative forecast becomes a concrete proposal: which sites, which links, which vendors, which bill of materials. Constraints are evaluated, trade-offs are weighed, dependencies are mapped. The plan sharpens in space (which facility, which rack, which port), in time (which maintenance window, which quarter), and in action (which workflows, which rollback strategy). Eventually the plan is committed: approved, funded, scheduled, and ready to execute.

But a committed plan is still not reality. To bridge the gap between the current observed state and the planned state, you need a strategy: the orchestration of all tasks, dependencies, and actions required to transform one into the other. The strategy is the workflow, the sequenced, validated, safety-gated execution path that takes the infrastructure from where it is to where it needs to be.

Each step along that execution path is a just-in-time intent: the immediate, narrow mutation that the system should perform right now. The long-range plan says "add 40% backbone capacity in EMEA." The strategy decomposes that into a sequence of concrete intents: provision this link, configure this router, shift this traffic, validate this SLO. Each JIT intent is itself a closed-loop cycle (declare, plan, act, observe, assess) nested inside the larger strategy. The plan is the destination; the strategy is the route; each JIT intent is the next turn.

The power of maintaining all three temporal views is in the delta between them. The gap between intended and observed is drift, something to correct. The gap between planned and intended is aspiration, something to pursue. The convergence of all three is operational health. And the strategy is the engine that drives convergence, one just-in-time intent at a time.
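A toy illustration of those deltas, with invented attribute names: drift is the gap between observed and intended, aspiration the gap between intended and planned.

```python
# Illustrative: the three temporal views as attribute dicts for one site.
observed = {"uplink_redundancy": 1, "utilization_pct": 72}
intended = {"uplink_redundancy": 2, "utilization_pct": 60}  # the contract
planned  = {"uplink_redundancy": 3, "utilization_pct": 60}  # committed plan

def delta(a, b):
    """Attributes where the two views disagree, with both values."""
    return {k: (a[k], b[k]) for k in a if a[k] != b[k]}

drift      = delta(observed, intended)  # something to correct
aspiration = delta(intended, planned)   # something to pursue

print("drift:", drift)
print("aspiration:", aspiration)
```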

What the model enables

Once the domain model is the source of truth, capabilities emerge that were impossible with config-centric approaches.

Config generation, not config management

Device configs become a rendering target: one of many. The model generates router configs, DNS records, monitoring rules, capacity dashboards, and documentation from the same source. Change the model, and every downstream artefact updates consistently. Config drift is no longer a problem because configs are never edited directly; they are always derived.
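A minimal sketch of the model-as-single-source idea: one interface record rendered into several downstream artefacts. The field names and output formats are invented for illustration, not any vendor's syntax.

```python
# One model record drives every rendering target.
iface = {
    "device": "rtr-ams-1", "name": "et-0/0/1",
    "peer": "rtr-fra-2", "capacity_gbps": 100, "ipv4": "192.0.2.1/31",
}

def render_router_config(i):
    return (f"interface {i['name']}\n"
            f"  description link-to-{i['peer']}\n"
            f"  ipv4 address {i['ipv4']}")

def render_monitoring_rule(i):
    # e.g. alert when utilization exceeds 80% of modelled capacity
    return f"alert {i['device']}:{i['name']} util > {0.8 * i['capacity_gbps']:.0f}Gbps"

def render_docs_row(i):
    return f"| {i['device']} | {i['name']} | {i['peer']} | {i['capacity_gbps']}G |"

for render in (render_router_config, render_monitoring_rule, render_docs_row):
    print(render(iface), end="\n\n")
```

Change `capacity_gbps` in the record and the config, the alert threshold, and the documentation all update together; none of them can drift independently.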

Cross-domain validation

Before a change is committed to the model, it is validated against every constraint. Does this decommission violate N+2 redundancy? Does this new demand exceed the available bandwidth? Does this maintenance window overlap with another that would create a single point of failure? The model answers these questions before any device is touched. Validation is not an afterthought bolted onto a CI pipeline; it is intrinsic to the model itself.

Impact analysis and simulation

What happens if this link fails? The model can simulate it: remove the link, recompute paths, evaluate demand against remaining capacity, and report exactly which customers are affected and by how much. This is the foundation of fragility measurement. Deterministic failure analysis (k−1, k−2) and probabilistic simulation (Monte Carlo with MTBF/MTTR distributions) both require a model that understands topology, capacity, and demand.
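A deterministic k−1 sweep can be sketched on a toy model. This toy only handles parallel links between the same endpoints (no multi-hop path recomputation), and the topology and numbers are invented.

```python
links = {  # link -> (endpoints, capacity in Gbps)
    "l1": (("A", "B"), 100),
    "l2": (("A", "B"), 100),  # parallel path
    "l3": (("B", "C"), 100),
}
demands = [("A", "B", 150)]  # src, dst, Gbps

def surviving_capacity(src, dst, failed):
    """Total capacity between src and dst with one link removed."""
    return sum(cap for name, (ends, cap) in links.items()
               if name != failed and set(ends) == {src, dst})

def k1_analysis():
    """For each single-link failure, list the demands that no longer fit."""
    return {failed: [(s, d, bw) for s, d, bw in demands
                     if surviving_capacity(s, d, failed) < bw]
            for failed in links}

print(k1_analysis())
# losing l1 or l2 leaves only 100G for a 150G A-B demand: that demand is at risk
```

Extending this to real path recomputation and to Monte Carlo sampling over MTBF/MTTR distributions changes the algorithms, not the premise: both need the model's topology, capacity, and demand.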

Intent-driven automation

With a model that captures intended state and observed state, automation becomes a convergence engine. The system continuously computes the delta between intent and reality, generates plans to close the gap, validates those plans against constraints, and executes them, either with human approval (guided workflows) or fully autonomously (Zero Touch). The model is what makes intent-driven automation possible. Without it, automation is just scripting with better tooling.
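The convergence loop can be caricatured in a few lines, with stand-in plan and act functions in place of real validation, safety gates, and workflow orchestration; all names and attributes are invented.

```python
intended = {"uplinks": 3, "bgp_sessions": 2}
observed = {"uplinks": 2, "bgp_sessions": 2}

def compute_drift():
    """Attributes where reality disagrees with intent."""
    return {k: v for k, v in intended.items() if observed.get(k) != v}

def plan(drift):
    return [f"set {k} -> {v}" for k, v in drift.items()]

def act(step):                     # stand-in for gated, validated execution
    key = step.split()[1]
    observed[key] = intended[key]  # pretend the change converged cleanly

# declare -> plan -> act -> observe, until the views agree
while (drift := compute_drift()):
    for step in plan(drift):
        act(step)

print(observed == intended)  # True: the loop converged
```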

The hard parts

If this approach is so powerful, why has the industry not universally adopted it? Because the hard parts are genuinely hard.

Schema design is a long-term commitment. Getting the entity types, relationship types, and attribute schemas right requires deep domain expertise and the discipline to iterate slowly. A wrong abstraction at the model layer propagates to every consumer. This is domain-driven design applied to physical infrastructure, and it demands the same rigour. At scale, a formal schema review board becomes essential: a small group that evaluates every proposed entity-kind, relationship-kind, and attribute for orthogonality, separation of concerns, and long-term extensibility. The temptation to encode information as opaque strings (or worse, as conventions buried in naming) must be resisted. If it matters, it is a typed attribute or a relationship. If it does not matter, it does not belong in the model.

Migration from config-centric systems is painful. Existing infrastructure was not built with a domain model in mind. Reverse-engineering the model from configs and operational knowledge is months of work, and the resulting model is only as good as the data that feeds it. Inventory systems are notoriously inaccurate. Topology data is incomplete. Demand attribution is approximate. The model must be designed to tolerate and progressively correct these imperfections.

Organizational change is harder than technical change. A domain model centralizes the source of truth. That means teams that previously owned their config templates must now contribute to and consume from a shared model. The network team, the compute team, the power team, and the capacity planning team must agree on entity definitions and relationship semantics. This is a governance problem as much as a technical one.

Published foundations

These ideas are not theoretical. They have been validated at hyperscale in production and documented in the academic literature.

NSDI 2020
Experiences with Modeling Network Topologies at Multiple Levels of Abstraction
Mogul, J.C., Goricanec, D., Pool, M., Shaikh, A., Turk, D., Koley, B. et al. · USENIX NSDI 2020
Introduces MALT (Multi-Abstraction-Layer Topology), a formal entity-kind / relationship-kind model for representing network topology at multiple levels of abstraction. Demonstrates abstract-to-concrete realization, multiple concurrent models for planning (candidate, what-if, plan of record), schema governance through a formal review board, and extension to non-network domains including power and cooling. A direct validation of the domain-modeling approach described in this essay.
Google Research
The Zero Touch Network
Google · Production infrastructure
Describes the operational outcome of intent-driven infrastructure: a network that operates autonomously through closed-loop control. The domain model is the foundation that makes Zero Touch possible. Intent is expressed against the model, plans are validated against the model, and the system converges observed state toward intended state without human intervention.

The trajectory is clear. Infrastructure as Code brought discipline to a chaotic practice. Infrastructure as Data brings intelligence. The model is the foundation on which simulation, validation, intent-driven automation, and ultimately autonomous systems are built.

The question for any infrastructure organization is not whether they need a domain model. It is whether they will build one deliberately, informed by the hard lessons of those who have done it before, or discover the need the hard way when their config templates can no longer answer the questions that matter.

Model the domain, and the code follows. Get the entities, relationships, and constraints right, and the config generation, the validation, the simulation, and the automation all become tractable problems. Get them wrong, and no amount of tooling will compensate.
