Situ — Enterprise Architecture Studio

Two years ago I wrote about using LLMs in infrastructure work. At the time it was early days — ChatGPT was less than a year old, GitHub Copilot was the hot new thing, and half the people I talked to thought AI in operations was a gimmick.

Two years later, it's not a gimmick. It's not magic either. Here's where things actually stand.

What got better

Code generation quality. The models are significantly better at generating infrastructure code than they were in 2023. Bicep and ARM template generation is more reliable, syntax errors are rarer, and the models understand Azure resource dependencies better. I still review every line. But I spend less time fixing stupid mistakes.

A Bicep module that would have taken four rounds of correction in 2023 now takes one or two. The models follow naming conventions more reliably. They know how to reference existing resources. They handle child resources without getting confused about parent references. These were common failure modes in 2023. They're rare now.

Context windows. Longer context windows mean I can provide more of my existing codebase as reference, and the model stays consistent with my patterns. In 2023, I'd paste one module and ask for another. Now I can paste five modules and ask for a sixth, and it follows the established conventions across all of them.

Multimodal input for troubleshooting. I can paste screenshots of error messages, dashboard graphs, and log snippets, and the model can correlate them. This was barely possible a year ago. Now it's genuinely useful for triage — especially when I'm tired and the obvious pattern isn't jumping out at me.

What hasn't changed

I still don't trust the output. Every generated script goes through review. Every suggested architecture change gets manual validation. The models are more fluent, more confident, and wrong in new and interesting ways. Hallucinations are less frequent but more subtle. A hallucinated module name in 2023 was obvious. A real API version paired with a parameter it doesn't support is much harder to catch.
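One way to narrow that review is to flag any API version in generated Bicep that I have not already checked by hand. What follows is a minimal sketch, not a real validator: the function name, the REVIEWED allow-list, and the sample fragment are all illustrative assumptions. It only tells me where to look; it cannot tell me whether a given version supports a given parameter.

```python
import re

# Matches Bicep resource declarations such as:
#   resource sa 'Microsoft.Storage/storageAccounts@2023-01-01' = {
RESOURCE_DECL = re.compile(
    r"^\s*resource\s+\w+\s+'(?P<type>[^'@]+)@(?P<version>[^']+)'",
    re.MULTILINE,
)


def find_unreviewed_versions(
    bicep_source: str, reviewed: dict[str, set[str]]
) -> list[tuple[str, str]]:
    """Return (resource type, API version) pairs that are not on the allow-list."""
    findings = []
    for match in RESOURCE_DECL.finditer(bicep_source):
        rtype, version = match["type"], match["version"]
        if version not in reviewed.get(rtype, set()):
            findings.append((rtype, version))
    return findings


# Hypothetical allow-list: versions a human has checked against the docs.
REVIEWED = {"Microsoft.Storage/storageAccounts": {"2023-01-01"}}

# Illustrative fragment of generated output (not a complete template).
generated = """
resource sa 'Microsoft.Storage/storageAccounts@2022-09-01' = {
  name: 'stexample001'
  location: 'westeurope'
}
"""

findings = find_unreviewed_versions(generated, REVIEWED)
print(findings)
```

Anything the check returns goes to manual review against the resource reference before it gets near a deployment.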

The models don't understand my environment. They don't know my naming conventions, my tagging standards, my security baselines, my compliance requirements. They can generate code that works. They can't generate code that fits my context without me providing that context explicitly.

I've developed a habit of front-loading my prompts with constraints: naming convention, required tags, security baseline rules, region preferences, resource group structure. The more context I provide, the less editing I do. This is obvious in retrospect but took me months to internalise.
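To make the habit concrete, here is a rough sketch of how such a preamble could be assembled so the constraints always come before the task. The class, the field names, and every example value are hypothetical; substitute your own standards.

```python
from dataclasses import dataclass


@dataclass
class EnvironmentConstraints:
    """Hypothetical container for the context a model cannot guess."""
    naming_convention: str
    required_tags: list[str]
    security_rules: list[str]
    allowed_regions: list[str]
    resource_group_structure: str


def build_prompt(constraints: EnvironmentConstraints, task: str) -> str:
    """Put the constraints first so the task is read in their light."""
    lines = [
        "Follow these constraints in everything you generate:",
        f"- Naming convention: {constraints.naming_convention}",
        f"- Required tags: {', '.join(constraints.required_tags)}",
        f"- Allowed regions: {', '.join(constraints.allowed_regions)}",
        f"- Resource group structure: {constraints.resource_group_structure}",
        "- Security baseline:",
    ]
    lines += [f"  - {rule}" for rule in constraints.security_rules]
    lines += ["", "Task:", task]
    return "\n".join(lines)


# Example values are made up for illustration only.
example = EnvironmentConstraints(
    naming_convention="<type>-<workload>-<env>-<region>-<nnn>",
    required_tags=["owner", "costCenter", "environment"],
    security_rules=["No public network access", "TLS 1.2 minimum"],
    allowed_regions=["westeurope", "northeurope"],
    resource_group_structure="one resource group per workload and environment",
)
task = "Write a Bicep module for a storage account."
prompt = build_prompt(example, task)
print(prompt)
```

Keeping the constraints in one structure means the same preamble goes in front of every request, instead of being retyped from memory each time.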

AI doesn't replace judgment. The hardest decisions in infrastructure — architecture trade-offs, technology selection, capacity planning, incident priority — still require human judgment. The models can surface options and analyse trade-offs, but they can't weigh business context against technical constraints. They don't know which system is mission-critical and which can wait until Monday.

What I stopped doing

I stopped trying to build "AI agents." The autonomous agent hype was intense in 2023-2024. I experimented with agents that could monitor infrastructure and take remediation actions. I found it terrifying. An agent that can restart a VM is an agent that can restart the wrong VM during month-end close. I want a human in every decision loop, and I expect to want that for years.
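The alternative I'm comfortable with looks more like an approval gate: the tooling may propose an action, but nothing runs until a person says yes. The sketch below shows only that shape. Every name in it is hypothetical, and the approver is a stand-in function so the example runs without a prompt; a real gate would wait for an explicit human answer.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ProposedAction:
    """A change the tooling suggests but must not apply on its own."""
    action: str   # e.g. "restart"
    target: str   # e.g. a VM name
    reason: str   # the evidence the suggestion is based on


def run_with_approval(
    proposal: ProposedAction,
    approve: Callable[[ProposedAction], bool],
    execute: Callable[[ProposedAction], None],
) -> bool:
    """Execute only if the approver says yes; return whether it ran."""
    if not approve(proposal):
        return False
    execute(proposal)
    return True


# Illustrative wiring: the VM names and the freeze rule are made up.
executed: list[str] = []
FROZEN_TARGETS = {"vm-finance-01"}  # e.g. systems in month-end close


def cautious_human(proposal: ProposedAction) -> bool:
    # Stand-in for a person who knows which systems must not be touched.
    return proposal.target not in FROZEN_TARGETS


def fake_execute(proposal: ProposedAction) -> None:
    # Records the action instead of touching any real infrastructure.
    executed.append(f"{proposal.action}:{proposal.target}")


ran_finance = run_with_approval(
    ProposedAction("restart", "vm-finance-01", "high memory"),
    cautious_human,
    fake_execute,
)
ran_dev = run_with_approval(
    ProposedAction("restart", "vm-dev-07", "high memory"),
    cautious_human,
    fake_execute,
)
print(ran_finance, ran_dev, executed)
```

The point of the design is that the business context (which VM belongs to finance, what week it is) lives with the approver, not with the model.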

I stopped hiding my AI usage. In 2023, using AI tools felt like something you should be quiet about. Now I'm open about it with colleagues: here's how I use these tools, here's where they help, here's where they don't. The stigma is fading. The productive conversation is about which tools work for which tasks, not whether using them is "cheating."

I stopped trying to keep up with every model release. The pace of model releases in 2024 has been exhausting. GPT-4, GPT-4o, Claude 3, Claude 3.5, Gemini, Llama 3, Mistral — every few months a new best model. I use what works for my workflow and ignore the rest. Tool stability matters more than benchmark scores.

The honest assessment

AI tools have made me about 30-40% faster at routine infrastructure work. They haven't made me better at architecture. They've reduced the tedium of boilerplate generation, log analysis, and documentation. They haven't changed how I think about system design or operational discipline.

The biggest productivity gain isn't in any single task. It's in recovery time. When I'm tired, when it's late, when the obvious solution isn't coming to mind — the tools help me push through the friction. That's not a technical capability. It's a quality-of-life improvement that compounds over a career.

If you're a senior infrastructure person who hasn't seriously tried these tools yet: start. The learning curve is shallow, the benefits are real, and the biggest risk is not that you'll become dependent on them. It's that you'll spend hours doing by hand what your peers finish noticeably faster.

If you're reading this in 2026 or later and laughing at how primitive these tools were: good. That means things got better. They usually do.

Two Years of AI-Assisted Infrastructure
