
Everyone is building AI agents right now.

Most of them will not make it to production.

I have deployed LLM-based automation at AWS handling 100+ manual engineering processes. I have built agent workflows for a small business that genuinely work. I have seen what breaks and what holds up. Here is an honest breakdown.


Where agents actually work

✓ Works in production
  • Repetitive, well-defined tasks with clear success criteria
  • Workflows where a wrong output is catchable before it causes damage
  • Tasks where 80% accuracy is genuinely useful — content generation, summarisation, first drafts
  • Internal tooling where the user is technical enough to spot errors
  • High-volume, low-stakes tasks where the savings compound fast

✕ Fails in production
  • Anything requiring consistent multi-step reasoning over long contexts
  • Tasks where a wrong answer is expensive and fails silently
  • Workflows that were not clearly defined before the agent was built
  • Any system without a human review layer at the right checkpoint
  • Processes that depend on information that changes faster than you can refresh the agent's context

The two mistakes I see constantly

Mistake 1: treating agents as autonomous decision-makers

The most common failure mode I see: teams build an agent, give it a task, and expect it to handle every edge case independently. Then it encounters an edge case it was not trained for and produces a wrong output that nobody catches for three days.

The framing that works: agents are very fast, very tireless interns who need supervision. They are not autonomous systems. They are accelerators for human workflows. Design accordingly — with checkpoints, with logging, with humans at the moments that matter.
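
What that looks like in code is simple. A minimal sketch in Python, where `generate_draft`, `needs_review`, and `queue_for_human` are placeholders for your own model call, risk rules, and review queue; none of them is a real library function:

    import logging

    logging.basicConfig(level=logging.INFO)
    log = logging.getLogger("agent")

    def run_step(task, generate_draft, needs_review, queue_for_human):
        # generate_draft wraps whatever model you call, needs_review encodes
        # your own risk rules, queue_for_human pushes work to a person.
        draft = generate_draft(task)
        log.info("task=%s produced draft (%d chars)", task["id"], len(draft))

        if needs_review(task, draft):
            # The checkpoint: risky output waits for sign-off instead of shipping.
            log.info("task=%s escalated to human review", task["id"])
            return queue_for_human(task, draft)
        return draft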

Mistake 2: automating before the process is defined

You cannot automate a process you have not mapped. Every time I see an agent workflow fail catastrophically, I find the same root cause: the team tried to automate something they could not describe precisely in plain language.

Garbage process in. Garbage automation out — just faster.

Before you build the agent, write down the exact decision criteria, the exact inputs, the exact acceptable outputs, and the exact failure modes you want to handle. If you cannot write that document, you are not ready to build the agent.
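
One way to keep yourself honest is to make that document structured enough to check. A sketch of what I mean, with field names that are mine rather than any standard:

    from dataclasses import dataclass, field

    @dataclass
    class TaskSpec:
        # The document you must be able to write before building the agent.
        name: str
        inputs: list[str]                  # exactly what the agent receives
        acceptable_outputs: str            # what "correct" means, precisely
        decision_criteria: list[str]       # the rules a human would apply
        failure_modes: list[str]           # what can go wrong, and how you catch it
        escalate_to_human_when: list[str] = field(default_factory=list)

    # Illustrative example, not a real workflow:
    spec = TaskSpec(
        name="summarise support ticket",
        inputs=["ticket body", "customer tier"],
        acceptable_outputs="five bullet points max, no invented facts",
        decision_criteria=["keep only customer-stated facts"],
        failure_modes=["hallucinated order numbers", "missed refund request"],
        escalate_to_human_when=["ticket mentions legal action"],
    )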

What the teams getting real value are doing differently

At AWS, the LLM Ops automation that cut 100+ manual processes was not built as one system. It was built as 100+ small, scoped agents — each one responsible for exactly one well-defined task, each one with logging, each one with a human escalation path for failures.

At Linkby, the AI-agent-driven pipeline development works because the scope is narrow: generate a Dagster pipeline from a specification, run the tests, flag failures. That is it. The agent does not decide whether the pipeline should exist — a human does. The agent just does the work of building it.
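
Roughly, the loop looks like the sketch below. This is my own reconstruction of the shape, not Linkby's actual code; `generate_pipeline_code` and `flag_for_review` are hypothetical hooks for the model call and the human notification:

    import subprocess

    def build_pipeline(spec_path, generate_pipeline_code, flag_for_review):
        # The agent never decides whether the pipeline should exist;
        # the human-written spec already made that call.
        with open(spec_path) as f:
            spec = f.read()

        code = generate_pipeline_code(spec)   # hypothetical model-call wrapper
        out_path = "generated_pipeline.py"
        with open(out_path, "w") as f:
            f.write(code)

        # Run the existing test suite against the generated pipeline.
        result = subprocess.run(["pytest", "tests/"], capture_output=True, text=True)
        if result.returncode != 0:
            # Flag failures for a human instead of retrying silently.
            flag_for_review(out_path, result.stdout + result.stderr)
        return out_path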

The five rules I follow:
  • Define the task precisely before building the agent.
  • Build model-agnostic from day one.
  • Put humans at the right checkpoints.
  • Cost-profile every workflow before deploying.
  • Log everything — agents fail in unexpected ways without visibility.
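
Two of those rules are worth making concrete. "Model-agnostic" means the workflow only ever sees a thin interface, and "cost-profile" can be a back-of-envelope function. Both below are sketches; the Protocol and the prices are illustrative, not any vendor's real API or pricing:

    from typing import Protocol

    class ModelClient(Protocol):
        # The only surface the workflow depends on. Swapping providers
        # means writing one new adapter, not rewriting the agent.
        def complete(self, prompt: str) -> tuple[str, int, int]:
            """Return (text, input_tokens, output_tokens)."""
            ...

    def estimate_monthly_cost(calls_per_day, avg_tokens_in, avg_tokens_out,
                              price_in_per_1k, price_out_per_1k):
        # Back-of-envelope profile before deploying; plug in your provider's
        # real per-1k-token prices, these arguments are placeholders.
        per_call = (avg_tokens_in / 1000) * price_in_per_1k \
                 + (avg_tokens_out / 1000) * price_out_per_1k
        return per_call * calls_per_day * 30

    # e.g. 5,000 calls/day at ~1,500 tokens in / ~400 out, with made-up
    # prices of $0.003 and $0.015 per 1k tokens:
    # estimate_monthly_cost(5000, 1500, 400, 0.003, 0.015) -> ~1,575 USD/month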

The honest assessment

Agents are genuinely powerful. The productivity gains I have seen in production are real. But the gap between a demo and a reliable production system is enormous, and most teams underestimate it.

The teams winning with agents are not the ones with the most sophisticated models. They are the ones who are ruthlessly clear about what the agent is and is not responsible for — and who build the observability to know when something goes wrong.

What is the most effective agent workflow you have seen in a real production environment? I am genuinely curious — drop a comment on LinkedIn.