What are AI agents, and when do you actually need one?
A script that emails you when a metric drops is not an agent. An agent is a model that decides its own next step, calls a tool, reads the result, and decides again - in a loop - until the job is done. That loop is the whole idea, and it is also the reason agents are harder to trust. Here is when the loop is worth it, and when a plain prompt or a fixed workflow will beat it.
A script that watches a dashboard and emails you when signups drop is not an AI agent. Neither is a single call to a model that summarizes a document. Both are useful; both are something simpler. An agent is a model placed inside a loop: it looks at a goal, decides its own next step, calls a tool, reads what came back, and decides again - repeating until it judges the job done. The loop, and the model choosing what happens next, are the whole definition. Everything hard about agents follows from those two facts.
The loop is the whole idea
Strip an agent to its core and you get four moves on repeat: observe the current state, decide the next action, act by calling a tool, and observe the result to decide again. A plain LLM call makes a single pass - one input, one output - and stops. A workflow you wrote by hand follows a path you drew in advance - step one, then two, then three, always in that order. An agent differs only in that the model, not you, picks the next step at runtime, and the loop keeps going until the model decides to stop.
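The four moves can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: `fake_model`, `TOOLS`, and `lookup_signups` are hypothetical stand-ins, and the "model" is a deterministic function so the example runs without any API. The loop is deliberately unbounded here to keep the core visible; the step limit discussed later in this article is what a real loop would add.

```python
from typing import Callable

# Hypothetical tool set: plain functions the "model" is allowed to call.
TOOLS: dict[str, Callable[[str], str]] = {
    "lookup_signups": lambda day: {"mon": "120", "tue": "45"}.get(day, "unknown"),
}

def fake_model(goal: str, history: list[tuple[str, str, str]]) -> dict:
    """Stand-in for an LLM call: picks the next action from what it has seen."""
    seen = {arg for _, arg, _ in history}
    for day in ("mon", "tue"):
        if day not in seen:
            return {"action": "lookup_signups", "arg": day}
    drop = int(history[0][2]) - int(history[1][2])
    return {"action": "finish", "arg": f"signups fell by {drop}"}

def run_agent(goal: str, decide=fake_model) -> tuple[str, int]:
    history: list[tuple[str, str, str]] = []
    while True:
        step = decide(goal, history)                  # decide the next action
        if step["action"] == "finish":                # the model chooses when to stop
            return step["arg"], len(history)
        result = TOOLS[step["action"]](step["arg"])   # act by calling a tool
        history.append((step["action"], step["arg"], result))  # observe the result

answer, tool_calls = run_agent("explain the signup drop")
print(answer, tool_calls)
```

Swapping `fake_model` for a real model call changes nothing about the shape of the loop; it only makes the sequence of actions something you can no longer read off the source code.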
That is why agents feel powerful and why they are hard to trust in the same breath. Handing the model control of the next step is what lets it handle a task you could not fully script - and it is also what lets a single bad decision compound across a dozen iterations before anyone notices. A workflow fails in a place you chose. An agent can fail in a place you did not know existed.
Prompt, workflow, agent
It helps to see the three as a ladder, because most teams reach one rung too high. A single prompt is one input and one output: classify this ticket, rewrite this paragraph, pull the totals from this invoice. A workflow chains fixed steps, some of which may be model calls, in an order you decided: extract the fields, look them up, format the reply. An agent is the workflow you could not draw because the order depends on what each step returns - so you let the model route itself and give it tools to act with.
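To make the top two rungs concrete, here is a rough sketch of the extract, look up, format example as a fixed workflow, next to a version where a routing function picks each step. Every name here (`ORDERS`, `choose_next`, the `escalate` step) is made up for illustration. `choose_next` is a hard-coded stub so the code runs; in a real agent that function would be a model call, which is exactly what makes the path unpredictable.

```python
ORDERS = {"1042": "shipped"}  # hypothetical lookup table

def extract_fields(ticket: str) -> dict:
    # In practice this step might be one model call; here it is plain parsing.
    return {"order_id": ticket.split("#")[1].split()[0]}

def look_up(fields: dict) -> dict:
    return {**fields, "status": ORDERS.get(fields["order_id"], "not found")}

def format_reply(record: dict) -> str:
    return f"Order {record['order_id']} is {record['status']}."

def workflow(ticket: str) -> str:
    # Workflow: the order is fixed by the author and never changes.
    return format_reply(look_up(extract_fields(ticket)))

def choose_next(state: dict) -> str:
    """Stand-in for the model's routing decision (an assumption of this sketch)."""
    if "order_id" not in state:
        return "extract"
    if "status" not in state:
        return "look_up"
    if state["status"] == "not found" and not state.get("escalated"):
        return "escalate"
    return "reply"

def agent(ticket: str) -> tuple[str, list[str]]:
    # Agent: the next step depends on what the previous step returned.
    state: dict = {}
    path: list[str] = []
    while True:
        step = choose_next(state)
        path.append(step)
        if step == "extract":
            state.update(extract_fields(ticket))
        elif step == "look_up":
            state.update(look_up(state))
        elif step == "escalate":
            state["escalated"] = True
            state["status"] = "under review by support"
        else:
            return format_reply(state), path

print(workflow("Where is order #1042 please?"))
print(agent("Refund for #9999 now"))
```

Note that because `choose_next` is ordinary branching code, this particular "agent" could be redrawn as a workflow with one extra branch. That is the point of the ladder: the agent rung is only justified when no such drawing is possible.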
The cost climbs fast as you go up. A prompt is one predictable call. A workflow is several, still predictable. An agent can make five calls or fifty, take a path you have never seen, and cost a different amount every time it runs. That unpredictability is the price of the flexibility - so you only want to pay it when the task genuinely needs it.
When the loop is worth it
You probably want an agent when:
- The steps are not known in advance - they depend on what earlier steps return.
- The task needs tools: search, a database, code execution, an API call.
- A human would iterate - try, read the result, adjust, try again.
- The input space is too open to enumerate as fixed branches.
You probably do not when:
- One prompt and one response already do the job - most classification, summarization, and extraction tasks.
- The path is fixed and known - that is a workflow, and code will be cheaper and more reliable.
- A wrong action is expensive and hard to undo, with no room for a review step.
- You cannot yet measure whether the output is right - build the eval before the autonomy.
The honest default is to start one rung lower than feels exciting. Try to solve the problem with a prompt. If it needs several known steps, write a workflow. Reach for an agent only when you can point at the specific decision the model has to make that you cannot make for it in advance. Plenty of shipped agent projects are workflows wearing a costume - and they would be faster, cheaper, and easier to debug with the costume off.
What an agent needs to be trustworthy
If a task does warrant an agent, the loop is the easy part - a competent engineer can wire one up in an afternoon. The hard part is making it safe to leave running. That means bounded tools (the agent can only touch systems you explicitly hand it, and write actions sit behind confirmation or a review step), a step limit so a confused agent cannot loop forever or spend without a ceiling, and logging of every decision so you can reconstruct why it did what it did. Above all it needs evaluation: a fixed set of tasks with known-good outcomes you can re-run on every change, because an agent that improves on one input and quietly regresses on five is the normal case, not the exception.
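One way the first three guardrails might fit together is sketched below: a tool allowlist, a confirmation hook in front of write actions, a hard step limit, and a trace of every decision. The class and field names (`GuardedAgent`, `Tool.writes`, `confirm`, `max_steps`) are assumptions of this sketch, not an established API, and `scripted` is a canned stand-in for the model.

```python
import logging
from dataclasses import dataclass, field
from typing import Callable

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("agent")

@dataclass
class Tool:
    fn: Callable[[str], str]
    writes: bool = False  # write actions must pass the confirm hook

class StepLimitExceeded(RuntimeError):
    pass

@dataclass
class GuardedAgent:
    tools: dict[str, Tool]                 # the only systems the agent can touch
    decide: Callable[[list[dict]], dict]   # stand-in for the model call
    confirm: Callable[[str, str], bool]    # review step for write actions
    max_steps: int = 10
    trace: list[dict] = field(default_factory=list)

    def run(self) -> str:
        for step_no in range(1, self.max_steps + 1):
            decision = self.decide(self.trace)
            name, arg = decision["action"], decision.get("arg", "")
            if name == "finish":
                self._record(step_no, name, arg, "done")
                return arg
            tool = self.tools.get(name)
            if tool is None:
                result = f"refused: unknown tool {name!r}"
            elif tool.writes and not self.confirm(name, arg):
                result = "refused: write not confirmed"
            else:
                result = tool.fn(arg)
            self._record(step_no, name, arg, result)
        raise StepLimitExceeded(f"no finish after {self.max_steps} steps")

    def _record(self, step_no: int, action: str, arg: str, result: str) -> None:
        entry = {"step": step_no, "action": action, "arg": arg, "result": result}
        self.trace.append(entry)  # kept in memory so a run can be reconstructed
        log.info("%s", entry)

def scripted(trace: list[dict]) -> dict:
    script = [
        {"action": "drop_table", "arg": "users"},      # not in the allowlist
        {"action": "send_email", "arg": "all users"},  # write, needs confirmation
        {"action": "read_metric", "arg": "signups"},
        {"action": "finish", "arg": "signups: 45"},
    ]
    return script[len(trace)]

agent = GuardedAgent(
    tools={"read_metric": Tool(lambda m: "45"),
           "send_email": Tool(lambda to: "sent", writes=True)},
    decide=scripted,
    confirm=lambda name, arg: False,  # the reviewer declines every write in this demo
)
answer = agent.run()
```

Refusals are fed back into the trace as ordinary results rather than raised as errors, so the model can see that an action was blocked and choose something else. Only the step limit raises, because at that point there is nothing useful left for the loop to do.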
This is the gap most agent projects fall into - the distance between a demo that works once and a system that works every day on inputs you did not hand-pick. Closing it is ordinary software engineering: guardrails, tests, monitoring, and the discipline to keep the autonomy no larger than the task requires.
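The evaluation half of that discipline can start very small. The sketch below assumes a toy eval set and two made-up agent versions (`agent_v1`, `agent_v2`) that are plain functions rather than real model loops. It shows why per-task results matter: the second version fixes one task, breaks another, and the aggregate score does not move.

```python
from typing import Callable

# Hypothetical fixed eval set: (task input, known-good outcome).
EVAL_SET = [
    ("2 + 2", "4"),
    ("10 / 4", "2.5"),
    ("7 * 6", "42"),
]

def evaluate(agent: Callable[[str], str]) -> dict[str, bool]:
    """Run every task and keep pass/fail per task, not just a total."""
    return {task: agent(task) == expected for task, expected in EVAL_SET}

def regressions(before: dict[str, bool], after: dict[str, bool]) -> list[str]:
    """Tasks that passed before the change and fail after it."""
    return [task for task in before if before[task] and not after[task]]

def agent_v1(task: str) -> str:
    a, op, b = task.split()
    if op == "+":
        return str(int(a) + int(b))
    if op == "*":
        return str(int(a) * int(b))
    return "unknown"  # v1 cannot divide

def agent_v2(task: str) -> str:
    a, op, b = task.split()
    if op == "+":
        return str(int(a) + int(b))
    if op == "/":
        return str(int(a) / int(b))
    return "unknown"  # v2 gained division but lost multiplication

before = evaluate(agent_v1)
after = evaluate(agent_v2)
print("passed before:", sum(before.values()), "passed after:", sum(after.values()))
print("regressions:", regressions(before, after))
```

Exact string comparison is the simplest possible grader and will be too strict for most real agent outputs; the part worth keeping is the structure, a fixed task list re-run on every change with results compared task by task.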
The short version
An agent is a model that runs its own observe-decide-act loop with tools until the job is done. It earns its cost when the steps genuinely cannot be known ahead of time, and it wastes that cost - in money, latency, and debugging - on any task a prompt or a fixed workflow already handles. Pick the lowest rung that does the job, then spend your effort on the guardrails and evaluation that make the loop safe to trust.