The Loop is the Engine Inside Every Agentic AI System that makes an AI Agent an AI Agent

Since we renamed this column from “The Citizen Developer” to “Human-in-the-Loop,” we’ve focused almost exclusively on the human. Today, we’re going to take a good, long look at the loop, mainly because it is becoming increasingly important.

What Is the Loop?

When you first started working with Generative AI (GenAI), you asked the chatbot questions, and it provided answers, and that was where the entire process ended.

What makes an AI agent “agentic” rather than merely “generative” is the loop, the fundamental execution pattern that separates an agent from a chatbot. Every agent, regardless of framework, runs on the same basic cycle: Perceive → Reason → Plan → Act → Observe → Repeat.

In this process, the agent takes in its environment, reasons about what to do next, selects a tool or action, executes it, receives feedback on the result, and begins again. It keeps looping until the task is complete or until it otherwise decides it's done. Note that it doesn’t return to the human operator for feedback. Rather, it evaluates the result of its own actions to determine next steps.

Every AI agent operates by moving around a loop. This loop is what gives agents autonomy, reliability, and the ability to recover from failure. Agentic AI, powered by the loop, is contextual, continuous, and environment-aware.

Lilian Weng's formula captures the architecture simply: Agent = LLM + Memory + Planning + Tool Use. The agent loop is the runtime that ties the four components together.

How the Loop is Used

The "loop" is not a singular construct. It's better viewed as a spectrum of feedback loops, ranging from simple iterative cycles to complex systems that continuously learn, adapt, and improve over time.

The basic single-agent loop. Through its harness, an LLM uses tools in sequence: it reasons about what to call next, observes the result, and decides on the next step. Tools are the only way the LLM can affect the world, and the harness provides its connection to those tools. The loop runs until the model decides it's done.
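A minimal sketch of that cycle in Python may help. Everything named here is an assumption made for illustration: `fake_model` stands in for a real LLM call, the two tools are invented, and the `max_steps` cap is one possible way a harness might keep the loop bounded.

```python
from typing import Callable

# Hypothetical tools: the only way the "model" can affect the world.
TOOLS: dict[str, Callable[[str], str]] = {
    "search_flights": lambda city: f"3 flights found to {city}",
    "check_points": lambda _unused: "points cover 1 flight",
}


def fake_model(history: list[str]) -> tuple[str, str]:
    """Stand-in for an LLM call: picks the next action from what it has seen."""
    if not any("flights found" in entry for entry in history):
        return ("search_flights", "Tokyo")
    if not any("points cover" in entry for entry in history):
        return ("check_points", "")
    return ("done", "book the flight covered by points")


def run_agent(task: str, max_steps: int = 10) -> str:
    history = [f"task: {task}"]
    for _ in range(max_steps):             # the harness bounds the loop
        action, arg = fake_model(history)  # reason: decide what to do next
        if action == "done":               # the model decides it is finished
            return arg
        observation = TOOLS[action](arg)   # act through a tool
        history.append(observation)        # observe: feed the result back in
    return "stopped: step limit reached"


result = run_agent("cheapest flight to Tokyo")
```

Note that the human appears only at the start and the end. Every intermediate decision is made from the accumulated history.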

The ReAct pattern. The most widely used and most understandable loop variant, ReAct (Reasoning + Acting), structures each iteration as Thought → Action → Observation. The agent generates a reasoning trace that outlines its current understanding and plans the next action, selects and invokes an appropriate tool, and then receives the tool's output as new information to inform the next cycle.
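The Thought → Action → Observation structure can be sketched as follows. This is an illustration under assumptions, not the reference ReAct implementation: `scripted_model` is a hard-coded stand-in for an LLM, and `calculator` is a hypothetical tool.

```python
def calculator(expression: str) -> str:
    """Hypothetical tool: adds two integers written as 'a + b'."""
    left, right = expression.split("+")
    return str(int(left) + int(right))


TOOLS = {"calculator": calculator}


def scripted_model(trace: list[dict]) -> dict:
    """Stand-in for an LLM: returns a thought plus an action or a final answer."""
    if not trace:
        return {
            "thought": "I need the sum, so I should use the calculator.",
            "action": "calculator",
            "input": "19 + 23",
        }
    return {
        "thought": "The observation gives me the sum.",
        "final": f"The total is {trace[-1]['observation']}.",
    }


def react(max_iterations: int = 5) -> tuple[str, list[dict]]:
    trace: list[dict] = []
    for _ in range(max_iterations):
        step = scripted_model(trace)                        # Thought
        if "final" in step:
            return step["final"], trace
        observation = TOOLS[step["action"]](step["input"])  # Action
        trace.append({**step, "observation": observation})  # Observation
    return "no answer within the iteration limit", trace


answer, trace = react()
```

The trace is the point: each iteration leaves behind a readable record of why the agent did what it did.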

The reflection loop. Not to be confused with the Reflecting Pool that's currently grabbing headlines. According to AI expert Andrew Ng, the agent critiques its own work between iterations and revises the output based on that critique, repeating until the result is good enough to complete the task. This is much like automated code review.
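Here is a small sketch of a reflection loop, assuming a single agent that both drafts and critiques. The `draft` and `critique` functions are hypothetical stand-ins for two LLM calls, and the docstring check is an invented example of a critique.

```python
def draft(feedback: list[str]) -> str:
    """Stand-in for the generating call: revises to address prior critiques."""
    if "missing docstring" in feedback:
        return 'def add(a, b):\n    """Return a + b."""\n    return a + b'
    return "def add(a, b): return a + b"


def critique(text: str) -> list[str]:
    """Stand-in for the self-critique call: lists problems in the draft."""
    issues = []
    if '"""' not in text:
        issues.append("missing docstring")
    return issues


def reflect(max_rounds: int = 3) -> tuple[str, int]:
    feedback: list[str] = []
    output = ""
    for round_number in range(1, max_rounds + 1):
        output = draft(feedback)
        issues = critique(output)   # the agent reviews its own work
        if not issues:              # nothing left to fix: task complete
            return output, round_number
        feedback.extend(issues)     # carry the critique into the next draft
    return output, max_rounds


final_output, rounds_used = reflect()
```

The round limit matters. Without it, an agent that can never satisfy its own critique would loop indefinitely.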

The multi-agent loop. Multiple specialized agents each run their own loops, orchestrated by a coordinator. Anthropic describes a multi-agent system as multiple agents (LLMs autonomously using tools in a loop) working together. A lead agent plans, uses tools to create parallel agents that search for information simultaneously, and then synthesizes their findings.
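The plan, fan out, and synthesize shape can be sketched with Python's standard thread pool. This is an illustration only and not Anthropic's implementation: `subagent` is a hypothetical stand-in for an agent running its own loop, and the subtopics are hard-coded where a real lead agent would presumably ask an LLM to plan them.

```python
from concurrent.futures import ThreadPoolExecutor


def subagent(topic: str) -> str:
    """Stand-in for a subagent that would run its own tool loop on one topic."""
    return f"findings on {topic}"


def lead_agent(question: str) -> str:
    # Plan: split the question into subtopics (hard-coded for this sketch).
    subtopics = [
        f"{question}: pricing",
        f"{question}: availability",
        f"{question}: reviews",
    ]
    # Fan out: run the subagents in parallel. map() returns results in the
    # same order as the inputs, regardless of which thread finishes first.
    with ThreadPoolExecutor(max_workers=len(subtopics)) as pool:
        findings = list(pool.map(subagent, subtopics))
    # Synthesize: combine the findings into one report.
    return " | ".join(findings)


report = lead_agent("hotels in Tokyo")
```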

The evaluator-optimizer loop. One LLM call generates a response while another provides evaluation and feedback in a loop. This is particularly effective when it uses clear evaluation criteria and when iterative refinement after evaluation provides measurable value. Think of the evaluator as the digital equivalent of a coxswain.
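A sketch of the evaluator-optimizer pattern, assuming two separate calls and explicit criteria. Both `generate` and `evaluate` are hypothetical stand-ins for LLM calls, and the two criteria are invented for the example.

```python
# Explicit, checkable criteria are what make this pattern work.
CRITERIA = {
    "mentions price": lambda text: "$" in text,
    "under 60 characters": lambda text: len(text) < 60,
}


def generate(feedback: list[str]) -> str:
    """Stand-in for the generator call: applies whatever the evaluator asked for."""
    text = "Our new plan gives your whole team unlimited seats and priority support"
    if "under 60 characters" in feedback:
        text = "New plan: unlimited seats"
    if "mentions price" in feedback:
        text += " for $20/month"
    return text


def evaluate(text: str) -> list[str]:
    """Stand-in for the evaluator call: returns the names of failed criteria."""
    return [name for name, check in CRITERIA.items() if not check(text)]


def optimize(max_rounds: int = 4) -> tuple[str, int]:
    feedback: list[str] = []
    text = ""
    for round_number in range(1, max_rounds + 1):
        text = generate(feedback)
        failed = evaluate(text)
        if not failed:                  # every criterion passed
            return text, round_number
        feedback = sorted(set(feedback) | set(failed))
    return text, max_rounds


headline, rounds_needed = optimize()
```

Unlike the reflection loop, the judge here is a separate call with its own fixed rubric, which is what makes the improvement measurable.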

Eliminating the One-Shot

The loop solves a specific problem that stumps one-shot prompting: tasks that can't be completed in a single pass because you don't know all the required steps in advance, or because you need to adapt based on what you find along the way. Somewhat like, but short of, pure “trial-and-error.”

Casius Lee of Oracle suggests you consider the request: "Find me the three cheapest flights to Tokyo next month, check if my loyalty points cover any of them, and book the best option." The chatbot has no mechanism to proceed. It generates a response and stops. It can answer questions about flights. It can explain how loyalty points work. It cannot execute the workflow. The interaction is stateless. Each prompt is processed in isolation, with no persistent context, no access to intermediate results, and no ability to chain decisions across steps. An agent with a loop handles that chain natively.

Andrew Ng's most-cited insight captures the broader implication: that wrapping a less powerful model in the right agentic loop can outperform a more powerful model used in zero-shot mode.

The advantages of agentic workflows include better performance, parallelism (agents can fetch and analyze multiple sources simultaneously, unlike humans), and modularity (which allows you to swap in different tools or models for different steps).

The loop is also what makes recovery possible. A tight loop means the agent can adapt based on what it actually finds — if a search returns no results, it reformulates the query; if an API returns an error, it tries a fallback.
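Both recovery moves, the fallback and the reformulated query, can be sketched in a few lines. All names here are hypothetical: `primary_search` simulates an API that always fails, `fallback_search` is an invented second source, and `reformulate` is a deliberately crude rewrite rule.

```python
class ToolError(Exception):
    """Raised by a tool when the underlying API call fails."""


def primary_search(query: str) -> list[str]:
    # Simulates an API outage for the purposes of this sketch.
    raise ToolError("primary API unavailable")


def fallback_search(query: str) -> list[str]:
    # Invented catalog standing in for a second data source.
    catalog = {"tokyo flights": ["flight A", "flight B"]}
    return catalog.get(query, [])


def reformulate(query: str) -> str:
    """Crude rewrite: lowercase the query and drop filler words."""
    filler = {"find", "cheapest", "please"}
    return " ".join(word for word in query.lower().split() if word not in filler)


def search_with_recovery(query: str, max_attempts: int = 3) -> tuple[list[str], list[str]]:
    log: list[str] = []
    for _ in range(max_attempts):
        try:
            results = primary_search(query)
        except ToolError as error:
            log.append(f"error: {error}; trying fallback")
            results = fallback_search(query)
        if results:
            return results, log
        log.append(f"no results for {query!r}; reformulating")
        query = reformulate(query)
    return [], log


results, log = search_with_recovery("Find cheapest Tokyo flights")
```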

What Loops Change

Three things are decidedly different thanks to loops.

  1. The software development model. Anthropic's 2024 article, "Building Effective Agents", draws a useful distinction: workflows are predetermined sequences where the developer defines the control flow; agents are open-ended loops where the model decides the control flow. This is a fundamental shift in who controls execution logic. You're no longer writing every branch. You're designing the environment and constraints within which the agent decides.
  2. The cost model. Loops are more expensive than chat. Anthropic's engineering team reports that agents consume roughly 4x as many tokens as chat interactions, and multi-agent systems about 15x as many, while token usage accounts for around 80% of performance variance. Multi-agent outperformed single-agent by 90.2% on Anthropic's internal evaluations, so the capability improvement is real, but you're paying for it. The question is whether the task complexity justifies the cost. This isn't academic when building on top of AI platforms. Uncontrolled loops are uncontrolled costs.
  3. The discipline of context engineering. Those of you who have been following "Human-in-the-Loop" for some time know well the shift from prompt engineering to context engineering. This is perhaps the biggest conceptual shift. On June 18, 2025, Tobi Lütke tweeted the canonical definition: context engineering is "the art of providing all the context for the task to be plausibly solvable by the LLM." One week later, Andrej Karpathy's endorsement amplified the concept: "context engineering is the delicate art and science of filling the context window with just the right information for the next step."

The emphasis on "next step" is the key. In a loop, the model's context window is reassembled with every iteration. What you feed it at each pass, including tool results, memory, prior actions, and retrieved documents, determines what it reasons about and what it does. Prompt engineering asks, "How do I phrase this?" Context engineering asks, "What information does the AI need, and in what form?" Most agent failures are now context failures rather than model failures.
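To make "what you feed it at each pass" concrete, here is a rough sketch of assembling the context for one iteration under a token budget. It rests on loud assumptions: `estimate_tokens` counts words rather than real tokens, the priority numbers are invented, and a production system would likely summarize material that does not fit rather than simply drop it.

```python
def estimate_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: roughly one token per word."""
    return len(text.split())


def build_context(task: str, items: list[tuple[int, str]], budget: int) -> str:
    """Assemble the context for one loop iteration.

    items holds (priority, text) pairs; a lower number means more important.
    Items are added in priority order and skipped if they would exceed budget.
    """
    parts = [task]
    used = estimate_tokens(task)
    for _, text in sorted(items, key=lambda item: item[0]):
        cost = estimate_tokens(text)
        if used + cost > budget:
            continue                # does not fit in this pass: leave it out
        parts.append(text)
        used += cost
    return "\n".join(parts)


items = [
    (1, "latest tool result: search returned 3 flights"),
    (3, "old chat history " * 20),
    (2, "memory: user prefers morning departures"),
]
context = build_context("Task: book a flight to Tokyo", items, budget=25)
```

In this sketch, the fresh tool result and the relevant memory make the cut, while the bulky old history does not.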

What Prominent Experts Are Saying

Lilian Weng (OpenAI): Her 2023 post "LLM-Powered Autonomous Agents" established the foundational framework that others are still building on. The three-component architecture of planning, memory, and tool use, running inside a loop, is now canonical. In her framing, the LLM functions as the agent's brain, complemented by planning (breaking down large tasks into manageable subgoals and self-reflection), memory (short-term in-context learning and long-term recall), and tool use.

Andrew Ng (DeepLearning.AI): He was the first major voice to argue loudly that the loop changes the performance calculus. The insight that agentic wrapping can outperform raw model capability became the basis for his widely followed Agentic AI course at DeepLearning.AI. His framework identifies three degrees of autonomy: low autonomy (developer-coded, predetermined steps), medium autonomy (LLM chooses which steps or tools to use), and high autonomy (the agent can design its own workflow, even writing new functions or tools).

Boris Cherny, the creator of Claude Code, is the source of the single most-quoted line in the current loop discourse. He delivered it at WorkOS Acquired Unplugged on June 2, 2026: "Now it's actually leveled up, I think, again, to the next wave of abstraction where I don't prompt Claude anymore. I have loops that are running. They're the ones that are prompting Claude and figuring out what to do. My job is to write loops."

That's not abstraction. That's a practitioner describing his actual daily workflow. The context matters. Six months earlier, his workflow had already shifted dramatically — he described it this way: "The way that I coded a year ago was I wrote code with some autocomplete in the IDE. At that point, I was running maybe five, ten Claudes in parallel, and my coding was prompting Claude to write code." That was already a radical departure.

Then it moved again: "In November, I uninstalled my IDE, 'cause, well, I wasn't using it. In the last month, I just haven't opened it, so I just uninstalled it."

He didn't stop there. In March 2026, he confirmed that Claude Code is now 100% written by Claude Code itself, a recursive milestone.

The critical nuance is what Cherny means by his job shifting to "writing loops." Cherny spent a significant part of 2025 writing the loop contracts, skills, and configurations that enable his current workflow. He wrote the CLAUDE.md files. He wrote the loop specifications. He ran five parallel Claude Code instances against separate git checkouts and observed which patterns produced reliable output and which produced garbage. He built the judgment infrastructure that the loops now execute against. That work is prompting. It's just prompting that happened upstream, before the loop ran, and that now compounds forward instead of evaporating at session end.

Andrej Karpathy (former OpenAI, Tesla, now Anthropic): During the Sequoia AI Ascent 2026 event, Karpathy posted his own cleaned-up transcript from his April 2026 fireside chat with Stephanie Zhan at Sequoia. His comments on loops are embedded in a broader argument about Software 3.0, but the loop concept is its structural spine.

On December 2025 as the inflection point: "I started to notice that with the latest models, the chunks just came out fine. Then I kept asking for more, and they still came out fine. I couldn't remember the last time I corrected it. I started trusting the system more and more. I do think it was a stark transition. A lot of people experienced AI last year as a ChatGPT-adjacent thing, but you really had to look again as of December, because things changed fundamentally, especially in this agentic, coherent workflow."

On what Software 3.0 actually means for loops: "Software 3.0 is about programming through prompting. What's in the context window is your lever over the interpreter, and the interpreter is the LLM. It interprets your context and performs computation in digital information space."

His OpenClaw installation example is the clearest concrete illustration of loop-based thinking replacing scripted code: "The OpenClaw installation was instead a block of text that you copy and paste into your agent. It is like a little skill: copy this, give it to your agent, and it will install OpenClaw. That is more powerful because you're working in the Software 3.0 paradigm. You don't have to spell out every detail. The agent has intelligence. It looks at your environment, performs intelligent actions, and debugs in the loop."

On the difference between vibe coding (the floor) and agentic engineering (the ceiling), which directly maps to loops as a professional discipline: "Agentic engineering is about preserving the quality bar of professional software. You are not allowed to introduce vulnerabilities because of vibe coding. You are still responsible for your software, just as before. But can you go faster? Spoiler: you can. The question is how to do that properly."

On what happens to human skill inside a loop-based world and why humans can't just step out: "You can outsource your thinking, but you can't outsource your understanding."

And the most pointed statement on what the loop still requires humans to do using the Stripe/Google email mismatch bug from his own app as the example: "People have to be in charge of the spec and plan... You are in charge of taste, engineering, design, and whether the system makes sense. You ask for the right things — for example, we tie everything to unique user IDs. The agents fill in the blanks."

Karpathy's June 2025 post on context engineering reframed the entire discipline. He and Shopify CEO Tobi Lütke together shifted the conversation from "how do you prompt" to "how do you manage what the model sees at each loop iteration." The broader arc he describes is that the AI development paradigm has moved through three stages: from Prompt Engineering to Context Engineering to what Martin Fowler's team is calling Harness Engineering. Perhaps we should stay tuned for “harness failures.”

Anthropic: Their "Building Effective Agents" article and subsequent engineering blog posts have been the most practically influential. The Anthropic team's clearest contribution is the agent/workflow distinction and the strong recommendation to start simple. Their key observation is that the most successful implementations often avoid complex frameworks or specialized libraries, favoring simple, composable patterns instead. They add that sometimes the right answer is not to build agentic systems at all, since these systems trade latency and cost for better task performance.

Their multi-agent research system, in which a lead agent plans and spawns subagents to explore different aspects simultaneously, has demonstrated the architectural principle at a production scale.

The Architecture of Agentic AI

The loop is neither a feature nor a pattern to be optionally adopted. It is the architecture of agentic AI. Everything that makes an AI agent meaningfully different from a chatbot, such as tool use, multi-step reasoning, self-correction, parallelism, and recovery from failure, depends on the loop.

The Agentic AI products you will encounter, and those you will build, manage, and support, run on loops. The cost structure is different from chat. The failure modes are different. The design discipline, context engineering rather than prompt engineering, is different. For the humans who need to deal with these products, understanding the loop at a conceptual level is table stakes for using Agentic AI effectively.

About the Author

Technologist, creator of compelling content, and senior "resultant" Howard M. Cohen has been in the information technology industry for more than four decades. He has held senior executive positions in many of the top channel partner organizations and he currently writes for and about IT and the IT channel.
