An AI agent is a language model wrapped in a loop that can call tools, read the results and decide what to do next, repeating until a goal is reached. Building one from scratch is far simpler than the hype suggests. At its core, an agent is a while loop around a model that supports tool calling: you hand the model a goal and a set of functions, it asks to run one, you run it and feed the result back, and the cycle repeats until the model is done. This guide takes you from the definition to a concrete, minimal agent you can run today, then through the levels of autonomy and what changes when you put an agent in production. We build the loop by hand first so you understand exactly what is happening, then point you at the SDKs that do this for you. Everything here is current as of June 2026.
What an AI agent actually is
An AI agent is software that uses a language model to decide and act in a loop, rather than just answering once. The model is the brain, but a brain with no hands cannot do anything, so you give it tools: functions it can call to read a file, query a database, search the web or hit an API. The agent runs a loop: the model receives the goal and the list of available tools, it either answers or asks to call a tool, your code runs that tool and returns the result, and the model uses that result to decide its next move. That loop is the whole idea. "Agentic AI" is the broader term for systems built this way; an "AI agent" is one such system. For the precise definitions, see the glossary entries on AI agent, agentic AI, tool calling and the agent harness.
- Model: the reasoning core that decides what to do (an LLM that supports tool calling).
- Tools: functions the model can call to act on the world, each with a name, a description and an input schema.
- Loop: model decides, your code runs the chosen tool, the result goes back, repeat until done.
- See the glossary: AI agent, agentic AI, tool calling, agent harness for the formal definitions.
The build loop, step by step
Every agent, from a ten-line script to Claude Code, runs the same loop. You send the model the conversation so far plus the tool definitions. The model replies in one of two ways: with a final answer (it is done), or with a request to call one or more tools. If it asks for a tool, your code executes that tool, captures the output, appends it to the conversation as a tool result, and sends everything back. The model reads the result and decides again. You keep looping until the model returns a final answer or you hit a safety limit on iterations. The two non-negotiable guardrails are a maximum number of turns, so a confused agent cannot loop forever, and validation of tool inputs, because the model is asking you to run real code with arguments it chose.
- Send the goal, conversation history and tool definitions to the model.
- If the model returns a final answer, stop and return it.
- If it requests a tool, validate the input, run the tool, append the result, and loop.
- Always cap the number of iterations and validate tool arguments before executing.
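The steps above can be sketched without any provider at all. The example below is a minimal illustration, not a real client: `scripted_model`, the `add` tool, the message shapes and the `"tool"` role are all hypothetical stand-ins chosen so the loop runs offline. It shows the two outcomes (final answer or tool request) and both guardrails (a turn cap and a check on the model's arguments).

```python
from typing import Any, Callable

def add(a: float, b: float) -> str:
    """Hypothetical example tool."""
    return str(a + b)

# Hypothetical registry: tool name -> (function, required argument names).
TOOLS: dict[str, tuple[Callable[..., str], set[str]]] = {
    "add": (add, {"a", "b"}),
}

def scripted_model(messages: list[dict[str, Any]]) -> dict[str, Any]:
    """Stand-in for a real model call: requests one tool, then answers."""
    last = messages[-1]
    if last["role"] == "tool":
        return {"type": "final", "text": f"The answer is {last['content']}."}
    return {"type": "tool_call", "name": "add", "input": {"a": 2, "b": 3}}

def run_tool(name: str, args: dict[str, Any]) -> str:
    """Validate the model's request before executing anything."""
    if name not in TOOLS:
        return f"error: unknown tool {name!r}"
    func, required = TOOLS[name]
    if set(args) != required:
        return f"error: expected arguments {sorted(required)}"
    return func(**args)

def run_agent(goal: str, model=scripted_model, max_turns: int = 5) -> str:
    messages: list[dict[str, Any]] = [{"role": "user", "content": goal}]
    for _ in range(max_turns):              # guardrail 1: cap the turns
        reply = model(messages)
        if reply["type"] == "final":        # the model is done
            return reply["text"]
        messages.append({"role": "assistant", "content": reply})
        result = run_tool(reply["name"], reply["input"])  # guardrail 2: validate
        messages.append({"role": "tool", "content": result})
    return "stopped: hit the turn limit"

print(run_agent("What is 2 + 3?"))
```

Validation errors are returned to the model as ordinary tool results rather than raised, so the model gets a chance to correct its request on the next turn. This sketch checks only argument names; a real agent would also check types and ranges.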
A minimal agent you can build
Here is the smallest agent that does something real: a model with one tool (a calculator) running the tool-calling loop by hand against the Anthropic Messages API. The pattern is identical for any provider that supports tool calling. The model gets the question and the tool definition; when it replies with stop_reason "tool_use", we run the tool, send back a tool_result, and loop until it gives a plain text answer. Read it once and the magic disappears: an agent is a loop, a model and a dictionary of functions.
```python
# pip install anthropic
# A minimal agent: one tool, the tool-calling loop by hand.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the env

# 1) Define the tools: a name, a description, and an input schema.
tools = [
    {
        "name": "calculator",
        "description": "Evaluate a basic arithmetic expression.",
        "input_schema": {
            "type": "object",
            "properties": {"expression": {"type": "string"}},
            "required": ["expression"],
        },
    }
]

# 2) Map tool names to the real functions that run them.
def calculator(expression: str) -> str:
    # Real code: validate hard. A toy eval is fine only for a demo.
    allowed = set("0123456789+-*/(). ")
    if not set(expression) <= allowed:
        return "error: invalid characters"
    return str(eval(expression))  # demo only; never eval untrusted input in prod

TOOLS = {"calculator": calculator}

# 3) The loop.
def run_agent(goal: str, max_turns: int = 8) -> str:
    messages = [{"role": "user", "content": goal}]
    for _ in range(max_turns):
        resp = client.messages.create(
            model="claude-sonnet-4-5",
            max_tokens=1024,
            tools=tools,
            messages=messages,
        )
        if resp.stop_reason != "tool_use":
            return "".join(b.text for b in resp.content if b.type == "text")
        messages.append({"role": "assistant", "content": resp.content})
        results = []
        for block in resp.content:
            if block.type == "tool_use":
                out = TOOLS[block.name](**block.input)
                results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": out,
                })
        messages.append({"role": "user", "content": results})
    return "stopped: hit the turn limit"

print(run_agent("What is 4321 * 1234, then add 99?"))
```
That is genuinely all an agent is. To make it useful you add more tools (read a file, call your API, query a database), give each a precise description so the model knows when to use it, and harden the execution path. The eval in the calculator is for the demo only; never run model-chosen code or expressions without strict validation or a sandbox.
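One way to harden the calculator is to drop eval entirely and walk the parsed expression, allowing only numbers and the four basic operators. The sketch below uses Python's standard `ast` module. The name `safe_calculator` and the 200-character limit are illustrative choices, not part of the original demo. Note that the character filter in the demo still lets `**` through, so a request like `9**9**9**9` could stall the process; the version here rejects exponentiation.

```python
import ast
import operator

# Only these operators are allowed; everything else is rejected.
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}

def _evaluate(node: ast.AST) -> float:
    """Recursively evaluate a parsed expression, refusing unknown node types."""
    if isinstance(node, ast.Expression):
        return _evaluate(node.body)
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
        return _BINARY[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
        return _UNARY[type(node.op)](_evaluate(node.operand))
    raise ValueError("unsupported expression")

def safe_calculator(expression: str, max_length: int = 200) -> str:
    """Evaluate +, -, *, / on numbers; return an error string otherwise."""
    if len(expression) > max_length:
        return "error: expression too long"
    try:
        tree = ast.parse(expression.strip(), mode="eval")
        return str(_evaluate(tree))
    except (SyntaxError, ValueError, ZeroDivisionError,
            RecursionError, OverflowError) as exc:
        return f"error: {exc}"

print(safe_calculator("4321 * 1234 + 99"))
```

Because the function has the same signature and return type as the demo's `calculator`, it could be registered under the same tool name without changing the loop.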
Use a framework once you understand the loop
Building the loop by hand once is the best way to understand agents, but in production you reach for a framework that handles the loop, retries, streaming, sessions and permissions for you. In 2026 the two most direct paths are the Claude Agent SDK and the OpenAI Agents SDK. The Claude Agent SDK exposes the same agent loop, tool set and context management that power Claude Code (install @anthropic-ai/claude-agent-sdk for TypeScript or claude-agent-sdk for Python). The OpenAI Agents SDK is a lightweight Python and TypeScript framework that turns any function into a tool with automatic schema generation (pip install openai-agents). Both give you tool calling, multi-step loops, human-in-the-loop checkpoints, subagents and first-class MCP support out of the box. The principle is the same one you just built; the SDK removes the plumbing.
- Claude Agent SDK: the same loop and tools that run Claude Code, programmable in Python and TypeScript, with built-in MCP and subagents.
- OpenAI Agents SDK: a lightweight multi-agent framework that turns any function into a validated tool (pip install openai-agents).
- Both handle the loop, retries, streaming, sessions and permissions you would otherwise write by hand.
- Connect external tools through MCP rather than bespoke glue; see What Is an MCP Server.
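To make "automatic schema generation" less mysterious, here is a rough sketch of the idea using only the standard library. It is not how either SDK implements it. The helper `tool_schema`, the type mapping and the `get_weather` function are hypothetical, and only four primitive types are handled. The output follows the name, description and input_schema shape used in the hand-built agent above.

```python
import inspect
from typing import Any, Callable, get_type_hints

# Assumed mapping from Python annotations to JSON Schema type names.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}

def tool_schema(func: Callable[..., Any]) -> dict[str, Any]:
    """Build a tool definition from a function's signature and docstring."""
    hints = get_type_hints(func)
    properties: dict[str, dict[str, str]] = {}
    required: list[str] = []
    for name, param in inspect.signature(func).parameters.items():
        json_type = _JSON_TYPES.get(hints.get(name))
        if json_type is None:
            raise TypeError(f"unsupported or missing type hint for {name!r}")
        properties[name] = {"type": json_type}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # no default value means the model must supply it
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "input_schema": {
            "type": "object",
            "properties": properties,
            "required": required,
        },
    }

def get_weather(city: str, days: int = 1) -> str:
    """Return a short forecast for a city (hypothetical example tool)."""
    return f"{days}-day forecast for {city}: sunny"

schema = tool_schema(get_weather)
print(schema)
```

The docstring becomes the tool description, which is one reason to write it for the model rather than for a human maintainer.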
The levels of autonomy
Not every agent should be fully autonomous, and choosing the right level is a design decision, not a default. Think of a ladder. At the bottom the model only suggests and a human does everything. One rung up it drafts and a human approves each action. Higher, it acts autonomously on low-risk steps but pauses for approval on anything sensitive (a human-in-the-loop checkpoint). At the top it runs an entire workflow unattended. The right level depends on the cost of a mistake: the more an error hurts, the more human oversight you keep. Most reliable production agents sit in the middle, fully autonomous on safe, reversible actions and gated on the rest. The Automation and Agentic Systems course covers this as the five levels of LLM autonomy.
- Suggest only: the agent proposes, a human does everything. Lowest risk, lowest leverage.
- Draft and approve: the agent prepares the action, a human confirms before it runs.
- Autonomous with checkpoints: it acts on safe steps and pauses for approval on risky ones.
- Fully unattended: it runs the whole workflow alone; reserve this for low-stakes, reversible tasks.
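The ladder can be expressed as a small policy function that the loop consults before running any tool. This is a sketch under stated assumptions: the `Autonomy` enum, the `decide` function and its three return values are illustrative names, and "risky" is reduced to a single boolean that your own code would have to compute per action.

```python
from enum import IntEnum

class Autonomy(IntEnum):
    SUGGEST_ONLY = 0
    DRAFT_AND_APPROVE = 1
    AUTONOMOUS_WITH_CHECKPOINTS = 2
    FULLY_UNATTENDED = 3

def decide(level: Autonomy, action_is_risky: bool) -> str:
    """Return 'suggest', 'ask_human' or 'run' for a proposed action."""
    if level == Autonomy.SUGGEST_ONLY:
        return "suggest"            # the human performs the action themselves
    if level == Autonomy.DRAFT_AND_APPROVE:
        return "ask_human"          # every action waits for confirmation
    if level == Autonomy.AUTONOMOUS_WITH_CHECKPOINTS:
        return "ask_human" if action_is_risky else "run"
    return "run"                    # fully unattended

print(decide(Autonomy.AUTONOMOUS_WITH_CHECKPOINTS, action_is_risky=True))
```

Keeping the policy in one function means changing the autonomy level is a configuration change rather than an edit to the loop.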
Productionizing your agent
A demo agent and a production agent differ in everything around the loop. The model and the tools are the easy part; reliability is the work. Validate every tool input, because the model is choosing the arguments. Run anything that executes code or touches the outside world in a sandbox with timeouts and resource limits, never on a machine you care about. Log every step (the goal, each tool call, each result) so you can see what the agent did and debug it when it goes sideways. Cap iterations and cost so a confused agent cannot loop forever or run up a bill. And keep a human in the loop for irreversible or sensitive actions. These are the same lessons learned the hard way in the founder builds: CallAssistant gave its voice agent tightly defined tools because there is no "are you sure?" on a phone call, and CodeCourier ran untrusted code only inside a disposable sandbox.
- Validate tool inputs and run code-executing tools in a sandbox with timeouts and limits.
- Log the goal, every tool call and every result so the agent is observable and debuggable.
- Cap iterations and spend so a runaway loop cannot cost you time or money.
- Gate irreversible or sensitive actions behind a human-in-the-loop approval step.
- Learn from real builds: CallAssistant (tight tools) and CodeCourier (sandboxing) on the Builds page.
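Two of these items, capping spend and logging every step, need very little code. The sketch below is one possible shape, not a prescribed design: `Budget`, `BudgetExceeded` and `log_step` are hypothetical names, and the per-turn cost is assumed to be something your code computes from the provider's reported token usage.

```python
import json
import logging
from dataclasses import dataclass

logger = logging.getLogger("agent")

class BudgetExceeded(RuntimeError):
    """Raised when the agent runs past its turn or spend limit."""

@dataclass
class Budget:
    max_turns: int
    max_cost_usd: float
    turns: int = 0
    cost_usd: float = 0.0

    def charge(self, cost_usd: float) -> None:
        """Record one model call and stop the run if a limit is crossed."""
        self.turns += 1
        self.cost_usd += cost_usd
        if self.turns > self.max_turns:
            raise BudgetExceeded("turn limit reached")
        if self.cost_usd > self.max_cost_usd:
            raise BudgetExceeded("spend limit reached")

def log_step(kind: str, **fields) -> str:
    """Emit one structured log line per step (goal, tool call, tool result)."""
    line = json.dumps({"kind": kind, **fields}, sort_keys=True, default=str)
    logger.info(line)
    return line

budget = Budget(max_turns=8, max_cost_usd=0.50)
budget.charge(0.02)  # call once per model request inside the loop
print(log_step("tool_call", name="calculator", input={"expression": "2+2"}))
```

One JSON object per line keeps the log easy to search and replay when you need to reconstruct what the agent did.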
Step by step
Pick a tool-calling model
Choose a model that supports tool calling (for example a Claude or GPT model) and get an API key. The agent loop is identical across providers that support tools.
Define your tools
For each action the agent needs, write a function and a tool definition with a name, a clear description and an input schema. The description is what the model reads to decide when to call it.
Write the loop
Send the goal, conversation and tool definitions to the model. If it returns a final answer, stop. If it requests a tool, validate the input, run the tool, append the result, and send everything back.
Add guardrails
Cap the number of iterations, validate every tool argument, and run any code-executing tool in a sandbox with timeouts. Log each step so you can see what the agent did.
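As a small illustration of the timeout guardrail, the sketch below runs a code snippet in a separate Python process and kills it if it runs too long. This is not a sandbox: the child process still has the same file and network access as the parent, so a real deployment would put this inside a container or disposable VM. The function name `run_python_with_timeout` and the default limit are illustrative.

```python
import subprocess
import sys

def run_python_with_timeout(code: str, timeout_s: float = 2.0) -> str:
    """Run code in a child interpreter; return its output or an error string."""
    try:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", code],  # -I: isolated mode, ignores env and user site
            capture_output=True,
            text=True,
            timeout=timeout_s,                   # the child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return f"error: timed out after {timeout_s} s"
    if proc.returncode != 0:
        # Return only the tail of stderr so a huge traceback cannot flood the context.
        return f"error: exit code {proc.returncode}: {proc.stderr.strip()[-500:]}"
    return proc.stdout

print(run_python_with_timeout("print(1 + 1)"))
```

Returning errors as strings keeps the tool's contract simple: the loop always gets text it can hand back to the model as a tool result.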
Choose an autonomy level
Decide which actions the agent may take unattended and which need human approval, based on the cost of a mistake. Gate irreversible or sensitive actions behind a checkpoint.
Move to an SDK for production
Once the loop is clear, adopt the Claude Agent SDK or the OpenAI Agents SDK to get retries, streaming, sessions, permissions and MCP support without writing the plumbing yourself.
