An AI agent is a language model wrapped in a loop that can call tools, read the results and decide what to do next, repeating until it reaches a goal. The model is the brain, the tools are its hands, and the loop is what makes it act rather than just answer once.

How do I build an AI agent from scratch?

Pick a model that supports tool calling, define your tools as functions with a name, description and input schema, then write a loop: send the goal and tools to the model, run any tool it requests, feed the result back, and repeat until it returns a final answer. Add a turn cap and input validation as guardrails.

Do I need a framework like LangChain to build an agent?

No. An agent is a loop around a tool-calling model, and you can build a working one in a few dozen lines. Frameworks like the Claude Agent SDK or the OpenAI Agents SDK are worth adopting once you understand the loop, because they handle retries, streaming, sessions, permissions and MCP for you, not because the core idea is hard.

What is the difference between an AI agent and a chatbot?

A chatbot answers a message and stops. An agent runs a loop: it can call tools to act on the world, read the results, and take multiple steps toward a goal before responding. The presence of tools and a decision loop is what makes something an agent rather than a chatbot.

How do I make an AI agent safe to run in production?

Validate every tool input because the model chooses the arguments, run any code-executing tool in a sandbox with timeouts and resource limits, cap iterations and spend, log every step so the agent is observable, and keep a human in the loop for irreversible or sensitive actions.

How many tools should an AI agent have?

As few as the task needs. Each tool adds to the context and to the chance the model picks the wrong one, so a handful of sharp, well-described tools beats a large pile. Give each tool a precise description, because that description is how the model decides when to use it.

How to Build an AI Agent from Scratch (2026)

What an AI agent actually is

An AI agent is software that uses a language model to decide and act in a loop, rather than just answering once. The model is the brain, but a brain with no hands cannot do anything, so you give it tools: functions it can call to read a file, query a database, search the web or hit an API. The agent runs a loop: the model receives the goal and the list of available tools, it either answers or asks to call a tool, your code runs that tool and returns the result, and the model uses that result to decide its next move. That loop is the whole idea. "Agentic AI" is the broader term for systems built this way; an "AI agent" is one such system. For the precise definitions, see the glossary entries on AI agent, agentic AI, tool calling and the agent harness.

Model: the reasoning core that decides what to do (an LLM that supports tool calling).
Tools: functions the model can call to act on the world, each with a name, a description and an input schema.
Loop: model decides, your code runs the chosen tool, the result goes back, repeat until done.
See the glossary: AI agent, agentic AI, tool calling, agent harness for the formal definitions.
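Here is a minimal sketch of how the three parts fit together in code. Each tool bundles what the model sees (a name, a description and an input schema) with the function your code actually runs. The `Tool` class and the `word_count` tool are hypothetical and exist only for illustration. The definition dict follows the same shape as the `tools` list in the Anthropic example in this guide.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class Tool:
    """One tool: what the model sees (name, description, schema) plus the code that runs."""
    name: str
    description: str
    input_schema: Dict[str, Any]
    fn: Callable[..., str]

    def definition(self) -> Dict[str, Any]:
        # Only this part is sent to the model; the Python function stays in your process.
        return {
            "name": self.name,
            "description": self.description,
            "input_schema": self.input_schema,
        }


def word_count(text: str) -> str:
    # Hypothetical example tool: count whitespace-separated words.
    return str(len(text.split()))


TOOLBOX = [
    Tool(
        name="word_count",
        description="Count the words in a piece of text.",
        input_schema={
            "type": "object",
            "properties": {"text": {"type": "string"}},
            "required": ["text"],
        },
        fn=word_count,
    )
]

definitions = [t.definition() for t in TOOLBOX]  # what the model receives
registry = {t.name: t.fn for t in TOOLBOX}       # what the loop dispatches to

print(definitions[0]["name"])                               # word_count
print(registry["word_count"](text="hands for the brain"))   # 4
```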

The build loop, step by step

Every agent, from a ten-line script to Claude Code, runs the same loop. You send the model the conversation so far plus the tool definitions. The model replies in one of two ways: with a final answer (it is done), or with a request to call one or more tools. If it asks for a tool, your code executes that tool, captures the output, appends it to the conversation as a tool result, and sends everything back. The model reads the result and decides again. You keep looping until the model returns a final answer or you hit a safety limit on iterations. The two non-negotiable guardrails are a maximum number of turns, so a confused agent cannot loop forever, and validation of tool inputs, because the model is asking you to run real code with arguments it chose.

Send the goal, conversation history and tool definitions to the model.
If the model returns a final answer, stop and return it.
If it requests a tool, validate the input, run the tool, append the result, and loop.
Always cap the number of iterations and validate tool arguments before executing.
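You can sketch the four steps above without any provider SDK. This is a minimal offline sketch, and several pieces are hypothetical stand-ins: `run_loop`, the reply format (`"final"` or `"tool_calls"`) and `scripted_model`. The scripted model replays fixed replies, so the loop runs without an API key. To use it with a real tool-calling model, translate that model's replies into this format.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hypothetical provider-neutral reply shape (not any vendor's real API):
#   {"type": "final", "text": "..."}
#   {"type": "tool_calls", "calls": [{"id": "...", "name": "...", "input": {...}}]}
Message = Dict[str, Any]
ToolEntry = Tuple[Dict[str, Any], Callable[..., str]]  # (input_schema, function)


def validate(schema: Dict[str, Any], args: Any) -> Optional[str]:
    """Minimal check: args must be an object containing every required key."""
    if not isinstance(args, dict):
        return "arguments must be an object"
    missing = [k for k in schema.get("required", []) if k not in args]
    return f"missing arguments: {missing}" if missing else None


def run_loop(model: Callable[[List[Message], List[Message]], Message],
             tools: Dict[str, ToolEntry], goal: str, max_turns: int = 8) -> str:
    tool_defs = [{"name": n, "input_schema": s} for n, (s, _) in tools.items()]
    messages: List[Message] = [{"role": "user", "content": goal}]
    for _ in range(max_turns):                  # guardrail 1: turn cap
        reply = model(messages, tool_defs)
        if reply["type"] == "final":
            return reply["text"]                # the model is done
        messages.append({"role": "assistant", "content": reply})
        for call in reply["calls"]:
            entry = tools.get(call["name"])
            if entry is None:
                out = f"error: unknown tool {call['name']}"
            else:
                schema, fn = entry
                err = validate(schema, call["input"])  # guardrail 2: validate first
                out = f"error: {err}" if err else fn(**call["input"])
            messages.append({"role": "tool", "id": call["id"], "content": out})
    return "stopped: hit the turn limit"


def scripted_model(messages: List[Message], tool_defs: List[Message]) -> Message:
    # Offline stand-in for a real LLM: request the tool once, then answer
    # using the tool result that was appended to the conversation.
    last = messages[-1]
    if last["role"] == "tool":
        return {"type": "final", "text": f"The sum is {last['content']}."}
    return {"type": "tool_calls",
            "calls": [{"id": "call_1", "name": "add", "input": {"a": 2, "b": 40}}]}


ADD_SCHEMA = {"type": "object",
              "properties": {"a": {"type": "number"}, "b": {"type": "number"}},
              "required": ["a", "b"]}
TOOLS = {"add": (ADD_SCHEMA, lambda a, b: str(a + b))}

print(run_loop(scripted_model, TOOLS, "What is 2 + 40?"))  # The sum is 42.
```

The validation here only checks for required keys. A real agent should also check types and value ranges, or hand the arguments to a JSON Schema validator.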

A minimal agent you can build

Here is the smallest agent that does something real: a model with one tool (a calculator) running the tool-calling loop by hand against the Anthropic Messages API. The same pattern carries over to any provider that supports tool calling; only the request and response field names differ. The model gets the question and the tool definition. When it replies with stop_reason "tool_use", we run the tool, send back a tool_result block, and loop until it gives a plain text answer. Read it once and the magic disappears: an agent is a loop, a model and a dictionary of functions.

```python
# pip install anthropic
# A minimal agent: one tool, the tool-calling loop by hand.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the env

# 1) Define the tools: a name, a description, and an input schema.
tools = [
    {
        "name": "calculator",
        "description": "Evaluate a basic arithmetic expression (+ - * / and parentheses).",
        "input_schema": {
            "type": "object",
            "properties": {"expression": {"type": "string"}},
            "required": ["expression"],
        },
    }
]

# 2) Map tool names to the real functions that run them.
def calculator(expression: str) -> str:
    # Real code: validate hard. A toy eval is fine only for a demo.
    allowed = set("0123456789+-*/(). ")
    if not set(expression) <= allowed or "**" in expression:
        return "error: invalid expression"  # "**" could request a huge power
    try:
        # demo only; never eval untrusted input in prod
        return str(eval(expression, {"__builtins__": {}}, {}))
    except Exception as exc:  # bad syntax, division by zero, etc.
        return f"error: {exc}"  # send the error back so the model can retry

TOOLS = {"calculator": calculator}

# 3) The loop.
def run_agent(goal: str, max_turns: int = 8) -> str:
    messages = [{"role": "user", "content": goal}]
    for _ in range(max_turns):
        resp = client.messages.create(
            model="claude-sonnet-4-5",
            max_tokens=1024,
            tools=tools,
            messages=messages,
        )
        if resp.stop_reason != "tool_use":
            # No tool requested: the text blocks are the final answer.
            return "".join(b.text for b in resp.content if b.type == "text")
        # Keep the assistant turn (with its tool_use blocks) in the history.
        messages.append({"role": "assistant", "content": resp.content})
        results = []
        for block in resp.content:
            if block.type == "tool_use":
                fn = TOOLS.get(block.name)
                try:
                    out = fn(**block.input) if fn else f"error: unknown tool {block.name}"
                except TypeError as exc:  # wrong or missing arguments
                    out = f"error: {exc}"
                results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": out,
                })
        # All tool results go back together in one user turn.
        messages.append({"role": "user", "content": results})
    return "stopped: hit the turn limit"

print(run_agent("What is 4321 * 1234, then add 99?"))
```

A complete minimal agent in Python: one tool, the model-plus-tool-calling loop by hand. The same shape works with any tool-calling model.

That is genuinely all an agent is. To make it useful you add more tools (read a file, call your API, query a database), give each a precise description so the model knows when to use it, and harden the execution path. The eval in the calculator is for the demo only; never run model-chosen code or expressions without strict validation or a sandbox.
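One way to harden the calculator from the example above without using eval is to parse the expression with Python's standard `ast` module and evaluate only a whitelist of node types. This sketch assumes that approach and works as a drop-in replacement for the demo's `calculator` function. It accepts numbers, + - * / and unary signs, and rejects everything else, including `**`, names and function calls. It also caps the input length and returns errors as strings, so the model can read them and retry.

```python
import ast
import operator

# Whitelisted operators: any other operator in the expression is rejected.
_BINOPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARYOPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def _eval(node: ast.AST) -> float:
    if isinstance(node, ast.Expression):
        return _eval(node.body)
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BINOPS:
        return _BINOPS[type(node.op)](_eval(node.left), _eval(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARYOPS:
        return _UNARYOPS[type(node.op)](_eval(node.operand))
    raise ValueError(f"unsupported syntax: {type(node).__name__}")


def calculator(expression: str) -> str:
    """Evaluate + - * / arithmetic without eval(); errors come back as text for the model."""
    if len(expression) > 200:  # keeps parsing and recursion depth bounded
        return "error: expression too long"
    try:
        return str(_eval(ast.parse(expression, mode="eval")))
    except (SyntaxError, ValueError, ZeroDivisionError) as exc:
        return f"error: {exc}"


print(calculator("4321 * 1234 + 99"))    # 5332213
print(calculator("__import__('os')"))    # error: unsupported syntax: Call
```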

Use a framework once you understand the loop

Building the loop by hand once is the best way to understand agents, but in production you reach for a framework that handles the loop, retries, streaming, sessions and permissions for you. In 2026 there are two direct paths. The Claude Agent SDK exposes the same agent loop, tool set and context management that power Claude Code; install @anthropic-ai/claude-agent-sdk for TypeScript or claude-agent-sdk for Python. The OpenAI Agents SDK is a lightweight Python and TypeScript framework that turns any function into a tool with automatic schema generation (pip install openai-agents). Both give you tool calling, multi-step loops, human-in-the-loop checkpoints, delegation to other agents (subagents in Claude's SDK, handoffs in OpenAI's) and first-class MCP support out of the box. The principle is the same one you just built; the SDK just removes the plumbing.

Claude Agent SDK: the same loop and tools that run Claude Code, programmable in Python and TypeScript, with built-in MCP and subagents.
OpenAI Agents SDK: a lightweight multi-agent framework that turns any function into a validated tool (pip install openai-agents).
Both handle the loop, retries, streaming, sessions and permissions you would otherwise write by hand.
Connect external tools through MCP rather than bespoke glue; see What Is an MCP Server.
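The automatic schema generation that the OpenAI Agents SDK advertises comes down to reading a function's signature, type hints and docstring. The sketch below shows that idea using only the standard library (`inspect` and `typing.get_type_hints`). `tool_from_function` and `get_weather` are hypothetical names. This is not how either SDK is actually implemented; the real frameworks also handle nested types, per-parameter descriptions and argument validation.

```python
import inspect
from typing import Any, Callable, Dict, List, get_type_hints

# Rough Python-type -> JSON Schema type mapping (an assumption for this sketch);
# anything unrecognized falls back to "string".
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def tool_from_function(fn: Callable[..., Any]) -> Dict[str, Any]:
    """Build a tool definition (name, description, input_schema) from a typed function."""
    hints = get_type_hints(fn)
    props: Dict[str, Any] = {}
    required: List[str] = []
    for name, param in inspect.signature(fn).parameters.items():
        props[name] = {"type": _JSON_TYPES.get(hints.get(name), "string")}
        if param.default is inspect.Parameter.empty:  # no default means required
            required.append(name)
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "input_schema": {"type": "object", "properties": props, "required": required},
    }


def get_weather(city: str, days: int = 1) -> str:
    """Return a short weather forecast for a city."""
    return f"{city}: sunny for {days} day(s)"  # hypothetical stub


print(tool_from_function(get_weather))
```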

The levels of autonomy

Not every agent should be fully autonomous, and choosing the right level is a design decision, not a default. Think of a ladder. At the bottom the model only suggests and a human does everything. One rung up it drafts and a human approves each action. Higher, it acts autonomously on low-risk steps but pauses for approval on anything sensitive (a human-in-the-loop checkpoint). At the top it runs an entire workflow unattended. The right level depends on the cost of a mistake: the more an error hurts, the more human oversight you keep. Most reliable production agents sit in the middle, fully autonomous on safe, reversible actions and gated on the rest. The Automation and Agentic Systems course covers the same idea on a finer scale, as the five levels of LLM autonomy.

Suggest only: the agent proposes, a human does everything. Lowest risk, lowest leverage.
Draft and approve: the agent prepares the action, a human confirms before it runs.
Autonomous with checkpoints: it acts on safe steps and pauses for approval on risky ones.
Fully unattended: it runs the whole workflow alone; reserve this for low-stakes, reversible tasks.
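Here is a sketch of the ladder as a policy check that runs before every tool call, assuming you label each tool's risk yourself. `Autonomy`, `RISKY`, `decide` and `execute_with_gate` are hypothetical names, and the four levels follow the list above. The `approve` callback stands in for whatever approval channel you use, such as a CLI prompt or a review queue.

```python
from enum import IntEnum
from typing import Callable


class Autonomy(IntEnum):
    SUGGEST_ONLY = 1       # agent proposes, a human does everything
    DRAFT_AND_APPROVE = 2  # agent prepares, a human confirms each action
    CHECKPOINTS = 3        # safe actions run, risky ones pause for approval
    UNATTENDED = 4         # everything runs; low-stakes, reversible work only


# Hypothetical risk labels: you decide these per tool, based on the cost of a mistake.
RISKY = {"send_email", "delete_file", "issue_refund"}


def decide(action: str, level: Autonomy) -> str:
    """Return 'propose', 'ask' or 'run' for a requested tool call."""
    if level == Autonomy.SUGGEST_ONLY:
        return "propose"
    if level == Autonomy.DRAFT_AND_APPROVE:
        return "ask"
    if level == Autonomy.CHECKPOINTS:
        return "ask" if action in RISKY else "run"
    return "run"


def execute_with_gate(action: str, level: Autonomy,
                      approve: Callable[[str], bool]) -> str:
    verdict = decide(action, level)
    if verdict == "propose":
        return f"suggested: {action}"          # nothing executes
    if verdict == "ask" and not approve(action):
        return f"blocked: {action} was not approved"
    return f"ran: {action}"                    # a real agent would call the tool here


print(execute_with_gate("read_file", Autonomy.CHECKPOINTS, lambda a: False))     # ran: read_file
print(execute_with_gate("issue_refund", Autonomy.CHECKPOINTS, lambda a: False))  # blocked: issue_refund was not approved
```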

Productionizing your agent

A demo agent and a production agent differ in everything around the loop. The model and the tools are the easy part; reliability is the work. Validate every tool input, because the model is choosing the arguments. Run anything that executes code or touches the outside world in a sandbox with timeouts and resource limits, never on a machine you care about. Log every step (the goal, each tool call, each result) so you can see what the agent did and debug it when it goes sideways. Cap iterations and cost so a confused agent cannot loop forever or run up a bill. And keep a human in the loop for irreversible or sensitive actions. The founder builds learned these lessons the hard way: CallAssistant gave its voice agent tightly defined tools because a phone call has no "are you sure?" step, and CodeCourier ran untrusted code only inside a disposable sandbox.

Validate tool inputs and run code-executing tools in a sandbox with timeouts and limits.
Log the goal, every tool call and every result so the agent is observable and debuggable.
Cap iterations and spend so a runaway loop cannot cost you time or money.
Gate irreversible or sensitive actions behind a human-in-the-loop approval step.
Learn from real builds: CallAssistant (tight tools) and CodeCourier (sandboxing) on the Builds page.
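For a tool that runs model-written code, the minimum is a separate process with a timeout, a resource limit and a log line for every call. The sketch below uses only the standard library (`subprocess`, `resource`, `logging`). `run_python_tool` is a hypothetical name, and the CPU limit applies on POSIX systems only. A child process is not a real sandbox: code you do not trust still belongs in a disposable sandbox such as a container or VM.

```python
import logging
import subprocess
import sys

try:
    import resource  # POSIX only; not available on Windows
except ImportError:
    resource = None

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("agent")


def _limit_child() -> None:
    # Runs in the child process before Python starts: cap CPU time at 2 seconds.
    resource.setrlimit(resource.RLIMIT_CPU, (2, 2))


def run_python_tool(code: str, timeout: float = 5.0) -> str:
    """Run model-written Python in a separate process with a timeout and a CPU cap.

    A child process is NOT a full sandbox: it can still read files and use the
    network. Untrusted code belongs in a disposable container or VM.
    """
    log.info("tool call run_python_tool: %r", code[:200])
    try:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", code],  # -I: isolated mode (ignores PYTHON* env vars, user site-packages)
            capture_output=True,
            text=True,
            timeout=timeout,                      # wall-clock limit; the child is killed on expiry
            preexec_fn=_limit_child if resource else None,
        )
    except subprocess.TimeoutExpired:
        log.warning("tool timed out after %ss", timeout)
        return f"error: timed out after {timeout}s"
    out = (proc.stdout + proc.stderr)[:2000]      # cap what goes back into the context
    log.info("tool result exit=%s: %r", proc.returncode, out[:200])
    return out if proc.returncode == 0 else f"error (exit {proc.returncode}): {out}"


print(run_python_tool("print(6 * 7)"))
```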

Step by step

Pick a tool-calling model
Choose a model that supports tool calling (for example a Claude or GPT model) and get an API key. The loop has the same shape with any provider that supports tools; only the API field names differ.
Define your tools
For each action the agent needs, write a function and a tool definition with a name, a clear description and an input schema. The description is what the model reads to decide when to call it.
Write the loop
Send the goal, conversation and tool definitions to the model. If it returns a final answer, stop. If it requests a tool, validate the input, run the tool, append the result, and send everything back.
Add guardrails
Cap the number of iterations, validate every tool argument, and run any code-executing tool in a sandbox with timeouts. Log each step so you can see what the agent did.
Choose an autonomy level
Decide which actions the agent may take unattended and which need human approval, based on the cost of a mistake. Gate irreversible or sensitive actions behind a checkpoint.
Move to an SDK for production
Once the loop is clear, adopt the Claude Agent SDK or the OpenAI Agents SDK to get retries, streaming, sessions, permissions and MCP support without writing the plumbing yourself.
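The iteration cap from the "Add guardrails" step can be paired with a spend cap and a step log in a small object that the loop checks on every turn. This is a minimal sketch. `Budget` is a hypothetical helper, and its per-token prices are placeholders, not any provider's actual rates. After each model call, pass in the token counts the API reports; the Anthropic Messages API returns them as `resp.usage.input_tokens` and `resp.usage.output_tokens`. Stop the loop once `exhausted()` returns a reason.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Budget:
    """Tracks turns and spend for one agent run; prices are per million tokens."""
    max_turns: int = 8
    max_usd: float = 0.50
    usd_per_m_input: float = 3.0    # placeholder price, not a real quote
    usd_per_m_output: float = 15.0  # placeholder price, not a real quote
    turns: int = 0
    spent_usd: float = 0.0
    steps: List[Tuple[int, str, float]] = field(default_factory=list)

    def charge(self, input_tokens: int, output_tokens: int, note: str) -> None:
        """Record one model turn: add its cost and append a log entry."""
        self.turns += 1
        self.spent_usd += (input_tokens * self.usd_per_m_input
                           + output_tokens * self.usd_per_m_output) / 1_000_000
        self.steps.append((self.turns, note, round(self.spent_usd, 6)))

    def exhausted(self) -> Optional[str]:
        """Return why the run must stop, or None if it may continue."""
        if self.turns >= self.max_turns:
            return "turn limit"
        if self.spent_usd >= self.max_usd:
            return "spend limit"
        return None


budget = Budget(max_turns=8, max_usd=0.01)
budget.charge(1000, 200, "turn 1: called calculator")
print(budget.exhausted())   # None
budget.charge(1000, 200, "turn 2: called calculator")
print(budget.exhausted())   # spend limit
print(budget.steps)
```

With the placeholder prices, 1,000 input tokens and 200 output tokens cost $0.006 per turn, so a $0.01 cap stops the run after the second turn.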

Frequently asked questions

Keep learning

Guide

What Is an MCP Server? (and How to Build One)

An MCP server exposes tools, resources and prompts to AI agents over one open protocol. Here is how it works, a minimal server you can build, and common servers.

Guide

What Is Agentic Engineering? The 2026 Pillar Guide

Agentic engineering is building software by directing AI coding agents that plan, edit and run code. What it is, how it differs, and how to learn it in 2026.

Guide

AI Automation for Business: A Practical Playbook

A practical playbook for AI automation in business: where to start, build vs buy, n8n vs Zapier vs Make, real workflow examples, and measuring ROI honestly.

Term

AI Agent

AI agent meaning, in plain English: a system that uses a language model to decide and act in a loop toward a goal, calling tools along the way. Clear definition, examples, and how AI agents differ from a chatbot.

Term

Agentic AI

Agentic AI is AI that acts autonomously toward a goal, planning and using tools across many steps. Definition, how it differs from generative AI, and real examples.

Term

Tool Calling

Tool calling (aka function calling) is when an AI model or agent outputs structured JSON to ask your code to run a function, so it can act and use tools instead of only replying with text. Definition, how it works, and how it relates to tool chaining and MCP.

Term

Agent Harness

An agent harness is the scaffolding around an AI model that runs the loop, manages context, dispatches tool calls and enforces safety so the model can act.

Lesson

Building Your Own AI Tools with APIs

Build custom AI tools on top of model APIs, including image-to-structured-data workflows, instead of buying SaaS

Lesson

The 5 Levels of LLM Autonomy

Place any agentic system on a five-level autonomy scale, see why validation is the real blocker, and climb a level safely

Lesson

Human in the Loop: Continuous Learning Systems

Design systems where humans approve at the right checkpoints and every mistake becomes a rule the system learns from
