# Agentic School - Full corpus

> The complete, machine-readable corpus of every published Agentic School page in both locales, as one Markdown document. Free forever. Sections are separated by horizontal rules; each entry carries its canonical URL and locale.

---

## Courses

### Foundations - From Zero to Your First Shipped App

- Canonical URL: https://agenticschool.dev/courses/foundations
- Level: beginner
- Lessons: 7

The complete starting point. Understand how LLMs and coding agents actually work, install Claude Code and Codex, prompt them like a pro, then scaffold, build and ship a real website to the public internet.

### Claude Code Mastery - Becoming a Power User

- Canonical URL: https://agenticschool.dev/courses/claude-code-mastery
- Level: beginner
- Lessons: 7

Go deep on Claude Code. Teach it your rules with CLAUDE.md, build reusable skills and commands, automate quality gates with hooks, connect tools with MCP, and run multi-agent workflows without burning context or money.

### The Modern App Stack - Auth, Data and Payments

- Canonical URL: https://agenticschool.dev/courses/modern-app-stack
- Level: advanced
- Lessons: 7

Assemble a real product. Learn how modern apps fit together, add Clerk authentication and Google OAuth, model reactive data in Convex, handle secrets safely, charge customers with Stripe, and migrate from dev to production.

### Automation and Agentic Systems

- Canonical URL: https://agenticschool.dev/courses/automation-agentic-systems
- Level: advanced
- Lessons: 7

Build automations and agentic tooling. Compare n8n, Zapier and Trigger.dev, automate the browser with Playwright, run code safely in sandboxes, build your own AI tools on top of APIs, and design human-in-the-loop systems.

### Quality, Security and the Agent-First Business

- Canonical URL: https://agenticschool.dev/courses/quality-security-agent-first
- Level: technical
- Lessons: 7

Make it production-grade.
Set up tests and CI/CD, lock down security and privacy, get found by Google and by AI through SEO and GEO, design agent-first products, and build and ship your own agentic product end to end.

---

## Lessons

### 1.1 How LLMs Actually Work: Tokens, Context Windows and the Performance Cliff

- Canonical URL: https://agenticschool.dev/courses/foundations/how-llms-actually-work-tokens-context-windows-and-the-performance-cliff
- Duration: 22 min

Summary: Before you touch a single tool, you need a working mental model of what a large language model is doing. This lesson explains tokens, context windows and the performance cliff that hits long conversations, so every later decision about models, prompts and agents makes sense.

#### Summary

A large language model does one thing astonishingly well: it predicts the next token given everything it has seen so far. Once you understand tokens, the context window and the performance cliff that hits long inputs, every later decision about which model to pick, how to prompt it and how to run agents stops feeling like guesswork. This lesson gives you that mental model in plain language, with the few numbers that actually matter in 2026.

#### What you will learn

You will learn what a token is, why you pay and get limited per token rather than per word, how the context window holds the whole conversation, and why dumping more text into a model often makes its answers worse rather than better. By the end you can read a model spec sheet and predict how a model will behave before you spend a cent on it.

#### Prerequisites

None. You do not need to code or to have used an AI tool before. If you have ever typed a message into ChatGPT, Claude or Gemini, you already have all the background you need. We link the deeper terminal and Git fundamentals later, when you actually need them.

#### The problem

Most people treat an LLM like a search engine or a person.
They paste in a huge document, ask a vague question, and are surprised when the answer is shallow, wrong or ignores half of what they pasted. The model did not get lazy. It hit limits that are baked into how it works. Without a mental model of tokens and context you will keep blaming the tool for behaviour that is completely predictable.

#### Tokens, not words

A model never sees letters or words the way you do. Before anything happens, your text is split into tokens, which are common chunks of characters. A token is roughly four characters or about three quarters of a word in English. Common words are a single token; rare words, code symbols and other languages cost more. Two things are measured in tokens: the price you pay and the size of what a model can hold at once. That is why a "cheap" model can get expensive on long documents, and why German or code prompts cost more tokens than the same idea in plain English.

- 1 token is about 4 characters or 0.75 English words.
- 1,000 tokens is roughly 750 words, or about a page and a half of text.
- Pricing is quoted per million tokens, split into input (what you send) and output (what the model writes back). Output usually costs several times more than input.
- You are billed for the WHOLE conversation every turn, because the model re-reads everything each time it replies.

#### The context window

The context window is the maximum number of tokens a model can consider at once: your instructions, the files you pasted, the conversation history and the answer it is writing, all added together. In 2026 a typical strong model has a context window of around 200,000 tokens, with some models advertising 1,000,000 or more. Think of it as the model's desk. Everything relevant has to fit on the desk at the same time. When the desk is full, something has to come off, and the model effectively forgets it. This is why a long chat starts losing track of instructions you gave near the start: those tokens fell off the desk.
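
The two rules of thumb above (about 4 characters per token, and the whole conversation being re-sent every turn) can be sketched in a few lines of shell. This is only an illustration: the character-based estimate is not a real tokenizer, the function names are made up for this example, and the billing helper assumes a hypothetical chat in which every turn adds the same number of tokens.

```bash
# Rough token estimate from the "about 4 characters per token" rule.
# An approximation for English text only, not a real tokenizer.
estimate_tokens() {
  local text="$1"
  local chars=${#text}
  # Round up so a short non-empty string never estimates to 0 tokens.
  echo $(( (chars + 3) / 4 ))
}

# Total input tokens sent over a chat where each turn re-sends the full
# history. Assumes every turn adds the same number of tokens (hypothetical).
cumulative_input_tokens() {
  local tokens_per_turn="$1"
  local turns="$2"
  local total=0
  local turn=1
  while [ "$turn" -le "$turns" ]; do
    # On turn N the model re-reads N turns' worth of tokens.
    total=$(( total + tokens_per_turn * turn ))
    turn=$(( turn + 1 ))
  done
  echo "$total"
}

estimate_tokens "internationalization"   # prints 5 (20 characters / 4)
cumulative_input_tokens 500 10           # prints 27500
```

Ten turns of 500 tokens each add up to 5,000 tokens of actual content, yet 27,500 input tokens are sent in total, which is why long chats cost more than their length suggests.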
#### The performance cliff

Here is the part almost nobody tells beginners: bigger context is not the same as better answers. As you fill a context window, model quality degrades long before you hit the hard limit. Models attend best to the start and end of a long input and get fuzzy in the middle, a pattern often called "lost in the middle". A 1,000,000 token window sounds amazing, but in practice answer quality on a packed window can be noticeably worse than on a tight, well-chosen 20,000 token prompt. This is the performance cliff. The lesson is blunt: relevance beats volume. A short prompt with exactly the right context beats a giant prompt every single time.

- Quality is highest when the window is mostly empty and every token earns its place.
- Quality drops as the window fills, especially for information buried in the middle.
- Huge advertised windows (1M+) rarely deliver their full quality at the top end - treat them as a safety margin, not a workspace.
- When in doubt, start a fresh conversation rather than piling onto a long one.

#### Step by step: see it for yourself

You can build intuition in ten minutes without writing code. Open any chat model and run this small experiment. The point is to feel how tokens, the window and the cliff show up in real answers.

- Ask the model: "How many tokens is the word internationalization, and why?" Notice it splits into several tokens because it is long and rare.
- Paste a long article (a few thousand words) and ask a question about one sentence in the exact middle. Then ask the same question about the first sentence. The middle answer is usually weaker.
- In a very long chat, ask the model to repeat an instruction you gave near the top. Watch it struggle or invent - those early tokens are off the desk.
- Start a brand-new chat, paste only the relevant paragraph, and ask again. The answer is sharper. That is relevance beating volume.
#### Typical mistakes

The classic beginner error is the "dump everything" prompt: paste a 50 page PDF and ask one narrow question. The model drowns. The second mistake is the never-ending chat, where you keep one conversation open for days and wonder why it gets dumber. The third is assuming a bigger context window means you can be lazy about relevance. All three come from not respecting the cliff. The fix is always the same: send less, but send the right less, and reset often.

#### Business ROI

This is not academic. Tokens are money and context discipline is quality. A team that understands this writes tighter prompts, picks cheaper models for simple tasks, and gets more reliable output, which means less rework. On a real workflow that runs thousands of times, trimming a bloated prompt from 30,000 tokens to 5,000 can cut your input-token bill by more than 80 percent AND improve the answers. Understanding the cliff is the single highest-leverage thing a non-technical founder can learn before spending on AI at scale.

#### Checklist

Before you move on, make sure you can answer these without looking back. If any answer is shaky, reread the relevant section - this model sits under everything else in the course.

- Can you explain a token to a colleague in one sentence?
- Do you know roughly how many words fit in a 200,000 token window?
- Can you describe the performance cliff and why relevance beats volume?
- Do you know why you are billed for the whole conversation each turn?

#### Resources

Keep the idea handy as you work: when an answer is bad, your first two questions are always "is my context too big?" and "is the right information actually near the top or bottom?" The fundamentals page on tokens goes deeper on tokenization if you want the underlying detail, and the model comparison in the next lesson builds directly on the numbers introduced here.
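
To make the prompt-trimming arithmetic from the Business ROI section concrete, here is a small sketch. The price of 300 cents per million input tokens and the 10,000 runs are made-up illustrative numbers, not a real price list, and the helper name is hypothetical. Output tokens are ignored to keep the example short.

```bash
# Input cost in cents for a workflow: tokens per run x runs x price.
# The price is an illustrative assumption, not a real provider rate.
input_cost_cents() {
  local tokens_per_run="$1"
  local runs="$2"
  local price_cents_per_million="$3"
  echo $(( tokens_per_run * runs * price_cents_per_million / 1000000 ))
}

# Same workflow, 10,000 runs, before and after trimming the prompt.
before=$(input_cost_cents 30000 10000 300)
after=$(input_cost_cents 5000 10000 300)
saved_percent=$(( (before - after) * 100 / before ))

echo "before: ${before} cents, after: ${after} cents, saved: ${saved_percent}%"
# prints "before: 90000 cents, after: 15000 cents, saved: 83%"
```

The saving depends only on the ratio of the two prompt sizes, so the percentage stays the same whatever price you plug in.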
#### Your task

Run the four-step experiment above in a chat model of your choice and write down, in your own words, one sentence describing the moment you saw the model "forget" or get fuzzy. Keeping that concrete memory makes every prompting decision later in the course click into place.

#### Next lesson

Now that you know what a model is doing under the hood, the obvious next question is which model to use. The next lesson compares Haiku, Sonnet and Opus against GPT and Gemini, explains benchmarks honestly, and shows where to get strong models cheaply or for free.

### 1.2 Choosing Your Model: Haiku vs Sonnet vs Opus, GPT, Gemini and Benchmarks

- Canonical URL: https://agenticschool.dev/courses/foundations/choosing-your-model-haiku-vs-sonnet-vs-opus-gpt-gemini-and-benchmarks
- Duration: 24 min

Summary: There is no single best model, only the right model for a task and a budget. This lesson maps the 2026 model landscape - Claude Haiku, Sonnet and Opus, OpenAI GPT, Google Gemini - explains how to read benchmarks without being fooled, and shows where to get strong models cheaply through OpenRouter and free Gemini credits.

#### Summary

Every model provider ships a family of models, not one model. They come in tiers: small and fast, mid and balanced, large and smart. Once you see the tiers instead of the brand names, choosing becomes simple. This lesson gives you a decision rule you can apply to any new model that launches, plus the practical tricks to get top models cheaply or for free.

#### What you will learn

You will learn the three-tier mental model, how Claude (Haiku, Sonnet, Opus), OpenAI GPT and Google Gemini line up inside it, how to treat benchmarks with healthy suspicion, and how to access strong models without paying full sticker price using OpenRouter and free Gemini credits.

#### Prerequisites

The previous lesson on tokens and context.
You need to be comfortable with the idea that price is quoted per million tokens and that bigger context is not automatically better, because both ideas drive model choice.

#### The problem

Beginners either default to the single model they have heard of, or they chase whatever topped a leaderboard last week. Both are expensive mistakes. Using a flagship model to reformat a list is like hiring a surgeon to apply a plaster. Using a tiny model for hard architectural reasoning produces confident nonsense. The skill is matching task difficulty to model tier, and that skill outlives any individual model name.

#### The three tiers

Forget brand loyalty and think in tiers. Small models are fast and cheap, great for classification, extraction, simple rewrites and high-volume jobs. Mid models are the balanced workhorse for most real coding and writing. Large models are slower and pricier but reason far better on genuinely hard problems. Almost every provider mirrors this structure, so once you internalise it you can place any new model instantly.

- Small / fast: Claude Haiku, GPT small tier, Gemini Flash. Use for volume, extraction, routing, cheap drafts.
- Mid / balanced: Claude Sonnet, GPT mid tier, Gemini Pro. Your daily driver for coding and serious writing.
- Large / smart: Claude Opus, GPT large reasoning tier, Gemini Ultra/Pro top tier. Use for hard reasoning, tricky debugging, architecture.
- Rule of thumb: start one tier lower than you think, and only move up if the output is genuinely not good enough.

#### How to read benchmarks honestly

Benchmarks are useful and also routinely misleading. A model can top a coding benchmark and still feel worse in your actual project, because benchmarks measure narrow tasks under ideal conditions and providers optimise hard for them. Treat benchmarks as a rough filter, not a verdict. The only benchmark that matters is your own: take three real tasks from your work, run them through two or three models, and judge the output yourself.
Pay attention to consistency, not just peak performance, because a model you ship with needs to be reliably good, not occasionally brilliant.

#### Pricing and the real cost picture

Price is quoted per million input and output tokens, and the spread between tiers is large - often 10x or more between a small and a large model. Because output costs several times more than input, verbose models and chatty prompts cost more than you expect. The practical move is to route by difficulty: cheap model for the 80 percent of easy calls, expensive model only for the hard 20 percent. On a workflow at scale this single decision often matters more than which provider you chose.

#### Getting strong models cheaply or free

You do not have to pay full price to start. There are three reliable routes in 2026, and a beginner should know all of them before committing budget.

- OpenRouter: a single account and API key that gives you access to almost every model (Claude, GPT, Gemini, open models) through one endpoint, with transparent per-token pricing and easy model switching. Ideal for comparing models without juggling five accounts.
- Free Gemini credits: Google routinely offers a generous free tier and credits through its AI Studio, which is a genuinely strong, low-cost way to get a capable mid model for experiments and low-volume tools.
- Provider free tiers and trials: most providers give you some free usage to evaluate. Use it deliberately to run your own three-task benchmark.

#### Why 1M-context models disappoint

You will see models advertising 1,000,000 token context windows and assume they are strictly better. In practice they often disappoint, for the exact reason from the previous lesson: the performance cliff. A model can technically accept a million tokens and still answer worse than a focused prompt because quality degrades as the window fills. Treat a huge context window as occasional insurance for a genuinely large document, not as permission to stop curating context.
Most days, a mid model with a tight prompt beats a giant-context model with a sloppy one.

#### Step by step: choose a model for a real task

Make this concrete with one task from your own work. The goal is to practise the decision rule, not to find a permanent answer.

- Write down the task and rate its difficulty: simple, normal, or genuinely hard.
- Pick the matching tier: small, mid, or large.
- Run it through two models in that tier (use OpenRouter to switch quickly).
- Judge the output yourself on quality and consistency, then note which you would actually ship and why.

#### Typical mistakes

The big ones: always reaching for the flagship and burning money on easy tasks; trusting a leaderboard over your own three-task test; assuming a bigger context window means a smarter model; and locking into one provider so you never notice when a competitor ships something better for your use case. OpenRouter exists precisely so switching costs stay low.

#### Business ROI

Model selection is one of the clearest levers on your AI bill and your output quality. Routing easy work to a small model and reserving the large model for hard reasoning can cut costs by an order of magnitude while improving reliability, because each task gets the right tool. For a founder, the discipline of "match tier to difficulty, benchmark on your own tasks, keep switching cheap" is worth more than any single model choice.

#### Checklist

You are ready to move on when you can confidently do the following without second-guessing the brand names.

- Place any new model into small, mid or large from its spec and price.
- Explain why you would not use a flagship for a simple extraction job.
- Run a three-task personal benchmark instead of trusting a leaderboard.
- Name where to get a strong model cheaply or free (OpenRouter, Gemini credits).
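
The "match tier to difficulty" rule and the 80/20 routing idea from this lesson can be sketched as follows. Everything numeric here is an assumption for illustration: the prices (100 and 1,500 cents per million tokens), the call volume and the tokens per call are invented, and the function names are hypothetical rather than part of any tool.

```bash
# Map a task difficulty to a model tier, using the labels from this lesson.
pick_tier() {
  case "$1" in
    simple) echo "small" ;;
    normal) echo "mid" ;;
    hard)   echo "large" ;;
    *)      echo "unknown difficulty: $1" >&2; return 1 ;;
  esac
}

# Cost in cents for a batch of calls at a price per million tokens.
batch_cost_cents() {
  local calls="$1"
  local tokens_per_call="$2"
  local price_cents_per_million="$3"
  echo $(( calls * tokens_per_call * price_cents_per_million / 1000000 ))
}

calls=10000
tokens=2000
small_price=100    # hypothetical price, cents per million tokens
large_price=1500   # hypothetical price, cents per million tokens

# Option A: send every call to the large model.
all_large=$(batch_cost_cents "$calls" "$tokens" "$large_price")

# Option B: route the easy 80 percent to the small model.
easy_calls=$(( calls * 80 / 100 ))
hard_calls=$(( calls - easy_calls ))
routed=$(( $(batch_cost_cents "$easy_calls" "$tokens" "$small_price") \
         + $(batch_cost_cents "$hard_calls" "$tokens" "$large_price") ))

echo "all large: ${all_large} cents, routed: ${routed} cents"
# prints "all large: 30000 cents, routed: 7600 cents"
```

With these invented numbers the routed setup costs roughly a quarter of the all-large one; the actual ratio depends on the real price gap between tiers and on how many of your calls are genuinely hard.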
#### Resources

Set up an OpenRouter account now so model switching is friction-free for the rest of the course, and grab free Gemini credits from Google AI Studio for low-cost experiments. You will use both repeatedly. The next lesson moves from the model to the tool that wraps it.

#### Your task

Create an OpenRouter account, then run one real task from your work through a small, a mid and a large model. Write a two-line note on which tier was actually good enough. That note is your first piece of real, personal benchmark data and it will guide your model choices for months.

#### Next lesson

A model on its own just talks. To make it do work - read files, run commands, edit code - you wrap it in a harness. The next lesson explains what a harness is and compares Claude Code, Codex, Pi and OpenCode so you know which tool to reach for.

### 1.3 What Is a Harness? Claude Code vs Codex vs Pi vs OpenCode

- Canonical URL: https://agenticschool.dev/courses/foundations/what-is-a-harness-claude-code-vs-codex-vs-pi-vs-opencode
- Duration: 22 min

Summary: A model predicts text. A harness turns that text into action: reading files, running commands, editing code, looping until a task is done. This lesson defines the harness clearly and compares the four that matter in 2026 - Claude Code, Codex, Pi and OpenCode - including when the minimal Pi harness genuinely beats the heavyweight option.

#### Summary

The model is the engine. The harness is the car around it: the steering, pedals and tools that let the engine actually take you somewhere. A harness gives the model the ability to read and write files, run terminal commands, see the results, and loop until the job is done. This lesson defines the harness and compares the four worth knowing in 2026 so you stop confusing the model with the tool wrapped around it.
#### What you will learn

You will learn what a harness does, why the same model behaves very differently inside different harnesses, and how Claude Code, Codex, Pi and OpenCode compare on philosophy, control and integration. You will leave with a decision rule, including the non-obvious case where the tiny Pi harness beats the powerful Claude Code.

#### Prerequisites

The two previous lessons. You should be comfortable that a model only predicts tokens and that context discipline matters, because a harness is largely a machine for feeding the right context to the model and acting on its output.

#### The problem

People say "Claude is better than GPT" or "Codex cannot do X" when they actually mean the harness, not the model. The same Claude model feels brilliant in one tool and clumsy in another because the harness decides what files it sees, what commands it can run and how it loops. If you do not separate model from harness in your head, you will draw the wrong conclusions and pick the wrong tool.

#### What a harness actually does

A harness is the program that sits between you and the model and gives the model hands. At each step it gathers context (your instruction plus relevant files and command output), sends it to the model, reads back the model's proposed action, executes that action in your environment, and feeds the result back so the model can decide the next step. That loop - context, model, action, result, repeat - is the whole game. Everything that makes one harness better than another is a variation on how well it manages that loop.

- Reads and edits files in your project.
- Runs terminal commands and reads their output (tests, builds, git).
- Manages the context window: deciding what to include and when to compact.
- Loops autonomously until the task is done or it needs you.

#### Claude Code

Claude Code is Anthropic's official command-line harness, built around Claude models.
It is the heavyweight: deep project integration, a rich permissions system, support for project rules in a CLAUDE.md file, reusable skills, hooks and sub-agents. It is the tool this very course is built and maintained with. When you want an agent that lives inside a real repository, runs your tests, respects your conventions and handles multi-step work, Claude Code is the default recommendation. The trade-off is that it is opinionated and powerful enough to feel heavy for tiny one-off jobs.

#### Codex

Codex is OpenAI's command-line coding agent, wrapping GPT models. It is pragmatic and strong at reading unfamiliar code, explaining it, and making focused, well-scoped changes. If you live in the OpenAI ecosystem or want a second opinion from a GPT-family model on a tricky change, Codex is a natural reach. Treat it as a capable peer to Claude Code rather than a rival to fear: many serious builders keep both installed and pick per task.

#### Pi and OpenCode

Pi is the minimalist. It is a deliberately tiny, transparent, extensible harness: you can see exactly what it does, and you bend it to your workflow rather than the other way around. OpenCode is an open-source harness in a similar spirit, model-agnostic and community-driven, often used with models routed through OpenRouter. Both trade hand-holding for control and transparency. The surprising truth is that for certain jobs - a quick scripted task, a custom loop, a setup where you want full visibility and zero magic - the minimal harness beats the heavyweight, because there is less between you and the model and nothing hidden.

- Pi: minimal, transparent, highly extensible. You own the behaviour.
- OpenCode: open-source, model-agnostic, pairs well with OpenRouter.
- When minimal wins: scripted or repeatable tasks, full-visibility setups, custom loops, or when a heavyweight harness fights your workflow.

#### Step by step: a decision rule

You do not need to agonise. Run your task through this rule and pick in seconds.
The point is to stop treating the choice as a loyalty question and start treating it as a fit question.

- Deep work inside a real repo with tests, conventions and multi-step changes: Claude Code.
- Reading or explaining unfamiliar code, or a GPT second opinion: Codex.
- A scripted, repeatable, or fully transparent custom loop: Pi.
- Open-source, model-agnostic, OpenRouter-routed setup: OpenCode.
- Unsure: start with Claude Code, because the rest of this course assumes it.

#### Typical mistakes

The big mistake is blaming the model for the harness. The second is tool tribalism - insisting one harness is universally best when the honest answer is "it depends on the task". The third is reaching for the heaviest tool for trivial jobs, where a tiny harness or even a plain chat would be faster. Keep two harnesses installed and let the task decide.

#### Business ROI

Choosing the right harness is choosing how much of your team's work you can safely delegate to an agent. A heavyweight harness with good project rules lets you hand off real engineering tasks end to end, which is where the time savings compound. A minimal harness lets you build cheap, transparent, repeatable automations. Knowing both means you never over-pay in complexity for a simple job or under-power a hard one.

#### Checklist

Confirm you can do the following before moving on, because the next lesson gets hands-on with installation.

- Explain the difference between a model and a harness in one sentence.
- Describe the context-model-action-result loop.
- Match a task to Claude Code, Codex, Pi or OpenCode.
- Give one example where a minimal harness beats Claude Code.

#### Resources

You do not need to install anything yet - the next lesson walks installation step by step. For now, just hold the mental model: model is the engine, harness is the car. The fundamentals page on the terminal will help if the command line still feels foreign.
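
The context, model, action, result loop described earlier in this lesson can be sketched as a toy script. This is not how Claude Code, Codex, Pi or OpenCode are implemented: `fake_model` is a hard-coded stand-in for a real model call, the single `list_files` action is an invented example, and all names are hypothetical. The only point is to show the shape of the loop.

```bash
# Stand-in for a real model call (hypothetical): it asks to list files
# until a README.md shows up in its context, then declares the task done.
fake_model() {
  case "$1" in
    *README.md*) echo "done" ;;
    *)           echo "list_files" ;;
  esac
}

# Execute one allowed action in the working directory and print the result.
# Anything not on the allow-list is refused rather than run.
run_action() {
  case "$1" in
    list_files) ls "$2" ;;
    *)          echo "refused: $1" ;;
  esac
}

# The loop: context -> model -> action -> result -> repeat, capped at 5 steps.
run_harness() {
  local workdir="$1"
  local context="task: check which files exist"
  local step=0
  local action
  local result
  while [ "$step" -lt 5 ]; do
    action=$(fake_model "$context")
    if [ "$action" = "done" ]; then
      echo "done after $step steps"
      return 0
    fi
    result=$(run_action "$action" "$workdir")
    # Feed the result back so the next model call can see it.
    context="$context | result: $result"
    step=$(( step + 1 ))
  done
  echo "gave up after $step steps"
  return 1
}

demo_dir=$(mktemp -d)
touch "$demo_dir/README.md"
run_harness "$demo_dir"   # prints "done after 1 steps"
```

Two design choices mirror what the lesson describes: the action allow-list plays the role of a permissions system, and the step cap stops the loop when the model never reaches "done".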
#### Your task

Write down the single task you most want an agent to do for you this week, then apply the decision rule and name the harness you would use. You will install it in the next lesson, so pick something real.

#### Next lesson

Time to stop reading and start installing. The next lesson gets Claude Code and Codex running on your machine, explains the subscription plans, and makes the case for why the 200 dollar per month plan is actually cheap for a serious builder.

### 1.4 Installing and Running Claude Code and Codex

- Canonical URL: https://agenticschool.dev/courses/foundations/installing-and-running-claude-code-and-codex
- Duration: 26 min

Summary: This is the hands-on setup lesson. You will install Claude Code and Codex, run your first real agent session, understand the 20, 100 and 200 dollar plans (and why 200 is cheap for a serious builder), and set up the basic guardrails that protect your business and intellectual property from day one.

#### Summary

Enough theory. In this lesson you install the two coding agents, run your first session, and set up the guardrails that keep your work safe. We also demystify the subscription plans, because the pricing genuinely confuses people and the wrong choice either wastes money or throttles you mid-task.

#### What you will learn

You will install Claude Code and Codex, authenticate them, run a first agent task, understand the 20, 100 and 200 dollar plan tiers, and put three baseline protections in place so an agent never leaks a secret or touches code it should not.

#### Prerequisites

You need Node.js installed and a basic comfort with opening a terminal. If either is new to you, the Fundamentals pages on Node.js and terminal basics cover them from scratch - install Node.js first, then come back here. You will also want a code editor like VS Code, covered in its own fundamentals page.

#### The problem

The setup step is where most beginners stall.
They are unsure which plan to buy, nervous about giving an agent access to their machine, and worried about safety and cost. So they never actually run the tool. This lesson removes every one of those blockers in order so you finish with a working, safe setup.

#### Step by step: install Claude Code

Claude Code installs through npm, the package manager that comes with Node.js. Open your terminal and run the install command, then start it inside a project folder. The first run will ask you to log in through your browser to connect your Anthropic account.

```bash
# Install the Claude Code CLI globally
npm install -g @anthropic-ai/claude-code

# Move into a project folder (make one if needed)
mkdir my-first-app && cd my-first-app

# Start Claude Code - it will open a browser to log in on first run
claude
```

Installing and launching Claude Code

Once it starts, just talk to it. Try: "Create a README.md that explains this is my first AI-built project." Watch it propose the change, ask permission, and write the file. That loop - propose, permission, act - is the safety model you will rely on.

#### Step by step: install Codex

Codex is OpenAI's CLI and installs the same way. Having both gives you a second opinion and a fallback when one provider is busy or a particular model fits the task better.

```bash
# Install the Codex CLI globally
npm install -g @openai/codex

# Launch it inside your project
codex
```

Installing and launching Codex

Authenticate with your OpenAI account when prompted. Run a small task to confirm it works, like asking it to explain the README you just created. You now have two coding agents installed.

#### Understanding the plans: 20, 100 and 200 dollars

Anthropic offers Claude through tiered plans, and the same logic applies to OpenAI. The 20 dollar plan is for light, occasional use. The 100 dollar plan suits regular daily users. The 200 dollar plan is for heavy, all-day builders and gives you the most generous usage of the strongest models.
The honest framing: if you build for a living, 200 dollars per month is cheap. One hour of a developer's time costs more than that, and a serious plan saves you many hours every single week. The expensive option is the cheap plan that throttles you in the middle of real work.

- 20 dollars/month: light use, evenings and weekends, learning.
- 100 dollars/month: daily driver for most professionals.
- 200 dollars/month: heavy all-day building, maximum access to top models.
- Reframe: compare the plan to one billable hour, not to a streaming subscription. It pays for itself in saved time within days.

#### Protecting your business and IP

An agent runs commands and edits files, so basic guardrails are non-negotiable from day one. None of this is hard, and skipping it is how beginners leak API keys or push private code to the public internet. Put these three protections in place before you do any real work.

- Use private repositories. When you connect to GitHub, default every business project to private so your code and IP are not publicly visible.
- Never commit secrets. API keys, passwords and tokens live in a .env file that is listed in .gitignore so it is never sent to GitHub. The agent must never paste a secret into committed code.
- Run with sensible permissions. Let the agent ask before running commands at first, so you see what it wants to do. Loosen this only once you trust the workflow.

Both tools also respect a project rules file (CLAUDE.md for Claude Code, AGENTS.md for Codex) where you state these guardrails in plain language so the agent follows them automatically. We build those properly in Course 2.

#### Typical mistakes

The common errors here are buying the wrong plan and then concluding "AI is not good enough" when you were actually just throttled; granting blanket auto-run permissions before you understand the tool; and starting work without a .gitignore so a secret ends up in your first commit. Each is avoidable with the steps above.
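
The "never commit secrets" guardrail above can be set up with a few shell commands. The sketch below wraps them in a helper called `protect_secrets`, a name invented for this example; it assumes the project keeps its secrets in a single `.env` file, as described in this lesson, and it runs against a throwaway temporary folder so it cannot touch a real project.

```bash
# Create an empty .env and make sure .gitignore lists it exactly once.
protect_secrets() {
  local project_dir="$1"
  mkdir -p "$project_dir"
  touch "$project_dir/.env" "$project_dir/.gitignore"
  # -x matches the whole line, -F treats ".env" literally, -q stays silent.
  if ! grep -qxF ".env" "$project_dir/.gitignore"; then
    echo ".env" >> "$project_dir/.gitignore"
  fi
}

# Demo in a temporary folder.
project="$(mktemp -d)/my-first-app"
protect_secrets "$project"
protect_secrets "$project"   # running it twice does not duplicate the entry
cat "$project/.gitignore"    # prints ".env"
```

In a real project you would run `protect_secrets .` from the project folder. Once the folder is a Git repository, `git check-ignore .env` should print `.env`, which confirms Git will skip the file.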
#### Business ROI

A correctly set up agent is the highest-leverage hire you will ever make, available instantly and tireless. The plan cost is trivial against the hours saved, but only if you choose a tier that matches your usage and set up guardrails so you can delegate confidently. Founders who treat the 200 dollar plan as expensive are usually optimising the wrong line item by a factor of ten.

#### Checklist

You are done when all of the following are true. Do not move on until every box is genuinely ticked, because the rest of the course assumes a working agent.

- Claude Code installed, authenticated, and it wrote a file for you.
- Codex installed and authenticated.
- You picked a plan that matches your real usage.
- A .gitignore exists, secrets live in .env, and your project defaults to a private repo.

#### Resources

If install commands fail, the usual cause is a missing or outdated Node.js - revisit the Node.js fundamentals page. Keep the official Claude Code and Codex docs bookmarked for flags and updates. The next lesson teaches you how to actually talk to these agents so they deliver.

#### Your task

Install both agents, create a fresh project folder with a .gitignore and an empty .env, and have Claude Code generate a simple HTML page that says hello. Confirm the agent asked permission before writing. You now have a safe, working setup and your first agent-built file.

#### Next lesson

You have the tools. Now you need to drive them. The next lesson is prompt engineering that actually works: axioms, the unlimited-budget framing, asking the agent to push back, and writing spec sheets that get great results on the first try.

### 1.5 Talking to Agents: Prompt Engineering That Actually Works

- Canonical URL: https://agenticschool.dev/courses/foundations/talking-to-agents-prompt-engineering-that-actually-works
- Duration: 24 min

Summary: Prompting an agent is not magic words, it is clear management.
This lesson teaches the techniques that actually move quality: writing axioms and project contracts, the unlimited-budget framing, explicitly asking the agent to push back, and turning a vague idea into a spec sheet the agent can execute without guessing.

#### Summary

The single biggest jump in results does not come from a better model, it comes from a better brief. Talking to an agent well is a management skill: you set clear rules, frame the work for quality over shortcuts, invite disagreement, and hand over a spec instead of a vibe. This lesson gives you the concrete patterns that separate frustrating sessions from agents that nail it first time.

#### What you will learn

You will learn to write axioms (non-negotiable rules), to use the unlimited-budget framing that stops an agent cutting corners, to explicitly ask for pushback so the agent catches your mistakes, and to convert a vague request into a spec sheet with goal, context, constraints and acceptance criteria.

#### Prerequisites

A working Claude Code or Codex setup from the previous lesson. You should also remember the context lessons: a great prompt is partly about giving the right context, not just the right words.

#### The problem

The classic failure is the one-line wish: "build me a website". The agent has to guess the audience, the stack, the design, the scope and the definition of done, and it guesses wrong. Then you blame the AI. The real issue is that you briefed a contractor in five words and expected a finished house. Good prompting removes the guessing.

#### Axioms: state the rules once

Axioms are non-negotiable rules you state up front so the agent stops re-deciding them every task. Instead of correcting the same thing ten times, you write it as a rule. "Always use TypeScript. Never use em dashes. Every new feature needs a test. Never commit secrets." These read like a contract, and the agent treats them as constraints rather than suggestions.
In Course 2 you will move these into a permanent CLAUDE.md, but even pasted at the top of a session they sharply raise consistency.

- Phrase rules as absolutes: "Always...", "Never...", "Every... must...".
- Cover style, stack, testing, security and tone - the things you keep correcting.
- Keep them short and unambiguous; an agent follows a crisp rule better than a paragraph of nuance.

#### The unlimited-budget framing

Agents, like junior staff, default to the fastest acceptable answer. If you want world-class work you have to say so. Framing the task as if budget and time are unlimited - "do the right thing for the long-term health of this project, not the easy thing; do not cut corners" - measurably raises quality, because it licenses the agent to add tests, handle edge cases and refactor instead of bolting on the quickest fix. It sounds soft, but it reliably shifts output from "technically works" to "actually good".

#### Ask for pushback

By default an agent is agreeable and will cheerfully implement a bad idea. The fix is one sentence: "Before you start, tell me anything wrong with this plan, anything I have missed, and a better approach if one exists." This flips the agent from order-taker to advisor. It will catch missing requirements, point out security holes, and suggest simpler designs. Some of the most valuable moments in agentic work come from the agent disagreeing with you - but only if you explicitly give it permission to.

#### Step by step: write a spec sheet

A spec sheet is the difference between a vague wish and an executable brief. It does not need to be long. Four parts are enough, and writing them forces you to make the decisions the agent would otherwise guess.

- Goal: what should exist when this is done, in one or two sentences.
- Context: the stack, the relevant files, examples to follow, and anything the agent cannot infer.
- Constraints: your axioms plus task-specific limits (no new dependencies, must work on mobile, and so on).
- Acceptance criteria: a concrete checklist that defines "done" so both you and the agent know when to stop.

```markdown
## Goal
A contact form on the landing page that emails me submissions.

## Context
- Stack: this Astro project, existing styles in src/styles.
- Follow the button style already used in the hero section.

## Constraints
- TypeScript only. No new UI library.
- Never expose the email API key in client code.
- Add a test for the form validation.

## Acceptance criteria
- [ ] Form validates name and email before submit.
- [ ] Submitting shows a success message.
- [ ] The validation test passes.
```

A minimal but complete spec sheet you can paste into the agent

#### Typical mistakes

The recurring errors: the one-line wish with no spec; never stating axioms so you correct the same things forever; accepting the first answer instead of asking for pushback; and over-stuffing the prompt with irrelevant context, which triggers the performance cliff from lesson one. Tight, structured briefs beat long rambling ones.

#### Business ROI

Briefing skill is the multiplier on everything else. The same agent and model produce mediocre or excellent work depending entirely on the brief, and a good spec sheet often turns three frustrating rounds into one clean delivery. For a founder, learning to write a tight spec is the cheapest possible way to triple the value you get from every AI subscription you already pay for.

#### Checklist

Before moving on, make sure you can produce each of these on demand, because every later lesson assumes you can brief an agent well.

- A short list of axioms for your own projects.
- A task framed for quality with the unlimited-budget and pushback lines.
- A four-part spec sheet for a real feature.
- An honest sense of when your prompt is too long, not too short.

#### Resources

Save your axioms somewhere reusable - you will paste them into a CLAUDE.md in Course 2.
The Agent Task Brief template in the resource library is a ready-made spec-sheet skeleton you can copy. Next, you put all of this to work building a real project.

#### Your task

Take the project idea you chose two lessons ago and write a full spec sheet for its first feature, including your axioms, the unlimited-budget framing and the pushback request. Hand it to your agent and notice how much sharper the result is than a one-line prompt would have been.

#### Next lesson

You can brief an agent. Now you build something real. The next lesson scaffolds an actual project, starts a dev server so you can see it in the browser, and puts it under version control with Git and a private GitHub repo.

### 1.6 Your First Project: Scaffolding, Dev Server, Git and GitHub

- Canonical URL: https://agenticschool.dev/courses/foundations/your-first-project-scaffolding-dev-server-git-and-github
- Duration: 26 min

Summary: Now you build. This lesson scaffolds a real web project with your agent, runs the dev server so you see it live in the browser, then puts everything under version control with Git and a private GitHub repository - without ever committing a secret. This is the moment your idea becomes a running thing.

#### Summary

This is the lesson where the abstract becomes concrete. You scaffold a real project, run it in your browser, and put it under version control on a private GitHub repo. By the end you have a live local app and a safe, backed-up history of your work - the foundation every later project builds on.

#### What you will learn

You will scaffold a project with your agent, start a dev server and view it at localhost, understand what Git and commits actually do, and push your code to a private GitHub repository with secrets safely excluded. These are the everyday motions of building, so we make them muscle memory now.

#### Prerequisites

A working agent setup, Node.js, and a GitHub account (free).
The Fundamentals pages on Git, GitHub, the terminal and dev servers will help if any term here is new - skim them, then come back. You will not need to memorise Git commands; the agent runs most of them, but you should understand what they do.

#### The problem

Beginners get stuck between "I have an idea" and "it is running on my screen". They are unsure how to start a project, scared of the terminal, and confused by Git. The result is a folder of files with no history and no backup, where one bad change loses hours of work. This lesson closes that gap properly.

#### Step by step: scaffold and run

Rather than memorise a framework, ask your agent to scaffold a modern starter and explain each step. A common, beginner-friendly choice is a Vite-based project, but the exact stack matters less than understanding the flow: create the project, install dependencies, start the dev server, open the browser.

```bash
# Scaffold a new Vite project (your agent can run this for you)
npm create vite@latest my-first-app

# Move in and install the dependencies it needs
cd my-first-app
npm install

# Start the local dev server
npm run dev
```

Scaffolding a project and starting the dev server

The dev server prints a local address, usually something like http://localhost:5173. Open it in your browser and you will see your app running on your own machine. "localhost" simply means "this computer" - nothing is on the public internet yet, which is exactly what you want while building. Edit a file, save, and the browser updates instantly. That live loop is what makes web building feel fast.

#### What Git actually does

Git is version control: it takes snapshots of your project called commits, so you can always go back. Think of it as an unlimited undo history with labels. Each commit records what changed and why, which means a bad edit is never a disaster - you just return to the last good commit.
This single habit removes the fear that stops beginners from experimenting, because nothing you do is permanent until you decide it is.

```bash
# Start tracking this project with Git
git init

# Stage all current files for the first snapshot
git add .

# Save the snapshot with a message describing it
git commit -m "Initial project scaffold"
```

Turning a folder into a version-controlled project

#### Keeping secrets out before you commit

Before you ever push to GitHub, make sure secrets cannot escape. A .gitignore file lists things Git should ignore. Your .env file - where API keys live - and the node_modules folder both belong there. Get this right once and you will never accidentally publish a key. Your agent can create this file, but you should be able to recognise a correct one.

```bash
# .gitignore - tells Git what to never track
node_modules
.env
.env.local
dist
```

A minimal .gitignore that protects secrets and build output

The rule is absolute: secrets go in .env, .env goes in .gitignore, and the agent never writes a key into committed code. If you remember nothing else about security from this course, remember this.

#### Step by step: push to a private GitHub repo

GitHub stores your repository in the cloud: a backup, a history and, later, the thing your deployment connects to. Create the repository as private so your business code stays yours. The agent can do most of this, but here is what is happening under the hood.

```bash
# Create a PRIVATE repo and push, using the GitHub CLI
gh repo create my-first-app --private --source=. --push

# Or, if you created the repo on github.com first:
git remote add origin https://github.com/yourname/my-first-app.git
# Make sure the local branch is named main (some Git setups still default to master)
git branch -M main
git push -u origin main
```

Publishing your code to a private GitHub repository

Private is the default you want for anything commercial. Public repos are great for open source, but your business IP should start private and only go public deliberately, after a security review - a topic Course 5 covers in depth.
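
Before the first push, a quick check can confirm that no secrets file is tracked. The following is a minimal sketch, assuming it is run inside the Git repository created in the steps above; the helper name `tracked_secret_files` is hypothetical and exists only for this illustration.

```shell
# Hypothetical helper: list tracked files named .env or starting with ".env."
# (this pattern also matches .env.example, so adjust it if you commit one on purpose)
tracked_secret_files() {
  git ls-files | grep -E '(^|/)\.env(\..*)?$' || true
}

# Warn loudly if any such file is tracked, otherwise report a clean result
if [ -n "$(tracked_secret_files)" ]; then
  echo "Tracked secret files found - do not push:"
  tracked_secret_files
else
  echo "No .env files are tracked."
fi
```

If a file shows up here, untrack it with `git rm --cached` and rotate the key it contained, because removing the file does not remove it from earlier commits.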
#### Typical mistakes

The painful ones: committing a .env file with a live API key (it stays in your Git history even if you delete the file later, so the key has to be rotated); never running git commit, so there is no history to roll back to; creating a public repo for private business code; and panicking at the terminal instead of letting the agent run the commands while you read what they do. Slow down, read each command, and commit early and often.

#### Business ROI

Version control is cheap insurance with an enormous payoff. A single recoverable mistake - rolling back a broken change in seconds instead of rebuilding it for hours - pays for the habit many times over. A private repo protects the IP your business is built on. And a running local app means you can iterate fast, which is the whole point of building with agents in the first place.

#### Checklist

You are ready to ship once every one of these is true. Do not skip the secret-safety items - they matter more than they look.

- Your project runs locally and you can open it at localhost.
- The project is a Git repo with at least one commit.
- .env is listed in .gitignore, and no committed file contains a secret.
- The code is pushed to a PRIVATE GitHub repository.

#### Resources

The Fundamentals pages on Git, GitHub and the dev server are your reference if a step felt shaky. The Claude Code Setup checklist in the resource library captures these guardrails as a reusable list. One step remains: getting this off your machine and onto the public internet.

#### Your task

Scaffold a small project, run it locally, make at least two commits as you change something, and push it to a private GitHub repo with a correct .gitignore. Confirm on github.com that your .env is nowhere to be found. You now have a real, version-controlled, backed-up project.

#### Next lesson

Your app runs on your machine.
The final lesson of this course puts it on the public internet: deploying to Vercel, connecting a real domain, and handling DNS and Cloudflare so anyone in the world can visit your site.

### 1.7 Ship It: Deploy to Vercel, Connect a Domain, DNS and Cloudflare

- Canonical URL: https://agenticschool.dev/courses/foundations/ship-it-deploy-to-vercel-connect-a-domain-dns-and-cloudflare
- Duration: 24 min

Summary: The finish line. You will deploy your project to Vercel so it is live on the public internet, connect a real custom domain, and understand DNS and Cloudflare well enough to point any domain at any site. When this lesson ends you will have shipped a real, public website built with AI - exactly what Course 1 promised.

#### Summary

This is the lesson that turns a local project into a real website anyone can visit. You will connect your GitHub repo to Vercel, get a live URL in minutes, point a custom domain at it, and understand DNS and Cloudflare well enough to never be scared of them again. Finish this and you have done the thing the whole course promised: built and shipped a real app with AI.

#### What you will learn

You will deploy to Vercel from your private GitHub repo, set up automatic deploys on every push, connect a custom domain, understand what DNS records do, and use Cloudflare for DNS, free HTTPS and basic protection. No prior hosting experience needed.

#### Prerequisites

A project pushed to a private GitHub repo from the previous lesson, and a free Vercel account. The Fundamentals page on DNS will deepen the domain section, but this lesson explains enough to get you live. If you do not own a domain yet, you can ship on Vercel's free URL first and add a domain later.

#### The problem

Beginners often build something good and then it dies on their laptop because deployment feels like a wall of unfamiliar words: hosting, DNS, nameservers, SSL, CDN. So the project never goes live and never gets feedback.
This lesson dissolves that wall by doing it once, in order, with the jargon explained as it appears.

#### Step by step: deploy to Vercel

Vercel is a hosting platform built for exactly this. The smoothest path is to connect your GitHub repo: Vercel watches it and redeploys automatically every time you push. You barely touch the terminal.

- Sign in to Vercel with your GitHub account.
- Click "Add New Project" and import your private repository - Vercel can read private repos once you authorise it.
- Vercel auto-detects the framework and build settings; for a standard Vite or Next.js app you can accept the defaults.
- Click Deploy. In a minute or two you get a live URL like my-first-app.vercel.app that anyone in the world can open.

From now on, every git push to your main branch triggers a new deploy automatically. This is the modern workflow: you build locally, push, and your live site updates itself. If you set environment variables (your secrets) in the Vercel dashboard rather than in code, they stay safe and never touch your repo - the same .env discipline from the last lesson, applied in production.

#### What DNS actually is

DNS, the Domain Name System, is the internet's phone book. It translates a human name like yoursite.com into the address of the server that should answer. When you "connect a domain", you are adding records to that phone book that say "for this name, send visitors to Vercel". The two records you will meet most are an A record (points a domain at an IP address) and a CNAME (points one name at another name). That is genuinely most of what you need to know to ship.

- A record: maps a domain or subdomain to a numeric IP address.
- CNAME record: maps a name to another name (for example www to your Vercel target).
- Nameservers: decide which company holds your DNS phone book in the first place.
- Propagation: DNS changes can take minutes to a few hours to spread worldwide - not broken, just slow.
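
The record types above can be explored from the terminal. This is a minimal sketch under stated assumptions: the helper `record_type_for` is hypothetical and only handles IPv4 addresses, `yoursite.com` is the placeholder domain from the text, and the live lookups assume the `dig` tool is installed.

```shell
# Hypothetical helper: guess which record type fits a target value.
# A dotted IPv4 address suggests an A record; any other name suggests a CNAME.
# (IPv6 addresses, which use AAAA records, are deliberately ignored here.)
record_type_for() {
  if printf '%s' "$1" | grep -Eq '^[0-9]+(\.[0-9]+){3}$'; then
    echo "A"
  else
    echo "CNAME"
  fi
}

record_type_for "203.0.113.10"        # an IP address from the documentation range
record_type_for "target.example.com"  # a hostname

# Live lookups (need network access and dig), left commented out:
# dig +short yoursite.com A            # the IP address the name points at
# dig +short www.yoursite.com CNAME    # the name that www is an alias for
# dig +short yoursite.com NS           # the nameservers holding the records
```

Running the lookups again a little later is a simple way to watch propagation: an empty or old answer usually means the change has not spread yet.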
#### Step by step: connect a custom domain

In Vercel, open your project, go to the domain settings, and add your domain. Vercel then tells you exactly which DNS records to create at your domain provider. You copy those records into your provider's DNS settings, save, and wait for propagation. Vercel verifies the connection and issues a free HTTPS certificate automatically, so your site loads securely with the padlock.

- Add your domain in the Vercel project settings.
- Copy the A and/or CNAME records Vercel gives you.
- Paste them into your DNS provider and save.
- Wait for verification, then visit your domain - it now serves your live site over HTTPS.

#### Where Cloudflare fits in

Cloudflare is a popular place to manage DNS, and it does more than hold records. Point your domain's nameservers at Cloudflare and you get a clean DNS dashboard, free HTTPS, a global content delivery network that makes your site load faster worldwide, and basic protection against attacks and bots. For most builders the pattern is: domain registered somewhere, DNS managed at Cloudflare, site hosted on Vercel, with Cloudflare's records pointing at the Vercel target. You do not need every Cloudflare feature on day one - the free DNS, HTTPS and CDN are reason enough to use it.

#### Typical mistakes

The frequent ones: assuming the site is broken when DNS simply has not propagated yet (wait, then recheck); editing the wrong DNS records or leaving stale ones that conflict; forgetting to set environment variables in Vercel so the live site is missing its secrets; and proxying through Cloudflare in a way that double-handles HTTPS. When something looks wrong, check propagation and your environment variables before assuming the worst.

#### Business ROI

Shipping is where value finally appears. A project on your laptop earns nothing; a live site can be shown to customers, indexed by Google, recommended by AI, and iterated on with real feedback.
Automatic deploys mean shipping an improvement costs you a single git push, so the loop from idea to live update is minutes, not days. That speed is the entire competitive advantage of building this way.

#### Checklist

You have completed Course 1 when all of these are true. Take a moment - this is a genuine milestone.

- Your project is deployed on Vercel and live at a public URL.
- Every git push redeploys the site automatically.
- Secrets live in Vercel environment variables, not in the repo.
- A custom domain (or the Vercel URL) loads your site over HTTPS.

#### Resources

Keep the Vercel and Cloudflare dashboards bookmarked, and lean on the DNS fundamentals page whenever records confuse you. The Vercel Deploy checklist in the resource library captures these steps for next time. You have now done the whole loop: idea, build, version control, ship.

#### Your task

Deploy your project to Vercel and confirm it loads at a public URL. If you own a domain, connect it through DNS and confirm HTTPS works. Then make one small change locally, commit, push, and watch Vercel redeploy automatically. Congratulations: you have built and shipped a real website with AI.

#### Next lesson

You can now take an idea all the way to a live site. Course 2, Claude Code Mastery, turns the agent from a helpful tool into a reliable power-user workflow: project rules, reusable skills, automated quality gates, MCP and multi-agent systems.

### 2.1 CLAUDE.md and AGENTS.md: Teaching Your Agent the Rules

- Canonical URL: https://agenticschool.dev/courses/claude-code-mastery/claude-md-and-agents-md-teaching-your-agent-the-rules
- Duration: 24 min

Summary: A project rules file is the single highest-leverage upgrade to your agent. CLAUDE.md (for Claude Code) and AGENTS.md (for Codex and other harnesses) tell the agent your conventions, stack, guardrails and tone once, so it stops re-deciding them every task.
This lesson shows you how to write one and run it as a continuously learning markdown system where every correction you make gets captured forever.

#### Summary

CLAUDE.md and AGENTS.md are plain markdown files that live in your repository and the agent reads automatically at the start of every session. They hold your axioms, your stack, your conventions, your security rules and your tone. Writing a good one is the difference between correcting the same five mistakes forever and an agent that simply gets your project. The deeper idea, and the one that compounds, is to treat the file as a living skills library: every gotcha you hit goes into a markdown file once and never costs you again.

#### What you will learn

You will learn exactly what belongs in a project rules file, how to phrase rules as crisp axioms the agent will actually obey, the difference between a root CLAUDE.md and the AGENTS.md that Codex and other harnesses read, and how to turn the file into a continuous-learning system where every repeated correction becomes a permanent rule. By the end you will have a real file you can paste into your own project today.

#### Prerequisites

A working Claude Code setup and the prompt-engineering lesson from Course 1, because the axioms you wrote there move directly into this file. You should also have a project under version control from Course 1, since the rules file lives in the repo and is committed alongside your code so the whole project shares the same standards.

#### The problem

You correct the agent. It uses the wrong package manager, you fix it. It adds an em dash, you fix it. It skips the test, you fix it. Next session, fresh context, same three mistakes. You are paying tokens and attention to re-teach the same lessons over and over because nothing you said persisted. The agent is not stupid, it just has no memory between sessions. A rules file is that memory. Without one, your agent starts every day with amnesia and you are its full-time supervisor.
#### What a rules file actually is

CLAUDE.md is a markdown file Claude Code loads automatically and treats as standing instructions for the whole session. Put it at the root of your repo for project-wide rules. You can also keep a personal one at ~/.claude/CLAUDE.md for rules that follow you across every project, and Claude Code merges them. AGENTS.md is the equivalent file that Codex and a growing number of other harnesses read, so a cross-tool team often keeps both, or keeps the real rules in one and a short pointer in the other. Anything you would otherwise retype in every prompt belongs here: language, framework, package manager, testing policy, security rules, naming conventions and tone.

- Root CLAUDE.md: project rules, committed to the repo, shared by everyone who works in it.
- Personal ~/.claude/CLAUDE.md: your own cross-project preferences, not committed.
- AGENTS.md: the same role for Codex and other harnesses, so multi-tool setups stay consistent.
- It is loaded every turn, so keep it tight - a bloated rules file eats the same context window your task needs.

#### Writing rules as axioms

An agent follows a crisp absolute far better than a paragraph of nuance. Phrase every rule as "Always...", "Never..." or "Every... must...". Group them by topic so the file stays scannable. Here is a real, compact CLAUDE.md you could drop into a project today. Notice it covers the things you keep correcting, not everything imaginable. Short and obeyed beats long and ignored.

```markdown
# Project Rules

## Stack
- TypeScript only. No plain JavaScript files.
- Package manager is bun. Never use npm or yarn.
- Framework is Astro for marketing pages, React for the app.

## Conventions
- Use rounded-sm for border-radius on everything.
- Never use em dashes. Use a normal "-" instead.
- Components are PascalCase, files are kebab-case.

## Quality
- Every new feature ships with a test.
- Before you say a task is done, run: bun run lint && bun run typecheck && bun run test.
## Security
- Secrets live in .env, which is gitignored. Never write a key into committed code.
- Default new repos to private.

## Tone
- Direct and concise. No filler, no apologising, no "as an AI".
```

A real CLAUDE.md you can adapt to your own project

#### The skills-library philosophy

This is the part that turns a config file into a competitive advantage. Treat CLAUDE.md as a growing library of hard-won knowledge, not a one-time setup. Every time the agent makes a mistake that will recur, you do not just fix it in the moment - you write it down as a rule. The agent picked the wrong directory for tests? Add a rule. It misunderstood how your auth flow works? Add a note. Over a few weeks the file becomes a precise model of how your specific project actually works, full of the gotchas that no generic model could know. The agent then needs less and less hand-holding because the institutional knowledge lives in the repo, not in your head.

- The trigger is repetition: any correction you can imagine making twice becomes a rule the first time.
- Capture gotchas, not just style: "the convex dev server must be running before tests" is worth more than a naming rule.
- You can tell the agent to do it: "Add a rule to CLAUDE.md so you never make this mistake again" works and is a habit worth building.
- The file is shared and committed, so a gotcha you discovered is now solved for every future session and every teammate.

#### Keeping it living, not stale

A rules file rots if you only ever add to it. Review it after big tasks. Prune rules that no longer apply because the stack changed. Promote a useful one-off instruction you keep pasting into a permanent rule. Split a giant file into linked sub-files if it gets unwieldy, since you can reference other markdown files from CLAUDE.md and keep the root lean.
The investment compounds in the most satisfying way: every future task starts from a smarter baseline, and the gap between you and someone running a bare agent widens every week. This is the same continuous-learning loop that runs through the rest of the course - capture the lesson once, benefit forever.

#### Typical mistakes

The common failures: writing a 500-line rules file that buries the important rules and eats your context window; phrasing rules as gentle suggestions ("it would be nice if...") that the agent treats as optional; setting it up once and never updating it, so it slowly drifts out of sync with the real project; and duplicating rules across CLAUDE.md and AGENTS.md until they contradict each other. Keep it tight, keep it absolute, keep it current, and keep a single source of truth.

#### Business ROI

A good rules file is the cheapest quality and consistency upgrade you will ever buy. It converts your personal knowledge into an asset that lives in the repo, so a new agent session - or a new hire, or a contractor - is productive and on-standard immediately instead of after a week of corrections. For a founder, this is how you stop being the bottleneck. The institutional knowledge that used to live only in your head, and walked out the door every time a session ended, now compounds in a file your whole business shares. That is leverage you keep forever.

#### Checklist

You are ready to move on when each of these is true. Do not skip the living-document items - they are where the real value is.

- You have a root CLAUDE.md committed to your project with stack, conventions, quality and security rules.
- Your rules are phrased as absolutes, not suggestions.
- You have captured at least one real gotcha you previously corrected by hand.
- You know how to ask the agent to add a new rule when it slips up.

#### Resources

Keep the official Claude Code memory and CLAUDE.md documentation bookmarked for the exact loading and merge behaviour, since the details evolve.
The axioms you saved in the Course 1 prompting lesson are the seed of this file - paste them in as your first rules. The Agent Task Brief template in the resource library pairs well with a rules file: the rules cover the constant standards, the brief covers the per-task specifics.

#### Your task

Create a CLAUDE.md at the root of a real project. Fill it with your stack, three conventions you keep correcting, your quality gate command, and your secret-handling rule. Then, the next time the agent makes a mistake, instead of just fixing it, tell it to add a rule so it never happens again. Watch the file grow into something only your project could have.

#### Next lesson

With rules in place, the agent knows your standards. The next lesson packages your actual workflows: skills and slash commands that turn a proven multi-step process into a single reusable trigger, so common work runs the same way every time without you retyping the brief.

### 2.2 Skills and Commands: Reusable Superpowers

- Canonical URL: https://agenticschool.dev/courses/claude-code-mastery/skills-and-commands-reusable-superpowers
- Duration: 23 min

Summary: Once you catch yourself giving the agent the same multi-step instructions twice, you should package them. Slash commands and skills turn a proven workflow into a single reusable trigger, so the agent does complex, consistent work without you retyping the brief. This lesson shows what a SKILL.md file really is, how it differs from a slash command, and how to build a small personal library that matches how you actually work.

#### Summary

Skills and slash commands are reusable, named workflows. Instead of describing the same task from scratch every time, you invoke one and the agent runs your proven steps with your context already loaded. A slash command is a quick trigger for a prompt you reuse. A skill is a richer, self-contained capability the agent can pull in when it is relevant.
Both are how you stop retyping your best workflows and start banking them as permanent assets in the repo.

#### What you will learn

You will learn the concrete difference between a slash command and a skill, what actually lives inside a SKILL.md file including its frontmatter, how to capture a workflow you keep repeating, and how to build a focused library of two or three skills that fit your own way of working rather than a sprawling collection you forget about.

#### Prerequisites

A working Claude Code setup and a CLAUDE.md from the previous lesson, because skills and commands build on the conventions defined there - a skill that scaffolds a component should obey the same naming and style rules your project file already states.

#### The problem

You have a workflow you trust. Maybe it is "scaffold a new page, add it to the router, write a smoke test, run the quality gate". You type the same paragraph briefing it every time, slightly differently, and you get slightly different results. The knowledge is in your head and your fingers, not anywhere reusable. When you are tired you forget a step. When a teammate does it, they do it their own way. The workflow is real and proven, but it is not captured, so it is fragile and inconsistent.

#### Slash commands: a reusable prompt

A slash command is the simplest form of reuse: a markdown file containing a prompt you invoke by typing a slash and its name. Drop a file in the commands folder and Claude Code exposes it as a command. It is perfect for a single, well-defined action you trigger often - run the quality gate, open a PR with your standard description, summarise what changed since the last commit. The file can take arguments, so one command flexes across cases. Reach for a command when you want speed and a single clear action.

```markdown
---
description: Run the full pre-push quality gate and report failures clearly.
---

Run these in order and stop at the first failure:

1. bun run lint
2. bun run typecheck
3. bun run test

If anything fails, show me the exact error and the file it points to, then propose the smallest fix. Do not push.
```

A slash command at .claude/commands/ship-check.md, invoked as /ship-check

#### Skills: a packaged capability

A skill is more than a prompt. It is a folder with a SKILL.md file at its root, carrying frontmatter (a name and a description) plus instructions, and optionally bundled scripts, templates or reference files the agent can use. The description matters enormously: the agent reads it to decide when the skill is relevant and pulls the skill in automatically, so a sharp description is what makes a skill discoverable. Use a skill when a workflow is multi-step, has its own context or assets, and benefits from being a self-contained unit you maintain in one place. Here is a real SKILL.md frontmatter and body for a component-scaffolding skill.

```markdown
---
name: new-component
description: Scaffold a new React component with its test and story. Use when the user asks to create, add or scaffold a component.
---

# New Component

When creating a component:

1. Create the component in src/components as a PascalCase .tsx file.
2. Use rounded-sm and the existing button/card styles - never invent new tokens.
3. Create a colocated .test.tsx with a render smoke test.
4. Export it from the components barrel file.
5. Run bun run typecheck on just the new files and fix any errors.

Follow all rules in the project CLAUDE.md. Never add a new UI dependency.
```

A real SKILL.md: frontmatter (name, description) plus instructions

#### Capturing a workflow and choosing the form

The signal to package something is simple: the second time you brief the agent on the same sequence, that sequence is a candidate. Write the steps down once. Then choose the form by asking how rich it is. A single clear action with no assets is a slash command. A multi-step process with its own context, templates or bundled scripts is a skill.
Do not overthink it - you can start with a command and promote it to a skill later when it grows. The win is capturing the workflow at all, the same continuous-learning habit from the last lesson applied to processes instead of rules. - Repeated twice = candidate for packaging. Do not wait for the tenth time. - Single action, no assets, want speed: slash command. - Multi-step, own context or bundled files, want a self-contained unit: skill. - A great skill description is half the work - it is how the agent knows when to use it. #### A focused personal library Start small and deliberate. Two or three skills and commands for the workflows you do most - your quality gate, scaffolding a component, opening a standard PR - beat a sprawling collection of twenty you half-remember. A focused library is one you actually reach for, and every entry earns its place by saving you a real, repeated briefing. As with the rules file, prune what you stop using and promote what you keep pasting by hand. The goal is not maximum skills, it is maximum leverage per skill. #### Typical mistakes The usual traps: building a skill before you have run the workflow enough to know the right steps, so you bake in a bad process; writing a vague skill description so the agent never figures out when to invoke it; hoarding dozens of skills you forget exist; and duplicating logic that already belongs in CLAUDE.md. Skills are for workflows, the rules file is for standards - keep them in their lanes. #### Business ROI A packaged workflow is a process that no longer depends on the person who invented it. Your best way of scaffolding, testing or shipping becomes a one-word trigger that runs the same way for you, for a teammate, and for a contractor on day one. That is how a small team punches above its weight: the founder encodes the right way to do something once, and everyone runs it consistently forever. 
Inconsistent processes are a silent tax on quality, and a small skills library removes it for the price of a few markdown files. #### Checklist You are ready to move on when each of these is true. The next lesson automates the checks these workflows often end with. - You can explain when a slash command beats a skill and vice versa. - You have written at least one slash command for a workflow you repeat. - You can describe what lives in a SKILL.md, including why the description matters. - Your library is small and focused, not a pile of unused entries. #### Resources Bookmark the official Claude Code docs on skills and slash commands for the exact folder locations and frontmatter fields, which evolve over time. The resource library has a starter pack of command and skill templates you can copy. Pair every skill with your CLAUDE.md so packaged workflows automatically inherit your project standards. #### Your task Pick the one workflow you brief the agent on most often. Write it as a slash command first - just the prompt in a markdown file in your commands folder. Invoke it twice on real work. If it grows steps or needs its own assets, promote it to a skill with a sharp description. You now have your first reusable superpower banked in the repo. #### Next lesson Skills and commands run when you trigger them. The next lesson goes one level further: hooks and scripts that run automatically at key moments, so your quality gates fire without you ever remembering to invoke them. ### 2.3 Hooks and Scripts: Automating Your Workflow - Canonical URL: https://agenticschool.dev/courses/claude-code-mastery/hooks-and-scripts-automating-your-workflow - Duration: 24 min Summary: Hooks let you run your own scripts automatically at key moments in the agent lifecycle - before a tool runs, after a file is edited, when the agent finishes, before a push. 
This is how you enforce quality gates like tests, linting and type checks so nothing broken ever leaves your machine, without relying on memory or discipline. This lesson shows the real settings.json that wires it up. #### Summary Hooks run scripts automatically at defined moments, so the right thing happens whether or not anyone remembers to do it. Claude Code has its own lifecycle hooks that fire around the agent's actions, and Git has hooks that fire around commits and pushes. The classic, highest-value use is a quality gate that runs your linter, type checker and tests before code can leave your machine, so a regression physically cannot reach your repo or production. This lesson wires up both with real, runnable config. #### What you will learn You will learn what a hook is, the lifecycle events Claude Code lets you hook into, how to configure a hook in settings.json, how that differs from a Git pre-push hook, and how to assemble a quality gate that protects your project automatically. You will leave with a working settings.json snippet and a working Git hook you can adapt. #### Prerequisites A version-controlled project from Course 1 and at least one check you can run from the command line, such as a lint or test script. The CLAUDE.md from lesson one helps too, since your rules file should already name the quality gate command that your hooks will enforce. The deep dive on what belongs in the gate is Course 5. #### The problem You know you should run the tests before pushing. You usually do. But it is Friday, the change is small, you are sure it is fine, and you skip it. That is the push that breaks production. Relying on human discipline for routine checks fails eventually, for everyone, because attention is finite and the boring check is the first thing to go when you are tired or rushed. The fix is not more discipline. It is removing the human from the loop entirely so the check runs every single time, automatically, with no decision involved. 
#### What a hook is

A hook is a command that runs automatically when a specific event fires. You decide the event and the command, and if the command exits with a failure, the action can be blocked. Claude Code exposes lifecycle hooks around the agent's work: before a tool runs (PreToolUse), after a tool runs (PostToolUse), when the agent finishes responding (Stop), and others. You can use PostToolUse to auto-format a file the moment the agent edits it, or PreToolUse to block a dangerous command. These hooks are deterministic - they are the harness running your script, not the model deciding to.

- PreToolUse: runs before a tool call. Use it to validate or block an action before it happens.
- PostToolUse: runs after a tool call. Use it to auto-format or lint a file right after an edit.
- Stop: runs when the agent finishes responding. Use it to run a final check or notify you.
- Hooks are deterministic harness behaviour, so they fire reliably - unlike asking the model nicely to remember.

#### Wiring a hook in settings.json

Claude Code hooks are configured in settings.json, in your project at .claude/settings.json or in your user config. Each hook matches an event and a tool pattern and runs a shell command. Here is a real PostToolUse hook that formats and lints any TypeScript file the moment the agent edits or writes it, so the codebase is never left in a messy state between turns.

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          {
            "type": "command",
            "command": "bun run prettier --write \"$CLAUDE_FILE_PATHS\" && bun run eslint --fix \"$CLAUDE_FILE_PATHS\""
          }
        ]
      }
    ]
  }
}
```

.claude/settings.json - auto-format and lint after every edit

The matcher targets the Edit and Write tools, and the command runs against the files the agent just touched. Now formatting is never a thing you or the agent has to remember, because the harness does it after every change.
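
One way to restrict the hook to TypeScript files, since the matcher selects tools rather than file types, is to point the hook's command at a small script. The Python sketch below is an illustration under assumptions, not the documented method: it assumes the hook payload arrives as JSON on stdin with `tool_name` and `tool_input.file_path` fields, which is worth confirming against the current hooks documentation. The `FORMATTABLE` tuple and both function names are hypothetical, and the demo at the bottom uses a hand-written payload instead of reading stdin.

```python
import json
from typing import List, Optional

# Hypothetical filter: only these extensions get formatted (matches the lesson's TypeScript stack).
FORMATTABLE = (".ts", ".tsx")


def path_to_format(payload_text: str) -> Optional[str]:
    """Return the edited file path if it is a TypeScript file, otherwise None.

    Assumes the payload shape {"tool_name": ..., "tool_input": {"file_path": ...}}.
    """
    payload = json.loads(payload_text)
    if payload.get("tool_name") not in ("Edit", "Write"):
        return None
    path = payload.get("tool_input", {}).get("file_path") or ""
    return path if path.endswith(FORMATTABLE) else None


def format_commands(path: str) -> List[List[str]]:
    """Build the two commands from the settings.json example as argument lists."""
    return [
        ["bun", "run", "prettier", "--write", path],
        ["bun", "run", "eslint", "--fix", path],
    ]


if __name__ == "__main__":
    # Hand-written sample payload; a real hook would read the payload from stdin.
    sample = json.dumps(
        {"tool_name": "Edit", "tool_input": {"file_path": "src/app/page.tsx"}}
    )
    target = path_to_format(sample)
    if target is not None:
        for command in format_commands(target):
            print(" ".join(command))
```

In a real hook you would pass `sys.stdin.read()` to `path_to_format` and run each returned command with `subprocess.run`, so a Markdown or JSON edit is skipped instead of being handed to ESLint.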
Check the official hooks docs for the exact environment variables and matcher syntax, since these are refined over time.

#### The pre-push quality gate

The single highest-value automation is a gate that runs your full check suite before code can leave your machine. The most robust place for this is a Git pre-push hook, because it protects the repo regardless of which agent or human is pushing. A Git hook is just an executable script in the .git/hooks folder, or better, managed by a tool like Husky so it is committed and shared. If any check fails, the script exits non-zero and the push is refused. Broken code cannot reach your repo, full stop.

```bash
#!/usr/bin/env bash
# .husky/pre-push - blocks the push if any check fails
set -e  # stop at the first failing command

echo "Running quality gate before push..."
bun run lint
bun run typecheck
bun run test
echo "All checks passed. Pushing."
```

A pre-push Git hook that runs the full quality gate

With set -e, the first failing check aborts the script and the push never happens. This one gate prevents the most common way teams ship regressions: someone skipping the tests on a "small" change. It costs you a minute on each push and saves you the afternoon a broken deploy would have cost.

#### Agent hooks versus Git hooks

These two systems complement each other, and knowing which to use matters. Claude Code hooks fire around the agent's actions and are great for keeping each edit clean - format on write, block a forbidden command, notify you when a long task finishes. Git hooks fire around version-control events and are the right place for the hard quality gate, because they guard the repo no matter who or what initiates the push. The pattern most serious builders use: agent hooks for tidy-as-you-go, Git hooks for the final gate that nothing gets past. Belt and braces.
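
The "block a forbidden command" case can be sketched as a PreToolUse script that also stops the agent from sidestepping the Git gate. This is a minimal Python sketch under assumptions: it assumes the payload arrives as JSON on stdin with `tool_name` and `tool_input.command`, and that exiting with code 2 tells the harness to block the call. Verify both against the official hooks documentation. The deny-list and its wording are hypothetical, and the demo uses hand-written payloads rather than reading stdin.

```python
import json
import re
from typing import Optional

# Hypothetical deny-list for illustration; adapt the patterns to your own project.
FORBIDDEN = [
    (re.compile(r"\bgit\s+push\b.*\s--force\b"),
     "force-pushing rewrites shared history"),
    (re.compile(r"\bgit\s+(?:push|commit)\b.*\s--no-verify\b"),
     "--no-verify would skip the Git quality gate"),
]


def block_reason(payload_text: str) -> Optional[str]:
    """Return why a Bash tool call should be blocked, or None to allow it."""
    payload = json.loads(payload_text)
    if payload.get("tool_name") != "Bash":
        return None  # only shell commands are inspected
    command = payload.get("tool_input", {}).get("command") or ""
    for pattern, reason in FORBIDDEN:
        if pattern.search(command):
            return reason
    return None


def exit_code(payload_text: str) -> int:
    """Map the decision to a process exit code (2 is assumed to mean 'block')."""
    return 2 if block_reason(payload_text) else 0


if __name__ == "__main__":
    # Hand-written sample payloads; a real hook would read one payload from stdin.
    for cmd in ("bun run test", "git push origin main --no-verify"):
        sample = json.dumps({"tool_name": "Bash", "tool_input": {"command": cmd}})
        print(cmd, "->", block_reason(sample) or "allowed")
```

The guard is the tidy-as-you-go half: it keeps the agent from skipping the gate, while the pre-push hook remains the check that applies to every push, whoever starts it.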
#### Typical mistakes The frequent ones: putting your only quality gate inside an agent hook, so a manual push from the terminal sails straight past it; writing a hook that is so slow every push becomes painful and you start bypassing it; forgetting set -e so a failing check is ignored and the push proceeds anyway; and not committing your Git hooks (use Husky) so they only protect your machine and not your teammates. A gate only works if it is fast, shared, and genuinely blocking. #### Business ROI Automation beats discipline every time, and the math is brutal in its favour. A pre-push gate that takes a minute prevents the broken deploy that costs an afternoon of firefighting plus the trust hit of a customer-facing bug. Encoding the check as a hook means it runs for everyone, every time, with zero ongoing attention. For a founder, this converts "I hope everyone remembers to test" into "broken code cannot ship" - a guarantee rather than a wish. Good automation makes the right thing the default thing, and the default thing is what actually happens at 6pm on a Friday. #### Checklist You are ready to move on when each of these is true. The gate you build here is the backbone of the quality work in Course 5. - You can name three Claude Code lifecycle events and what each is good for. - You have a PostToolUse hook that formats or lints after edits. - You have a pre-push Git hook that runs lint, typecheck and test and blocks on failure. - Your Git hooks are committed and shared, not just on your machine. #### Resources Bookmark the official Claude Code hooks documentation for the current event names, matchers and environment variables. Husky is the standard tool for committed, shared Git hooks - its docs walk you through setup in a couple of commands. The testing lesson in Course 5 covers exactly what to put inside the gate so it catches real bugs without being slow. 
#### Your task Add a pre-push Git hook to a real project that runs your lint, typecheck and test commands and blocks the push if any fail. Deliberately break a test, try to push, and watch it get refused. Then add a Claude Code PostToolUse hook that formats files after edits. You now have automation enforcing your standards instead of your memory. #### Next lesson Your agent now follows rules, runs packaged workflows, and enforces gates automatically. The next lesson connects it to the world beyond your files: MCP, the Model Context Protocol, which gives your agent access to databases, design tools, services and data sources through one standard plug. ### 2.4 MCP Explained: Connecting Your Agent to Everything - Canonical URL: https://agenticschool.dev/courses/claude-code-mastery/mcp-explained-connecting-your-agent-to-everything - Duration: 23 min Summary: MCP, the Model Context Protocol, is the standard way to give an agent new capabilities: a connection to a database, a design tool, a browser, a search service. This lesson explains MCP in plain terms, shows a real .mcp.json that connects a server, and gives you the judgement to know when an MCP server is the right move and when a simple CLI command in your rules file is better. #### Summary MCP is a universal connector for agents. Instead of every tool inventing its own bespoke integration, an MCP server exposes its capabilities in one standard way that any compatible agent can use. Connect one and your agent can suddenly query a database, read a design file, drive a browser or search your docs - using those actions like any other tool. The protocol is why agent capabilities have exploded: build a server once and every MCP-aware agent can use it. 
#### What you will learn

You will learn what MCP is and why a shared protocol matters, the difference between a server and a client, how to add a server to Claude Code with a real .mcp.json config, and the most important and most overlooked judgement in this whole area: when an MCP server is genuinely the right tool versus when a plain command-line tool described in your CLAUDE.md does the job with less overhead.

#### Prerequisites

A working agent setup and the context lessons from Course 1, because every connected server consumes context - its tool definitions sit in the window - and the performance cliff still applies. The CLAUDE.md and hooks lessons help too, since the CLI-versus-MCP decision often comes down to a rule or a script you already know how to write.

#### The problem

Your agent is brilliant inside your files and blind everywhere else. It cannot see your production database, your design files, the live state of a web page, or the contents of an external service. So you become a human clipboard: you copy data out of a tool, paste it into the agent, copy the agent's output back. That manual relay is slow, error-prone, and caps how much real work you can delegate. Before MCP, the only way to fix it was a custom integration per tool, which almost nobody built.

#### The problem MCP solves

MCP defines one standard plug. A server speaks MCP to expose a capability - a database, a file store, a browser, an API. A client (your agent harness) speaks MCP to use any server. Because the protocol is shared, the integration is written once per tool and works across every agent that supports MCP, the same way USB meant you stopped needing a different cable for every device. That single decision is why support spread so fast and why there is now an MCP server for almost everything you would want to connect.

- Server: exposes a capability over MCP (a database, a design tool, a browser, a search index).
- Client: your agent harness, which can use any MCP server you connect.
- Standard plug: one integration per tool works across every MCP-aware agent, instead of N bespoke integrations.
- A server can offer tools (actions the agent takes) and resources (data the agent reads).

#### Adding a server

In Claude Code you register MCP servers in a .mcp.json file at your project root (committed, so the team shares the same connections) or in your user config for personal ones. Each entry names a server and how to launch it. Here is a real .mcp.json connecting two common servers: a filesystem server scoped to a folder, and a Playwright server that lets the agent drive a real browser. Connect it once and those actions become available to the agent like any other tool.

```json
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "./data"]
    },
    "playwright": {
      "command": "npx",
      "args": ["-y", "@playwright/mcp@latest"]
    }
  }
}
```

.mcp.json - connecting a filesystem server and a browser server

Servers that need credentials (a database, a hosted API) take them through environment variables rather than hard-coded keys, so the same .env discipline from Course 1 applies: secrets stay out of committed config. After adding a server, your agent can list and call its tools directly. Check each server's own docs for its exact launch command and required environment variables.

#### MCP versus a plain CLI tool

This is the judgement that separates people who understand agents from people who collect integrations. An MCP server is not automatically better than a command-line tool. Claude Code can already run any CLI you have installed, and a one-line instruction in your CLAUDE.md - "use the gh CLI for GitHub, the stripe CLI for Stripe" - is often simpler, lighter and more transparent than a dedicated MCP server.
CLI tools shine when a mature command-line interface already exists, when you want to see exactly what command ran, and when you want zero extra context overhead. MCP earns its place when there is no good CLI, when you need structured data and resources rather than text output, or when the interaction is genuinely interactive, like driving a browser or querying a live database through a typed interface. Reach for the CLI first; reach for MCP when the CLI genuinely cannot do the job well. - Prefer a CLI when: a good command-line tool already exists, you want full transparency, and you want minimal context cost. - Prefer MCP when: there is no decent CLI, you need structured tools and resources, or the task is interactive (browser, live DB). - Every connected server adds tool definitions to your context window, so each one has a real, ongoing cost. - Connect deliberately. Three servers you use beat fifteen that bloat your context and surface area. #### Typical mistakes The common errors: connecting every MCP server you can find and drowning your context window in tool definitions you never use; reaching for an MCP server when a CLI tool you already have would do the job more simply; hard-coding credentials into .mcp.json instead of using environment variables; and forgetting that an MCP server is third-party code with access to your stuff, so you should vet what you connect just as you would vet a dependency. Connect few, connect deliberately, connect things you trust. #### Business ROI MCP is what turns a code agent into a general-purpose operator for your business. Connect the right servers and the agent stops needing you as a clipboard: it can read your database, check a live page, pull from a service, and act on real data end to end. That is the difference between an agent that drafts code and an agent that runs a workflow. 
The discipline of connecting deliberately - and choosing a CLI when a CLI is enough - keeps that power cheap and your context sharp, so you get the reach without paying the cliff tax. #### Checklist You are ready to move on when each of these is true. The next lesson scales from one well-equipped agent to several. - You can explain MCP as a standard plug, and name the server and client roles. - You have added at least one MCP server to a real project via .mcp.json. - You can state a clear case where a CLI tool beats an MCP server, and vice versa. - You understand that each connected server costs context and surface area. #### Resources Bookmark the official Model Context Protocol site and the Claude Code MCP documentation for the current configuration format and the registry of available servers. The filesystem, Playwright and database servers are good first connections. Pair MCP with a rule in your CLAUDE.md that tells the agent which CLI to prefer for which job, so it only reaches for a server when the CLI genuinely cannot help. #### Your task Connect one MCP server to a real project through .mcp.json - the filesystem or Playwright server is an easy start. Then deliberately do the opposite: pick a task you might have reached for a server for, and instead add a one-line rule to your CLAUDE.md telling the agent to use an existing CLI tool. Notice which felt lighter. That instinct is the real skill here. #### Next lesson A single well-equipped agent is powerful, but it can only hold so much in context. The next lesson scales to several agents: sub-agents, agent teams, the context-cliff motivation behind them, and how to keep a team from quietly multiplying your costs. ### 2.5 Sub-Agents, Agent Teams and Workflows - Canonical URL: https://agenticschool.dev/courses/claude-code-mastery/sub-agents-agent-teams-and-workflows - Duration: 25 min Summary: One agent can only hold so much in context before quality drops off the cliff. 
Sub-agents let you delegate a self-contained job to a fresh agent with clean context, which keeps the main agent sharp. This lesson covers the context-cliff motivation, the labyrinth analogy for delegation, when to run sub-agents sequentially versus in parallel, and how to keep a team of agents from quietly multiplying your bill. #### Summary A sub-agent is a fresh agent you spawn for a contained task. It does the work in its own clean context window and returns only the result, so your main agent stays focused and never hits the performance cliff from carrying everything at once. This lesson explains why that matters, gives you the labyrinth analogy that makes good delegation obvious, and covers the practical choices - sequential versus parallel, which model to use, how to scope the job - that decide whether a team helps or just burns money. #### What you will learn You will learn why sub-agents exist and the context-cliff problem they solve, the labyrinth analogy for scoping delegated work, how to write a clean brief for a sub-agent so it returns a tight conclusion rather than a mess, when to run agents sequentially versus in parallel, and how to watch costs so a team of agents does not quietly multiply your bill several times over. #### Prerequisites The tokens, context and performance-cliff lesson from Course 1, and a comfortable single-agent workflow. Multi-agent only pays off once you have genuinely mastered one agent, because everything here is about managing the limits you first met in Course 1 - now across several agents instead of one. #### The problem You give your agent a big task. Halfway through, its context window is stuffed with file contents, command output, dead ends it explored and conversation history. The performance cliff hits: it starts forgetting the original goal, contradicting earlier decisions, and getting fuzzy on details that were crystal clear an hour ago. 
The work it does in the second half is worse than the first, not because the model degraded, but because its desk is buried. You cannot fix this by being a better prompter. The window is simply full of noise that the task generated along the way. #### Why sub-agents exist A sub-agent solves the buried-desk problem by doing the messy exploration somewhere else. You spawn a fresh agent with an empty context, hand it one self-contained job, and it works through all the file-reading, command-running and dead-ends in its own window. When it is done, it returns only the conclusion to your main agent. All the noise - the twelve files it read, the failed approaches, the raw command output - stays in the sub-agent's context and never touches yours. Your main agent's desk stays clean, so it stays sharp for the whole task. That is the entire point: sub-agents are a context-management tool first, a parallelism tool second. #### The labyrinth analogy Picture your main agent walking through a labyrinth toward a goal, holding the thread that marks the path back. Every side passage it explores in person risks losing its place and forgetting the route. Now imagine it can send a scout down a side passage instead. The scout explores the whole branch - reads everything, hits the dead ends, finds the answer - and comes back to report just one sentence: "that passage leads to the treasury, take the second left." The main agent never left the main path, never lost the thread, and gained exactly the knowledge it needed. That is a sub-agent. You delegate a branch of the work, the sub-agent absorbs all the mess of exploring it, and your main context receives only the clean conclusion. - Good sub-agent jobs are self-contained: "find which file defines the auth middleware and summarise how it works". - The sub-agent does all the noisy work: reading many files, running searches, trying approaches. - It returns a conclusion, not a transcript: a tight summary the main agent can act on. 
- Your main context only grows by that conclusion, not by everything the sub-agent touched. #### Sequential versus parallel Once you have sub-agents, you can run them two ways, and choosing right matters for both correctness and cost. Run them sequentially when each step depends on the last - explore, then based on what you found, build, then based on the build, test. The order is the logic. Run them in parallel when the jobs are genuinely independent: summarising five separate files, or checking three unrelated subsystems. Parallel is faster but only safe when there is no shared state the agents would trip over. The honest default is sequential: it is easier to reason about, easier to debug, and most real work has dependencies between steps. Reach for parallel only when the independence is obvious. - Sequential: each step needs the previous step's result. Easier to reason about and debug. The safe default. - Parallel: jobs are fully independent with no shared state. Faster, but only correct when truly independent. - When unsure, go sequential - a wrong parallel split causes subtle, hard-to-trace bugs. - Note the CLAUDE.md axioms in this very project insist on sequential sub-agent work for exactly this reason. #### Cost control Every agent costs tokens, and a team multiplies that fast. Three sub-agents plus a main agent is four context windows being filled and billed, and if you spawn them carelessly the bill climbs quietly. Three habits keep it sane. First, only spawn a sub-agent when the focus it buys is worth the spend - a single agent handles plenty of work fine, and not everything needs delegating. Second, match the model to the job: a narrow extraction or summarisation sub-task can run on a small, cheap model while the main agent uses a strong one for the reasoning. Third, scope sub-agent jobs tightly so they finish fast and do not wander, because a vague brief turns a cheap scout into an expensive explorer. 
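
The multiplication is easy to make concrete. The Python sketch below compares one main agent plus three sub-agents when every agent runs on a strong model against the same team with the sub-agents moved to a cheaper one. Every number in it is invented for illustration: the model names, the per-million-token prices and the token counts are assumptions, not real rates, so substitute the figures from your own provider's price list and usage logs.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Model:
    name: str
    usd_per_million_input: float   # hypothetical price, not a real rate
    usd_per_million_output: float  # hypothetical price, not a real rate


@dataclass(frozen=True)
class AgentRun:
    role: str
    model: Model
    input_tokens: int
    output_tokens: int

    def cost_usd(self) -> float:
        """Cost of one context window: tokens times the per-million price."""
        return (
            self.input_tokens * self.model.usd_per_million_input
            + self.output_tokens * self.model.usd_per_million_output
        ) / 1_000_000


STRONG = Model("strong-model", 3.00, 15.00)
CHEAP = Model("cheap-model", 0.50, 2.50)


def build_team(sub_agent_model: Model) -> List[AgentRun]:
    """One main agent on the strong model plus three narrowly scoped scouts."""
    main = AgentRun("main", STRONG, input_tokens=200_000, output_tokens=20_000)
    scouts = [
        AgentRun(f"scout-{i}", sub_agent_model, input_tokens=150_000, output_tokens=5_000)
        for i in range(1, 4)
    ]
    return [main, *scouts]


def team_cost_usd(runs: List[AgentRun]) -> float:
    """Total bill across every context window in the team."""
    return sum(run.cost_usd() for run in runs)


if __name__ == "__main__":
    print(f"all agents on the strong model: ${team_cost_usd(build_team(STRONG)):.4f}")
    print(f"scouts on the cheap model:      ${team_cost_usd(build_team(CHEAP)):.4f}")
```

With these invented figures the four-window team comes to roughly 2.48 USD when every agent uses the strong model and roughly 1.16 USD when the scouts use the cheap one, which is the second habit expressed as arithmetic. Tightening the scouts' scope shrinks `input_tokens`, which is the third.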
The goal is to buy focus where focus is worth it, not to run a standing army of agents. #### Typical mistakes The expensive ones: spawning sub-agents for work a single agent would handle fine, so you pay for coordination you did not need; running jobs in parallel that secretly share state and corrupt each other; giving a sub-agent a vague brief so it returns a sprawling transcript that re-pollutes your main context instead of a clean conclusion; and using a flagship model for every sub-task when a cheap one would do. Delegate deliberately, scope tightly, and pick the model per job. #### Business ROI Sub-agents are how you scale the amount of real work an agentic workflow can do without quality collapsing on the cliff. A team that delegates well can take on tasks far larger than a single context window could ever hold, while keeping the final output sharp, because the main agent never drowns in detail. Done with discipline - delegate only when worth it, cheap models for narrow jobs, tight scopes - you get the throughput of a team for a fraction of what careless multi-agent spending would cost. For a founder, that is the difference between an agent that helps with tasks and an agentic system that runs whole projects. #### Checklist You are ready to move on when each of these is true. The next lesson goes deep on managing context across one or many agents. - You can explain why a sub-agent protects the main context window. - You can use the labyrinth analogy to decide what makes a good delegated job. - You know when to run sub-agents sequentially versus in parallel, and why sequential is the default. - You have three concrete habits for keeping multi-agent cost under control. #### Resources Bookmark the Claude Code documentation on sub-agents for the current way to define and invoke them, including custom sub-agent configurations. The model-selection lesson from Course 1 is your reference for matching a cheap model to a narrow sub-task. 
Your CLAUDE.md is the right place to state team rules - sequential by default, when to delegate - so the agent applies them automatically. #### Your task Take one task that normally bloats your context - something that needs exploring several files before acting. Run it twice: once with a single agent doing everything, and once where you have the agent spawn a sub-agent to explore and report back a conclusion before it acts. Notice how much cleaner the main session stays the second time. That is the cliff being managed instead of hit. #### Next lesson Managing context across one or many agents is its own craft. The next lesson goes deep on the levers: why auto-compaction quietly hurts, how to re-inject your goals after it, writing handover documents, when to reset a chat entirely, and dialling thinking effort to match the task. ### 2.6 Context Engineering: Compaction, Handovers, Resets and Thinking Effort - Canonical URL: https://agenticschool.dev/courses/claude-code-mastery/context-engineering-compaction-handovers-resets-and-thinking-effort - Duration: 25 min Summary: Long agentic work lives or dies on context management. This lesson covers the practical levers: why auto-compaction quietly hurts and how to re-inject your goal after it, writing a handover document so a fresh session picks up cleanly, knowing when a reset beats pushing on, and dialling thinking effort up or down to match the difficulty of the task. These are the techniques that keep long tasks sharp instead of letting them rot. #### Summary Context engineering is the active management of what the agent is holding right now. Four levers do most of the work. Compaction summarises a long conversation to free up the window, but the automatic version quietly drops things you cared about, so you steer it. A handover passes a clean summary to a fresh session. A reset throws away a confused context and starts over with a tight prompt. And thinking effort trades cost for depth on genuinely hard problems. 
Master these four and long tasks stay sharp instead of degrading into confusion. #### What you will learn You will learn why auto-compaction is a trap and how to compact on your own terms, how to re-inject your goal so the agent does not drift after a summary, how to write a handover document that lets a fresh session continue without losing the thread, how to recognise when a reset beats one more round of patching, and how to set thinking effort so you pay for deep reasoning only when the task earns it. #### Prerequisites The tokens, context and performance-cliff lesson from Course 1, and real experience running multi-step tasks long enough that you have felt a session get fuzzy. These techniques only matter once sessions get long, so the sub-agent lesson before this is a natural lead-in - sub-agents manage context across agents, this lesson manages it within one. #### The problem You are deep into a long task. The context window fills, and at some point the harness automatically compacts - it summarises the conversation so far to make room. Suddenly the agent feels different. It forgot a decision you made early on, or it is now optimising for the wrong goal, or it confidently contradicts something it agreed to an hour ago. Auto-compaction did its job mechanically: it freed space. But it summarised by its own priorities, not yours, and quietly dropped the constraint that mattered most to you. You did not choose what to keep, so you lost the thread without noticing. #### Why auto-compaction hurts, and steering it Auto-compaction is not malicious, it is just lossy and untimed. It fires when the window is nearly full, which is usually the worst moment, mid-thought, and it compresses everything down by a generic heuristic that does not know your real goal. The result is a subtle quality drop you often only notice after the agent has drifted. The fix is to take control of the moment and the content. 
Compact deliberately, at a clean breakpoint between sub-tasks rather than mid-step. And right after any compaction, re-inject your goal: restate the objective, the key constraints and the current state in a few lines. That one paragraph re-anchors the agent and undoes most of what compaction blurred.

- Compact at a clean breakpoint you choose, not when the window forces it mid-task.
- After compaction, re-inject the goal: restate the objective, the non-negotiable constraints, and where you are now.
- Watch for drift after a summary - if the agent's answers feel off, a lost constraint is the usual cause.
- A CLAUDE.md helps here too: rules in the file survive because they are reloaded, unlike points buried in chat.

#### Handover documents

Sometimes the cleanest move is not to compact but to hand over to a fresh session with a clean window. The tool for this is a handover document: a short markdown file you have the agent write that captures everything the next session needs. Done well, a new agent reads the handover and is instantly as informed as the old one was, but with an empty, sharp context. This is the single most reliable way to run work that is far too long for one window without the quality decay, and it doubles as a record you can hand to a teammate.

```markdown
# Handover: Checkout flow refactor

## Goal
Move checkout from the old form to Stripe embedded checkout.
Must keep the existing success page and not change the cart.

## Done so far
- Added the Stripe SDK and the new checkout component.
- Wired the create-session endpoint (see src/api/checkout.ts).

## Current state / next step
- The embedded form renders but the redirect after payment 404s.
- Next: fix the return_url in checkout.ts to point at /order/success.

## Constraints (do not forget)
- TypeScript only, run the quality gate before pushing.
- Never log the Stripe secret key.
```

A handover document that lets a fresh session continue cleanly

Ask the agent to write this before you end a long session, then paste it into a fresh chat to continue. The new session starts with a clean window and a precise brief, which is almost always sharper than pushing a bloated session one more round.

#### When to reset

A reset is starting over with a fresh, focused prompt and abandoning the current context. It sounds like throwing away work, but it is often the fastest path forward. When an agent is looping on the same failed fix, contradicting itself, or carrying a context so bloated that every answer is mediocre, more patching rarely helps - you are arguing with a confused window. Reset instead. Take what you have learned, write a tight prompt or a handover, and start clean. The signal is simple: if you have explained the same thing twice and it still is not landing, the context is the problem, not the instruction. A reset clears it.

#### Thinking effort

Thinking effort is how much deliberate reasoning the model does before answering. More effort means it works through the problem more carefully, which costs more tokens and time but lifts quality on genuinely hard tasks - tricky debugging, architectural decisions, subtle logic. Less effort is right for routine work where deep reasoning is wasted money. The practical rule mirrors model selection from Course 1: dial effort up for the hard 20 percent that actually needs it, and keep it low for the easy 80 percent. You can set a default and raise it per task. Paying for maximum reasoning on a rename or a formatting pass is the same mistake as hiring a surgeon to apply a plaster.

- High effort: hard debugging, architecture, subtle logic - the cases where careful reasoning changes the answer.
- Low effort: routine edits, formatting, simple refactors - depth is wasted money here.
- Match effort to difficulty the same way you match model tier to difficulty.
- Effort and a strong model compound: save both for the genuinely hard problems. #### Typical mistakes The recurring ones: letting auto-compaction fire and never re-injecting your goal, so the agent silently drifts; pushing a confused, bloated session for round after round when a reset would have been faster; never writing a handover, so every long task degrades instead of continuing clean; and burning maximum thinking effort on trivial work while wondering why the bill is high. The meta-mistake is treating context as something that just happens to you rather than something you actively engineer. #### Business ROI Context engineering is what makes large, multi-hour agentic work actually reliable instead of a slow slide into confusion. Compacting on your terms, handing over cleanly and resetting at the right moment keep output quality high across tasks far bigger than one window - which means less rework and fewer subtle bugs introduced by a drifting agent. Dialling thinking effort to match difficulty controls cost directly. For a founder running real workflows, these habits are the difference between agentic work that scales and agentic work that quietly degrades the longer it runs. #### Checklist You are ready to move on when each of these is true. The final lesson assembles the pro setup that makes all of this smooth day to day. - You can explain why auto-compaction hurts and what re-injecting the goal does. - You can have the agent write a handover document and continue in a fresh session. - You know the signals that mean reset, not patch. - You match thinking effort to task difficulty instead of always maxing it. #### Resources Bookmark the Claude Code documentation on context management and the compact and clear commands for the current behaviour and flags. Keep a handover template in your resource library so writing one is a two-minute habit. 
Your CLAUDE.md is your insurance against compaction loss, since rules in the file are reloaded while points buried in chat are not. #### Your task On your next long task, do two things deliberately. When the window gets full, compact at a clean breakpoint yourself and then re-inject your goal in a short paragraph. Later, have the agent write a handover document, open a fresh session, paste it in, and continue. Notice how much sharper the fresh session feels. You have just engineered your context instead of letting it engineer you. #### Next lesson You can manage rules, skills, hooks, MCP, multi-agent work and context. The final lesson of this course assembles the pro setup that ties it together: a custom status line, the handful of CLI flags you will use daily, and running sessions from your phone or the cloud so building is not tied to one desk. ### 2.7 Pro Setup: Status Line, Key Flags, Phone and Cloud Sessions - Canonical URL: https://agenticschool.dev/courses/claude-code-mastery/pro-setup-status-line-key-flags-phone-and-cloud-sessions - Duration: 23 min Summary: This lesson assembles the power-user setup. A custom status line that surfaces your model, context size, branch and cost. The handful of flags that matter every day - resuming sessions with --resume, continuing with -c, and the dangerous-but-useful permissions skip. And how to move sessions between your local machine and the cloud, including controlling a session from your phone through a browser. This is the setup serious builders run. #### Summary A pro setup removes friction so the tool gets out of your way. The right flags let you resume and continue work instantly and, when you genuinely need it, run without the permission prompt. A good status line keeps the numbers that matter - model, context fill, branch, cost - in constant view, so you see the performance cliff or a runaway bill coming. 
And cloud and phone sessions mean a long task keeps running while you step away, and you can start or check work from anywhere. This lesson wires all three up. #### What you will learn You will learn the everyday flags worth memorising and exactly what each does, how to configure a custom status line that surfaces the information you actually watch, and the options for running sessions remotely - moving work between local and cloud, and driving a session from your phone through a browser. By the end your setup will feel like a cockpit instead of a black box. #### Prerequisites A confident single-agent workflow and the installation lesson from Course 1, since this builds on a working Claude Code setup. The context lesson just before this one pairs especially well, because a status line is largely a tool for watching the context fill that you learned to manage there. #### The problem Out of the box you are flying blind and retyping yourself. You cannot see how full your context is, so the cliff surprises you. You cannot see the running cost, so the bill surprises you. You close your terminal and lose the thread of a session you wanted to continue. You confirm the same safe command for the hundredth time. And the moment you leave your desk, all work stops. None of this is hard to fix, and fixing it is the difference between fighting the tool and flowing with it. #### The key flags A few flags carry most of the daily value. Learn these four and the friction drops away. --resume lets you pick a past session from a list and continue exactly where it left off. -c (continue) jumps straight back into your most recent session in this directory, which is the one you reach for constantly. -p runs a prompt non-interactively and prints the result, perfect for scripting the agent into other tools. And --dangerously-skip-permissions bypasses the per-action confirmation prompts. 
That last one is named to scare you, correctly: it is genuinely useful for a trusted, sandboxed, repeatable workflow, and genuinely dangerous anywhere the agent could run something destructive on data you care about.

```bash
# Continue your most recent session in this folder
claude -c

# Pick a past session from a list and resume it
claude --resume

# Run one prompt non-interactively (great for scripts)
claude -p "Summarise what changed since the last commit"

# Skip permission prompts - only in a trusted, sandboxed setup
claude --dangerously-skip-permissions
```

The four flags you will reach for most

Treat --dangerously-skip-permissions as a power tool with a guard removed. Use it for a contained, repeatable task you trust completely, ideally in a sandbox or a throwaway environment. Never use it on a session that can touch production, secrets or data you cannot afford to lose. The permission prompt is friction by design, and most of the time you want it.

#### A custom status line

A status line is a strip of live information Claude Code shows you continuously. You configure it with a small script that the harness calls and feeds session data as JSON, and you print whatever you want back. The point is to keep the numbers you actually watch in view at all times: which model you are on, how full the context window is (your early-warning for the cliff), your current Git branch, and the running cost. Once you can see context fill and cost without asking, you start managing both instinctively - you compact before the cliff and you notice a runaway bill in seconds instead of at the end of the month.
```json
{
  "statusLine": {
    "type": "command",
    "command": "~/.claude/statusline.sh"
  }
}
```

Registering a custom status line in .claude/settings.json

```bash
#!/usr/bin/env bash
# ~/.claude/statusline.sh - reads session JSON on stdin, prints a status line
input=$(cat)

# Pull the fields we need out of the session JSON (requires jq)
model=$(echo "$input" | jq -r '.model.display_name')
cwd=$(echo "$input" | jq -r '.workspace.current_dir')
dir=$(basename "$cwd")

# Empty when the folder is not a Git repository
branch=$(git -C "$cwd" branch --show-current 2>/dev/null)

echo "[$model] $dir @ ${branch:-no-git}"
```

A minimal status line script showing model, folder and branch

This is a starting point - extend the script to show context size and cost from the session JSON once you are comfortable. Check the official status line docs for the exact JSON fields available, since they grow over time. The habit it builds, glancing at your context and cost the way a driver glances at the dashboard, is worth more than any single field.

#### Phone and cloud sessions

The last upgrade unties building from your desk. Cloud sessions run the agent on a remote machine rather than your laptop, which means a long task keeps going after you close the lid, and you can pick it up later from anywhere. Because a cloud session lives on a server with its own web interface, you can open it in a browser - including the browser on your phone - and read what the agent is doing, answer a question it has, or kick off a new task while you are away from your computer. The pattern that emerges is fluid: start a heavy task on your machine or in the cloud, step out, check on it from your phone, and the work never stalls because you walked away. Building becomes an always-on activity instead of something chained to one screen.

- Cloud session: the agent runs on a remote machine, so long tasks continue after you close your laptop.
- Browser access: open the session in any browser, including on your phone, to monitor or steer it.
- From your phone you can read progress, answer the agent's questions, and start new tasks without your computer. - The flow: kick off work, step away, check and nudge from your phone, come back to a finished job. #### Typical mistakes The ones that bite: using --dangerously-skip-permissions on a session that can reach production or secrets, which is exactly how an agent does real damage; never setting up a status line and then being blindsided by the cliff or a surprise bill; forgetting -c exists and laboriously re-explaining context you could have resumed in one keystroke; and treating cloud and phone access as gimmicks rather than the thing that keeps long tasks moving while you live your life. Set the dashboard up, respect the dangerous flag, and use the remote options for real. #### Business ROI A pro setup compounds every other skill in this course by removing the friction that quietly taxes all of them. A status line that surfaces context and cost prevents the two most common ways agentic work goes wrong - the quality cliff and the runaway bill - before they cost you anything. The resume and continue flags reclaim minutes every single session, which add up fast. And cloud and phone access mean expensive long-running work happens in the background of your day instead of blocking your screen, so you get more done in the same hours. For a founder, this is the setup that turns the agent from a tool you sit in front of into a workforce you supervise from anywhere. #### Checklist You have completed Course 2 when each of these is true. Take a moment - you are now genuinely running a power-user setup. - You know what --resume, -c, -p and --dangerously-skip-permissions each do, and when the dangerous one is and is not appropriate. - You have a custom status line that at minimum shows your model and branch, with context and cost as the next step. - You can run a session in the cloud and check on it from a browser. 
- You understand how to start, monitor and steer work from your phone. #### Resources Bookmark the official Claude Code docs on CLI flags, the status line and cloud sessions for the current commands and JSON fields, which evolve. The jq tool used in the status line script is worth installing if you do not have it. Your settings.json is now the home for your status line, your hooks from lesson three and your permissions - keep it tidy and committed where it should be shared. #### Your task Do three things. Set up a basic custom status line that shows your model and branch. Use claude -c to resume your last session and notice the friction it removes. Then start a session in the cloud, open it in your phone's browser, and send it one instruction from there. You now run the setup serious builders use - and you have finished Course 2. #### Next lesson You are now a Claude Code power user: rules, skills, hooks, MCP, multi-agent work, context engineering and a pro setup. Course 3 moves to the modern app stack - architecture, authentication with Clerk, a reactive database with Convex, and payments with Stripe - to turn your projects into real products people pay for. ### 3.1 Architecture 101: Frameworks, Monorepos and How Modern Apps Fit Together - Canonical URL: https://agenticschool.dev/courses/modern-app-stack/architecture-101-frameworks-monorepos-and-how-modern-apps-fit-together - Duration: 24 min Summary: Before you bolt on auth, data and payments, you need a mental map of how a modern app fits together. This lesson explains frameworks like Next.js, Astro, TanStack Start and Vue, the split between frontend, backend and database, why teams like Clerk and Convex separate their fast marketing site from the app itself, and when a monorepo helps instead of getting in the way. #### Summary You finished Course 1 by shipping a real site, and Course 2 by turning your agent into a power tool. 
Now you graduate from a single static site to a real application with users, data and payments. Before any of that, you need the map. This lesson is that map: what a framework actually does for you, the three layers every app is built from, why serious teams split a fast marketing site from the heavier app, and when a monorepo is a gift versus a tax. Get this clear and every later decision in this course stops feeling arbitrary. #### What you will learn You will learn the difference between frontend, backend and database in plain language, how the main 2026 frameworks compare and when each fits, why companies like Clerk and Convex run a separate marketing site from their app, and the honest rule for when a monorepo pays for itself. By the end you can sketch the architecture of any SaaS on a napkin and explain why it is shaped that way. #### Prerequisites Course 1, especially the project setup and deploy lessons, since architecture decisions build on a working build-and-ship loop. You should be comfortable running terminal commands and have deployed at least one site. If the words "framework" or "TypeScript" are still fuzzy, skim the Fundamentals pages on what a framework is and what TypeScript is first, then come back. #### The problem Most beginners glue an app together by copying whatever a tutorial did, with no model of how the pieces relate. Then they hit a wall: the marketing page is slow because it is bundled with the whole app, auth and data are tangled into the UI, and nobody can say where a given responsibility lives. The result is an app that is hard to change and impossible to reason about. None of that is a coding skill problem. It is a missing mental model. This lesson installs the model so your agent builds on solid ground instead of a pile of copied snippets. #### The three layers: frontend, backend, database Every web app, no matter how fancy, is three layers handing work to each other. 
The frontend is what runs in the browser: the HTML, CSS and JavaScript that draw the screen and react to clicks. The backend is the code that runs on a server, away from the user, where you put logic and secrets you do not want the browser to see. The database is where data lives permanently so it survives a page refresh and is shared between users. A request flows one way and back: the browser asks the backend for something, the backend reads or writes the database, and sends an answer the browser renders. Most confusion about "where does this code go" dissolves the moment you can name which of these three layers a piece of work belongs to. - Frontend: runs in the browser, draws the UI, never holds secrets, anyone can read it. - Backend: runs on a server, holds business logic and API keys, talks to the database. - Database: stores data permanently and shares it across users and sessions. - Golden rule: a secret (API key, password) belongs in the backend, never in frontend code, because the browser ships its code to every visitor. #### What a framework gives you A framework gives you structure, routing and conventions so you are not assembling an app from raw files. It decides how URLs map to pages, how pages fetch data, how the app is bundled and served, and a hundred small things you would otherwise reinvent. The 2026 options each lean a different way, and the right pick depends on how much of your product is content versus app. - Astro: built for fast, content-heavy sites - blogs, docs, marketing pages. Ships almost no JavaScript by default, so pages load fast and rank well. Reach for it when most of your product is content. - Next.js: the heavyweight React framework for rich, interactive apps. Huge ecosystem, server rendering, and the default many teams choose for a full product app. - TanStack Start: a newer full-stack React framework built on TanStack Router, with excellent type safety and a lighter, more transparent feel. 
This very platform runs on TanStack Start. - Vue (with Nuxt): the main alternative to React. Same job, different syntax and philosophy. Pick it if you or your team already think in Vue. You do not need to master all four. Pick one app framework and one content framework and stick with them. A common, sane combination is Astro for the marketing site and a React framework (Next.js or TanStack Start) for the app. The exact names matter less than understanding the axis: content-fast versus app-rich. #### The marketing-site versus app split Here is the pattern that confuses beginners until someone points it out: serious products run their public marketing site and their logged-in app as two separate things, often on two subdomains. Look at Clerk and Convex themselves - the homepage at the bare domain is a fast marketing site, and the dashboard lives at a separate address. They are not the same codebase serving both. The reason is that the two surfaces have opposite needs. Public marketing pages must be lightning fast and fully indexable by Google and AI crawlers, because that is how you get found. The logged-in app can be heavier and richly interactive, because by then the user has already arrived and signed in, and SEO no longer matters. Bundling them together forces a bad compromise: either your marketing pages drag in the whole app and load slowly, or your app is hamstrung by marketing-page constraints. Splitting them lets each be optimised for its real job. - Marketing site: bare domain (yoursite.com), fast, SEO and GEO optimised, often Astro. Its job is to get found and convert visitors. - App: a subdomain (app.yoursite.com), heavier, interactive, behind auth. Its job is to deliver the product. SEO does not matter here. - Auth and dashboards (Clerk) and your database (Convex) live on the app side, never bundled into the marketing pages. - This split is why your homepage can score 100 on Lighthouse while your app is a rich, stateful React experience. 
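One way to picture the split is as a small policy table keyed by hostname. The TypeScript sketch below is hypothetical: the hostnames are the placeholders from the list above (plus an assumed www alias), the function names are invented for illustration, and sending noindex for the app is an assumption about how you might keep it out of search results, not something Clerk or Convex prescribes.

```typescript
// Hypothetical sketch: two surfaces of one product with opposite needs.
type Surface = "marketing" | "app";

interface SurfacePolicy {
  surface: Surface;
  indexable: boolean;    // should search engines and AI crawlers index it?
  requiresAuth: boolean; // does the visitor have to be signed in?
}

// Placeholder hostnames from the lesson; the www alias is an assumption.
const APP_HOST = "app.yoursite.com";
const MARKETING_HOSTS = ["yoursite.com", "www.yoursite.com"];

function policyFor(hostname: string): SurfacePolicy {
  const host = hostname.toLowerCase();
  if (host === APP_HOST) {
    // The app: behind auth, SEO does not matter here.
    return { surface: "app", indexable: false, requiresAuth: true };
  }
  if (MARKETING_HOSTS.indexOf(host) !== -1) {
    // The marketing site: public, and it must be found.
    return { surface: "marketing", indexable: true, requiresAuth: false };
  }
  throw new Error(`unknown host: ${hostname}`);
}

// A value a server could send in an X-Robots-Tag response header.
function robotsHeader(policy: SurfacePolicy): string {
  return policy.indexable ? "index, follow" : "noindex, nofollow";
}
```

Writing the policy down like this, even on paper, makes the bad compromise visible: a single bundled project would have to pick one value for each field, while the split lets each surface keep its own.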
#### What a monorepo is, and when it helps A monorepo is a single Git repository that holds several related projects - say your marketing site, your app and a shared package of components - instead of one repo per project. The appeal is real: you share code (types, UI components, utilities) across surfaces without publishing packages, and one pull request can change everything that needs to change together. The cost is real too: monorepos add tooling overhead (workspace managers, more complex builds, slower CI if you are careless) that a solo beginner does not need on day one. The honest rule: reach for a monorepo when you genuinely share meaningful code across two or more surfaces and the duplication is hurting you. Until then, separate repos are simpler and perfectly fine. Do not adopt a monorepo because a big company does - they have hundreds of engineers and you have an agent and a laptop. - Monorepo: one repo, many projects, shared code, coordinated changes. - Worth it when: marketing site and app share types or components, and you change them together often. - Skip it when: you have a single app, or two surfaces that barely share code. The overhead is not free. - You can always start with separate repos and merge into a monorepo later when the pain is real. #### Typical mistakes The recurring errors at this stage: bundling the marketing site and app into one project, then wondering why the homepage is slow and SEO suffers; putting a secret API key in frontend code where every visitor can read it; adopting a monorepo on day one because it looks professional, and drowning in build configuration; and choosing a framework by hype rather than by whether your product is content-heavy or app-heavy. Every one of these comes from skipping the mental model. Name the layer, name the surface, and the right structure follows. #### Business ROI Architecture decided well at the start is nearly free; decided badly, it taxes every future change. 
A clean marketing-versus-app split means your homepage stays fast and gets found, which is the top of your entire funnel - slow marketing pages quietly cost you customers and search ranking forever. Knowing which layer owns which responsibility means you (and your agent) ship features without creating tangles that later need expensive rewrites. The founders who move fastest are not the ones who picked the trendiest framework; they are the ones whose architecture lets them change one thing without breaking three others. #### Checklist Before you move on, make sure you can answer these without looking back. This map sits under every other lesson in the course. - Can you explain frontend, backend and database, and which one holds secrets? - Can you say when you would pick Astro versus Next.js or TanStack Start? - Can you explain why teams split the marketing site from the app? - Can you state the honest rule for when a monorepo is worth it? #### Resources Keep the official docs for your chosen frameworks bookmarked - Astro, Next.js, TanStack Start and Nuxt all have excellent getting-started guides. The Fundamentals pages on what a framework is and what TypeScript is back up this lesson if a concept stayed fuzzy. The next three lessons fill in the three layers one by one: auth, then data, then the secrets that connect them. #### Your task Sketch your own product (or a product you admire) as a diagram with three boxes - frontend, backend, database - and an arrow showing a request flowing through them. Then mark which parts are the marketing site and which are the app. Five minutes with a pen makes the rest of this course concrete, because you now have a real shape to attach each new piece to. #### Next lesson With the architecture clear, the next lesson adds the first real building block: authentication with Clerk, including what OAuth actually is, why you must never build auth yourself, and a full Google OAuth walkthrough from a development instance to production. 
### 3.2 Clerk: Authentication and OAuth from Dev to Production - Canonical URL: https://agenticschool.dev/courses/modern-app-stack/clerk-authentication-and-oauth-from-dev-to-production - Duration: 30 min Summary: Authentication is one of the easiest things to get dangerously wrong, so you never build it yourself - you use a dedicated service. This lesson explains what OAuth actually is, why hand-rolling auth is a trap, how to add Clerk, and how to move cleanly from a development instance to a production instance, including the full Google OAuth walkthrough: consent screen, credentials and the custom-domain DNS records you add at the end. #### Summary Your app now needs users: people who sign up, log in, and have their own data. That is authentication, and it is a place where a small mistake leaks every customer password you hold. The right move in 2026 is unambiguous: do not build it yourself, use a service. Clerk is that service for most of the builders this course is for. This lesson explains the one concept you must understand (OAuth), gets Clerk into your app, and then takes you through the part everyone finds fiddly - going from development keys to a real production instance with Google sign-in on your own domain. #### What you will learn You will learn what OAuth is in plain language, why hand-rolling auth is a trap that even large companies fall into, how Clerk fits your app, the crucial difference between a development instance and a production instance, and the exact steps to enable Google OAuth in production, including the Google consent screen, credentials, and the DNS records that make it run on your domain. #### Prerequisites A working app from Course 1 and the architecture map from the previous lesson, since auth lives on the app side, not the marketing site. You also need the secrets discipline from Course 1: auth keys are highly sensitive and must never be committed. 
The next lesson on secrets goes deeper, but the basic rule (keys in .env, .env in .gitignore) must already be second nature. #### The problem Building auth yourself sounds like a rite of passage. It is actually a minefield. You have to hash and salt passwords correctly, manage sessions and tokens, handle password resets, defend against credential stuffing and brute force, store everything securely, and keep up with new attack techniques forever. Get one detail wrong and you leak customer data, which is a legal and reputational disaster. Meanwhile your competitors using Clerk shipped login in an afternoon and moved on to building their actual product. There is no prize for the hand-rolled version. This is a solved problem, and your job is to use the solution, not reinvent it. #### What OAuth actually is OAuth is the protocol behind every "Sign in with Google" or "Sign in with GitHub" button. The core idea is delegation without sharing your password. When a user clicks "Sign in with Google", your app never sees their Google password. Instead, your app hands them off to Google, Google confirms who they are and asks them to approve sharing their email and name with your app, and Google sends back a token that proves "yes, this is really them". Your app trusts Google's word. That is the whole concept: a trusted third party vouches for the user so you never have to store or verify their password. It is more secure (you hold no passwords), more convenient (one click, no new account), and it is what users expect. Clerk handles the entire OAuth dance for you - you just enable the providers you want. - OAuth = let a trusted provider (Google, GitHub, Apple) vouch for the user so you never touch their password. - Your app gets a token proving the identity, plus basic profile info the user approved sharing. - More secure for you (no passwords stored), more convenient for them (one click). - Clerk runs the OAuth flow; you enable a provider and add its credentials. 
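To make the delegation concrete, here is a minimal TypeScript sketch of the two ends of the hand-off: building the URL that sends the user to Google, and reading what comes back. Clerk does all of this for you, so treat it purely as illustration. The client ID, redirect URI and function names are hypothetical; the endpoint is Google's published OAuth 2.0 authorization URL, and the parameters follow the standard authorization-code flow.

```typescript
// Illustration only: Clerk runs this flow for you.
interface AuthRequest {
  clientId: string;    // issued by the provider when you register your app
  redirectUri: string; // where the provider sends the user back
  scopes: string[];    // what the user is asked to approve sharing
  state: string;       // random value tying the response to this request
}

// Google's OAuth 2.0 authorization endpoint.
const GOOGLE_AUTH_ENDPOINT = "https://accounts.google.com/o/oauth2/v2/auth";

// Step 1: build the URL your app redirects the user to.
function buildAuthorizationUrl(req: AuthRequest): string {
  const url = new URL(GOOGLE_AUTH_ENDPOINT);
  url.searchParams.set("response_type", "code"); // authorization-code flow
  url.searchParams.set("client_id", req.clientId);
  url.searchParams.set("redirect_uri", req.redirectUri);
  url.searchParams.set("scope", req.scopes.join(" "));
  url.searchParams.set("state", req.state);
  return url.toString();
}

// Step 2: the provider redirects back with ?code=...&state=...
function readCallback(callbackUrl: string, expectedState: string): string {
  const params = new URL(callbackUrl).searchParams;
  if (params.get("state") !== expectedState) {
    throw new Error("state mismatch: response does not belong to this request");
  }
  const code = params.get("code");
  if (!code) {
    throw new Error("no authorization code in callback");
  }
  return code; // the backend later exchanges this code for tokens
}
```

Note what never appears anywhere in this code: the user's password. In the full flow the backend exchanges the returned code for tokens using a client secret, which is why that step belongs on the server and why it is worth leaving to Clerk.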
#### Adding Clerk to your app

Clerk gives you drop-in components for sign-up, sign-in, user profile and session management, plus the backend pieces to protect your routes. You install its package, add your keys to your environment, wrap your app in its provider, and drop in the prebuilt components. The keys come in two flavours: a publishable key (safe in frontend code, it only identifies your app) and a secret key (backend only). The exact import paths depend on your framework, so follow Clerk's docs for the precise lines.

```bash
# .env.local - dev keys, never committed (.env* is gitignored)
VITE_CLERK_PUBLISHABLE_KEY=pk_test_xxxxxxxxxxxxxxxxxxxx
CLERK_SECRET_KEY=sk_test_xxxxxxxxxxxxxxxxxxxx
```

Clerk keys live in your env file. Note the pk_test / sk_test prefixes - those mean development.

A development instance issues keys that start with pk_test and sk_test; a production instance issues pk_live and sk_live. That prefix is your at-a-glance check for which environment a key belongs to, and it is the single most common thing people mix up. Once your keys are set, you wrap your app in Clerk's provider and use its components.

```tsx
// A protected page: only signed-in users see the content.
// Minimal layout built from the imported components; adapt it to your app.
import { SignedIn, SignedOut, SignInButton, UserButton } from '@clerk/clerk-react'

export function AppShell() {
  return (
    <header>
      {/* Rendered only while nobody is signed in */}
      <SignedOut>
        <SignInButton />
      </SignedOut>
      {/* Rendered only for a signed-in user */}
      <SignedIn>
        <UserButton />
      </SignedIn>
    </header>
  )
}
```

Clerk components handle the entire signed-in / signed-out UI for you.

#### Development instance versus production instance

Clerk separates your project into two completely independent instances. The development instance is for building: it uses test keys, runs on a Clerk-provided shared domain, and ships with relaxed settings so you can iterate fast. It is not meant for real users. The production instance is the real thing: live keys, your own domain, stricter security, and the place real customers sign in. The two do not share users or settings - they are separate worlds. The mistake beginners make is treating the development instance as good enough and pointing real users at it, or shipping with pk_test keys still in their production environment. The clean flow is: build and test everything on the development instance, then create the production instance, configure it (including a custom domain and OAuth), and swap your deployed environment variables over to the pk_live and sk_live keys.

- Development instance: pk_test / sk_test keys, shared Clerk domain, relaxed settings, for building only.
- Production instance: pk_live / sk_live keys, your own domain, strict settings, for real users.
- They are separate worlds - users and config do not carry over automatically.
- Going live = create the production instance, configure it, swap the deployed env vars to the live keys.

#### Google OAuth production walkthrough

On the development instance, Clerk lets you enable Google sign-in with shared dev credentials so you can test instantly. For production, you must supply your own Google credentials and your own consent screen, so your users see your app name (not Clerk's) when they sign in. This is the part that trips people up, so here is the order that works. The exact button labels in the Google Cloud Console shift over time, so trust the flow and follow Clerk's and Google's current docs for the precise clicks.
- In the Google Cloud Console, create (or pick) a project for your app.
- Configure the OAuth consent screen: set the app name, support email, and your domain. Add the scopes for email and profile. Publish it so it is not stuck in test mode.
- Create OAuth credentials of type "OAuth client ID" for a web application. Google gives you a Client ID and a Client Secret.
- Add the authorized redirect URI that Clerk shows you in its Google provider settings - this is where Google sends the user back after they approve.
- Paste the Google Client ID and Client Secret into Clerk's Google provider settings on your production instance, and enable it.
- Test: sign in with Google on your production domain. Users should see your app name on the Google consent screen, not a generic one.

A subtle gotcha: the redirect URI must match exactly, character for character, including the scheme (https, not http) and the presence or absence of a trailing slash. If sign-in fails with a redirect_uri_mismatch error, that exact-match check is almost always the cause. Copy the URI from Clerk verbatim.

#### Custom domain and DNS records

A production Clerk instance runs on your own domain so that auth happens at, for example, accounts.yoursite.com or clerk.yoursite.com instead of a generic Clerk URL. To make that work you add DNS records that Clerk gives you to your domain provider (or Cloudflare, from Course 1). These are usually CNAME records that point a subdomain at Clerk's servers so Clerk can serve and secure that subdomain. You add them, wait for DNS to propagate, and Clerk verifies them and issues the certificates. Here is the shape of what you add - your real values come from the Clerk dashboard.
```text
; DNS records you add at your provider (example shape - copy real values from Clerk)
; Type   Name (host)       Value (target)
CNAME    clerk             frontend-api.clerk.services
CNAME    accounts          accounts.clerk.services
CNAME    clkmail           mail.xxxxx.clerk.services
CNAME    clk._domainkey    dkim1.xxxxx.clerk.services
CNAME    clk2._domainkey   dkim2.xxxxx.clerk.services
```

Production Clerk uses CNAME records on subdomains. The mail/domainkey ones let Clerk send verification emails from your domain.

If you manage DNS through Cloudflare, add these as DNS-only (grey cloud, not proxied) unless Clerk says otherwise - proxying auth subdomains can break the TLS handshake. After you save the records, verification can take anywhere from a few minutes to a couple of hours. It is not broken, DNS is just slow, exactly as you learned when connecting a domain in Course 1.

#### Typical mistakes

The classics: shipping to production with pk_test keys still set, so real users hit your development instance; forgetting to publish the Google consent screen, leaving it in test mode where only whitelisted accounts can sign in; a redirect_uri_mismatch from a redirect URI that is off by a slash or http versus https; proxying the Clerk DNS records through Cloudflare and breaking certificates; and the deepest one of all, trying to build auth from scratch "to learn" and shipping a security hole. Use the service, check your key prefixes, copy redirect URIs verbatim.

#### Business ROI

Auth is pure downside risk if you build it and pure leverage if you buy it. A leaked password table is a legal nightmare under GDPR and the US privacy laws, plus a trust catastrophe that can end a young product. Clerk removes that entire risk category for a modest monthly cost, and gives you social login, which measurably lifts sign-up conversion because users hate creating yet another password. You ship login in an afternoon instead of a fortnight, you sleep at night, and your sign-up funnel converts better.
There is no version of the maths where hand-rolling auth wins for a founder building with agents. #### Checklist You are ready to move on when all of these are true on a real app, not just in theory. - You can explain OAuth in one sentence: a trusted provider vouches for the user so you never hold their password. - Clerk is installed, keys are in .env (gitignored), and signed-in / signed-out UI works. - You know the difference between pk_test and pk_live and which environment each belongs to. - Google OAuth works on your production domain with your own consent screen and the right DNS records. #### Resources Keep the Clerk docs and the Google Cloud Console OAuth guide open while you do the production setup - both change their exact UI over time, and the docs are always current. The Fundamentals pages on what OAuth is and what DNS is back up the two trickiest concepts here. Next, you give your logged-in users something to do: real data with Convex. #### Your task Add Clerk to a test app on the development instance and get sign-in working, then enable Google sign-in. If you own a domain, go one step further: create a production instance, set up the Google consent screen and credentials, add the DNS records, and confirm a real Google sign-in works on your domain with your app name on the consent screen. Doing the production path once removes the fear for every app you build after this. #### Next lesson Users can log in. The next lesson gives them data with Convex, a reactive, type-safe database where your backend logic and your UI stay in sync automatically, and where you will also learn the soft-delete pattern that lets you recover a user mistake instead of losing data forever. 
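One way to make this lesson's key-prefix check automatic is a tiny guard that runs when your app starts or builds. This is a sketch, not part of Clerk's SDK: `keyEnvironment` and `assertKeyMatches` are hypothetical helpers, and the only fact they rely on is the prefix convention from this lesson (pk_test / sk_test for development, pk_live / sk_live for production).

```typescript
type ClerkEnvironment = 'development' | 'production'

// Classify a Clerk key by its prefix (hypothetical helper, not a Clerk API).
function keyEnvironment(key: string): ClerkEnvironment {
  if (key.startsWith('pk_test_') || key.startsWith('sk_test_')) return 'development'
  if (key.startsWith('pk_live_') || key.startsWith('sk_live_')) return 'production'
  throw new Error('unrecognised key prefix: expected pk_test_, sk_test_, pk_live_ or sk_live_')
}

// Fail loudly if a key does not belong to the environment you are deploying to.
function assertKeyMatches(key: string, expected: ClerkEnvironment): void {
  const actual = keyEnvironment(key)
  if (actual !== expected) {
    // Report only the environment, never the key itself.
    throw new Error(`Clerk key is a ${actual} key but this deployment is ${expected}`)
  }
}
```

Calling `assertKeyMatches(publishableKey, 'production')` during a production build would turn the most common mix-up into a failed build instead of real users landing on your development instance.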
### 3.3 Convex: Your Reactive Database

- Canonical URL: https://agenticschool.dev/courses/modern-app-stack/convex-your-reactive-database
- Duration: 30 min

Summary: Convex is a reactive backend: you define a schema, write queries and mutations as TypeScript functions, and your UI updates live whenever data changes. This lesson explains what a database really is versus an Excel sheet, why reactivity matters, the schema-queries-mutations-actions model, the end-to-end type safety that kills whole classes of bugs, when to add an index, and the soft-delete pattern that lets you recover deleted data instead of losing it forever.

#### Summary

Your users can log in. Now they need data that persists: their profile, their projects, their saved work. That is the database layer, and Convex is a particularly good fit for the builders this course is for, because it does something most databases do not - it is reactive. When data changes, every part of your UI that depends on that data updates by itself, with no manual refresh code. On top of that, your types flow end to end, from the database into your React components, so a whole category of bugs simply cannot happen. This lesson teaches the model: what a database is, the schema and the three kinds of function you write (queries, mutations and actions), indexes, and the delete patterns that separate a toy from a real product.

#### What you will learn

You will learn what a database actually is and why it beats a spreadsheet for an app, what reactivity buys you, how to define a schema and write queries, mutations and actions, how end-to-end type safety prevents bugs before they happen, when and why to add an index, and the difference between hard and soft deletes with a recovery pattern you can copy.

#### Prerequisites

A working app with Clerk auth from the previous lesson, since you will want data tied to logged-in users. The architecture lesson, so you remember the database is the third layer.
And the Fundamentals page on what a database is if the idea of a schema or a table is brand new. Comfort with basic TypeScript helps, because in Convex your database logic is just TypeScript functions. #### The problem Beginners reach for whatever they know: a spreadsheet, a CSV file, or a pile of JSON saved to disk. That works until two users touch the same data, or you need to find one record among ten thousand quickly, or you change the shape of your data and everything silently breaks. Then they bolt a traditional database onto the side and spend days writing glue code: fetch on load, refetch after every change, handle loading states, keep the UI in sync, and chase stale-data bugs where the screen shows old numbers. Most of that glue is pure waste. A reactive, typed database deletes the glue and the bug class with it. #### A database is not a spreadsheet An Excel sheet or a CSV is fine for a list you read by eye. A database is built for a different job: many users reading and writing at once, finding specific records fast even among millions, enforcing the shape of your data so a bad row cannot sneak in, and never losing or corrupting data when two things happen at the same time. A spreadsheet has no idea who else is editing it and no way to guarantee that a "price" column is always a number. A database does both. The mental upgrade is this: a spreadsheet is a document you look at; a database is a service your app talks to, that guarantees the data stays correct and consistent no matter how many users or how much load. The moment your data is shared between users or needs to be queried quickly, you have outgrown the spreadsheet. - Spreadsheet/CSV: one editor at a time, no guarantees on data shape, slow to search at scale, easy to corrupt. - Database: many concurrent users, enforced data shape (the schema), fast lookups via indexes, safe under load. - Convex adds reactivity on top: changes push to every connected client automatically. 
#### Why reactivity matters

In a normal setup you write a lot of code to keep the screen in sync with the database: fetch when the page loads, refetch after you change something, poll for updates from other users, manage loading and error states by hand. Every one of those is a chance for a bug where the UI shows stale data. Convex flips this. A query is a live subscription: you write it once, and whenever any data it reads changes - whether you changed it or another user did - Convex pushes the new result to your component and it re-renders. You delete the refetch logic, the polling, and the entire stale-data bug class. For a collaborative or real-time product (a dashboard, a chat, a shared list) this is transformative, and even for a simple app it removes a pile of tedious, error-prone plumbing.

#### Schema, queries, mutations and actions

Convex gives you four building blocks. The schema declares the shape of your data - your tables and their fields and types - so the database can enforce it. Queries read data and are reactive. Mutations change data (insert, update, delete) and run in a transaction so they either fully succeed or fully fail. Actions are for talking to the outside world (calling an external API, sending an email) where you need to do something that is not a pure database read or write. Here is a real schema plus a query and a mutation, the everyday motions you will write constantly.

```typescript
// convex/schema.ts - the shape of your data, enforced by the database
import { defineSchema, defineTable } from 'convex/server'
import { v } from 'convex/values'

export default defineSchema({
  projects: defineTable({
    ownerId: v.string(), // the Clerk user id who owns this project
    name: v.string(),
    // absent = active; a timestamp = soft-deleted
    // (v.optional means the field may be missing, not that it may be null)
    archivedAt: v.optional(v.number()),
  })
    // index so 'find this user's projects' is a fast lookup, not a full scan
    .index('by_owner', ['ownerId']),
})
```

A schema with a typed table and an index. archivedAt is the field that powers soft delete.

```typescript
// convex/projects.ts - a query (read, reactive) and a mutation (write, transactional)
import { query, mutation } from './_generated/server'
import { v } from 'convex/values'

// Reactive: any client subscribed to this re-renders when the data changes.
export const listActive = query({
  args: { ownerId: v.string() },
  handler: async (ctx, { ownerId }) => {
    const rows = await ctx.db
      .query('projects')
      .withIndex('by_owner', (q) => q.eq('ownerId', ownerId))
      .collect()
    // Soft-deleted rows have archivedAt set - hide them from the active list.
    return rows.filter((row) => row.archivedAt === undefined)
  },
})

export const create = mutation({
  args: { ownerId: v.string(), name: v.string() },
  handler: async (ctx, { ownerId, name }) => {
    return await ctx.db.insert('projects', { ownerId, name })
  },
})
```

Queries read and are live; mutations write inside a transaction. Both are plain TypeScript.

#### End-to-end type safety

This is the quiet superpower. Because your schema is TypeScript and Convex generates types from it, the type of your data flows all the way from the database into your React components. If you rename a field in the schema, every query, mutation and component that used the old name turns red in your editor before you ever run the app. You cannot accidentally read a field that does not exist, pass the wrong argument to a mutation, or assume a string is a number. Whole classes of "it crashed in production because the data was not the shape I assumed" bugs become impossible because your editor and the typechecker catch them while you type. For someone building with an agent, this is huge: the agent gets the same type signals and writes correct code far more often, and your typecheck quality gate catches the rest before it ships.

#### Indexes: making reads fast

When you ask the database "give me all projects owned by this user", it has two ways to answer.
Without an index, it scans every row and checks each one - fine for ten rows, painfully slow for a million. With an index on ownerId, it jumps straight to the matching rows, like using the index at the back of a book instead of reading every page. The rule is simple: any field you regularly filter or sort by deserves an index. You saw it in the schema above - the by_owner index is what makes listActive fast no matter how many total projects exist. Adding indexes early costs almost nothing; discovering you needed them when your app slows to a crawl under real data costs a stressful debugging session.

- No index: the database checks every row (a full scan). Fine for tiny tables, slow at scale.
- With an index: it jumps straight to matching rows. Fast even with millions of rows.
- Index the fields you filter or sort by often - ownerId, status, createdAt are common.
- Define the index in the schema; use it in queries with .withIndex(...).

#### Soft delete versus hard delete, with recovery

A hard delete removes a row permanently - it is gone, and so is any chance of getting it back. A soft delete keeps the row but marks it as deleted, usually with a timestamp like the archivedAt field in the schema above. Real products almost always prefer soft deletes for user-facing data, because users delete things by accident all the time, and "sorry, that is gone forever" is a terrible experience and sometimes a compliance problem. With a soft delete you can offer a trash or an undo, recover the data, and keep history and audit trails intact. You only hard-delete later, on a schedule, for data that has been soft-deleted long enough and that you are legally allowed to purge. Here is the pattern: deleting sets the timestamp, restoring clears it, and a background job can hard-delete truly old rows.

```typescript
// Soft delete: mark it, don't destroy it. Restoring is then trivial.
// Imports shown for completeness; skip them if these mutations live in
// convex/projects.ts next to the query and mutation above.
import { mutation } from './_generated/server'
import { v } from 'convex/values'

export const softDelete = mutation({
  args: { id: v.id('projects') },
  handler: async (ctx, { id }) => {
    await ctx.db.patch(id, { archivedAt: Date.now() })
  },
})

export const restore = mutation({
  args: { id: v.id('projects') },
  handler: async (ctx, { id }) => {
    // Patching a field with undefined removes it, so the row is active again.
    await ctx.db.patch(id, { archivedAt: undefined })
  },
})

// Hard delete: only for data soft-deleted long ago, usually run by a scheduled job.
export const purge = mutation({
  args: { id: v.id('projects') },
  handler: async (ctx, { id }) => {
    await ctx.db.delete(id)
  },
})
```

Soft delete sets a flag and stays recoverable; hard delete is final and reserved for old, purgeable data.

#### Typical mistakes

The frequent ones: writing manual refetch and polling code when Convex queries are already reactive, so you are fighting the tool; forgetting indexes and watching the app crawl once real data arrives; hard-deleting user data with no recovery, then getting the support ticket you cannot fix; trying to call an external API inside a query or mutation instead of an action; and loosening your schema (everything optional, everything a string) until the type safety that was protecting you evaporates. Lean into the schema and the reactivity rather than working around them.

#### Business ROI

A reactive, typed database is a productivity multiplier and a quality multiplier at once. You write dramatically less plumbing, so features ship faster. End-to-end types catch bugs before they reach a customer, so you ship with fewer fires. Indexes keep the app fast as you grow, which protects both conversion and your reputation. And the soft-delete pattern turns a category of catastrophic support incidents - "I deleted everything by accident" - into a one-click restore. For a founder, the time saved on plumbing and the bugs that never happen are worth far more than the modest cost of the service.

#### Checklist

You are ready to move on when each of these is true in a real app you built, not just understood in theory.
- You can explain why a database beats a spreadsheet once data is shared or large. - You have a schema with at least one table, an index, and working query and mutation functions. - You can see types flow from the schema into your components and catch a rename error in the editor. - You implemented soft delete with a working restore, and you know when a hard delete is appropriate. #### Resources The Convex docs are excellent and current - keep the schema, queries, mutations and indexes pages bookmarked. The Fundamentals page on what a database is grounds the concepts if any felt shaky. You now have auth and data; the next lesson locks down the secrets that connect every service you have added, including how to encrypt API keys your users entrust to you. #### Your task Add a Convex table for something your app needs (a list of items, projects or notes tied to the logged-in user), with an index on the owner. Write a reactive query and a create mutation, then implement soft delete and restore. Delete an item, confirm it disappears from the list, then restore it and watch it reappear without a page refresh. That single exercise proves you understand reactivity, indexes and recoverable deletes all at once. #### Next lesson Your app now has users and data, which means it has a growing pile of secret keys connecting them. The next lesson is the disciplined version of secret handling: env files, .gitignore, what to do if you ever did push a secret, Convex deploy keys, and encrypting user API keys instead of storing them in plain text. ### 3.4 Secrets and Environment: .env, .gitignore, Deploy Keys and Encryption - Canonical URL: https://agenticschool.dev/courses/modern-app-stack/secrets-and-environment-env-gitignore-deploy-keys-and-encryption - Duration: 26 min Summary: Every service you add comes with secret keys, and leaking one can be catastrophic. 
This lesson is the disciplined version of secret handling: what a .env file is, why .gitignore is non-negotiable, exactly what to do if you DID push a secret (rotate, do not just delete), how Convex deploy keys work, and why you must encrypt user-supplied API keys at rest instead of storing them in plain text. #### Summary You now juggle keys for Clerk, Convex, soon Stripe, and probably an AI provider. Each one is a key to something valuable - your services, your money, your customers' data. Handling them well is not optional once real money and real users are involved. This lesson is the rigorous version of the secrets habit you started in Course 1: where secrets live, how to keep them out of Git forever, the emergency procedure if one escapes, how deploy keys give your hosting access without exposing your master credentials, and why any key your users give you must be encrypted, not stored as readable text. #### What you will learn You will learn what an env file is and why it exists, how to use .gitignore so a secret can never be committed, why each environment gets its own keys, the exact recovery steps if you already pushed a secret, how Convex deploy keys work and where they belong, and how to encrypt user-supplied API keys at rest so a database leak does not hand attackers your customers' credentials. #### Prerequisites The Course 1 secrets basics (keys in .env, .env in .gitignore) and a multi-service app from earlier in this course, since the risk grows with every integration you add. The Fundamentals page on what an env file is covers the absolute basics if you want them spelled out before going deeper. #### The problem Leaked secrets are one of the most common and most expensive beginner disasters, and they happen quietly. 
You paste a key into a file to test something, commit it without thinking, push to GitHub, and now that key is in your repository history forever - readable by anyone who can see the repo, and scraped within minutes by bots that scan public GitHub for exactly this. People with a leaked Stripe or cloud key have woken up to thousands of dollars of fraudulent charges. And storing your users' API keys as plain text means one database breach exposes every customer credential at once. None of this requires bad luck. It requires one careless commit. This lesson makes carelessness structurally hard.

#### What a .env file is, and gitignore done right

A .env file is a plain text file that holds your secrets as KEY=value pairs, kept separate from your code so the code can read them at runtime without the values being hard-coded into committed files. The whole point is separation: the code says "read the Stripe key from the environment", and the actual key sits in .env, which never leaves your machine. The thing that makes this safe is .gitignore - a file listing what Git must never track. Your .env and all its variants belong there, always, in every project, no exceptions. Get this right once per project and a secret that lives in an env file cannot be committed by accident.

```bash
# .env.local - your actual secrets. NEVER committed.
CLERK_SECRET_KEY=sk_live_xxxxxxxxxxxxxxxxxxxx
STRIPE_SECRET_KEY=sk_live_xxxxxxxxxxxxxxxxxxxx
CONVEX_DEPLOY_KEY=prod:your-deployment|xxxxxxxxxxxxxxxxxxxx
ENCRYPTION_KEY=base64-32-byte-random-value-here
```

A real .env file: one secret per line, never committed.

```bash
# .gitignore - the non-negotiable lines for every project
node_modules
# every env variant: .env, .env.local, .env.production, ...
.env*
# ...except the example file, which holds no real values
!.env.example
.DS_Store
dist
```

These .gitignore lines stop every .env variant from entering Git, while still letting you commit the safe .env.example.

A useful habit: commit a .env.example file (with the variable names but blank or fake values) so anyone setting up the project knows which variables are needed, without ever committing a real secret.
The example file is safe to commit precisely because it contains no real values. #### Separate keys per environment Development and production must use different keys. You saw this with Clerk (pk_test versus pk_live) and Stripe does the same (test versus live keys). The reason is blast radius: if a development key leaks, it can only touch test data and test mode, so the damage is contained. A leaked production key touches real customers and real money. Keeping them strictly separate means a mistake while building can never reach live data, and you can hand a teammate or an agent your development keys without risking the business. Never reuse a production key in development "to save time" - that is exactly how a test script accidentally charges a real card or deletes a real user. - Development keys (test mode): safe to use while building, only touch test data. - Production keys (live mode): real money, real users - guard them and use them only in your deployed environment. - Set production secrets in your host (Vercel) and your backend (Convex) dashboards, not in a committed file. - A leaked dev key is an inconvenience; a leaked prod key can be a catastrophe. Keep them apart. #### What to do if you DID push a secret This is the most important section in the lesson, because it will happen to you or someone you work with eventually. The instinct is to delete the line and commit again, or to delete the commit. That is not enough and it is dangerous, because the secret still lives in your Git history and, if the repo was ever public or pushed anywhere, may already be scraped. The only safe assumption is that a pushed secret is compromised. So the real fix is to rotate it: go to the service, revoke the leaked key, and generate a new one. Cleaning the Git history is secondary and optional once the key is dead. Treat rotation as the first and most urgent step, every single time. - Assume the secret is compromised the moment it was pushed. 
Bots scan public GitHub within minutes.
- ROTATE FIRST: go to the service (Stripe, Clerk, Convex, your AI provider), revoke the old key, create a new one.
- Update the new key in your .env and in your host/backend environment variables.
- Only then, optionally, scrub the history (tools like git filter-repo or BFG) - but a dead key cannot hurt you anyway.
- Check the service for unauthorized activity (charges, new users, API calls) while the key was exposed.

Say it plainly: deleting the commit does not un-leak the secret. Rotating the key does. If you remember one thing from this lesson, remember to rotate first and clean up second.

#### Convex deploy keys

When your hosting (Vercel) builds and deploys your app, it needs permission to push your backend functions and schema to Convex. It would be wrong to give your build environment your personal login. Instead, Convex issues a deploy key: a scoped credential that lets an automated environment deploy to one specific Convex deployment, and nothing more. You generate it in the Convex dashboard for your production deployment, then store it as an environment variable in Vercel (never in code). The principle generalises: automated environments get narrow, scoped credentials, not your master account, so a leaked deploy key has a limited blast radius and can be rotated without touching anything else you own.

```bash
# In Vercel, set this as an environment variable (NOT in any committed file).
# Generated in the Convex dashboard for your PRODUCTION deployment.
CONVEX_DEPLOY_KEY=prod:your-deployment-name|xxxxxxxxxxxxxxxxxxxxxxxx

# Your build command then uses it to deploy the backend during the Vercel build:
# bunx convex deploy --cmd "bun run build"
```

A Convex deploy key is a scoped credential for automated deploys, stored in your host, never committed.
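Because every credential in this lesson reaches your code through environment variables, it can help to read them through one small function that fails fast when something is missing. The sketch below is one possible shape, not a Convex or Vercel API: `requireEnv` and `missingEnv` are hypothetical helpers, and they take the environment as a parameter (pass `process.env` in Node) so they are easy to test.

```typescript
type Env = Record<string, string | undefined>

// Return the named secret, or stop immediately with a clear message.
// The error names the variable but never prints a value.
function requireEnv(env: Env, name: string): string {
  const value = env[name]
  if (value === undefined || value.trim() === '') {
    throw new Error(`Missing required environment variable: ${name}`)
  }
  return value
}

// Collect every missing name at once, so a misconfigured deploy reports
// all of its problems in one run instead of one per attempt.
function missingEnv(env: Env, names: string[]): string[] {
  return names.filter((name) => {
    const value = env[name]
    return value === undefined || value.trim() === ''
  })
}
```

For example, checking `missingEnv(process.env, ['CLERK_SECRET_KEY', 'ENCRYPTION_KEY'])` at startup tells you which secrets a new environment still lacks before the first user request hits a half-configured app.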
#### Encrypting user-supplied keys at rest

Here is a scenario this course leads to directly: your product lets users bring their own API key (their OpenAI key, their Stripe key, their key for some service you integrate). You store it so you can act on their behalf. If you store that key as plain text in your database, then a single breach hands an attacker every customer's credentials in one go - a disaster that turns one incident into hundreds. The fix is to encrypt sensitive values before they go into the database and decrypt them only at the moment you use them. Even if someone steals your database, the encrypted blobs are useless without the encryption key, which lives separately in your environment, not in the database. This is the difference between "we had a breach" and "we had a breach and leaked every customer's keys".

```typescript
// Encrypt before storing, decrypt only when using. The ENCRYPTION_KEY lives in
// your environment, NOT in the database, so a stolen database is useless alone.
import { createCipheriv, createDecipheriv, randomBytes } from 'node:crypto'

const key = Buffer.from(process.env.ENCRYPTION_KEY!, 'base64') // 32 bytes

export function encryptSecret(plain: string) {
  const iv = randomBytes(12)
  const cipher = createCipheriv('aes-256-gcm', key, iv)
  const enc = Buffer.concat([cipher.update(plain, 'utf8'), cipher.final()])
  const tag = cipher.getAuthTag()
  // Store iv + tag + ciphertext together; none of it is secret on its own.
  return Buffer.concat([iv, tag, enc]).toString('base64')
}

export function decryptSecret(stored: string) {
  const raw = Buffer.from(stored, 'base64')
  const iv = raw.subarray(0, 12)
  const tag = raw.subarray(12, 28)
  const enc = raw.subarray(28)
  const decipher = createDecipheriv('aes-256-gcm', key, iv)
  decipher.setAuthTag(tag)
  return Buffer.concat([decipher.update(enc), decipher.final()]).toString('utf8')
}
```

AES-256-GCM: encrypt user keys before storing, decrypt only at point of use. The key stays in the environment.
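The ENCRYPTION_KEY used above has to decode to exactly 32 bytes for AES-256. Here is a minimal sketch of how you might generate one and check it at startup, using only Node's built-in crypto module. `generateEncryptionKey` and `loadEncryptionKey` are hypothetical helper names, not part of any library.

```typescript
import { randomBytes } from 'node:crypto'

// Create a fresh 32-byte key, base64-encoded so it fits on one env-file line.
// Run once, store the output in your host's environment, never in Git.
function generateEncryptionKey(): string {
  return randomBytes(32).toString('base64')
}

// Decode the env value and refuse to start with a key of the wrong size.
function loadEncryptionKey(base64Key: string): Buffer {
  const key = Buffer.from(base64Key, 'base64')
  if (key.length !== 32) {
    throw new Error(`ENCRYPTION_KEY must decode to 32 bytes, got ${key.length}`)
  }
  return key
}
```

Checking the length up front means a mistyped or truncated key fails at startup with a readable message, rather than later inside the cipher call when a user tries to save their API key.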
You do not need to be a cryptographer to use a well-tested algorithm like AES-256-GCM through your platform's standard crypto library, as above. The rule of thumb: never invent your own encryption, always use the standard library, and keep the encryption key in your environment, completely separate from the data it protects. #### Typical mistakes The painful ones: committing a .env because .gitignore was missing or wrong; deleting a leaked secret from a file and believing it is safe when it still lives in history and is already scraped; reusing a production key in development and accidentally touching live data; putting a deploy key or master credential into committed code; and storing user API keys as plain text so one breach leaks them all. Every one is prevented by the same discipline: secrets out of Git, separate keys per environment, rotate on leak, encrypt sensitive data at rest. #### Business ROI Secret discipline is cheap insurance against a category of disaster that has bankrupted small products: a runaway cloud bill from a leaked key, a fraud incident from a leaked payment key, or a breach that exposes customer credentials and triggers legal liability under GDPR and the US privacy laws. The cost of doing this right is a few minutes of setup per project and a standard encryption function you write once. The cost of doing it wrong is measured in thousands of dollars, legal exposure, and lost trust you may never recover. For a founder, this is one of the highest return-on-effort habits in the entire stack. #### Checklist You are secure enough to add payments when all of these are true across every project you ship. - Every project has a correct .gitignore and no .env has ever been committed. - Development and production use separate keys, with prod secrets set in your host and backend dashboards. - You know the rule: a pushed secret gets rotated first, history cleaned second. 
- Any user-supplied API key is encrypted at rest, with the encryption key kept out of the database. #### Resources Keep your services' API-key and rotation pages bookmarked so you can revoke a leaked key in seconds when the moment comes. GitHub also offers secret scanning that can alert you to leaked keys - turn it on. The Fundamentals page on what an env file is covers the basics. With your stack secure, the next two lessons add the money: Stripe Checkout and subscriptions, then the harder billing details. #### Your task Audit one of your real projects right now. Confirm .gitignore excludes every .env variant, search the repo history for any accidentally committed key, and if you find one, rotate it immediately. Then add a .env.example with blank values so the project documents its required secrets safely. If your app stores any user-supplied key, wrap it in the encrypt/decrypt functions above. This audit is the kind of thing founders postpone until it is too late - do it today. #### Next lesson Your stack is secure. Now make it earn. The next lesson adds Stripe: hosted versus embedded checkout, products and prices, subscriptions with monthly and yearly billing, and the strict test-versus-production discipline that lets you build payments without ever risking a real charge. ### 3.5 Stripe Part 1: Checkout, Subscriptions, Test vs Production - Canonical URL: https://agenticschool.dev/courses/modern-app-stack/stripe-part-1-checkout-subscriptions-test-vs-production - Duration: 30 min Summary: This is where your product can earn. Stripe handles payments so you never touch raw card data. This lesson covers embedded versus hosted Checkout, the products-and-prices model, subscriptions with monthly and yearly billing, the test cards you build against (4242 4242 4242 4242), and the strict separation of test and production that lets you charge customers confidently without risking real money in development. #### Summary Auth, data and secrets are in place. 
Now your product can make money. Stripe is the standard for taking payments, and the headline benefit is that you never touch a raw credit card number - Stripe collects it, so your compliance burden shrinks from terrifying to manageable. This lesson gets you to a working subscription: the two flavours of Checkout, how products and prices model your pricing, how subscriptions handle recurring billing including the monthly-versus-yearly toggle your pricing page needs, and the test-mode discipline that lets you build and verify the whole thing without moving a cent of real money. #### What you will learn You will learn the difference between hosted and embedded Checkout and when to use each, how Stripe's products-and-prices model maps to your pricing tiers, how subscriptions handle recurring monthly and yearly billing, how to build entirely in test mode using Stripe's test cards, and the clean checklist for flipping to production once everything works. #### Prerequisites A working app with auth and data, and the secrets discipline from the previous lesson, because Stripe keys are among the most sensitive you will ever hold - a leaked live key is a direct line to your money. You should also be comfortable that test and production are separate worlds, exactly as with Clerk. #### The problem Beginners assume taking payments means building a credit-card form and storing card numbers. That path is a nightmare: handling raw cards puts you under the full weight of PCI compliance, and a mistake exposes you to fraud and legal liability you are not equipped to carry. So people either avoid charging at all, leaving money on the table, or they build something unsafe. Stripe exists precisely so you never store a card number. You hand the payment step to Stripe, it collects the card on its own secure infrastructure, and it tells you the result. Your job shrinks to "set up products and react to what Stripe tells you", which is completely achievable. 
#### Hosted versus embedded Checkout Stripe Checkout is a prebuilt, secure payment page that Stripe maintains, so you get a polished, PCI-compliant payment flow without building a form. It comes in two flavours. Hosted Checkout redirects the customer to a Stripe-hosted page (checkout.stripe.com), they pay, and Stripe redirects them back to your success URL - simplest to set up, fully maintained by Stripe. Embedded Checkout renders the same secure payment experience inside your own page, so the customer never leaves your app, which can improve conversion and feels more native. Both are equally secure because Stripe still handles the card data either way; the difference is purely whether the payment happens on Stripe's page or embedded in yours. Start with hosted to get working fastest, then move to embedded for a smoother experience (Part 2 covers embedded in depth). - Hosted Checkout: redirect to a Stripe page, pay, redirect back. Fastest to ship, zero card handling. - Embedded Checkout: the same secure flow rendered inside your own app. Better conversion, more polish. - Both are equally secure - Stripe handles the card in both cases. The choice is about user experience. - You never see or store a card number with either option. #### Products and prices Stripe models your pricing with two linked objects: a product and one or more prices. A product is the thing you sell ("Pro plan"). A price is a specific way to pay for it ("20 USD per month" or "200 USD per year"). One product can have several prices - which is exactly how you build a pricing page with a monthly/yearly toggle: the same Pro product has a monthly price and a yearly price, and your toggle switches which price the Checkout uses. You create products and prices in the Stripe dashboard (or via the API), and each price gets an ID like price_xxx that you reference when starting a Checkout. 
Keeping prices as separate objects, rather than hard-coding amounts in your code, means you can change pricing in the dashboard without redeploying.

- Product: what you sell (the "Pro plan").
- Price: a way to pay for it (monthly vs yearly are two prices on the same product).
- A monthly/yearly toggle is just choosing between two price IDs at checkout time.
- Reference prices by their price_xxx ID; change amounts in the dashboard without touching code.

#### Subscriptions: monthly and yearly

A subscription is a price billed on a recurring schedule. When a customer checks out with a recurring price, Stripe creates a subscription and charges their card automatically every cycle - monthly or yearly - handling renewals, failed-payment retries and cancellations for you. This is the engine of SaaS revenue, and Stripe runs the billing cycle so you do not have to. For the pricing page this course's house style calls for, you model two recurring prices per tier (a monthly price and a yearly price), default the page to showing the yearly price per month, and let the toggle switch to monthly. Here is the shape of starting a subscription Checkout - the exact API surface evolves, so follow Stripe's current docs, but the structure stays stable.

```typescript
// Backend only (uses the SECRET key). Creates a Checkout Session for a
// recurring price. The price ID decides monthly vs yearly.
import Stripe from 'stripe'

const stripe = new Stripe(process.env.STRIPE_SECRET_KEY!)

export async function createSubscriptionCheckout(opts: {
  priceId: string // price_xxx for the chosen monthly OR yearly plan
  customerEmail: string
}) {
  return stripe.checkout.sessions.create({
    mode: 'subscription', // recurring billing, not a one-off payment
    line_items: [{ price: opts.priceId, quantity: 1 }],
    customer_email: opts.customerEmail,
    success_url: 'https://app.yoursite.com/welcome?session_id={CHECKOUT_SESSION_ID}',
    cancel_url: 'https://app.yoursite.com/pricing',
  })
}
```

A subscription Checkout session.
mode: subscription means recurring; the price ID picks monthly or yearly. This runs on the backend with your secret key, never in the browser. The session it returns is what you redirect the customer to (hosted) or render in your page (embedded). Either way, the card is collected by Stripe. #### Test mode and test cards Stripe gives you two completely separate modes - test and live - each with its own keys and its own data. In test mode you build and verify the entire payment flow without moving any real money, using special test card numbers that Stripe recognises. The famous one is 4242 4242 4242 4242: a Visa that always succeeds. Stripe also provides cards that simulate failures, declines and authentication challenges, so you can test the unhappy paths too. Use any future expiry date, any three-digit CVC, and any postal code. Do all your development here. Your test keys start with sk_test and pk_test, mirroring the same prefix convention you saw with Clerk. - 4242 4242 4242 4242 - succeeds every time. Your default for happy-path testing. - 4000 0000 0000 0002 - always declined, so you can test failure handling. - 4000 0025 0000 3155 - requires authentication (3D Secure), so you can test that flow. - Use any future expiry, any CVC, any postal code. Real cards do nothing in test mode. Test-mode data (customers, subscriptions, payments) is entirely separate from live data and never converts over. When you go live, you start with a clean live dashboard. That separation is a feature: you can experiment freely without ever fearing a real charge. #### Going to production Once everything works in test mode, going live is a short, careful checklist rather than a rebuild. You activate your Stripe account (Stripe needs your business and bank details to pay you out), recreate your products and prices in live mode if you only made them in test, swap your test keys for live keys in your deployed environment, and update any price IDs your code references to the live ones. 
Then you run one real, small transaction yourself to confirm the whole loop works end to end with live keys. Crucially, you keep test and live strictly separate - never paste a live key into a development env file. Part 2 adds the webhook setup that production billing genuinely needs, so treat going live as "ready to take payments" rather than "fully production-hardened" until you have done Part 2. - Activate your Stripe account with real business and bank details so payouts work. - Recreate products and prices in live mode; note the new live price_xxx IDs. - Swap sk_test / pk_test for sk_live / pk_live in your deployed environment only. - Do one small real transaction yourself to confirm the live loop, then refund it. - Do not call it done until Part 2 wires up webhooks - they are how your app learns what actually happened. #### Typical mistakes The frequent ones: putting the Stripe secret key in frontend code where every visitor can read it (it belongs on the backend only); hard-coding price amounts in code instead of referencing price IDs, so a price change means a redeploy; forgetting that test and live data never merge, then panicking that production looks empty; and the big one this course warns about repeatedly - thinking Checkout alone is enough. Without webhooks (Part 2), your app is guessing whether a payment really succeeded. Build in test mode, keep the secret key on the backend, and treat Part 2 as mandatory, not optional. #### Business ROI Payments are the moment your product stops being a cost and starts being a business. Stripe lets a solo founder accept money from anywhere in the world, on a recurring schedule, without a payments team or PCI auditors, in an afternoon. Subscriptions specifically turn one-time effort into recurring revenue, which is the entire financial appeal of SaaS - you build once and earn monthly. 
And the monthly-versus-yearly toggle is not cosmetic: yearly plans improve cash flow and reduce churn, which is why this course's house style defaults the pricing page to the yearly price. Getting paid well is as much a product decision as a technical one. #### Checklist You are ready for Part 2 when all of these are true in test mode. - You can explain why you never handle raw card data and what Checkout does for you. - You created a product with both a monthly and a yearly price. - A subscription Checkout works in test mode with card 4242 4242 4242 4242. - The Stripe secret key lives only on the backend, in a gitignored env file. #### Resources The Stripe docs and the test-cards reference are essential and always current - keep them open while you build. The Stripe CLI (used heavily in Part 2) is worth installing now. Your pricing page should follow this course's house style: yearly price shown per month by default, with a monthly/yearly toggle. Next, Part 2 makes billing actually reliable with webhooks. #### Your task In test mode, create a Pro product with a monthly and a yearly price, then build a subscription Checkout that a logged-in user can complete with the 4242 test card. Confirm the subscription appears in your Stripe test dashboard. Then try the declined card (4000 0000 0000 0002) and notice how your app handles the failure - that unhappy-path awareness is what separates a real billing flow from a demo. #### Next lesson Checkout is only half the story. The next lesson handles the parts that make billing trustworthy: webhooks (and why polling is the wrong instinct), signature verification, idempotency, proration when a customer upgrades mid-cycle, coupons and promotion codes, and embedded checkout. These are the details that separate a toy from a real billing system. 
### 3.6 Stripe Part 2: Webhooks, Proration, Coupons and Embedded Checkout - Canonical URL: https://agenticschool.dev/courses/modern-app-stack/stripe-part-2-webhooks-proration-coupons-and-embedded-checkout - Duration: 32 min Summary: Reliable billing depends on webhooks: Stripe telling your app what actually happened, instead of your app guessing. This lesson covers what webhooks are and why polling is wrong, webhook secrets and signature verification, idempotency so you never double-process an event, proration when a customer upgrades mid-cycle, coupons and promotion codes, embedded checkout, and the common webhook gotchas that trip everyone up. #### Summary In Part 1 you took a payment. But here is the uncomfortable truth: after redirecting a customer to Checkout, your app does not actually know what happened to them. Did they pay? Did the card fail later? Did they cancel next month? Guessing is how people accidentally give away the product for free or keep charging cancelled customers. The answer is webhooks: Stripe calls your app whenever something happens, so your app reacts to reality instead of guessing. This lesson makes your billing trustworthy: verified webhooks, idempotency, proration, coupons, embedded checkout, and the gotchas that bite everyone the first time. #### What you will learn You will learn what a webhook is and why polling Stripe is the wrong instinct, how to verify a webhook signature so attackers cannot fake events, how idempotency stops you double-processing a duplicated event, how proration fairly handles a mid-cycle plan change, the difference between coupons and promotion codes, how embedded checkout works, and the webhook gotchas that cause the classic "it worked in test but broke in production" failure. 
#### Prerequisites

Stripe Part 1 and a working subscription Checkout, plus a backend that can receive HTTP requests (your Convex actions or your framework's server routes), because a webhook is just Stripe sending a request to your backend. The secrets lesson, too, since the webhook signing secret is one more key to guard.

#### The problem

The naive instinct after Checkout is to poll: every few seconds, ask Stripe "did they pay yet?" This is wrong on every axis. It wastes requests, it is slow, it misses events that happen later (a renewal next month, a card that fails in 30 days, a cancellation), and it does not scale. Worse, relying on the success-URL redirect alone is unreliable - a customer can pay and then close the tab before the redirect fires, and now they paid but your app never recorded it. The right model is the reverse: do not ask, be told. Stripe pushes an event to your backend the instant something happens, and your app reacts. That is a webhook, and getting it right is what makes billing trustworthy.

#### What webhooks are and verifying signatures

A webhook is Stripe making an HTTP request to a URL you own whenever an event occurs: a payment succeeded, a subscription renewed, a card failed, a subscription was cancelled. Your backend listens at that URL and updates your database in response. But there is a catch: anyone on the internet could send a fake request to that URL pretending to be Stripe, trying to trick your app into granting free access. So you must verify that each request genuinely came from Stripe. Stripe signs every webhook with a secret (the webhook signing secret, starting with whsec_), and you verify that signature on every request. If verification fails, you reject the request. Never trust an unverified webhook. Here is the verification, which is non-negotiable.

```typescript
// Webhook handler (backend). Verify the signature BEFORE trusting anything.
import Stripe from 'stripe'

const stripe = new Stripe(process.env.STRIPE_SECRET_KEY!)
const webhookSecret = process.env.STRIPE_WEBHOOK_SECRET! // whsec_xxx

export async function handleStripeWebhook(req: Request) {
  const signature = req.headers.get('stripe-signature')!
  const rawBody = await req.text() // MUST be the raw body, not parsed JSON

  let event: Stripe.Event
  try {
    // Throws if the signature is invalid or the body was altered.
    event = await stripe.webhooks.constructEventAsync(rawBody, signature, webhookSecret)
  } catch {
    return new Response('Invalid signature', { status: 400 })
  }

  switch (event.type) {
    case 'checkout.session.completed':
      // Grant access / mark the user as subscribed in your database.
      break
    case 'customer.subscription.deleted':
      // Revoke access - they cancelled.
      break
    case 'invoice.payment_failed':
      // Warn the user / start a dunning flow.
      break
  }

  return new Response('ok', { status: 200 })
}
```

Always verify the signature with the raw body before trusting a webhook. An unverified webhook is an open door.

One detail that breaks people: you must verify against the raw request body, exactly as Stripe sent it. If your framework parses the body into JSON before you verify, the signature will not match and every webhook will fail. Read the raw body first, verify, then parse.

#### Idempotency: handle duplicates safely

Stripe guarantees it will deliver each event at least once, which means it might deliver the same event more than once - on a retry after a timeout, or a network hiccup. If your handler grants a month of free credits or sends a welcome email every time it sees checkout.session.completed, a duplicate delivery double-grants or double-emails. The fix is idempotency: design your handler so processing the same event twice has the same effect as processing it once. The simplest reliable approach is to record each event ID you have processed and skip it if you see it again.
Stripe also recommends responding quickly with a 200 and doing slow work afterward, so Stripe does not time out and retry unnecessarily. - Stripe delivers each event at least once, so duplicates happen - design for it. - Store processed event IDs; if you have seen one before, acknowledge it and skip the work. - Make actions safe to repeat: "set subscribed = true" is naturally idempotent; "add one month" is not. - Respond 200 fast; if you are slow or error, Stripe retries, which causes more duplicates. #### Proration on mid-cycle upgrades A customer on the monthly plan upgrades to the yearly plan, or moves from a cheaper tier to a pricier one, halfway through their billing period. What should they pay? Proration is Stripe computing the fair amount: it credits the unused portion of what they already paid and charges the difference for the new plan for the rest of the period. You do not calculate this yourself - you tell Stripe to update the subscription to the new price, and Stripe figures out the prorated charge or credit. Your job is to decide the behaviour (charge the difference immediately, or apply it to the next invoice) and to update access in your app when the webhook confirms the change. The mistake to avoid is computing prorated amounts by hand; let Stripe do the arithmetic and react to the resulting events. - Upgrade mid-cycle: Stripe credits the unused time and charges the prorated difference. - You update the subscription to the new price ID; Stripe computes the money. - Choose whether the proration is charged now or added to the next invoice. - React to the webhook (subscription updated, invoice paid) to sync access in your app. #### Coupons and promotion codes These two are related but not the same, and the distinction matters. A coupon is the underlying discount rule: "20 percent off" or "10 USD off, once". A promotion code is a customer-facing code (like LAUNCH20) that applies a coupon. 
You create a coupon, then create one or more promotion codes that point at it. The reason for two layers is flexibility: one coupon ("20 percent off") can back several codes with different restrictions (expiry, usage limit, first-time customers only). In Checkout you can enable a promotion-code field so customers type a code themselves, or you can apply a coupon directly to a session in code for an automatic discount. Use promotion codes for marketing campaigns you want customers to enter, and direct coupon application for discounts you grant programmatically.

- Coupon: the discount rule itself (percent off, fixed amount, duration).
- Promotion code: a customer-facing code (LAUNCH20) that applies a coupon.
- One coupon can back many promotion codes with different limits and expiries.
- Enable the promotion-code field in Checkout for customer-entered codes, or apply a coupon in code for automatic discounts.

#### Embedded checkout

Part 1 mentioned embedded Checkout; here is where it earns its place. Embedded Checkout renders Stripe's secure payment flow inside your own page instead of redirecting to a Stripe-hosted URL, so the customer stays on your domain the whole time. This usually lifts conversion (every redirect is a chance to lose someone) and feels more like part of your product. The mechanics: your backend creates a Checkout Session in embedded ui_mode and returns a client secret, and your frontend mounts Stripe's embedded component with that secret. The card is still collected entirely by Stripe, so you keep all the security and compliance benefits while owning the experience. Reach for embedded once your hosted flow works and you want to tighten conversion.

```typescript
// Backend: create an EMBEDDED checkout session and return its client secret.
const session = await stripe.checkout.sessions.create({
  ui_mode: 'embedded', // embedded, not a redirect
  mode: 'subscription',
  line_items: [{ price: priceId, quantity: 1 }],
  return_url: 'https://app.yoursite.com/welcome?session_id={CHECKOUT_SESSION_ID}',
})
// Send session.client_secret to the frontend, which mounts Stripe's embedded UI.
```

Embedded checkout keeps the customer on your domain while Stripe still handles the card.

#### Common webhook gotchas

These are the specific things that make webhooks "work in test but break in production", collected so you can avoid each one. Most production billing bugs trace back to one of these.

- Parsing the body before verifying: the signature check needs the raw body. Read raw first, verify, then parse.
- Using the wrong webhook secret: test and live each have a different whsec_, and the Stripe CLI gives a third one for local forwarding. Match the secret to the environment.
- Not handling duplicates: without idempotency, a retried event double-processes. Record processed event IDs.
- Trusting the success redirect instead of the webhook: the user can close the tab before the redirect, so grant access from the webhook, not the redirect.
- Slow or failing handlers: return 200 fast. If you take too long or error, Stripe retries, multiplying duplicates.
- Forgetting to register the production webhook endpoint: the test endpoint does not carry over to live mode. Add the live URL and its secret when you go live.

```bash
# Test webhooks locally with the Stripe CLI - it forwards events from your
# Stripe account to your machine and prints a whsec_ secret to use for local
# verification. ("Events" here means actual Stripe events, not live-mode data.)
stripe login
stripe listen --forward-to localhost:5296/api/stripe/webhook

# In another terminal, trigger a fake event to test your handler:
stripe trigger checkout.session.completed
```

The Stripe CLI forwards real events to localhost and lets you trigger test events - the standard way to develop webhooks.
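The "record processed event IDs" rule from the gotcha list can be sketched as below. This is a minimal illustration under stated assumptions, not Stripe's API: the BillingEvent type, its top-level userId field and the in-memory stores are all hypothetical stand-ins. A real handler would read the user reference from the verified Stripe event and keep processed IDs in a durable table.

```typescript
// Hypothetical, simplified event shape. A real Stripe event carries the user
// reference inside event.data.object, not as a top-level userId.
type BillingEvent = {
  id: string // evt_xxx, unique per event
  type: 'checkout.session.completed' | 'customer.subscription.deleted'
  userId: string
}

// Assumption: in-memory stores stand in for database tables. In production the
// processed-ID store must be durable, ideally with a unique constraint on id.
const processedEventIds = new Set<string>()
const subscribed = new Map<string, boolean>()
let welcomeEmailsSent = 0

function handleEventOnce(event: BillingEvent): 'processed' | 'duplicate' {
  // Seen before: acknowledge and skip the work.
  if (processedEventIds.has(event.id)) return 'duplicate'

  switch (event.type) {
    case 'checkout.session.completed':
      subscribed.set(event.userId, true) // naturally idempotent
      welcomeEmailsSent += 1 // NOT idempotent, so the ID check above guards it
      break
    case 'customer.subscription.deleted':
      subscribed.set(event.userId, false)
      break
  }

  // Record the ID only after the work succeeded, so a failed attempt is retried.
  processedEventIds.add(event.id)
  return 'processed'
}
```

Either return value should map to a 200 response: a duplicate is not an error, and answering it with a failure status would only make Stripe retry again.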
#### Typical mistakes Beyond the gotcha list: polling Stripe instead of using webhooks; trusting an unverified webhook and letting an attacker fake a "payment succeeded" event to get free access; computing proration by hand instead of letting Stripe do it; confusing coupons and promotion codes; and shipping embedded checkout without first getting the hosted flow and webhooks solid. The throughline: be told, do not ask; verify everything; let Stripe do the money maths; and treat duplicates as inevitable. #### Business ROI This lesson is the difference between billing that quietly loses you money and billing you can trust. Without verified webhooks, you either give away access you were never paid for or keep charging people who cancelled - both are direct revenue and reputation damage. Idempotency prevents the embarrassing double-charge or double-grant. Proration done by Stripe keeps upgrades fair, which removes friction from the most valuable action a customer can take: spending more. And embedded checkout plus promotion codes are direct conversion levers. Spending a day to get webhooks right protects every dollar that flows through your product from here on. #### Checklist Your billing is production-grade when all of these hold. - You verify every webhook signature against the raw body before acting on it. - Your handler is idempotent - a duplicated event causes no double-processing. - You grant and revoke access from webhooks, not from the success redirect. - Upgrades use Stripe proration, and you have a working promotion code and an embedded or hosted flow live. #### Resources The Stripe webhooks docs, the events reference, and the Stripe CLI are your core tools here - the CLI especially makes local webhook development painless. Keep your test and live webhook signing secrets clearly labelled so you never cross them. Next, the final lesson takes the whole stack live: dev-to-prod migration for Clerk and Convex, DNS, Cloudflare, Search Console and performance. 
#### Your task Install the Stripe CLI, run stripe listen to forward events to your local backend, and build a webhook handler that verifies the signature and grants access on checkout.session.completed. Trigger a duplicate of the same event and confirm your handler does not double-process it. Then enable a promotion-code field in your Checkout and test a coupon. You now have billing that reacts to reality instead of guessing. #### Next lesson Your product takes payments reliably. The final lesson of the course takes everything live: migrating Clerk and Convex from dev to prod, wiring DNS and Cloudflare, verifying your site in Google Search Console and submitting your sitemap, and getting your Lighthouse scores into the green so the live product is fast and discoverable. ### 3.7 Going Live: Dev-to-Prod Migration, Search Console and Performance - Canonical URL: https://agenticschool.dev/courses/modern-app-stack/going-live-dev-to-prod-migration-search-console-and-performance - Duration: 30 min Summary: The launch lesson for the modern stack. You will migrate Clerk and Convex from development to production without breaking logins or data, wire up your domain and DNS with Cloudflare in front, verify your site in Google Search Console and submit your sitemap so you get indexed, and tune Lighthouse and the page-speed basics that actually matter so the live product is fast and discoverable. #### Summary You have built the whole stack: architecture, auth, data, secrets and payments. Now you take it live and make sure the world can find it. Going live is not a single button - it is migrating each service from its development instance to production in a careful order, pointing your domain at the right places through Cloudflare, telling Google your site exists, and making it fast. Do it in the right order and launch day is calm. Do it ad hoc and you get the classic launch-day failure where one missed key takes down logins or billing while customers are watching. 
This lesson gives you the calm version. #### What you will learn You will learn a safe order for migrating Clerk and Convex from development to production, how to wire your domain and DNS with Cloudflare in front of your app, how to verify your site in Google Search Console and submit your sitemap so you get indexed, and which Lighthouse and Core Web Vitals basics actually matter for speed, SEO and conversion. #### Prerequisites A complete stack from this course - auth, data and payments, all working in development - since going live ties all three together. The Course 1 deploy lesson, because Vercel, DNS and Cloudflare appeared there and this lesson builds on them. And the production setup pieces from the Clerk, Stripe and secrets lessons, which you now bring together. #### The problem Migration is where confident builders get humbled. Each service has its own dev and prod worlds (Clerk instances, Convex deployments, Stripe modes), and switching them in the wrong order or forgetting one key leaves a half-live app where users can sign up but their data vanishes, or they can pay but never get access. Meanwhile, even a perfect launch is invisible if Google does not know your site exists and your pages are slow. Founders pour weeks into building and then fumble the last mile, so the product launches broken or unfindable. This lesson is that last mile, done in a deliberate order. #### A safe migration order Migrate one service at a time and verify each before the next, so if something breaks you know exactly which step caused it. The safe order is: domain and DNS first (so the address exists), then Convex (so data has a home), then Clerk (so auth points at the live domain and the live database), then Stripe (so payments grant access to the now-live system), then deploy with all the live keys in place. After each switch, test the real flow before moving on. 
The single most common failure is leaving one test key in the deployed environment, so do a final sweep of every environment variable and confirm not a single sk_test, pk_test or test webhook secret survives in production. - Domain and DNS first: register the domain, point it through Cloudflare at your app and marketing site. - Convex next: create the production deployment, set its env vars, deploy your schema and functions to it. - Clerk next: create the production instance, add its DNS records, configure Google OAuth, swap to pk_live / sk_live. - Stripe last: activate the account, recreate products and prices in live mode, register the live webhook endpoint, swap to live keys. - Final sweep: confirm zero test keys remain in production env vars, then deploy and test the full signup-to-payment flow live. #### Domain, DNS and Cloudflare in front Your marketing site and your app live at different addresses, usually the bare domain (yoursite.com) for marketing and a subdomain (app.yoursite.com) for the app, exactly as the architecture lesson described. You point both through DNS, and a common, strong pattern is to put Cloudflare in front: manage DNS at Cloudflare, get its free HTTPS and global CDN, and route each name to the right place (Vercel for both surfaces, typically). Remember the Clerk subdomains from lesson 2 - those CNAME records usually want to be DNS-only (grey cloud) rather than proxied, so they do not break the auth TLS handshake. Cloudflare in front gives you speed and basic protection for free, which is why most builders use it. Take care that you are not double-proxying or double-handling HTTPS, the gotcha from Course 1. 
```text
; Example DNS layout (real targets come from Vercel / Clerk)
; Type  Name      Value / target               Proxy
CNAME   @         cname.vercel-dns.com         proxied   ; marketing site
CNAME   app       cname.vercel-dns.com         proxied   ; the app
CNAME   clerk     frontend-api.clerk.services  DNS only  ; auth - do NOT proxy
CNAME   accounts  accounts.clerk.services      DNS only  ; auth - do NOT proxy
```

Marketing and app can be proxied through Cloudflare; Clerk auth subdomains stay DNS-only to keep TLS working.

#### Google Search Console and your sitemap

A live site that Google does not know about gets no search traffic. Google Search Console is the free tool that tells Google your site exists, shows how it indexes you, and surfaces problems early. The flow: add your site as a property, verify you own it (the easiest method is usually a DNS TXT record Search Console gives you, which you add at Cloudflare), then submit your sitemap URL. Your sitemap is the machine-readable list of your pages - this project already generates one with the generate:sitemaps script - and submitting it tells Google exactly what to crawl. After that, Search Console becomes your early-warning system: it shows indexing coverage, which queries you appear for, and any crawl errors, so you catch SEO problems before they cost you traffic.

```text
; Verify ownership in Search Console with a TXT record (value comes from Google)
; Type  Name  Value
TXT     @     google-site-verification=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

; Then submit your sitemap URL in the Search Console UI:
;   https://yoursite.com/sitemap.xml
```

Verify with a DNS TXT record, then submit your sitemap so Google knows what to crawl.

Indexing is not instant. After submitting, Google takes days to weeks to crawl and index a new site, and Search Console shows the progress. Submitting the sitemap and fixing any errors it reports is the highest-leverage SEO action you can take at launch.
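To make "machine-readable list of your pages" concrete, here is a minimal sketch of what a sitemap generator emits. It is not the project's generate:sitemaps script, which this lesson does not show. The `buildSitemap` and `escapeXml` helpers and the example URLs are hypothetical, and the only thing assumed from the sitemaps.org protocol is the basic `urlset` / `url` / `loc` structure.

```typescript
// Hypothetical helpers: NOT the project's generate:sitemaps script.

// Escape the five characters that are special in XML, so URLs with
// query strings (for example "?a=1&b=2") stay valid inside <loc>.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;')
}

// Build a minimal sitemap.xml body from a list of absolute page URLs.
function buildSitemap(urls: string[]): string {
  const entries = urls
    .map((url) => `  <url><loc>${escapeXml(url)}</loc></url>`)
    .join('\n')
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    entries,
    '</urlset>',
  ].join('\n')
}

// Placeholder pages for illustration only.
const sitemap = buildSitemap([
  'https://yoursite.com/',
  'https://yoursite.com/pricing',
])
console.log(sitemap)
```

Whatever generates your real sitemap, the file you submit in Search Console should have this shape: one `<url>` entry per page you want crawled, each with an absolute, escaped URL.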
#### Lighthouse and the page-speed basics that matter Lighthouse is the free audit built into Chrome (and runnable in CI) that scores your pages on performance, accessibility, best practices and SEO. Fast pages convert better and rank better - Google uses Core Web Vitals as a ranking signal, and AI crawlers favour fast, clean pages too - so treating speed as part of launch pays off twice. You do not need to chase a perfect 100 on every metric, but you should understand the handful of things that actually move the needle, because most slow sites are slow for the same few reasons. Run Lighthouse on your marketing pages especially, since those are your funnel's front door and where speed matters most. - Images: the number one culprit. Serve modern formats (WebP/AVIF), size them correctly, and lazy-load below-the-fold images. - JavaScript: ship less of it. This is exactly why the marketing site uses a content-first framework - fewer scripts, faster pages. - Largest Contentful Paint (LCP): how fast the main content appears. Optimise your hero image and fonts to improve it. - Cumulative Layout Shift (CLS): things jumping around as the page loads. Reserve space for images and ads so nothing shifts. - Fonts: load them efficiently and avoid invisible-text flashes. Self-hosting and preloading help. - Caching and the CDN: Cloudflare in front already serves static assets fast worldwide - lean on it. #### Typical mistakes The launch-day classics: leaving a test key in production so logins or payments silently fail; migrating services in a tangled order so you cannot tell what broke; proxying Clerk's auth subdomains through Cloudflare and breaking the TLS handshake; forgetting to register the live Stripe webhook endpoint, so production payments never grant access; never verifying the site in Search Console or submitting the sitemap, so the launch is invisible to Google; and shipping heavy, unoptimised images that tank Lighthouse and conversion. 
Every one is prevented by the ordered checklist and a final environment-variable sweep. #### Business ROI The last mile is where all your building either pays off or quietly fails. A clean migration means launch day is boring instead of an outage in front of your first customers. Search Console and the sitemap are the difference between a product that compounds traffic over months and one that nobody ever finds. And page speed is a direct lever on both conversion and ranking - faster pages literally make more money and rank higher, every single day they are live. For a founder, getting the last mile right protects the entire investment of building the product in the first place. #### Checklist You have completed Course 3 when all of these are true on the live product. - Clerk, Convex and Stripe are all on production, and zero test keys remain in production env vars. - Your domain serves the marketing site and app over HTTPS, with Cloudflare in front and Clerk subdomains DNS-only. - Your site is verified in Google Search Console and your sitemap is submitted. - Lighthouse on your marketing pages is healthy, with images and JavaScript optimised. #### Resources Keep the Clerk and Convex production-deployment guides, the Stripe go-live checklist, and Google Search Console open during launch - each is current where this course defers to docs. Run Lighthouse from Chrome DevTools or your CI on every marketing page. The Course 1 deploy lesson is your reference for Vercel, DNS and Cloudflare basics. You can now build and launch a complete, paid product end to end. #### Your task Take a project from this course live, even a small one. Migrate Clerk and Convex to production in the safe order, point your domain through Cloudflare, do the final env-var sweep to kill every test key, verify the site in Search Console and submit the sitemap, then run Lighthouse on your homepage and fix the two biggest issues it reports. 
Doing the full last mile once turns launch from a scary unknown into a routine checklist for every product you build after this. #### Next lesson You can now build and launch a real, paid product end to end: architecture, auth, data, secrets, payments and a clean go-live. Course 4 moves beyond a single app into automation and agentic systems - n8n and workflow tools, browser automation, sandboxes, and building your own AI tools that run work for you while you sleep. ### 4.1 Workflow Automation: n8n, Zapier and Trigger.dev - What to Use When - Canonical URL: https://agenticschool.dev/courses/automation-agentic-systems/workflow-automation-n8n-zapier-and-trigger-dev-what-to-use-when - Duration: 26 min Summary: Not everything should be a custom app. Automation platforms connect services and run workflows on triggers. This lesson compares n8n, Zapier and Trigger.dev, covers self-hosting n8n on a cheap VPS for control and cost, and shows how to generate workflow JSON with Claude Code instead of clicking through a builder. #### Summary Before you build a custom app, ask whether a workflow tool can do the job in an afternoon. Automation platforms sit between your services and fire a chain of steps whenever something happens: a form is submitted, a row changes, a timer ticks. The three that matter in 2026 are Zapier (easiest, hosted, pay per task), n8n (flexible, self-hostable, the builder favourite) and Trigger.dev (code-first jobs that live next to your app). This lesson gives you a decision rule, the real cost picture, and the trick that makes all of them faster: have your agent write the workflow JSON instead of dragging nodes around. #### What you will learn You will learn how the three platforms differ in philosophy and cost, why self-hosting n8n on a 5 to 10 dollar VPS often beats per-task pricing once volume grows, when Trigger.dev is the right call, and how to describe a workflow to Claude Code and get importable JSON back. 
By the end you can pick a platform in seconds and stand one up the same day. #### Prerequisites Courses 1 to 3. You need the API and webhook basics, a comfort with environment variables and secrets, and ideally a deployed project from Course 3 so an automation has something real to talk to. If the words trigger, webhook and cron are not yet second nature, skim the Fundamentals page on what an API is first. #### The problem Beginners reach for one of two extremes. Either they wire everything into Zapier, watch the per-task bill climb as volume grows, and feel trapped. Or they decide to build a custom backend for a job that is really just "when a Typeform comes in, add a row and send a Slack message" - three weeks of work for something a workflow tool does in twenty minutes. The skill is knowing which jobs are workflow-shaped and which platform fits the control, cost and code you actually want. #### Choosing a platform Think in three questions: how much do you care about cost at volume, how much control do you need, and how much code are you willing to write. Zapier optimises for getting live in five minutes with thousands of pre-built integrations. n8n optimises for flexibility and cost control because you can self-host it. Trigger.dev optimises for developers who want their background jobs versioned in their own repo. Match the platform to the answer, not to brand familiarity. - Zapier: hosted, the easiest, the widest integration catalogue. Pricing is per task (every step that runs counts), which is fine at low volume and painful at high volume. Use it for quick glue between SaaS tools where you run hundreds of tasks a month, not millions. - n8n: open-source and self-hostable, a visual node editor like Zapier but you own the instance. Hosted cloud option exists, but the win is running it yourself for a flat VPS cost. Use it when volume is real, you want full control, or you need custom code inside steps. - Trigger.dev: code-first. 
You write background jobs in TypeScript in your own repo, with retries, scheduling and long-running tasks handled for you. Use it when the automation is really part of your app and belongs in version control next to it, not in a separate clicky UI.
- Rule of thumb: prototype in Zapier, graduate high-volume glue to self-hosted n8n, and put app-coupled jobs in Trigger.dev.

#### The cost trap and self-hosting n8n

Per-task pricing is the thing that bites you. A workflow with six steps that runs 5,000 times a month is 30,000 tasks, and on a hosted plan that adds up fast. Self-hosting n8n flips this to a flat cost: a small VPS on a provider like Hostinger, Hetzner or a basic DigitalOcean droplet runs roughly 5 to 10 dollars a month and will happily execute tens of thousands of workflow runs. You trade a little setup for predictable cost and full control, including custom code nodes and access to your own database on the same box. Here is a minimal Docker Compose sketch to stand n8n up with a persistent volume so your workflows survive a restart.

```yaml
# docker-compose.yml - minimal self-hosted n8n on a small VPS
services:
  n8n:
    image: docker.n8n.io/n8nio/n8n
    restart: always
    ports:
      - '5678:5678'
    environment:
      - N8N_HOST=your-domain.com
      - N8N_PORT=5678
      - N8N_PROTOCOL=https
      - WEBHOOK_URL=https://your-domain.com/
      # Basic-auth variables apply to n8n releases before 1.0. From 1.0 on,
      # n8n relies on its built-in owner account instead, so check the docs
      # for your version. Either way, keep it behind a reverse proxy.
      - N8N_BASIC_AUTH_ACTIVE=true
      - N8N_BASIC_AUTH_USER=admin
      - N8N_BASIC_AUTH_PASSWORD=${N8N_PASSWORD}
    volumes:
      # Persist credentials and workflows across restarts
      - n8n_data:/home/node/.n8n

volumes:
  n8n_data:
```

A starting docker-compose.yml for self-hosted n8n. Put it behind a reverse proxy with HTTPS and never expose it without auth.

Run it with docker compose up -d, point a subdomain at the box, and put it behind a reverse proxy (Caddy or Nginx) for HTTPS. The security mindset from Course 3 applies: an open n8n instance with no auth is a remote-code-execution machine for anyone who finds it.
Lock it down before you put a single credential in it.

#### Generating workflow JSON with your agent

Here is the part that makes you faster than people who live in these builders. n8n and Zapier both store workflows as JSON under the hood. That means you do not have to drag nodes around at all - you can describe the workflow to Claude Code, have it produce the JSON, and import it. For anything beyond three steps this is dramatically quicker, and it puts your automation in a file you can version, diff and reuse. The trick is giving the agent the node schema to follow, which you get by exporting one example workflow from the UI first.

```markdown
## Goal
An n8n workflow JSON I can import directly.

## Trigger
Webhook node. It receives a POST with { email, name, plan }.

## Steps
1. Validate that email contains "@"; if not, respond 400.
2. Insert a row into Postgres (table: leads) with the three fields plus a created_at timestamp.
3. Send a Slack message to #new-leads: "New lead: {name} on {plan}".
4. Respond 200 with { ok: true }.

## Constraints
- Match the exact node JSON structure of the example I pasted below.
- Use n8n expression syntax ={{ $json.email }} for field references.
- Do not invent credential IDs; leave them as placeholders I will fill in.

## Example node structure
```

A spec you can hand to Claude Code to get importable n8n workflow JSON. Always paste one real exported example so it matches the current schema.

The same pattern works for Trigger.dev, except the output is TypeScript in your repo rather than JSON in a UI, which is even more natural for an agent. Always import the generated JSON into a test workflow first and run it once before trusting it in production, because node schemas drift between n8n versions and the agent works from whatever example you gave it.
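Before importing agent-generated JSON, a quick structural check can catch the most common slips, such as a connection pointing at a node the agent renamed or forgot. The sketch below assumes the export has a top-level `nodes` array and a `connections` object keyed by source node name. Confirm that against your own exported example, since schemas drift between n8n versions. The `checkWorkflow` helper and the sample workflow are hypothetical.

```typescript
// Assumed, simplified shape of an exported n8n workflow.
// Verify it against a real export from your own instance.
interface WorkflowNode {
  name: string
  type: string
}
interface ConnectionTarget {
  node: string
}
interface Workflow {
  nodes: WorkflowNode[]
  // source node name -> output type (e.g. "main") -> output branches -> targets
  connections: Record<string, Record<string, (ConnectionTarget[] | null)[]>>
}

// Return human-readable problems. An empty list means the structure is
// internally consistent, not that the workflow does what you want.
function checkWorkflow(workflow: Workflow): string[] {
  const problems: string[] = []
  const names: string[] = []
  for (const node of workflow.nodes) {
    if (names.indexOf(node.name) !== -1) {
      problems.push(`Duplicate node name: ${node.name}`)
    }
    names.push(node.name)
  }
  for (const source of Object.keys(workflow.connections)) {
    if (names.indexOf(source) === -1) {
      problems.push(`Connection from unknown node: ${source}`)
    }
    const outputs = workflow.connections[source]
    for (const outputType of Object.keys(outputs)) {
      for (const branch of outputs[outputType]) {
        for (const target of branch || []) {
          if (names.indexOf(target.node) === -1) {
            problems.push(`Connection to unknown node: ${target.node}`)
          }
        }
      }
    }
  }
  return problems
}

// Hypothetical two-node workflow for illustration.
const sample: Workflow = {
  nodes: [
    { name: 'Webhook', type: 'n8n-nodes-base.webhook' },
    { name: 'Postgres', type: 'n8n-nodes-base.postgres' },
  ],
  connections: {
    Webhook: { main: [[{ node: 'Postgres' }]] },
  },
}
console.log(checkWorkflow(sample))
```

A check like this is cheap to run on every generated file, but it only complements the test import: n8n itself is the final judge of whether the JSON is valid.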
#### Typical mistakes The recurring errors: defaulting to Zapier for everything and discovering the per-task bill only at scale; self-hosting n8n with no auth and no HTTPS, which is a security hole, not a saving; building a custom app for a job that was always workflow-shaped; and trusting agent-generated JSON without importing and running it once. Also watch for putting secrets directly in workflow nodes instead of the platform credential store - that leaks them on every export. #### Business ROI Workflow tools are the highest-leverage automation you can ship without writing a backend. A single n8n flow can replace a part-time job: receive leads, enrich them, route them, notify a human, log everything. Moving high-volume glue from per-task Zapier to a self-hosted n8n box can cut a recurring bill from hundreds of dollars a month to under ten, with no loss of capability. And generating the JSON with your agent compresses a half-day of clicking into a ten-minute spec, so you build five automations in the time others build one. #### Checklist You are ready to move on when you can answer these without hesitation, because the rest of this course assumes you can stand up an automation. - Can you name the right platform for a quick SaaS glue job, a high-volume flow, and an app-coupled background job? - Do you know why per-task pricing hurts at scale and roughly what a self-hosted n8n VPS costs? - Can you stand up n8n with Docker Compose, behind auth and HTTPS? - Can you write a spec that gets importable workflow JSON out of your agent? #### Resources Bookmark the official n8n, Zapier and Trigger.dev docs, because node names and pricing change and you want the current source, not a memory. The n8n self-hosting and Docker docs are the canonical reference for the Compose setup above. When an agent-generated workflow misbehaves, your first move is always to export a fresh example from the running instance and re-prompt against it. 
#### Your task Pick one real, repetitive task in your work that fits the "when X happens, do Y and Z" shape. Build it twice: once by hand in a free Zapier or n8n account to feel the builder, then again by having your agent generate the JSON from a spec. Note which was faster and which you would actually maintain. That comparison tells you how you will build every automation from here. #### Next lesson Automations constantly need to act on websites that have no API. The next lesson covers browser automation and scraping with Playwright, the non-headless manual-login trick, and the .har trick for finding the real API hiding behind a page. ### 4.2 Browser Automation and Scraping: Playwright, Browser Use and the .har Trick - Canonical URL: https://agenticschool.dev/courses/automation-agentic-systems/browser-automation-and-scraping-playwright-browser-use-and-the-har-trick - Duration: 28 min Summary: When there is no API, the browser is your API. This lesson covers automating the web with Playwright and agent-driven browsing, scraping responsibly, the non-headless trick of launching a visible browser and logging in by hand before automating, and the .har trick: recording network traffic in DevTools and letting an agent extract the real API endpoints from it. #### Summary Most of the web has no public API, but every site is already an interface a human can drive, which means a program can drive it too. Playwright is the tool that does this: it controls a real browser the way a user would, clicking, typing and reading the page. On top of that sit two tricks that turn fragile scraping into reliable automation. The non-headless trick lets you log in by hand once and reuse the session. The .har trick lets you skip the rendered page entirely and call the API the site itself is calling. This lesson teaches all three, plus when to reach for an agent-driven browser and how to stay on the right side of ethics and the law. 
#### What you will learn You will learn Playwright basics, how agent-driven browsing (Browser Use and similar) differs from scripted automation, the visible-browser manual-login pattern that sidesteps the hardest part of authenticated sites, and how to record a .har file and let an agent extract the underlying API calls. You will also learn where the legal and ethical lines are and when to hand scale off to a provider like Brightdata. #### Prerequisites Courses 1 to 3 and a working Node project. You should be comfortable with async code, how HTTP requests and headers work, and reading your browser DevTools. The Fundamentals page on what an API is helps if request and response are still fuzzy, because the whole .har trick is about finding the API a page hides. #### The problem You need data or an action from a site that gives you no API. The naive approach is to scrape the rendered HTML with brittle selectors that break the moment the site changes a class name. The harder cases are sites behind a login, with bot detection, or that load everything through JavaScript so the HTML is empty until scripts run. People burn days fighting selectors and login flows when the site is quietly making clean JSON API calls the whole time, visible in DevTools, that they could call directly. #### Playwright basics Playwright launches and controls Chromium, Firefox or WebKit. You tell it to go to a URL, wait for elements, click, type and read text or attributes. It auto-waits for elements to be ready, which removes most of the flakiness that plagued older tools. The same library powers the end-to-end tests from your quality gates, so the muscle is reusable. Here is the shape of a basic script that loads a page and pulls some data. 
```typescript
import { chromium } from 'playwright'

const browser = await chromium.launch()
const page = await browser.newPage()
await page.goto('https://example.com/products')

// Wait until the product cards have rendered before reading them
await page.waitForSelector('.product-card')

const titles = await page.$$eval('.product-card h2', (nodes) =>
  nodes.map((n) => n.textContent?.trim()),
)
console.log(titles)

await browser.close()
```

A minimal Playwright scrape: load a page, wait for elements, read text. Auto-waiting handles most timing flakiness.

#### The non-headless manual-login trick

The hardest part of automating an authenticated site is the login: passwords, two-factor codes, captchas and bot detection all fight you, and hard-coding credentials into a script is both fragile and a security risk. The trick is to skip automating login entirely. Launch the browser non-headless (visible), navigate to the login page, and pause the script while you log in by hand. Once you are in, Playwright saves the authenticated session (cookies and storage) to a file, and every future run loads that file and is already logged in. No stored passwords, no captcha-solving, no brittle login flow.

```typescript
import { chromium } from 'playwright'

// One-time: log in by hand, then save the session.
const browser = await chromium.launch({ headless: false }) // visible
const context = await browser.newContext()
const page = await context.newPage()
await page.goto('https://app.example.com/login')

console.log('Log in manually in the window; the script continues once the dashboard loads...')
// Wait for YOU to finish login before continuing (timeout: 0 disables the timeout)
await page.waitForURL('**/dashboard', { timeout: 0 })

// Save the authenticated session for all future runs
await context.storageState({ path: 'auth.json' })
await browser.close()

// Every later run starts already logged in:
// const context = await browser.newContext({ storageState: 'auth.json' })
```

The non-headless login trick: launch visible, log in by hand once, save the session, then reuse it headlessly until it expires.

Keep auth.json out of Git (it is a live session token) and treat it like a secret. Sessions expire, so re-running the one-time login step occasionally is normal. This pattern is the single biggest reliability upgrade for authenticated automation.

#### Agent-driven browsing

Scripted Playwright is precise but brittle: change the page and your selectors break. Agent-driven browser tools like Browser Use take the opposite approach. You give an agent a goal in plain language ("find the cheapest flight from Zurich to London next Friday and add it to the cart") and it reads the page, decides what to click, and adapts when the layout differs from what it expected. It is slower and less deterministic, but it survives site changes and handles tasks you cannot fully specify in advance. The honest trade-off: use scripted Playwright for stable, high-volume, repeatable jobs where you control the target, and an agent-driven browser for one-off or fuzzy tasks where adaptability beats speed. Many real systems combine them - an agent to discover the flow once, then a generated script to run it cheaply at scale.

#### The .har trick

This is the highest-leverage move in the whole lesson.
Before you scrape any rendered HTML, open DevTools, go to the Network tab, perform the action you want to automate, then right-click and "Save all as HAR". A .har file is a complete recording of every network request the page made, including the JSON API calls behind the scenes. Most modern sites render from clean internal APIs, which means the data you want is often sitting in a tidy JSON response you can call directly - no selectors, no headless browser, far faster and far more robust. The file is large and noisy, which is exactly where an agent shines: hand it the .har and ask it to extract the relevant endpoints.

```markdown
## Task
Here is a .har file I recorded while loading the product list page.
Extract the underlying API the page uses to fetch products.

I want:
1. The exact request URL and method.
2. Which headers actually matter (auth token, content-type) vs noise.
3. Query params or request body, with what each field appears to mean.
4. The shape of the JSON response (the fields I care about: name, price, id).
5. A minimal fetch() call in TypeScript that reproduces just that request.

Ignore analytics, fonts, images and tracking pixels. Focus only on the call that returns the product data.
```

A prompt that turns a noisy .har export into a clean, replayable API call. The agent does the tedious filtering for you.

Once the agent gives you the fetch call, you have replaced a fragile browser scrape with a direct API call that is faster, cheaper and far more stable. If the API needs an auth token, you grab it from the non-headless session above. This combination - manual login for the token, .har trick for the endpoint - handles a huge fraction of real-world "this site has no API" problems.

#### Ethics, legality and scale

Automating the web is powerful and not consequence-free, so be deliberate. This section is guidance, not legal advice - when money or personal data is involved, talk to a lawyer.

- Read the terms of service and robots.txt.
Some sites prohibit automated access, and ignoring that can breach a contract or, in some jurisdictions, computer-misuse law. - Never scrape personal data without a lawful basis. The privacy laws from CLAUDE.md (GDPR, FADP, US state laws) apply to scraped data the same as any other. - Rate-limit yourself and identify honestly. Hammering a site is abusive and gets you blocked; a polite, slow scraper is a better citizen and more reliable. - For legitimate scale, use a provider like Brightdata. They handle proxy rotation, geo-distribution and compliance tooling, which is the responsible way to run large jobs rather than building a covert bot army. #### Typical mistakes The classics: scraping brittle HTML when a clean JSON API was sitting in the .har the whole time; hard-coding credentials instead of using the non-headless session trick; committing auth.json or a token to Git; running an agent-driven browser for a stable high-volume job where a cheap script would do; and ignoring terms of service and rate limits until you get blocked or worse. Check the Network tab before you write a single selector. #### Business ROI Browser automation unlocks data and actions that would otherwise require an integration the vendor never built. The .har trick alone can turn a multi-day scraping project into an afternoon, because you skip the rendered page and call the real API. The non-headless trick removes the most fragile part of any authenticated automation. Together they let a small team automate competitor research, internal data entry across legacy systems, and repetitive vendor-portal tasks that have no API - work that otherwise stays manual forever. #### Checklist Confirm you can do each of these before moving on, because the next lesson assumes you can run untrusted code safely. - Write a basic Playwright script that loads a page and reads data. - Use the non-headless trick to log in once and reuse the session headlessly. 
- Explain when to use agent-driven browsing instead of a fixed script. - Record a .har file and extract a replayable API call from it with your agent. - State two legal or ethical limits you will always respect when scraping. #### Resources Keep the Playwright docs bookmarked for selector and auto-wait reference, and the Browser Use docs for agent-driven browsing. Your browser DevTools Network tab is the single most useful tool in this lesson - learn to record and save a HAR from it fluently. For large or compliance-sensitive jobs, Brightdata and similar providers are the documented, supported path. #### Your task Pick a site you use that has no API. Record a .har of the action you care about, hand it to your agent with the prompt above, and see whether you can reproduce the action with a direct fetch call. If the site needs a login, save a session with the non-headless trick first. Write down which approach (script, agent-driven, or direct API from the .har) you would ship. #### Next lesson Running code an agent generated - or that a scrape pulled in - on your own machine is risky. The next lesson covers sandboxes: isolated, disposable environments from E2B, Daytona and similar tools that contain a bad run so it cannot touch your system. ### 4.3 Sandboxes: Safe Code Execution with E2B, Daytona and Co. - Canonical URL: https://agenticschool.dev/courses/automation-agentic-systems/sandboxes-safe-code-execution-with-e2b-daytona-and-co - Duration: 24 min Summary: Letting an agent run arbitrary code on your machine is a real risk, not a hypothetical one. A sandbox gives the code an isolated, disposable environment so a bad or hostile run cannot touch your system or data. This lesson explains why sandboxing matters, what a sandbox actually is, how E2B and Daytona work, and exactly when you need one. 
#### Summary An agent that can write code is also an agent that wants to run it, and the moment you let arbitrary code execute on your laptop or server, one bad line can delete files, leak secrets from your environment, or open a connection you did not intend. A sandbox solves this by giving the code a fresh, isolated, throwaway computer to run on. If the run goes wrong, you discard the sandbox and lose nothing. This lesson explains the risk honestly, defines what a sandbox is, introduces the main services, and gives you a clear rule for when to use one and when a local run is fine. #### What you will learn You will learn why executing AI-generated code on your own machine is a security and reliability hazard, what a sandbox actually provides (isolation, disposability, resource limits), how managed services like E2B and Daytona spin one up on demand for your agent, and the three concrete situations that always call for a sandbox. You will leave able to decide, per task, whether code can run locally or must be contained. #### Prerequisites Courses 1 to 3, especially the secrets and security material from Course 3. You should understand that your machine holds environment variables, SSH keys and logged-in sessions that any local process can read, because that is precisely what a sandbox protects. The API basics matter too, since you drive these services through their SDKs. #### The problem Agentic workflows generate code and then need to run it to see if it works, transform some data, or fulfil a request. The temptation is to just run it - and for code you wrote and reviewed, that is fine. But agent output is not always correct, and if any input came from a user or the open web, it is not always friendly either. A single rm command, an infinite loop that pins your CPU, a script that reads process.env and phones it home: these are not exotic, they are the default failure modes of running untrusted code with full access to your real system. 
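To see how little effort the "reads process.env" failure mode takes, the sketch below lists the environment variable names that look like secrets. Any script running locally could do the same against the real `process.env`. The name patterns and the `secretLikeNames` helper are illustrative assumptions, not a complete detector, and the sample environment is made up.

```typescript
// Illustrative only: these patterns are an assumption about how secrets
// are commonly named, not an exhaustive list.
const SECRET_PATTERNS = [/KEY/i, /SECRET/i, /TOKEN/i, /PASSWORD/i]

// Return the names (never the values) of variables that look sensitive.
function secretLikeNames(env: Record<string, string | undefined>): string[] {
  return Object.keys(env)
    .filter((name) => SECRET_PATTERNS.some((pattern) => pattern.test(name)))
    .sort()
}

// Made-up environment for the example. In Node you would pass process.env,
// which is exactly what untrusted code running locally can also read.
const fakeEnv: Record<string, string | undefined> = {
  PATH: '/usr/bin',
  STRIPE_SECRET_KEY: 'placeholder',
  CLERK_SECRET_KEY: 'placeholder',
  HOME: '/home/you',
}
console.log(secretLikeNames(fakeEnv))
```

Running this against your own environment is a useful audit: every name it prints is something a local run of untrusted code could read and send elsewhere.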
#### What a sandbox actually is A sandbox is an isolated, disposable execution environment - in practice a lightweight virtual machine or a hardened container - that has no access to your real filesystem, your secrets, or your network except what you explicitly allow. Code runs inside it with its own filesystem, its own memory and CPU limits, and a hard time budget. When the run finishes, you tear the sandbox down and nothing persists. The mental model is a clean-room: the code gets exactly the inputs you hand it and produces exactly the outputs you collect, and it cannot reach anything outside the room. - Isolation: the code cannot see your host filesystem, environment variables or other processes. - Disposability: each run is a fresh environment; a compromised or broken sandbox is simply destroyed. - Resource limits: CPU, memory and wall-clock time are capped, so a runaway loop cannot take down your machine. - Controlled I/O: you decide what files go in and what comes out; network access can be restricted or denied. #### E2B, Daytona and friends You do not build sandboxes yourself - you call a service that spins them up in under a second through an SDK. E2B is purpose-built for AI agents executing code: you create a sandbox, run code or shell commands inside it, read the output, and kill it, all from a few lines in your app. Daytona offers fast, ephemeral development environments in a similar spirit, useful when an agent needs a fuller workspace rather than a single execution. Both abstract away the VM and container plumbing so your agent gets a safe place to run code on demand. The pattern below is what almost every integration looks like. 
```typescript
import { Sandbox } from '@e2b/code-interpreter'

// Placeholder: in a real app this string comes from your agent
const agentGeneratedPython = 'print(1 + 1)'

// Spin up a fresh, isolated sandbox for this run only
const sandbox = await Sandbox.create()
try {
  // Run agent-generated code where it can do no harm to you
  const execution = await sandbox.runCode(agentGeneratedPython)
  console.log(execution.logs.stdout)
  console.log(execution.error) // contained: an error here is not your problem
} finally {
  // Always tear it down - nothing persists
  await sandbox.kill()
}
```

The core sandbox pattern: create, run untrusted code inside, collect output, destroy. The exact SDK surface evolves, so check the current E2B docs.

Product details and SDK names change quickly in this space, so treat the snippet as the shape of the pattern and confirm the current method names against the official E2B and Daytona docs before you build. The concept - create, execute, collect, destroy - is stable even as the APIs move.

#### When you actually need a sandbox

Not every code run needs isolation, and over-sandboxing adds latency and cost. The rule is simple: sandbox whenever the code or its inputs are not fully trusted, and run locally when they are. Three situations always cross that line.

- User-submitted code: anything a product lets users write and run (a "run this snippet" feature, a data-transformation field, a formula engine) must be sandboxed. You are executing strangers' code on your infrastructure.
- Untrusted agent output: when an agent generates code from untrusted input (a scraped page, a user request, a third-party document) and you run it without reading every line, contain it. The agent is only as safe as the worst input it saw.
- Parallel experiments: when you want to run many variations at once - test a generated script against ten datasets, fuzz an idea, let several agents try solutions - sandboxes give you clean, isolated, parallel environments that cannot interfere with each other or your host.
- When local is fine: code you wrote or fully reviewed, run against trusted inputs, on a machine whose secrets you control. Do not pay the sandbox tax for that. #### Typical mistakes The dangerous ones: running agent-generated code locally "just this once" and leaking an API key from your environment; building a product feature that executes user input with no isolation, which is a textbook remote-code-execution vulnerability; forgetting to set CPU, memory and time limits so a runaway run still hurts you; and leaving sandboxes alive instead of tearing them down, which wastes money and defeats disposability. The opposite mistake also exists: sandboxing trivial, trusted code and slowing yourself down for no security gain. #### Business ROI Sandboxing is the difference between an agentic feature you can safely put in front of users and a liability you cannot ship. A single contained incident - a hostile input that would have wiped a server, now just destroying a throwaway sandbox - pays for the whole practice. For products that run user or agent code, sandboxes are not optional polish; they are the control that lets you offer the feature at all. And for internal experimentation, cheap parallel sandboxes let a small team try ten ideas in the time it would take to carefully run one. #### Checklist You are ready to move on when each of these is solid, because the next lesson has you building tools that may run generated code. - Explain in one sentence why running untrusted code locally is risky. - Define the four properties of a sandbox (isolation, disposability, limits, controlled I/O). - Describe the create-run-collect-destroy pattern with a service like E2B. - Name the three situations that always require a sandbox and one that does not. #### Resources Keep the E2B and Daytona docs bookmarked, since their SDKs evolve and the current method names matter when you build. 
The security material from Course 3 on secrets and environment variables is the foundation here - a sandbox only helps if you also keep real secrets out of the code you hand it. When in doubt about whether to sandbox, default to yes for anything touching user or third-party input. #### Your task Sign up for E2B (it has a free tier), then write a tiny script that spins up a sandbox, runs a few lines of generated Python inside it, prints the output, and tears it down. Deliberately run a line that would be destructive on your real machine (like writing a junk file) and confirm it stays contained. That hands-on proof is what turns sandboxing from an abstract idea into a habit. #### Next lesson With safe execution covered, you are ready to build real tools on top of model APIs. The next lesson shows how, using two founder case studies: an invoice-assignment tool and a Swiss trading-cards cataloguer that turn a photo into a database record. ### 4.4 Building Your Own AI Tools with APIs - Canonical URL: https://agenticschool.dev/courses/automation-agentic-systems/building-your-own-ai-tools-with-apis - Duration: 28 min Summary: You do not have to wait for someone to build the tool you need. With model APIs you build your own, often in an afternoon. This lesson shows how to call Gemini and Claude directly, get structured JSON back, and use vision to turn a photo into a database record - illustrated by two real founder tools: invoice assignment and a Swiss trading-cards cataloguer. #### Summary The biggest shift in building software is this: when the tool you need does not exist, you build it the same afternoon. Model APIs let you call an LLM from your own code, with your own prompt, and - crucially - get structured data back instead of a wall of prose. Once a model can reliably turn an image or a document into a clean JSON record, a whole class of manual data-entry work disappears. 
This lesson teaches the direct API call, the structured-output trick that makes the response usable, and two real tools the founder of this school built from exactly these pieces. #### What you will learn You will learn to call Gemini and Claude directly with a minimal fetch request, to force the model to return JSON matching a schema so the output drops straight into a database, to use vision so a photo becomes structured data, and to recognise when building a small internal tool beats paying for SaaS. The two founder case studies make it concrete: photo of an invoice in, assigned record out; photo of a trading card in, catalogued record out. #### Prerequisites Courses 1 to 3. You need the model-selection lesson from Course 1 (tool cost depends entirely on which model you call), the secrets discipline from Course 3 (the API key never touches client code), and a database to write into - Convex from Course 3 is perfect. The Fundamentals page on what an API is covers the request basics if you need them. #### The problem Businesses pay monthly for SaaS tools that do one narrow thing - read receipts, tag images, extract fields from PDFs - and still do not quite fit their workflow. Meanwhile the same job is a single API call away. The blocker has never been capability; it is that people do not realise how little code stands between "I have a photo of an invoice" and "the invoice is in my accounting system, assigned to the right project". This lesson removes that blocker by showing the whole path end to end. #### APIs as building blocks Calling a model API directly gives you total control: your prompt, your model, your output format, no UI in the way. It is also less code than people expect. A request is a POST with your API key in a header and a JSON body describing what you want. Here is a minimal call to Gemini and the same idea against Claude, so you can see both. 
Google offers a genuinely generous free Gemini tier through AI Studio, which makes it the natural place to prototype vision tools without spending anything.

```typescript
// Minimal Gemini call. The key lives in an env var, never in client code.
const res = await fetch(
  'https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent',
  {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'x-goog-api-key': process.env.GEMINI_API_KEY!,
    },
    body: JSON.stringify({
      contents: [{ parts: [{ text: 'Summarise this in one sentence: ...' }] }],
    }),
  },
)
// Fail loudly on HTTP errors instead of reading an error body as a result
if (!res.ok) throw new Error(`Gemini request failed: ${res.status}`)
const data = await res.json()
console.log(data.candidates[0].content.parts[0].text)
```

A minimal Gemini API call. Model names and exact paths change - confirm against the current Google AI docs.

Claude works the same way: a POST to the Anthropic messages endpoint with your key in an x-api-key header, an anthropic-version header, and a body carrying the model name, max_tokens and a messages array. The provider differs, the shape is the same. Pick the model with the model-selection rule from Course 1 - a fast, cheap model for high-volume extraction, a stronger one only when the reasoning is genuinely hard.

#### Structured output: the trick that makes it useful

A model that replies in prose is not a tool - you cannot put a paragraph into a database column. The trick is to demand structured output: give the model a JSON schema and require it to return data matching that schema exactly. Modern APIs support this directly (a response schema or structured-output mode), and the result is an object with a predictable shape that you can validate and insert. This is what turns "the model said something about the invoice" into "the invoice row has supplier, amount, currency, date and project_id". Always validate the returned JSON against your schema (Zod from your stack is ideal) before trusting it, because a model can still occasionally drift.

```typescript
import { z } from 'zod'

// The exact shape you want back - this IS your database record.
const InvoiceSchema = z.object({
  supplier: z.string(),
  invoiceNumber: z.string(),
  amount: z.number(),
  currency: z.string(),
  issueDate: z.string(), // ISO date
  projectId: z.string().nullable(),
})

// Tell the model to return ONLY JSON matching this schema, then validate.
// modelJsonString is the text part of the model response; parse() throws on a mismatch.
const parsed = InvoiceSchema.parse(JSON.parse(modelJsonString))
// parsed is now a typed, validated record ready to insert. No prose.
```

Define the record shape with Zod, instruct the model to return matching JSON, and validate before insert. Validation catches the rare drift.

#### Vision: a photo in, a record out

The same API accepts images, not just text. Vision models like Gemini read a picture and, combined with the structured-output trick, return a clean record describing what they see. You send the image bytes alongside your instruction and schema, and you get structured data back. This is the move that automates physical-world data entry: point a phone at a document or an object, and a database row appears. The model does the reading; your schema does the structuring; your code does the inserting. Three steps, and a task that used to be a person typing for hours becomes a photo and a webhook.

#### Founder case study: invoice assignment

Here is a real one we built. A business was drowning in supplier invoices that each had to be read, have their fields extracted, and - the tedious part - be assigned to the correct internal project before going into the accounting system. We built a small tool: drop an invoice photo or PDF in, a vision model extracts supplier, number, amount, currency and date into the exact schema above, and a second step matches it to the right project using the line items and supplier history. A human still approves edge cases (more on that in the human-in-the-loop lesson), but the reading and assigning that used to eat hours a week now happens in seconds.
No SaaS subscription, no per-document fee, total control over the logic, and it fits the business exactly because the business defined the schema. #### Founder case study: Swiss trading cards The second tool is more fun and makes the same point. We had a large collection of Swiss trading cards to catalogue - each needs its player or subject, set, year and condition recorded, which by hand is mind-numbing. The tool is almost embarrassingly simple: photograph a card, a vision model returns a structured record (name, set, year, estimated condition) matching a schema, and it lands in a database with the image attached. What would have been days of manual entry became an afternoon of taking photos. The lesson is not about trading cards; it is that "image to structured database record" is a universal pattern. Invoices, cards, inventory, business cards, receipts, equipment serial plates - the same three steps apply to all of them. #### Build, do not buy Both case studies replaced a SaaS purchase with an internal tool, and that is the strategic point. When a job is narrow and specific to your business, a small API-backed tool you own usually beats a generic product you rent. You get an exact fit, no per-seat or per-document fees, full control of the data, and the ability to change the logic the moment your process changes. This is not "build everything" - use great SaaS for commodity needs. It is "for the narrow, repetitive, business-specific data jobs, a fifty-line tool on a model API often wins". - Build when the job is narrow, specific to your business, and high-volume enough that per-unit SaaS fees add up. - Buy when the need is generic, the SaaS fit is good, and you would be reinventing a mature product. - Own your data and schema. A tool you built bends to your process; a tool you rent makes your process bend to it. 
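The build-or-buy rule above comes down to arithmetic, and a small sketch can make it concrete. Everything here is illustrative: the function name `compareBuildVsBuy`, the input fields and all figures are hypothetical placeholders, not real prices. Swap in your own SaaS quote, your measured model cost per document and an honest estimate of build and maintenance time.

```typescript
// Rough build-versus-buy comparison. Every number passed in is an assumption you supply.
interface BuildVsBuyInput {
  docsPerMonth: number             // expected volume
  saasFeePerDoc: number            // per-document SaaS fee
  saasBaseFeePerMonth: number      // flat subscription fee
  modelCostPerDoc: number          // model API cost per document for your own tool
  buildHours: number               // one-off build time
  maintenanceHoursPerMonth: number // ongoing upkeep of your own tool
  hourlyRate: number               // what an hour of your time is worth
}

interface BuildVsBuyResult {
  saasMonthly: number
  ownMonthly: number
  buildCost: number
  breakEvenMonths: number | null // null when the own tool never pays back
}

function compareBuildVsBuy(i: BuildVsBuyInput): BuildVsBuyResult {
  const saasMonthly = i.saasBaseFeePerMonth + i.docsPerMonth * i.saasFeePerDoc
  const ownMonthly =
    i.docsPerMonth * i.modelCostPerDoc + i.maintenanceHoursPerMonth * i.hourlyRate
  const buildCost = i.buildHours * i.hourlyRate
  const monthlySaving = saasMonthly - ownMonthly
  // Months until the one-off build cost is recovered by the monthly saving
  const breakEvenMonths = monthlySaving > 0 ? Math.ceil(buildCost / monthlySaving) : null
  return { saasMonthly, ownMonthly, buildCost, breakEvenMonths }
}

// Made-up example: 2000 documents a month against a per-document SaaS plan
const highVolume = compareBuildVsBuy({
  docsPerMonth: 2000,
  saasFeePerDoc: 0.25,
  saasBaseFeePerMonth: 50,
  modelCostPerDoc: 0.002,
  buildHours: 6,
  maintenanceHoursPerMonth: 1,
  hourlyRate: 100,
})
console.log(highVolume)
```

With these invented numbers the internal tool pays back its build time within two months. Drop the volume to ten documents a month and the saving turns negative, which is the "buy" side of the rule: low volume rarely justifies even an hour of monthly maintenance.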
#### Typical mistakes The common ones: putting the API key in client-side code where anyone can steal it (it belongs in a server env var, always); asking for prose and then trying to parse it with fragile string matching instead of demanding schema-validated JSON; skipping validation and inserting a malformed record into your database; using an expensive flagship model for simple high-volume extraction when a cheap fast model is plenty; and buying SaaS for a job a fifty-line internal tool would do better and cheaper. #### Business ROI This is the lesson where AI stops being a chat toy and starts replacing line items on your invoice and hours on your calendar. An image-to-record tool can eliminate a part-time data-entry role, and because you own it the marginal cost per document is fractions of a cent of model usage, not a SaaS subscription. The founder tools above each took an afternoon to build and saved recurring hours every week. For a small business, the ability to build the exact tool you need on demand is a structural advantage competitors who only buy SaaS cannot match. #### Checklist You are ready to move on when each of these is true, because the next lessons build funnels and feedback loops on top of tools like these. - Make a minimal API call to Gemini or Claude with the key safely in an env var. - Force structured JSON output and validate it with a schema before use. - Turn a photo into a structured database record with a vision model. - Decide, for a real job, whether to build an internal tool or buy SaaS. #### Resources Grab free Gemini credits from Google AI Studio to prototype vision tools at no cost, and keep the Anthropic and Google AI docs handy because model names and the structured-output API surface change. Zod from your existing stack is your validation layer. The /builds case studies for the invoice automation and Swiss trading cards tools go deeper on each if you want the full story. 
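The photo-to-record step from the checklist can be sketched as two small functions: one that builds the request body and one that checks what comes back. This is a sketch under assumptions, not the founder's actual tool. The `CardRecord` shape and both function names are hypothetical, and the field names `inline_data`, `responseMimeType` and `responseSchema` reflect the Gemini REST API as I understand it today, so confirm them against the current Google AI docs. The hand-written check stands in for Zod only to keep the example dependency-free.

```typescript
// Hypothetical record shape for the trading-card example; adjust to your own schema.
interface CardRecord {
  name: string
  set: string
  year: number
  condition: string
}

// Build the JSON body for a generateContent call that sends one image
// and asks for JSON matching a schema instead of prose.
function buildCardRequest(imageBase64: string, mimeType: string) {
  return {
    contents: [
      {
        parts: [
          { text: 'Catalogue this trading card. Return only the requested fields.' },
          { inline_data: { mime_type: mimeType, data: imageBase64 } },
        ],
      },
    ],
    generationConfig: {
      responseMimeType: 'application/json',
      responseSchema: {
        type: 'OBJECT',
        properties: {
          name: { type: 'STRING' },
          set: { type: 'STRING' },
          year: { type: 'INTEGER' },
          condition: { type: 'STRING' },
        },
        required: ['name', 'set', 'year', 'condition'],
      },
    },
  }
}

// Parse the model's JSON text and reject anything that does not match CardRecord.
function parseCardRecord(jsonText: string): CardRecord {
  const raw: unknown = JSON.parse(jsonText)
  if (typeof raw !== 'object' || raw === null) {
    throw new Error('Model output is not a JSON object')
  }
  const r = raw as Record<string, unknown>
  if (
    typeof r.name !== 'string' ||
    typeof r.set !== 'string' ||
    typeof r.condition !== 'string' ||
    typeof r.year !== 'number' ||
    !Number.isInteger(r.year)
  ) {
    throw new Error('Model output does not match CardRecord')
  }
  return { name: r.name, set: r.set, year: r.year, condition: r.condition }
}
```

To use it, POST `JSON.stringify(buildCardRequest(...))` to the same generateContent endpoint as the minimal call earlier in this lesson, then pass the returned text part to `parseCardRecord` before inserting anything into your database.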
#### Your task Pick one repetitive data-entry task in your work that starts with an image or document. Build a tiny tool: take the image, send it to Gemini with a Zod schema, validate the JSON, and log the record. You do not need a UI - a script that prints the structured record is proof. Note how long it took versus what the manual task costs you each week. #### Next lesson Tools and automations need people to find them. The next lesson covers the marketing plumbing: lead magnets, capture forms, funnels and the double opt-in email rules you must follow in the EU and Switzerland. ### 4.5 Email, Lead Magnets and Funnels: Capture and Nurture - Canonical URL: https://agenticschool.dev/courses/automation-agentic-systems/email-lead-magnets-and-funnels-capture-and-nurture - Duration: 24 min Summary: A great product nobody hears about earns nothing. This lesson covers the marketing plumbing: the vocabulary (impressions, CTR, conversion, lead, funnel, upsell), lead magnets that earn an email address, newsletter and contact form basics, the double opt-in legal requirement in the EU and Switzerland, and whether to send email from your own domain or a third party. #### Summary You can build the best tool in the world and earn nothing if nobody finds it and nobody comes back. Marketing is the plumbing that fixes that: it captures interested strangers, earns permission to contact them, and nurtures them toward becoming customers. This lesson teaches the vocabulary so you can plan and talk about growth precisely, the mechanics of lead magnets and capture forms, the funnel that turns a contact into a customer, and the legal rules - especially double opt-in - that you must follow when you collect and email people in Europe and Switzerland. 
#### What you will learn You will learn the core marketing terms (impressions, click-through rate, conversion, lead, funnel, upsell), how to design a lead magnet worth an email address, how newsletter and contact forms actually capture a contact, why double opt-in is a legal requirement and not a nicety in the EU and CH, and the trade-offs between sending email from your own domain and using a third-party provider. #### Prerequisites A live site from Course 1 or 3 to put a form on, and the automation basics from earlier in this course - a funnel is just an automation triggered by a form submission. The legal context from CLAUDE.md (GDPR, FADP, US state laws) is directly relevant here, because email marketing is one of the most regulated things a small business does. #### The problem Founders pour effort into a product and then "do marketing" as a vague afterthought, usually meaning a few social posts that reach no one. They have no way to capture the handful of genuinely interested people who do show up, no permission to follow up, and no language to even diagnose what is failing. Is the problem that nobody sees the page (impressions), that they see it and do not click (CTR), or that they click and do not act (conversion)? Without the vocabulary and the plumbing, you cannot tell, so you cannot fix it. #### The vocabulary you need You cannot improve what you cannot name. These few terms let you diagnose a funnel precisely and talk to any marketer or tool. Learn them once and the whole field stops being a fog. - Impressions: how many times your thing was shown (a search result, an ad, a post). The top of the funnel - no impressions, no anything. - Click-through rate (CTR): of those who saw it, the share who clicked. Measures how compelling your title and offer are. - Conversion: the share of visitors who take the action you want (sign up, buy, download). The number that actually pays. - Lead: a person who gave you permission to contact them, usually an email address. 
The asset a funnel exists to create. - Funnel: the path from stranger to customer, narrowing at each step (saw it, clicked, became a lead, bought). - Upsell: offering an existing customer something more (a higher tier, an add-on) once they already trust you. The cheapest revenue you will ever earn. #### Lead magnets and capture forms People will not give you their email for nothing, and they should not. A lead magnet is something genuinely useful you give away in exchange for an email: a checklist, a template, a short guide, a free tool, a discount. The bar is real value - if the magnet is thin, the lead is worthless. A capture form collects the email, and the best forms ask for the minimum (often just the email) because every extra field lowers conversion. The form submission is the trigger that starts everything downstream, which is where the automation skills from earlier in this course come in: form submitted, contact stored, welcome sequence begins. - Make the magnet specific and immediately useful - "the 12-point pre-launch security checklist" beats "our newsletter". - Ask for as little as possible. Email only, unless a later step truly needs more. - A contact form is a lead magnet too: someone reaching out is a warm lead, so capture and follow up, do not just reply once. - State clearly what they are signing up for and link your privacy policy on the form. This is both honest and legally required. #### Double opt-in: a legal requirement, not a nicety This is the part too many builders get wrong and it carries real legal risk. In the EU (GDPR) and Switzerland (revFADP), you generally need clear, demonstrable consent before sending marketing email, and the robust way to get and prove it is double opt-in. Single opt-in means someone submits the form and you start emailing. Double opt-in adds a step: after they submit, you email them a confirmation link, and they only become a subscriber once they click it. 
This proves the email belongs to them, proves they consented, and protects you from complaints and fines. It also improves deliverability because your list is clean. Treat double opt-in as mandatory for any EU or Swiss audience, and good practice everywhere. - Submit form, then send a confirmation email with a unique link. No marketing emails until they click it. - Log the consent: timestamp, IP, and what exactly they agreed to. You may need to prove it later. - Every marketing email must have a working one-click unsubscribe, and you must honour it promptly. - Never buy email lists or add people who did not opt in. It is illegal in the EU and CH and destroys your sender reputation. #### Email funnels: nurture toward purchase Once someone is a confirmed lead, a funnel is simply a sequence of emails that builds trust and moves them toward buying, sent automatically at the right intervals. A typical shape: a welcome email delivering the magnet, a few emails of genuine value (not constant selling), a soft introduction of your paid offer, and a clear call to action. The automation tools from the n8n lesson, or a dedicated email platform, send the right message at the right time without you touching it. The goal is not to spam; it is to stay useful and present until the moment the lead is ready, then make the offer obvious. A well-designed nurture sequence is the difference between a list that converts and a list that ignores you. #### Own domain versus third-party provider You will face one technical choice: send email yourself from your own domain, or use a third-party email provider. For transactional and marketing email at any real volume, use a reputable provider - they handle deliverability, the authentication records (SPF, DKIM, DMARC) that stop your mail landing in spam, bounce handling, unsubscribe management and compliance tooling. Sending bulk email from a raw self-hosted server is a deliverability nightmare and a security exposure. 
This lesson deliberately does not name a specific email provider and teaches the durable principles rather than a brand: use your own domain for sender identity and trust, but send through a provider that manages deliverability and consent for you.

- Use your own domain as the sender (hello@yourbusiness.com) for trust and brand - never a generic free address for business email.
- Send through a provider that handles SPF, DKIM and DMARC, so your mail authenticates and reaches the inbox.
- Let the provider manage unsubscribes, bounces and consent records - doing this yourself correctly is harder than it looks.
- Keep marketing and transactional sending separated so a marketing complaint never blocks your password-reset emails.

#### Typical mistakes

The expensive ones: emailing people who only single-opted-in (or never opted in at all) in the EU or CH, which is a legal violation; a weak lead magnet that earns junk emails; capture forms that demand ten fields and kill conversion; no unsubscribe link; sending bulk mail from a raw server straight into spam folders; and "doing marketing" with no way to measure impressions, CTR or conversion, so you cannot tell what is broken. Get consent right first - everything else is fixable, but a privacy complaint is not.

#### Business ROI

An email list is the only marketing asset you fully own - not rented from a platform that can change its algorithm overnight. Every confirmed, consented lead is a person you can reach directly, repeatedly, at near-zero cost, which is why a healthy list is one of the highest-value things a small business builds. Funnels turn that list into recurring revenue automatically, and upsells to existing customers are the cheapest sales you will ever make. The vocabulary lets you find and fix the leaky step instead of guessing. Done right and legally, this plumbing compounds for years.
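Finding the leaky step is a small calculation once you have the counts. The sketch below is one way to do it, assuming you can export impressions, clicks, confirmed leads and customers for a period. The function names, the counts and the target rates are all hypothetical. Targets are your own goals, not industry benchmarks, and each must be greater than zero.

```typescript
// Counts for one period. All numbers in the example further down are made up.
interface FunnelCounts {
  impressions: number
  clicks: number
  leads: number     // confirmed (double opt-in) contacts
  customers: number
}

type Step = 'ctr' | 'leadConversion' | 'saleConversion'
type Rates = Record<Step, number>

// Safe ratio: an empty step yields 0 instead of NaN.
const ratio = (part: number, whole: number): number => (whole > 0 ? part / whole : 0)

// Turn raw counts into the three rates from the vocabulary section.
function funnelRates(c: FunnelCounts): Rates {
  return {
    ctr: ratio(c.clicks, c.impressions),
    leadConversion: ratio(c.leads, c.clicks),
    saleConversion: ratio(c.customers, c.leads),
  }
}

// The weakest step is the one furthest below its own target, not the smallest raw rate.
function weakestStep(rates: Rates, targets: Rates): Step {
  const steps: Step[] = ['ctr', 'leadConversion', 'saleConversion']
  let worst: Step = 'ctr'
  for (const s of steps) {
    if (rates[s] / targets[s] < rates[worst] / targets[worst]) worst = s
  }
  return worst
}

const rates = funnelRates({ impressions: 10000, clicks: 200, leads: 50, customers: 2 })
const targets: Rates = { ctr: 0.02, leadConversion: 0.2, saleConversion: 0.1 }
console.log(rates, weakestStep(rates, targets))
```

With these invented numbers the click-through rate meets its target and the sale step falls furthest short, so the nurture sequence and the offer are the place to work first, not the headline.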
#### Checklist

You are ready to move on when each of these is true, because the next lesson adds the human oversight that keeps automated systems safe.

- Define impressions, CTR, conversion, lead, funnel and upsell from memory.
- Describe a specific lead magnet and a minimal capture form for your business.
- Explain why double opt-in is legally required for an EU or CH audience.
- State why you send from your own domain through a third-party provider.

#### Resources

The privacy and consent material in Course 5 goes deeper on GDPR and FADP compliance, and is required reading before you email a single real person in Europe or Switzerland. This lesson does not name a specific email provider, so when you implement, choose a reputable one and follow its authentication and double opt-in guides rather than rolling your own. Your privacy policy must be live and linked from every form.

#### Your task

Design one lead magnet for your business and write the three emails of a minimal nurture funnel: the welcome-and-deliver email, one pure-value email, and one soft-offer email. Sketch the double opt-in confirmation step. You do not have to send them yet; the goal is a concrete, compliant funnel on paper you could wire up in an afternoon.

#### Next lesson

Automated systems are powerful and occasionally wrong, so the best ones keep a human at the points that matter. The next lesson covers human-in-the-loop design, the founder's Stripe Minions pattern, and turning every mistake into a rule the system learns from.

### 4.6 Human in the Loop: Continuous Learning Systems

- Canonical URL: https://agenticschool.dev/courses/automation-agentic-systems/human-in-the-loop-continuous-learning-systems
- Duration: 26 min

Summary: Full autonomy fails today, so the best agentic systems put a human at the moments that need judgement and learn from every correction.
This lesson covers approval checkpoints, the founder's Stripe Minions pattern - an automated pipeline where each stage emails a human for approval before continuing - feedback loops that improve the system, and capturing every gotcha as a rule. #### Summary Full autonomy is a trap right now. Agents are capable but not reliable enough to run unsupervised on anything that touches money, customers or data, because the cost of a confident mistake is too high. The answer is not to abandon automation - it is to keep a human at exactly the points that need judgement, let the system run everywhere else, and feed every correction back so the system needs less oversight over time. This lesson teaches where to place approval checkpoints, the concrete Stripe Minions pattern the founder of this school uses, and how to turn every gotcha into a permanent rule. #### What you will learn You will learn why full autonomy fails today and how to reason about it honestly, how to place approval checkpoints so human effort is minimal but well-targeted, the Stripe Minions pattern where each stage of a pipeline emails a human to approve before continuing, and how to build feedback loops that capture every mistake as a rule - so the system you run this month is smarter than the one you ran last month. #### Prerequisites The automation and tool-building from earlier in this course, because human-in-the-loop is a control layer on top of working automation. You should be comfortable with webhooks and email-triggered actions from the n8n and email lessons, since approval checkpoints are usually implemented as "the system emails a human and waits for a reply or a click". #### The problem The dream is "set it and forget it": an agent that runs your pipeline end to end while you sleep. The reality in 2026 is that agents are right most of the time and wrong in ways that are occasionally expensive and hard to predict. 
Hand full autonomy to a system that assigns invoices, refunds customers or sends emails, and the one-in-fifty mistake becomes a refund to the wrong account or an email to the wrong list. Founders either over-trust (and get burned) or over-control (and lose the automation entirely). The skill is finding the few checkpoints where a human adds the most safety for the least friction. #### Where humans belong Put a human where a mistake is costly, irreversible, or needs context the system lacks - and nowhere else. The art is keeping checkpoints few but decisive. A good checkpoint is a moment of judgement, not a rubber stamp: if a human approves everything without thinking, the checkpoint is theatre, so design them to surface only the cases that genuinely need a decision. - Checkpoint where it is costly or irreversible: moving money, deleting data, sending to many customers, anything you cannot undo. - Checkpoint where the system is unsure: low-confidence matches, ambiguous inputs, anything outside the cases it has seen. - Auto-run the confident, reversible, low-stakes majority. Most steps in any pipeline are safe to automate fully. - Make the human decision cheap: surface the context, the proposed action, and a one-click approve or reject, so review takes seconds not minutes. #### The Stripe Minions pattern Here is the concrete pattern the founder of this school runs, named the Stripe Minions because it grew out of payment and finance automation where mistakes are expensive. The idea: build the pipeline as a chain of small autonomous workers (the "minions"), but make each stage email a human for approval before it hands off to the next. Each minion does its narrow job, proposes the result, and pauses. The human gets an email with the proposed action and an approve or reject link. Approve, and the next minion takes over automatically; reject, and the pipeline stops or routes for correction. The system does all the work; the human supplies judgement at the gates. 
It is fully automated in effort and fully supervised in risk, which is exactly the balance autonomy cannot yet deliver on its own. - Break the job into small, single-purpose stages (minions), each with one clear output. - After each stage, email the responsible human: here is what I propose to do next, approve or reject. - On approve, the next minion runs automatically. On reject, halt or route to a fix queue - and capture why. - The email IS the interface: no dashboard to check, the work comes to the human, who acts in seconds from their inbox. This pattern is why the email-triggered automation from earlier in the course matters: an approval checkpoint is just a webhook that fires on a link click and resumes the pipeline. You can build the whole thing in n8n or a small app with the tools you already have. #### Feedback loops: every gotcha becomes a rule A human checkpoint is wasted if the system makes the same mistake forever. The multiplier is the feedback loop: every time a human rejects or corrects something, you capture why and turn it into a rule the system follows next time. This is the continuous-learning part, and it is the same philosophy as the CLAUDE.md skills library from Course 2 - you accumulate hard-won corrections into a growing set of rules so the system improves with use. Over time the checkpoints that fire most often shrink, because the gotchas behind them have been encoded as rules, and the human is left to review only genuinely new situations. - When a human rejects an action, log the input, the wrong proposal, the correct one, and the reason in one place. - Convert recurring rejections into explicit rules the system applies automatically (a check, a prompt instruction, a validation). - Track which checkpoints fire most. A frequently-rejected stage is telling you exactly what rule it is missing. - Treat your rule set as a living document that grows every week. The system you run next month should need less oversight than today. 
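The approval gates and the rejection log described above can be sketched in a few lines. This is a minimal in-memory illustration, not the founder's implementation: every name (`Stage`, `AskHuman`, `runPipeline`, `recurringReasons`) is hypothetical, and `askHuman` is a synchronous stand-in for "email a human and wait for the approve or reject click", which in a real system arrives later through a webhook.

```typescript
type Decision = { approved: true } | { approved: false; reason: string }

interface Stage {
  name: string
  // Each minion takes the previous output and proposes the next one.
  propose: (input: string) => string
}

interface Rejection {
  stage: string
  input: string
  proposal: string
  reason: string
}

// Stand-in for the approval email; a real version would pause until a link is clicked.
type AskHuman = (stage: string, proposal: string) => Decision

function runPipeline(
  stages: Stage[],
  input: string,
  askHuman: AskHuman,
  log: Rejection[],
): { completed: boolean; output: string } {
  let current = input
  for (const stage of stages) {
    const proposal = stage.propose(current)
    const decision = askHuman(stage.name, proposal)
    if (!decision.approved) {
      // Capture why, so the correction can become a rule later.
      log.push({ stage: stage.name, input: current, proposal, reason: decision.reason })
      return { completed: false, output: current }
    }
    current = proposal // approved: hand off to the next minion
  }
  return { completed: true, output: current }
}

// Reasons that recur at least `threshold` times are candidates for an explicit rule.
function recurringReasons(log: Rejection[], threshold = 2): string[] {
  const counts = new Map<string, number>()
  for (const r of log) {
    const key = `${r.stage}: ${r.reason}`
    counts.set(key, (counts.get(key) ?? 0) + 1)
  }
  return Array.from(counts.entries())
    .filter(([, n]) => n >= threshold)
    .map(([key]) => key)
}

// Tiny demo with a reviewer who rejects every project assignment.
const demoLog: Rejection[] = []
const demoStages: Stage[] = [
  { name: 'extract', propose: (s) => `${s} > extracted` },
  { name: 'assign', propose: (s) => `${s} > assigned` },
]
const strictReviewer: AskHuman = (stage) =>
  stage === 'assign' ? { approved: false, reason: 'wrong project' } : { approved: true }
runPipeline(demoStages, 'invoice-1', strictReviewer, demoLog)
runPipeline(demoStages, 'invoice-2', strictReviewer, demoLog)
console.log(recurringReasons(demoLog))
```

The design choice worth noticing is that the log stores the input, the wrong proposal and the reason together. That is the minimum you need to turn a recurring rejection into a check, a prompt instruction or a validation later.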
#### Typical mistakes The damaging ones: granting full autonomy to a system that touches money or customers and discovering the failure mode in production; the opposite error of requiring human approval for everything until the automation is slower than doing it by hand; rubber-stamp checkpoints where humans approve without real review, which is worse than no checkpoint because it manufactures false confidence; and - the most common waste - never closing the feedback loop, so the system repeats the same mistake every week instead of learning from the first correction. #### Business ROI Human-in-the-loop is what makes automation safe enough to actually deploy on the work that matters. A pipeline that handles 95 percent autonomously and routes the risky 5 percent to a human inbox does the volume of a team while keeping the safety of human judgement on the cases that count. The feedback loop compounds: each captured gotcha permanently removes a class of error, so the same headcount handles more over time. For a small business this is how you scale operations without scaling staff - the system does the work, people do the judging, and both get better every week. #### Checklist You are ready for the final lesson when each of these is solid, because it places everything you have built on the autonomy spectrum. - Explain why full autonomy fails today in one honest sentence. - Identify the few checkpoints in a real pipeline where a human adds the most safety for the least friction. - Describe the Stripe Minions pattern: each stage emails a human to approve before continuing. - Describe a feedback loop that turns every rejection into a rule the system follows next time. #### Resources The CLAUDE.md and skills-library material from Course 2 is the direct analogue of the rule-capturing feedback loop here, so revisit it - the same continuous-learning discipline applies to systems and to agents. 
The email and webhook tools from earlier in this course are what you use to implement approval checkpoints. The /builds case studies show these patterns running in real founder projects.

#### Your task

Take one automation you built earlier in this course and add a single approval checkpoint at its riskiest step, implemented as an email with an approve or reject link in the Stripe Minions style. Then add a place to log every rejection with its reason. Run it a few times and convert the first recurring rejection into an explicit rule. You now have a system that learns.

#### Next lesson

You have seen autonomy fail and learned to supervise it. The final lesson of this course zooms out to the five levels of LLM autonomy, explains why validation - not generation - is the real blocker to climbing them, and shows where to realistically operate today.

### 4.7 The 5 Levels of LLM Autonomy

- Canonical URL: https://agenticschool.dev/courses/automation-agentic-systems/the-5-levels-of-llm-autonomy
- Duration: 24 min

Summary: Autonomy is a ladder, not a switch. This lesson lays out five levels of LLM autonomy from a level-1 chat assistant to level-5 fully autonomous ship-and-learn loops, explains why validation (not generation) is the real blocker to climbing, explores AI-creates-AI recursion, and shows where to realistically operate today and how to climb a level without getting burned.

#### Summary

People talk about autonomy as on or off - either the AI does it or you do. It is actually a ladder with distinct rungs, and knowing which rung a system is on tells you how much to trust it and what it would take to climb. This lesson lays out five levels from a level-1 chat assistant to level-5 fully autonomous ship-and-learn loops, then makes the central argument of the whole course: the thing stopping you from climbing is almost never the model's ability to generate, it is your ability to validate.
Where validation is cheap and reliable, autonomy can rise; where it is not, a human stays in the loop. We finish with AI-creates-AI recursion and a practical answer to "where should I operate today".

#### What you will learn

You will learn the five-level autonomy scale and how to place any system on it, why validation is the true bottleneck rather than model capability, what AI creating AI means and why it raises the stakes on validation rather than lowering them, and a concrete method for deciding where to operate now and climbing exactly one level at a time without taking on risk you cannot validate.

#### Prerequisites

The full arc of this course, because placing a system on the scale draws on everything: automation platforms, tool building, sandboxes and especially the human-in-the-loop lesson, which is really a lesson about validation. The model and context lessons from Course 1 underpin why generation is no longer the hard part.

#### The problem

Two opposite mistakes dominate. One camp believes the model is now smart enough to run everything autonomously and ships systems that fail in expensive, surprising ways. The other camp is so burned by failures that they keep AI permanently at level one, typing into a chat box, and miss the enormous value of higher rungs. Both misjudge the same thing: they think autonomy is gated by how clever the model is. It is not. It is gated by whether you can check the output reliably and cheaply. Get that straight and the whole question becomes tractable.

#### The five levels

Here is the ladder. The jump that matters is not from a weaker model to a stronger one - it is from a human checking every step to a system checking itself. Each rung removes a human from a part of the loop, and you can only remove a human from a step you have learned to validate without them.

- Level 1 - Chat assistant: you ask, it answers, you do everything with the answer. The model generates, you decide and act. All validation is human, every time.
- Level 2 - Assisted action: the model takes actions but asks permission at each step (a coding agent that proposes an edit and waits). You validate every action before it runs.
- Level 3 - Supervised pipeline: the system runs multi-step workflows autonomously but stops at approval checkpoints for the risky steps - the Stripe Minions pattern. You validate the few decisive moments, not every step.
- Level 4 - Bounded autonomy: the system runs end to end within guardrails and validates most of its own work (tests, schemas, checks), escalating to a human only on genuine exceptions. You validate the system, not each run.
- Level 5 - Fully autonomous ship-and-learn: the system sets sub-goals, acts, validates, ships, observes the result, and improves itself in a closed loop. Human attention shifts from doing to designing the validation the system runs on itself.

#### Validation is the real blocker

This is the load-bearing idea of the whole course. Models got extraordinarily good at generation - writing code, extracting data, drafting copy - faster than almost anyone expected. Generation is largely solved for a huge range of tasks. What did not get solved at the same pace is validation: knowing, reliably and cheaply, whether a given output is actually correct. You cannot safely raise autonomy past the point where you can validate the output, because higher autonomy just means the system acts on its own generation without you checking. So the real engineering work of agentic systems is not better prompts - it is building cheap, reliable validation: tests, schemas, type checks, sanity checks, confidence thresholds, and the approval checkpoints from the last lesson. Wherever you can make validation automatic and trustworthy, you can climb a level. Wherever you cannot, a human stays in the loop, and that is correct, not a failure.

- Generation is cheap and good; validation is the scarce, valuable thing. Invest your effort there.
- Climbing a level always means replacing a human check with an automated one you trust.
- If you cannot describe how you would validate a step without a human, you are not ready to automate that step.
- The best agentic engineers are validation engineers - they build the checks that let the system run unsupervised.

#### AI that creates AI

The frontier rung is recursion: agents that build, test and improve other agents. An agent that writes a tool, generates tests for it, runs them, and refines the tool based on the results is doing in minutes what used to be a development cycle - and it is AI creating AI. This compresses the loop dramatically and is a real glimpse of where level five heads. But notice what it does to the central argument: it does not remove the validation problem, it concentrates it. When an AI creates another AI, the only thing standing between you and compounding, unsupervised error is the validation layer - the tests, the checks, the guardrails. Recursion raises the stakes on validation; it does not lower them. The teams who win at this build the validation that lets recursion run safely, rather than marvelling at the generation.

#### Where to operate today and how to climb

Be honest about where you are and climb deliberately. Most valuable real-world systems in 2026 sit at level three: autonomous pipelines with human approval at the risky steps. That is not a limitation to be embarrassed about; it is the responsible operating point for anything touching money, customers or data, and it captures most of the value of automation while keeping the safety of human judgement. Climb exactly one rung at a time, and only by building the validation that makes the next rung safe.

- Find your current level for a given system: how many human checks does it still require, and at which steps?
- Pick one human checkpoint to remove. Ask: what automated validation would let me trust this step without a person?
- Build that validation (a test, a schema, a confidence threshold, a sanity check) and prove it catches the failures the human caught.
- Only then remove the human from that step. Climb one rung, validate it in production, and repeat. Never skip rungs.

#### Typical mistakes

The recurring errors: jumping straight to level four or five because the model "seems smart enough", with no validation to catch its mistakes; staying stuck at level one out of fear and leaving enormous value unautomated; confusing better generation with readiness for more autonomy when validation is what actually gates the climb; and chasing AI-creates-AI recursion for its own sake without the validation layer that keeps it safe. Climb on the strength of your validation, never on the strength of the model alone.

#### Business ROI

Knowing the autonomy ladder turns "should we automate this?" from a gut call into a clear decision: you automate up to exactly the level your validation can support, and you invest in validation to climb further. That focus is worth real money - it stops you shipping unsupervised systems that fail expensively, and it stops you leaving value on the table by under-automating out of fear. The strategic insight for any founder is that validation, not generation, is the scarce skill of this era. The businesses that build cheap, reliable validation will operate at higher autonomy, at lower cost, with more safety, than competitors who only chase the next model.

#### Checklist

You have completed Course 4 when each of these is true. This is a real milestone - you can now design, build and safely operate agentic systems.

- Place any system on the five-level scale from its remaining human checks.
- Explain why validation, not generation, is what limits autonomy.
- Describe what AI-creates-AI recursion does to the validation problem.
- Name your current level for one real system and the single validation that would let you climb one rung.
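
The idea that climbing a rung means replacing a human check with an automated one you trust can be made concrete with a small gate: run every automated check on an output, auto-approve only if all pass, and escalate to a human otherwise. The sketch below is illustrative only. The names `Check`, `decide` and `invoiceChecks`, the `Invoice` shape and the 0.9 confidence threshold are all assumptions chosen for the example, not part of any library.

```typescript
// Hypothetical types and names: Check, Decision, decide, Invoice, invoiceChecks.
type Check<T> = { name: string; passes: (output: T) => boolean }

type Decision =
  | { action: 'auto-approve' }
  | { action: 'escalate'; failed: string[] }

// Run every check. Any failure sends the output to a human with the reasons attached.
function decide<T>(output: T, checks: Check<T>[]): Decision {
  const failed = checks.filter((check) => !check.passes(output)).map((check) => check.name)
  return failed.length === 0 ? { action: 'auto-approve' } : { action: 'escalate', failed }
}

// Example: an extracted invoice must look sane before the pipeline acts on it.
interface Invoice {
  total: number
  currency: string
  confidence: number // model-reported confidence between 0 and 1 (assumed to exist)
}

const invoiceChecks: Check<Invoice>[] = [
  { name: 'total is a positive finite number', passes: (i) => Number.isFinite(i.total) && i.total > 0 },
  { name: 'currency is a 3-letter code', passes: (i) => /^[A-Z]{3}$/.test(i.currency) },
  { name: 'confidence is at least 0.9', passes: (i) => i.confidence >= 0.9 },
]
```

A level-3 pipeline would call `decide` at the risky step and send the approval email only on `escalate`. Proving the checks catch what the human used to catch is what would justify moving that step toward level 4.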
#### Resources

The human-in-the-loop lesson is the practical companion to this one - approval checkpoints are validation made concrete. Course 5 turns validation into a discipline with tests, linting and CI/CD, and explores where this exponential trajectory is heading. Keep returning to the one question that decides everything here: how would I validate this step without a human?

#### Your task

Take one system you built across this course and place it honestly on the five-level scale. Write down the single human checkpoint you would remove next, the exact automated validation that would make removing it safe, and how you would prove that validation works. That one paragraph is the most useful planning you can do for any agentic system you build from here.

#### Next lesson

Course 5 makes all of this production-grade. It turns validation into a real discipline - tests, security, legal compliance, SEO and agent-first design - and ends with a capstone where you build and ship your own agentic product end to end.

### 5.1 Tests, Tests, Tests: TSC, Linting, Vitest, Playwright and CI/CD

- Canonical URL: https://agenticschool.dev/courses/quality-security-agent-first/tests-tests-tests-tsc-linting-vitest-playwright-and-ci-cd
- Duration: 28 min

Summary: Agents move fast, which makes a safety net essential. This lesson covers the full quality stack: TypeScript type checking, linting, Vitest for unit tests, Playwright for end-to-end tests, a pre-push hook and a GitHub Actions pipeline, so regressions are caught before they ship instead of after a customer finds them.

#### Summary

There is a myth that AI writes the code so you no longer need tests. The opposite is true. An agent will happily rewrite a working function at 2am because you asked for an unrelated change, and it has no memory of why the old behaviour mattered. Tests are how you tell future agents what must not break.
This lesson builds the four-layer quality gate - types, lint, unit, end-to-end - and wires it into a pre-push hook and a CI pipeline so quality is enforced by the machine, not by your discipline.

#### What you will learn

You will learn what each layer of the quality stack catches, how to write a Vitest unit test and a Playwright end-to-end test that keep working after a refactor, how to block a bad push with a Git hook, and how to run the whole suite automatically on every push with GitHub Actions. By the end you can hand an agent free rein on a repo and trust the gate to catch its mistakes.

#### Prerequisites

A working TypeScript project from earlier courses and the hooks lesson from Course 2, since CI/CD is just the team-wide version of a pre-push gate. You do not need to be a test expert. The point of this lesson is to make testing a habit your agent does for you, not a discipline you have to remember.

#### The problem

When you drive an agent hard, it touches a lot of code fast. A change to one file silently breaks another. The agent reports success because the file it edited looks right, and you only discover the regression when a page is blank in production. Manual testing does not scale to agent speed - you cannot click through every flow after every change. Without an automated gate, every agent session is a small gamble with your live product. The fix is to make "is it still working?" a question the machine answers in seconds.

#### The four-layer quality gate

Each layer catches a different class of bug, and they get slower and more thorough as you go down. Run the fast ones constantly and the slow ones before you ship. Together they form a net dense enough that fast, agent-driven changes stay safe.

- Type check (tsc --noEmit): catches shape errors - a function called with the wrong arguments, a property that does not exist, a null you forgot to handle. Instant and free.
- Lint (ESLint): catches bug-prone patterns and style drift - unused variables, missing awaits, accidental console.logs. Keeps agent-written code consistent with yours.
- Unit tests (Vitest): check that your logic is actually correct - a price calculation, a validation rule, a date formatter. Fast, run on every save.
- End-to-end tests (Playwright): drive a real browser through whole user journeys - sign up, add to cart, check out. Slow but they prove the real thing works.

#### Writing a Vitest unit test

Unit tests are small and fast. You give a function an input and assert the output. The trick that makes them survive agent refactors: test behaviour, not implementation. Assert what the function should produce for a user, never the private steps it takes to get there. Then the agent can rewrite the internals freely and the test still guards the contract.

```typescript
import { describe, it, expect } from 'vitest'
import { yearlyPricePerMonth } from './pricing'

describe('yearlyPricePerMonth', () => {
  it('shows the discounted monthly figure for a yearly plan', () => {
    // Monthly is 17% above yearly, so yearly-per-month is monthly / 1.17.
    expect(yearlyPricePerMonth({ monthly: 117 })).toBe(100)
  })

  it('never returns a negative price', () => {
    expect(yearlyPricePerMonth({ monthly: 0 })).toBeGreaterThanOrEqual(0)
  })
})
```

A Vitest test that pins behaviour, not implementation - run it with bun run test

When you ask an agent to add a feature, add a line to your spec: "write a Vitest test for it". Now the agent leaves behind a tripwire that protects the feature from the next agent, including a future version of itself.

#### Writing a Playwright end-to-end test

Playwright launches a real browser, visits your app and clicks through it like a user. It catches the bugs unit tests cannot: a broken route, a button that does nothing, a form that never submits. Write end-to-end tests for your money paths first - the handful of flows where a break costs you a customer.
Select elements by what the user sees or by stable test ids, never by brittle CSS, so a redesign does not break every test.

```typescript
import { test, expect } from '@playwright/test'

test('a visitor can reach the pricing page from the hero CTA', async ({
  page,
}) => {
  await page.goto('/')
  await page.getByRole('link', { name: 'See pricing' }).click()
  await expect(page).toHaveURL(/\/pricing/)
  await expect(
    page.getByRole('heading', { name: 'Pricing' }),
  ).toBeVisible()
})
```

A Playwright test for a real user journey - run it with bun run test:e2e

#### The pre-push hook: your local gate

The first place to enforce the gate is your own machine, the moment before code leaves it. A Git pre-push hook runs your checks and refuses the push if any fail, so broken code never reaches the repo. This is the same idea you met in Course 2, now pointed at the full suite. Keep the local hook fast - types, lint and unit tests - and let the slower end-to-end suite run in CI.

```bash
#!/usr/bin/env sh
# .husky/pre-push - blocks the push if any check fails
bun run typecheck || exit 1
bun run lint || exit 1
bun run test || exit 1
echo "All checks passed - pushing."
```

A pre-push hook that runs the fast layers before any code leaves your machine

A hook you can skip with --no-verify is a suggestion, not a gate. The hook is your fast feedback; CI is the gate that nobody can bypass. You want both.

#### GitHub Actions: the gate nobody can skip

Continuous integration runs your whole suite on GitHub's servers on every push and pull request, regardless of what anyone ran locally. A workflow file in .github/workflows tells GitHub what to do. The example below installs dependencies, then runs all four layers including the end-to-end tests. Set your branch to require this check to pass before anything merges, and a regression simply cannot reach your main branch.
```yaml
name: Quality gate
on:
  push:
    branches: [main]
  pull_request:
jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: oven-sh/setup-bun@v2
      - run: bun install --frozen-lockfile
      - run: bun run typecheck
      - run: bun run lint
      - run: bun run test
      - run: bunx playwright install --with-deps
      - run: bun run test:e2e
```

.github/workflows/quality.yml - the full gate, run automatically on every push

#### Typical mistakes

The big ones: writing zero tests because "the agent already checked it" (it cannot see your live app); testing implementation details so every refactor turns the suite red and you start ignoring it; making the local hook so slow you bypass it with --no-verify; and having CI but not requiring it to pass before merge, so it becomes decorative. Test behaviour, keep the local gate fast, and make CI mandatory.

#### Business ROI

A regression that reaches a customer costs you trust, a support ticket and an emergency fix, often at the worst possible time. A test that catches it costs the few seconds it takes to run. Once the gate exists you can let agents work far more aggressively, because the cost of a mistake drops from "broke production" to "the pipeline went red, fix it before merge". That is the real unlock: tests are not overhead on agentic development, they are what makes aggressive agentic development safe.

#### Checklist

You are ready to move on when every one of these is true for a real project, not just in theory.

- You can explain what each of the four layers catches that the others miss.
- At least one Vitest test and one Playwright test pass in your project.
- A pre-push hook runs the fast layers and blocks a failing push.
- A GitHub Actions workflow runs the full suite and is required before merge.

#### Resources

Bookmark the official Vitest and Playwright docs for matchers and selectors, and the GitHub Actions docs for workflow syntax, since these evolve.
Add "write a test for this" to your standard spec-sheet constraints so every agent task leaves a tripwire behind. The hooks lesson in Course 2 is worth a reread now that you are wiring the full suite in.

#### Your task

Add one Vitest test and one Playwright test to a real project, then add the pre-push hook and the GitHub Actions workflow above. Deliberately break something an agent might break - rename a route, change a function's output - and confirm the gate goes red. That red is the whole point: it caught the mistake before a user did.

#### Next lesson

Quality covered, the next lesson hardens security: rate limits on every public endpoint, CSP headers, encrypting stored credentials, and the pre-public audit that stops a secret leaking when you flip a repo public.

### 5.2 Security Essentials: Rate Limits, CSP Headers, Key Encryption, Pre-Public Repo Audits

- Canonical URL: https://agenticschool.dev/courses/quality-security-agent-first/security-essentials-rate-limits-csp-headers-key-encryption-pre-public-repo-audits
- Duration: 26 min

Summary: Security is not optional once real users arrive. This lesson covers the essentials every app needs: rate limits on every public endpoint, a Content Security Policy to harden the browser, encrypting stored credentials, audit logs, and a pre-public checklist so a repo never goes public with a secret buried in its history.

#### Summary

The moment your app is on the public internet, it is being probed by bots within minutes. You do not need a security team to be safe against the common attacks, you need a handful of essentials applied consistently: cap every public endpoint with a rate limit, lock down what the browser will run with a CSP, encrypt the credentials you store, log who did what, and never flip a repo public without auditing its history for secrets. This lesson makes each of those concrete.
#### What you will learn

You will learn to add rate limits to public endpoints, set a Content Security Policy header, encrypt stored credentials at rest, keep a useful audit log, and run a pre-public repo audit that catches secrets hiding in old commits. These are the measures that separate a hobby project from something you can responsibly point real users at.

#### Prerequisites

The secrets lessons from Courses 1 and 3, since several measures here build on keeping secrets in environment variables and out of code. A deployed app with at least one public endpoint or form makes the lesson concrete, but you can apply everything to your next project from the first commit.

#### The problem

Three failures hurt small builders again and again. First, an unprotected endpoint gets hammered - a login form brute-forced, a contact form spammed, or an AI endpoint called in a loop until your provider bill explodes. Second, a stored API key or password sits in plain text, so a single database leak hands an attacker everything. Third, and most common, a founder makes a repo public to share it and a secret committed two years ago is now on the open internet forever. Each is preventable with one habit.

#### Rate limit every public endpoint

A rate limit caps how often a single client can hit an endpoint in a window of time. It is the cheapest, highest-value security control you have, and any endpoint a stranger can reach needs one: login, signup, password reset, contact forms, and above all any endpoint that costs you money per call, like one that triggers an LLM. Limit by IP for anonymous traffic and by user or API key for authenticated traffic. When the limit is exceeded, return a 429 and stop.

```typescript
// A minimal fixed-window limiter. In production use a shared store
// (Redis, Upstash, or your DB) so it works across multiple servers.
const hits = new Map<string, { count: number; resetAt: number }>()

export function rateLimit(key: string, max = 10, windowMs = 60_000) {
  const now = Date.now()
  const entry = hits.get(key)
  if (!entry || now > entry.resetAt) {
    hits.set(key, { count: 1, resetAt: now + windowMs })
    return { ok: true }
  }
  if (entry.count >= max) return { ok: false, retryAfter: entry.resetAt - now }
  entry.count += 1
  return { ok: true }
}

// In your handler:
// const limit = rateLimit(`login:${ip}`, 5, 60_000)
// if (!limit.ok) return new Response('Too many requests', { status: 429 })
```

A rate limiter sketch - cap by IP for anonymous traffic, by user or key for authenticated

#### CSP headers and encrypting stored credentials

A Content Security Policy tells the browser exactly which sources of scripts, styles and images it is allowed to load. It is your strongest defence against cross-site scripting: even if an attacker injects a script tag, the browser refuses to run it because the source is not on your allowlist. Start strict and loosen only what you must. Separately, anything sensitive you store - a third-party API key, an OAuth token, a user secret - should be encrypted at rest, so a database breach yields ciphertext, not working credentials. The encryption key itself lives in an environment variable, never in the database it protects.

```text
Content-Security-Policy:
  default-src 'self';
  script-src 'self';
  style-src 'self' 'unsafe-inline';
  img-src 'self' data: https:;
  connect-src 'self' https://api.your-provider.com;
  frame-ancestors 'none';
  base-uri 'self'
```

A starting CSP - self by default, then allowlist only what your app genuinely needs

- default-src 'self' means: unless stated otherwise, only load from your own domain.
- connect-src controls which APIs the browser may call - list your real backends only.
- frame-ancestors 'none' stops other sites embedding yours in an iframe (clickjacking).
- Encrypt stored secrets with a library, not by hand; keep the key in an env var, rotate it if it ever leaks.
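
Encrypting a stored credential at rest might look roughly like the sketch below, which uses AES-256-GCM from Node's built-in `node:crypto` module. Treat it as one possible shape under stated assumptions rather than the recommended implementation: the helper names `loadKey`, `encryptSecret` and `decryptSecret`, the `CREDENTIALS_KEY` environment variable and the dot-separated storage format are all hypothetical, and a higher-level library or your platform's key management service may fit better.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from 'node:crypto'

// Assumption: the 32-byte key arrives as 64 hex characters, for example from a
// hypothetical CREDENTIALS_KEY environment variable. It never lives in the database.
function loadKey(hex: string | undefined): Buffer {
  if (!hex || !/^[0-9a-fA-F]{64}$/.test(hex)) {
    throw new Error('CREDENTIALS_KEY must be 64 hex characters')
  }
  return Buffer.from(hex, 'hex')
}

// Returns "iv.tag.ciphertext", each part base64-encoded, ready to store in one column.
function encryptSecret(plaintext: string, key: Buffer): string {
  const iv = randomBytes(12) // fresh 96-bit nonce for every encryption
  const cipher = createCipheriv('aes-256-gcm', key, iv)
  const ciphertext = Buffer.concat([cipher.update(plaintext, 'utf8'), cipher.final()])
  const tag = cipher.getAuthTag() // lets decryption detect tampering or a wrong key
  return [iv, tag, ciphertext].map((part) => part.toString('base64')).join('.')
}

// Throws if the stored value was modified or the key does not match.
function decryptSecret(stored: string, key: Buffer): string {
  const [iv, tag, ciphertext] = stored.split('.').map((part) => Buffer.from(part, 'base64'))
  const decipher = createDecipheriv('aes-256-gcm', key, iv)
  decipher.setAuthTag(tag)
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString('utf8')
}

// Usage sketch:
// const key = loadKey(process.env.CREDENTIALS_KEY)
// const stored = encryptSecret(thirdPartyToken, key) // save `stored`, not the token
```

An authenticated mode such as GCM is chosen here so that a tampered record fails loudly on decryption instead of yielding garbage. Rotating the key means decrypting with the old key and re-encrypting with the new one.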
#### Audit logs: know who did what

When something goes wrong - a refund issued, an account deleted, a plan downgraded - you need to answer "who did this, when, and from where". An audit log is an append-only record of sensitive actions: the actor, the action, the target, a timestamp and the source IP. It is not the same as your application logs; it is a deliberate trail for security and accountability. Log every action that moves money, changes permissions, or touches another user's data. Never log the secrets themselves, only that the action happened.

#### The pre-public repo checklist

This is the one that bites hardest. Deleting a key from your latest commit does not remove it from history - it is still sitting in an old commit that anyone can read the moment the repo goes public. Before you flip any repo from private to public, walk this checklist every time. If you find a secret in history, the only safe move is to treat that secret as compromised: rotate it immediately, then scrub history.

- Scan the full history for secrets, not just the current files. Use a tool like gitleaks or trufflehog: gitleaks detect --source . catches keys, tokens and passwords across every commit.
- Confirm .env and every secret file are in .gitignore and were never committed.
- Check for hardcoded keys, internal URLs, customer data and credentials in code and comments.
- If a secret was ever committed: rotate it first (assume it is already public), then rewrite history to remove it before going public.
- Only after a clean scan do you flip the repo to public.

```bash
# Audit the whole git history for leaked secrets before going public
gitleaks detect --source . --verbose
# If anything is found, ROTATE the secret first, then scrub history.
```

Run this before every private-to-public flip - history outlives a deletion

#### Typical mistakes

The recurring ones: shipping an AI or email endpoint with no rate limit and waking up to a five-figure bill; storing third-party tokens in plain text so one leak compromises every connected account; skipping the CSP because it is fiddly, then eating an XSS; and the classic - flipping a repo public to "share the project" without auditing history, leaking a key that has been there for months. Every one is prevented by a habit in this lesson.

#### Business ROI

Security is invisible when it works and catastrophic when it does not. A leaked key can drain a budget overnight, a breach of stored credentials can end a business, and a single public-repo leak has cost founders real money and real customers. The essentials here cost you an afternoon to set up and then run forever. For an agent-driven team this matters double: agents generate endpoints fast, so the discipline of "every public endpoint gets a rate limit, every secret gets encrypted, every repo gets audited" has to be a rule the agent follows, not a thing you remember.

#### Checklist

Confirm each of these before you point real users at anything you have built.

- Every public and money-spending endpoint has a rate limit returning 429 when exceeded.
- A Content Security Policy header is set and as strict as your app allows.
- Stored credentials and tokens are encrypted at rest, with the key in an env var.
- You ran a full-history secret scan before any repo went public, and rotated anything found.

#### Resources

Keep gitleaks or trufflehog installed and add the history scan to your pre-public ritual. The OWASP Top 10 and MDN's CSP reference are the timeless sources when you need depth. Put "rate limit this endpoint" and "encrypt this stored secret" into your project rules so agents apply them without being asked.

#### Your task

Add a rate limit to one real endpoint and confirm it returns 429 when you exceed it.
Set a CSP header on your app and fix whatever it breaks until the page works under a strict policy. Then run gitleaks against one of your repos and read the output - even a clean result teaches you to trust the check before you ever go public.

#### Next lesson

Secure and hardened, the next lesson covers legal and compliance: GDPR, lawful cookie consent including Global Privacy Control, and the US and Swiss privacy laws, explained without the headache.

### 5.3 Legal and Compliance: GDPR, Cookie Consent, Privacy Laws Without Tears

- Canonical URL: https://agenticschool.dev/courses/quality-security-agent-first/legal-and-compliance-gdpr-cookie-consent-privacy-laws-without-tears
- Duration: 26 min

Summary: Collecting any user data brings legal duties. This lesson makes them manageable: the GDPR, UK GDPR, Swiss revFADP and CCPA/CPRA landscape in plain words, lawful basis and real consent, cookie banners done right including Global Privacy Control, your privacy policy and imprint, data export and deletion rights, and when you actually need a DPA.

#### Summary

Privacy law sounds terrifying and is mostly common sense written down. Collect only what you need, tell people what you collect and why, get real permission before you track them, and let them see and delete their data on request. GDPR, the UK GDPR, the Swiss revFADP and the US state laws all share that core. This lesson translates the landscape into a short list of things you actually build, with a plain-words note that none of it is legal advice - when in doubt, ask a lawyer.

#### What you will learn

You will learn the shared core of the major privacy laws, what "lawful basis" and "real consent" mean in practice, how to build a cookie banner that complies (including honouring Global Privacy Control), what your privacy policy and imprint must contain, how to handle data export and deletion requests, and the simple test for when you need a Data Processing Agreement.
#### Prerequisites

A product that collects any user data - an email signup, an account, analytics - from Course 3 or your own project, since compliance is about how you handle that data. No legal background needed. This lesson is practical guidance, not legal advice; for anything high-stakes, get a professional to review your specific situation.

#### The problem

Most small builders do one of two wrong things. They either ignore privacy law entirely until a user or a regulator asks an uncomfortable question, or they freeze, convinced compliance needs a legal department they cannot afford. The truth sits in between: a solo founder can be genuinely compliant by following a handful of principles and shipping a few standard pages and a working consent banner. The cost of getting it wrong - fines, takedowns, lost trust - is far higher than the afternoon it takes to get it right.

#### The landscape in plain words

You do not need to memorise every law. They rhyme. Build to the strictest common denominator - effectively GDPR plus US opt-out rights - and you are covered almost everywhere.

- GDPR (EU) and UK GDPR: the strict baseline. You need a lawful basis to process personal data, must be transparent, must minimise what you collect, and must honour access and deletion rights. Applies whenever you have EU or UK users, wherever you are based.
- Swiss revFADP: the revised Swiss Federal Act on Data Protection. Closely aligned with GDPR, so meeting GDPR largely covers it. Relevant if you serve Swiss users.
- CCPA and CPRA (California) and the wave of other US state laws (Virginia, Colorado, and more): consumer-rights based. Built around the right to know, to delete, and to opt out of the sale or sharing of personal data.
- Practical rule: design to GDPR, add a clear opt-out for US users, and you satisfy the strictest common denominator across all of them.

#### Lawful basis and real consent

Under GDPR you must have a lawful basis for every bit of personal data you process.
For a small product the two you will use most are "contract" (you need the data to provide the service the user signed up for, like their email to run their account) and "consent" (the user actively agreed, which you need for marketing and non-essential tracking). Real consent is specific, informed, freely given and as easy to withdraw as to give. A pre-ticked box is not consent. A banner that only offers "Accept" is not consent. Bundling marketing into the signup terms is not consent. If you cannot honestly say the user chose, you do not have it.

#### Cookie consent done right (including Global Privacy Control)

The rule is simple and people break it constantly: non-essential cookies and trackers - analytics, ads, anything that profiles the user - must not load until the user agrees. Essential cookies that make the site function need no consent. A compliant banner gives "Accept all" and "Reject all" equal prominence, lets the user choose per category, and remembers the choice. Modern best practice also honours Global Privacy Control: when a visitor's browser sends the GPC signal, treat it as a valid opt-out automatically and do not even show a nagging banner for tracking they have already refused.

```typescript
// Honour Global Privacy Control before showing any tracking banner.
const gpc = (navigator as Navigator & { globalPrivacyControl?: boolean })
  .globalPrivacyControl

if (gpc) {
  // Treat as a valid opt-out: do not load trackers, do not nag.
  setConsent({ analytics: false, marketing: false })
} else if (!hasStoredConsent()) {
  showConsentBanner() // Reject all must be as easy as Accept all.
}

// Load analytics/marketing scripts ONLY after explicit opt-in.
if (getConsent().analytics) loadAnalytics()
```

Respect GPC, gate trackers behind opt-in, and make Reject all as easy as Accept all

#### Policy, imprint, and user rights

A few standard pieces make you compliant and trustworthy. They are boring to write and you only do it once.
- Privacy policy: in plain language, say what you collect, why, the lawful basis, who you share it with (your processors), how long you keep it, and how to contact you or complain. Link it in your footer. - Imprint / legal notice: required in places like Germany, Austria and Switzerland. State who is behind the site - name, address and contact - so visitors know who they are dealing with. - Right to access and delete: a user can ask for a copy of their data and ask you to delete it. Build a path to honour both. Even a manual process is fine at small scale, but you must actually do it within the legal window. - Data export: be able to hand a user their data in a portable format. If your database is clean (Course 3), this is a query, not a project. #### When you need a DPA A Data Processing Agreement is a contract between you and a company that handles personal data on your behalf - your hosting, your database, your email sender, your analytics provider. Under GDPR you need a DPA in place with every such processor. The good news: reputable providers publish a standard DPA you simply accept, often automatically in their terms. The practical test is "does this third party touch my users' personal data?" If yes, find and accept their DPA, and list them as a processor in your privacy policy. If you are the only one touching the data, you do not need one. #### Typical mistakes The frequent ones: loading Google Analytics or ad pixels before the user consents (the single most common violation); a cookie banner with a big "Accept" and a hidden or missing "Reject"; ignoring GPC and other browser signals; no privacy policy or a copied one that lies about what you actually collect; and forgetting DPAs with the providers you rely on. None are hard to fix, and all are easy for a regulator or a competitor to spot. #### Business ROI Compliance is both risk reduction and a trust signal. 
The downside of ignoring it is real - GDPR fines scale with revenue, and a botched data request can become a public complaint. The upside is quieter but real: a clear privacy policy and an honest consent flow tell customers you take their data seriously, which matters more every year. Build it in from the start and it costs an afternoon; retrofit it under pressure after you have users and it costs a painful migration. Do it early. #### Checklist Run through these for any product that collects user data before you promote it. - You can name the lawful basis for every category of data you collect. - No non-essential tracker loads before opt-in, and GPC is honoured automatically. - Your cookie banner offers Reject all as easily as Accept all and remembers the choice. - A real privacy policy and (where required) an imprint are linked in your footer. - You can export and delete a user's data on request, and you have a DPA with every processor. #### Resources The official ICO (UK), EDPB (EU), Swiss FDPIC and California Privacy Protection Agency sites are the authoritative, timeless references, and they publish plain-language guides for small businesses. A reputable consent-management platform handles the banner and GPC for you if you would rather not build it. For anything high-stakes, a short consult with a privacy lawyer is money well spent. This lesson is guidance, not legal advice. #### Your task Audit one of your projects: list every piece of personal data it collects and the lawful basis for each. Then check your consent flow - does any tracker fire before opt-in, and does Reject all really work? Fix whatever fails. If you have no privacy policy yet, draft one in plain language and link it in your footer. #### Next lesson Compliant and secure, the next lesson gets you found: classic SEO plus GEO/AEO, llms.txt and structured data, the favicon-in-SERP trick, and the system-of-websites strategy so both Google and AI recommend you. 
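For the audit in this lesson's task, the list of personal data and lawful bases can be kept as a small typed inventory and checked mechanically. The following is a minimal sketch, not a compliance tool and not legal advice: the `DataItem` fields, the `auditInventory` function and the sample rows are all hypothetical names chosen for illustration. The only rules it encodes are the two stated in this lesson: every item needs a named lawful basis, and non-essential tracking needs consent.

```typescript
// The six lawful bases listed in GDPR Article 6(1).
type LawfulBasis =
  | "consent"
  | "contract"
  | "legal_obligation"
  | "vital_interests"
  | "public_task"
  | "legitimate_interests";

// One row of the audit. Field names are illustrative assumptions.
interface DataItem {
  name: string; // what is collected, e.g. "account email"
  purpose: string; // why it is collected
  basis?: LawfulBasis; // undefined means "not decided yet"
  nonEssentialTracking: boolean; // analytics, ads, profiling
}

interface AuditFinding {
  item: string;
  problem: string;
}

// Flags rows that break the two rules from this lesson.
function auditInventory(items: DataItem[]): AuditFinding[] {
  const findings: AuditFinding[] = [];
  for (const item of items) {
    if (item.basis === undefined) {
      findings.push({ item: item.name, problem: "no lawful basis named" });
    } else if (item.nonEssentialTracking && item.basis !== "consent") {
      findings.push({
        item: item.name,
        problem: "non-essential tracking needs consent",
      });
    }
  }
  return findings;
}

// Hypothetical sample inventory for a small product.
const inventory: DataItem[] = [
  {
    name: "account email",
    purpose: "run the user's account",
    basis: "contract",
    nonEssentialTracking: false,
  },
  {
    name: "analytics cookie",
    purpose: "usage statistics",
    basis: "legitimate_interests",
    nonEssentialTracking: true,
  },
  { name: "phone number", purpose: "unclear", nonEssentialTracking: false },
];

const findings = auditInventory(inventory);
console.log(findings);
```

Keeping the inventory in the repository next to the code makes it easier to notice when a new field or tracker is added without a basis, and the same list can feed the "what we collect and why" part of the privacy policy.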
### 5.4 SEO and GEO/AEO: Getting Recommended by Google AND by AI - Canonical URL: https://agenticschool.dev/courses/quality-security-agent-first/seo-and-geo-aeo-getting-recommended-by-google-and-by-ai - Duration: 28 min Summary: Discovery now has two channels: classic search and AI assistants. This lesson covers SEO fundamentals - titles, descriptions, sitemaps, Search Console - plus GEO/AEO: llms.txt, clean markdown content, structured data and being citable. It includes the favicon-in-SERP trick and the system-of-websites strategy of building many niche sites instead of one. #### Summary For twenty years, getting found meant ranking on Google. Now half your future customers will discover you through an AI assistant that reads the web, picks a few sources, and recommends them. You have to win both channels, and the good news is they reward the same thing: clear, well-structured, trustworthy content that a machine can read and cite. This lesson covers the SEO fundamentals and the new GEO/AEO layer - llms.txt, structured data, citability - plus two tactics that punch above their weight: the favicon trick and the system of websites. #### What you will learn You will learn the SEO fundamentals that still matter - titles, meta descriptions, sitemaps and Search Console - the SERP details like favicons that lift click-through, how GEO/AEO and llms.txt make your content citable by AI assistants, the role of clean markdown and structured data, and the system-of-websites strategy of running many focused sites instead of one sprawling one. #### Prerequisites A live, indexed site and the Search Console basics from Course 3, since discovery builds on being indexable in the first place. You should also remember the agentic surface concepts from Course 4 - llms.txt and clean machine-readable content are where SEO and agentic design meet. #### The problem Two failure modes are common. 
The first is building something good that nobody can find because the basics are missing - no descriptive titles, no sitemap, never verified in Search Console. The second is newer and more dangerous: a site optimised only for Google that is invisible to AI assistants, so when someone asks Claude or ChatGPT for a recommendation in your space, you are never mentioned. Winning one channel and ignoring the other leaves half your discovery on the table. #### SEO fundamentals that still matter The foundation has not changed: help both humans and crawlers understand each page, and earn trust. Get these right before chasing anything clever. - Title tag: unique per page, around 50 to 60 characters, leading with the words people actually search. This is your headline in the results and your single biggest lever on click-through. - Meta description: roughly 140 to 158 characters, written to earn the click, not to stuff keywords. It is your advert under the title. - Clean structure: one H1, logical headings, fast pages, mobile-friendly, descriptive URLs. Crawlers reward clarity. - Sitemap and Search Console: submit an XML sitemap and verify the site in Google Search Console so you can see impressions, clicks and which queries you rank for. - Trust: real content, no thin pages, internal links, and inbound links from sites that matter. #### The favicon-in-SERP trick Here is a small, underused edge. On mobile and increasingly on desktop, Google shows your site's favicon next to your search result. A crisp, distinctive, recognisable favicon makes your listing stand out in a wall of text and measurably lifts click-through, which in turn signals relevance and can help your ranking. Most people ship a default or blurry icon and leave this on the table. Make a clean favicon at the sizes Google expects, reference it correctly in your head tags, and confirm it actually shows. It is ten minutes of work for a permanent advantage on every result you ever rank for. 
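The title and description guidance in the fundamentals list above (unique titles of around 50 to 60 characters, descriptions of roughly 140 to 158 characters) is easy to check in a script before a deploy. The following is a rough sketch under stated assumptions: `PageMeta`, `MetaIssue` and `checkPageMeta` are hypothetical names, the ranges are this lesson's rules of thumb rather than limits published by any search engine, and collecting the titles and descriptions from your pages is left to you.

```typescript
// Hypothetical shapes for illustration; not part of any SEO library.
interface PageMeta {
  url: string;
  title: string;
  description: string;
}

interface MetaIssue {
  url: string;
  issue: string;
}

// Character ranges taken from the guidance in this lesson.
const TITLE_RANGE = { min: 50, max: 60 };
const DESCRIPTION_RANGE = { min: 140, max: 158 };

function checkPageMeta(pages: PageMeta[]): MetaIssue[] {
  const issues: MetaIssue[] = [];
  // Maps a normalised title to the first URL that used it.
  const seenTitles = new Map<string, string>();

  for (const page of pages) {
    const title = page.title.trim();
    const description = page.description.trim();

    if (title.length < TITLE_RANGE.min || title.length > TITLE_RANGE.max) {
      issues.push({
        url: page.url,
        issue: `title is ${title.length} characters, aim for ${TITLE_RANGE.min}-${TITLE_RANGE.max}`,
      });
    }

    if (
      description.length < DESCRIPTION_RANGE.min ||
      description.length > DESCRIPTION_RANGE.max
    ) {
      issues.push({
        url: page.url,
        issue: `description is ${description.length} characters, aim for ${DESCRIPTION_RANGE.min}-${DESCRIPTION_RANGE.max}`,
      });
    }

    const key = title.toLowerCase();
    const firstUrl = seenTitles.get(key);
    if (firstUrl !== undefined) {
      issues.push({ url: page.url, issue: `duplicate title, also used on ${firstUrl}` });
    } else {
      seenTitles.set(key, page.url);
    }
  }
  return issues;
}
```

Treat the output as a to-do list rather than a gate: a 62-character title that leads with the words people search still beats a 55-character one that does not.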
#### GEO/AEO: making AI recommend you

GEO (Generative Engine Optimisation) and AEO (Answer Engine Optimisation) are about being the source an AI assistant chooses to read and cite. The mechanics are different from Google but the spirit is the same: be readable, be structured, be trustworthy, be unambiguous. AI assistants favour content that directly answers the question, states facts plainly, and is easy to parse. The single most useful new artefact is llms.txt - a plain-markdown map of your site that points assistants at your best, cleanest content, the same way robots.txt and sitemap.xml guide search crawlers.

```text
# Your Company

> One clear sentence on what you do and who it is for.

## Core pages

- [Pricing](https://example.com/pricing.md): plans and what each includes
- [How it works](https://example.com/how-it-works.md): the product in plain steps

## Guides

- [Getting started](https://example.com/guides/start.md): first 10 minutes
- [API reference](https://example.com/api.md): endpoints and examples

## About

- [Founder](https://example.com/about.md): who is behind this and why
```

/llms.txt - a markdown map that points AI assistants at your cleanest content

- Serve a clean .md version of each important page so assistants get content without wading through your layout.
- Add structured data (JSON-LD) for articles, products, FAQs and your organisation, so both Google and AI parse facts unambiguously.
- Write answer-first: lead with the direct answer, then the detail. Assistants quote the clear sentence, not the buried one.
- State concrete facts - prices, specs, dates - plainly, because that is what gets cited.

#### The system-of-websites strategy

Here is a strategy the founder of this school uses deliberately: instead of cramming everything into one broad site, run a system of focused niche sites. Thirteen tightly themed sites, each owning one specific topic or audience, beat one site trying to rank for everything.
Each site can be the clearest, most authoritative answer in its narrow niche, which is exactly what both Google and AI assistants reward. The sites cross-link where it genuinely helps the reader, so authority compounds across the system instead of every property starting from zero. With agents, spinning up and maintaining a new focused site is cheap, which makes this strategy far more practical than it was when every site meant a team. - One site, one clear niche: easier to become the definitive answer than a generalist site ever can. - Specificity wins citations: AI assistants prefer the focused source that obviously matches the question. - Cross-link with intent: connect sites where it helps the reader, so authority and discovery compound. - Agents make it cheap: scaffolding and maintaining many small sites is now an afternoon, not a hire. #### Typical mistakes The recurring ones: duplicate or missing title tags so every page looks the same in results; keyword-stuffed descriptions that read like spam and kill click-through; never verifying in Search Console so you are flying blind; shipping no llms.txt and no clean markdown, so AI assistants skip you; and burying your answer three paragraphs down where no assistant will ever quote it. Lead with the answer, keep pages clean, and feed both crawlers and assistants. #### Business ROI Discovery is compounding and nearly free once it works. A page that ranks or gets cited brings visitors every day with no ongoing cost, which is the cheapest acquisition channel there is. The shift to AI discovery is the opportunity: most competitors are still optimising only for Google, so the builders who make their content citable now will own the AI-recommendation channel before it gets crowded. Getting recommended by an assistant that millions of people trust is worth more than a page-two Google ranking, and right now it is wide open. #### Checklist Confirm these for any site you want discovered through both channels. 
- Every page has a unique, search-led title and a click-worthy meta description. - A sitemap is submitted and the site is verified in Search Console. - A crisp favicon shows next to your result, and structured data is in place. - An llms.txt maps your best content and clean .md versions are served. - Your most important answers lead with the answer, not the backstory. #### Resources Google Search Central and the Search Console documentation are the timeless SEO references; the llms.txt proposal and schema.org are the GEO/AEO ones. The Going Live lesson in Course 3 covers the Search Console setup if yours is not done. Watch your Search Console impressions and click-through over weeks - it is the only feedback loop that tells you what is actually working. #### Your task Pick one real page and rewrite its title and meta description to lead with what people search and earn the click. Ship a clean favicon and confirm it appears in a search result. Then write a first llms.txt for the site mapping your best pages. You now feed both channels - check Search Console in two weeks to see the effect. #### Next lesson Being discoverable by AI leads straight to the next idea: designing products that AI itself loves to use, where the API is the product and the UI is almost beside the point. ### 5.5 Agent-First Products: Why AI Must Love Your API - Canonical URL: https://agenticschool.dev/courses/quality-security-agent-first/agent-first-products-why-ai-must-love-your-api - Duration: 26 min Summary: The next wave of users includes AI agents. A product whose API is clean, documented and a joy to call gets adopted by agents and the humans who direct them. This lesson covers agent-first design and the API-over-UI philosophy through the founder's BizCollect lesson: OpenAPI docs, predictable errors, self-serve API keys, and the idea that AI should be in love with your API. #### Summary For thirty years we designed products for humans clicking screens. That assumption is breaking. 
More and more, the thing using your product is an AI agent acting for a person, and it never sees your beautiful UI - it reads your docs and calls your API. An agent-first product treats the API as the main surface and the UI as just one client of it. This lesson makes the case, shows what a great agent-facing API looks like, and tells the BizCollect story where the founder of this school learned it the hard way. #### What you will learn You will learn why agents are becoming a customer base worth designing for, what makes an API a joy for an agent to use - discoverable OpenAPI docs, predictable errors, self-serve keys - and the concrete lessons from building BizCollect API-first. You will leave able to evaluate any product you build by asking "would an agent love this, or fight it?" #### Prerequisites The API and architecture lessons from earlier courses, plus the agentic surface and autonomy lessons from Course 4, since agent-first design is the architectural stance those ideas point towards. A sense of what an OpenAPI spec is helps, but this lesson explains enough to get the principle. #### The problem Founders pour months into a slick interface and treat the API as an afterthought - undocumented, inconsistent, gated behind a sales call. Then an agent arrives, cannot figure out how to authenticate, hits an error that returns a vague HTML page, gives up, and recommends a competitor whose API it could actually read. The agent does not care how pretty your dashboard is. If it cannot call you cleanly, you simply do not exist to it. As agents become the ones choosing tools, that is an existential gap. #### Agents are becoming the customer Think about how work increasingly happens: a person tells an agent "find me a supplier and place the order", "pull this data and put it in my sheet", "book the cheapest option that fits". The agent then decides which services to call. 
It chooses the one it can use - clear docs, predictable behaviour, self-serve access - and skips the ones that need a human in the loop. The human is still the customer, but the agent is the user, and it has ruthless taste: it abandons anything friction-heavy instantly and never complains, it just leaves. Designing for that user is the new competitive edge.

#### What an API an agent loves looks like

An agent-friendly API is one an agent can discover, authenticate to and use correctly without a human reading the docs for it. That comes down to a few concrete properties.

- Discoverable: a published OpenAPI spec so an agent can read every endpoint, parameter and response shape, plus an llms.txt pointing at it.
- Self-serve keys: a user (or their agent) can sign up and get an API key without a sales call. Friction here kills agent adoption dead.
- Predictable errors: consistent status codes and a structured error body that says what went wrong and how to fix it, so an agent can recover instead of guessing.
- Stable and consistent: same naming, same shapes, same auth across endpoints, versioned so a change never silently breaks a caller.
- Honest docs: examples that actually run, defaults that match reality, and no undocumented required field. Agents trust the spec literally.

```yaml
openapi: 3.1.0
info:
  title: BizCollect API
  version: 1.0.0
paths:
  /v1/businesses:
    get:
      summary: Search verified business records
      parameters:
        - name: region
          in: query
          required: true
          schema: { type: string }
      responses:
        '200':
          description: Matching business records
        '429':
          description: Rate limit exceeded - retry after the given seconds
```

A minimal OpenAPI snippet - this is what an agent reads to learn your API on its own

#### Predictable errors are a feature

Humans muddle through a confusing error; an agent needs structure.
When something fails, return a consistent status code and a small JSON body the agent can parse: a stable error code, a human-readable message, and where relevant a hint about how to recover. A 429 should say how long to wait. A 400 should name the field that was wrong. A 401 should say the key is missing or invalid, not just "unauthorized". Get this right and an agent self-corrects and keeps going; get it wrong and it stalls or hallucinates a fix. Predictable errors are the difference between an API an agent can operate unattended and one that needs a human babysitter.

```json
{
  "error": {
    "code": "rate_limited",
    "message": "Too many requests. Retry after 30 seconds.",
    "retry_after_seconds": 30
  }
}
```

A structured, recoverable error body an agent can act on without guessing

#### The BizCollect lesson

BizCollect, one of the projects built by the founder of this school, was where this clicked. It collects and serves business data, and the first instinct was the usual one: build a nice dashboard, make the data look good, treat the API as a side door. The realisation that changed it was that almost nobody wanted to sit in a dashboard - they wanted the data inside their own workflow, increasingly fetched by an agent. So we flipped it: the API became the product, with a published OpenAPI spec, self-serve keys, predictable structured errors, and an llms.txt so assistants could discover it. The UI shrank to one thin client of that API, useful for a human poking around but no longer the point. Adoption came through agents and developers who could integrate in minutes without ever booking a call. The lesson generalises hard: build the API first, make it something an agent can fall in love with, and let the UI be one of its clients rather than the whole product.
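To see why the structured error body from this lesson matters, it helps to look at the caller's side. The following is a minimal sketch of how an agent or client might turn a status code and an error body into a next step. It assumes the error shape shown in the JSON example in this lesson. The names `ApiErrorBody`, `NextStep` and `decideNextStep`, and the 60-second fallback wait, are hypothetical choices for illustration, not part of the BizCollect API.

```typescript
// Error shape assumed from the JSON example in this lesson.
interface ApiErrorBody {
  error: { code: string; message: string; retry_after_seconds?: number };
}

// What the caller should do next. Hypothetical type for illustration.
type NextStep =
  | { action: "retry"; waitMs: number }
  | { action: "fix_request"; reason: string }
  | { action: "give_up"; reason: string };

// Runtime check, because a failing server may return HTML or plain text.
function isApiErrorBody(value: unknown): value is ApiErrorBody {
  if (typeof value !== "object" || value === null) return false;
  const error = (value as { error?: unknown }).error;
  if (typeof error !== "object" || error === null) return false;
  const fields = error as Record<string, unknown>;
  return typeof fields["code"] === "string" && typeof fields["message"] === "string";
}

function decideNextStep(status: number, body: unknown): NextStep {
  if (!isApiErrorBody(body)) {
    // Nothing parseable to act on: this is where an agent stalls.
    return { action: "give_up", reason: `unstructured error response (HTTP ${status})` };
  }
  const { code, message, retry_after_seconds } = body.error;

  if (status === 429) {
    // Assumption: fall back to 60 seconds if the server gives no hint.
    const seconds = typeof retry_after_seconds === "number" ? retry_after_seconds : 60;
    return { action: "retry", waitMs: seconds * 1000 };
  }
  if (status === 400 || status === 401) {
    // The message names the bad field or the key problem, so the caller can correct it.
    return { action: "fix_request", reason: `${code}: ${message}` };
  }
  return { action: "give_up", reason: `${code}: ${message}` };
}
```

With the rate-limit body from the example, `decideNextStep(429, body)` yields a retry after 30,000 ms, while an HTML error page falls straight through to `give_up`. That gap is the practical difference between an API an agent can operate unattended and one it abandons.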
#### Typical mistakes The recurring ones: API as an afterthought behind a polished UI, so agents cannot use you; no OpenAPI spec, so nothing can discover your endpoints; keys locked behind a sales call, which kills self-serve adoption; inconsistent or vague errors that leave an agent stuck; and breaking changes shipped without versioning, silently breaking every caller. Each one is a closed door to the customer base that is growing fastest. #### Business ROI Agent-first is a distribution strategy disguised as an architecture choice. An API an agent can adopt in minutes spreads through every agent and workflow that needs what you do, with zero sales effort, while your competitors are still scheduling demos. The build cost is barely higher - you were going to need an API anyway - but the upside is access to a customer base that is compounding as agents take over more work. The founders who make AI fall in love with their API now are positioning for where buying decisions are heading, not where they have been. #### Checklist Evaluate any product you build against these before you call it agent-ready. - A published OpenAPI spec describes every endpoint, parameter and response. - A user or their agent can get an API key self-serve, with no sales call. - Errors are consistent, structured and tell the caller how to recover. - The API is versioned and stable, so a change never silently breaks a caller. - An llms.txt and clean docs let an agent discover and adopt you unattended. #### Resources The OpenAPI Specification and the Stripe and Anthropic API docs are the gold standard to study - read them as an agent would and notice how little guessing they require. Bookmark the agentic surface lesson from Course 4, since llms.txt and machine-readable docs are where that lesson and this one meet. The Builds section on BizCollect goes deeper on the project story. 
#### Your task Take one product or endpoint you have built and grade it as an agent would: could an agent find your docs, get a key without a human, call you correctly, and recover from an error - all unattended? Write down each place it would get stuck, then fix the worst one. Even one cleaned-up, well-documented endpoint teaches the whole mindset. #### Next lesson Designing for agents raises the obvious question of where all this is heading. The penultimate lesson zooms out to the exponential curve and the market opportunities it is opening right now. ### 5.6 The Exponential Curve: Where This Is All Heading - Canonical URL: https://agenticschool.dev/courses/quality-security-agent-first/the-exponential-curve-where-this-is-all-heading - Duration: 24 min Summary: Capability in this field has been doubling roughly every few months, and humans are wired to feel linear, not exponential. This lesson builds intuition for that curve, explains why planning around it is different, and points to the opportunities it creates - cybersecurity, validation and verification, human proof-of-identity, and turning physical reality into digital data. #### Summary Everything in the previous courses sits on a moving floor. The capability of these models has been roughly doubling every several months, and that single fact breaks most people's planning, because human intuition is linear and the curve is not. This lesson is the strategic step back. It builds a feel for the exponential, shows why "good enough for now" plans go wrong, and maps where the durable opportunities are - the new problems the curve itself creates faster than anyone can solve them. 
#### What you will learn

You will learn why exponential progress is so easy to underestimate, how to plan for a capability that keeps doubling instead of a fixed target, and where the biggest market openings are: securing AI systems, validating and verifying AI output, proving a human is human, and converting physical reality into digital data. The aim is to position ahead of the curve rather than react to it.

#### Prerequisites

The 5 Levels of Autonomy lesson from Course 4 and a broad sense of the whole programme, since this lesson synthesises rather than introduces. No new tools. Keep your own current limitations in mind - the point is to ask which of them are permanent and which evaporate at the next doubling.

#### The problem

People make two opposite mistakes about AI progress, and both come from feeling the curve as a straight line. The pessimist sees today's flaws - it hallucinates, it cannot do X - and concludes the whole thing is overhyped, not realising those flaws are temporary. The optimist assumes a capability is already here when it is two doublings away, and builds on sand. Linear intuition makes you wrong in both directions. Reasoning about exponentials correctly is a skill, and it changes which bets look sane.

#### Feeling the exponential

The classic illustration: fold a sheet of paper in half forty-two times and, if you could, the stack would be thicker than the distance to the moon. A 0.1 mm sheet doubled 42 times comes to roughly 440,000 km, against about 384,000 km to the moon. Nobody feels that from "fold it again", because each step looks small and the total is unimaginable. AI capability behaves the same way. A doubling every few months feels like a modest update each time, but stack a handful of them and the system at the end is categorically different from the one you reasoned about at the start. The practical consequence: any limitation you are designing around today has a short shelf life. Build for the trajectory, not the snapshot.

- Each doubling feels incremental; the cumulative effect is not. Your gut will always lag the curve.
- A capability that is "almost there" today is often comfortably there a few doublings out - plan for it arriving, not for it being absent. - Conversely, do not assume something is solved before it is. Track the curve, do not extrapolate a single demo. - The honest stance: treat today's limits as temporary and today's strengths as a floor that keeps rising. #### What this means for planning If the ground keeps rising, you plan differently. Do not build a moat out of a capability gap that will close - "we are better because the AI cannot do this yet" is a plan with an expiry date. Instead, build where rising capability makes you stronger: workflows, data, trust, distribution and relationships that compound as the models improve. Keep your architecture flexible so you can swap in the next, far better model without a rewrite (the model-agnostic instinct from Course 1 pays off here). And bias towards shipping and learning fast, because the cost of building keeps falling, so the scarce resource is increasingly judgement about what to build, not the ability to build it. #### Where the opportunities are The richest opportunities are not in doing what the models already do well - that gets commoditised at the next doubling. They are in the problems the curve creates faster than it solves. As AI gets cheaper and more capable, four gaps widen, and each is a market. - Cybersecurity: cheap, capable AI is a gift to attackers - automated, scaled, personalised attacks. Defending against AI-powered threats, and securing the AI systems everyone is now deploying, is a growing market with no ceiling. - Validation and verification: AI generates output at scale, but someone has to check it is correct, safe and real. As covered in Course 4, validation is the true blocker on autonomy, which makes tools that verify AI output one of the most durable opportunities in the field. 
- Human proof-of-identity: when AI can perfectly imitate a person's voice, face and writing, proving that an actor is a real, specific human becomes a hard and valuable problem - for payments, access, trust and democracy. - Physical-to-digital data: the models are starving for fresh, real-world data. Turning physical reality - documents, inventory, places, sensors - into clean digital data that AI can use is a deep, underbuilt opportunity. #### Positioning ahead of the curve The strategic move is to build for where the curve is going, so your work compounds with progress instead of being obsoleted by it. Pick problems that get more valuable as AI gets better, not less. A verification business is worth more the more AI-generated content exists. A human-identity business is worth more the better the deepfakes get. A physical-to-digital business is worth more the hungrier the models are for data. Position there and every doubling is a tailwind. Position against the curve - betting your edge is a gap that will close - and every doubling is a countdown. #### Typical mistakes The recurring errors: dismissing the whole field because of today's flaws (linear pessimism); assuming a capability is already reliable when it is a doubling or two away (linear optimism); building a moat out of a capability gap that will obviously close; and locking into one model so tightly that you cannot ride the next, far better one. The cure for all of them is the same: respect the curve and stay flexible. #### Business ROI Reasoning about exponentials correctly is the highest-leverage strategic skill in this programme, because it determines which bets you place at all. Founders who internalised early doublings and built for the trajectory caught waves that looked absurd at the time and obvious in hindsight. The cost of getting this wrong is not a bad quarter, it is building an entire business on a foundation the next model dissolves. 
Picking problems that strengthen as AI strengthens is how you make the curve work for you instead of against you. #### Checklist Pressure-test your thinking against these before you commit to a direction. - You can explain why a doubling every few months defeats linear intuition. - You can tell a temporary limitation from a durable one for your own project. - Your edge is not a capability gap that the next model closes. - You can name an opportunity that gets more valuable as AI improves, not less. #### Resources Read the major labs' own capability and roadmap notes for the direction of travel, and revisit the autonomy lesson from Course 4, since validation as the real blocker is central to where the opportunities sit. The most useful exercise is ongoing: each time a new model lands, note which of your old assumptions it just broke. That running log is how you keep your intuition calibrated to the curve. #### Your task List three limitations you are currently designing around in a project. For each, honestly judge whether it is permanent or likely to dissolve within a few doublings, and note how your plan changes if it dissolves. Then name one opportunity from this lesson - cybersecurity, validation, human verification, or physical-to-digital - that you could realistically build toward. That is your map for the capstone. #### Next lesson You have the skills and the strategic view. The final lesson is the capstone: take a real problem and build, test, secure, document and ship your own agentic product end to end. ### 5.7 Capstone: Build and Ship Your Own Agentic Product End-to-End - Canonical URL: https://agenticschool.dev/courses/quality-security-agent-first/capstone-build-and-ship-your-own-agentic-product-end-to-end - Duration: 32 min Summary: The capstone ties the whole programme together. 
You pick a small real problem, spec it, build it with Claude Code using everything from Courses 1 to 4, run the quality gates, secure it, make it discoverable with llms.txt and an API surface, deploy it, and share it in the community - against a clear acceptance-criteria checklist.

#### Summary

This is where it all becomes one workflow. Not five separate skill sets, but a single loop from idea to live product: scope a real problem, write the spec, build it with your agent, run the quality gate, harden the security, make it discoverable and agent-first, deploy it, and put it in front of people. The deliverable is a shipped thing, however small, plus the proof - to yourself - that you can take any idea through this loop again. This lesson is a guided build with a concrete acceptance checklist, not a reading.

#### What you will learn

You will run the full end-to-end loop once, deliberately: scope ruthlessly to something finishable, spec it the way Course 1 taught, build it with Claude Code on a real stack, protect it with the tests and security from this course, make it discoverable with SEO and an llms.txt, give it an agent-first API surface, deploy it, and share it. The output is a live product and a repeatable process.

#### Prerequisites

All five courses. The capstone draws on every skill: model choice and prompting (Course 1), agent mastery (Course 2), the modern stack of auth, data and payments (Course 3), automation and agentic systems (Course 4), and the quality, security, legal, SEO and agent-first lessons of this course. If any of those feels shaky, do a quick pass before you start - the capstone assumes them.

#### The problem

The single most common way a capstone dies is scope. People pick something huge and inspiring, build forty percent of it, hit the messy middle, and quietly abandon it. An unfinished ambitious project teaches far less than a finished tiny one, because shipping is where you meet every problem the tutorials skipped.
The whole discipline of this lesson is choosing something small enough to finish and real enough to matter, then carrying it all the way to live.

#### Step one: scope to something you can finish

Pick one small, real problem - ideally one you actually have, or one that ties into an opportunity from the previous lesson. Cut it down until you are slightly embarrassed by how small it is, then cut once more. A tool that does one useful thing well and is live beats a platform that does ten things and never ships. Write the scope as a single sentence you could explain to a stranger.

- One problem, one user, one clear outcome. If you cannot say it in a sentence, it is too big.
- Prefer a real itch of your own - you will know when it is good and you will actually use it.
- Ruthlessly defer everything that is not the core loop. A second feature is a second project.
- Define done before you start, so you ship instead of polishing forever.

#### Step two: spec it and build it with your agent

Turn the scope into a real spec sheet - goal, context, constraints, acceptance criteria - the way Course 1 taught, with your axioms baked in. Pick the right model tier for each task, choose a stack you can deploy, and drive Claude Code with project rules in your CLAUDE.md so it follows your conventions automatically. Build the core loop first and get it running locally before you add anything around it. Ask the agent to push back on your plan before it starts - some of the best decisions in a build come from the agent catching a flaw early.

```markdown
## Goal
A tool that takes a business address and returns its opening hours as clean JSON.

## Context
- Stack: this TanStack Start app, Convex for data, deploy on Vercel.
- Follow the API patterns from the agent-first lesson.

## Constraints
- TypeScript only. Every endpoint rate-limited. Secrets in env vars.
- Publish an OpenAPI spec and an llms.txt. Self-serve API keys.
- Add a Vitest test for the parser and a Playwright test for the happy path.

## Acceptance criteria
- [ ] GET /v1/hours?address=... returns structured JSON or a clear error.
- [ ] Rate limit returns 429 when exceeded. Errors are structured.
- [ ] OpenAPI spec and llms.txt are live and accurate.
- [ ] Quality gate (tsc, lint, vitest, playwright) is green in CI.
- [ ] Deployed to a public URL over HTTPS.
```

A capstone spec sheet that folds in the quality, security and agent-first criteria

#### Step three: gate, secure and comply

Before this goes anywhere near a real user, run everything from this course. Put the four-layer quality gate in place and make CI required. Apply the security essentials - rate limit every public endpoint, set a CSP, encrypt any stored secret, and run the pre-public history audit if the repo will be public. If it collects any personal data, do the privacy pass: lawful basis, a working consent flow, a privacy policy. This is the part beginners skip and professionals never skip. It is also what separates a demo from a product.

- Quality: tsc, lint, Vitest and Playwright all green, enforced by a pre-push hook and required CI.
- Security: rate limits, CSP header, encrypted secrets, and a clean gitleaks scan before any public repo.
- Legal: if it touches personal data, a lawful basis, a compliant consent flow and a privacy policy.
- Agent-first: published OpenAPI spec, predictable structured errors, self-serve keys.

#### Step four: ship it and make it discoverable

Deploy to Vercel, connect a domain if you have one, and confirm it loads over HTTPS with secrets set as environment variables, not in the repo. Then make it findable through both channels from the SEO lesson: real titles and descriptions, a sitemap, Search Console, a crisp favicon, structured data, and an llms.txt plus clean docs so AI assistants can discover and recommend it. A shipped product nobody can find is only half done - give it a way to be discovered by humans and by agents.
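
The llms.txt from step four is only a small Markdown file. As a rough sketch, assuming the commonly used layout of a title, a one-line summary in a blockquote and a list of links, a helper such as the hypothetical `buildLlmsTxt` below could generate it. The product name, the `DocLink` shape and the example.com URLs are placeholders for illustration, not part of the course material.

```typescript
// Hypothetical shape of one documentation link in the file.
interface DocLink {
  title: string;
  url: string;
  note: string;
}

// Build a minimal llms.txt body: H1 title, blockquote summary, one link section.
function buildLlmsTxt(productName: string, summary: string, docs: DocLink[]): string {
  const links = docs.map((d) => `- [${d.title}](${d.url}): ${d.note}`);
  return [`# ${productName}`, "", `> ${summary}`, "", "## Docs", "", ...links, ""].join("\n");
}

// Example with placeholder values (example.com is not a real product URL).
const llmsTxt = buildLlmsTxt(
  "Opening Hours API",
  "Returns a business's opening hours as clean JSON.",
  [
    { title: "OpenAPI spec", url: "https://example.com/openapi.json", note: "Full endpoint reference" },
    { title: "Quickstart", url: "https://example.com/docs/quickstart", note: "Get a key and make a first call" },
  ],
);
```

Generating the file from the same data that drives your docs keeps it accurate as the product changes, which is what the acceptance criteria ask for.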
#### Step five: share it in the community

The last step is not optional fluff, it is how you close the loop. Post your shipped product in the community: what it does, what you learned, where it was harder than expected, and the live link. Sharing forces you to actually finish, invites the feedback that makes version two better, and connects you to other builders going through the same loop. The founder of this school shipped imperfect things publicly long before they were ready, and that habit - ship, share, learn, repeat - compounds faster than any amount of private polishing.

#### Typical mistakes

The capstone killers: scope so large it never ships; skipping the quality gate so it breaks the moment you change it; skipping security so the first bot abuses it; treating the API and discoverability as afterthoughts so neither humans nor agents can find it; and never sharing it, so you lose the feedback and the accountability that finishing demands. Small, gated, secure, discoverable, shared - in that order.

#### Business ROI

A single shipped product teaches more than a year of tutorials, because shipping surfaces every real problem at once and forces a decision on each. It is also your proof of capability - to yourself, to customers, and to anyone you want to work with. And the loop you just ran is the asset, not only the product: once you can take an idea from scope to a live, tested, secure, discoverable, agent-first thing, you can do it again and again, faster each time. That repeatable loop is the entire return on this programme.

#### Checklist

Your capstone is complete - genuinely, not nearly - when every one of these is true. These are the acceptance criteria for the whole programme.

- The product solves one real problem and is live at a public URL over HTTPS.
- The four-layer quality gate is green and required in CI.
- Every public endpoint is rate-limited, secrets are encrypted, and a public repo passed a history scan.
- If it collects personal data, it has a lawful basis, working consent and a privacy policy.
- It has an OpenAPI spec, predictable errors, self-serve keys and an llms.txt.
- It has real titles, a sitemap and a favicon, and is verified in Search Console.
- You posted it in the community with the live link and what you learned.

#### Resources

Every prior lesson is a resource for this build - keep the spec-sheet template, the quality-gate workflow, the security checklist, the privacy checklist and the agent-first checklist open as you go. The Builds section shows finished projects end to end if you want a reference for scope and shape. The community is where you share the result and where the next round of feedback comes from.

#### Your task

Build and ship it. Pick the small problem, write the spec, build it with your agent, pass the quality gate, secure and comply, deploy it, make it discoverable and agent-first, then post it in the community with the live link. Tick every box on the checklist above. When the last box is ticked, you have not just finished a course - you have proven you can ship an agentic product end to end, on your own, again and again.

#### What comes next

You have completed the programme, but the field will not hold still and neither should you. Keep shipping small things, fold new tools in as they appear, and use the changelog as your daily tool-news hub to stay current. Bring your builds, your wins and your stuck moments to the community - that is where learning keeps compounding after the courses end. The loop you now own is the whole point: idea, build, gate, secure, ship, share, repeat. Go run it again.

---

## Fundamentals

### What Is Node.js (and How to Install It)

- Canonical URL: https://agenticschool.dev/fundamentals/what-is-nodejs

Node.js is a runtime that lets you run JavaScript on your computer instead of only inside a web browser.
Almost every modern web project needs it because the tools that build, run and test your app are themselves JavaScript programs, and Node.js is what executes them. If a tutorial tells you to run "npm install" or "npm run dev", that command only works because Node.js is installed. Think of it as the engine your project sits on top of.

#### Why you need Node.js

When you build a website with a modern framework, you are not just writing files a browser opens. You run a development server, install packages, and compile your code. All of those tools are written in JavaScript and need Node.js to run. Without it, the commands in almost every tutorial simply fail with "command not found".

- It runs the dev server that previews your site while you build.
- It runs npm, the tool that installs the libraries your project depends on.
- It runs build, test and lint tools that ship and check your code.

#### How to install it

The safest way to install Node.js is from the official website, nodejs.org. Download the LTS version (Long Term Support), which is the stable one recommended for almost everyone, and run the installer. After it finishes, open a fresh terminal and check that it worked.

```bash
node --version
npm --version
```

Both commands should print a version number. If they do, Node.js and npm are ready.

#### Common beginner confusions

Node.js is not a different language, it is JavaScript running in a new place. You also do not "use" Node.js directly most of the time. You install it once, and then the tools you actually run (npm, your dev server, your framework) use it under the hood. If a command fails right after installing, the usual fix is to close your terminal completely and open a new one, so it picks up the new installation. One more thing trips people up: the version number matters less than people fear. As long as you are on a recent LTS release, almost every tutorial and project will work, and you rarely have to think about Node.js again once it is set up.
You do not need to update it constantly either; a yearly check is plenty for most builders.

### Terminal Basics for Total Beginners

- Canonical URL: https://agenticschool.dev/fundamentals/terminal-basics

The terminal is a text window where you type commands to tell your computer what to do, instead of clicking buttons. It feels intimidating at first, but you only need about five commands to be productive, and an AI coding agent runs most of the rest for you. Everything you do in the terminal you could also do by clicking around, but typing is faster and is how almost every developer tool expects to be used.

#### How the terminal works

At any moment the terminal is "in" one folder, called the current directory. Commands act on that folder unless you say otherwise. You move between folders, look at what is inside them, and run programs. That is most of it.

#### The commands you actually need

These five cover almost everything a beginner does. You type a command, press Enter, and it runs.

```bash
pwd          # print the folder you are currently in
ls           # list the files and folders here
cd my-app    # change into the folder called my-app
cd ..        # go up one folder
mkdir test   # make a new folder called test
```

On Windows PowerShell the same ideas apply; "ls" and "cd" work there too.

#### Common beginner confusions

A terminal that is "just sitting there" after you run a command usually means the command is still working, especially a dev server, which is supposed to keep running until you stop it. To stop a running command, press Ctrl and C together. Paths matter: a command often fails simply because you are in the wrong folder, so run "pwd" and "ls" to check where you are before assuming something is broken. Two more habits save a lot of pain. First, you can press the up arrow to bring back your previous commands instead of retyping them, and the Tab key to auto-complete a long folder name.
Second, copy and paste work in the terminal too, though the shortcut can differ from the rest of your system, so if Ctrl and V does nothing, try right-clicking. None of this is something you need to memorise; you will absorb it within a day of real use, and your AI agent narrates what each command does as it goes.

### What Is Git? Version Control Explained

- Canonical URL: https://agenticschool.dev/fundamentals/what-is-git

Git is a version control tool that takes snapshots of your project, called commits, so you can always go back to a working version. Think of it as an unlimited, labelled undo history for your whole project. Every time you save a commit, Git records exactly what changed and lets you return to that point later, which removes the fear that stops beginners from experimenting. Git runs on your own computer and works completely offline.

#### Why Git matters

Without version control, a bad edit can wreck hours of work and there is no clean way back. With Git, a bad edit is never a disaster because you just return to the last good commit. It also lets more than one person work on the same project without overwriting each other, and it is the foundation that services like GitHub build on.

#### The core workflow

The everyday loop is small: you change files, stage the ones you want to save, then commit them with a short message describing what changed. Your AI agent often runs these for you, but it helps to recognise them.

```bash
git status                         # see what changed
git add .                          # stage all changes
git commit -m "Add hero section"   # save a labelled snapshot
```

A commit message should say what changed and why, in a few words.

#### Common beginner confusions

Git is not the same as GitHub. Git is the tool on your computer; GitHub is a website that stores a copy of your Git project online. A "commit" is a save point in Git, not the same as saving a file in your editor; you can save a file many times and only commit once you are happy.
And committing does not publish anything to the internet by itself. People also worry about branches early on, but you do not need them to start. A branch is just a parallel line of work you can try without touching your main version, and for a solo beginner you can happily ignore branches until a project gets bigger. The single habit that matters most is committing often with clear messages, so your history reads like a story of what you did and you can always step back to any point.

### What Is GitHub? A Beginner Guide

- Canonical URL: https://agenticschool.dev/fundamentals/what-is-github

GitHub is a website that stores your Git projects online so you can back them up, share them and deploy them. If Git is the tool that saves snapshots of your project on your computer, GitHub is the place those snapshots live in the cloud. It is where most code in the world is kept, and most hosting platforms connect to GitHub to deploy your site automatically whenever you push new changes.

#### What GitHub gives you

GitHub turns your local Git project into something durable and connected. Your code is backed up off your laptop, others can review and contribute, and deploy services watch your repository and redeploy on every change.

- A safe, off-machine backup of your whole project history.
- Collaboration: issues, reviews and pull requests for working with others.
- Deployment: hosts like Vercel deploy automatically when you push.

#### Public vs private repositories

A repository (or "repo") is one project on GitHub. It can be public, meaning anyone can see the code, or private, meaning only people you invite can. For a business project, default to private so your code and intellectual property stay yours. You can always make it public later if you choose.

#### Pushing your code

After you commit locally with Git, you "push" those commits to GitHub to upload them. The first time, you connect your local project to a repo; after that, pushing is one command.
The reverse direction is called a "pull", which downloads changes from GitHub back to your computer, and matters once you work across two machines or with other people.

```bash
git push
```

Uploads your local commits to the connected GitHub repository.

#### Common beginner confusions

The biggest one is the fear that pushing exposes your secrets. It can, if you are careless, which is exactly why secrets live in a .env file that Git ignores. Push your code, never your keys. People also mix up GitHub the website with Git the tool: you can have a perfectly good Git project that never touches GitHub, and GitHub simply adds a safe online home plus collaboration on top. Finally, you do not need to understand every button on a repository page to be productive. Connecting your project, committing and pushing covers almost everything a beginner does for weeks.

### JSON, YAML and Markdown Explained

- Canonical URL: https://agenticschool.dev/fundamentals/json-yaml-markdown

JSON, YAML and Markdown are three plain-text formats you will run into constantly when building anything. JSON and YAML both store structured data like settings and lists, while Markdown is for writing formatted text such as documentation and notes. None of them are programming languages; they are just agreed ways to write information so that both people and programs can read it. Once you can recognise the three, most config files stop looking scary.

#### JSON: data for machines

JSON (JavaScript Object Notation) stores data as key and value pairs, with curly braces and quotes. It is everywhere: API responses, package files, app settings. It is strict, so a single missing comma or quote breaks it.

```json
{
  "name": "my-app",
  "version": "1.0.0",
  "private": true
}
```

Keys and string values are always in double quotes.

#### YAML: data for humans

YAML stores the same kind of data as JSON but uses indentation instead of braces, which makes it easier to read and write by hand.
It is common in workflow and deployment config. The catch is that spacing matters: YAML uses spaces for indentation, never tabs.

```yaml
name: my-app
version: 1.0.0
private: true
```

Same data as the JSON above, indentation instead of braces.

#### Markdown: formatted writing

Markdown is for text, not data. You add simple symbols to plain text to mark headings, bold, lists and links, and it renders as nicely formatted content. This page, most READMEs, and AI agent instruction files like CLAUDE.md are all Markdown.

```markdown
# A heading

Some **bold** text and a list:

- first item
- second item
```

A "#" makes a heading; "**text**" makes it bold; "-" makes a list.

#### How to tell them apart

A quick rule of thumb settles most confusion. If you see curly braces and lots of quotes, it is JSON. If you see clean indentation with colons and no braces, it is YAML. If you see "#" headings and "-" bullets meant to be read by a human, it is Markdown. You do not have to write any of them perfectly by hand, because your AI agent generates and edits these files for you. What helps is recognising which is which, so when a tool asks for "a YAML config" or "a JSON file" you know what it expects and can spot an obvious mistake at a glance.

### VS Code Setup for Beginners

- Canonical URL: https://agenticschool.dev/fundamentals/vs-code-setup

VS Code (Visual Studio Code) is a free, widely used code editor where you write, read and organise the files of your project. It is essentially a powerful text editor built for code, with a file explorer, a built-in terminal and an extension marketplace. You do not strictly need it to build with an AI agent, but it gives you a comfortable place to see what the agent is doing and to make small edits yourself.

#### Installing VS Code

Download VS Code from its official site, code.visualstudio.com, and run the installer for your operating system. It is free and made by Microsoft.
Once installed, you open a project by choosing "Open Folder" and pointing it at your project folder.

#### The built-in terminal

One of the most useful features for beginners is the integrated terminal, so you do not have to switch windows. Open it from the menu (Terminal, then New Terminal) and you get a terminal already pointed at your project folder. This is where you run commands like "npm run dev".

#### A few extensions worth adding

Extensions add features to VS Code. You do not need many to start, and you can always add more later.

- Prettier: formats your code neatly and consistently on save.
- ESLint: highlights likely mistakes as you go.
- A language pack if you prefer the menus in your own language.

#### Common beginner confusions

VS Code can look overwhelming because of all the panels, but you only need three at first: the file tree on the left to see your project, the editor in the middle to read and change files, and the terminal at the bottom to run commands. Everything else you can ignore until you want it. A common worry is whether opening the wrong file or panel breaks anything; it does not, VS Code only edits files when you actually type and save. Two small comfort settings help a lot early on: turn on word wrap so long lines stay visible, and enable format on save so Prettier tidies your code automatically every time you hit save.

### What Is a .env File?

- Canonical URL: https://agenticschool.dev/fundamentals/what-is-an-env-file

A .env file is a plain text file that stores secrets and settings, like API keys and passwords, separately from your code so they never get published by accident. The name is short for "environment". Your app reads values from it at runtime, but the file itself is kept private and is never uploaded to GitHub. Getting this one habit right is the single best thing a beginner can do to avoid leaking a key.

#### What goes in a .env file

A .env file is a list of name and value pairs, one per line.
Anything sensitive or environment-specific belongs here rather than hard-coded in your source files.

```bash
OPENAI_API_KEY=sk-your-secret-key
DATABASE_URL=postgres://user:pass@host/db
SITE_URL=http://localhost:3000
```

Each line is NAME=value. No quotes are needed for simple values.

#### The golden rule: keep it out of Git

The whole point of a .env file is that it stays private. You make sure Git never tracks it by listing it in a .gitignore file. The rule is absolute: secrets go in .env, .env goes in .gitignore, and a key never appears in code that gets committed.

```bash
# .gitignore
.env
.env.local
```

Listing .env here tells Git to ignore it so it is never uploaded.

#### In production

On a real deployment you do not upload your .env file either. Instead you set the same names and values as "environment variables" in your hosting dashboard, for example in Vercel. That way production has its own separate keys, and a mistake in development can never touch live data.

#### Common beginner confusions

A frequent mix-up is between the secret value and its name. Your code refers to the name, like OPENAI_API_KEY, and the real secret only lives in the .env file and the host dashboard, never in the committed code. Another is the difference between a .env file (for secrets and settings) and .gitignore (the list of things Git should never track); they work together but do different jobs. If you ever do accidentally commit a key, do not just delete the line and move on. Treat the key as compromised and rotate it: generate a new one in the provider dashboard and revoke the old one, so it becomes useless to anyone who saw it. Getting into this habit early means a slip is a minor annoyance, not a disaster.

### npm vs Bun: Package Managers Explained

- Canonical URL: https://agenticschool.dev/fundamentals/npm-vs-bun

npm and Bun are package managers: tools that download and install the libraries (called packages) your project depends on.
npm comes bundled with Node.js and is the long-standing default, while Bun is a newer, much faster alternative that can also run your code. Both read the same package.json file that lists your dependencies, so for a beginner the practical difference is mostly speed and which commands you type.

#### What a package manager does

Modern apps are built from hundreds of small reusable libraries rather than written from scratch. A package manager reads your project's list of dependencies, downloads them, and keeps their versions consistent so the same project works the same way on any machine.

#### The same commands, different tool

The everyday commands map almost one to one between npm and Bun, which makes switching easy.

```bash
npm install   # npm: install all dependencies
npm run dev   # npm: run the dev script
bun install   # bun: same thing, faster
bun run dev   # bun: run the dev script
```

Notice how similar they are. Bun is generally faster.

#### Which should a beginner use

If a tutorial uses npm, use npm; if it uses Bun, use Bun. They are interchangeable enough that you can follow either. npm is guaranteed to be present because it ships with Node.js. Bun is worth installing once you want faster installs, but it is an extra tool to set up. Pick the one your project or course already uses and stay consistent.

#### Common beginner confusions

The word "package" sounds technical but just means a reusable piece of code someone else wrote and published, so you do not have to. When you run install, the manager downloads all the packages your project lists into a folder called node_modules. That folder can get large, which is normal and harmless, and it is one of the things you keep out of Git because it can be rebuilt anytime from your dependency list. People also worry about the lock file (package-lock.json or bun.lock) that appears after installing. You do not edit it by hand; it simply pins exact versions so your project installs identically everywhere.
Commit it, and let the package manager update it for you.

### What Is an API? A Plain-Language Guide

- Canonical URL: https://agenticschool.dev/fundamentals/what-is-an-api

An API (Application Programming Interface) is a way for two programs to talk to each other and exchange data. Instead of a human clicking around a website, one program sends a request to another program and gets a structured response back. When your app shows the weather, sends an email, or asks an AI model a question, it is calling an API behind the scenes. APIs are the plumbing that lets the software you build connect to the rest of the world.

#### The restaurant analogy

Think of an API like a waiter. You do not walk into the kitchen and cook; you give the waiter your order, the waiter takes it to the kitchen, and brings back your food. The API is the waiter: you send a request describing what you want, and you get back a response, without ever needing to know how the kitchen works inside.

#### Requests and responses

A request goes to a URL (the endpoint), often with some data attached. The response usually comes back as JSON, a structured data format. Most of the time your code or your AI agent handles the details, but the shape is always the same: you ask, you receive a structured answer.

```json
{
  "city": "Zurich",
  "temperature": 18,
  "condition": "cloudy"
}
```

A typical API response: structured data, ready for a program to use.

#### API keys and why they matter

Many APIs require an API key, a secret string that identifies you and often controls billing. Because a key can cost you money if leaked, it belongs in a .env file and never in code that gets committed. This is the same secrets discipline that protects every credential in your project.

#### Common beginner confusions

A few things trip people up. An API is not a website you visit in a browser; it is a service your program talks to, and visiting an API URL directly often just shows raw data or an error, which is normal.
"Rate limits" are another surprise: many APIs cap how many requests you can make in a window, so if calls suddenly fail, you may simply be going too fast. And different APIs use slightly different rules for authentication and request shape, which is why reading the specific API documentation matters. The good news is your AI agent reads those docs and writes the request code for you, so you can use a new API confidently without memorising its details first. ### What Is a Database? - Canonical URL: https://agenticschool.dev/fundamentals/what-is-a-database A database is an organised place to store your app data so you can reliably save it, find it, change it and delete it. When a user signs up, posts a comment, or makes a purchase, that information has to live somewhere that survives after the page reloads, and that somewhere is a database. Unlike a single spreadsheet, a database is built to stay fast and consistent even with millions of records and many users at once. #### Tables, rows and columns The most common way to picture a database is like a set of spreadsheets. Each table holds one kind of thing (users, orders, comments). Each row is one record, and each column is a field on that record, like name or email. The database keeps these organised and lets you query them quickly. #### SQL vs NoSQL There are two broad families. SQL databases store data in strict tables with defined columns and are great when your data has a clear shape. NoSQL databases are more flexible about structure and are handy when your data varies. For most apps either works; the choice matters less than getting the basics of saving and reading data right. #### Soft deletes: a habit worth knowing In real products you usually do not erase data permanently the moment a user deletes something. Instead you mark it as deleted but keep the row, which is called a soft delete. That way data can be recovered and your history stays intact, which protects you from costly mistakes. 
#### Common beginner confusions

A database is not the same as your code, and it is not the same as a backup. Your code reads from and writes to the database, but the data lives separately and survives even when you redeploy your app. People also assume they must run and maintain a database server themselves, which used to be true and is now mostly optional: modern platforms host the database for you, handle backups, and let your AI agent define the structure in code. Finally, you do not query a database by clicking around. You ask it questions with queries, and most modern tools generate those queries for you, so you can model and use real data long before you learn the query language yourself.

### What Is DNS? The Internet Phone Book

- Canonical URL: https://agenticschool.dev/fundamentals/what-is-dns

DNS (Domain Name System) is the internet phone book: it translates a human-friendly domain name like yoursite.com into the numerical address of the server that should answer for it. Computers find each other using IP addresses, which are hard to remember, so DNS lets you type a name instead. Every time you connect a domain to a website, you are adding DNS records that point your name at the right server.

#### How a domain finds your site

When someone types your domain, their browser asks DNS "what address answers for this name?" DNS replies with the server address, and the browser connects there. This lookup happens in milliseconds and is why you never have to memorise a string of numbers to visit a website.

#### The records you will actually meet

To connect a domain to a host you add a couple of records in your DNS provider. You rarely need more than these two as a beginner.

- A record: points your domain directly at a server IP address.
- CNAME record: points your domain at another domain name, common when connecting to a host.
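
How the two record types fit together can be sketched with a toy lookup. This is an in-memory model for illustration, not a real DNS client: the `DnsRecord` type, the `resolveName` function and the records themselves are assumptions, and 203.0.113.10 comes from an address range reserved for documentation.

```typescript
// A tiny in-memory model of the two record types, not a real DNS client.
type DnsRecord =
  | { type: "A"; name: string; ip: string }
  | { type: "CNAME"; name: string; target: string };

// Placeholder records: 203.0.113.10 is from a range reserved for documentation.
const zoneRecords: DnsRecord[] = [
  { type: "A", name: "yoursite.com", ip: "203.0.113.10" },
  { type: "CNAME", name: "www.yoursite.com", target: "yoursite.com" },
];

// Follow CNAME records until an A record gives an IP address.
function resolveName(name: string, zone: DnsRecord[], maxHops = 5): string | null {
  let current = name;
  for (let hop = 0; hop <= maxHops; hop++) {
    const record = zone.find((r) => r.name === current);
    if (!record) return null; // no record for this name
    if (record.type === "A") return record.ip; // an A record ends the lookup
    current = record.target; // CNAME: look up the target name next
  }
  return null; // too many hops, probably a loop
}

const apexIp = resolveName("yoursite.com", zoneRecords);
const wwwIp = resolveName("www.yoursite.com", zoneRecords);
```

Both lookups end at the same IP address: the A record answers directly, while the CNAME first hands the question on to another name.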
#### Propagation and HTTPS After you add records they need to "propagate", which just means spread across the internet, and can take from minutes to a day. Once your domain points at a modern host like Vercel, it issues a free HTTPS certificate automatically so your site loads securely. Many builders manage DNS through Cloudflare for free HTTPS, a faster global network and basic protection. #### Common beginner confusions The most common one is impatience. You add the records correctly, the site does not load instantly, and you assume something is broken when propagation just needs time. A second is confusing where you bought the domain with where its DNS is managed; you can buy a domain in one place and point its DNS somewhere else, which is exactly what happens when people move DNS to Cloudflare. A third is forgetting that DNS only points the name at a server; it does not host your site. You still need a host like Vercel running your actual website for the domain to show anything. Your host always tells you the exact records to add, so the safest approach is to copy them precisely and then wait. ### What Is OAuth? Login with Google Explained - Canonical URL: https://agenticschool.dev/fundamentals/what-is-oauth OAuth is the standard that lets a user log into your app using an existing account, like Google or GitHub, without ever sharing their password with you. When you click "Continue with Google", OAuth is what happens behind the scenes: Google confirms who the user is and tells your app, so you never see or store their password. It is both more convenient for users and safer for you, because you hold fewer secrets. #### How the flow works You send the user to the provider (say Google) to sign in. They approve sharing some basic information with your app. The provider then sends the user back to your app along with proof of who they are. Your app trusts that proof instead of managing a password itself. 
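
The first step of that flow, sending the user to the provider, comes down to building a URL. The sketch below assumes Google's OAuth 2.0 authorization endpoint and the standard parameter names; the client ID, redirect URI and state value are placeholders, and in practice an authentication service builds this for you.

```typescript
// Settings for the redirect; every value passed in below is a placeholder.
interface AuthRequest {
  clientId: string
  redirectUri: string
  scopes: string[]
  state: string // random value you check again when the user comes back
}

// Google's OAuth 2.0 authorization endpoint.
const GOOGLE_AUTH_ENDPOINT = "https://accounts.google.com/o/oauth2/v2/auth"

// Build the URL the user is sent to in order to sign in and approve access.
function buildAuthUrl(request: AuthRequest): string {
  const params: Array<[string, string]> = [
    ["client_id", request.clientId],
    ["redirect_uri", request.redirectUri],
    ["response_type", "code"], // ask for a one-time code, never a password
    ["scope", request.scopes.join(" ")],
    ["state", request.state],
  ]
  const query = params
    .map(([key, value]) => `${key}=${encodeURIComponent(value)}`)
    .join("&")
  return `${GOOGLE_AUTH_ENDPOINT}?${query}`
}

const loginUrl = buildAuthUrl({
  clientId: "your-client-id",
  redirectUri: "https://yourapp.example/callback",
  scopes: ["openid", "email"],
  state: "random-state-123",
})
```

After the user approves, the provider redirects back to the redirect URI with a short-lived code and the same `state` value. Your app or auth service checks that value and exchanges the code for proof of identity.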
#### Why it is safer Storing passwords correctly is genuinely hard and dangerous to get wrong. OAuth means you never store passwords at all, so a whole category of security risk disappears. The provider handles the difficult parts, including things like two-factor authentication, on your behalf. #### You usually do not build it yourself OAuth has many small, security-sensitive details, so almost nobody implements it by hand. Instead you use an authentication service like Clerk that wraps it in a few components. You enable "Login with Google", and the service handles the flow, leaving you to focus on your actual product. #### Common beginner confusions People often blur three related words. Authentication is proving who you are (logging in), authorisation is what you are allowed to do once in, and OAuth is the standard that lets a provider handle the proving part for you. A second confusion is thinking OAuth means you cannot also offer email-and-password login; you can, and most apps offer both side by side. A third is the dev-to-production gap: social login usually works with test credentials while you build, then needs a careful swap to real provider settings before launch. Because an auth service manages all of this, your job is mostly configuration, not cryptography, which is exactly why it is the recommended path. ### What Are Tokens in AI? - Canonical URL: https://agenticschool.dev/fundamentals/what-are-tokens In AI, a token is a small chunk of text, roughly four characters or three quarters of an English word, that a language model reads and writes in. Models do not see whole words the way you do; they break text into tokens, and both the price you pay and the amount a model can handle at once are measured in tokens, not words. Understanding tokens is the key to controlling both the cost and the quality of anything you build with AI. 
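
You can turn that rule of thumb into a quick estimator. This is only a sketch based on the approximations above (about four characters, or three quarters of a word, per token); a real tokenizer will give different exact counts, and the function names are made up for illustration.

```typescript
// Rules of thumb from the text: ~4 characters or ~0.75 English words per token.
const CHARS_PER_TOKEN = 4
const WORDS_PER_TOKEN = 0.75

// Rough estimate from raw text length.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN)
}

// Rough estimate when you only know the word count.
function estimateTokensFromWords(wordCount: number): number {
  return Math.ceil(wordCount / WORDS_PER_TOKEN)
}

const sample = "Tokens are small chunks of text." // 32 characters
const sampleEstimate = estimateTokens(sample) // 32 / 4 = 8
const essayEstimate = estimateTokensFromWords(750) // 750 / 0.75 = 1000
```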
#### Why tokens, not words Models process text as tokens because it lets them handle any language, code and odd spellings consistently. A short common word might be one token; a long or unusual word can be several. Code and non-English text usually cost more tokens than the same idea in plain English, which directly affects your bill. #### Tokens and your bill AI pricing is quoted per million tokens and split into input (what you send) and output (what the model writes back), with output usually several times more expensive. On a workflow that runs thousands of times, trimming a bloated prompt can cut your cost dramatically. Sending less but more relevant text is almost always the right move. #### A different "token" in security Watch out for a naming clash. In AI, a token is a chunk of text. In login and security, a "token" means something completely different: a secret string that proves who you are, like an access token. They share a word but are unrelated concepts, so read the context to know which one is meant. #### Tokens and the context window Tokens also explain a limit you will hit called the context window: the maximum number of tokens a model can consider at once, including your instructions, any files you paste, the conversation so far, and the answer it writes. When that fills up, the model effectively forgets the oldest parts. This is why long, sprawling chats start losing track of things you said earlier, and why starting a fresh conversation often gives better answers than piling onto an old one. The practical takeaway is the same as for cost: send less but more relevant text. Tokens are the single unit that ties together what a model can hold, how well it answers, and what you pay, so a rough feel for them makes you better at everything you build with AI. ### Terminal vs Shell: What Is the Difference? 
- Canonical URL: https://agenticschool.dev/fundamentals/terminal-vs-shell The terminal is the window you type into, and the shell is the program running inside that window that actually understands and runs your commands. People use the words interchangeably, and for everyday work that is fine, but they are two different things. The terminal just shows text and takes your keystrokes; the shell is the interpreter, like bash, zsh or PowerShell, that turns "ls" into a real action. #### A simple way to picture it Think of the terminal as a TV screen and the shell as the channel playing on it. The screen (terminal) displays whatever the channel (shell) sends. You can run different shells inside the same terminal, the same way you can switch channels on one TV. #### Common shells you will see You do not have to choose a shell to get started; your system comes with one. But the names show up in instructions, so it helps to recognise them. - bash and zsh: the common shells on macOS and Linux. - PowerShell: the modern default shell on Windows. - Each has small syntax differences, which is why a command can work in one and not another. #### Why the difference occasionally matters Most of the time you can ignore the distinction. It matters when a command from a tutorial fails because you are using a different shell than the author. For example, the way you set an environment variable differs between bash and PowerShell. When that happens, telling your AI agent which shell you are on usually fixes it instantly. #### Common beginner confusions Because the words get used loosely, people sometimes think installing a new shell means installing a new terminal, or the other way around. They are separate: you can keep your terminal and switch the shell inside it, or keep your shell and use a different terminal app. Another source of confusion is copying a command that uses a feature your shell does not have and seeing a cryptic error. 
That is not your computer being broken; it is a shell mismatch, and the fix is usually a small tweak to the command. The single most useful move when something fails is to mention your operating system and shell to your AI agent, which lets it hand you the exact version of the command that works on your setup. ### Localhost and Dev Servers Explained - Canonical URL: https://agenticschool.dev/fundamentals/localhost-and-dev-servers Localhost means "this computer", and a development server (dev server) is a program that runs your website locally so only you can see it while you build. When you run a command like "npm run dev", it starts a dev server and prints an address such as localhost:3000. Opening that address shows your site, but it lives only on your machine and nobody else on the internet can reach it. That is exactly what you want before you are ready to go public. #### What localhost and the port mean A localhost address looks like localhost:3000. "localhost" always means your own computer, and the number after the colon (the port) is just which door the dev server is listening on. Different tools use different ports, so seeing 3000, 5173 or similar is normal and nothing to worry about. #### Why the dev server is special A dev server does more than show your site. It watches your files and refreshes the page the instant you save a change, which is what makes web building feel fast. Because it keeps running and watching, the command does not "finish" and return you to a prompt; that is normal, not a hang. #### Local is not the same as live A site on localhost is not on the internet. Sharing the localhost link with a friend will not work for them, because the address only means "this computer" on each machine. To put your site online you deploy it to a host like Vercel, which gives you a real public URL anyone can open. 
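
If it helps to see the "host plus port" idea as code, here is a small sketch that splits an address and flags whether it only means "this computer". The `parseDevAddress` helper and the list of local hosts are assumptions for illustration, not part of any dev server.

```typescript
// Hosts that always mean "this computer".
const LOCAL_HOSTS = ["localhost", "127.0.0.1"]

interface DevAddress {
  host: string
  port: number
  isLocal: boolean // true when only this machine can reach it
}

// Split an address like "localhost:3000" into its host and port.
function parseDevAddress(address: string): DevAddress | null {
  const separator = address.lastIndexOf(":")
  if (separator <= 0) return null // missing host or missing port
  const host = address.slice(0, separator)
  const port = Number(address.slice(separator + 1))
  // Ports are whole numbers from 1 to 65535.
  if (Math.floor(port) !== port || port < 1 || port > 65535) return null
  return { host, port, isLocal: LOCAL_HOSTS.indexOf(host) !== -1 }
}

const devServer = parseDevAddress("localhost:3000") // local, port 3000
const liveSite = parseDevAddress("yoursite.example:443") // not local
const noPort = parseDevAddress("localhost") // null: no port given
```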
#### Common beginner confusions A classic one is the "port already in use" message, which simply means a dev server is still running from before. The fix is to stop the old one (Ctrl and C in its terminal) or let the tool pick the next free port. Another is expecting changes to appear without a running dev server; if you close the terminal, the local site stops, which is normal. People also confuse the dev server with the production build. The dev server is optimised for fast feedback while you work, not for real visitors, so a live site is built and deployed separately. None of this needs memorising, but recognising "local" versus "live" early saves a lot of head-scratching the first time something does not behave the way you expect. ### Repos and Version Control Explained - Canonical URL: https://agenticschool.dev/fundamentals/repos-and-version-control A repository (or "repo") is your project folder once it is being tracked by a version control tool, and version control is the system that records every change so you can always go back. Put simply, version control is a labelled, unlimited undo history for your whole project, and the repo is the project plus that history. This is the foundation everything else builds on: backups, collaboration and deployment all start with a repo under version control. #### What version control gives you Version control turns a folder of files into something you can experiment with fearlessly. Every meaningful change is saved as a snapshot you can return to, so a bad edit is never permanent. It also records who changed what and why, which becomes essential the moment more than one person touches the project. #### Local repo vs remote repo Your repo lives on your computer (the local repo) and usually has a copy online (the remote repo), most often on GitHub. You work locally and "push" your saved changes up to the remote to back them up and share them. The two stay in sync as you push and pull changes. 
#### Why this matters for shipping A repo is not just for safety, it is the thing hosting platforms connect to. When your project is a repo on GitHub, a host like Vercel can watch it and redeploy your live site automatically every time you push. So a clean repo under version control is the on-ramp to actually shipping your work. #### Common beginner confusions A repo can feel like a mysterious special folder, but it is really just your normal project folder with a hidden record of changes attached. You still edit files the same way. People also assume a repo automatically means everything is public or backed up; neither is true until you push to a remote and choose its visibility, and a brand new business repo should default to private. One more point worth internalising early: the history is only as useful as your commits. If you commit rarely with vague messages, the safety net is weak; if you commit often with clear messages, you can step back to any moment with confidence. That single habit is what turns version control from a chore into a genuine superpower. ### What Is TypeScript? - Canonical URL: https://agenticschool.dev/fundamentals/what-is-typescript TypeScript is JavaScript with an added layer of type checking that catches mistakes before you ever run your code. It is the same language you already write, plus labels that say what kind of value each thing should be, like text, a number or a list. When you mix them up by accident, TypeScript warns you immediately instead of letting the bug surface later for a user. Most modern projects, including this one, use it for exactly that reason. #### Types are guardrails A type is just a promise about what a value is. If a function expects a number and you hand it text, TypeScript flags it as you type, in your editor, before anything runs. That turns a class of frustrating runtime bugs into clear messages you fix in seconds. 
```typescript
function double(n: number) {
  return n * 2
}

double("5") // TypeScript error: a string is not a number
```

The ": number" label is the type. TypeScript catches the wrong input.

#### It compiles to JavaScript

Browsers and Node.js do not run TypeScript directly. A build step turns your TypeScript into plain JavaScript, removing the type labels. So TypeScript is purely a tool for you while writing; what actually ships is ordinary JavaScript.

#### Why it helps when building with AI

When an AI agent edits your code, TypeScript acts as an automatic reviewer. If a change breaks an expected type, the type check fails loudly, which catches mistakes before they reach your users. That fast feedback loop is one reason agent-built projects lean on TypeScript so heavily.

#### Common beginner confusions

The red squiggly lines under your code can feel like errors that stop everything, but most of the time they are warnings pointing at a real problem you should fix, not a crash. A type error means "this value is not the shape I expected", and the message usually tells you exactly what went wrong. Beginners also worry they must add types everywhere by hand; in practice TypeScript figures out most types on its own, and your agent adds the rest. Finally, do not confuse the type check failing with your app being broken at runtime. The whole point is that TypeScript catches the issue before you run anything, so a failing type check is the system doing its job, not a sign that something is on fire.

### What Is a Framework?

- Canonical URL: https://agenticschool.dev/fundamentals/what-is-a-framework

A framework is a ready-made structure for building apps that handles the common, hard parts for you so you do not start from a blank page every time. Instead of wiring up routing, rendering and project structure by hand, a framework gives you sensible defaults and conventions, and you fill in what makes your app unique.
Nearly every modern website is built on one because it turns months of plumbing into something you can stand up in minutes. #### A house with the walls already up Think of a framework like a house where the foundation, walls and plumbing are already in place. You still design the rooms and decide how to live in it, but you are not pouring concrete. The framework makes the decisions that are the same for almost every app, leaving you free to focus on the parts that are actually yours. #### Frameworks you will hear about You do not need to master these, just to recognise the names when an agent or tutorial mentions one. - Next.js: a popular React framework for full web apps. - Astro: great for fast, content-heavy and marketing sites. - TanStack Start: a modern React framework, used by this very project. #### Why it matters for building with AI A framework gives an AI agent a known structure to work inside, with clear conventions for where things go. That makes the agent more reliable, because it is filling in a familiar shape rather than inventing one. Picking a mainstream framework is one of the easiest ways to get better results from agent-built projects. #### Common beginner confusions It is easy to think you must choose the "best" framework before you can start, and to freeze on that decision. In reality any mainstream framework will carry you a long way, and switching later is rare and rarely fatal, so the cost of picking imperfectly is low. People also blur frameworks with languages: a framework is built on top of a language, so React frameworks like Next.js or TanStack Start are still JavaScript and TypeScript underneath. And a framework is not a no-code tool that builds your app for you; it is a structured starting point you build inside. The leverage comes from pairing a familiar framework with an AI agent, which knows its conventions and fills in the shape far faster than building from scratch. 

---

## Glossary

### AI Agent

- Canonical URL: https://agenticschool.dev/glossary/ai-agent

An AI agent is a software system that uses a large language model to pursue a goal by deciding what to do next, taking an action, observing the result, and repeating that loop until the task is done. The key difference from a normal chatbot is the loop and the tools: instead of just replying with text, an agent can call tools (search the web, run code, query an API, edit a file), read what comes back, and adjust its next step. So an AI agent is the model plus the ability to act and the autonomy to choose its own steps toward an outcome you set.

#### Agent vs chatbot vs assistant

A chatbot answers one message at a time and does not act on the world. An AI agent is given a goal and works toward it across multiple steps, choosing actions on its own. The line is autonomy plus tool use, not the underlying model.

- Chatbot: you send a message, it replies with text. No actions, no loop.
- Assistant: helps with a task but you drive each step.
- AI agent: you set a goal, it plans, calls tools, checks results and iterates until done.

#### The agent loop

Almost every AI agent runs the same loop: the model reads the goal and current state, decides on an action, the system executes it (a tool call), the result is fed back in, and the loop repeats. This perceive, decide, act, observe cycle is what lets an agent handle work that a single prompt cannot, like fixing a failing test or shipping a small feature end to end.

#### A concrete example

A coding agent like Claude Code is a clear example. You ask it to "add a contact form and make the tests pass". It reads your files (a tool), writes new code (a tool), runs the test suite (a tool), sees a failure, edits the code again, and reruns the tests until they are green. You set the goal once; the agent chose and executed every step in between. That is the agentic difference in practice.
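
The agent loop described in this entry can be sketched in a few lines. This is a minimal illustration under loud assumptions: the "model" is a scripted function standing in for an LLM, the tools are fakes that imitate a failing and then passing test run, and none of the names come from a real agent framework.

```typescript
// What the model can decide each turn: call a tool, or finish with an answer.
type Decision =
  | { kind: "tool"; tool: string; input: string }
  | { kind: "done"; answer: string }

type Tool = (input: string) => string
type Model = (goal: string, observations: string[]) => Decision

// The loop: decide, act, observe, repeat, with a step limit as a stop condition.
function runAgent(goal: string, model: Model, tools: Record<string, Tool>, maxSteps = 10): string {
  const observations: string[] = []
  for (let step = 0; step < maxSteps; step++) {
    const decision = model(goal, observations) // decide
    if (decision.kind === "done") return decision.answer
    const tool = tools[decision.tool]
    const result =
      tool === undefined ? `unknown tool: ${decision.tool}` : tool(decision.input) // act
    observations.push(result) // observe, then go around again
  }
  return "stopped: step limit reached"
}

// Stand-in for an LLM: a scripted function, so the sketch runs without an API.
const scriptedModel: Model = (_goal, observations) => {
  const last = observations[observations.length - 1]
  if (last === undefined) return { kind: "tool", tool: "runTests", input: "" }
  if (last === "1 test failing") return { kind: "tool", tool: "editCode", input: "fix form" }
  if (last === "code edited") return { kind: "tool", tool: "runTests", input: "" }
  return { kind: "done", answer: "tests are green" }
}

// Fake tools: the first test run fails, later runs pass.
let testRuns = 0
const fakeTools: Record<string, Tool> = {
  runTests: () => (++testRuns === 1 ? "1 test failing" : "all tests passing"),
  editCode: () => "code edited",
}

const outcome = runAgent("make the tests pass", scriptedModel, fakeTools)
```

The goal is passed in once; every step after that is chosen inside the loop, and the `maxSteps` limit is the simplest possible guardrail against a run that never finishes.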
### Agentic AI - Canonical URL: https://agenticschool.dev/glossary/agentic-ai Agentic AI is artificial intelligence that acts autonomously toward a goal, planning its own steps and using tools to get there, rather than only producing a single response on request. Where generative AI gives you an output (a paragraph, an image, a block of code) for each prompt, agentic AI takes a goal and drives a multi-step process to achieve it, deciding what to do, doing it, checking the result and continuing. In short, generative AI creates content; agentic AI gets things done. It is the broader capability that individual AI agents put into practice. #### Agentic AI vs generative AI They are not opposites; agentic AI is usually built on top of generative models. The distinction is what the system does with the model. Generative AI responds. Agentic AI acts: it sets sub-goals, calls tools, and adapts based on what it observes, all to reach an outcome you defined once. - Generative AI: prompt in, content out. One step, no actions on the world. - Agentic AI: goal in, outcome out. Many steps, real actions, self-correction. - An "AI agent" is a single concrete system that exhibits agentic behaviour. #### What makes a system agentic Three properties show up across agentic systems: autonomy (it chooses its own next step), tool use (it can act, not just talk), and a feedback loop (it observes results and adjusts). The more of a task a system can carry without you steering each step, the more agentic it is. People often describe this on a spectrum from a simple assistant up to fully autonomous multi-agent systems. #### Where you already see it Coding agents that take a ticket and open a working pull request, research agents that gather and synthesise sources, and automation flows that watch for an event and complete a multi-step task are all agentic AI. 
For builders, the practical takeaway is that agentic AI shifts your job from writing every instruction to setting clear goals, providing the right tools and context, and keeping verification in the loop. ### Vibe Coding - Canonical URL: https://agenticschool.dev/glossary/vibe-coding Vibe coding is a way of building software where you describe what you want in plain natural language and let an AI coding tool write the code, so you steer by the result rather than by reading and writing every line yourself. The term was coined by Andrej Karpathy in February 2025, who described it as "I just see stuff, say stuff, run stuff, and copy-paste stuff, and it mostly works", and it was named Collins Dictionary Word of the Year for 2025. In short, vibe coding means you give the vibe and the goal, the AI produces the implementation, and you check whether it does what you wanted. #### How vibe coding works You prompt an AI tool in everyday language ("add a dark mode toggle", "fix this error"), it generates or edits the code, you run it, and you keep going by describing the next change. You lean on the output and the running app to judge progress instead of inspecting the code closely. This makes building fast and accessible, but it also means you are trusting code you may not fully understand. - You describe the goal in plain language, not in code. - The AI writes, edits and often runs the code for you. - You judge success by the result, then prompt the next change. #### Where it shines, and where it bites Vibe coding is great for prototypes, throwaway scripts, learning and getting a first version on screen quickly. The risk shows up when vibe-coded software goes to real users: code you never reviewed can hide security holes, break in edge cases, or become impossible to maintain. The honest practice is to vibe code freely while exploring, then slow down, read the code, add tests and harden it before anything ships. 
#### Vibe coding vs agentic engineering Vibe coding and agentic engineering both use AI to write code, but they are not the same. Vibe coding optimises for speed and feel and accepts not understanding the result. Agentic engineering keeps you in charge: you still drive an AI agent, but with clear goals, tests, reviews and verification so the output is production-grade. Think of vibe coding as the fun on-ramp and agentic engineering as how you turn that into something you can trust in production. ### MCP (Model Context Protocol) - Canonical URL: https://agenticschool.dev/glossary/mcp MCP (Model Context Protocol) is an open standard that lets AI models connect to external tools, data sources and services through one common interface, instead of every app inventing its own integration. It is often described as a USB-C port for AI: any MCP-compatible model can plug into any MCP server and immediately use what it offers. An MCP server exposes three kinds of capability - tools (actions the model can run), resources (read-only data it can fetch) and prompts (reusable templates) - so a model can do work in the real world without custom glue code for each connection. #### Why MCP exists Before MCP, connecting a model to a database, a file system or an API meant writing bespoke integration code for every pairing, which did not scale. MCP standardises that contract once: write an MCP server for a system and any MCP-aware client (Claude Code, an IDE, a chat app) can use it. Introduced by Anthropic in late 2024, it has become the de facto standard, with official SDKs for TypeScript, Python, C#, Java and Swift and hundreds of public servers by 2026. #### Tools, resources and prompts An MCP server can expose any mix of three primitives, each with a standard way to list and use it. - Tools: actions the model can call, like "run a query" or "create a file". This is tool calling over a shared protocol. 
- Resources: read-only data the model can fetch, like a document, a database row or a log.
- Prompts: reusable prompt templates a server offers so clients get a consistent starting point.

#### How a connection works

A host application (the AI app) runs one or more clients, and each client holds a dedicated connection to a single MCP server. When the model needs to act, it asks the client to call a tool on the server; the server does the real work (say, runs a SQL query) and returns a structured result the model can read. The model never touches the database directly, which keeps permissions and security in the server where you control them.

### llms.txt

- Canonical URL: https://agenticschool.dev/glossary/llms-txt

llms.txt is a simple Markdown file you place at the root of your website (at /llms.txt) to give AI systems a clean, curated map of your most important content. Instead of forcing a model to crawl and guess at your HTML, ads and scripts, you hand it a short structured summary plus links to the pages that matter. It was proposed by Jeremy Howard of Answer.AI in September 2024, and it is to AI readers roughly what a sitemap or robots.txt is to search crawlers: a friendly, machine-first front door to your site.

#### What the file looks like

The format is deliberately small. A valid llms.txt starts with a single H1 holding your site or project name (the only strictly required part), followed by a blockquote with a one- or two-sentence summary, then optional sections. Each section is an H2 heading with a Markdown list of links, where every link is a title, a URL and a short description. In practice a good file curates roughly 15 to 60 of your canonical pages rather than listing everything.

```markdown
# Your Site Name

> A one-line summary of what your site is and who it is for.

## Docs

- [Getting started](https://example.com/start): Set up in five minutes.
- [API reference](https://example.com/api): Every endpoint with examples.
```

A minimal, spec-correct llms.txt: H1, blockquote summary, then H2 link sections.

#### llms.txt vs llms-full.txt

There are two related files. llms.txt is the short, curated index of links and descriptions. The optional llms-full.txt concatenates the actual content of those pages into one large document, so a system that wants your whole corpus in a single request can grab it. Many sites publish both: the index for discovery and the full file for deep ingestion.

#### Does it actually help?

Adoption is real but uneven. Developer tools like Cursor and many documentation platforms read llms.txt, and Anthropic documents support for it, but as of 2026 Google and OpenAI have not officially committed to using it. So treat llms.txt as a low-cost, on-brand part of your AEO and GEO strategy rather than a guaranteed ranking lever: it makes your site cheap and unambiguous for AI to understand, and it complements (does not replace) good content and a sitemap.

### Agent Harness

- Canonical URL: https://agenticschool.dev/glossary/agent-harness

An agent harness is the software scaffolding wrapped around an AI model that turns it into a working agent: it runs the loop that calls the model, handles the model's tool calls, manages context and memory, enforces safety, and decides when to stop. The model itself only predicts the next tokens; the harness is everything around it that lets it perceive, act and iterate toward a goal. A practical truth in 2026 is that the harness often matters as much as the model, because two tools running the same model can behave very differently depending on how well their harness is built.

#### What the harness does

The harness is the runtime that orchestrates a whole agent run. It builds the prompt, exposes the available tools, executes the tool calls the model requests, feeds results back, compacts or trims context as it fills up, persists state across turns, and applies guardrails like permission checks and stop conditions.
- Runs the loop: call model, run requested tools, feed results back, repeat. - Manages context: assembles the prompt, compacts history, handles the context window. - Enforces safety: permissions, approvals for risky actions, and when to stop. #### Harness vs scaffolding vs model These terms get blurred, so it helps to separate them. Scaffolding is the setup done before the first prompt (defining tools, system prompt, configuration). The harness is everything that happens after: dispatching tools, compacting context, enforcing rules, persisting state across turns. The model is the reasoning engine inside. Claude Code, Codex CLI, Cursor, Aider and Cline are all examples of agent harnesses, and their patterns are converging. #### Why it matters when choosing a tool Because the harness controls context management, tool access and safety, picking an AI coding tool is largely a choice of harness, not just of model. A strong harness keeps the model on track on long tasks, avoids burning the context window, and stops before it does something destructive. When people say a coding agent "feels smarter", they often mean its harness is better engineered. ### Subagent - Canonical URL: https://agenticschool.dev/glossary/subagent A subagent is a specialised AI agent that a main agent delegates a focused task to, running in its own separate context window with its own system prompt and its own scoped set of tools. Instead of one agent doing everything in a single conversation, the main agent hands off a job (review this code, research this topic, run these tests) to a subagent, which does the work in isolation and returns only a clean summary. This keeps the main conversation uncluttered and lets each subagent be tuned to do one thing well. #### How subagents work When the main agent spawns a subagent, the subagent starts fresh: it sees only its own system prompt and the delegation message, not the entire history of the main session. 
It works with a narrow toolset and permissions, completes its task, and passes back a short result. Because its context is isolated, all the noisy intermediate work (search results, logs, file dumps) stays out of the main agent context window. - Separate context window: the subagent does not inherit the whole main conversation. - Custom system prompt: a short, focused brief tuned to its single job. - Scoped tools and permissions: only what that task needs, often read-only. #### Why subagents help The big win is context hygiene. A side task that would flood your main session with output (reading dozens of files, scanning logs) instead happens in the subagent and returns just the summary, so the main agent stays focused and the context window does not fill up. A focused brief also makes the subagent more reliable on its specialised task than a single agent juggling everything. #### A concrete example In Claude Code you can define a "code reviewer" subagent with a read-only toolset and a short review brief. After the main agent writes a feature, it delegates the review to that subagent, which inspects the diff in its own context and returns a concise list of issues. The main agent then fixes them, never having loaded the full review reasoning into its own window. ### Tool Calling - Canonical URL: https://agenticschool.dev/glossary/tool-calling Tool calling, also known as function calling, is when an AI model produces structured output (usually JSON) that asks your program to run a specific function, instead of just replying with text. The model does not run the function itself; it picks which tool to use, fills in the arguments in a format that matches a schema you defined, and your code executes it and feeds the result back. Tool calling is the core mechanism that turns a chatbot into an agent: it is how a model fetches live data, runs code, queries a database or sends an email. 
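
The mechanism just described (the model emits a structured call, your code checks and runs it) can be sketched as follows. Everything here is an assumption for illustration: the `get_weather` tool, the simplified parameter schema, the call format and the hand-written "model output" are invented, and real provider APIs use richer JSON schemas.

```typescript
// A tool as the model sees it: a name, a description and typed parameters.
interface ToolDefinition {
  name: string
  description: string
  parameters: Record<string, "string" | "number">
  run: (args: Record<string, unknown>) => string
}

// Hypothetical tool; the weather answer is hard-coded for the sketch.
const getWeather: ToolDefinition = {
  name: "get_weather",
  description: "Look up the current weather for a city",
  parameters: { city: "string" },
  run: (args) => `Sunny in ${String(args["city"])}`,
}

// Parse the model's structured call, check it against the schema, then run it.
// Note: JSON.parse throws on malformed input; a real runtime would catch that.
function handleToolCall(rawCall: string, tools: ToolDefinition[]): string {
  const call = JSON.parse(rawCall) as { tool?: unknown; arguments?: unknown }
  const tool = tools.filter((t) => t.name === call.tool)[0]
  if (tool === undefined) return "error: unknown tool"
  const args = (call.arguments ?? {}) as Record<string, unknown>
  for (const key of Object.keys(tool.parameters)) {
    if (typeof args[key] !== tool.parameters[key]) {
      return `error: "${key}" must be a ${tool.parameters[key]}`
    }
  }
  return tool.run(args) // your code, not the model, executes the function
}

// What a model might return instead of free text (written by hand here).
const modelOutput = '{"tool": "get_weather", "arguments": {"city": "Berlin"}}'
const toolResult = handleToolCall(modelOutput, [getWeather])
```

Validating the arguments before running anything is the point of the schema: a call with the wrong shape is rejected with a clear message that can be fed back to the model.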
#### How tool calling works

You give the model a list of tools, each described by a JSON schema: the tool name, what it does, and the parameters it accepts with their types. When the model decides a tool is needed, it returns a structured call with the chosen tool and arguments. Your runtime parses that, executes the real function, and returns the output to the model so it can continue, perhaps calling more tools, until it can answer.

- You define tools as JSON schemas: name, description, typed parameters.
- The model returns a structured call (tool name plus arguments), not free text.
- Your code runs the function and feeds the result back to the model.

#### Why it matters

Tool calling bridges probabilistic reasoning and deterministic execution. The model is good at deciding what to do; your code is good at reliably doing it. By forcing the model to output a structured call that matches a schema, you get dependable, machine-readable actions instead of hoping it formats a request correctly in prose. This is the foundation under AI agents and under standards like MCP, which expose tools over a shared protocol.

#### Tool calling and MCP

Tool calling is the local mechanism: tools defined for one model in one app. MCP (Model Context Protocol) standardises it across the ecosystem, so an MCP server can offer tools that any MCP-aware client can call without custom wiring. Put simply, tool calling is the verb, and MCP is one widely adopted way to make those tools reusable everywhere.

### System Prompt

- Canonical URL: https://agenticschool.dev/glossary/system-prompt

A system prompt is the standing instruction given to an AI model before any user message, telling it who it is, how to behave, what rules to follow and what format to use. Where a user prompt is the specific request you type, the system prompt is the persistent context that frames the entire conversation: the model's role, tone, constraints and any tools or knowledge it should assume.
It is set once by the developer (or by you in a settings file) and applies to every turn until it changes.

#### System prompt vs user prompt

The two work together but do different jobs. The system prompt is the configuration; the user prompt is the task. Models are trained to treat the system prompt as higher-priority, standing guidance, so it is where you put rules that should hold no matter what the user asks.

- System prompt: role, rules, tone, format, constraints. Set once, applies throughout.
- User prompt: the specific question or instruction for this turn.
- The model weighs the system prompt as persistent, higher-priority context.

#### What goes in a good system prompt

A strong system prompt states the role ("you are a senior TypeScript engineer"), the rules ("never use em dashes", "always run the tests"), the output format you expect, and any context the model should treat as given. Keep it clear and specific; vague system prompts produce vague behaviour. In agent tools, files like a project instructions file act as a system prompt that teaches the agent your conventions.

#### System prompts in agents

In an agent harness, the system prompt is part of what the harness assembles every turn, and it counts against the context window, so it should be focused rather than bloated. Subagents take this further: each subagent gets its own short, specialised system prompt tuned to its single task, which is one reason they behave reliably. Because the system prompt is sent on every request, it is also a prime candidate for prompt caching to cut cost.

### Context Window

- Canonical URL: https://agenticschool.dev/glossary/context-window

A context window is the maximum amount of text an AI model can consider at once, measured in tokens. It includes everything in play: the system prompt, the files or data you paste in, the conversation so far, and the answer the model is writing.
Think of it as the model's working memory: anything inside the window can influence the response, and once the window is full the oldest content effectively falls out of view. Knowing the size of a model's context window, and managing what you put in it, is central to getting good, affordable results.

#### Why the context window matters

The context window sets a hard ceiling on how much the model can take into account in one go. If your instructions, code and history exceed it, something has to be dropped or summarised, and the model can lose track of details you gave earlier. This is why long, sprawling chats start forgetting things, and why a fresh, focused conversation often beats piling onto an old one.

- Everything counts: system prompt, pasted files, history and the output share the window.
- When it fills up, the oldest content is dropped or compacted and can be forgotten.
- Bigger is not always better: a stuffed window can still bury the key detail.

#### Sizes and the "lost in the middle" problem

Context windows have grown large, with leading models in 2026 offering hundreds of thousands of tokens and some reaching a million, enough to hold a whole codebase. But more room is not a free lunch: models can pay less attention to information buried in the middle of a long context, an effect often called "lost in the middle". So putting the most important context near the start or end, and keeping it relevant, still beats dumping everything in.

#### Managing the context window

Agent harnesses spend a lot of effort here: they compact older turns into summaries, trim irrelevant content, and offload noisy side work to subagents so the main window stays clean. The practical rule for you is the same as for cost: send less but more relevant text. Good context management is its own discipline, sometimes called context engineering, and it is one of the highest-leverage skills when building with agents.
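A minimal sketch of the simplest trimming strategy: drop the oldest turns until the conversation fits a token budget. The four-characters-per-token estimate is a rough assumption for illustration; real harnesses use the model's tokenizer and usually summarise old turns rather than discarding them. `estimate_tokens` and `fit_to_window` are hypothetical names.

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic (assumption): about 4 characters per token for English text.
    return max(1, len(text) // 4)


def fit_to_window(system_prompt: str, history: list[str], budget: int) -> list[str]:
    """Keep the system prompt plus the newest turns that fit within `budget` tokens."""
    remaining = budget - estimate_tokens(system_prompt)
    kept: list[str] = []
    # Walk from newest to oldest so recent turns win when space runs out.
    for turn in reversed(history):
        cost = estimate_tokens(turn)
        if cost > remaining:
            break
        kept.append(turn)
        remaining -= cost
    # Restore chronological order before handing the window to the model.
    return [system_prompt] + list(reversed(kept))


history = ["a" * 400, "b" * 400, "c" * 400]  # three turns of about 100 tokens each
window = fit_to_window("You are a helpful assistant.", history, budget=250)
print(len(window) - 1, "of", len(history), "turns kept")
```

The system prompt is always kept because it frames every turn; only the conversation history is trimmed, oldest first.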
### Prompt Caching

- Canonical URL: https://agenticschool.dev/glossary/prompt-caching

Prompt caching is a feature that stores the processed state of a repeated part of your prompt so later requests can reuse it instead of paying to process it again. When many requests share the same long prefix - a big system prompt, tool definitions, or a document you keep asking about - caching lets the model skip recomputing that prefix, which cuts both cost and latency. On the Claude API, for example, a cache read costs about a tenth of normal input price, a roughly 90 percent discount on the cached portion.

#### How prompt caching works

You mark a point in your prompt as a cache breakpoint. The provider stores the encoded state of everything up to that point, and the next request that begins with the exact same bytes reads from the cache rather than recomputing. The match must be exact: if even one token in the prefix differs, you get a cache miss and pay the full price. The order matters too, since the cached prefix is built in the order tools, then system prompt, then messages.

- Put the stable, repeated content first (tools, system prompt, long documents).
- Mark a cache breakpoint after it; the prefix up to there gets cached.
- A later request with the identical prefix reads the cache cheaply.

#### What it costs and saves

There is a small premium to write the cache and a large saving to read it. On the Claude API a cache write costs about 1.25x normal input for a 5 minute lifetime (or 2x for a 1 hour lifetime), while a cache read costs about 0.1x input, a roughly 90 percent saving. The cache is ephemeral with a short time-to-live that resets on each read, so a busy conversation keeps its cache warm without paying the write cost again.

#### When to use it

Prompt caching pays off whenever you reuse a large, stable prefix across many calls: a long system prompt, a fixed set of tool definitions, a knowledge document, or a multi-turn chat where the early context stays the same.
It does not help one-off prompts that never repeat. Because the saving applies only to the unchanged prefix, structure your prompts so the constant parts come first and the variable parts come last.

### AI IDE

- Canonical URL: https://agenticschool.dev/glossary/ai-ide

An AI IDE is a code editor (integrated development environment) with an AI coding agent built deeply into it, so the AI can understand your whole project, write and edit code across many files, and carry out tasks without you leaving the editor. It goes beyond simple autocomplete: an AI IDE has project-wide context, a chat or agent panel, and the ability to make multi-file changes you review inline. Cursor and Windsurf are the best-known examples, and many traditional editors now add AI-IDE features through extensions.

#### AI IDE vs autocomplete vs terminal agent

There is a spectrum of AI coding tools. An AI IDE sits in the middle: it is a full editor where the AI sees your project and edits across files, with you reviewing changes visually. That is more than inline autocomplete and different from a terminal-based agent.

- Autocomplete (like Copilot): suggests the next lines as you type.
- AI IDE (like Cursor, Windsurf): a full editor with a project-aware agent that edits across files.
- Terminal agent (like Claude Code, Codex CLI): an agent that lives in the terminal and drives your repo.

#### What makes an editor an AI IDE

The defining traits are project-wide understanding and agentic editing. The AI can index your codebase, answer questions about it, and apply coordinated changes across several files, then show you a diff to accept or reject. You stay in one place to chat, generate, edit and review, which keeps a tight feedback loop. Underneath, an AI IDE is an agent harness with a graphical editor as its front end.

#### How to choose

AI IDEs suit people who like a visual editor and want to see and approve every change in context, which makes them friendly for learning and for frontend work.
Terminal agents suit those who want maximum automation and scripting. Many builders use both: an AI IDE for hands-on editing and a terminal agent for longer, more autonomous tasks. The right pick depends on whether you prefer a visual, review-heavy workflow or a more hands-off one.

### Workflow Automation

- Canonical URL: https://agenticschool.dev/glossary/workflow-automation

Workflow automation is using software to run a multi-step process automatically, so a repetitive task happens on its own instead of someone doing each step by hand. A workflow is just a sequence: a trigger starts it (a new email, a form submission, a schedule), then a series of actions run in order (save data, send a message, update a record). Tools like n8n, Zapier and Make let you build these flows visually, and adding AI turns rigid automations into ones that can read, decide and write in plain language.

#### Trigger, actions, result

Every automation follows the same shape. Something kicks it off, then steps run automatically until the job is done, with no person clicking through each one.

- Trigger: the event that starts the flow (new lead, incoming email, a time of day).
- Actions: the ordered steps that run (filter, transform, call an API, send a notification).
- Result: the outcome that used to be manual now happens every time, consistently.

#### How AI changes it

Classic automation is rule-based: if this, then exactly that. It breaks on anything fuzzy, like understanding a free-text email or summarising a document. Dropping an AI step into a workflow handles the messy parts: it can classify a message, extract the key fields, draft a reply, or decide which branch to take. This is where workflow automation starts to overlap with AI agents, which add their own decision loop on top.

#### Automation vs agents

A workflow automation runs a fixed path you designed; it is predictable and easy to audit.
An AI agent decides its own steps toward a goal, which is more flexible but less predictable. Many real systems blend the two: a deterministic workflow for the reliable plumbing, with an AI or agent step where judgement is needed. For business use, starting with a clear automation and adding intelligence only where it pays off keeps things reliable and measurable.

### GEO (Generative Engine Optimization)

- Canonical URL: https://agenticschool.dev/glossary/geo

GEO (generative engine optimization) is the practice of optimising your content and online presence so that generative AI tools - ChatGPT, Google AI Overviews, Gemini, Perplexity and others - cite, mention or recommend your brand when people ask them questions. Where classic SEO aims to rank a blue link in a search results page, GEO aims to be part of the AI-generated answer itself. As more people get answers from AI instead of clicking through to websites, GEO has become the way to stay visible in that new, answer-first surface.

#### How GEO differs from SEO

SEO and GEO share a foundation (good, trustworthy content) but target different surfaces. SEO optimises to rank links in a traditional search engine. GEO optimises to be included and cited inside an AI-generated response, where there may be no list of links at all, just a synthesised answer that mentions a few sources.

- SEO: rank a clickable link in a search results page.
- GEO: be cited, named or recommended inside an AI-generated answer.
- Both reward clear, accurate, authoritative content; GEO adds machine readability.

#### How GEO and AEO relate

GEO and AEO (answer engine optimization) overlap heavily and are often used interchangeably, but there is a useful distinction. AEO focuses on being returned as the direct, factual answer to a specific question. GEO is broader: it focuses on influencing the narratives and recommendations AI builds around a topic, so your brand is woven into how the AI talks about your space, not just quoted for one fact.
#### What GEO looks like in practice

Practical GEO means writing content an AI can confidently quote: clear definitions up front, structured headings and FAQs, accurate and current facts, and citable claims. It also means strong, trustworthy signals (a real author, consistent mentions across the web) and making your site machine-readable, including structured data and an llms.txt file. The goal is to be the source an AI reaches for, and feels safe citing, when it answers a question in your field.

### AEO (Answer Engine Optimization)

- Canonical URL: https://agenticschool.dev/glossary/aeo

AEO (answer engine optimization) is the practice of optimising your content so that AI answer engines - Google AI Overviews, ChatGPT, Perplexity and similar - return it as the direct, authoritative answer to a question. Instead of trying to win a click on a results page, AEO aims to make your content the response itself, the snippet or citation the answer engine surfaces when someone asks. It is the discipline of becoming a fact-level authority an AI trusts enough to quote.

#### Why AEO matters now

A growing share of questions are answered directly by AI, with no click to a website. If your content is not structured to be picked as the answer, you can be invisible even when you have the best information. AEO closes that gap by making your answers easy to find, quote and trust, so you stay present in the answer-first world rather than only on page one of links.

#### How to optimise for answers

AEO rewards content that reads like a clean answer. The patterns are concrete and repeatable.

- Answer the question directly in the first sentence or two, then expand.
- Use clear question-style headings and an FAQ so each answer is self-contained.
- Add structured data (FAQ, How-To, Article) so machines parse your answers reliably.
- Keep facts accurate, current and citable, with real authorship behind them.

#### AEO vs GEO vs SEO

These three are layers, not rivals.
SEO optimises to rank links in search. AEO optimises to be the direct answer an engine returns. GEO is broader still, optimising to be cited and woven into the narratives generative AI builds around a topic. AEO and GEO overlap so much that many teams treat them as one effort; the simplest framing is that AEO is about being the answer and GEO is about being part of the story.

---

## Guides

### How to Use Claude Code: Complete Beginner Guide (2026)

- Canonical URL: https://agenticschool.dev/guides/how-to-use-claude-code

Claude Code is Anthropic's terminal-based AI coding agent: you run one command in your project, describe what you want in plain language, and the agent reads your files, plans, edits code, runs commands and checks its own work in a loop. This complete beginner guide takes you from nothing installed to a productive first session: how to install Claude Code, how to start it, the core plan-edit-run-review workflow that makes it reliable, the CLAUDE.md file that teaches it your rules, and where to go next to master subagents, hooks, skills and MCP. Everything here is current as of June 2026 and verified against the official docs.

#### What Claude Code is (and what it is not)

Claude Code is an agent harness: software that wraps a Claude model in a loop with tools (read files, edit files, run shell commands, search the web) and the judgement to use them toward a goal you set. It runs in your terminal, in VS Code and JetBrains, in a desktop app and on the web. It is not just autocomplete and it is not a chat window that hands you snippets to paste: it works directly in your repository, makes multi-file changes, runs your tests, and iterates until the task is done. You stay in control by reviewing what it proposes and approving the actions that matter.

- It lives where your code lives: the terminal, your editor, or a browser tab.
- It acts, not just suggests: it edits files and runs commands, then reads the results.
- You supervise: it asks permission for sensitive actions and you review its plans and diffs.
- It needs an account: a Claude Pro, Max, Team, Enterprise or Console (pay-per-token) plan. The free Claude.ai plan does not include Claude Code.

#### Installing Claude Code

The recommended way to install Claude Code in 2026 is the native installer, which has no Node.js dependency and auto-updates itself in the background. Run the one-line installer for your platform, then confirm it worked. If you prefer npm, the package still works and installs the same binary, but it needs Node.js 18 or later and does not auto-update as cleanly. Never install with sudo: a root-owned npm directory causes permission failures on every future install.

```bash
# macOS, Linux, WSL (recommended native installer)
curl -fsSL https://claude.ai/install.sh | bash

# Windows PowerShell
irm https://claude.ai/install.ps1 | iex

# npm fallback (needs Node.js 18+, do NOT use sudo)
npm install -g @anthropic-ai/claude-code

# Confirm it installed
claude --version
```

Install Claude Code with the native installer, or npm as a fallback, then verify the version. If something looks off, run "claude doctor" for a health check of your install and configuration. On native Windows, installing Git for Windows is optional but lets Claude Code use Git Bash for shell commands.

#### Your first session

Open a terminal inside the project you want to work on and run "claude". The first time, it walks you through logging in via the browser. Once you are in, you are at an interactive prompt: type a request in plain English and press enter. Start small to build trust, for example "explain what this project does" or "find where the homepage hero is rendered". Claude Code reads the relevant files itself, so you do not need to paste code in. When you want to end the session, type /exit, and to resume later run "claude -c" to continue your most recent session in that folder.
```bash
# Start Claude Code in the current project
claude

# Continue your most recent session in this folder
claude -c

# Resume a specific past session from a list
claude --resume
```

Starting, continuing and resuming a Claude Code session.

#### The core workflow: plan, edit, run, review

The single habit that makes Claude Code reliable is to plan before you let it edit. Press Shift+Tab to enter plan mode: Claude explores your codebase and proposes what it intends to do without changing a single file, so you can correct course cheaply before any code is written. Approve the plan, let it make the edits, have it run your tests or dev server, and then review the diff before you commit. This explore-plan-implement-commit loop keeps the agent honest and catches a wrong direction while it is still free to fix.

- Plan: press Shift+Tab for plan mode so Claude proposes changes before editing. Use it for anything touching several files, schema, or security.
- Edit: approve the plan and let Claude make the multi-file changes.
- Run: have it run your tests, type checker or dev server so it sees real results, not assumptions.
- Review: read the diff and commit. You are the quality gate before code is permanent.

#### CLAUDE.md: teaching the agent your rules

CLAUDE.md is a plain Markdown file at the root of your repo that Claude Code reads automatically at the start of every session and treats as standing instructions. It is the highest-leverage upgrade you can make: put your stack, conventions, quality gate and tone in it once, and the agent stops re-deciding them every task. Generate a starting file by running /init, which scans your project and drafts a CLAUDE.md, then trim it to the rules you actually keep correcting. Keep it tight, because it is reloaded every turn and eats the same context window your task needs.

```markdown
# Project Rules

## Stack
- TypeScript only. Package manager is bun, never npm.

## Conventions
- Use rounded-sm for border-radius. Never use em dashes; use "-".

## Quality
- Before saying a task is done, run: bun run lint && bun run typecheck && bun run test.

## Tone
- Direct and concise. No filler.
```

A compact CLAUDE.md you can adapt. Run /init to generate a starting point.

#### Go deeper: the four skills that make you a power user

Once the basics click, four features turn Claude Code from a helpful assistant into a reliable teammate. Each has its own deep-dive guide in this cluster. Subagents let you delegate noisy side work to a fresh context so your main session stays sharp. Hooks run your quality gates automatically so a broken change cannot slip through. Skills and slash commands package your best workflows into one-word triggers. And MCP connects the agent to your databases, browsers and services. Read them in any order; together they are the difference between using Claude Code and mastering it.

- Subagents: delegate research, review or testing to a specialised agent in its own context window. See the Claude Code Subagents guide.
- Hooks: fire shell commands automatically at lifecycle events to enforce formatting, tests and safety. See the Claude Code Hooks guide.
- Skills and slash commands: capture a repeated workflow as a reusable /command or Skill. See the Skills and Slash Commands guide.
- MCP: connect external tools and data through one standard protocol. See the MCP setup guide.

### Claude Code Subagents Explained (with Examples)

- Canonical URL: https://agenticschool.dev/guides/claude-code-subagents

A Claude Code subagent is a specialised assistant the main agent delegates a focused task to, running in its own separate context window with its own system prompt and its own scoped set of tools. Instead of the main session doing everything and filling up with noisy output, it hands a job (review this code, research this question, run these tests) to a subagent, which works in isolation and returns only a clean summary.
This guide explains what subagents are, when they earn their keep, the built-in ones Claude Code ships with, and exactly how to create your own with the /agents command or a Markdown file in .claude/agents.

#### What a subagent actually is

Each subagent runs in its own context window with a custom system prompt, specific tool access and independent permissions. When Claude encounters a task that matches a subagent's description, it delegates to that subagent, which works independently and returns its result. The key word is isolation: all the messy intermediate work, the dozen files it read, the failed approaches, the raw command output, stays in the subagent's window and never touches the main conversation. The main agent receives only the conclusion.

- Separate context window: the subagent does not inherit your whole main conversation.
- Custom system prompt: a short, focused brief tuned to its single job.
- Scoped tools and permissions: only what the task needs, often read-only.
- Returns a summary: the noisy work is absorbed, the main session gets the clean result.

#### When to use a subagent

Reach for a subagent when a side task would flood your main conversation with search results, logs or file contents you will not reference again. The subagent does that work in its own context and hands back just the answer, so your main window stays sharp and you avoid the performance cliff that hits when a context window fills with noise. Define a custom subagent when you keep spawning the same kind of worker with the same instructions: a code reviewer, a test writer, a docs generator.

- Preserve context: keep exploration and bulk reading out of your main conversation.
- Enforce constraints: limit a subagent to read-only tools so a reviewer cannot edit.
- Specialise behaviour: a focused system prompt makes it more reliable on its one job.
- Control cost: route narrow work to a faster, cheaper model like Haiku.
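The context-preservation point can be made concrete with a toy model. This is a sketch under stated assumptions, not how Claude Code is implemented: contexts are plain Python lists, and `run_subagent` with its fake file reads is a hypothetical stand-in for a real agent loop.

```python
def run_subagent(system_prompt: str, task: str) -> str:
    """Toy subagent: works in a fresh context and returns only a summary."""
    # Fresh context: only its own brief and the delegation message.
    context = [system_prompt, task]
    # Noisy intermediate work stays in the subagent's own context.
    for i in range(12):
        context.append(f"contents of file_{i}.ts ... (hundreds of lines)")
    findings = 2  # pretend the review found two issues
    return f"Reviewed {len(context) - 2} files, found {findings} issues."


main_context = ["system: you are the main agent", "user: add a login form"]
summary = run_subagent("You are a read-only code reviewer.", "Review the login diff.")
# The main session gains a single entry, not the twelve file dumps.
main_context.append(summary)
print(len(main_context), main_context[-1])
```

The main context grows by one short line while the twelve simulated file reads are discarded with the subagent's context, which is the whole benefit in miniature.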
#### Built-in subagents you already have

Claude Code ships with built-in subagents it uses automatically, so you benefit from subagents before you ever create one. Explore is a fast, read-only agent for searching and understanding a codebase, often run on Haiku to keep it cheap. Plan is the research agent used in plan mode to gather context before proposing a plan. General-purpose handles complex, multi-step tasks that need both exploration and changes. You can block a built-in type via permissions if you need to, but most of the time you let them do their job.

- Explore: fast, read-only codebase search and analysis (often Haiku).
- Plan: read-only research agent that gathers context during plan mode.
- General-purpose: full-tool agent for complex, multi-step work.

#### Creating a custom subagent

Subagents are Markdown files with YAML frontmatter. The easiest way to create one is the /agents command, which opens a guided interface where you name the subagent, write its description, pick its tools and model, and can have Claude draft the system prompt for you. Project subagents live in .claude/agents/ (check them into version control so your team shares them); personal ones live in ~/.claude/agents/ and follow you across projects. Only name and description are required in the frontmatter; the description is what Claude reads to decide when to delegate, so write it clearly.

```markdown
---
name: code-reviewer
description: Reviews code for quality and best practices. Use proactively after code changes.
tools: Read, Glob, Grep
model: sonnet
---

You are a senior code reviewer. When invoked, analyse the changed code and return a concise, prioritised list of issues covering correctness, security and readability. Do not edit files; only report. Cite the file and line for each issue and suggest the smallest fix.
```

A real read-only code-reviewer subagent at .claude/agents/code-reviewer.md. Only name and description are required.
Files added directly on disk are loaded at session start, so restart Claude Code to pick up a new file; a subagent created via /agents takes effect immediately. The model field accepts sonnet, opus, haiku or inherit; the tools field defaults to inheriting all tools if you omit it.

#### Invoking and working with subagents

Most of the time you do not invoke a subagent by hand: Claude reads each subagent's description and delegates automatically when a task matches. You can also nudge it explicitly, for example "use the code-reviewer agent to review this diff". A good pattern is to write the feature with your main agent, then have it delegate the review to a read-only reviewer subagent, which inspects the diff in its own context and returns a tight issue list, after which the main agent fixes them. Because the review reasoning never loaded into your main window, the session stays clean.

### Claude Code Hooks: Automate Your Quality Gates

- Canonical URL: https://agenticschool.dev/guides/claude-code-hooks

Claude Code hooks are shell commands that Claude Code runs automatically at specific points in its lifecycle, so you can enforce your quality gates deterministically instead of hoping the agent remembers. Where a CLAUDE.md rule is a suggestion the model may or may not follow, a hook always fires: it can run your formatter after every file write, run your tests before the agent stops, or block a dangerous command outright. This guide explains the hook events, how hooks are configured, and a practical setup you can copy today.

#### What a hook actually is

Every hook has three parts: an event (the lifecycle moment it fires on), an optional matcher (a filter so it only runs for certain tools), and an action (the shell command it runs). When a hook fires, Claude Code passes it JSON on standard input describing the event (session id, working directory, the tool name and its input), and your script decides what to do.
The command's exit code controls flow: exit 0 means proceed, and exit code 2 blocks the action and feeds your standard-error message back to the agent so it can react. Other non-zero codes are reported as errors but do not block.

- Event: when the hook fires (for example after a tool runs).
- Matcher: an optional filter, for example only on file edits.
- Action: the shell command Claude Code executes.

#### The lifecycle events you can hook

Hooks cover the full tool lifecycle. The ones you will use most are PreToolUse and PostToolUse (around every tool call), Stop (when the agent is about to finish), and SessionStart (when a session begins). There are more for finer control.

- PreToolUse: before a tool runs - validate or block the action.
- PostToolUse: after a tool runs - format, lint or check the result.
- UserPromptSubmit: when you send a prompt - inject context or guard input.
- Stop and SubagentStop: when the agent (or a subagent) is about to stop - run tests as a final gate.
- SessionStart and Notification: session setup and when the agent needs your attention.

#### Where hooks are configured

Hooks live in your Claude Code settings.json. Put team-wide, non-negotiable gates in the project file at .claude/settings.json so everyone shares the same guardrails, and keep personal preferences in your user settings. Each entry pairs an event with a matcher and the command to run.

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          { "type": "command", "command": "bun run lint --fix" }
        ]
      }
    ]
  }
}
```

A PostToolUse hook that runs your linter after every file edit or write.

#### A practical quality-gate setup

The highest-leverage setup is small: format and lint on every write, and run your test suite as a Stop gate so the agent cannot declare a task done with red tests. Add a PreToolUse guard if you want to block destructive shell commands. Keep hook scripts fast and idempotent, because they run often, and write to standard error with a clear message when you block so the agent knows how to fix it.
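A minimal sketch of such a PreToolUse guard, written in Python. It assumes the event JSON exposes the tool as `tool_name` and the shell command as `tool_input.command`, and that the shell tool is named `Bash`; check those field names against the hooks reference for your version. The blocked patterns are illustrative, not a complete safety list.

```python
import json
import re
import sys

# Illustrative patterns only; extend this list for your own project.
BLOCKED = [
    # rm with both -r and -f in the same flag group, in either order.
    (re.compile(r"\brm\s+-(?=\w*r)(?=\w*f)\w+"), "recursive force delete"),
    (re.compile(r"\bgit\s+push\b.*--force"), "force push"),
    (re.compile(r"\bgit\s+reset\s+--hard\b"), "hard reset"),
]


def decide(event: dict) -> tuple[int, str]:
    """Return (exit_code, message): 0 lets the tool run, 2 blocks it."""
    # Assumed field names: tool_name and tool_input.command.
    if event.get("tool_name") != "Bash":
        return 0, ""
    command = event.get("tool_input", {}).get("command", "")
    for pattern, reason in BLOCKED:
        if pattern.search(command):
            return 2, f"Blocked ({reason}): {command!r}. Use a safer alternative."
    return 0, ""


def main() -> int:
    # Claude Code passes the event as JSON on standard input.
    code, message = decide(json.load(sys.stdin))
    if message:
        # The message on standard error is what the agent sees when blocked.
        print(message, file=sys.stderr)
    return code

# When saved as a hook script, finish the file with: sys.exit(main())
```

Keeping the decision in a pure `decide` function makes the guard easy to unit test without a running session. To wire it up, add a PreToolUse entry with a `Bash` matcher in the same settings.json shape as the PostToolUse example, pointing the command at wherever you save the script.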
### Claude Code Skills and Slash Commands

- Canonical URL: https://agenticschool.dev/guides/claude-code-skills-and-commands

Claude Code Skills and slash commands are reusable, named workflows: instead of retyping the same multi-step instructions, you invoke one and the agent runs your proven steps with the right context already loaded. As of 2026, custom slash commands have merged into Skills: a file at .claude/commands/deploy.md and a Skill at .claude/skills/deploy/SKILL.md both create the /deploy command and work the same way. This guide shows you how to create a slash command and a SKILL.md, how arguments and dynamic context work, where they live, and when Claude loads a Skill automatically versus when you trigger it yourself.

#### Skills and commands are now one thing

Claude Code used to have two separate features: slash commands (a Markdown file you triggered with /name) and Skills (a richer, self-contained capability). In 2026 they were unified. Custom commands have been merged into Skills, so both create a slash command and behave the same way. Your existing .claude/commands/ files keep working, and Skills simply add optional features on top: a directory for supporting files, frontmatter to control who invokes them, and the ability for Claude to load them automatically when relevant. Claude Code Skills follow the open Agent Skills standard, so the same skill can work across tools.

- A file at .claude/commands/deploy.md creates /deploy.
- A Skill at .claude/skills/deploy/SKILL.md also creates /deploy.
- Skills add: supporting files, invocation control, and automatic loading by description.
- Existing .claude/commands/ files still work; Skills are the recommended path forward.

#### Creating a slash command

The simplest reusable workflow is a Markdown file whose name becomes the command. Drop a file in .claude/commands/ (project) or ~/.claude/commands/ (personal) and Claude Code exposes it as a slash command.
Use the $ARGUMENTS placeholder to capture whatever you type after the command, or positional $1, $2 for individual arguments. A description in the YAML frontmatter shows up as help text. This is perfect for a single, well-defined action you trigger often.

```markdown
---
description: Investigate and fix a GitHub issue by number.
argument-hint: [issue-number]
---

Fix GitHub issue #$ARGUMENTS. First read the issue with the gh CLI, then find the relevant code, propose a fix in plan mode, and only implement after I approve. Run the test suite before you say it is done.
```

A slash command at .claude/commands/fix-issue.md, invoked as "/fix-issue 123". $ARGUMENTS captures the 123.

#### Writing a SKILL.md

A Skill is a directory with a SKILL.md file at its root: YAML frontmatter plus Markdown instructions, and optionally bundled scripts, templates or reference files. The directory name becomes the command you type, and the description is what Claude reads to decide when to load the skill automatically, so a sharp description is half the work. Only the description is recommended; everything else is optional. Skills shine when a workflow is multi-step, has its own assets, or you want Claude to invoke it on its own when the moment fits.

```markdown
---
description: Summarise uncommitted changes and flag risks. Use when the user asks what changed or wants a commit message.
allowed-tools: Read, Grep
---

## Current changes

!`git diff HEAD`

## Instructions

Summarise the changes above in two or three bullet points, then list any risks such as missing error handling, hardcoded values, or tests that need updating. If the diff is empty, say there are no uncommitted changes.
```

A SKILL.md at .claude/skills/summarize-changes/SKILL.md, invoked as /summarize-changes or loaded automatically by its description.
The line with !`git diff HEAD` is dynamic context injection: Claude Code runs the command and replaces the line with its output before the model sees the skill, so the instructions arrive with your real diff already inlined. Keep the body concise, because once a skill loads its content stays in context across turns.

#### Where they live and who invokes them

Project Skills live in `.claude/skills/<name>/SKILL.md` and apply to that repo; personal Skills live in `~/.claude/skills/<name>/SKILL.md` and follow you everywhere. Claude can invoke a Skill automatically when your request matches its description, or you can trigger it directly with /name. If a workflow should only ever run when you ask for it (a deploy, say), set disable-model-invocation: true so Claude never triggers it on its own. To hide a knowledge-only skill from the slash menu, set user-invocable: false.

- Project: `.claude/skills/<name>/SKILL.md` - shared with the repo.
- Personal: `~/.claude/skills/<name>/SKILL.md` - all your projects.
- Auto-invoked by description, or triggered directly with /name.
- disable-model-invocation: true makes a skill manual-only; user-invocable: false hides it from the / menu.

#### When to use which

The signal to package anything is repetition: the second time you brief the agent on the same sequence, capture it. Then choose the form by how rich it is. A single clear action with no assets is a plain command file. A multi-step process with its own templates, scripts or reference docs, or one you want Claude to load automatically, is a Skill. You can start with a command and grow it into a Skill later. Keep your library small and focused: two or three you actually reach for beat twenty you forget exist.
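
To make the argument placeholders from this guide concrete, here is a rough model of how $ARGUMENTS and positional $1, $2 could expand. It is an illustration of the idea only, not Claude Code's implementation: expandTemplate is a hypothetical helper, and splitting arguments on whitespace is an assumption.

```typescript
// Illustrative model of placeholder expansion; not Claude Code's actual code.
function expandTemplate(template: string, typed: string): string {
  const trimmed = typed.trim()
  // Assumption: positional arguments are separated by whitespace.
  const parts = trimmed === '' ? [] : trimmed.split(/\s+/)

  // One pass over the template, so substituted text is never re-expanded.
  return template.replace(/\$(ARGUMENTS|\d+)/g, (_match: string, key: string) => {
    if (key === 'ARGUMENTS') return trimmed // everything typed after the command
    return parts[Number(key) - 1] ?? '' // $1 is the first argument; missing ones become empty
  })
}

// "/fix-issue 123" with the command body from the example above.
const single = expandTemplate('Fix GitHub issue #$ARGUMENTS.', '123')

// A hypothetical two-argument command.
const positional = expandTemplate('Compare $1 against $2', 'main feature/login')
```

Here single becomes "Fix GitHub issue #123." and positional becomes "Compare main against feature/login". Reaching for $ARGUMENTS keeps a command forgiving about spacing, while positional placeholders suit commands with a fixed number of inputs.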
### How to Set Up MCP in Claude Code - Canonical URL: https://agenticschool.dev/guides/claude-code-mcp-setup MCP (Model Context Protocol) is an open standard that connects Claude Code to external tools, databases and services through one common interface, so the agent can read and act on those systems directly instead of you copying data into chat. This guide shows you exactly how to set up MCP in Claude Code: what connecting a server gives you, how to add one with the claude mcp add command for each transport, the scopes that decide who shares the connection, how to verify and authenticate with /mcp, and the common servers worth starting with. Every command here is current as of June 2026. #### What MCP gives you in Claude Code Connect a server when you find yourself copying data into chat from another tool, like an issue tracker, a database or a monitoring dashboard. Once connected, Claude can read and act on that system directly. An MCP server exposes tools (actions the agent can run), resources (read-only data it can fetch) and prompts (reusable templates), and because MCP is a shared standard, an integration is written once per tool and works across every MCP-aware client. That is why support spread so fast and why there is now an MCP server for almost everything you would want to connect. - Implement from an issue tracker: "Add the feature described in JIRA ENG-4521 and open a PR." - Query a database directly instead of pasting rows into chat. - Integrate designs, monitoring data and other services the agent could not otherwise see. - Tools, resources and prompts: actions to run, data to read, templates to reuse. #### Adding a server with claude mcp add The fastest way to connect a server is the claude mcp add command. There are three transports. HTTP is the recommended choice for remote, cloud-based servers. Stdio runs a server as a local process on your machine, ideal for tools that need direct system access. 
SSE also exists but is deprecated in favour of HTTP. For stdio servers, the double dash separates Claude's own options from the command that runs the server, and everything after it is passed to the server untouched.

```bash
# Remote HTTP server (recommended for cloud services)
claude mcp add --transport http notion https://mcp.notion.com/mcp

# HTTP server with a bearer token
claude mcp add --transport http secure-api https://api.example.com/mcp \
  --header "Authorization: Bearer your-token"

# Local stdio server (note the -- before the command)
claude mcp add --transport stdio airtable \
  --env AIRTABLE_API_KEY=YOUR_KEY -- npx -y airtable-mcp-server
```

Adding remote HTTP and local stdio MCP servers with claude mcp add. SSE also exists but is deprecated.

#### Choosing a scope

The --scope flag decides who can use the server and where the config is stored. Local (the default) keeps it to you in the current project. Project shares it with everyone via a committed .mcp.json file at the repo root, so a project-scoped server is the right choice for connections your whole team needs. User makes it available to you across all your projects. Servers that need credentials should take them through environment variables (the --env flag), never hard-coded keys, so the same .env discipline applies and secrets stay out of committed config.

```json
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "./data"]
    },
    "playwright": {
      "command": "npx",
      "args": ["-y", "@playwright/mcp@latest"]
    }
  }
}
```

A project-scoped .mcp.json (committed to the repo) connecting a filesystem server and a Playwright browser server.

Project-scoped servers from .mcp.json are not trusted automatically: they appear as pending approval until you review and approve them when you run Claude interactively, which protects you from running untrusted server code a teammate added.

#### Verifying and authenticating

After adding a server, confirm it is working.
From the shell, "claude mcp list" shows all configured servers and their status, "claude mcp get <name>" shows details for one, and "claude mcp remove <name>" deletes it. Inside a Claude Code session, the /mcp panel shows each connected server and its tool count, and lets you authenticate with servers that require OAuth 2.0 by walking you through the browser sign-in. If a request needs a server that is still connecting, Claude waits for it before continuing.

```bash
# List all configured servers and their status
claude mcp list

# Details for one server
claude mcp get notion

# Remove a server
claude mcp remove notion

# Inside a session: check status and authenticate (OAuth)
/mcp
```

Managing and verifying MCP servers. Use /mcp inside a session to authenticate via OAuth.

#### Common servers and connecting deliberately

Good first connections are the filesystem server (scoped to a folder), the Playwright server (so the agent can drive a real browser), and a database server for your stack. But connect deliberately: every server adds its tool definitions to your context window, so each one has a real, ongoing cost. An MCP server is also third-party code with access to your systems, so vet what you connect just as you would a dependency. Often a CLI tool you already have, named in your CLAUDE.md, does the job with less overhead than a dedicated server. Three servers you use beat fifteen that bloat your context.

### What Is Agentic Engineering? The 2026 Pillar Guide

- Canonical URL: https://agenticschool.dev/guides/what-is-agentic-engineering

Agentic engineering is the discipline of building real software by directing AI coding agents that plan, write and run code in a loop, while you own the goal, the context and the verification. Instead of typing most of the code yourself, you set a clear objective, give the agent the tools and documentation it needs, let it explore the codebase, propose a plan, make the edits and run the tests, and then you review and decide what ships.
It is the professional, accountable counterpart to "vibe coding": the same agents, but used with structure, judgement and quality gates so the output is something you can put in production and stand behind. This pillar guide defines agentic engineering precisely, shows how it differs from both vibe coding and traditional development, breaks down the core loop and the skills it rewards, lists the tools the field runs on in 2026, and gives you a path to learn it. Everything here is current as of June 2026. #### A precise definition Agentic engineering is software development where AI coding agents do most of the typing and a human engineer does the directing. An agent is a model wrapped in a loop with tools: it can read your files, edit them, run shell commands and read the results, then iterate toward a goal. Agentic engineering is the practice of using those agents deliberately, with three things you keep firmly in human hands: defining the goal, preparing the context and tools the agent needs, and verifying the result. The term gained currency through practitioners like Simon Willison, who frames it around exactly those human-led pillars, and builders such as Indy Dev Dan who popularised the agentic-coding workflow. The shorthand: you stop being the typist and become the lead who sets direction and guarantees quality. - Goal definition: state precisely what "done" means before the agent starts, not vaguely. - Context and tool preparation: give the agent the docs, files, commands and access it needs to succeed (a CLAUDE.md, MCP servers, a clean repo). - Verification: review the plan and the diff, run the tests, and decide what is safe to ship. You are the quality gate. - The agent does the rest: exploring, planning, editing and running in a loop you supervise. 
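
The definition above, an agent as a model wrapped in a loop with tools, can be sketched in a few lines. This is a toy illustration under stated assumptions: the Step, Model and Tool types, the runAgent function and the scripted model are all hypothetical, and a real harness would call an actual model and real file and shell tools.

```typescript
// Toy agent loop: a "model" picks the next step, tools run, results feed back.
type Step =
  | { kind: 'tool'; tool: string; input: string }
  | { kind: 'done'; summary: string }

type Model = (goal: string, history: string[]) => Step
type Tool = (input: string) => string

function runAgent(
  goal: string,
  model: Model,
  tools: Record<string, Tool>,
  maxSteps = 10,
): { summary: string; history: string[] } {
  const history: string[] = []
  for (let i = 0; i < maxSteps; i++) {
    const step = model(goal, history)
    if (step.kind === 'done') return { summary: step.summary, history }

    // Run the requested tool and record the result for the next iteration.
    const tool = tools[step.tool]
    const result = tool ? tool(step.input) : `error: unknown tool ${step.tool}`
    history.push(`${step.tool}(${step.input}) -> ${result}`)
  }
  // A step budget keeps a confused agent from looping forever.
  return { summary: 'stopped: step budget exhausted', history }
}

// A scripted stand-in for a real model: read a file, run tests, then finish.
const scriptedModel: Model = (_goal, history) => {
  if (history.length === 0) return { kind: 'tool', tool: 'read', input: 'README.md' }
  if (history.length === 1) return { kind: 'tool', tool: 'test', input: '' }
  return { kind: 'done', summary: 'tests are green' }
}

const demoTools: Record<string, Tool> = {
  read: (path) => `contents of ${path}`,
  test: () => 'pass',
}

const demo = runAgent('make the tests pass', scriptedModel, demoTools)
```

The human-led parts of agentic engineering sit outside this loop: you choose the goal string, decide which tools the agent gets, and inspect the returned history and summary before anything ships.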
#### Agentic engineering vs vibe coding Vibe coding, the term Andrej Karpathy coined on 2 February 2025, is the opposite end of the spectrum: you "give in to the vibes," prompt the model, accept the changes without really reading them, paste back any errors, and forget the code even exists. It is brilliant for a weekend prototype, a throwaway script or a demo, where speed matters and nothing is at stake. Agentic engineering uses the very same agents, but refuses to skip the parts that make software trustworthy. You read the plan, you read the diff, you keep tests green, and you understand what shipped. The honest framing many practitioners use is that vibe coding describes a prototype while agentic engineering describes a production system. Our dedicated comparison goes deeper, but the one-line difference is accountability: with vibe coding nobody is checking; with agentic engineering you are. - Vibe coding: prompt, accept, run, repeat. No review, no tests, no real understanding. Great for prototypes and disposable code. - Agentic engineering: prompt, but plan, review the diff, run the quality gate, and own the result. Built for production. - Same tools, different discipline. The agent is identical; the rigour around it is not. - See the Vibe Coding vs Agentic Engineering guide for the full, honest comparison and when each is the right call. #### How it differs from traditional development Traditional development means a human writes the code line by line, with autocomplete and a debugger as helpers. Agentic engineering inverts the ratio: the agent generates and runs most of the code, and your time moves up the stack to the work that actually needs human judgement. The bottleneck shifts from how fast you can type to how clearly you can specify a goal, how well you have prepared the agent context, and how rigorously you verify. 
The skills that compound are not memorising syntax but writing precise specs, designing a clean architecture the agent can navigate, reading diffs fast, and building quality gates that catch mistakes automatically. The engineers who win are not the ones who resist agents; they are the ones who learn to direct them well. - Your scarce resource shifts from typing speed to clarity of intent and quality of review. - Architecture and naming matter more, because a clean codebase is one an agent can navigate and you can review. - Specs, tests and CLAUDE.md-style rules become first-class artefacts, not afterthoughts. - You stay the engineer of record: the agent proposes, you dispose. #### The core loop Every reliable agentic workflow runs the same loop, whatever tool you use: explore, plan, implement, verify. First the agent explores your codebase to understand the real state rather than guessing. Then it proposes a plan you can correct cheaply before any code is written, because a wrong direction caught in the plan costs nothing and a wrong direction caught after a thousand-line diff costs a lot. Then it implements the change and runs your tests, type checker or dev server so it works from real results, not assumptions. Finally you verify: read the diff, confirm the gate is green, and commit. The single habit that separates people who get great results from people who get frustrated is planning before editing and never skipping the verify step. - Explore: let the agent read the relevant code so it acts on reality, not a guess. - Plan: have it propose its approach before touching files, and correct it while that is free. - Implement: approve the plan, let it make the edits and run the commands. - Verify: read the diff, keep the tests green, and decide what ships. Loop until done. #### The tools of the field in 2026 Agentic engineering runs on a small, fast-moving stack. 
The agent harness is the core: a terminal or IDE agent that wraps a model in the explore-plan-implement-verify loop. Claude Code (Anthropic) and Codex CLI (OpenAI) are the leading terminal agents; Cursor is the leading agentic IDE. Underneath sits a frontier model (Opus, Sonnet, GPT or Gemini tiers) chosen for the task and budget. Around the agent you wire context and tools: a CLAUDE.md or AGENTS.md that teaches it your rules, MCP servers that connect it to your databases, browsers and services, hooks that enforce quality gates automatically, and subagents that keep noisy side work out of your main context. None of this is exotic; it is the everyday kit of someone shipping with agents today. - Harness: Claude Code or Codex CLI in the terminal, Cursor as an agentic IDE. See Claude Code vs Codex CLI. - Model: pick Opus, Sonnet, GPT or Gemini tiers by task difficulty and cost. - Context: a CLAUDE.md / AGENTS.md of standing rules, plus MCP servers for live data and actions. - Guardrails: hooks for deterministic quality gates and subagents for context isolation. #### How to learn agentic engineering You learn it the same way you learn any craft: by shipping something real, then tightening the discipline each time. Start by getting fluent with one harness end to end, our How to Use Claude Code guide is the fastest on-ramp. Then internalise the loop and the review habits on a small project you actually care about. From there, follow a structured path that takes you from your first shipped app through the modern app stack, automation and agent-first quality practices. The Agentic Engineering Roadmap lays out that zero-to-shipping sequence and links every step, and if a term trips you up, the glossary defines the vocabulary (agentic AI, AI agent, agent harness, vibe coding) in plain language. - Get fluent with one harness: start with How to Use Claude Code. - Ship a small real project to build the explore-plan-implement-verify habit. 
- Follow the Agentic Engineering Roadmap from first app to agent-first production. - Lean on the glossary for the vocabulary: agentic AI, AI agent, agent harness, vibe coding. ### Vibe Coding vs Agentic Engineering: What's the Difference? - Canonical URL: https://agenticschool.dev/guides/vibe-coding-vs-agentic-engineering Vibe coding and agentic engineering both mean building software with AI coding agents, but they sit at opposite ends of one dial: how much you trust the output without checking it. Vibe coding, the term Andrej Karpathy coined in February 2025, is letting the agent run and accepting what it produces without really reading it, which is wonderful for prototypes and miserable for production. Agentic engineering uses the same agents but keeps a human firmly in the loop: you plan, review the diff, run the tests and own what ships. This guide defines both honestly, shows where each genuinely belongs, and explains why agentic engineering is the approach that scales past a weekend demo. It is current as of June 2026 and pairs with our pillar, What Is Agentic Engineering. #### What vibe coding actually means Vibe coding is a specific, real thing, not just a put-down. Andrej Karpathy described it on 2 February 2025 as "a new kind of coding where you fully give in to the vibes, embrace exponentials, and forget that the code even exists." In practice it means you talk to the agent, accept its changes without scrutinising the diff, paste any error straight back for it to fix, and let the codebase grow organically. The defining trait is that you are not really reviewing or understanding the code; you are steering by whether the thing seems to work. For a throwaway script, a hackathon entry, a quick personal tool or a UI experiment, that is a perfectly rational trade: speed over rigour when nothing is at stake. - Coined by Andrej Karpathy, February 2025; named a word of the year by Collins. 
- You accept the diff without reading it and paste errors back for the agent to resolve. - You steer by "does it run", not by understanding the code. - Genuinely great for prototypes, demos, scripts and learning by playing. #### What agentic engineering actually means Agentic engineering uses the identical agents, but refuses to skip the steps that make software trustworthy. You define the goal precisely, prepare the context and tools the agent needs, let it explore and propose a plan, then you read that plan, read the diff it produces, keep your tests and type checks green, and decide what ships. The human stays the engineer of record. Where vibe coding optimises for speed, agentic engineering optimises for software you can put in front of real users and maintain six months later. Our pillar guide breaks the discipline down in full; the short version is that the agent does the typing and you do the directing and the verifying. - You own goal definition, context preparation and verification; the agent does the rest. - You read the plan and the diff, and a quality gate (lint, types, tests) must pass. - You understand what shipped, so you can extend and debug it later. - Built for production and longevity, not just for getting something on screen. #### An honest side-by-side The two are not rivals so much as the same tool used with different discipline. Naming the contrast plainly makes it easy to know which mode you are in, and to switch on purpose rather than by accident. - Review: vibe coding skips the diff; agentic engineering reads every diff that matters. - Tests: vibe coding rarely runs them; agentic engineering treats a green gate as non-negotiable. - Understanding: vibe coding forgets the code exists; agentic engineering keeps you able to explain it. - Best fit: vibe coding for prototypes and disposable code; agentic engineering for anything real or shared. 
- Risk: vibe coding can ship silent bugs and security holes; agentic engineering catches them at the gate. #### When vibe coding is completely fine Treating vibe coding as always wrong is its own mistake. When the stakes are low and the lifespan is short, the rigour of full agentic engineering is wasted effort. Vibe a prototype to test whether an idea is worth building at all. Vibe a one-off data-cleaning script you will run once and delete. Vibe a UI sketch to feel out three layouts before you commit to one. Vibe while you are learning, where breaking things fast is how you understand them. The trap is not vibe coding itself; it is vibe coding something that quietly graduates into production without ever getting the review, tests and understanding that production demands. #### Why agentic engineering scales Vibe coding hits a wall the moment code has to live. Unreviewed output accumulates bugs you did not see, security issues nobody checked for, and architectural drift that makes the next change harder than the last, until the agent itself gets lost in the mess it made. Agentic engineering scales precisely because the discipline compounds in your favour: clean architecture and clear specs make the agent more effective, automated quality gates catch regressions for free, and your own understanding means you can keep extending the system instead of being trapped by it. The same agents that produce a fragile vibe-coded pile produce a maintainable codebase when you direct and verify them. That is the whole bet of agentic engineering, and the Agentic Engineering Roadmap shows how to build the habit. - Unreviewed code compounds into bugs, security holes and architectural drift. - Quality gates and clean specs make every later change cheaper, not costlier. - Understanding your own system lets you keep shipping instead of getting stuck. - Promote a vibe-coded prototype the moment it matters: add review, tests and structure. 
### The Agentic Engineering Roadmap (Zero to Shipping) - Canonical URL: https://agenticschool.dev/guides/agentic-engineering-roadmap This is the roadmap for learning agentic engineering from zero to shipping: a sequenced path that takes you from never having run a coding agent to building and deploying production software you direct and verify. Agentic engineering is the discipline of building real software by directing AI agents while you own the goal, the context and the quality, and it is learned by doing, stage by stage. Below is the exact order to go through it, mapped to the five courses of the campus and the deep-dive guides for each step, so you are never guessing what to learn next. Work top to bottom; each stage assumes the one before it. Everything here is current as of June 2026, and it pairs with our pillar, What Is Agentic Engineering. #### How to use this roadmap Do not try to learn everything at once. The fastest way to ruin the journey is to read about subagents and MCP before you have shipped a single thing. Go in order: get one app live, then deepen your command of the agent, then learn the production stack, then automation, then the quality and agent-first practices that make your work trustworthy. Each stage below names the course that teaches it and the guides that go deeper on the trickiest parts. Build something real at every stage rather than collecting tutorials, because the loop only clicks when you have felt it on your own project. - Go in order; each stage builds on the last. - Ship something real at every stage, not just read. - Use the linked course for the path and the linked guides for the deep dives. - Refer to the glossary whenever a term is new. #### Stage 1: Foundations - ship your first app Start here even if you have never opened a terminal. 
The Foundations course takes you from understanding how LLMs and coding agents actually work, through installing Claude Code and Codex, prompting them well, scaffolding a project, and shipping a real website to the public internet. By the end you have done the whole arc once: an idea becomes a deployed app. The companion guide here is How to Use Claude Code, which gets you productive in your first session, and the underlying model and harness vocabulary is in the glossary if you want the definitions. - Course: Foundations - From Zero to Your First Shipped App. - Guide: How to Use Claude Code (your first productive session). - Outcome: one real website deployed to the internet, the full arc done once. - Glossary: AI agent, agent harness, agentic AI. #### Stage 2: Claude Code Mastery - direct the agent well Once you can ship, deepen your command of the harness so the agent becomes a reliable teammate rather than a slot machine. The Claude Code Mastery course covers teaching it your rules with CLAUDE.md, packaging workflows as skills and commands, automating quality gates with hooks, connecting tools with MCP, running multi-agent workflows with subagents, and managing context without burning money. This is where the explore-plan-implement-verify loop becomes second nature. Lean on the Claude Code cluster of guides for each feature: subagents, hooks, skills and commands, and MCP setup. - Course: Claude Code Mastery - Becoming a Power User. - Guides: Claude Code Subagents, Hooks, Skills and Slash Commands, and MCP setup. - Outcome: the explore-plan-implement-verify loop becomes a habit, not an effort. - Glossary: subagent, system prompt, context window, prompt caching. #### Stage 3: The Modern App Stack - build something real Now build a product, not just a page. 
The Modern App Stack course wires up how real apps fit together: authentication and OAuth with Clerk, a reactive database with Convex, safe secret handling, payments with Stripe, and the migration from development to production. This is the stage where agentic engineering stops being a coding trick and becomes how you assemble a working SaaS. You direct the agent across a multi-service codebase while keeping the architecture clean enough that it (and you) can navigate it. - Course: The Modern App Stack - Auth, Data and Payments. - Outcome: a production-shaped app with auth, a database and payments. - Skill focus: directing an agent across a real, multi-service codebase. - Glossary: AI IDE, tool calling. #### Stage 4: Automation and Agentic Systems - make it run itself With a product under your belt, move from single tasks to systems. The Automation and Agentic Systems course compares n8n, Zapier and Trigger.dev, automates the browser with Playwright, runs code safely in sandboxes, builds your own AI tools on top of APIs, and designs human-in-the-loop systems with the right level of autonomy. This is where you build the workflows and tools that keep working when you step away, and where the five levels of LLM autonomy become a practical design choice rather than a concept. - Course: Automation and Agentic Systems. - Outcome: workflows and tools that run without you babysitting them. - Skill focus: choosing the right autonomy level and keeping a human in the loop. - Glossary: workflow automation, tool calling. #### Stage 5: Quality, Security and Agent-First - ship it for real The final stage makes your work production-grade and discoverable by both humans and AI. The Quality, Security and the Agent-First Business course covers tests and CI/CD, security and privacy, getting found through SEO and GEO/AEO, designing agent-first products whose APIs other agents love to use, and a capstone where you build and ship a complete agentic product end to end. 
Finish this and you are not learning agentic engineering any more; you are practising it. From here, keep shipping, and revisit the deep-dive guides whenever a project pushes you into new territory. - Course: Quality, Security and the Agent-First Business. - Outcome: a complete agentic product, shipped and findable by humans and AI. - Skill focus: tests, security, GEO/AEO and agent-first API design. - Glossary: GEO, AEO, llms.txt. ### What Is an MCP Server? (and How to Build One) - Canonical URL: https://agenticschool.dev/guides/what-is-an-mcp-server An MCP server is a small program that exposes tools, data and prompt templates to an AI agent through the Model Context Protocol, the open standard that lets any MCP-aware app connect to it. Instead of writing a custom integration for every model and every tool, you implement one MCP server for your service and every MCP client (Claude Code, Cursor, Claude Desktop and more) can use it. The agent can then call your server to run actions, read your data and reuse your prompts, without you copying anything into chat. This guide explains exactly what an MCP server is, how the host-client-server architecture works, the three things a server exposes, how to build a minimal one in TypeScript, and the common servers worth knowing. Everything here is current as of June 2026; to wire one into your own agent, see our How to Set Up MCP in Claude Code guide. #### What an MCP server is MCP (Model Context Protocol) is an open standard that connects AI agents to external tools, data and services through one common interface. An MCP server is the piece you (or a vendor) write that sits in front of a specific capability: a database, a browser, an issue tracker, a filesystem, an API. It advertises what it can do, and any MCP client can discover and use it. 
The reason MCP spread so fast is that it solves an N-times-M problem: before MCP, every tool needed a bespoke integration for every agent; with MCP, a tool is integrated once and works everywhere. So today there is an MCP server for almost anything you would want an agent to reach. - A server wraps one capability (a database, a browser, an API) and exposes it to agents. - It speaks the Model Context Protocol, so any MCP client can use it without custom glue. - Write the integration once; every MCP-aware app benefits. This is why support spread fast. - See the glossary entry on MCP for the short definition and tool calling for how agents invoke it. #### How it works: host, client and server MCP uses a client-server architecture built on JSON-RPC 2.0, with three participants. The host is the AI application you use, such as Claude Code or Cursor. When the host starts, it creates one MCP client for each configured server, and each client holds its own dedicated, stateful connection to one MCP server. The server is the program exposing the capability. Messages flow as JSON-RPC requests, responses and notifications over a transport. There are two transports in common use: stdio, where the host spawns the server as a local child process and they talk over standard input and output, and Streamable HTTP, used for remote, cloud-hosted servers. One host can run many client-server connections at once, which is how an agent reaches a filesystem, a browser and a database in the same session. - Host: the AI app (Claude Code, Cursor, Claude Desktop) the user interacts with. - Client: one per server, created by the host, holding a dedicated connection. - Server: your program exposing a capability, speaking JSON-RPC 2.0. - Transport: stdio for local servers, Streamable HTTP for remote ones (SSE is deprecated). #### What a server exposes: tools, resources and prompts An MCP server can expose three kinds of primitive, and a given server implements whichever ones make sense for it. 
Tools are actions the model can invoke, like "run this query" or "create this issue"; each tool declares an input schema so the agent knows how to call it. Resources are read-only data the agent can fetch, like a file's contents or a database record, addressed by URI. Prompts are reusable templates the server offers, often surfaced to the user as slash commands or quick actions. Tools are by far the most common, because the headline value of MCP is letting an agent act on a system, not just read it. - Tools: actions the agent can call (query a DB, open a PR, drive a browser). Each has an input schema. - Resources: read-only data the agent can fetch by URI (file contents, records). - Prompts: reusable templates the server provides, often shown as slash commands. - A server declares which primitives it supports; most lead with tools. #### Build a minimal MCP server The fastest way to understand a server is to build a tiny one. The official TypeScript SDK ships as the @modelcontextprotocol/server package and lets you stand up a working stdio server in a few lines: create an McpServer, register a tool with a name, a description and a Zod input schema, then connect it over a StdioServerTransport. The example below exposes one greet tool. Save it, point your MCP client at the command that runs it (for Claude Code, that is claude mcp add --transport stdio), and the agent can call greet. A Python SDK (FastMCP) offers the same shape if you prefer Python.

```typescript
// server.ts - a minimal MCP server exposing one tool over stdio
import { McpServer } from '@modelcontextprotocol/server'
import { StdioServerTransport } from '@modelcontextprotocol/server/stdio'
import * as z from 'zod'

const server = new McpServer({ name: 'greeting-server', version: '1.0.0' })

// Register a tool: a name, a description, and a Zod input schema.
server.registerTool(
  'greet',
  {
    description: 'Greet someone by name',
    inputSchema: z.object({ name: z.string() }),
  },
  async ({ name }) => ({
    content: [{ type: 'text', text: `Hello, ${name}!` }],
  }),
)

// Connect over stdio: the host spawns this file and talks via stdin/stdout.
const transport = new StdioServerTransport()
await server.connect(transport)
```

A minimal MCP server in TypeScript with one tool over stdio, using the official @modelcontextprotocol/server SDK. Keep server.ts free of console.log on stdout: a stdio server uses stdout for the JSON-RPC protocol, so log to stderr instead. From here you add more tools, resources and prompts the same way, and switch to Streamable HTTP when you want to host the server remotely. #### MCP server vs a plain API A fair question is why an agent needs an MCP server when the underlying service already has a REST API. The difference is discovery and self-description. A REST API needs a human to read its docs and write integration code for each client; an MCP server advertises its tools, their schemas and their descriptions in a standard the agent reads at connect time, so the agent learns how to use it automatically and the same server works across every MCP client. MCP does not replace your API; it is a thin, agent-friendly layer in front of it. For internal one-offs, a CLI the agent can call may be simpler, but for anything you want many agents and apps to use, an MCP server is the interoperable choice. - A REST API needs per-client integration code; an MCP server is self-describing and reused everywhere. - The agent reads tool schemas and descriptions at connect time, so it knows how to call them. - MCP sits in front of your API as an agent-friendly layer, it does not replace it. - For a private one-off, a CLI may be simpler; for shared use, a server wins. #### Common servers worth knowing You rarely have to build a server from scratch, because the ecosystem already covers the common cases.
Good first connections are the filesystem server (scoped to a folder so the agent can read and write project files), the Playwright server (so the agent can drive a real browser to test or scrape), and a database server for your stack. Beyond those, there are servers for issue trackers, design tools, monitoring and most major SaaS products. Connect deliberately, though: every server adds its tool definitions to the agent context, so each one has an ongoing cost, and a server is third-party code with access to your systems, so vet it like any dependency. Our Claude Code MCP setup guide walks through adding, scoping and verifying servers safely. - Filesystem server: read and write files in a scoped folder. - Playwright server: let the agent drive a real browser. - Database server: query your data directly instead of pasting rows into chat. - Connect few and vet each: every server costs context and is third-party code. ### How to Build an AI Agent from Scratch - Canonical URL: https://agenticschool.dev/guides/how-to-build-an-ai-agent An AI agent is a language model wrapped in a loop that can call tools, read the results and decide what to do next, repeating until a goal is reached. Building one from scratch is far simpler than the hype suggests: at its core it is a while loop around a model that supports tool calling, where you hand the model a goal and a set of functions, it asks to run one, you run it, you feed the result back, and it goes again until it is done. This guide takes you from the definition to a concrete, minimal agent you can run today, then up through the levels of autonomy and what changes when you put an agent in production. We will build the loop by hand first so you understand exactly what is happening, then point you at the SDKs that do this for you. Everything here is current as of June 2026. #### What an AI agent actually is An AI agent is software that uses a language model to decide and act in a loop, rather than just answering once. 
The model is the brain, but a brain with no hands cannot do anything, so you give it tools: functions it can call to read a file, query a database, search the web or hit an API. The agent runs a loop: the model receives the goal and the list of available tools, it either answers or asks to call a tool, your code runs that tool and returns the result, and the model uses that result to decide its next move. That loop is the whole idea. "Agentic AI" is the broader term for systems built this way; an "AI agent" is one such system. For the precise definitions, see the glossary entries on AI agent, agentic AI, tool calling and the agent harness. - Model: the reasoning core that decides what to do (an LLM that supports tool calling). - Tools: functions the model can call to act on the world, each with a name, a description and an input schema. - Loop: model decides, your code runs the chosen tool, the result goes back, repeat until done. - See the glossary: AI agent, agentic AI, tool calling, agent harness for the formal definitions. #### The build loop, step by step Every agent, from a ten-line script to Claude Code, runs the same loop. You send the model the conversation so far plus the tool definitions. The model replies in one of two ways: with a final answer (it is done), or with a request to call one or more tools. If it asks for a tool, your code executes that tool, captures the output, appends it to the conversation as a tool result, and sends everything back. The model reads the result and decides again. You keep looping until the model returns a final answer or you hit a safety limit on iterations. The two non-negotiable guardrails are a maximum number of turns, so a confused agent cannot loop forever, and validation of tool inputs, because the model is asking you to run real code with arguments it chose. - Send the goal, conversation history and tool definitions to the model. - If the model returns a final answer, stop and return it. 
- If it requests a tool, validate the input, run the tool, append the result, and loop. - Always cap the number of iterations and validate tool arguments before executing. #### A minimal agent you can build Here is the smallest agent that does something real: a model with one tool (a calculator) running the tool-calling loop by hand against the Anthropic Messages API. The pattern is identical for any provider that supports tool calling. The model gets the question and the tool definition; when it replies with stop_reason "tool_use", we run the tool, send back a tool_result, and loop until it gives a plain text answer. Read it once and the magic disappears: an agent is a loop, a model and a dictionary of functions.

```python
# pip install anthropic
# A minimal agent: one tool, the tool-calling loop by hand.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the env

# 1) Define the tools: a name, a description, and an input schema.
tools = [
    {
        "name": "calculator",
        "description": "Evaluate a basic arithmetic expression.",
        "input_schema": {
            "type": "object",
            "properties": {"expression": {"type": "string"}},
            "required": ["expression"],
        },
    }
]

# 2) Map tool names to the real functions that run them.
def calculator(expression: str) -> str:
    # Real code: validate hard. A toy eval is fine only for a demo.
    allowed = set("0123456789+-*/(). ")
    if not set(expression) <= allowed:
        return "error: invalid characters"
    try:
        return str(eval(expression))  # demo only; never eval untrusted input in prod
    except Exception as exc:
        # Report failures (division by zero, malformed input) to the model
        # as a tool result instead of crashing the loop.
        return f"error: {exc}"

TOOLS = {"calculator": calculator}

# 3) The loop.
def run_agent(goal: str, max_turns: int = 8) -> str:
    messages = [{"role": "user", "content": goal}]
    for _ in range(max_turns):
        resp = client.messages.create(
            model="claude-sonnet-4-5",
            max_tokens=1024,
            tools=tools,
            messages=messages,
        )
        # No tool request means the model has given its final answer.
        if resp.stop_reason != "tool_use":
            return "".join(b.text for b in resp.content if b.type == "text")
        messages.append({"role": "assistant", "content": resp.content})
        results = []
        for block in resp.content:
            if block.type == "tool_use":
                out = TOOLS[block.name](**block.input)
                results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": out,
                })
        messages.append({"role": "user", "content": results})
    return "stopped: hit the turn limit"

print(run_agent("What is 4321 * 1234, then add 99?"))
```

A complete minimal agent in Python: one tool, the model-plus-tool-calling loop by hand. The same shape works with any tool-calling model. That is genuinely all an agent is. To make it useful you add more tools (read a file, call your API, query a database), give each a precise description so the model knows when to use it, and harden the execution path. The eval in the calculator is for the demo only; never run model-chosen code or expressions without strict validation or a sandbox. #### Use a framework once you understand the loop Building the loop by hand once is the best way to understand agents, but in production you reach for a framework that handles the loop, retries, streaming, sessions and permissions for you. In 2026 the two most direct paths are the Claude Agent SDK, which exposes the same agent loop, tool set and context management that power Claude Code (install @anthropic-ai/claude-agent-sdk for TypeScript or claude-agent-sdk for Python), and the OpenAI Agents SDK, a lightweight Python and TypeScript framework that turns any function into a tool with automatic schema generation (pip install openai-agents).
Both give you tool calling, multi-step loops, human-in-the-loop checkpoints, subagents and first-class MCP support out of the box. The principle is the same one you just built; the SDK just removes the plumbing. - Claude Agent SDK: the same loop and tools that run Claude Code, programmable in Python and TypeScript, with built-in MCP and subagents. - OpenAI Agents SDK: a lightweight multi-agent framework that turns any function into a validated tool (pip install openai-agents). - Both handle the loop, retries, streaming, sessions and permissions you would otherwise write by hand. - Connect external tools through MCP rather than bespoke glue; see What Is an MCP Server. #### The levels of autonomy Not every agent should be fully autonomous, and choosing the right level is a design decision, not a default. Think of a ladder. At the bottom the model only suggests and a human does everything. One rung up it drafts and a human approves each action. Higher, it acts autonomously on low-risk steps but pauses for approval on anything sensitive (a human-in-the-loop checkpoint). At the top it runs an entire workflow unattended. The right level depends on the cost of a mistake: the more an error hurts, the more human oversight you keep. Most reliable production agents sit in the middle, fully autonomous on safe, reversible actions and gated on the rest. The Automation and Agentic Systems course covers this as the five levels of LLM autonomy. - Suggest only: the agent proposes, a human does everything. Lowest risk, lowest leverage. - Draft and approve: the agent prepares the action, a human confirms before it runs. - Autonomous with checkpoints: it acts on safe steps and pauses for approval on risky ones. - Fully unattended: it runs the whole workflow alone; reserve this for low-stakes, reversible tasks. #### Productionizing your agent A demo agent and a production agent differ in everything around the loop. The model and the tools are the easy part; reliability is the work. 
Validate every tool input, because the model is choosing the arguments. Run anything that executes code or touches the outside world in a sandbox with timeouts and resource limits, never on a machine you care about. Log every step (the goal, each tool call, each result) so you can see what the agent did and debug it when it goes sideways. Cap iterations and cost so a confused agent cannot loop forever or run up a bill. And keep a human in the loop for irreversible or sensitive actions. These are the same lessons the founder builds learned the hard way: CallAssistant gave its voice agent tightly defined tools because there is no "are you sure?" on a phone call, and CodeCourier ran untrusted code only inside a disposable sandbox. - Validate tool inputs and run code-executing tools in a sandbox with timeouts and limits. - Log the goal, every tool call and every result so the agent is observable and debuggable. - Cap iterations and spend so a runaway loop cannot cost you time or money. - Gate irreversible or sensitive actions behind a human-in-the-loop approval step. - Learn from real builds: CallAssistant (tight tools) and CodeCourier (sandboxing) on the Builds page. ### How to Build a SaaS with AI (Modern Stack) - Canonical URL: https://agenticschool.dev/guides/build-a-saas-with-ai You can build and ship a real SaaS in 2026 by directing an AI coding agent across a small, proven stack, and this pillar guide is the map for doing it end to end: from idea to scaffold, to authentication, to a database, to payments, to going live. The work is no longer typing every line; it is choosing the right pieces, wiring them together cleanly, and verifying what the agent builds. The stack we teach is the one a solo founder can actually ship and maintain: a modern framework for the app, Clerk for auth, Convex for the database, Stripe for payments, and an agent like Claude Code driving the build. 
This guide gives you the full sequence and links the deep dives, the comparisons behind each choice, and the course that walks each stage in detail. Everything here is current as of June 2026. #### The modern SaaS stack in one picture A SaaS is the same three layers as any app (a frontend the user sees, a backend that holds logic and secrets, and a database that stores data) plus the few services that turn an app into a business: authentication so users have accounts, a database that syncs in real time, and payments so you can charge. The 2026 default for a solo founder or small team is a framework for the app, Clerk for auth, Convex for the reactive database, and Stripe for payments, all glued together by a coding agent you direct. You do not have to use these exact tools, but you do need one good choice in each slot. Our cornerstone article, Modern App Stack Explained, lays out the full picture; this guide is the build path through it. - App framework: renders the UI and serves the app (Next.js, TanStack Start, Astro and others). - Auth: user accounts, login and OAuth, handled by a provider so you never store passwords yourself. - Database: where your data lives and syncs; Convex is reactive so the UI updates live. - Payments: Stripe for checkout, subscriptions and webhooks. See the article Modern App Stack Explained for the whole map. #### Step 1: idea to scaffold Start by narrowing the idea to one job your product does well, then let your agent scaffold the project. The discipline that matters here is scope: a SaaS that does one thing clearly beats one that does ten things vaguely, exactly as the favicon-maker build learned. Have your coding agent set up the framework, version control and a dev server, and get a blank app running locally before you add anything. Write a CLAUDE.md with your stack and conventions so the agent builds on your rules from the first commit. 
The goal of this stage is a running skeleton you can deploy, not a feature, because shipping the empty shell early de-risks everything that follows. - Narrow to one core job; resist feature creep before you have shipped anything. - Let the agent scaffold the framework, Git and a dev server, and run it locally. - Add a CLAUDE.md so the agent follows your stack and conventions from the start. - Course 3 lesson: Architecture 101 explains how the pieces fit before you wire them. #### Step 2: add authentication with Clerk Almost every SaaS needs accounts, and you should never build login from scratch: rolling your own auth is how secrets leak and sessions break. An auth provider handles signup, login, OAuth (sign in with Google and others), sessions and security for you, so you wire in a few components and protect your routes. Clerk is the default we teach because it is fast to integrate and pairs cleanly with this stack, but Auth0 and Supabase Auth are valid choices depending on your needs. We break down that decision in the Clerk vs Auth0 vs Supabase Auth comparison; whichever you pick, the principle is the same: delegate the security-critical part to specialists. The Modern App Stack course walks the full Clerk integration including Google OAuth and the dev-to-production move. - Never roll your own auth; a provider handles login, OAuth, sessions and security. - Clerk is the default here; weigh it against Auth0 and Supabase Auth in our comparison. - Wire in the auth components, then protect the routes and data that need a signed-in user. - Course 3 lesson: Clerk authentication and OAuth from dev to production. #### Step 3: model your data in Convex Your SaaS needs somewhere to store data that survives a refresh and is shared across users, and the modern choice for this stack is Convex: a reactive database where your queries update the UI automatically when the underlying data changes, so you write far less synchronisation code. 
You define your schema, write functions that read and write data, and the frontend subscribes to live results. Convex is not the only option (Supabase and Firebase are strong alternatives with different trade-offs), and we lay that decision out in the Convex vs Supabase vs Firebase comparison. The reason reactive data matters for a solo founder is leverage: the database doing the live updates for you is one less system you have to build and debug. The Modern App Stack course covers modelling data and writing Convex functions in full. - A database stores data permanently and shares it across users and sessions. - Convex is reactive: queries update the UI live, so you write less sync code. - Compare it against Supabase and Firebase in our backend comparison before you commit. - Course 3 lesson: Convex, your reactive database. Design the API and schema before the UI, as the BizCollect build learned. #### Step 4: charge customers with Stripe A SaaS is a business, so it needs payments, and Stripe is the standard for checkout, subscriptions and the webhooks that keep your app in sync with what a customer actually paid. The flow has two halves. First, checkout and subscriptions: you create products and prices, send the customer to Stripe to pay, and bring them back. Second, the webhooks: Stripe tells your backend when a payment succeeds, a subscription renews or a card fails, and your code updates the user record accordingly. The trap beginners hit is treating the redirect as the source of truth; the webhook is. Always handle secrets safely and keep your Stripe keys out of frontend code and out of Git. The Modern App Stack course covers Stripe in two lessons: checkout and subscriptions, then webhooks, proration and coupons. - Stripe handles checkout, subscriptions and the billing edge cases you should not build yourself. - Webhooks, not the redirect, are the source of truth for what a customer paid. - Keep Stripe secret keys in the backend and out of version control. 
- Course 3 lessons: Stripe Part 1 (checkout, subscriptions) and Stripe Part 2 (webhooks, proration, coupons). #### Step 5: ship it The last stage is the one beginners put off and should do early: getting the app from your machine to the public internet, in production, with real keys. That means moving each service (auth, database, payments) from its development mode to production, handling environment variables and secrets correctly, setting up your domain, and submitting the site to Search Console so it can be found. Shipping is not a one-time event at the end; it is a loop you should run from the empty scaffold onward, so going live is boring rather than terrifying. The Modern App Stack course closes with exactly this: the dev-to-production migration, Search Console and performance. Once it is live, you direct the agent to add features the same way you built the skeleton, verifying each change before it ships. - Move auth, database and payments from development to production mode with real keys. - Handle secrets and environment variables correctly; never commit keys. - Set up your domain and submit the site to Search Console so it gets found. - Course 3 lesson: Going live, the dev-to-prod migration, Search Console and performance. #### Direct the agent, do not just vibe it Building a SaaS with AI is the clearest case for agentic engineering over vibe coding. A weekend prototype can be vibed, but a product with real users, their data and their money cannot: a silent bug in your Stripe webhook or an unprotected Convex query is the kind of mistake that costs trust or cash. So you direct the agent and verify its work: plan before it edits, read the diffs on anything touching auth, data or payments, and keep a quality gate (lint, types, tests) green. The same agents that produce a fragile pile when you accept everything unchecked produce a maintainable SaaS when you stay the engineer of record. 
Real products are built this way; the CallAssistant build is a working SaaS shipped on this exact discipline, and the What Is Agentic Engineering pillar explains the mindset in full. - Vibe a prototype; direct and verify anything with real users, data or payments. - Plan before editing and read every diff that touches auth, data or money. - Keep a quality gate green so a regression cannot slip into production. - See the CallAssistant build for a shipped SaaS and the What Is Agentic Engineering pillar for the discipline. ### AI Automation for Business: A Practical Playbook - Canonical URL: https://agenticschool.dev/guides/ai-automation-for-business AI automation for business means using AI and workflow tools to take repetitive, rules-based work off your team so they spend their time on what actually needs a human. Done well it is one of the highest-return things a small business can do in 2026, and done badly it is an expensive science project. This playbook is the practical version: how to pick the first task to automate, when to buy a tool versus build your own, how to choose between n8n, Zapier and Make, what real automations look like, and how to measure the return honestly so you keep doing the ones that pay and kill the ones that do not. It is written for the person deciding where automation goes, not just the person building it, and it pairs with the Automation and Agentic Systems course. Everything here is current as of June 2026. #### Where to start: pick the right first task The biggest mistake is starting with the most exciting task instead of the most automatable one. The best first candidate is boring: a task that happens often, follows the same steps every time, has a clear trigger and a clear output, and currently eats real hours. Automating something rare or judgement-heavy is hard and low-return; automating a frequent, rules-based chore is easy and pays back fast. 
Look for the work people quietly skip or do inconsistently because it is tedious, the same instinct behind the s2p build, which automated posting every release to every channel because the founder kept skipping channels out of laziness. Workflow automation is the umbrella term for connecting services so a trigger fires a chain of steps; start where that shape fits cleanly. - High frequency: it happens daily or many times a week, so saved minutes compound. - Rules-based: the same inputs produce the same steps, with little judgement needed. - Clear trigger and output: something starts it (a form, an email, a timer) and something concrete comes out. - Currently painful: people skip it, do it late, or do it inconsistently. See the s2p build for a real example. #### Build vs buy: the honest decision Most automations should be bought or assembled, not built. If an off-the-shelf tool or a workflow platform already does the job, use it: your time is better spent on the parts of your business no vendor sells. Build your own only when the workflow is core to how you differentiate, when no tool fits without ugly workarounds, or when the per-use cost of a hosted tool grows faster than the cost of owning it. A good sequence is to prototype with the fastest option to prove the workflow is worth automating at all, then move the proven, high-volume ones to something you control. The Automation and Agentic Systems course frames the same rule: before you build a custom app, ask whether a workflow tool can do it in an afternoon. - Buy or assemble by default; build only when the workflow is core or no tool fits. - Prototype fast to validate the workflow is worth automating before investing in it. - Move proven, high-volume workflows to something you control as cost or complexity grows. - Course 4 lesson: Workflow Automation explains when a custom build beats a platform. 
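The third trigger for building (per-use cost outgrowing the cost of owning) comes down to a break-even calculation. The sketch below is a minimal illustration, not a pricing model: the `Option` class, the option names and every price in it are hypothetical placeholders, so substitute your own numbers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Option:
    """One way to run a workflow. All prices are made-up placeholders, in cents."""
    name: str
    fixed_monthly_cents: int   # flat monthly cost (hosting, maintenance time)
    cents_per_run: int         # marginal cost of one workflow run


def monthly_cost(option: Option, runs_per_month: int) -> int:
    """Total monthly cost in cents at a given volume."""
    return option.fixed_monthly_cents + option.cents_per_run * runs_per_month


def break_even_runs(hosted: Option, owned: Option) -> Optional[float]:
    """Runs per month above which `owned` is cheaper, or None if it never is."""
    per_run_saving = hosted.cents_per_run - owned.cents_per_run
    if per_run_saving <= 0:
        return None  # owning never wins on marginal cost
    extra_fixed = owned.fixed_monthly_cents - hosted.fixed_monthly_cents
    return extra_fixed / per_run_saving


# Hypothetical figures: a hosted tool that is cheap to start but priced per run,
# and a self-built option with a higher fixed cost and a lower marginal cost.
hosted = Option("hosted tool", fixed_monthly_cents=2000, cents_per_run=5)
owned = Option("self-built", fixed_monthly_cents=12000, cents_per_run=1)

print(break_even_runs(hosted, owned))  # 2500.0 runs per month

for runs in (500, 10000):
    cheaper = min((hosted, owned), key=lambda o: monthly_cost(o, runs))
    print(runs, cheaper.name)
```

With these placeholder numbers the hosted tool is cheaper at 500 runs a month and the self-built option is cheaper at 10,000. Below the break-even volume, buying stays the default.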
#### Choosing a platform: n8n vs Zapier vs Make The three platforms that matter for business automation differ most in how they bill and how much control you get, and that difference drives the cost at volume. Zapier bills per task, where every step that runs counts, so it is the easiest to start with and the most expensive as volume and step-count grow. Make bills per operation (each action), sitting in the middle. n8n bills per workflow execution, counting one whole run regardless of how many steps it has, and it is open-source and self-hostable, so a busy multi-step workflow can be dramatically cheaper to run on n8n than on a per-task plan. The practical rule: prototype in Zapier for its huge integration catalogue, then graduate high-volume or many-step workflows to self-hosted n8n. Our n8n vs Zapier vs Make comparison breaks down the trade-offs in full. - Zapier: easiest to start, the widest integration catalogue, billed per task (each step counts). Great at low volume, costly at high. - Make: visual and capable, billed per operation (each action). A middle ground on cost. - n8n: open-source and self-hostable, billed per workflow execution (a whole run is one), so many-step workflows stay cheap. Best for volume and control. - Rule of thumb: prototype in Zapier, graduate high-volume glue to self-hosted n8n. See the full comparison. #### Real workflow examples Automation gets concrete fast once you see real shapes. A release-to-social pipeline turns one published change into formatted posts across every channel automatically, keeping them consistent and never skipped, which is exactly the s2p build. Email automation drafts, sends and follows up while keeping a human approval gate for anything outside the known-safe cases, so it saves time without sounding like a bot, as the AutoMail build does. 
Document-to-data automation reads a messy invoice photo and writes a clean database row, flagging the low-confidence fields for a human, which is the invoice-automation build. The common thread is not the AI; it is reliable plumbing around a clear trigger and a clear output, with a human watching the edges. - Release to social: one source of truth becomes consistent posts everywhere (the s2p build). - Email with a human gate: draft and follow up automatically, approve the uncertain cases (the AutoMail build). - Document to data: read an invoice photo into a validated database row, flag low-confidence fields (the invoice-automation build). - The AI step is the easy part; reliable triggers, retries and idempotency are where automation succeeds or fails. #### Keep a human in the loop The fastest way to lose trust in automation is to hand it full autonomy on day one and watch it confidently do the wrong thing. Automation earns trust gradually: start with the system drafting and a human approving, promote a category to full auto only once it has proven itself, and keep a human gate on anything irreversible or customer-facing. This is not a failure of automation; it is what makes automation safe to rely on, and it is the pattern behind every dependable system described in the founder builds. Match the autonomy to the cost of a mistake. A misposted tweet is recoverable; a wrong invoice that touches someone's money is not. The Automation and Agentic Systems course covers human-in-the-loop design and the five levels of autonomy as a deliberate choice. - Start with draft-and-approve; widen autonomy only as each category earns it. - Keep a human gate on anything irreversible, financial or customer-facing. - Match autonomy to the cost of a mistake, not to how impressive full automation feels. - Course 4 lessons: Human in the Loop and the 5 Levels of LLM Autonomy.
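The draft-and-approve pattern above can be sketched as a small gate in code. This is an illustrative outline under stated assumptions, not a framework API: the `Autonomy`, `Action` and `Gate` names, the category strings and the `approve` callback are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict


class Autonomy(Enum):
    DRAFT_AND_APPROVE = "draft_and_approve"  # a human confirms every action
    FULL_AUTO = "full_auto"                  # runs without asking


@dataclass
class Action:
    category: str              # hypothetical labels such as "social_post"
    description: str
    irreversible: bool = False


@dataclass
class Gate:
    # Every category starts at draft-and-approve and is promoted one at a time.
    levels: Dict[str, Autonomy] = field(default_factory=dict)

    def promote(self, category: str) -> None:
        """Widen autonomy for one category once it has proven itself."""
        self.levels[category] = Autonomy.FULL_AUTO

    def needs_human(self, action: Action) -> bool:
        # Irreversible actions always stop for a human, whatever the level.
        if action.irreversible:
            return True
        level = self.levels.get(action.category, Autonomy.DRAFT_AND_APPROVE)
        return level is Autonomy.DRAFT_AND_APPROVE


def handle(gate: Gate, action: Action, approve: Callable[[Action], bool]) -> str:
    """Run the action, asking approve(action) first when the gate requires it."""
    if gate.needs_human(action) and not approve(action):
        return f"skipped: {action.description}"
    return f"executed: {action.description}"


gate = Gate()
gate.promote("social_post")  # this category has earned full auto

post = Action("social_post", "post release note")
invoice = Action("invoice", "send invoice", irreversible=True)

def reject(action: Action) -> bool:
    return False  # stands in for a human who says no

print(handle(gate, post, reject))     # executed: post release note
print(handle(gate, invoice, reject))  # skipped: send invoice
```

The default matters: an unknown category falls back to draft-and-approve, so a new kind of action cannot run unattended by accident.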
#### Measuring the ROI honestly Automation only deserves to survive if it pays, and that means measuring it before and after rather than assuming. The return is simple to reason about: estimate the hours the task takes now and the loaded cost of those hours, subtract the time the automation still needs (oversight, fixes, the occasional manual exception) and the cost of the tools, and compare that to the setup effort to get a payback period. The honest version also counts the qualitative wins automation delivers that a stopwatch misses: consistency (the channel that never gets skipped), fewer errors, and faster turnaround. Track a couple of real numbers per automation, kill the ones that do not pay back, and double down on the ones that do. Resist counting fantasy savings; an automation that needs constant babysitting is not saving the hours you think it is. - Time saved = hours the task took, minus the time the automation still needs (oversight, exceptions). - Money saved = time saved times the loaded hourly cost, minus tool costs. - Payback period = setup effort divided by monthly savings; below a few months is usually a clear win. - Count consistency and fewer errors too, but never count savings an automation does not actually deliver. ### Prompt Patterns for Coding Agents - Canonical URL: https://agenticschool.dev/guides/prompt-patterns-for-coding-agents A coding agent is only as good as the brief you give it, and the difference between a frustrating session and a reliable one is rarely the model; it is the prompt. Prompting a coding agent is not the same as chatting with a chatbot: the agent reads your files, runs commands and acts in a loop, so a good prompt sets the role, defines what "done" looks like, shows the shape of a good answer, breaks big work into checkable steps, and tells the agent how to verify its own output. 
This guide collects the prompt patterns that consistently work with coding agents like Claude Code and Codex CLI, with concrete examples you can adapt, and the anti-patterns that quietly waste your time and tokens. Everything here is current as of June 2026 and pairs with the Foundations prompt-engineering lesson. #### Why prompting a coding agent is different A chatbot answers once; a coding agent works in a loop, reading your repository, editing files, running your tests and reacting to the results. That changes what a good prompt has to do. You are not asking for a snippet, you are briefing a teammate who will take real actions in your codebase, so the prompt has to carry intent (what you want and why), constraints (your stack, conventions and what not to touch), and a definition of done the agent can check itself against. The single biggest lever is moving the standing rules out of the prompt entirely: put your stack, conventions and quality gate in a CLAUDE.md or AGENTS.md so every prompt starts from your rules, and the prompt itself only has to carry the task. The patterns below assume that foundation and focus on the per-task brief. - The agent acts: it edits files and runs commands, so vague intent becomes wrong changes, not just a wrong sentence. - Standing rules belong in CLAUDE.md or AGENTS.md, not re-typed every prompt; see How to Use Claude Code. - A good per-task prompt carries intent, constraints and a checkable definition of done. - Context is finite: a focused prompt leaves room for the agent to read code and think. See the glossary on the context window. #### Pattern 1: role and spec The most reliable opening is to state the role the agent should adopt and then specify the task as a tight, testable spec rather than a wish. A spec names the goal, the constraints, the files or area in scope, and the acceptance criteria that decide when it is done. "Make the login better" is a wish; the version below is a spec. 
The role line focuses the model (a senior engineer reviews differently than a tutor), and the acceptance criteria give the agent something concrete to verify against, which is what stops it from declaring victory early. Keep the spec to what matters: over-specifying every line removes the agent's ability to make good local decisions.

```markdown
Role: act as a senior backend engineer on this codebase.

Goal: add rate limiting to the public /api/contact endpoint.

Constraints:
- Use the existing Convex rate-limit helper; do not add a new dependency.
- Limit to 5 requests per minute per IP. Return 429 with a clear JSON body.
- Touch only the contact route and its test; leave other endpoints alone.

Done when:
- A new test proves the 6th request in a minute gets a 429.
- `bun run lint && bun run typecheck && bun run test` all pass.

Plan first (do not edit yet), show me the plan, and wait for my go-ahead.
```

The role-and-spec pattern: a role, a goal, explicit constraints, and acceptance criteria the agent can verify against.

#### Pattern 2: show an example (few-shot)

Models match patterns, so the fastest way to get output in the shape you want is to show one. When you need a new file, component, test or migration that should look like the ones you already have, point the agent at an existing example and ask it to follow the same structure: "Write the new endpoint the same way as src/api/users.ts, including its test file." This few-shot pattern works because your codebase is the best style guide you have, and it beats describing your conventions in prose. For output you cannot point at, give a tiny inline example of the format you expect (a sample log line, a JSON shape, a commit-message style) so the agent has a target to match.

- Point at an existing file: "Follow the structure of src/api/users.ts and its test."
- Your codebase is your style guide; an example beats a paragraph of conventions.
- For new formats, inline a tiny sample of the exact output you want (a log line, a JSON shape). - One sharp example usually outperforms several vague instructions. #### Pattern 3: decompose big work into steps Large, vague requests are where agents drift: asked to "build the dashboard" they make a hundred decisions you would have made differently and bury the mistakes in a giant diff. The fix is decomposition. Either break the work into a sequence of small, separately reviewable tasks yourself, or ask the agent to propose a plan first and approve it before any code is written. Plan-first is the highest-leverage habit with coding agents: a wrong direction caught in a plan costs nothing, while the same mistake caught after a thousand-line diff costs real time. In Claude Code this is plan mode (Shift+Tab); in any agent you can simply instruct "plan first, do not edit, show me the steps." Then implement one step, review, and move to the next. - Split a big task into small, separately reviewable steps rather than one mega-prompt. - Ask for a plan before edits; correcting a plan is free, correcting a huge diff is not. - Implement one step, review the diff, then continue; keep each change small enough to read. - Smaller steps also keep the context window clean, so the agent stays sharp across the task. #### Pattern 4: build a verification loop The defining feature of a coding agent is that it can run things, so the best prompts tell it how to check its own work and to keep going until the check passes. Instead of "fix the failing test," say "run the test suite, fix what is red, and run it again; do not stop until it is green." Instead of trusting that a change works, tell the agent to run the type checker, the linter and the tests, and to read the actual output rather than assume. This turns the agent from a one-shot generator into a closed loop that grounds itself in real results. 
The strongest version bakes the loop into your project so it is not even a prompt: hooks that run your gate automatically (see the Claude Code Hooks guide) mean the agent is held to the standard whether or not you remember to ask. - Ask the agent to run the gate (types, lint, tests) and to read the real output, not assume. - Make it iterate: "keep fixing and re-running until the suite is green," not a single pass. - For UI or behaviour, have it run the dev server or a Playwright check so it sees the real result. - Automate the loop with hooks so verification happens every time, not only when you remember. #### Pattern 5: give the agent the right context, not all of it Agents fail more often from missing context than from a weak model: asked to use an API without its docs, they guess. So bring the agent what it needs, deliberately. Point it at the relevant files, paste the error in full, link the doc, name the exact function. But resist the opposite trap of dumping everything: a context window stuffed with twenty files and a long history degrades quality, because the model loses the important detail in the noise (the "lost in the middle" effect). The skill is curation: enough context to succeed, little enough to stay sharp. This is the heart of context engineering, and our Context Engineering guide goes deep on managing the window across a long task. - Name the exact files, paste the full error, and link the doc the agent needs. - Do not dump the whole repo; relevant context beats maximum context every time. - Watch for "lost in the middle": key detail buried in a huge prompt gets ignored. - See the Context Engineering guide and the glossary on the context window and the system prompt. #### Anti-patterns to avoid Most prompting pain comes from a short list of repeatable mistakes. Naming them makes them easy to catch in your own habits. The cure for every one is a pattern above: be specific, show an example, decompose, verify, and curate context. 
- Vague wishes: "make it better" gives the agent nothing to verify against. State a spec and acceptance criteria. - The mega-prompt: one huge request that buries a hundred decisions in an unreviewable diff. Decompose and plan first. - Trust without verification: accepting output you did not ask the agent to test. Build a verification loop. - Context dumping: pasting everything so the signal drowns. Curate to what the task needs. - Re-typing rules every turn: stack and conventions belong in CLAUDE.md or AGENTS.md, not each prompt. - Politeness as instruction: "please try to maybe" reads as optional. Be direct; constraints are rules, not suggestions. ### Context Engineering: Managing the Context Window - Canonical URL: https://agenticschool.dev/guides/context-engineering Context engineering is the practice of deliberately managing what an AI agent is holding in its context window right now, so it stays accurate and fast across a long task instead of slowly drifting into confusion. The context window is the model's working memory: the system prompt, your rules, the files it has read, the tools available, the conversation so far. It is finite, and the single most important fact about it is that quality degrades as it fills, not gracefully but with a cliff. Context engineering is how you keep the right things in the window and the wrong things out: through compaction, retrieval, ordering, and prompt caching to control cost. This guide explains what fills the window, why a full window hurts, and the techniques that keep agentic work reliable. Everything here is current as of June 2026 and pairs with the Course 2 context-engineering lesson. #### What the context window actually holds The context window is everything the model can see when it generates its next response, measured in tokens (chunks of text, roughly four characters each). 
For a coding agent it fills with more than your latest message: the system prompt that defines the agent, your CLAUDE.md or AGENTS.md rules, the definitions of every connected tool and MCP server, every file the agent has read, every command output it has seen, and the entire conversation up to now. All of that competes for the same finite budget. The mental model that matters: context is a scarce resource you spend, and everything you load (a chatty MCP server, a giant file, a long back-and-forth) is budget the actual task no longer has. See the glossary on the context window for the formal definition. - The system prompt and your CLAUDE.md / AGENTS.md rules, reloaded every turn. - Tool and MCP server definitions, which is why connecting many servers is costly. - Every file read and every command output, which accumulates fast during a task. - The full conversation history; long sessions carry their whole past forward. #### Why a full window hurts: the performance cliff It is tempting to think a bigger context window means you can stop worrying, but the opposite is true: model quality degrades well before the window is technically full, and it degrades sharply. As the window fills with files, history and noise, the model has more to attend to and is likelier to lose the thread, contradict an earlier instruction, or forget a constraint from the top of the conversation. This is the "performance cliff," and it is why a 1M-token window does not mean you should pour 1M tokens into it. The practical takeaway is counterintuitive but reliable: a smaller, well-curated context usually outperforms a larger, stuffed one. Context engineering exists precisely to keep you on the good side of that cliff. - Quality drops before the window is full, and the drop is a cliff, not a gentle slope. - A stuffed window makes the model lose threads, contradict itself and drop constraints. - A large maximum context is a ceiling, not a target; do not fill it because you can. 
- A curated small context beats a bloated large one, the central rule of context engineering. #### Lost in the middle "Lost in the middle" is a well-documented behaviour of language models: they attend most reliably to information at the very start and the very end of their context, and least reliably to information buried in the middle. A crucial instruction or the one relevant fact, dropped into the centre of a long prompt or a long conversation, is the most likely thing to be ignored. The practical consequence shapes how you arrange context. Put the most important instructions and the most relevant material where the model looks: near the top (your standing rules) and near the bottom (the immediate task and the key file). Do not assume that because something is somewhere in the window, the model is using it. Position is leverage. - Models attend best to the start and end of context, worst to the middle. - A key instruction buried mid-prompt is the most likely to be ignored. - Put standing rules near the top and the immediate task and key file near the bottom. - Being in the window is not the same as being used; position determines attention. #### Compaction, handovers and resets When a session runs long, you need ways to shed weight without losing the thread. Three techniques do most of the work. Compaction summarises the conversation so far into a compact form and continues, freeing the window; the catch is that automatic compaction quietly drops details you cared about, so steer it by telling the agent what to preserve before it compacts. A handover ends one session and starts a fresh one with a clean, deliberate summary you write, which gives you a much tidier context than letting one session sprawl for hours. A reset throws away a context that has gone confused and starts over with a tight prompt, which is often faster than trying to argue a derailed agent back on track. Knowing when to reach for each is the practical core of the skill. 
- Compaction: summarise and continue to free the window; steer it so it keeps what matters. - Handover: end the session and start fresh with a clean summary you control. - Reset: discard a confused context and restart with a tight prompt rather than arguing. - Subagents also help: delegate noisy work so its output never lands in your main window. #### Retrieval: bring in only what is needed The opposite failure mode to a stuffed window is the right information never arriving at all. Retrieval is how you pull in just the relevant piece on demand instead of pre-loading everything. For a coding agent this is mostly concrete and unglamorous: let the agent search the codebase and read only the files a task touches, rather than pasting the whole repo; point it at the one doc page it needs; have it grep for the function instead of loading the directory. The principle behind retrieval-augmented patterns is the same whether it is a vector database or an agent running grep: fetch the specific thing the task needs, when it needs it, so the window holds signal rather than a hopeful pile of maybe-relevant material. - Pull in the specific file, doc or record the task needs, not everything that might be relevant. - Let a coding agent search and read on demand instead of pre-loading the whole repo. - Retrieval keeps the window full of signal, which keeps the model on the good side of the cliff. - The same idea scales up to vector search; the goal is always relevant-on-demand, not everything-just-in-case. #### Prompt caching: control the cost of a big context A large, stable context is expensive because the model re-processes every token of it on every request, and you pay for those input tokens each time. Prompt caching fixes the cost side: you mark a stable prefix (your system prompt, rules, tool definitions, a large reference document) as cacheable, and subsequent requests that begin with the same exact bytes read it from cache instead of recomputing it. 
On the Claude API the economics in June 2026 are clear: a cache write costs about 1.25x a normal input token for the default five-minute lifetime (or 2x for the one-hour option), and a cache read costs only about 0.1x, a tenth of the price. So a cached prefix pays for itself within a couple of reuses. The cache is a prefix cache, so order matters: put your stable content first and your changing content last, and a single changed token before the breakpoint forces a full re-write. Caching does not reduce how much context the model attends to, only what you pay to send it, so it complements curation rather than replacing it. - Caching reuses the encoded state of a stable prefix so it is not recomputed each request. - On the Claude API (June 2026): cache writes about 1.25x input (5-minute default, 2x for 1-hour), cache reads about 0.1x. - It is a prefix cache: keep stable content first and changing content last, or you force a re-write. - Caching cuts cost, not attention; you still curate the window. See the glossary on prompt caching. #### A practical context-engineering checklist Put the ideas together into habits you run without thinking. None of this requires special tooling; it is discipline about what you load and when you clean up. A future companion is the token and context estimator tool on this campus, which will let you paste text and see how much of a model window it fills before you send it; for now, the rules below carry you. - Keep standing rules in CLAUDE.md or AGENTS.md, and keep that file tight; it loads every turn. - Load the files the task needs, not the whole repo; let the agent retrieve on demand. - Put the most important instruction near the top and the immediate task near the bottom. - Compact, hand over or reset when a session gets long or confused; do not let it sprawl. - Cache large stable prefixes to control cost, keeping stable content first. - Delegate noisy side work to a subagent so its output stays out of your main window. 
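
The caching arithmetic from the prompt-caching section can be checked with a few lines of Python. This is a rough sketch under stated assumptions: the multipliers are the approximate June 2026 figures quoted above, every request is assumed to arrive while the cache entry is still alive (one write, then only reads), and the function and constant names are illustrative.

```python
# Approximate multipliers relative to a normal input token, as quoted above.
WRITE_MULTIPLIER_5M = 1.25   # cache write, default five-minute lifetime
WRITE_MULTIPLIER_1H = 2.0    # cache write, one-hour option
READ_MULTIPLIER = 0.1        # cache read


def prefix_cost(requests: int, cached: bool,
                write_multiplier: float = WRITE_MULTIPLIER_5M) -> float:
    """Relative input cost of sending one stable prefix `requests` times.

    A result of 1.0 means "the price of sending the prefix once, uncached".
    Assumes every request after the first hits a live cache entry.
    """
    if requests <= 0:
        return 0.0
    if not cached:
        return float(requests)
    # One cache write on the first request, cache reads on all later ones.
    return write_multiplier + (requests - 1) * READ_MULTIPLIER


if __name__ == "__main__":
    for n in (1, 2, 5, 10):
        plain = prefix_cost(n, cached=False)
        cached = prefix_cost(n, cached=True)
        print(f"{n:>2} requests: uncached {plain:.2f}x, cached {cached:.2f}x")
```

Under these assumptions the five-minute cache is already cheaper on the second request (about 1.35x against 2x), and the one-hour option pulls ahead on the third. The sketch covers only the stable prefix; the changing tail of each request is billed as normal input either way.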
### GEO vs SEO vs AEO: Getting Found by AI - Canonical URL: https://agenticschool.dev/guides/geo-vs-seo-vs-aeo GEO, SEO and AEO are three overlapping disciplines for getting found, and in 2026 you need all three because discovery now happens in two places: classic search results and AI answers. SEO (search engine optimization) is the long-standing craft of ranking in Google and Bing. AEO (answer engine optimization) is about being the source an answer engine extracts when it gives a direct answer. GEO (generative engine optimization) is about being cited and recommended inside the synthesised responses of generative assistants like ChatGPT, Perplexity and Claude. They share most of their tactics and differ in what they optimise for. This guide defines each precisely, shows how they relate, and gives you a practical checklist (structured data, llms.txt, answer-first content) you can apply today. Everything here is current as of June 2026 and pairs with the Course 5 SEO and GEO/AEO lesson. #### The three terms, defined It is easy to treat these as buzzwords, so pin down what each one actually means. SEO optimises for ranking in traditional search engine results: the blue links on Google and Bing, driven by relevance, authority and crawlability. AEO optimises for the answer layer: getting your content selected as the source when an engine returns a direct answer, a featured snippet, a voice response or an AI answer box, by making a specific fact, definition or recommendation easy to extract. GEO optimises for generative answers: being chosen and cited when an assistant like ChatGPT or Perplexity synthesises a response from multiple sources. The neat way to hold the difference: SEO ranks pages, AEO supplies the answer, GEO earns the citation. See the glossary entries on GEO and AEO for the standalone definitions. - SEO: rank in traditional search results (Google, Bing) through relevance, authority and crawlability. 
- AEO: be the source extracted for a direct answer (snippets, voice, AI answer boxes). - GEO: be cited and recommended inside a generative assistant's synthesised response. - Shorthand: SEO ranks the page, AEO supplies the answer, GEO earns the citation. #### How they differ, and why they overlap The three are not rivals; they are layers of one strategy. SEO establishes baseline visibility, AEO makes your content extractable for direct answers, and GEO positions it as trusted reference material an assistant will cite. The reason you do not have to chase them separately is that they reward the same underlying thing: clear, well-structured, trustworthy content that a machine can read, understand and trust. The differences are in emphasis. SEO still cares a lot about links and ranking signals. AEO cares about answering a specific question cleanly and early on the page. GEO cares about the signals a model uses as a proxy for credibility: clear attribution, quotable statements, statistics and citations. Optimise the shared foundation well and you serve all three at once; then tune the emphasis per channel. - They are layers of one strategy, not competing tactics; SEO is the base, AEO and GEO build on it. - All three reward the same thing: clear, structured, trustworthy, machine-readable content. - Emphasis differs: SEO leans on authority and links, AEO on clean extractable answers, GEO on citability. - Generative engines treat attribution, quotes, statistics and citations as credibility signals. #### Write answer-first content The biggest single shift for AEO and GEO is structuring content so a machine can lift the answer out cleanly. Lead with the answer: open the page, or a section, by directly answering the question it targets in one or two self-contained sentences, then expand. A heading phrased as the real question ("What is GEO?") followed by an immediate, quotable definition is far more extractable than a paragraph that warms up for three sentences first. 
Keep claims specific and attributable, because models favour content that reads as credible: cite sources, include real statistics, and use clear, quotable statements. A frequently-asked-questions section is one of the highest-value formats because each question-and-answer pair is a pre-packaged, extractable answer that maps directly onto how people query assistants. - Answer first: open each section by answering its question in one or two self-contained sentences. - Phrase headings as the questions people actually ask, then answer immediately below. - Use a real FAQ section; each Q-and-A pair is a ready-to-extract answer. - Be specific and attributable: statistics, clear quotable claims and cited sources read as credible. #### Add structured data Structured data is markup (usually JSON-LD using the Schema.org vocabulary) that tells machines exactly what your content is, removing the guesswork. It is foundational for all three disciplines because it lets search and answer engines parse your page reliably rather than inferring its meaning. The high-value types for a content site are FAQPage (mark up your FAQ so each question and answer is machine-readable), Article or BlogPosting (declare author, dates and headline), HowTo (mark up step-by-step guides), BreadcrumbList (express site structure), and Organization or Person (establish who is behind the content, which feeds the credibility signals GEO rewards). Get the markup correct and validate it; broken structured data helps nobody. This campus emits exactly these types on its content pages, which is the practical version of the advice. - Use JSON-LD with Schema.org types so machines parse your content instead of guessing. - High-value types: FAQPage, Article/BlogPosting, HowTo, BreadcrumbList, Organization/Person. - Organization and Person markup establishes who is behind the content, feeding GEO credibility signals. - Validate your markup; broken structured data is worse than none. 
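
To make the FAQPage advice concrete, here is a minimal sketch that builds the JSON-LD for a list of question-and-answer pairs and wraps it in a script tag. The `@context`, `@type`, `mainEntity` and `acceptedAnswer` keys are the Schema.org vocabulary; the function names and the sample question are illustrative, and this is not necessarily how this campus generates its own markup.

```python
import json


def faq_jsonld(pairs):
    """Build a Schema.org FAQPage object from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


def as_script_tag(data):
    """Serialise JSON-LD into a script element ready to place in the page."""
    payload = json.dumps(data, indent=2, ensure_ascii=False)
    # Escape "</" so answer text can never close the script element early;
    # "\/" is a valid JSON escape for "/", so parsers read it back unchanged.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


if __name__ == "__main__":
    faq = faq_jsonld([
        (
            "What is GEO?",
            "GEO (generative engine optimization) is about being cited and "
            "recommended inside the synthesised responses of generative assistants.",
        ),
    ])
    print(as_script_tag(faq))
```

Generating the markup from the same data that renders the visible FAQ keeps the two in sync, which is most of what validation tends to catch.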
#### Publish an llms.txt and clean Markdown llms.txt is a simple proposed standard: a Markdown file at the root of your site (/llms.txt) that gives AI crawlers a clean, curated map of your most important content, free of the navigation, ads and scripts that clutter a normal HTML page. It is an agent-first courtesy that makes your site cheaper and clearer for a model to understand, which is squarely in the GEO and AEO spirit. Pair it with clean, semantic HTML and, where you can, Markdown versions of your pages, so the content a model reads is the content that matters rather than a soup of markup. The same instinct that makes a site good for assistants (clarity, structure, no clutter) makes it good for human readers too. See the glossary entry on llms.txt for the format. - Publish /llms.txt: a Markdown map of your key content for AI crawlers, without the page clutter. - Serve clean, semantic HTML and Markdown twins where you can, so models read signal not soup. - It is an agent-first courtesy that aligns with the agent-first products approach to building. - See the glossary on llms.txt for the exact format and the llms.txt generator tool when it ships. #### Do not abandon classic SEO In the rush toward AI answers it is easy to forget that the fundamentals still carry most of the weight, and an answer engine often draws on pages that rank well in the first place. So keep doing the SEO basics: clear, compelling title tags and meta descriptions, a sitemap, a clean URL structure, fast pages, and Search Console set up so you can see how you are doing. Authority and trust (E-E-A-T: experience, expertise, authoritativeness, trustworthiness) matter for all three channels, because both Google and the assistants are trying to surface sources they can rely on. The honest framing for 2026 is that GEO and AEO are not a replacement for SEO but an additional layer on top of a solid SEO foundation. Build the foundation, then optimise for the answer and the citation. 
- Keep the SEO basics: title tags, meta descriptions, sitemaps, clean URLs, fast pages, Search Console. - Assistants often cite pages that already rank, so SEO feeds AEO and GEO rather than competing with them. - E-E-A-T (experience, expertise, authoritativeness, trust) matters across all three channels. - Treat GEO and AEO as a layer on top of solid SEO, not a replacement for it. ### Codex CLI Tutorial: Setup and First Workflow - Canonical URL: https://agenticschool.dev/guides/codex-cli-tutorial Codex CLI is OpenAI's terminal-based coding agent: you run one command in your project, describe what you want in plain language, and the agent reads your files, plans, edits code and runs commands in your terminal. This tutorial takes you from nothing installed to a productive first workflow: how to install Codex CLI, how to sign in (with your ChatGPT account or an API key), how to give it project rules with an AGENTS.md, and how to run a first task safely. If you are weighing it against Anthropic's agent, see our Claude Code vs Codex CLI comparison; the two are close cousins and the habits transfer. Everything here was verified against the official OpenAI Codex docs in June 2026, and because this tool evolves fast, the exact commands are the part to double-check against the docs. #### What Codex CLI is Codex CLI is an agent harness from OpenAI: a lightweight coding agent that runs in your terminal, wraps a model in a loop with tools (read files, edit files, run shell commands), and works directly in your repository toward a goal you set. Like Claude Code, it does not just hand you snippets to paste; it makes multi-file changes and runs commands, and you stay in control by reviewing what it proposes and approving the actions that matter. It is open source and grew quickly: by early 2026 it had passed two million weekly active users. 
The project configuration model is AGENTS.md, an open, tool-agnostic convention for giving a coding agent its standing instructions, which is the Codex equivalent of Claude Code's CLAUDE.md. See the glossary on the agent harness for the underlying concept.

- A terminal coding agent that reads, edits and runs code in your repo, not an autocomplete.
- You supervise: it proposes actions and you approve the ones that matter.
- Open source, with more than two million weekly active users by early 2026.
- Configured with AGENTS.md, an open, tool-agnostic standing-instructions convention.

#### Installing Codex CLI

Install Codex CLI with npm or Homebrew. The npm install puts the codex binary on your path globally and needs Node.js 18 or later; the Homebrew cask is the convenient option on macOS. Pick one, then confirm it is on your path. These commands are current as of June 2026 against the official quickstart, but because the tool moves fast, check the OpenAI Codex docs if anything has shifted.

```bash
# npm (needs Node.js 18+)
npm install -g @openai/codex

# Homebrew on macOS
brew install --cask codex

# Confirm it is installed and on your path
codex --version
```

Install Codex CLI with npm or Homebrew, then verify the version. Verified against the official docs in June 2026.

#### Signing in

On first run, Codex prompts you to authenticate, and you have two paths. The recommended one for most people is signing in with your ChatGPT account, which uses Codex as part of a paid ChatGPT plan (Plus, Pro, Business, Edu or Enterprise). The alternative is an OpenAI API key, billed per token, which is the right choice for automation, CI, or if you prefer usage-based billing; note that some functionality may differ between the two. To use a key, set it as an environment variable before launching. To use your ChatGPT account, just run codex and pick "Sign in with ChatGPT" when prompted.
```bash
# Start Codex; on first run it walks you through sign-in
codex

# Option A (recommended): choose "Sign in with ChatGPT" in the prompt
#   uses your paid ChatGPT plan (Plus, Pro, Business, Edu, Enterprise)

# Option B: authenticate with an API key (good for CI and automation)
export OPENAI_API_KEY="your-api-key"
codex
```

Authenticate with your ChatGPT account (recommended) or an OpenAI API key for automation and CI.

#### Giving it rules with AGENTS.md

AGENTS.md is how you teach Codex your project's standing rules, the same role CLAUDE.md plays for Claude Code, and because it is an open convention many agents read it. Codex builds an instruction chain when it starts, walking from a global file down to your current directory and concatenating them, with files closer to your working directory taking precedence in a conflict. That means three useful scopes: a global ~/.codex/AGENTS.md for your personal defaults, an AGENTS.md at your repository root for project rules everyone shares, and nested AGENTS.md files in subdirectories for area-specific overrides. Keep it focused on setup, conventions and test requirements; Codex respects a default size limit (about 32 KiB across the chain), and like any standing-instructions file, a tight one beats a sprawling one.

```markdown
# AGENTS.md (at your repository root)

## Setup
- Install with `bun install`. The dev server is `bun run dev`.

## Conventions
- TypeScript only. Use "-" not em dashes. Border radius is always rounded-sm.

## Tests and quality gate
- Before you call a task done, run: bun run lint && bun run typecheck && bun run test.

## Out of scope
- Do not edit files under /generated; they are built, not hand-edited.
```

A compact AGENTS.md at the repo root. Codex reads global, then project, then nested files, with the closest winning.

#### Your first task

Open a terminal inside the project you want to work on and run codex.
Start small to build trust: ask it to explain the project or find where something lives before you have it change anything. When you do ask for an edit, prefer to have it propose its approach first, review the change it makes, and keep your tests green. A safety habit the docs recommend is to commit a clean Git checkpoint before a task and again after, so a change you do not like is one git restore away. Codex offers approval modes that control how much it can do without asking; start more cautious, where it asks before running commands or editing, and loosen it only once you trust the workflow on a given repo. - Run codex inside your project; begin with a read-only request like "explain what this project does". - For edits, have it propose the approach, then review the diff before you accept it. - Commit a Git checkpoint before and after a task so you can roll back cleanly. - Start with a cautious approval mode (it asks before acting) and loosen it only once you trust it. #### Where to go next Once the basics click, the habits that make Codex reliable are the same ones that make any coding agent reliable, and they transfer directly from our other guides. Lean on the Prompt Patterns for Coding Agents guide for how to brief it well, the Context Engineering guide for keeping its window sharp on long tasks, and the What Is Agentic Engineering pillar for the discipline of directing and verifying rather than accepting unchecked output. If you also use Anthropic's agent, the Claude Code vs Codex CLI comparison lays out where each fits, and the Foundations course installs and runs both side by side. - Prompt Patterns for Coding Agents: how to brief Codex with a spec, examples and a verification loop. - Context Engineering: keep its context window sharp across long tasks. - Claude Code vs Codex CLI: which terminal agent fits which job. - Foundations course: installs and runs Claude Code and Codex side by side. 
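
To picture the AGENTS.md instruction chain described earlier in this tutorial, the sketch below lists the files that would be considered for a given working directory, from least to most specific. It is an illustration of the order described here (global, then repository root, then nested directories), not Codex's actual implementation; the function name and example paths are hypothetical, and it works on path strings only without touching the filesystem.

```python
from pathlib import PurePosixPath


def instruction_chain(home: str, repo_root: str, cwd: str) -> list[str]:
    """List candidate AGENTS.md paths, least specific first.

    Later entries sit closer to the working directory, so in the scheme
    described above they would take precedence in a conflict.
    """
    home_path = PurePosixPath(home)
    root_path = PurePosixPath(repo_root)
    # relative_to raises ValueError if cwd is not inside the repository.
    relative = PurePosixPath(cwd).relative_to(root_path)

    chain = [home_path / ".codex" / "AGENTS.md", root_path / "AGENTS.md"]
    current = root_path
    for part in relative.parts:
        current = current / part
        chain.append(current / "AGENTS.md")
    return [str(path) for path in chain]


if __name__ == "__main__":
    for path in instruction_chain("/home/ana", "/work/shop", "/work/shop/src/api"):
        print(path)
```

Reading the list top to bottom is a quick way to decide where a rule belongs: personal defaults at the top, shared project rules in the middle, and area-specific overrides at the bottom.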
--- ## Comparisons ### Claude Code vs Codex CLI (2026 Comparison) - Canonical URL: https://agenticschool.dev/compare/claude-code-vs-codex-cli Claude Code and Codex CLI are the two leading terminal-based AI coding agents in 2026. Both live in your terminal, read and edit your repo, run commands and work toward a goal across many steps, but they make different trade-offs. Claude Code (Anthropic) leans into reasoning depth and supervised autonomy and is closed-source; Codex CLI (OpenAI) is open-source, Rust-native and leans into speed, parallelism and lower cost per token. This page compares them honestly so you can pick the right one for your work, not the one with the loudest marketing. #### Claude Code Reasoning-first terminal agent from Anthropic. Complex features, refactors, architecture and frontend work where output quality matters most. #### Codex CLI Open-source, fast, cost-efficient agent from OpenAI. Autonomous and long-running tasks, DevOps and cost-sensitive, high-volume workflows. Pick Claude Code when output quality and reasoning matter most: complex features, refactors and frontend work where you supervise and want clean results the first time. Pick Codex CLI when you care about open-source, lower cost per token, or long-running autonomous and DevOps tasks at volume. Many teams run both: Claude Code for the hard, supervised work and Codex CLI for cheap, parallel, automated runs. There is no single winner in 2026; the right answer depends on whether your bottleneck is quality or cost and autonomy. ### Claude Code vs Cursor (2026 Comparison) - Canonical URL: https://agenticschool.dev/compare/claude-code-vs-cursor Claude Code and Cursor are two of the most popular ways to code with AI in 2026, but they are different kinds of tool. Claude Code (Anthropic) is a terminal-based coding agent: you run one command in your project and it reads, plans, edits and runs your code in a loop while you supervise. 
Cursor is an AI-first IDE, a fork of VS Code where the AI is woven into a familiar editor with inline autocomplete, an agent panel and visual diff review. Neither is strictly "better"; they suit different ways of working. This page compares them honestly on interface, models, pricing and autonomy so you can pick the one that fits how you like to build, and many developers happily use both. #### Claude Code Terminal-native coding agent from Anthropic. Developers who live in the terminal and want a deeply autonomous agent for multi-file features, refactors and repo-wide work. #### Cursor AI-first IDE built on a VS Code fork. Developers who want AI inside a familiar visual editor, with fast autocomplete and side-by-side diff review. Pick Cursor if you want AI inside a familiar, visual editor: best-in-class autocomplete, side-by-side diff review and an in-IDE agent, with the freedom to switch between Anthropic, OpenAI and Google models. Pick Claude Code if you live in the terminal, want the deepest agentic autonomy on hard multi-file work, and value its CLAUDE.md, hooks, skills, MCP and subagent ecosystem. The honest truth in 2026 is that these are complementary, not rivals: a common setup is Cursor as your day-to-day editor for tight, reviewed edits and Claude Code in the terminal for the heavy, autonomous tasks. Choose by how you prefer to work and review, not by which has the louder benchmark. ### Best AI Coding Tools in 2026 - Canonical URL: https://agenticschool.dev/compare/best-ai-coding-tools There is no single best AI coding tool in 2026; there is a best one for you. The market has split into clear shapes: terminal coding agents (Claude Code, Codex CLI, Aider), AI-first IDEs (Cursor, Windsurf) and the in-editor assistant that started it all (GitHub Copilot). 
This round-up compares the six tools most people are choosing between, honestly, with genuine pros and cons for each and a use-case-based recommendation, because the right tool depends on how you work, your budget and how much autonomy you want. Where two of these go head to head we have dedicated deep-dives (Claude Code vs Cursor, and Claude Code vs Codex CLI) linked below. Facts here are current as of June 2026; pricing moves fast, so we describe the model rather than fragile exact numbers. #### Claude Code Reasoning-first terminal coding agent (Anthropic). Complex, multi-file features, refactors and repo-wide work where output quality and deep autonomy matter most. #### Cursor The leading AI-first IDE (VS Code fork). Developers who want AI inside a familiar visual editor with great autocomplete and diff review. #### Codex CLI Open-source, fast, cost-efficient terminal agent (OpenAI). Autonomous and long-running tasks, DevOps and cost-sensitive, high-volume automation. #### GitHub Copilot The original in-editor AI assistant (GitHub). Teams already on GitHub who want AI completion, chat and agent mode inside VS Code, JetBrains and the GitHub flow. #### Windsurf AI IDE with the Cascade agent (now part of Cognition). Beginners and agentic-heavy workflows who want a gentle, visual on-ramp to an AI editor. #### Aider Open-source, git-native terminal pair programmer. Developers who want a free, model-agnostic terminal tool with tight Git integration and full cost control. For the highest output quality on complex, multi-file work, Claude Code leads the terminal agents in 2026. If you want AI inside a familiar visual editor, Cursor is the best all-round AI IDE, with Windsurf the gentler, more beginner-friendly alternative. For cost-sensitive, autonomous or high-volume runs, Codex CLI is the strong open-source pick, and Aider is the free, model-agnostic, git-native choice when you want full control of cost and models. 
If your team lives in GitHub and you just want capable AI in your existing editor, GitHub Copilot is the path of least resistance. The honest answer is that most serious developers run two: an IDE for tight, reviewed edits and a terminal agent for the heavy lifting. Use the head-to-head guides below to choose between the closest pairs. ### Opus vs Sonnet vs Haiku: Which Claude Model? (2026) - Canonical URL: https://agenticschool.dev/compare/opus-vs-sonnet-vs-haiku Anthropic ships Claude in three sizes, and picking the right one is the simplest, biggest lever on both quality and your bill. Opus is the most capable model for the hardest reasoning, Sonnet is the balanced workhorse that handles most coding and agent tasks well, and Haiku is the fast, cheap model for high-volume and latency-sensitive work. They share the same core abilities and, in 2026, a 1M-token context at standard pricing; what changes is depth of reasoning, speed and cost per token. This page compares them honestly so you can match the model to the task instead of paying for Opus on work Haiku would nail, or sending a genuinely hard problem to Haiku and getting frustrated. It pairs with our Choosing an AI Model article. Prices are list rates as of 2026 and move over time. #### Claude Opus The most capable Claude, for the hardest problems. Complex reasoning, tricky multi-file refactors, architecture and the agentic tasks where quality matters most. #### Claude Sonnet The balanced workhorse for everyday building. Most day-to-day coding, agent loops and content work where you want strong quality at a sensible price. #### Claude Haiku The fast, cheap model for volume and latency. High-volume classification, extraction, simple edits, subagent side work and anything latency-sensitive. Make Sonnet your default: it handles the large majority of coding and agent work at a strong quality-to-cost ratio, which is why Claude Code defaults to it. 
Reach for Opus on the genuinely hard problems, complex reasoning, tricky refactors and architecture, where its extra depth pays for itself by getting things right the first time. Drop to Haiku for high-volume, simple or latency-sensitive work like classification, extraction and read-only subagents, where speed and cost matter more than peak reasoning. A great pattern is to mix them: a stronger model leads while Haiku does the cheap, narrow side work. To cut spend further, use prompt caching for repeated context and batch processing for non-urgent jobs. When in doubt, start on Sonnet and only move up or down when a task clearly demands it. ### n8n vs Zapier vs Make (2026 Comparison) - Canonical URL: https://agenticschool.dev/compare/n8n-vs-zapier-vs-make n8n, Zapier and Make are the three automation platforms most people choose between in 2026 to connect apps and run workflows without writing glue code for every integration. They solve the same problem in different ways: Zapier is the easiest, broadest and most managed; Make is the visual, operation-priced middle ground for branching logic; and n8n is the open-source, self-hostable, developer-leaning option that gets dramatically cheaper at high volume. The catch is that each prices usage differently (Zapier per task, Make per operation, n8n per workflow execution), which makes raw price tags misleading. This page compares them honestly on pricing model, self-hosting, ease of use and AI features so you can match the platform to your real workload. It pairs with our Course 4 lesson on choosing between n8n, Zapier and Trigger.dev. #### n8n Open-source, self-hostable workflow automation. Developers and technical teams who want low cost at volume, self-hosting and full control over data and logic. #### Zapier The easiest, broadest managed automation tool. Non-technical users and teams who want the widest app coverage and the fastest, simplest setup. #### Make Visual, operation-priced automation with rich logic. 
Users who want a visual builder for complex, branching workflows at lower cost than Zapier. Choose Zapier if you are non-technical or value the fastest setup and the widest app coverage, and your volume is modest enough that per-task pricing stays comfortable. Choose Make if you want a visual builder for complex, branching workflows at meaningfully lower cost than Zapier and are happy on a managed cloud. Choose n8n if you are technical, run high volume, or need self-hosting and data control: per-execution pricing and the self-host option make it by far the cheapest at scale, and its 2.0 AI and agent features suit agentic workflows. The honest rule of thumb: start on Zapier or Make to prove a workflow quickly, and move to self-hosted n8n once volume, cost or data-residency needs make per-task pricing hurt. Match the platform to your workload, not to a sticker price, because the three count usage in different units. ### Cursor vs Windsurf vs Copilot (2026 Comparison) - Canonical URL: https://agenticschool.dev/compare/cursor-vs-windsurf-vs-copilot Cursor, Windsurf and GitHub Copilot are the three most popular ways to get AI inside a graphical editor in 2026, and choosing between them comes down to how much editor you want to switch and how much autonomy you want. Cursor (Anysphere) and Windsurf (now part of Cognition) are AI-first IDEs: forks of VS Code where the AI is woven into the editor, with an agent that plans and edits across files. GitHub Copilot is the assistant that started the category, and it stays as an extension inside the editor you already use rather than a new app. None is strictly best; they suit different habits, budgets and team needs. This page compares them honestly on interface, models, pricing and autonomy so you can pick the one that fits how you build. If you also want a terminal agent in the mix, see our Best AI Coding Tools round-up. 
Facts are current as of June 2026; pricing moves fast, so we describe the model rather than fragile exact numbers. #### Cursor The leading AI-first IDE (VS Code fork). Developers who want the most capable agentic IDE with great autocomplete, multi-model choice and side-by-side diff review. #### Windsurf Beginner-friendly AI IDE with the Cascade agent. People who want a clean, low-steering agentic editor and a generous free tier to learn on, with a path to longer autonomous runs. #### GitHub Copilot The original in-editor AI assistant (GitHub). Teams already on GitHub who want capable AI inside their existing editor with the widest IDE support and predictable enterprise governance. Pick Cursor if you want the most capable AI IDE in 2026: top-tier autocomplete, multi-model routing, side-by-side diff review and deep agentic features, and you do not mind moving into its editor or watching a usage pool on heavy days. Pick Windsurf if you want a gentler, lower-steering agentic editor and a genuinely useful free tier to learn on, with a path to longer autonomous runs through Cognition, accepting a smaller ecosystem and the newer quota limits. Pick GitHub Copilot if your team lives in GitHub and you want capable AI inside the editor you already use, the widest IDE coverage, the cheapest entry point and the strongest enterprise governance, trading a little raw agentic depth for stability. The honest rule of thumb: Cursor for power and autonomy, Windsurf to learn and stay hands-off, Copilot to add AI without changing tools. Choose by your workflow and how much autonomy you want, not by a single benchmark. ### Convex vs Supabase vs Firebase (2026 Comparison) - Canonical URL: https://agenticschool.dev/compare/convex-vs-supabase-vs-firebase Convex, Supabase and Firebase are the three backends most indie hackers and small teams choose between in 2026 to add a database, auth, storage and APIs without standing up servers from scratch. 
They take different shapes: Convex is a reactive, TypeScript-native backend where your queries are functions and the UI updates in real time by default; Supabase is managed Postgres with auth, storage, edge functions and instant REST APIs, and it is open-source and self-hostable; Firebase is Google's mature, mobile-first platform built on the Firestore NoSQL database with the deepest offline support. The choice usually comes down to your data model (SQL vs document vs reactive), how much you value real-time, and whether you want to be able to self-host. This page compares them honestly so you can match the backend to your app. It pairs with our Modern App Stack Explained article and the Course 3 Convex lesson. Facts are current as of June 2026 and pricing models move over time. #### Convex Reactive, TypeScript-native backend with real-time by default. TypeScript apps that want real-time reactivity, end-to-end type safety and the least backend wiring. #### Supabase Open-source Postgres backend with auth, storage and APIs. Teams that want a real relational SQL database, the option to self-host and a full open-source backend suite. #### Firebase Google's mature, mobile-first BaaS on Firestore. Mobile and offline-first apps that want battle-tested SDKs and Google Cloud integration. Pick Convex if you are building a TypeScript app and want real-time reactivity and end-to-end type safety with the least backend wiring; it is the most natural fit when live updates are core to the product, with the trade-off of a source-available license and a younger ecosystem. Pick Supabase if you want a real relational SQL database, the freedom to self-host, and a full open-source suite of database, auth, storage and functions; it is the most versatile all-rounder and the safest against lock-in. 
Pick Firebase if you are shipping a mobile or offline-first app and value its mature SDKs and Google Cloud integration, accepting hosted-only lock-in and per-read pricing that can be hard to predict at scale. The honest rule of thumb: Convex for reactive TypeScript apps, Supabase for SQL and open-source control, Firebase for mobile. For how these fit a full modern stack alongside auth and payments, see our Modern App Stack Explained article, and compare auth providers in our Clerk vs Auth0 vs Supabase Auth page. ### Clerk vs Auth0 vs Supabase Auth (2026 Comparison) - Canonical URL: https://agenticschool.dev/compare/clerk-vs-auth0-vs-supabase-auth Clerk, Auth0 and Supabase Auth are the three authentication providers most teams choose between in 2026 to handle sign-up, login, sessions and access control without building auth from scratch, which is the part of an app you really do not want to get wrong. They aim at different users: Clerk leads on developer experience with drop-in UI components and the smoothest setup; Auth0 (an Okta product) is the enterprise standard for SAML single sign-on, compliance and advanced security; and Supabase Auth comes bundled with a Supabase Postgres database and Row Level Security, with the most generous free tier and the lowest cost at scale. All three implement modern standards like OAuth and OpenID Connect under the hood. This page compares them honestly on developer experience, free tiers, cost at scale and openness so you can match the provider to your project. For the standard they are built on, see our What Is OAuth explainer, and if you are also choosing a database, see our Convex vs Supabase vs Firebase comparison. Pricing is as of 2026 and moves over time. #### Clerk The best developer experience with drop-in auth UI. Indie hackers and SaaS teams who want beautiful pre-built auth UI and the fastest path to production. #### Auth0 The enterprise standard for SSO and compliance (Okta). 
Enterprises and B2B SaaS that need SAML SSO, advanced security policies and strong compliance. #### Supabase Auth Open-source auth bundled with a Postgres database. Teams already on Supabase, or anyone who wants the cheapest auth at scale with the option to self-host. Pick Clerk if you want the best developer experience and the fastest path to production: drop-in UI components, modern features like passkeys and a generous free tier make it the favourite for indie hackers and React or Next.js SaaS, accepting that it is managed and priced per MAU. Pick Auth0 if you are an enterprise or B2B SaaS that needs SAML single sign-on, advanced security policies and broad compliance; it is the most capable and most battle-tested, with the trade-off that per-MAU pricing gets expensive at scale. Pick Supabase Auth if you are already on Supabase or you want the cheapest auth at scale with the option to self-host; it pairs naturally with Postgres and Row Level Security and has the most generous free tier, at the cost of fewer polished UI components and lighter enterprise features. The honest rule of thumb: Clerk for developer experience, Auth0 for enterprise needs, Supabase Auth for cost and openness. All three are built on standards like OAuth and OpenID Connect, explained in our What Is OAuth guide. --- ## Tools ### llms.txt Generator - Canonical URL: https://agenticschool.dev/tools/llms-txt-generator This free llms.txt generator builds a spec-correct llms.txt file for your website in seconds. Enter your site name, a one-line summary and your key pages, and it produces a Markdown file you can copy or download and drop at the root of your domain. Everything runs in your browser: nothing is sent to a server, there is no sign-up, and there is no cost. Below the tool you will find a short explainer on what llms.txt is and why it matters for being understood by AI. #### What is llms.txt? 
llms.txt is a simple Markdown file you place at the root of your site (at /llms.txt) to give AI systems a clean, curated map of your most important content. Instead of forcing a model to crawl and guess at your HTML, you hand it a short, structured summary plus links to the pages that matter. It is to AI readers roughly what a sitemap or robots.txt is to search crawlers: a friendly, machine-first entry point.

#### The format, in one minute

The spec is deliberately small. A valid llms.txt starts with a single H1 with your site or project name (the only required part), followed by a blockquote with a short summary, then optional sections. Each section is an H2 heading with a Markdown list of links, where every link is written as a title, the URL, and a short description.

```markdown
# Your Site Name

> A one-line summary of what your site is and who it is for.

## Docs

- [Getting started](https://example.com/start): How to set up in five minutes.
- [API reference](https://example.com/api): Every endpoint with examples.

## Optional

- [Full text](https://example.com/llms-full.txt): The whole site as one document.
```

A minimal, spec-correct llms.txt: H1, blockquote summary, then H2 link sections.

#### Why it matters for AEO

As more people ask AI assistants questions instead of typing into a search box, being cleanly readable by those systems becomes part of getting found. A good llms.txt makes it cheap and unambiguous for an AI to understand what your site offers and link to the right page, which is the heart of answer engine optimisation (AEO). It does not replace good content or a sitemap; it complements them by giving machine readers a curated front door.

#### How to use the file

Generate the file with the tool above, then save it as llms.txt and upload it to the root of your domain so it is reachable at yoursite.com/llms.txt. Keep it short and curated: link the handful of pages that best represent your site rather than every URL.
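
As a rough illustration of the format described above, the sketch below assembles an llms.txt string from a site name, a summary and a list of link sections. The names `LlmsLink`, `LlmsSection` and `buildLlmsTxt` are hypothetical and chosen for this example; this is not the generator's actual source.

```typescript
// Hypothetical types for illustration only.
interface LlmsLink {
  title: string;
  url: string;
  description: string;
}

interface LlmsSection {
  heading: string;
  links: LlmsLink[];
}

// Build the file: H1 with the site name, blockquote summary, then H2 link sections.
function buildLlmsTxt(siteName: string, summary: string, sections: LlmsSection[]): string {
  const lines: string[] = [`# ${siteName}`, "", `> ${summary}`];
  for (const section of sections) {
    lines.push("", `## ${section.heading}`, "");
    for (const link of section.links) {
      lines.push(`- [${link.title}](${link.url}): ${link.description}`);
    }
  }
  return lines.join("\n") + "\n";
}

const example = buildLlmsTxt(
  "Your Site Name",
  "A one-line summary of what your site is and who it is for.",
  [
    {
      heading: "Docs",
      links: [
        { title: "Getting started", url: "https://example.com/start", description: "How to set up in five minutes." },
      ],
    },
  ]
);
console.log(example);
```

Keeping the builder a pure function of its inputs makes it easy to regenerate the file whenever the list of key pages changes.
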
Update it when your key pages change, and consider also publishing a fuller llms-full.txt with more complete content for systems that want the whole corpus in one request.

### LLM Cost Calculator

- Canonical URL: https://agenticschool.dev/tools/llm-cost-calculator

This free LLM cost calculator estimates what you will pay to run a large language model in production. Pick two models, enter your input and output tokens (or words, which it converts for you) and your monthly call volume, then toggle prompt caching and batch processing to see the savings. It shows per-call and monthly cost for both models side by side using current 2026 list prices, so you can size a budget or pick the cheapest model for the job before you write a line of code. Everything runs in your browser: no sign-up, no API key, nothing leaves your device.

#### How LLM pricing actually works

Almost every LLM API charges per token, not per request, and quotes the rate per million tokens. A token is roughly four characters of English text, so 1,000 tokens is about 750 words. Your bill for one call is the input tokens you send times the input rate, plus the output tokens the model generates times the output rate, with each per-million rate divided by 1,000,000 to get a per-token price. The calculator above does that division for you and multiplies by your monthly call volume to project a real monthly cost.

#### Why input and output are priced differently

Output tokens almost always cost several times more than input tokens, because generating text is more compute-intensive than reading it. On the Claude models in 2026, for example, output is five times the input rate (about USD 5 input and USD 25 output per million tokens for Opus, about USD 3 and USD 15 for Sonnet, and about USD 1 and USD 5 for Haiku). This is why a chatbot that returns long answers can cost far more than one that returns short ones, and why trimming output length is often the biggest single lever on your bill.
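
The per-call formula above can be written out as a short sketch. The function and field names (`costPerCall`, `monthlyCost`, `ModelRates`) are made up for this example, and the rates are the approximate figures quoted above, so treat them as placeholders that will drift over time.

```typescript
interface ModelRates {
  inputPerMillion: number;  // USD per 1,000,000 input tokens
  outputPerMillion: number; // USD per 1,000,000 output tokens
}

// Approximate list rates quoted in the text above; placeholders, not a price list.
const RATES = {
  opus: { inputPerMillion: 5, outputPerMillion: 25 },
  sonnet: { inputPerMillion: 3, outputPerMillion: 15 },
  haiku: { inputPerMillion: 1, outputPerMillion: 5 },
};

// Cost of one call: each side is tokens / 1,000,000 times its per-million rate.
function costPerCall(rates: ModelRates, inputTokens: number, outputTokens: number): number {
  const inputCost = (inputTokens / 1000000) * rates.inputPerMillion;
  const outputCost = (outputTokens / 1000000) * rates.outputPerMillion;
  return inputCost + outputCost;
}

// Monthly projection: per-call cost times the number of calls per month.
function monthlyCost(
  rates: ModelRates,
  inputTokens: number,
  outputTokens: number,
  callsPerMonth: number
): number {
  return costPerCall(rates, inputTokens, outputTokens) * callsPerMonth;
}

// 2,000 input and 500 output tokens per call on the Sonnet rates, 10,000 calls a month.
const perCall = costPerCall(RATES.sonnet, 2000, 500);          // about USD 0.0135
const perMonth = monthlyCost(RATES.sonnet, 2000, 500, 10000);  // about USD 135
console.log(`per call: $${perCall.toFixed(4)}, per month: $${perMonth.toFixed(2)}`);
```

In this example the 500 output tokens cost more than the 2,000 input tokens (USD 0.0075 against USD 0.006), which is the asymmetry the section describes.
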
The calculator splits the two sides so you can see exactly where the money goes. #### Prompt caching and batch discounts Two features can cut your cost dramatically, and the toggles above model both. Prompt caching reuses a large, unchanging prefix (a system prompt, tool definitions, or retrieved documents) so that repeated input is billed at roughly a tenth of the normal input rate; it only affects the input side, which is why the calculator discounts input alone. Batch processing runs non-urgent jobs asynchronously for about half price on both input and output, in exchange for a slower, best-effort turnaround. If your workload reuses context or can tolerate latency, these two settings often matter more than which model you pick. #### Picking the cheapest model that still works The cheapest model is not always the smartest choice: a weaker model that needs three retries can cost more than a stronger one that gets it right the first time. The honest approach is to start with the smallest model that can do the task reliably, measure its real token usage, and only move up when quality clearly demands it. Use this calculator alongside our Opus vs Sonnet vs Haiku comparison and our Choosing an AI Model article to match the model to the task, then estimate the bill before you commit. For high-volume work, combine a capable lead model with a cheaper one for narrow side tasks. ### Token & Context Window Estimator - Canonical URL: https://agenticschool.dev/tools/token-context-estimator This free token and context window estimator tells you, in seconds, roughly how many tokens a piece of text is and whether it fits inside each major model context window. Paste a prompt, a document, or a whole transcript, and it shows a token estimate plus a fill bar and a clear fits or does not fit verdict for Claude, GPT and Gemini, alongside an approximate input cost. 
It uses a fast characters-per-token heuristic, so it is an estimate rather than an exact tokenizer, which is perfect for quickly sizing context before you build. Everything runs in your browser: no sign-up, no upload, nothing leaves your device. #### What is a token? A token is the unit a language model actually reads and writes. It is not a word or a character but something in between: a common short word is often a single token, while a long or rare word is split into several. As a rough rule for English, one token is about four characters, and 1,000 tokens is roughly 750 words. Models bill per token and measure their limits in tokens, which is why estimating tokens, not words, is the right way to size a prompt or a document. #### What is a context window? The context window is the maximum number of tokens a model can hold in mind at once, counting both your input and the output it generates. If your text exceeds the window, the model cannot see all of it, and you have to trim, chunk, or summarise. In 2026 the windows are large: Claude Opus and Sonnet offer about one million tokens, Gemini models reach a million too, and others sit between two and four hundred thousand. The fill bars above show how much of each window your text would use, so you can see at a glance what fits. #### Why this is an estimate, not an exact count A real token count depends on the specific tokenizer each model family uses, and those differ between Claude, GPT and Gemini and even between versions. Bundling every tokenizer into a browser tool would be heavy and would still only match one model at a time. Instead this tool uses the widely cited four-characters-per-token heuristic, which is close enough to plan with for typical English prose but can be off for code, other languages, or text full of symbols and numbers. Treat the number as a reliable ballpark, and verify with the provider tokenizer when an exact count matters. 
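
A minimal sketch of the heuristic described above, assuming roughly four characters per token. The names `estimateTokens` and `checkFit` are hypothetical, and the window sizes are illustrative round numbers taken from the ranges mentioned in the text, not authoritative limits for any specific model.

```typescript
// Rough heuristic from the text above: about four characters per token.
const CHARS_PER_TOKEN = 4;

function estimateTokens(text: string): number {
  // text.length counts UTF-16 code units; close enough for a ballpark on English prose.
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

interface FitResult {
  fillRatio: number; // share of the window used by the input (0.25 means 25%)
  fits: boolean;     // true if input plus reserved output stays inside the window
}

// The window has to hold both the input and the output the model will generate.
function checkFit(inputTokens: number, windowTokens: number, reservedForOutput: number = 0): FitResult {
  return {
    fillRatio: inputTokens / windowTokens,
    fits: inputTokens + reservedForOutput <= windowTokens,
  };
}

// Illustrative window sizes only; check each provider's current limits.
const modelWindows = [
  { label: "1M-token window", tokens: 1000000 },
  { label: "200k-token window", tokens: 200000 },
];

// A 44-character sentence comes out at 11 estimated tokens.
const promptTokens = estimateTokens("The quick brown fox jumps over the lazy dog.");
console.log(`short prompt: ~${promptTokens} tokens`);

// A document of about one million characters is roughly 250,000 tokens.
const bigDocumentTokens = 250000;
for (const w of modelWindows) {
  const r = checkFit(bigDocumentTokens, w.tokens, 4000);
  console.log(`${w.label}: ${(r.fillRatio * 100).toFixed(1)}% full, fits=${r.fits}`);
}
```

Reserving some tokens for the output matters because the window counts both sides; an input that fills the window exactly leaves the model no room to answer.
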
#### Using the estimate to plan cost and context Once you know roughly how many tokens your text is, two decisions get easier. First, whether it fits: if a document is close to or over a model window, plan to chunk it, retrieve only the relevant parts, or pick a larger-window model. Second, what it costs: the approximate input cost shown next to each model is a starting point, and the full LLM Cost Calculator lets you add output tokens, monthly volume, caching and batch discounts for a complete estimate. For the deeper why, see our glossary entries on tokens and the context window and our context engineering guide. ### AI Automation ROI Calculator - Canonical URL: https://agenticschool.dev/tools/ai-automation-roi-calculator This free AI automation ROI calculator turns a vague hunch that "we should automate this" into real numbers. Enter how much time a recurring task eats today (hours per week, how many people, the loaded hourly cost), how much of it an automation can take over, plus the one-time setup effort and the monthly tool cost, and it shows the time saved per month and year, the gross and net monthly saving, the steady-state annual saving, the payback period and the first-year ROI. Everything runs in your browser: no sign-up, no API key, nothing leaves your device. Use it to sanity-check an automation idea before you build it. #### How automation ROI is calculated The maths is simpler than most spreadsheets make it look. First, work out the time the task costs today: hours per week times the number of people gives the total weekly hours, and the share you can automate tells you how many of those hours an automation removes. Multiply the saved hours by your loaded hourly cost to get the gross saving. Then subtract the running cost of the automation (the monthly tool or subscription fee) to get the net monthly saving. 
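
The saving arithmetic described above can be sketched in a few lines. The names `TaskInputs` and `monthlySavings` are hypothetical, and the 4.33 weeks-per-month factor is the conversion this page says the calculator uses.

```typescript
// Conversion from weekly to monthly figures, as stated on this page.
const WEEKS_PER_MONTH = 4.33;

interface TaskInputs {
  hoursPerWeek: number;      // hours one person spends on the task each week
  people: number;            // how many people do the task
  automatableShare: number;  // fraction the automation takes over, 0 to 1
  loadedHourlyCost: number;  // fully loaded cost of one hour
  monthlyToolCost: number;   // running cost of the automation per month
}

interface MonthlySavings {
  savedHoursPerMonth: number;
  grossMonthlySaving: number;
  netMonthlySaving: number;
}

function monthlySavings(task: TaskInputs): MonthlySavings {
  const weeklyHours = task.hoursPerWeek * task.people;
  const savedHoursPerMonth = weeklyHours * task.automatableShare * WEEKS_PER_MONTH;
  const grossMonthlySaving = savedHoursPerMonth * task.loadedHourlyCost;
  const netMonthlySaving = grossMonthlySaving - task.monthlyToolCost;
  return { savedHoursPerMonth, grossMonthlySaving, netMonthlySaving };
}

// Two people spend 5 hours a week each; 70% can be automated at a loaded cost of 60 per hour.
const savings = monthlySavings({
  hoursPerWeek: 5,
  people: 2,
  automatableShare: 0.7,
  loadedHourlyCost: 60,
  monthlyToolCost: 50,
});
console.log(
  `saved ${savings.savedHoursPerMonth.toFixed(1)} h/month, ` +
    `gross ${savings.grossMonthlySaving.toFixed(2)}, net ${savings.netMonthlySaving.toFixed(2)}`
);
```

In this example about 30.3 hours a month are saved, worth about 1,818.60 gross and 1,768.60 net of the tool cost.
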
The calculator above converts weekly figures to monthly using 4.33 weeks per month, so the numbers line up with a real calendar rather than a tidy four-week approximation. #### Payback period and first-year ROI Two numbers decide whether an automation is worth building. The payback period is how many months it takes for the net monthly saving to recoup the one-time investment (the setup hours valued at your hourly cost, plus any extra setup cost). A payback under three to six months is usually an easy yes. First-year ROI compares the net benefit over twelve months against everything you put in that year (the one-time setup plus twelve months of tool cost), expressed as a percentage. A positive ROI means the automation pays for itself within the year; if the net monthly saving is negative, the tool flags that it does not pay back at all, which is just as useful to know early. #### Use a loaded hourly cost, not a salary The single biggest mistake people make is using a bare salary figure. The honest input is the fully loaded cost of an hour: salary plus payroll taxes, benefits, software, office and the opportunity cost of what that person could be doing instead. As a rough rule, the loaded cost is often 1.3 to 1.5 times the base hourly wage. Using the loaded figure keeps the saving believable and stops you from overstating the case. Be equally honest about the share you can automate: very few tasks go to 100 percent, because edge cases, exceptions and review still need a human. A realistic 60 to 80 percent usually beats an optimistic 100 percent that never materialises. #### From estimate to a real automation A strong ROI on paper is a starting point, not a finish line. The savings only show up if the automation is adopted, kept running and trusted, which is why the smartest move is to start with one painful, repetitive, rule-based task rather than a sprawling process. 
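
The payback period and first-year ROI defined above can be sketched as follows. This is one reading of those definitions, not the calculator's actual code: it assumes the net benefit is twelve months of gross saving minus everything invested in year one (one-time setup plus twelve months of tool cost). All names are hypothetical.

```typescript
interface RoiInputs {
  grossMonthlySaving: number; // saved hours per month times loaded hourly cost
  monthlyToolCost: number;    // running cost of the automation per month
  setupHours: number;         // one-time hours spent building it
  loadedHourlyCost: number;   // fully loaded cost of one hour
  extraSetupCost: number;     // any other one-time cost
}

interface RoiResult {
  oneTimeInvestment: number;
  netMonthlySaving: number;
  paybackMonths: number | null;       // null when the automation never pays back
  firstYearRoiPercent: number | null; // null when nothing was invested
}

function automationRoi(input: RoiInputs): RoiResult {
  const oneTimeInvestment = input.setupHours * input.loadedHourlyCost + input.extraSetupCost;
  const netMonthlySaving = input.grossMonthlySaving - input.monthlyToolCost;

  // Payback only exists when each month leaves a positive net saving.
  const paybackMonths = netMonthlySaving > 0 ? oneTimeInvestment / netMonthlySaving : null;

  // Assumed reading: compare the year's net benefit with everything put in that year.
  const firstYearInvestment = oneTimeInvestment + 12 * input.monthlyToolCost;
  const firstYearNetBenefit = 12 * input.grossMonthlySaving - firstYearInvestment;
  const firstYearRoiPercent =
    firstYearInvestment > 0 ? (firstYearNetBenefit / firstYearInvestment) * 100 : null;

  return { oneTimeInvestment, netMonthlySaving, paybackMonths, firstYearRoiPercent };
}

// 1,000 gross saving a month, 100 tool cost, 30 setup hours at 60 per hour.
const roi = automationRoi({
  grossMonthlySaving: 1000,
  monthlyToolCost: 100,
  setupHours: 30,
  loadedHourlyCost: 60,
  extraSetupCost: 0,
});
console.log(`payback: ${roi.paybackMonths} months, first-year ROI: ${roi.firstYearRoiPercent}%`);
```

Here the one-time investment of 1,800 is recouped by a net saving of 900 a month in two months, and the first-year ROI is 300 percent. Under this reading a positive ROI and a payback shorter than twelve months are the same condition.
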
Once the numbers look good here, the next decision is how to build it: a no-code platform like n8n, Zapier or Make for simple connector workflows, or a custom script or agent for anything with real logic. Read our guide on AI automation for business to scope the first project, compare the main no-code platforms before you pick one, and treat this calculator as the quick gut-check you run before every automation you consider. ### Which AI Coding Tool Should I Use? - Canonical URL: https://agenticschool.dev/tools/ai-coding-tool-quiz This free quiz recommends the AI coding tool that fits how you actually work. Answer seven quick questions about where you code, how much autonomy you want, your experience, budget, codebase size and whether open source matters, and it suggests one of the leading 2026 tools: Claude Code, Cursor, GitHub Copilot, Windsurf, Codex CLI or Aider. You get a clear pick with a short rationale and a deep link to the honest comparison behind it. Everything runs in your browser: no sign-up, no tracking, nothing leaves your device. It is a starting point to narrow the field, not a verdict carved in stone. #### The biggest split: terminal agent or IDE assistant Before any quiz, one question sorts most of the field: do you want to work in the terminal or inside your editor? Terminal-first agents like Claude Code, Codex CLI and Aider run multi-step tasks from the command line, read and edit many files at once, and act with real autonomy. Editor-based tools like Cursor, Windsurf and GitHub Copilot live where you already write code, keeping you in the loop with inline suggestions, chat and scoped edits. Neither is better in the abstract; they suit different temperaments and workflows. The quiz weights this answer most heavily because it is the strongest single signal of which tool you will actually enjoy using day to day. #### Autonomy, budget and the kind of work you do After the environment, three things refine the pick. 
Autonomy is how much you want the tool to do unattended: full multi-step execution points to agents like Claude Code, close step-by-step collaboration suits Cursor and Windsurf, and pure inline autocomplete is classic Copilot territory. Budget steers the result too: an open-source-only or zero-cost requirement favours Aider, good value at a small monthly fee favours Copilot, and a results-first budget opens up the frontier agents. Finally, the kind of work matters: large refactors and big codebases reward a capable agent, learning while you build suits a gentle assistant, and quick one-off scripts suit a fast command-line tool. The quiz blends all of these into a single recommendation.

#### Why the result is a starting point, not the last word

No quiz can know your exact stack, team or taste, and these tools change fast. The recommendation is meant to narrow six options down to one strong candidate you can try first, with an honest runner-up worth a look. Treat it like a knowledgeable friend pointing you in a direction, then verify with your own hands: almost every tool has a free tier or trial, and an hour on a real task tells you more than any quiz. The result links straight to the detailed comparison behind the pick so you can read the spec table, the pros and cons and the verdict, then decide with eyes open. For the full landscape, our best AI coding tools hub lays out every option side by side, and our guide on how to use Claude Code shows what a terminal agent feels like in practice.

---

## Knowledge

### How LLMs Actually Work: Tokens, Context and the Performance Cliff

- Canonical URL: https://agenticschool.dev/knowledge/how-llms-work

A large language model does one thing remarkably well: it predicts the next token given everything it has seen. Once you understand tokens, the context window and the performance cliff that hits long inputs, working with any model stops feeling like guesswork.
This guide explains all three in plain language, with the few numbers that actually matter in 2026, so you can drive any model well and stop blaming the tool for behaviour that is entirely predictable.

#### Tokens, not words

A model never sees words the way you do. Your text is first split into tokens, which are common chunks of characters, roughly four characters or three quarters of a word in English. Two things are measured in tokens: the price you pay and the amount a model can hold at once. That is why a cheap model can become expensive on long documents, and why code or other languages cost more tokens than the same idea in plain English. Pricing is quoted per million tokens and split into input and output, with output usually several times more expensive than input.

#### The context window

The context window is the maximum number of tokens a model can consider at once: your instructions, the files you pasted, the conversation history and the answer it is writing, all added together. Think of it as the model's desk. Everything relevant has to fit on the desk at the same time, and when the desk is full something falls off and is effectively forgotten. This is why a long chat starts losing track of instructions you gave near the start. In 2026 a strong model typically has around a 200,000 token window, with some advertising a million or more.

#### The performance cliff

Bigger context is not the same as better answers. As you fill a context window, quality degrades long before you hit the hard limit. Models attend best to the start and end of a long input and get fuzzy in the middle, a pattern often called lost in the middle. A million-token window sounds amazing, but answer quality on a packed window is often worse than on a tight, well-chosen prompt. This is the performance cliff, and the lesson is blunt: relevance beats volume every time.
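
The token arithmetic in this guide is easy to sanity-check with a few lines of code. The sketch below uses the rough four-characters-per-token rule from the text rather than a real tokenizer, and the prices in the example are hypothetical placeholders, not any provider's actual rates. The function names are made up for illustration.

```python
import math

CHARS_PER_TOKEN = 4  # rough English average from the text; real tokenizers differ


def estimate_tokens(text: str) -> int:
    """Rough token count from character length (not a real tokenizer)."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Cost of one call when prices are quoted per million tokens."""
    return (
        input_tokens / 1_000_000 * input_price_per_million
        + output_tokens / 1_000_000 * output_price_per_million
    )


def window_share(tokens: int, window_tokens: int = 200_000) -> float:
    """Fraction of the context window a prompt would occupy."""
    return tokens / window_tokens


if __name__ == "__main__":
    prompt = "Summarise the attached contract in five bullet points. " * 200
    tokens = estimate_tokens(prompt)
    # Hypothetical prices: 3 per million input tokens, 15 per million output.
    print(tokens, estimate_cost(tokens, 500, 3.0, 15.0), window_share(tokens))
```

Running the same estimate on a bloated prompt and a trimmed one is a quick way to see how much of the window, and the bill, the extra text is taking.
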
#### Why huge context windows disappoint

You will see models advertising enormous context windows and assume they are strictly better. In practice they often disappoint, for exactly the reason above. A model can technically accept a million tokens and still answer worse than a focused prompt, because quality falls as the window fills. Treat a huge window as occasional insurance for a genuinely large document, not as permission to stop curating what you send.

#### How to use this in practice

The practical takeaways are simple. Send less, but send the right less. Start fresh conversations rather than piling onto long ones. When an answer is bad, your first two questions are whether your context is too big and whether the relevant information is actually near the top or bottom. On a workflow that runs thousands of times, trimming a bloated prompt can cut your bill dramatically and improve the answers at the same time.

#### Why this matters for your business

Tokens are money and context discipline is quality. A team that understands this writes tighter prompts, picks cheaper models for simple tasks, and gets more reliable output, which means less rework. Understanding the cliff is the single highest-leverage thing a non-technical founder can learn before spending on AI at scale, because it changes every downstream decision about models, prompts and agents.

### Claude Code for Business: Setup, Workflows and Security

- Canonical URL: https://agenticschool.dev/knowledge/claude-code-for-business

Claude Code is a coding agent that works directly in your terminal: it reads and edits files, runs commands and takes on whole tasks. For a business that is attractive, but it immediately raises questions about security, permissions and governance. This guide shows how teams introduce Claude Code cleanly, from first setup through repeatable workflows to clear rules that enable speed without losing control.
#### What Claude Code does in a business

Claude Code is not a chat window, it is an agent that acts inside your project. It understands a task, reads the relevant files, proposes a plan, makes the changes, runs your tests and presents the result as a reviewable diff. For a business that means small fixes, refactors, documentation and test coverage can be accelerated without a senior person doing every step by hand. The leverage is in delegating repetitive engineering work and reserving human time for architecture, reviews and decisions.

#### Getting started: installation and first project

Start with a clean workspace: a clearly structured repository, a working test command and an account with defined permissions. Install Claude Code, open your project, and begin with a small, well-scoped task rather than a big rebuild. Watch how the agent plans and works, and learn where you step in. That first deliberately small run teaches you more than any theory, because it shows how the agent reasons about your actual code and where context is missing.

#### Project rules with CLAUDE.md

Repeatable behaviour comes from project rules. A CLAUDE.md file in your repository describes how the agent should behave: which commands are allowed, which code conventions apply, how tests run and which areas are off limits. Good rules cut friction dramatically because the agent does not have to re-guess how your project works on every task. Treat the file as living documentation: every recurring correction from a review belongs in it as a rule, so the next run starts smarter.

#### Safe workflows and permissions

Security starts with least privilege. The agent should only access the repositories, directories and commands it genuinely needs for a task. Keep secrets and API keys out of the client and out of logs, and work with environment variables instead of hard-coded keys. Do not let risky commands run blindly; define which actions require confirmation.
This keeps speed possible without a single run endangering production data or access.

#### From issue to pull request

The most productive business workflow runs from a clear issue to a clean pull request. Write the issue with a goal, context and acceptance criteria so the agent knows when the task is done. Let it implement the change in small, reviewable diffs and add tests. The resulting pull request then goes through human review like any other. This integrates the agent into existing processes instead of creating a parallel shadow workflow, and keeps quality control where it belongs.

#### Governance and responsibility

In a business a good workflow is not enough without governance. Decide who approves agent results, how sensitive data is handled, and which tasks always require human control. Document these rules so they do not live only in individual heads. A simple human-approval checklist for risky steps stops speed coming at the cost of traceability. Governance here is not a brake; it is what lets a team safely hand the agent more responsibility over time.

### Choosing an AI Model: Haiku, Sonnet, Opus, GPT and Gemini Compared

- Canonical URL: https://agenticschool.dev/knowledge/choosing-an-ai-model

There is no single best AI model, only the right model for a task and a budget. Every provider ships a family of models in tiers: small and fast, mid and balanced, large and smart. Once you see the tiers instead of the brand names, choosing becomes simple. This guide maps the 2026 landscape, explains how to read benchmarks without being fooled, and shows where to get strong models cheaply or for free.

#### Think in tiers, not brands

Forget brand loyalty and think in tiers. Small models are fast and cheap, great for classification, extraction, simple rewrites and high-volume jobs. Mid models are the balanced workhorse for most real coding and writing. Large models are slower and pricier but reason far better on genuinely hard problems.
Almost every provider mirrors this structure, so once you internalise it you can place any new model instantly from its spec and price.

#### How the families line up

Claude offers Haiku (small), Sonnet (mid) and Opus (large). OpenAI GPT and Google Gemini have equivalent small, mid and large tiers. The rule of thumb is to start one tier lower than you think you need and only move up if the output is genuinely not good enough. Using a flagship model to reformat a list is like hiring a surgeon to apply a plaster.

#### Reading benchmarks honestly

Benchmarks are useful and also routinely misleading. A model can top a coding benchmark and still feel worse in your actual project, because benchmarks measure narrow tasks under ideal conditions that providers optimise hard for. Treat them as a rough filter, not a verdict. The only benchmark that matters is your own: take three real tasks from your work, run them through two or three models, and judge the output yourself, paying attention to consistency rather than peak performance.

#### Pricing and routing by difficulty

Price is quoted per million input and output tokens, and the spread between tiers is large, often ten times or more. Because output costs several times more than input, verbose models and chatty prompts cost more than you expect. The practical move is to route by difficulty: a cheap model for the easy majority of calls, an expensive model only for the hard minority. On a workflow at scale this single decision often matters more than which provider you chose.

#### Getting strong models cheaply or free

You do not have to pay full price to start. OpenRouter gives you one account and key to access almost every model through a single endpoint, with transparent pricing and easy switching, which is ideal for comparing models. Google routinely offers generous free credits through its AI Studio, a genuinely strong low-cost way to get a capable model.
Most providers also offer free trial usage you can spend deliberately on your own three-task benchmark.

### Ship Your First App with AI: From Idea to a Live Website

- Canonical URL: https://agenticschool.dev/knowledge/ship-your-first-app

Most people get stuck between having an idea and seeing it live on the internet. The path feels like a wall of unfamiliar words: scaffolding, dev server, Git, DNS, deployment. It is far simpler than it looks once you do it in order. This guide walks the whole loop, from scaffolding a project and running it locally to putting it under version control and deploying it to the public internet with a real domain.

#### Scaffold and run locally

Rather than memorise a framework, have your agent scaffold a modern starter and explain each step. The flow is always the same: create the project, install dependencies, start the dev server, open the browser. The dev server prints a local address like localhost, which simply means this computer. Nothing is on the public internet yet, which is exactly what you want while building. Edit a file, save, and the browser updates instantly, which is what makes web building feel fast.

#### Put it under version control

Git is version control: it takes snapshots of your project called commits so you can always go back. Think of it as an unlimited undo history with labels. Each commit records what changed and why, so a bad edit is never a disaster, you just return to the last good commit. This single habit removes the fear that stops beginners from experimenting, because nothing is permanent until you decide it is.

#### Keep secrets out before you push

Before you push anything to GitHub, make sure secrets cannot escape. A .gitignore file lists things Git should ignore, and your .env file, where API keys live, belongs there. The rule is absolute: secrets go in .env, .env goes in .gitignore, and the agent never writes a key into committed code. Get this right once and you will never accidentally publish a key.
Default business repositories to private so your code and IP stay yours.

#### Deploy to Vercel

Vercel is a hosting platform built for this. Connect your GitHub repo and Vercel watches it, redeploying automatically every time you push. Import your repository, accept the detected build settings for a standard project, and click deploy. In a minute or two you get a live URL anyone in the world can open. Set your secrets as environment variables in the Vercel dashboard rather than in code, so they stay safe and never touch your repo.

#### Connect a custom domain

DNS is the internet's phone book, translating a human name like yoursite.com into the address of the server that should answer. To connect a domain you add records, usually an A record or a CNAME, that point your name at Vercel. Vercel tells you exactly which records to create, you paste them into your DNS provider, and after propagation Vercel issues a free HTTPS certificate so your site loads securely. Many builders manage DNS through Cloudflare for free HTTPS, a faster global CDN and basic protection.

#### Why shipping matters

A project on your laptop earns nothing. A live site can be shown to customers, indexed by Google, recommended by AI and improved with real feedback. Automatic deploys mean shipping an improvement costs you a single push, so the loop from idea to live update is minutes, not days. That speed is the entire competitive advantage of building this way, and it only starts once you actually ship.

### The Modern App Stack Explained: Auth, Data and Payments

- Canonical URL: https://agenticschool.dev/knowledge/modern-app-stack-explained

A real product is more than a website. It needs users who can log in, data that persists and updates, and a way to take payment. The good news is that you no longer build these hard parts from scratch: dedicated services handle each one safely.
This guide gives you the map - how the pieces fit, which service does what, and the order to assemble them - so you can build a production SaaS without reinventing the dangerous parts.

#### How the pieces fit

A modern app splits into a fast, indexable marketing surface and the interactive app behind login. A framework like Next.js, Astro or TanStack provides structure, routing and rendering. On top of that you add three services: authentication, a database and payments. Keeping the marketing and app surfaces cleanly separated lets each be optimised for its job, which helps both performance and SEO.

#### Authentication with Clerk

Storing passwords, managing sessions and resisting attacks correctly is a deep specialism, so you should not hand-roll it. Clerk handles sign-up, login, sessions and social login like Google OAuth, which removes a whole category of security risk. You integrate its components and let it manage identity, while your app manages everything else. Moving from development to production is a careful key swap rather than a rebuild.

#### Reactive data with Convex

Convex stores your data and runs your backend logic as TypeScript functions. You declare a schema, read with queries and change with mutations, and the UI updates automatically when anything it depends on changes. Strong typing flows from the database into your components, removing a whole class of bugs. For real products, prefer soft deletes, which mark a row as deleted but keep it, so data can be recovered and history stays intact.

#### Payments with Stripe

Stripe is the standard for taking payment, and it means you never touch raw card data. Checkout hosts a secure payment page, subscriptions handle recurring revenue, and webhooks tell your app what happened so it stays in sync with billing reality. A strict separation of test and production keys lets you build and verify everything without moving real money.
The harder details, proration, coupons and embedded checkout, build on this foundation.

#### Secrets tie it all together

Every service you add comes with secret keys, and leaking one can be catastrophic. Keep them out of code in env files that are always gitignored, use separate keys for development and production so a mistake in dev cannot touch live data, and encrypt sensitive data at rest. This discipline is the connective tissue of the whole stack, and it has to scale as each new integration arrives.

#### Going live

Launch means switching every service from development to production keys and data, one at a time, verifying each before the next. Then make the live product discoverable by registering with Search Console and submitting a sitemap, and fast by treating Core Web Vitals as part of launch. A careful order prevents the classic launch-day failure where one missed key takes down logins or billing.

### Agent-First Products: Why AI Must Love Your API

- Canonical URL: https://agenticschool.dev/knowledge/agent-first-products

For two decades, products were designed for human eyes and human clicks. That is changing. A growing share of actions are taken by AI agents acting on behalf of people, and agents do not look at your beautiful UI, they call your API. A product whose API is clean, documented and a joy to use gets adopted by agents and the humans who direct them. This guide explains agent-first design and the API-over-UI philosophy that comes with it.

#### Agents are a new user base

AI agents now choose which tools and APIs to call to get a job done. That makes them users, with preferences. A product an agent can read about, understand and call cleanly gets chosen; one that hides behind a UI an agent cannot use gets skipped, no matter how polished that UI is. Designing for agents is not futurism, it is recognising a user base that already exists and is growing fast.
#### The API-over-UI philosophy

Agent-first design treats the API as the primary product surface, not an afterthought behind the interface. Clean endpoints, honest documentation and machine-readable discovery come first, and the UI becomes one client of the API rather than the whole product. This inversion forces clarity: if an agent can use your product from the docs alone, a human developer certainly can too.

#### What a great agent-facing API looks like

Agents reward predictability. Clear schemas, consistent responses, sensible error messages and discoverable documentation let an agent use your product confidently on the first try. Machine-readable signals like an llms.txt file and structured data make your capabilities easy to find and understand. The same qualities that make an API pleasant for an agent make it pleasant for a human developer.

#### Why this is a competitive advantage

As more work flows through agents, being the product an agent reaches for first becomes a durable advantage. It is distribution you earn by being usable, not by buying ads. Early movers who design agent-first get recommended and integrated while competitors are still polishing buttons no agent will ever click. The shift rewards builders who take it seriously now.

#### How to start

You do not need to rebuild everything. Start by making your most valuable capability available through a clean, documented API, add machine-readable discovery so agents can find it, and test it by having an agent use your product from the docs alone. Where the agent struggles, your API needs work. That feedback loop, an agent as your first user, is the fastest way to an agent-first product.

---

## Builds

### BizCollect: Building a Business Data Tool API-First

- Canonical URL: https://agenticschool.dev/builds/bizcollect
- Stack: TypeScript, Node.js, REST API, OpenAPI, Convex, Claude Code

BizCollect collects and structures business data through a clean API first, a UI second.
Here is why building API-first changed how I ship every product.

#### The problem I was solving

I needed a reliable way to collect, clean and structure business data from many messy sources and hand it to other tools in a predictable shape. My first instinct, like most people, was to build a nice dashboard. I started with the screens. That was the mistake that taught me the most.

#### Why I flipped it to API-first

Halfway through, I realised every consumer of this data was a program, not a person clicking buttons: an automation, a scraper, another agent, a future me writing a script at midnight. The UI was a thin afterthought. So I rebuilt the whole thing around a documented API, with the dashboard as just one client of that API rather than the product itself.

- Every capability is an endpoint with a stable contract, not a button buried in a screen.
- The API ships with an OpenAPI spec so an agent can discover and use it without me explaining anything.
- Responses are predictable JSON with the same shape every time, so consumers never guess.

#### What changed once the API came first

The moment the API was the product, everything downstream got easier. Automations plugged in without screen-scraping. Testing became trivial because I was testing endpoints, not clicking through flows. And when I pointed an AI agent at the OpenAPI spec, it could use BizCollect correctly on the first try, because the contract told it exactly what to send and what it would get back. That was the lightbulb moment: a tool that an AI can understand and operate without hand-holding is worth far more in 2026 than a tool that only a human can navigate.

#### What I would do differently

I would write the OpenAPI spec before writing a single endpoint, and treat it as the design document. I would also version the API from day one instead of bolting versioning on later.
The dashboard can always be regenerated; a broken API contract breaks everyone who depends on you, including agents you have never met.

### s2p: Auto-Posting Every Release to All My Channels

- Canonical URL: https://agenticschool.dev/builds/s2p
- Stack: GitHub Actions, GitHub API, Node.js, Webhooks, n8n, TypeScript

s2p turns a GitHub release into formatted posts across every social channel automatically. Here is how I built a release-to-social pipeline that runs itself.

#### The chore I refused to keep doing

Every time I shipped a release, I would copy the changelog, reformat it for each platform, swap the tone, and post it by hand in five places. It took twenty minutes, I did it inconsistently, and I usually skipped a channel or two out of laziness. s2p ("ship to posts") was me deciding that a computer should do this, every time, the same way.

#### How the pipeline works

The trigger is a published GitHub release. A workflow picks up the release notes, an LLM rewrites them into the right voice and length for each channel, and the result is queued for posting. The whole thing is a chain of small, boring steps, which is exactly what makes it reliable.

- A GitHub webhook fires when I publish a release.
- The release notes are reformatted per channel: short and punchy for one, longer and technical for another.
- Each post is generated from the same source of truth, so the channels never drift apart.

#### The lesson hiding in the boring parts

The interesting part was not the LLM rewriting text, it was the plumbing. Reliable automation is mostly about handling the unglamorous edge cases: what happens if a channel is down, if the release has no notes, if a post fails halfway. I learned to make every step idempotent and to log loudly, so a half-finished run never silently posts the same thing twice or drops a channel without telling me. The AI part was the easy 20 percent. The trustworthy plumbing was the 80 percent that made it something I actually rely on.
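
The idea of idempotent steps with loud logging can be illustrated in a few lines. This is a minimal sketch, not the s2p code: the `post_release` function, the `send` callable and the in-memory `already_posted` set are all hypothetical stand-ins, and a real pipeline would persist the set somewhere durable.

```python
import logging
from typing import Callable, Dict, Set

logger = logging.getLogger("release_posts")


def post_release(
    release_id: str,
    posts: Dict[str, str],
    send: Callable[[str, str], None],
    already_posted: Set[str],
) -> Dict[str, str]:
    """Post one release to every channel at most once.

    `posts` maps a channel name to its formatted text. `already_posted`
    holds one key per (release, channel) pair that has gone out, so a
    re-run after a partial failure only retries what is still missing.
    """
    results: Dict[str, str] = {}
    for channel, text in posts.items():
        key = f"{release_id}:{channel}"
        if key in already_posted:
            logger.info("skip %s: already posted", key)
            results[channel] = "skipped"
            continue
        try:
            send(channel, text)
        except Exception:
            # Log loudly and move on; the key is not recorded, so the
            # next run will retry this channel.
            logger.exception("post failed for %s", key)
            results[channel] = "failed"
            continue
        already_posted.add(key)
        results[channel] = "posted"
    return results
```

Because the key is only recorded after a successful send, running the same release twice never double-posts, and a channel that was down is picked up on the next run.
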
### Favicon Maker: A Small Tool That Earned Its Keep

- Canonical URL: https://agenticschool.dev/builds/favicon-maker
- Stack: TypeScript, Canvas, Sharp, React, Vercel

Favicon Maker turns a single logo into every favicon and icon size a site needs. A small, sharp tool that taught me the value of doing one thing well.

#### Why a whole tool for favicons

Every site needs a favicon, and not just one file: there is the classic .ico, a pile of PNG sizes for different devices, an apple-touch-icon, and a manifest entry. Doing it by hand is fiddly and easy to get subtly wrong, which is why so many sites ship a blurry icon. I wanted to drop in one clean logo and get the complete, correct set out.

#### Keeping the scope brutally small

The temptation with a tool like this is to keep adding: a logo editor, background removal, an icon library, accounts. I resisted all of it. Favicon Maker does exactly one job, and that constraint is the whole reason it is good.

- One input: your logo. One output: every favicon and icon size, correctly named.
- No accounts, no upsell, no settings nobody understands.
- Because it does one thing, it is easy to trust and impossible to get confused by.

#### What the favicon trick taught me about SEO

Building this pushed me down a rabbit hole that turned out to matter: the favicon is the little thing that shows up next to your result in search and in browser tabs, and a crisp, recognisable one quietly improves how trustworthy your link looks. A tiny detail most people ignore is exactly the kind of edge that compounds. The tool was small, but the lesson was not: sharp, correct details at every size are part of looking professional, and looking professional is part of getting clicked.
### CodeCourier: Running Untrusted Code Without Getting Burned

- Canonical URL: https://agenticschool.dev/builds/codecourier
- Stack: Node.js, E2B, Docker, TypeScript, REST API

CodeCourier executes AI-generated code in isolated sandboxes so a bad snippet can never touch the host. Here is what I learned about safe code execution.

#### The problem with running code an AI wrote

The moment you let an AI generate code and then run it, you have a security problem. The code might be fine, or it might delete files, leak secrets, or hammer the network. CodeCourier was my answer: a service that takes a snippet, runs it somewhere it can do no harm, and returns the result.

#### Sandboxes are the whole point

The core idea is that the code runs in a disposable, isolated environment, a sandbox, that has no access to anything I care about. If a snippet tries something nasty, the worst case is that the sandbox gets thrown away. I leaned on existing sandbox infrastructure rather than rolling my own isolation, because getting isolation subtly wrong is how you get owned.

- Each run gets a fresh, throwaway environment with no host access and no real secrets.
- Network and filesystem are locked down by default; you grant access deliberately, never accidentally.
- Timeouts and resource limits stop a runaway snippet from costing you money or hanging forever.

#### Why I did not build my own isolation

My instinct was to clever my way to a homemade sandbox. I am glad I did not. Isolation is a domain where the failure mode is silent and catastrophic: you think you are safe until you are very much not. Using purpose-built sandbox tooling meant the hard, security-critical part was handled by people who specialise in it, and I got to focus on the orchestration around it. That is the same instinct as using an auth provider instead of rolling your own login: some problems are too sharp to solve from scratch.
### AutoMail: Email Automation That Sounds Like a Person

- Canonical URL: https://agenticschool.dev/builds/automail
- Stack: Node.js, Email API, n8n, LLM, TypeScript

AutoMail drafts, sends and follows up on email automatically while keeping a human approval step. Here is how I automated email without sounding like a bot.

#### Email is where automation gets personal

Automating email is dangerous because the output goes straight to a human who will judge you by it. A clumsy automated email is worse than no email: it tells the recipient you did not care enough to write it. AutoMail was my attempt to automate the repetitive parts of email without losing the part that makes it feel human.

#### The human-in-the-loop step that saved me

My first version sent everything automatically. It was a mistake. The fix was to keep a human approval gate for anything the system was not sure about, so the automation drafts and the human nods. Over time, the categories that were reliably good got promoted to full auto, and the rest stayed under review.

- The system drafts; a human approves anything outside the known-safe categories.
- Templates handle the boring, identical cases; the LLM handles the ones that need nuance.
- Every send is logged so I can see exactly what went out and to whom.

#### What I learned about trust and tone

The big lesson was that automation earns trust gradually. You do not start by handing the keys to the machine; you start with the machine drafting and you approving, and you widen the autonomy only as each category proves itself. That graduated approach is the same pattern behind every safe automation I have built since: validate first, automate the validated, and keep a human watching the edges. Tone matters too. A few human touches in the templates kept the emails from sliding into that flat, generated register that makes people stop reading.
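
The graduated approval gate described above can be sketched as a small routing rule. This is an illustration under stated assumptions, not AutoMail's implementation: the `Draft` and `Outbox` types, the category names and the in-memory lists are hypothetical, and appending to `sent` stands in for a real email API call.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Draft:
    recipient: str
    category: str   # e.g. "receipt" or "complaint"; names are illustrative
    body: str


@dataclass
class Outbox:
    """Routes drafts: known-safe categories go out, the rest wait for a human."""
    auto_categories: Set[str]
    sent: List[Draft] = field(default_factory=list)          # doubles as the send log
    review_queue: List[Draft] = field(default_factory=list)

    def route(self, draft: Draft) -> str:
        if draft.category in self.auto_categories:
            self.sent.append(draft)  # stand-in for the real send call
            return "sent"
        self.review_queue.append(draft)
        return "needs_review"

    def promote(self, category: str) -> None:
        """Widen autonomy once a category has proven itself under review."""
        self.auto_categories.add(category)


if __name__ == "__main__":
    outbox = Outbox(auto_categories={"receipt"})
    print(outbox.route(Draft("a@example.com", "receipt", "Thanks for your order.")))
    print(outbox.route(Draft("b@example.com", "complaint", "Sorry to hear that.")))
```

Starting with a small `auto_categories` set and calling `promote` only after a category has been approved repeatedly matches the "validate first, automate the validated" pattern.
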
### GlowLens: Turning Images Into Useful Signals

- Canonical URL: https://agenticschool.dev/builds/glowlens
- Stack: Gemini Vision, Node.js, TypeScript, REST API, Convex

GlowLens uses vision models to extract structured signals from images and report how confident it is. Here is what building a vision tool taught me.

#### What GlowLens does

GlowLens takes images and turns them into structured signals: what is in the picture, measurable attributes, and a confidence score for each answer. It started as an experiment in how far modern vision models had come, and it turned into a reusable building block I reach for whenever a project needs to understand a picture.

#### Confidence is part of the answer

The most important design decision was that GlowLens never just states a result, it states a result plus how sure it is. A vision model that confidently gives you a wrong answer is dangerous; one that says "probably this, but I am only 60 percent sure" lets the system around it make a smart decision.

- Every extracted attribute comes with a confidence signal, not just a value.
- Low-confidence results get routed to a human instead of being trusted blindly.
- The output is structured JSON, so the next step in the pipeline can branch on confidence.

#### Where vision models still trip

Vision is genuinely impressive now, but it is not magic. It struggles with unusual angles, poor lighting and anything it has not seen much of, and crucially it fails in ways that look confident. The lesson I took away is that the value of a vision tool is not just its accuracy on easy cases, it is how gracefully it handles the hard ones. By making confidence a first-class output and routing the uncertain cases to a human, GlowLens became something I could actually build on top of, rather than a clever demo that quietly lies to you under pressure.
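
Branching on confidence is simple once every attribute carries a score. The sketch below assumes a hypothetical output shape (a list of objects with `name`, `value` and `confidence` keys) and a made-up threshold of 0.8; neither is taken from GlowLens itself.

```python
import json
from typing import Any, Dict, List

REVIEW_THRESHOLD = 0.8  # hypothetical cut-off; tune it against real data


def split_by_confidence(
    attributes: List[Dict[str, Any]],
    threshold: float = REVIEW_THRESHOLD,
) -> Dict[str, List[Dict[str, Any]]]:
    """Separate attributes that can be trusted from those a human should check."""
    trusted: List[Dict[str, Any]] = []
    needs_review: List[Dict[str, Any]] = []
    for attribute in attributes:
        if attribute["confidence"] >= threshold:
            trusted.append(attribute)
        else:
            needs_review.append(attribute)
    return {"trusted": trusted, "needs_review": needs_review}


if __name__ == "__main__":
    # Hypothetical model output, already parsed from structured JSON.
    raw = json.loads(
        '[{"name": "object", "value": "bicycle", "confidence": 0.95},'
        ' {"name": "colour", "value": "teal", "confidence": 0.6}]'
    )
    print(json.dumps(split_by_confidence(raw), indent=2))
```

The point of the split is that the low-confidence bucket goes to a person, while the rest of the pipeline continues with the trusted bucket only.
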
### CallAssistant: A Phone Agent on Twilio and Realtime Voice - Canonical URL: https://agenticschool.dev/builds/callassistant - Stack: Twilio, OpenAI Realtime, Node.js, WebSockets, TypeScript CallAssistant answers and handles real phone calls using Twilio and a realtime voice model. Here is what building a talking phone agent actually takes. #### A phone number that answers itself CallAssistant connects a real phone number to a voice agent: someone calls, the agent picks up, understands what they want, and handles it in a natural conversation. Twilio carries the call, a realtime voice model does the listening and speaking, and my code is the glue and the brain that decides what to actually do. #### Latency is the whole experience On the web you can get away with a spinner. On a phone call, a one-second pause feels like the line went dead. The entire engineering challenge was keeping the round trip fast enough that the conversation felt alive, which meant streaming audio both ways instead of waiting for complete turns. - Audio streams in real time over a persistent connection, not in slow request-response chunks. - The agent can be interrupted mid-sentence, because real people interrupt. - Every action the agent can take is a clearly defined tool, so it never improvises something dangerous. #### What voice taught me that text did not Building a text chatbot lulls you into thinking voice is just the same thing with a microphone. It is not. Voice is unforgiving about timing, about interruptions, about the awkward silence when the model is thinking. It also raises the stakes on safety: a voice agent that takes real actions on a real call needs tight, well-defined tools and clear limits, because there is no "are you sure?" dialog on a phone call. The deepest lesson was that the medium shapes the product. The same model behaves completely differently when the interface is a live human voice instead of a chat box. 
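One way to read "every action is a clearly defined tool" is as an allowlist: the model can only request tools that exist in a registry, and anything else is refused. The sketch below assumes that reading. The tool names and the `dispatchToolCall` function are hypothetical, and it deliberately leaves out the Twilio and realtime-model wiring.

```typescript
// Hypothetical tool registry; names and behaviour are illustrative only.
type ToolArgs = Record<string, unknown>;
type ToolResult = { ok: boolean; message: string };
type Tool = { description: string; run: (args: ToolArgs) => ToolResult };

const toolRegistry = new Map<string, Tool>([
  [
    "get_opening_hours",
    {
      description: "Read out the opening hours.",
      run: () => ({ ok: true, message: "Open weekdays, 9 to 5." }),
    },
  ],
  [
    "book_callback",
    {
      description: "Book a callback to the given phone number.",
      run: (args) => {
        const phone = args["phone"];
        // Validate arguments before acting; the caller cannot confirm in a dialog.
        if (typeof phone !== "string" || phone.length === 0) {
          return { ok: false, message: "A phone number is required." };
        }
        return { ok: true, message: `Callback booked for ${phone}.` };
      },
    },
  ],
]);

// Anything the model asks for that is not in the registry is refused, never improvised.
function dispatchToolCall(name: string, args: ToolArgs): ToolResult {
  const tool = toolRegistry.get(name);
  if (!tool) {
    return { ok: false, message: `Tool "${name}" is not allowed.` };
  }
  return tool.run(args);
}
```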
### B-Rolls Finder: Search YouTube by Conversation - Canonical URL: https://agenticschool.dev/builds/b-rolls-finder - Stack: YouTube Data API, Node.js, React, LLM, TypeScript B-Rolls Finder searches YouTube through a chat interface to surface the right b-roll fast. Here is how I made finding video footage feel like a conversation. #### The problem with finding footage When you are editing a video, finding the right b-roll clip is a slog: you type keywords into YouTube, scrub through results, refine, and repeat. B-Rolls Finder was my attempt to replace that with a conversation. You describe the mood and content you want, and it goes and finds candidates. #### Chat as the interface, the API as the engine Under the hood it is the YouTube Data API doing the searching, but the interface is a chat box. The LLM turns a loose human request like "calm aerial shots of a city at dawn" into precise queries, runs them, and presents the results conversationally so you can refine in plain language. - You describe what you want in words; the model turns that into real search queries. - Results come back as a short, scannable shortlist instead of an endless scroll. - You refine by replying, the same way you would tell a human assistant "more like the second one". #### The lesson about good interfaces The API was the easy part; the win was the interface. The same data, the same YouTube search, felt completely different when wrapped in a conversation instead of a search box. It reminded me that a lot of the value in AI products right now is not new capability, it is a better interface to capability that already exists. I also ran into the practical realities of working with a third-party API: quotas, rate limits and the need to cache, all of which I had read about in the abstract and only really understood once they bit me. Respecting someone else's API is part of being a good citizen and part of not getting cut off.
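The caching point can be made concrete with a small wrapper that remembers recent results so a repeated query does not spend quota again. This is a sketch under assumptions: `withCache` is a hypothetical helper, the search function is injected rather than calling the real YouTube Data API, and the time-to-live is whatever suits your quota.

```typescript
// Wrap any search function with a time-limited cache.
// `runSearch` is injected, so this sketch makes no real API calls.
// If `runSearch` returns a Promise, the Promise itself is cached.
function withCache<T>(
  runSearch: (query: string) => T,
  ttlMs: number,
  now: () => number = Date.now,
): (query: string) => T {
  const cache = new Map<string, { storedAt: number; value: T }>();
  return (query: string): T => {
    // Normalise the key so trivially different spellings share one entry.
    const key = query.trim().toLowerCase();
    const hit = cache.get(key);
    if (hit !== undefined && now() - hit.storedAt < ttlMs) {
      return hit.value; // Served from cache: no quota spent.
    }
    const value = runSearch(query);
    cache.set(key, { storedAt: now(), value });
    return value;
  };
}
```

Injecting the clock (`now`) is a small design choice that makes expiry testable without waiting for real time to pass.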
### Invoice Automation: From a Photo to a Clean Database Row - Canonical URL: https://agenticschool.dev/builds/invoice-automation - Stack: Gemini Vision, Node.js, Convex, TypeScript, REST API This tool reads invoice images and writes structured data into a database, handling the messy edge cases. Here is what image-to-database really involves. #### The job: photo in, data out The goal sounds simple: take a photo or scan of an invoice and turn it into a clean database row with the vendor, the amount, the date and the line items. In a demo it works on the first try and feels like magic. In reality, invoices are where image-to-data goes to get humbled, because no two are formatted the same. #### The edge cases are the project The happy path took an afternoon. The edge cases took the rest of the time, and they are where all the real engineering lives. Faded scans, foreign currencies, totals that do not add up, two invoices on one page, handwriting in the margin. The model would read most of it correctly and then quietly get one digit wrong, which on an invoice is exactly the kind of mistake you cannot ship. - I built a stack of test invoices, including deliberately ugly ones, and ran every change against all of them. - Extracted numbers are validated against each other, so a total that does not match its line items gets flagged. - Low-confidence fields are surfaced for a human to confirm instead of being written silently. #### Test data is the unsung hero The single most valuable thing I built was not the extraction prompt, it was the collection of real, messy test invoices. Every time the tool failed on a new kind of document, that document went into the test set, and from then on I could never regress on it. That turned a fragile demo into something dependable. The broader lesson is that for image-to-data work, your test data is the product. 
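The cross-check mentioned in the edge-case list (a total that does not match its line items gets flagged) might look roughly like this. The types and the `checkTotal` name are hypothetical, amounts are held as integer cents to avoid floating-point drift, and the sketch assumes the total is a plain sum of line items with no separate tax or discount lines.

```typescript
// Hypothetical extraction result; amounts are integer cents, not floats.
type LineItem = { description: string; amountCents: number };
type ExtractedInvoice = { vendor: string; totalCents: number; lineItems: LineItem[] };
type TotalCheck = { matches: boolean; lineItemSumCents: number; differenceCents: number };

// Compare the stated total with the sum of the line items.
// Assumes no separate tax or discount rows; real invoices may need those modelled.
function checkTotal(invoice: ExtractedInvoice, toleranceCents = 0): TotalCheck {
  const lineItemSumCents = invoice.lineItems.reduce((sum, item) => sum + item.amountCents, 0);
  const differenceCents = invoice.totalCents - lineItemSumCents;
  return {
    matches: Math.abs(differenceCents) <= toleranceCents,
    lineItemSumCents,
    differenceCents,
  };
}
```

A failed check does not say which number is wrong, only that the set is inconsistent, which is exactly the signal needed to route the invoice to a human.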
The model is a commodity; the curated set of hard examples you validate against is the moat, and it is what lets you trust the output enough to put it near someone's money. ### Swiss Trading Cards: Image to Product Specs at Scale - Canonical URL: https://agenticschool.dev/builds/swiss-trading-cards - Stack: Gemini Vision, Node.js, Convex, TypeScript, Image processing Swiss Trading Cards turns photos of cards into structured product specifications ready to list. Here is how I built a reliable image-to-product pipeline. #### From a shoebox of cards to a catalogue The idea was to take photos of trading cards and automatically produce the structured product data you need to sell them: the name, the set, the condition, the attributes that matter to a buyer. Doing this by hand for a large collection is mind-numbing and error-prone, which made it a perfect candidate for an image-to-data pipeline. #### A pipeline, not a single prompt The mistake would have been to throw the whole image at a model and ask for everything at once. Instead I broke it into stages, each doing one clear job, so I could test and fix each stage independently. A clean pipeline is far easier to debug than one giant prompt that is either right or wrong with nothing in between. - Stage one identifies the card; stage two extracts attributes; stage three normalises them into the product schema. - Each stage outputs structured data the next stage consumes, so failures are localised and visible. - Ambiguous cards are flagged for review rather than guessed, because a wrong listing is worse than a missing one. #### Normalisation is where the value is Reading a card was the flashy part. The genuinely valuable part was normalisation: making sure the same set is always spelled the same way, the same condition grades map to the same values, and the output always fits the product schema exactly. Buyers and downstream systems do not care how impressive the vision step was; they care that the data is consistent.
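As an illustration of that normalisation layer, here is a minimal sketch that maps free-form condition labels onto a fixed set of values. The grade names, the alias table and the `normaliseCondition` function are assumptions for the example; the real product schema is not given in this write-up.

```typescript
// Hypothetical canonical grades; the real schema may differ.
type Condition = "mint" | "near_mint" | "lightly_played" | "heavily_played";

// Alias table mapping the spellings a vision step might emit to one canonical value.
const conditionAliases = new Map<string, Condition>([
  ["mint", "mint"],
  ["m", "mint"],
  ["near mint", "near_mint"],
  ["nm", "near_mint"],
  ["lightly played", "lightly_played"],
  ["lp", "lightly_played"],
  ["heavily played", "heavily_played"],
  ["hp", "heavily_played"],
]);

// Returns the canonical grade, or null so the card can be flagged for review
// instead of guessed.
function normaliseCondition(raw: string): Condition | null {
  // Collapse case, surrounding whitespace, and hyphen/underscore variants.
  const key = raw.trim().toLowerCase().replace(/[\s_-]+/g, " ");
  return conditionAliases.get(key) ?? null;
}
```

Returning `null` for anything unrecognised mirrors the rule that ambiguous cards are flagged rather than guessed.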
I learned that an image-to-product pipeline lives or dies on its boring normalisation layer, and that splitting the work into small, testable stages is what made the whole thing trustworthy enough to actually list real products from. ### Language Learning App: Chaining Image to Text to Audio - Canonical URL: https://agenticschool.dev/builds/language-learning-app - Stack: Gemini Vision, Translation API, Text-to-Speech, React, Node.js This app chains vision, translation and speech so you point your camera at an object and hear its name in a new language. Here is the chained-AI lesson. #### Learning by pointing your camera The app lets you point your camera at something in the real world, an apple, a chair, a street sign, and hear and read its name in the language you are learning. It is a small idea with a surprisingly motivating effect, because it ties new words to real things in front of you instead of a flashcard. #### Three AI steps in a chain Under the hood it is a chain of three models, each feeding the next. Vision identifies the object, translation turns the word into the target language, and text-to-speech reads it aloud with a decent accent. Each step is simple on its own; the product is the chain. - Image to text: a vision model names what the camera sees. - Text to text: a translation step converts the word into the target language. - Text to audio: a speech model pronounces it, so you learn how it actually sounds. #### What chains teach you about errors The hard lesson of chaining models is that errors multiply. If each step is 90 percent reliable, three steps in a row are not 90 percent reliable, they compound, and a wrong word identification at the start poisons everything after it. So the real work was making each step fail gracefully and visibly: if vision is unsure what the object is, the app says so rather than confidently teaching you the wrong word. Building this changed how I think about multi-step AI products. 
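To put a number on how errors compound: if the three steps fail independently (an assumption, since real failures are often correlated), the reliability of the whole chain is the product of the per-step reliabilities, so three steps at 90 percent land at roughly 73 percent. The helper name below is hypothetical.

```typescript
// End-to-end reliability of a chain, assuming each step fails independently.
function chainReliability(stepReliabilities: number[]): number {
  return stepReliabilities.reduce((product, p) => {
    if (!(p >= 0 && p <= 1)) {
      throw new RangeError(`Reliability must be between 0 and 1, got ${p}`);
    }
    return product * p;
  }, 1);
}

// Vision, translation and speech at 0.9 each: about 0.729 overall.
const overall = chainReliability([0.9, 0.9, 0.9]);
console.log(overall.toFixed(3));
```

The same arithmetic shows why improving the weakest early step usually pays off more than polishing the last one.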
The magic is in the chain, but the reliability is in how honestly each link admits when it is unsure, so a small early mistake does not silently become a confident wrong answer at the end. ### Maxify Audio: Cleaning Up Sound Without a Studio - Canonical URL: https://agenticschool.dev/builds/maxify-audio - Stack: Node.js, FFmpeg, Audio processing, TypeScript, REST API Maxify Audio enhances rough recordings into clean, listenable sound through an automated chain. Here is what building an audio enhancement tool taught me. #### Good-enough audio for people without a studio Maxify Audio takes a rough recording, the kind you get from a laptop mic in a normal room, and cleans it up into something that sounds professional enough to publish. The goal was never studio perfection; it was to close most of the gap automatically for people who do not own a studio or know audio engineering. #### A chain of processing steps Enhancement is a sequence: reduce the background noise, even out the volume, warm up the tone, and normalise the levels so it sits at a consistent loudness. Each step is a well-understood audio operation, and chaining them in the right order does most of the work. The art is in the order and the restraint. - Noise reduction first, so later steps are not amplifying hiss. - Level and tone adjustments to make speech clear and consistent. - A final normalisation pass so every output lands at a sensible, even loudness. #### The lesson about doing too much My early versions over-processed everything. I pushed the noise reduction and the enhancement so hard that voices came out sounding underwater and unnatural, which is worse than leaving them rough. The fix was restraint: tune for the realistic case, not the worst case, and accept that good-enough and natural beats aggressive and artificial. That is a lesson that generalises well beyond audio. 
With any automated enhancement, whether it is sound, images or text, the temptation is to crank every setting to the maximum, but the best result usually comes from a lighter touch that respects the original. Knowing when to stop turned out to be the actual skill. --- ## Resources ### Claude Code Setup Checklist - Canonical URL: https://agenticschool.dev/resources/claude-code-setup-checklist Claude Code Setup Checklist is a hands-on working document for building with AI agents in your business. It covers the purpose, when to reach for it, the concrete setup steps, the common mistakes to avoid, the security and privacy points that matter, and the next sensible action. Use it as a reusable building block inside reviews, briefings, checklists and internal playbooks, so your AI workflows become repeatable systems instead of one-off tool experiments. ### Codex Workflow Template - Canonical URL: https://agenticschool.dev/resources/codex-workflow-template Codex Workflow Template is a hands-on working document for building with AI agents in your business. It covers the purpose, when to reach for it, the concrete setup steps, the common mistakes to avoid, the security and privacy points that matter, and the next sensible action. Use it as a reusable building block inside reviews, briefings, checklists and internal playbooks, so your AI workflows become repeatable systems instead of one-off tool experiments. ### Agent Task Brief - Canonical URL: https://agenticschool.dev/resources/agent-task-brief Agent Task Brief is a hands-on working document for building with AI agents in your business. It covers the purpose, when to reach for it, the concrete setup steps, the common mistakes to avoid, the security and privacy points that matter, and the next sensible action. Use it as a reusable building block inside reviews, briefings, checklists and internal playbooks, so your AI workflows become repeatable systems instead of one-off tool experiments. 
### Spec Sheet Template - Canonical URL: https://agenticschool.dev/resources/spec-sheet-template Spec Sheet Template is a hands-on working document for building with AI agents in your business. It covers the purpose, when to reach for it, the concrete setup steps, the common mistakes to avoid, the security and privacy points that matter, and the next sensible action. Use it as a reusable building block inside reviews, briefings, checklists and internal playbooks, so your AI workflows become repeatable systems instead of one-off tool experiments. ### Model Selection Cheatsheet - Canonical URL: https://agenticschool.dev/resources/model-selection-cheatsheet Model Selection Cheatsheet is a hands-on working document for building with AI agents in your business. It covers the purpose, when to reach for it, the concrete setup steps, the common mistakes to avoid, the security and privacy points that matter, and the next sensible action. Use it as a reusable building block inside reviews, briefings, checklists and internal playbooks, so your AI workflows become repeatable systems instead of one-off tool experiments. ### OpenRouter Quickstart - Canonical URL: https://agenticschool.dev/resources/openrouter-quickstart OpenRouter Quickstart is a hands-on working document for building with AI agents in your business. It covers the purpose, when to reach for it, the concrete setup steps, the common mistakes to avoid, the security and privacy points that matter, and the next sensible action. Use it as a reusable building block inside reviews, briefings, checklists and internal playbooks, so your AI workflows become repeatable systems instead of one-off tool experiments. ### CLAUDE.md Starter - Canonical URL: https://agenticschool.dev/resources/claude-md-starter CLAUDE.md Starter is a hands-on working document for building with AI agents in your business. 
It covers the purpose, when to reach for it, the concrete setup steps, the common mistakes to avoid, the security and privacy points that matter, and the next sensible action. Use it as a reusable building block inside reviews, briefings, checklists and internal playbooks, so your AI workflows become repeatable systems instead of one-off tool experiments. ### Secrets and .gitignore Checklist - Canonical URL: https://agenticschool.dev/resources/gitignore-secrets-checklist Secrets and .gitignore Checklist is a hands-on working document for building with AI agents in your business. It covers the purpose, when to reach for it, the concrete setup steps, the common mistakes to avoid, the security and privacy points that matter, and the next sensible action. Use it as a reusable building block inside reviews, briefings, checklists and internal playbooks, so your AI workflows become repeatable systems instead of one-off tool experiments. ### Vercel Deploy Checklist - Canonical URL: https://agenticschool.dev/resources/vercel-deploy-checklist Vercel Deploy Checklist is a hands-on working document for building with AI agents in your business. It covers the purpose, when to reach for it, the concrete setup steps, the common mistakes to avoid, the security and privacy points that matter, and the next sensible action. Use it as a reusable building block inside reviews, briefings, checklists and internal playbooks, so your AI workflows become repeatable systems instead of one-off tool experiments. ### DNS and Domain Connection Guide - Canonical URL: https://agenticschool.dev/resources/dns-domain-guide DNS and Domain Connection Guide is a hands-on working document for building with AI agents in your business. It covers the purpose, when to reach for it, the concrete setup steps, the common mistakes to avoid, the security and privacy points that matter, and the next sensible action. 
Use it as a reusable building block inside reviews, briefings, checklists and internal playbooks, so your AI workflows become repeatable systems instead of one-off tool experiments. ### Clerk Auth Guide - Canonical URL: https://agenticschool.dev/resources/clerk-auth-guide Clerk Auth Guide is a hands-on working document for building with AI agents in your business. It covers the purpose, when to reach for it, the concrete setup steps, the common mistakes to avoid, the security and privacy points that matter, and the next sensible action. Use it as a reusable building block inside reviews, briefings, checklists and internal playbooks, so your AI workflows become repeatable systems instead of one-off tool experiments. ### Convex Backend Guide - Canonical URL: https://agenticschool.dev/resources/convex-backend-guide Convex Backend Guide is a hands-on working document for building with AI agents in your business. It covers the purpose, when to reach for it, the concrete setup steps, the common mistakes to avoid, the security and privacy points that matter, and the next sensible action. Use it as a reusable building block inside reviews, briefings, checklists and internal playbooks, so your AI workflows become repeatable systems instead of one-off tool experiments. ### Stripe Checkout Checklist - Canonical URL: https://agenticschool.dev/resources/stripe-checkout-checklist Stripe Checkout Checklist is a hands-on working document for building with AI agents in your business. It covers the purpose, when to reach for it, the concrete setup steps, the common mistakes to avoid, the security and privacy points that matter, and the next sensible action. Use it as a reusable building block inside reviews, briefings, checklists and internal playbooks, so your AI workflows become repeatable systems instead of one-off tool experiments. 
### Stripe Webhook Guide - Canonical URL: https://agenticschool.dev/resources/stripe-webhook-guide Stripe Webhook Guide is a hands-on working document for building with AI agents in your business. It covers the purpose, when to reach for it, the concrete setup steps, the common mistakes to avoid, the security and privacy points that matter, and the next sensible action. Use it as a reusable building block inside reviews, briefings, checklists and internal playbooks, so your AI workflows become repeatable systems instead of one-off tool experiments. ### n8n Self-Hosting Guide - Canonical URL: https://agenticschool.dev/resources/n8n-selfhosting-guide n8n Self-Hosting Guide is a hands-on working document for building with AI agents in your business. It covers the purpose, when to reach for it, the concrete setup steps, the common mistakes to avoid, the security and privacy points that matter, and the next sensible action. Use it as a reusable building block inside reviews, briefings, checklists and internal playbooks, so your AI workflows become repeatable systems instead of one-off tool experiments. ### Playwright Scraping Guide - Canonical URL: https://agenticschool.dev/resources/playwright-scraping-guide Playwright Scraping Guide is a hands-on working document for building with AI agents in your business. It covers the purpose, when to reach for it, the concrete setup steps, the common mistakes to avoid, the security and privacy points that matter, and the next sensible action. Use it as a reusable building block inside reviews, briefings, checklists and internal playbooks, so your AI workflows become repeatable systems instead of one-off tool experiments. ### Sandbox Execution Guide - Canonical URL: https://agenticschool.dev/resources/sandbox-execution-guide Sandbox Execution Guide is a hands-on working document for building with AI agents in your business. 
It covers the purpose, when to reach for it, the concrete setup steps, the common mistakes to avoid, the security and privacy points that matter, and the next sensible action. Use it as a reusable building block inside reviews, briefings, checklists and internal playbooks, so your AI workflows become repeatable systems instead of one-off tool experiments. ### API Tool Builder Template - Canonical URL: https://agenticschool.dev/resources/api-tool-builder-template API Tool Builder Template is a hands-on working document for building with AI agents in your business. It covers the purpose, when to reach for it, the concrete setup steps, the common mistakes to avoid, the security and privacy points that matter, and the next sensible action. Use it as a reusable building block inside reviews, briefings, checklists and internal playbooks, so your AI workflows become repeatable systems instead of one-off tool experiments. ### Lead Magnet and Funnel Playbook - Canonical URL: https://agenticschool.dev/resources/lead-magnet-funnel-playbook Lead Magnet and Funnel Playbook is a hands-on working document for building with AI agents in your business. It covers the purpose, when to reach for it, the concrete setup steps, the common mistakes to avoid, the security and privacy points that matter, and the next sensible action. Use it as a reusable building block inside reviews, briefings, checklists and internal playbooks, so your AI workflows become repeatable systems instead of one-off tool experiments. ### Human-in-the-Loop Checklist - Canonical URL: https://agenticschool.dev/resources/human-in-the-loop-checklist Human-in-the-Loop Checklist is a hands-on working document for building with AI agents in your business. It covers the purpose, when to reach for it, the concrete setup steps, the common mistakes to avoid, the security and privacy points that matter, and the next sensible action. 
Use it as a reusable building block inside reviews, briefings, checklists and internal playbooks, so your AI workflows become repeatable systems instead of one-off tool experiments. ### Test Stack Checklist - Canonical URL: https://agenticschool.dev/resources/test-stack-checklist Test Stack Checklist is a hands-on working document for building with AI agents in your business. It covers the purpose, when to reach for it, the concrete setup steps, the common mistakes to avoid, the security and privacy points that matter, and the next sensible action. Use it as a reusable building block inside reviews, briefings, checklists and internal playbooks, so your AI workflows become repeatable systems instead of one-off tool experiments. ### Security Hardening Checklist - Canonical URL: https://agenticschool.dev/resources/security-hardening-checklist Security Hardening Checklist is a hands-on working document for building with AI agents in your business. It covers the purpose, when to reach for it, the concrete setup steps, the common mistakes to avoid, the security and privacy points that matter, and the next sensible action. Use it as a reusable building block inside reviews, briefings, checklists and internal playbooks, so your AI workflows become repeatable systems instead of one-off tool experiments. ### GDPR Compliance Checklist - Canonical URL: https://agenticschool.dev/resources/gdpr-compliance-checklist GDPR Compliance Checklist is a hands-on working document for building with AI agents in your business. It covers the purpose, when to reach for it, the concrete setup steps, the common mistakes to avoid, the security and privacy points that matter, and the next sensible action. Use it as a reusable building block inside reviews, briefings, checklists and internal playbooks, so your AI workflows become repeatable systems instead of one-off tool experiments. 
### SEO and GEO/AEO Checklist - Canonical URL: https://agenticschool.dev/resources/seo-geo-checklist SEO and GEO/AEO Checklist is a hands-on working document for building with AI agents in your business. It covers the purpose, when to reach for it, the concrete setup steps, the common mistakes to avoid, the security and privacy points that matter, and the next sensible action. Use it as a reusable building block inside reviews, briefings, checklists and internal playbooks, so your AI workflows become repeatable systems instead of one-off tool experiments. ### Agent-First API Blueprint - Canonical URL: https://agenticschool.dev/resources/agent-first-api-blueprint Agent-First API Blueprint is a hands-on working document for building with AI agents in your business. It covers the purpose, when to reach for it, the concrete setup steps, the common mistakes to avoid, the security and privacy points that matter, and the next sensible action. Use it as a reusable building block inside reviews, briefings, checklists and internal playbooks, so your AI workflows become repeatable systems instead of one-off tool experiments. --- ## Changelog ### Claude Code Workflow Review, 12 June 2026 - Canonical URL: https://agenticschool.dev/changelog/claude-code-workflow-review-2026-06-12 Project rules, tests and review prompts should be updated regularly. For teams this means agent tasks need clear acceptance criteria, small diffs and a review loop, so speed does not come at the cost of maintainability, security and traceability. Treat CLAUDE.md as a living document and promote every recurring correction into a permanent rule. ### Codex Business Workflows, 12 June 2026 - Canonical URL: https://agenticschool.dev/changelog/codex-business-workflows-2026-06-12 Issue templates and acceptance criteria matter more than ever. 
The more clearly a business problem is translated into expected outputs, tests and boundaries, the more reliably Codex can fold small changes, code reviews and documentation tasks into productive workflows. Keep an AGENTS.md so Codex follows your conventions automatically. ### Convex and Clerk Template, 12 June 2026 - Canonical URL: https://agenticschool.dev/changelog/convex-clerk-template-2026-06-12 Roles belong in Convex, Clerk supplies the identity. This keeps admin and moderation rights verifiable on the server, while login, session and profile management run cleanly through Clerk. The separation reduces risk for later team and membership features, and it scales as your product grows. ### Model Pricing and Tiers, 12 June 2026 - Canonical URL: https://agenticschool.dev/changelog/model-pricing-shift-2026-06-12 Matching task difficulty to model tier remains the clearest lever on both your AI bill and your output quality. Routing the easy 80 percent of calls to a small model and reserving a large model for genuinely hard reasoning can cut costs by an order of magnitude while improving reliability. Keep switching cheap with OpenRouter so you can always move to a better fit. ### Vercel Speed Insights, 12 June 2026 - Canonical URL: https://agenticschool.dev/changelog/vercel-speed-insights-2026-06-12 Heavy embeds and admin bundles must stay separate from public pages. Core Web Vitals stay stable for SEO-heavy lessons only when third-party media loads lazily, admin code never lands in public routes, and layout boxes have fixed dimensions. Treat performance as part of shipping, not an afterthought.