What you learn
- The three model tiers (small, mid, large) and how Haiku, Sonnet, Opus, GPT and Gemini map onto them
- How to read benchmarks and pricing honestly instead of chasing leaderboards
- Where to get strong models cheaply or free: OpenRouter, free Gemini credits, and why huge context models disappoint
Summary
Every model provider ships a family of models, not one model. They come in tiers: small and fast, mid and balanced, large and smart. Once you see the tiers instead of the brand names, choosing becomes simple. This lesson gives you a decision rule you can apply to any new model that launches, plus the practical tricks to get top models cheaply or for free.
What you will learn
You will learn the three-tier mental model, how Claude (Haiku, Sonnet, Opus), OpenAI GPT and Google Gemini line up inside it, how to treat benchmarks with healthy suspicion, and how to access strong models without paying full sticker price using OpenRouter and free Gemini credits.
Prerequisites
The previous lesson on tokens and context. You need to be comfortable with the idea that price is quoted per million tokens and that bigger context is not automatically better, because both ideas drive model choice.
Tokens are the chunks of text AI models read and are billed in. Learn what a token is, why it matters for cost, and how it differs from a password token.
An API is a way for two programs to talk to each other. Learn what an API is, how it works, and why it matters for building with AI.
A .env file stores secrets like API keys outside your code so they never get published. Learn what it is, how it works and how to keep it safe.
The problem
Beginners either default to the single model they have heard of, or they chase whatever topped a leaderboard last week. Both are expensive mistakes. Using a flagship model to reformat a list is like hiring a surgeon to apply a plaster. Using a tiny model for hard architectural reasoning produces confident nonsense. The skill is matching task difficulty to model tier, and that skill outlives any individual model name.
The three tiers
Forget brand loyalty and think in tiers. Small models are fast and cheap, great for classification, extraction, simple rewrites and high-volume jobs. Mid models are the balanced workhorse for most real coding and writing. Large models are slower and pricier but reason far better on genuinely hard problems. Almost every provider mirrors this structure, so once you internalise it you can place any new model instantly.
- Small / fast: Claude Haiku, GPT small tier, Gemini Flash. Use for volume, extraction, routing, cheap drafts.
- Mid / balanced: Claude Sonnet, GPT mid tier, Gemini Pro. Your daily driver for coding and serious writing.
- Large / smart: Claude Opus, GPT large reasoning tier, Gemini Ultra/Pro top tier. Use for hard reasoning, tricky debugging, architecture.
- Rule of thumb: start one tier lower than you think, and only move up if the output is genuinely not good enough.
How to read benchmarks honestly
Benchmarks are useful and also routinely misleading. A model can top a coding benchmark and still feel worse in your actual project, because benchmarks measure narrow tasks under ideal conditions and providers optimise hard for them. Treat benchmarks as a rough filter, not a verdict. The only benchmark that matters is your own: take three real tasks from your work, run them through two or three models, and judge the output yourself. Pay attention to consistency, not just peak performance, because a model you ship with needs to be reliably good, not occasionally brilliant.
Pricing and the real cost picture
Price is quoted per million input and output tokens, and the spread between tiers is large - often 10x or more between a small and a large model. Because output costs several times more than input, verbose models and chatty prompts cost more than you expect. The practical move is to route by difficulty: cheap model for the 80 percent of easy calls, expensive model only for the hard 20 percent. On a workflow at scale this single decision often matters more than which provider you chose.
Getting strong models cheaply or free
You do not have to pay full price to start. There are three reliable routes in 2026, and a beginner should know all of them before committing budget.
- OpenRouter: a single account and API key that gives you access to almost every model (Claude, GPT, Gemini, open models) through one endpoint, with transparent per-token pricing and easy model switching. Ideal for comparing models without juggling five accounts.
- Free Gemini credits: Google routinely offers a generous free tier and credits through its AI Studio, which is a genuinely strong, low-cost way to get a capable mid model for experiments and low-volume tools.
- Provider free tiers and trials: most providers give you some free usage to evaluate. Use it deliberately to run your own three-task benchmark.
Why 1M-context models disappoint
You will see models advertising 1,000,000 token context windows and assume they are strictly better. In practice they often disappoint, for the exact reason from the previous lesson: the performance cliff. A model can technically accept a million tokens and still answer worse than a focused prompt because quality degrades as the window fills. Treat a huge context window as occasional insurance for a genuinely large document, not as permission to stop curating context. Most days, a mid model with a tight prompt beats a giant-context model with a sloppy one.
Step by step: choose a model for a real task
Make this concrete with one task from your own work. The goal is to practise the decision rule, not to find a permanent answer.
- Write down the task and rate its difficulty: simple, normal, or genuinely hard.
- Pick the matching tier: small, mid, or large.
- Run it through two models in that tier (use OpenRouter to switch quickly).
- Judge the output yourself on quality and consistency, then note which you would actually ship and why.
Typical mistakes
The big ones: always reaching for the flagship and burning money on easy tasks; trusting a leaderboard over your own three-task test; assuming a bigger context window means a smarter model; and locking into one provider so you never notice when a competitor ships something better for your use case. OpenRouter exists precisely so switching costs stay low.
Business ROI
Model selection is one of the clearest levers on your AI bill and your output quality. Routing easy work to a small model and reserving the large model for hard reasoning can cut costs by an order of magnitude while improving reliability, because each task gets the right tool. For a founder, the discipline of "match tier to difficulty, benchmark on your own tasks, keep switching cheap" is worth more than any single model choice.
Checklist
You are ready to move on when you can confidently do the following without second-guessing the brand names.
- Place any new model into small, mid or large from its spec and price.
- Explain why you would not use a flagship for a simple extraction job.
- Run a three-task personal benchmark instead of trusting a leaderboard.
- Name where to get a strong model cheaply or free (OpenRouter, Gemini credits).
Resources
Set up an OpenRouter account now so model switching is friction-free for the rest of the course, and grab free Gemini credits from Google AI Studio for low-cost experiments. You will use both repeatedly. The next lesson moves from the model to the tool that wraps it.
Your task
Create an OpenRouter account, then run one real task from your work through a small, a mid and a large model. Write a two-line note on which tier was actually good enough. That note is your first piece of real, personal benchmark data and it will guide your model choices for months.
Next lesson
A model on its own just talks. To make it do work - read files, run commands, edit code - you wrap it in a harness. The next lesson explains what a harness is and compares Claude Code, Codex, Pi and OpenCode so you know which tool to reach for.

Comments
Loading comments.
Post a comment