What you learn

The three model tiers (small, mid, large) and how Haiku, Sonnet, Opus, GPT and Gemini map onto them
How to read benchmarks and pricing honestly instead of chasing leaderboards
Where to get strong models cheaply or free: OpenRouter, free Gemini credits, and why huge context models disappoint

Summary

Every model provider ships a family of models, not one model. They come in tiers: small and fast, mid and balanced, large and smart. Once you see the tiers instead of the brand names, choosing becomes simple. This lesson gives you a decision rule you can apply to any new model that launches, plus the practical tricks to get top models cheaply or for free.

What you will learn

You will learn the three-tier mental model, how Claude (Haiku, Sonnet, Opus), OpenAI GPT and Google Gemini line up inside it, how to treat benchmarks with healthy suspicion, and how to access strong models without paying full sticker price using OpenRouter and free Gemini credits.

Prerequisites

The previous lesson on tokens and context. You need to be comfortable with the idea that price is quoted per million tokens and that bigger context is not automatically better, because both ideas drive model choice.

Fundamentals

Fundamental

What Are Tokens in AI?

Tokens are the chunks of text AI models read and are billed in. Learn what a token is, why it matters for cost, and how it differs from a password token.

Fundamental

What Is an API? A Plain-Language Guide

An API is a way for two programs to talk to each other. Learn what an API is, how it works, and why it matters for building with AI.

Fundamental

What Is a .env File?

A .env file stores secrets like API keys outside your code so they never get published. Learn what it is, how it works and how to keep it safe.

The problem

Beginners either default to the single model they have heard of, or they chase whatever topped a leaderboard last week. Both are expensive mistakes. Using a flagship model to reformat a list is like hiring a surgeon to apply a plaster. Using a tiny model for hard architectural reasoning produces confident nonsense. The skill is matching task difficulty to model tier, and that skill outlives any individual model name.

The three tiers

Forget brand loyalty and think in tiers. Small models are fast and cheap, great for classification, extraction, simple rewrites and high-volume jobs. Mid models are the balanced workhorse for most real coding and writing. Large models are slower and pricier but reason far better on genuinely hard problems. Almost every provider mirrors this structure, so once you internalise it you can place any new model instantly.

Small / fast: Claude Haiku, GPT small tier, Gemini Flash. Use for volume, extraction, routing, cheap drafts.
Mid / balanced: Claude Sonnet, GPT mid tier, Gemini Pro. Your daily driver for coding and serious writing.
Large / smart: Claude Opus, GPT large reasoning tier, Gemini Ultra/Pro top tier. Use for hard reasoning, tricky debugging, architecture.
Rule of thumb: start one tier lower than you think, and only move up if the output is genuinely not good enough.

How to read benchmarks honestly

Benchmarks are useful and also routinely misleading. A model can top a coding benchmark and still feel worse in your actual project, because benchmarks measure narrow tasks under ideal conditions and providers optimise hard for them. Treat benchmarks as a rough filter, not a verdict. The only benchmark that matters is your own: take three real tasks from your work, run them through two or three models, and judge the output yourself. Pay attention to consistency, not just peak performance, because a model you ship with needs to be reliably good, not occasionally brilliant.

Pricing and the real cost picture

Price is quoted per million input and output tokens, and the spread between tiers is large - often 10x or more between a small and a large model. Because output costs several times more than input, verbose models and chatty prompts cost more than you expect. The practical move is to route by difficulty: cheap model for the 80 percent of easy calls, expensive model only for the hard 20 percent. On a workflow at scale this single decision often matters more than which provider you chose.

Getting strong models cheaply or free

You do not have to pay full price to start. There are three reliable routes in 2026, and a beginner should know all of them before committing budget.

OpenRouter: a single account and API key that gives you access to almost every model (Claude, GPT, Gemini, open models) through one endpoint, with transparent per-token pricing and easy model switching. Ideal for comparing models without juggling five accounts.
Free Gemini credits: Google routinely offers a generous free tier and credits through its AI Studio, which is a genuinely strong, low-cost way to get a capable mid model for experiments and low-volume tools.
Provider free tiers and trials: most providers give you some free usage to evaluate. Use it deliberately to run your own three-task benchmark.

Why 1M-context models disappoint

You will see models advertising 1,000,000 token context windows and assume they are strictly better. In practice they often disappoint, for the exact reason from the previous lesson: the performance cliff. A model can technically accept a million tokens and still answer worse than a focused prompt because quality degrades as the window fills. Treat a huge context window as occasional insurance for a genuinely large document, not as permission to stop curating context. Most days, a mid model with a tight prompt beats a giant-context model with a sloppy one.

Step by step: choose a model for a real task

Make this concrete with one task from your own work. The goal is to practise the decision rule, not to find a permanent answer.

Write down the task and rate its difficulty: simple, normal, or genuinely hard.
Pick the matching tier: small, mid, or large.
Run it through two models in that tier (use OpenRouter to switch quickly).
Judge the output yourself on quality and consistency, then note which you would actually ship and why.

Typical mistakes

The big ones: always reaching for the flagship and burning money on easy tasks; trusting a leaderboard over your own three-task test; assuming a bigger context window means a smarter model; and locking into one provider so you never notice when a competitor ships something better for your use case. OpenRouter exists precisely so switching costs stay low.

Business ROI

Model selection is one of the clearest levers on your AI bill and your output quality. Routing easy work to a small model and reserving the large model for hard reasoning can cut costs by an order of magnitude while improving reliability, because each task gets the right tool. For a founder, the discipline of "match tier to difficulty, benchmark on your own tasks, keep switching cheap" is worth more than any single model choice.

Checklist

You are ready to move on when you can confidently do the following without second-guessing the brand names.

Place any new model into small, mid or large from its spec and price.
Explain why you would not use a flagship for a simple extraction job.
Run a three-task personal benchmark instead of trusting a leaderboard.
Name where to get a strong model cheaply or free (OpenRouter, Gemini credits).

Resources

Set up an OpenRouter account now so model switching is friction-free for the rest of the course, and grab free Gemini credits from Google AI Studio for low-cost experiments. You will use both repeatedly. The next lesson moves from the model to the tool that wraps it.

Your task

Create an OpenRouter account, then run one real task from your work through a small, a mid and a large model. Write a two-line note on which tier was actually good enough. That note is your first piece of real, personal benchmark data and it will guide your model choices for months.

Next lesson

A model on its own just talks. To make it do work - read files, run commands, edit code - you wrap it in a harness. The next lesson explains what a harness is and compares Claude Code, Codex, Pi and OpenCode so you know which tool to reach for.

Comments

Loading comments.

Choosing Your Model: Haiku vs Sonnet vs Opus, GPT, Gemini and Benchmarks

What you learn

Summary

What you will learn

Prerequisites

The problem

The three tiers

How to read benchmarks honestly

Pricing and the real cost picture

Getting strong models cheaply or free

Why 1M-context models disappoint

Step by step: choose a model for a real task

Typical mistakes

Business ROI

Checklist

Resources

Your task

Next lesson

Comments

Ready to put AI to work as a real workflow?

What you learn

Summary

What you will learn

Prerequisites

The problem

The three tiers

How to read benchmarks honestly

Pricing and the real cost picture

Getting strong models cheaply or free

Why 1M-context models disappoint

Step by step: choose a model for a real task

Typical mistakes

Business ROI

Checklist

Resources

Your task

Next lesson

Comments

Ready to put AI to work as a real workflow?

Better AI workflows, once a week.