---
title: "Building Your Own AI Tools with APIs"
description: "Build custom AI tools on top of model APIs, including image-to-structured-data workflows, instead of buying SaaS"
type: "lesson"
locale: "en"
course: "Automation and Agentic Systems"
number: "4.4"
canonical: "https://agenticschool.dev/courses/automation-agentic-systems/building-your-own-ai-tools-with-apis"
datePublished: "2026-06-12"
dateModified: "2026-06-12"
---

# Building Your Own AI Tools with APIs

- Course: Automation and Agentic Systems
- Lesson: 4.4
- Duration: 28 min
- Level: advanced
- Status: published
- Canonical URL: https://agenticschool.dev/courses/automation-agentic-systems/building-your-own-ai-tools-with-apis
- Locale: en

> Build custom AI tools on top of model APIs, including image-to-structured-data workflows, instead of buying SaaS

## Summary

You do not have to wait for someone to build the tool you need. With model APIs you build your own, often in an afternoon. This lesson shows how to call Gemini and Claude directly, get structured JSON back, and use vision to turn a photo into a database record - illustrated by two real founder tools: invoice assignment and a Swiss trading-cards cataloguer.

## What you learn

- Calling Gemini and Claude APIs directly, including the free Gemini tier
- Forcing structured JSON output so a model response becomes a database record
- Two founder case studies: invoice assignment and Swiss trading cards, photo in, record out

## Summary

The biggest shift in building software is this: when the tool you need does not exist, you build it the same afternoon. Model APIs let you call an LLM from your own code, with your own prompt, and - crucially - get structured data back instead of a wall of prose. Once a model can reliably turn an image or a document into a clean JSON record, a whole class of manual data-entry work disappears. This lesson teaches the direct API call, the structured-output trick that makes the response usable, and two real tools the founder of this school built from exactly these pieces.

## What you will learn

You will learn to call Gemini and Claude directly with a minimal fetch request, to force the model to return JSON matching a schema so the output drops straight into a database, to use vision so a photo becomes structured data, and to recognise when building a small internal tool beats paying for SaaS. The two founder case studies make it concrete: photo of an invoice in, assigned record out; photo of a trading card in, catalogued record out.

## Prerequisites

Courses 1 to 3. You need the model-selection lesson from Course 1 (tool cost depends entirely on which model you call), the secrets discipline from Course 3 (the API key never touches client code), and a database to write into - Convex from Course 3 is perfect. The Fundamentals page on what an API is covers the request basics if you need them.

## The problem

Businesses pay monthly for SaaS tools that do one narrow thing - read receipts, tag images, extract fields from PDFs - and still do not quite fit their workflow. Meanwhile the same job is a single API call away. The blocker has never been capability; it is that people do not realise how little code stands between "I have a photo of an invoice" and "the invoice is in my accounting system, assigned to the right project". This lesson removes that blocker by showing the whole path end to end.

## APIs as building blocks

Calling a model API directly gives you total control: your prompt, your model, your output format, no UI in the way. It is also less code than people expect. A request is a POST with your API key in a header and a JSON body describing what you want. Here is a minimal call to Gemini and the same idea against Claude, so you can see both. Google offers a genuinely generous free Gemini tier through AI Studio, which makes it the natural place to prototype vision tools without spending anything.

```typescript
// Minimal Gemini call. The key lives in an env var, never in client code.
const res = await fetch(
  'https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent',
  {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'x-goog-api-key': process.env.GEMINI_API_KEY!,
    },
    body: JSON.stringify({
      contents: [{ parts: [{ text: 'Summarise this in one sentence: ...' }] }],
    }),
  },
)
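// Fail loudly on HTTP errors instead of reading fields off an error payload.
if (!res.ok) throw new Error(`Gemini request failed: ${res.status}`)
const data = await res.json()
// The generated text sits in the first part of the first candidate.
console.log(data.candidates[0].content.parts[0].text)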
```
A minimal Gemini API call. Model names and exact paths change - confirm against the current Google AI docs.

Claude works the same way: a POST to the Anthropic messages endpoint with your key in an x-api-key header and a messages array in the body. The provider differs, the shape is the same. Pick the model with the model-selection rule from Course 1 - a fast, cheap model for high-volume extraction, a stronger one only when the reasoning is genuinely hard.
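
For comparison, the Claude request can be sketched as below. This is a minimal sketch, not an official client: the helper names `buildClaudeRequest` and `firstText` are invented for this example, the model id is a placeholder you pass in, and the `anthropic-version` value is an assumption based on the value in common use at the time of writing. Confirm all of it against the current Anthropic docs.

```typescript
// Hypothetical helper: builds the fetch arguments for Anthropic's Messages API.
// The model id is passed in because model names change over time.
interface ClaudeRequest {
  url: string
  init: { method: string; headers: Record<string, string>; body: string }
}

function buildClaudeRequest(apiKey: string, model: string, prompt: string): ClaudeRequest {
  return {
    url: 'https://api.anthropic.com/v1/messages',
    init: {
      method: 'POST',
      headers: {
        'content-type': 'application/json',
        'x-api-key': apiKey, // server-side only, read from an env var
        'anthropic-version': '2023-06-01', // assumption: check the docs for the current value
      },
      body: JSON.stringify({
        model,
        max_tokens: 1024, // upper bound on the length of the reply
        messages: [{ role: 'user', content: prompt }],
      }),
    },
  }
}

// Hypothetical helper: returns the first text block of a Messages API response,
// or null when the response carries no text block.
function firstText(response: { content?: Array<{ type: string; text?: string }> }): string | null {
  const block = response.content?.find((b) => b.type === 'text')
  return block?.text ?? null
}
```

On the server you would call `fetch(req.url, req.init)` with the result of `buildClaudeRequest`, then pass the parsed JSON to `firstText`.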

## Structured output: the trick that makes it useful

A model that replies in prose is not a tool - you cannot put a paragraph into a database column. The trick is to demand structured output: give the model a JSON schema and require it to return data matching that schema exactly. Modern APIs support this directly (a response schema or structured-output mode), and the result is an object with a predictable shape that you can validate and insert. This is what turns "the model said something about the invoice" into "the invoice row has supplier, amount, currency, date and projectId". Always validate the returned JSON against your schema (Zod from your stack is ideal) before trusting it, because even in structured-output mode a model can occasionally drift.

```typescript
import { z } from 'zod'

// The exact shape you want back - this IS your database record.
const InvoiceSchema = z.object({
  supplier: z.string(),
  invoiceNumber: z.string(),
  amount: z.number(),
  currency: z.string(),
  issueDate: z.string(), // ISO date
  projectId: z.string().nullable(),
})

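// Tell the model to return ONLY JSON matching this schema, then validate.
// modelJsonString is the raw text of the model response; this is a made-up sample.
const modelJsonString =
  '{"supplier":"Acme AG","invoiceNumber":"2026-001","amount":120.5,"currency":"CHF","issueDate":"2026-06-01","projectId":null}'
// parse() throws if the JSON does not match the schema.
const parsed = InvoiceSchema.parse(JSON.parse(modelJsonString))
// parsed is now a typed, validated record ready to insert. No prose.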
```
Define the record shape with Zod, instruct the model to return matching JSON, and validate before insert. Validation catches the rare drift.
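
On the request side, Gemini's structured-output mode is configured through `generationConfig`. The sketch below assumes the `responseMimeType` and `responseSchema` fields of the REST API as documented at the time of writing. The helper name `buildInvoiceExtractionBody` is invented here, and the schema mirrors the Zod shape above by hand. Treat it as a starting point and confirm the field names against the current Google AI docs.

```typescript
// Hand-written mirror of InvoiceSchema in the schema format Gemini accepts.
const invoiceResponseSchema = {
  type: 'OBJECT',
  properties: {
    supplier: { type: 'STRING' },
    invoiceNumber: { type: 'STRING' },
    amount: { type: 'NUMBER' },
    currency: { type: 'STRING' },
    issueDate: { type: 'STRING' }, // ISO date, checked later by your validator
    projectId: { type: 'STRING', nullable: true },
  },
  required: ['supplier', 'invoiceNumber', 'amount', 'currency', 'issueDate'],
}

// Hypothetical helper: request body asking for JSON that matches the schema.
function buildInvoiceExtractionBody(documentText: string) {
  return {
    contents: [{ parts: [{ text: 'Extract the invoice fields from:\n' + documentText }] }],
    generationConfig: {
      responseMimeType: 'application/json',
      responseSchema: invoiceResponseSchema,
    },
  }
}
```

The body goes into the same `fetch` call as the minimal example; the text that comes back should then be JSON that you still pass through your validator.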

## Vision: a photo in, a record out

The same API accepts images, not just text. Vision models like Gemini read a picture and, combined with the structured-output trick, return a clean record describing what they see. You send the image bytes alongside your instruction and schema, and you get structured data back. This is the move that automates physical-world data entry: point a phone at a document or an object, and a database row appears. The model does the reading; your schema does the structuring; your code does the inserting. Three steps, and a task that used to be a person typing for hours becomes a photo and a webhook.
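
Sending the image bytes can be sketched like this. It assumes the Gemini REST convention of an `inline_data` part carrying a base64 string and a MIME type; `toBase64` and `buildVisionBody` are names invented for the example. Confirm the part format against the current Google AI docs before relying on it.

```typescript
// Encode raw bytes as base64 without depending on Node-only APIs.
function toBase64(bytes: Uint8Array): string {
  let binary = ''
  for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i])
  return btoa(binary)
}

// Hypothetical helper: one image part plus one instruction part,
// with the response requested as JSON.
function buildVisionBody(image: Uint8Array, mimeType: string, instruction: string) {
  return {
    contents: [
      {
        parts: [
          { inline_data: { mime_type: mimeType, data: toBase64(image) } },
          { text: instruction },
        ],
      },
    ],
    generationConfig: { responseMimeType: 'application/json' },
  }
}
```

In a real tool the bytes would come from an upload or a file read, and the instruction would name the fields you want back.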

## Founder case study: invoice assignment

Here is a real one we built. A business was drowning in supplier invoices that each had to be read, have their fields extracted, and - the tedious part - be assigned to the correct internal project before going into the accounting system. We built a small tool: drop an invoice photo or PDF in, a vision model extracts supplier, number, amount, currency and date into the exact schema above, and a second step matches it to the right project using the line items and supplier history. A human still approves edge cases (more on that in the human-in-the-loop lesson), but the reading and assigning that used to eat hours a week now happens in seconds. No SaaS subscription, no per-document fee, total control over the logic, and it fits the business exactly because the business defined the schema.
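
The second step, matching an extracted invoice to a project, could be sketched as below. The tool's actual matching logic is not shown in this lesson, so this is an illustrative heuristic only: it picks the project a supplier was most often assigned to in the past and hands anything uncertain to a human. The names `PastAssignment` and `suggestProject` and the `minShare` threshold are all invented for the example.

```typescript
interface PastAssignment {
  supplier: string
  projectId: string
}

// Returns the supplier's most frequent past project, or null when the
// history is empty or too mixed to trust (the human-approval path).
function suggestProject(
  supplier: string,
  history: PastAssignment[],
  minShare = 0.8, // assumed threshold: share of past invoices on one project
): string | null {
  const counts: Record<string, number> = {}
  let total = 0
  for (const h of history) {
    if (h.supplier.toLowerCase() !== supplier.toLowerCase()) continue
    counts[h.projectId] = (counts[h.projectId] ?? 0) + 1
    total++
  }
  let best: string | null = null
  let bestCount = 0
  for (const projectId of Object.keys(counts)) {
    const count = counts[projectId] ?? 0
    if (count > bestCount) {
      best = projectId
      bestCount = count
    }
  }
  // With no matching history, best stays null and we skip the division.
  return best !== null && bestCount / total >= minShare ? best : null
}
```

Returning `null` rather than a low-confidence guess keeps the edge cases visible to the person who approves them.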

## Founder case study: Swiss trading cards

The second tool is more fun and makes the same point. We had a large collection of Swiss trading cards to catalogue - each needs its player or subject, set, year and condition recorded, which by hand is mind-numbing. The tool is almost embarrassingly simple: photograph a card, a vision model returns a structured record (name, set, year, estimated condition) matching a schema, and it lands in a database with the image attached. What would have been days of manual entry became an afternoon of taking photos. The lesson is not about trading cards; it is that "image to structured database record" is a universal pattern. Invoices, cards, inventory, business cards, receipts, equipment serial plates - the same three steps apply to all of them.

## Build, do not buy

Both case studies replaced a SaaS purchase with an internal tool, and that is the strategic point. When a job is narrow and specific to your business, a small API-backed tool you own usually beats a generic product you rent. You get an exact fit, no per-seat or per-document fees, full control of the data, and the ability to change the logic the moment your process changes. This is not "build everything" - use great SaaS for commodity needs. It is "for the narrow, repetitive, business-specific data jobs, a fifty-line tool on a model API often wins".

- Build when the job is narrow, specific to your business, and high-volume enough that per-unit SaaS fees add up.
- Buy when the need is generic, the SaaS fit is good, and you would be reinventing a mature product.
- Own your data and schema. A tool you built bends to your process; a tool you rent makes your process bend to it.
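
The build-or-buy rules above often come down to simple arithmetic. The sketch below compares the monthly cost of both options. `CostInputs` and `monthlyCosts` are names invented here, and any numbers you feed in are your own estimates, not vendor quotes.

```typescript
interface CostInputs {
  docsPerMonth: number
  saasBaseFee: number // flat monthly subscription
  saasFeePerDoc: number // per-document fee on top
  modelCostPerDoc: number // API usage per document
  buildHours: number // one-off effort to build the tool
  hourlyRate: number // what an hour of your time is worth
  amortiseMonths: number // period over which to spread the build cost
}

// Monthly cost of renting versus owning, with the build effort spread
// evenly over the amortisation period.
function monthlyCosts(c: CostInputs): { buy: number; build: number } {
  const buy = c.saasBaseFee + c.saasFeePerDoc * c.docsPerMonth
  const build =
    (c.buildHours * c.hourlyRate) / c.amortiseMonths + c.modelCostPerDoc * c.docsPerMonth
  return { buy, build }
}
```

The comparison ignores maintenance time and the value of an exact fit, so treat the result as one input to the decision rather than the answer.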

## Typical mistakes

The common ones: putting the API key in client-side code where anyone can steal it (it belongs in a server env var, always); asking for prose and then trying to parse it with fragile string matching instead of demanding schema-validated JSON; skipping validation and inserting a malformed record into your database; using an expensive flagship model for simple high-volume extraction when a cheap fast model is plenty; and buying SaaS for a job a fifty-line internal tool would do better and cheaper.

## Business ROI

This is the lesson where AI stops being a chat toy and starts replacing line items on your invoice and hours on your calendar. An image-to-record tool can eliminate a part-time data-entry role, and because you own it the marginal cost per document is model usage alone, typically a fraction of a cent with a cheap, fast model, rather than a SaaS subscription. The founder tools above each took an afternoon to build and saved recurring hours every week. For a small business, the ability to build the exact tool you need on demand is a structural advantage that competitors who only buy SaaS cannot match.

## Checklist

You are ready to move on when each of these is true, because the next lessons build funnels and feedback loops on top of tools like these.

- Make a minimal API call to Gemini or Claude with the key safely in an env var.
- Force structured JSON output and validate it with a schema before use.
- Turn a photo into a structured database record with a vision model.
- Decide, for a real job, whether to build an internal tool or buy SaaS.

## Resources

Use the free Gemini tier in Google AI Studio to prototype vision tools at no cost, and keep the Anthropic and Google AI docs handy because model names and the structured-output API surface change. Zod from your existing stack is your validation layer. The /builds case studies for the invoice automation and Swiss trading cards tools go deeper on each if you want the full story.

## Your task

Pick one repetitive data-entry task in your work that starts with an image or document. Build a tiny tool: take the image, send it to Gemini with a Zod schema, validate the JSON, and log the record. You do not need a UI - a script that prints the structured record is proof. Note how long it took versus what the manual task costs you each week.
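
The last step of the task, turning the raw API response into a checked record, can be sketched as follows. To keep the example dependency-free it uses a hand-written check in place of Zod; in your own stack a Zod schema's `safeParse` does the same job. `CardRecord`, `isCardRecord` and `recordFromGeminiResponse` are hypothetical names, and the response shape assumed is the `candidates[0].content.parts[0].text` path used in the minimal Gemini call.

```typescript
interface CardRecord {
  name: string
  set: string
  year: number
  condition: string
}

// Dependency-free stand-in for a Zod schema check.
function isCardRecord(value: unknown): value is CardRecord {
  if (typeof value !== 'object' || value === null) return false
  const v = value as Record<string, unknown>
  return (
    typeof v.name === 'string' &&
    typeof v.set === 'string' &&
    typeof v.year === 'number' &&
    Number.isInteger(v.year) &&
    typeof v.condition === 'string'
  )
}

// Returns a validated record, or null so the caller can retry or flag for review.
function recordFromGeminiResponse(response: unknown): CardRecord | null {
  const text = (response as any)?.candidates?.[0]?.content?.parts?.[0]?.text
  if (typeof text !== 'string') return null
  try {
    const parsed: unknown = JSON.parse(text)
    return isCardRecord(parsed) ? parsed : null
  } catch {
    return null // the model returned something that is not JSON
  }
}
```

A script that calls the API, passes the result through a function like this, and logs the record is enough to complete the task.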

## Next lesson

Tools and automations need people to find them. The next lesson covers the marketing plumbing: lead magnets, capture forms, funnels and the double opt-in email rules you must follow in the EU and Switzerland.

## Transcript

This lesson is a written, text-first guide. You do not have to wait for someone to build the tool you need. With model APIs you build your own, often in an afternoon. This lesson shows how to call Gemini and Claude directly, get structured JSON back, and use vision to turn a photo into a database record - illustrated by two real founder tools: invoice assignment and a Swiss trading-cards cataloguer. You will build custom AI tools on top of model APIs, including image-to-structured-data workflows, instead of buying SaaS. Work through the sections in order, try the task at the end in a real project, and move on once it works for you. There is no video required - everything you need is in the written steps above.
