The problem with finding footage
When you are editing a video, finding the right b-roll clip is a slog: you type keywords into YouTube, scrub through results, refine, and repeat. B-Rolls Finder was my attempt to replace that with a conversation. You describe the mood and content you want, and it goes and finds candidates.
Chat as the interface, the API as the engine
Under the hood it is the YouTube Data API doing the searching, but the interface is a chat box. The LLM turns a loose human request like "calm aerial shots of a city at dawn" into precise queries, runs them, and presents the results conversationally so you can refine in plain language.
- You describe what you want in words; the model turns that into real search queries.
- Results come back as a short, scannable shortlist instead of an endless scroll.
- You refine by replying, the same way you would tell a human assistant "more like the second one".
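Here is a minimal sketch of the search half, in Python. It assumes the model has already turned the loose request into a structured plan. The `QueryPlan` dataclass is hypothetical and could, for example, be filled in through function calling. The code only builds a YouTube Data API v3 `search.list` request URL and renders the JSON response as a numbered shortlist. The endpoint and the `part`, `q`, `type`, `videoDuration`, `maxResults` and `key` parameters are part of the public API. Everything else is illustrative, not the project's actual code.

```python
from dataclasses import dataclass
from urllib.parse import urlencode

# YouTube Data API v3 search endpoint.
SEARCH_URL = "https://www.googleapis.com/youtube/v3/search"


@dataclass
class QueryPlan:
    """Hypothetical structured output the LLM fills in from a loose request."""
    keywords: str
    video_duration: str = "any"  # API accepts "any", "short", "medium", "long"
    max_results: int = 5         # search.list accepts 0 to 50


def build_search_url(plan: QueryPlan, api_key: str) -> str:
    """Turn the model's plan into a search.list request URL."""
    params = {
        "part": "snippet",
        "type": "video",  # videoDuration requires type=video
        "q": plan.keywords,
        "videoDuration": plan.video_duration,
        "maxResults": plan.max_results,
        "key": api_key,
    }
    return f"{SEARCH_URL}?{urlencode(params)}"


def to_shortlist(response: dict) -> list[str]:
    """Render a search.list JSON response as a short, numbered list."""
    lines = []
    for n, item in enumerate(response.get("items", []), start=1):
        video_id = item["id"]["videoId"]
        snippet = item["snippet"]
        lines.append(
            f"{n}. {snippet['title']} ({snippet['channelTitle']}) "
            f"https://www.youtube.com/watch?v={video_id}"
        )
    return lines
```

For "calm aerial shots of a city at dawn", the model might produce something like `QueryPlan(keywords="aerial city skyline sunrise drone", video_duration="short")`. Fetching that URL with any HTTP client and passing the parsed JSON to `to_shortlist` gives the list shown in the chat. Keeping the numbered list in the conversation is what lets a reply like "more like the second one" refer to a concrete result.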
The lesson about good interfaces
The API was the easy part; the win was the interface. The same data from the same YouTube search felt completely different when it was wrapped in a conversation instead of a search box. It reminded me that much of the value in AI products right now is not new capability but a better interface to capability that already exists.
I also ran into the practical realities of working with a third-party API: quotas, rate limits, and the need to cache. I had read about all of them in the abstract and only really understood them once they bit me. Respecting someone else's API is part of being a good citizen, and part of not getting cut off.
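Here is a sketch of the caching and quota side. It assumes a `fetch` callable that performs the actual HTTP request, which is not shown. `CachedSearch` is a hypothetical wrapper that does two things. It serves repeated queries from an in-memory cache with a time-to-live. It also refuses to call the API once a local quota budget is spent. The 100-unit cost per `search.list` call and the 10,000-unit default daily allocation reflect Google's published quota figures at the time of writing. Treat both as values to check against your own project. Resetting the budget each day and backing off on rate-limit errors are left out.

```python
import time
from typing import Callable

SEARCH_COST_UNITS = 100  # quota cost of one search.list call (check current docs)


class CachedSearch:
    """Hypothetical wrapper: TTL cache plus a local quota budget around a fetch function."""

    def __init__(
        self,
        fetch: Callable[[dict], dict],
        ttl_seconds: float = 3600.0,
        quota_budget: int = 10_000,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch   # performs the real HTTP request (not shown)
        self._ttl = ttl_seconds
        self._budget = quota_budget
        self.spent = 0        # quota units used so far
        self._clock = clock
        self._cache: dict[tuple, tuple[float, dict]] = {}

    @staticmethod
    def _key(params: dict) -> tuple:
        # Normalise case and whitespace in the query so near-duplicates share an entry.
        normalised = dict(params)
        if "q" in normalised:
            normalised["q"] = " ".join(str(normalised["q"]).lower().split())
        return tuple(sorted(normalised.items()))

    def search(self, params: dict) -> dict:
        key = self._key(params)
        now = self._clock()
        cached = self._cache.get(key)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]  # cache hit: no quota spent
        if self.spent + SEARCH_COST_UNITS > self._budget:
            raise RuntimeError("quota budget exhausted; not calling the API")
        response = self._fetch(params)
        self.spent += SEARCH_COST_UNITS
        self._cache[key] = (now, response)
        return response
```

Keying the cache on the normalised query means that rephrasing only the casing or spacing does not cost another search. Matching semantically similar queries would be a further step beyond this sketch.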
Lessons learned
- A conversational interface can transform a tool without changing the underlying data at all.
- A lot of AI product value is a better interface to existing capability, not brand new capability.
- Third-party APIs have quotas and rate limits. Cache results and stay within those limits, or you get cut off.
- Let the model translate loose human intent into precise queries. That translation is the real feature.
