A phone number that answers itself
CallAssistant connects a real phone number to a voice agent: someone calls, the agent picks up, understands what they want, and handles it in a natural conversation. Twilio carries the call, a realtime voice model does the listening and speaking, and my code is the glue and the brain that decides what to actually do.
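Here is a minimal sketch of that glue role in Python. It is not the actual CallAssistant code. Two coroutines forward audio frames in opposite directions at the same time. The `asyncio.Queue` objects stand in for the real telephony and voice-model connections, and the names `pump`, `relay`, and `demo` are hypothetical. The 160-byte frame size assumes 20 ms of 8 kHz mu-law telephone audio.

```python
import asyncio
from typing import Optional

# Hypothetical sketch: asyncio.Queue objects stand in for the real connections
# (telephony side and voice-model side). Frames are opaque bytes, e.g. 20 ms of
# 8 kHz mu-law phone audio (160 bytes); None marks the end of a stream.


async def pump(source: asyncio.Queue, sink: asyncio.Queue, label: str, log: list) -> None:
    """Forward each frame the moment it arrives instead of collecting a whole turn."""
    while True:
        frame: Optional[bytes] = await source.get()
        await sink.put(frame)
        if frame is None:  # end of stream: pass the marker on and stop
            return
        log.append((label, len(frame)))


async def relay(caller_in: asyncio.Queue, model_in: asyncio.Queue,
                model_out: asyncio.Queue, caller_out: asyncio.Queue, log: list) -> None:
    """Run both directions concurrently so neither side waits on the other."""
    await asyncio.gather(
        pump(caller_in, model_in, "caller->model", log),
        pump(model_out, caller_out, "model->caller", log),
    )


async def demo() -> tuple[int, int, list]:
    """Feed three caller frames and two model frames through the relay."""
    caller_in, model_in, model_out, caller_out = (asyncio.Queue() for _ in range(4))
    log: list = []
    for frame in [b"\x7f" * 160] * 3 + [None]:
        caller_in.put_nowait(frame)
    for frame in [b"\xff" * 160] * 2 + [None]:
        model_out.put_nowait(frame)
    await relay(caller_in, model_in, model_out, caller_out, log)
    # Queue sizes include the forwarded end-of-stream marker.
    return model_in.qsize(), caller_out.qsize(), log
```

Each frame is forwarded as soon as it is read, so neither direction waits for the other side to finish a turn. In a real deployment the queues would be replaced by the two network connections.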
Latency is the whole experience
On the web you can get away with a spinner. On a phone call, a one-second pause feels like the line went dead. The entire engineering challenge was keeping the round trip fast enough that the conversation felt alive, which meant streaming audio both ways instead of waiting for complete turns.
- Audio streams in real time over a persistent connection, not in slow request-response chunks.
- The agent can be interrupted mid-sentence, because real people interrupt.
- Every action the agent can take is a clearly defined tool, so it cannot improvise a dangerous action on its own.
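Handling interruption mostly comes down to what happens to audio that has been generated but not yet played. The sketch below assumes that the model produces audio faster than real time and that the glue releases it frame by frame. When caller speech is detected, the queued frames are dropped, and any late frames from the interrupted response are ignored. `PlaybackBuffer`, `barge_in`, and the response-id scheme are hypothetical, and the speech detection itself is left out.

```python
from collections import deque
from typing import Optional


class PlaybackBuffer:
    """Hypothetical outbound buffer between the voice model and the caller.

    Model audio piles up here and is released at playback pace. If the caller
    starts talking, every frame still queued is stale and must be dropped.
    """

    def __init__(self) -> None:
        self._frames: deque = deque()
        self.response_id = 0  # the response whose audio is currently allowed to play

    def enqueue(self, response_id: int, frame: bytes) -> None:
        # Ignore late frames that belong to a response that was interrupted.
        if response_id == self.response_id:
            self._frames.append(frame)

    def next_frame(self) -> Optional[bytes]:
        """Called on each playback tick (e.g. every 20 ms) to get audio to send."""
        return self._frames.popleft() if self._frames else None

    def barge_in(self) -> int:
        """Caller speech detected: drop queued audio and invalidate the response.

        Returns the number of frames discarded.
        """
        dropped = len(self._frames)
        self._frames.clear()
        self.response_id += 1  # frames tagged with the old id are now rejected
        return dropped
```

If the telephony provider also buffers audio on its side, that buffer needs flushing too. The provider's documentation should describe how.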
What voice taught me that text did not
Building a text chatbot lulls you into thinking voice is just the same thing with a microphone. It is not. Voice is unforgiving about timing, about interruptions, about the awkward silence when the model is thinking. It also raises the stakes on safety: a voice agent that takes real actions on a real call needs tight, well-defined tools and clear limits, because there is no "are you sure?" dialog on a phone call. The deepest lesson was that the medium shapes the product. The same model behaves completely differently when the interface is a live human voice instead of a chat box.
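One way to express "tight, well-defined tools and clear limits" in code is a registry that refuses anything outside what was declared. This is a hypothetical sketch, not CallAssistant's real tool layer. `Tool`, `ToolRegistry`, the per-call `max_uses` cap, and the `take_message` example are all illustrative.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Tool:
    """One action the agent may take. Anything not declared here is refused."""
    name: str
    handler: Callable[..., str]
    required_args: frozenset[str]
    max_uses: int = 1  # hypothetical cap per phone call
    check: Callable[[dict[str, Any]], bool] = lambda args: True  # extra argument guard


class ToolRegistry:
    """Dispatches the model's tool calls and enforces the declared limits."""

    def __init__(self, tools: list[Tool]) -> None:
        self._tools = {t.name: t for t in tools}
        self._uses = {t.name: 0 for t in tools}

    def dispatch(self, name: str, args: dict[str, Any]) -> str:
        tool = self._tools.get(name)
        if tool is None:
            return f"refused: no tool named {name!r}"
        if set(args) != tool.required_args:
            return f"refused: {name} expects exactly {sorted(tool.required_args)}"
        if not tool.check(args):
            return f"refused: arguments for {name} failed validation"
        if self._uses[name] >= tool.max_uses:
            return f"refused: {name} already used {tool.max_uses} time(s) on this call"
        self._uses[name] += 1  # only successful calls count toward the cap
        return tool.handler(**args)


def take_message(caller_name: str, text: str) -> str:
    # Hypothetical handler; a real one would persist the message somewhere.
    return f"message saved from {caller_name}"


registry = ToolRegistry([
    Tool("take_message", take_message, frozenset({"caller_name", "text"}),
         max_uses=2, check=lambda a: len(a["text"]) <= 500),
])
```

Refusals come back as strings rather than exceptions. This assumes the result is fed back to the model, so the agent can tell the caller what happened instead of the call failing silently.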
Lessons learned
- On voice, latency is the product. Stream audio both ways or the conversation feels broken.
- Design for interruption. Real callers talk over you, and your agent has to handle it.
- Give a voice agent tightly defined tools and limits. There is no confirmation dialog on a phone call.
- The medium reshapes the product. A model that works in chat needs rethinking the moment it has a voice.
