RealChar is developing an AI assistant that can handle customer service phone calls on behalf of users, addressing the frustration of long wait times and tedious interactions. The system uses a complex architecture combining traditional ML and generative AI, running multiple models in parallel through an event bus system, with fallback mechanisms for reliability. The solution draws inspiration from self-driving car systems, implementing real-time processing of multiple input streams and maintaining millisecond-level observability.
RealChar is a startup building Revia, a consumer-facing AI assistant that makes phone calls on behalf of users to handle customer service interactions. The company was founded by Sean, who brings extensive experience from Google Assistant (specifically the Google Duplex project) and the self-driving car industry at Waymo. The podcast interview reveals deep technical insights into how they’re applying robotics and autonomous vehicle engineering principles to build reliable AI agents for real-time phone conversations.
The core value proposition is compelling: consumers are increasingly forced to deal with corporate AI systems designed to make it difficult to complete simple tasks like canceling subscriptions or getting support. RealChar flips this dynamic by giving consumers their own AI to fight back against these systems. As the host puts it, “if I’m going to have to deal with these shitty chat bots then I might as well have my own chat bot that can deal with them.”
The most distinctive aspect of RealChar’s approach is applying self-driving car engineering principles to LLM-powered applications. Sean explains that everything in their system runs in parallel, similar to how robotics systems operate on clock cycles.
The system uses an event bus pattern for communication between components. Every event, arriving at millisecond intervals, is published to a unified event bus. Downstream systems subscribe to the events they can handle, process them, and publish their responses back. The result is a decoupled architecture: components communicate only through the bus, never by calling one another directly.
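The event-bus pattern described above can be sketched in a few lines. This is a minimal, synchronous illustration, not RealChar's implementation; the topic names and the transcription example are hypothetical.

```python
from collections import defaultdict

class EventBus:
    """Minimal publish/subscribe bus: components never call each other directly."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, event):
        # Every handler subscribed to the topic sees the event; handlers
        # publish their own results back onto the bus under new topics.
        for handler in list(self._subscribers[topic]):
            handler(event)

# Hypothetical wiring: a transcription component reacts to raw audio events
# and publishes transcripts that a downstream dialog component consumes.
bus = EventBus()
transcripts = []
bus.subscribe("audio.frame", lambda e: bus.publish("asr.transcript", f"text:{e}"))
bus.subscribe("asr.transcript", transcripts.append)
bus.publish("audio.frame", "frame-001")
```

In a real-time system the handlers would run concurrently rather than inline, but the decoupling property is the same: adding or removing a consumer never requires changing a producer.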
This is particularly important for voice applications where even half-second delays are immediately noticeable to users, unlike text applications where latency is more forgiving.
Phone calls are more complex than pure audio processing; the system must handle multiple concurrent input streams beyond the raw audio itself.
Sean draws a parallel to self-driving cars, which are also inherently multimodal systems combining LiDAR, camera data, and audio (for detecting sirens). The perception models must understand any type of data format on the fly.
Real-time voice conversation presents severe latency constraints that many LLM applications don't face, and Sean shares specific benchmarks from their internal testing.
These latencies are “way too slow for real-time conversation.” Unlike text applications where users tolerate delays, audio applications expose any latency immediately. Half a second of lag creates an obviously broken user experience.
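The latency point can be made concrete with a simple budget check. The 500 ms budget below is an assumption drawn from the "half a second of lag" remark, and the stage timings are illustrative, not RealChar's measured numbers.

```python
import time

LATENCY_BUDGET_MS = 500  # assumed: half a second of lag is already noticeable

def timed(fn, *args):
    """Measure wall-clock latency of one pipeline stage in milliseconds."""
    start = time.perf_counter()
    result = fn(*args)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return result, elapsed_ms

def within_budget(stage_latencies_ms):
    # The whole pipeline (e.g. speech-to-text -> LLM -> text-to-speech) must
    # fit inside the budget, not each stage individually.
    return sum(stage_latencies_ms) <= LATENCY_BUDGET_MS

ok = within_budget([120, 250, 90])        # 460 ms total
too_slow = within_budget([120, 900, 90])  # one slow LLM call blows the budget
```

This framing explains why a single slow model response is fatal in voice even when each component looks acceptable in isolation.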
To handle the unpredictability of LLM response times, RealChar implements a sophisticated fallback mechanism inspired by self-driving car safety systems.
This isn’t a simple gateway routing pattern. The combination of task type, identified intent, and real-time performance metrics all influence routing decisions. Some requests may be held, some forwarded immediately, and some trigger alternative actions like button presses.
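One common way to implement this kind of deadline-driven fallback is to race the primary model against a timer. The sketch below is a generic pattern under assumed names, not RealChar's code: `primary_llm` stands in for a slow generative model and `fallback` for a cheaper, deterministic responder.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

PRIMARY_TIMEOUT_S = 0.3  # assumed budget before the fallback takes over

def respond(primary_llm, fallback, prompt):
    """Race the primary model against a deadline; fall back if it is late."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(primary_llm, prompt)
        try:
            return future.result(timeout=PRIMARY_TIMEOUT_S)
        except FutureTimeout:
            future.cancel()  # best effort; an in-flight call keeps running
            return fallback(prompt)

fast = respond(lambda p: "full answer", lambda p: "one moment, please", "hi")
slow = respond(lambda p: time.sleep(0.6) or "late answer",
               lambda p: "one moment, please", "hi")
```

A production router would also weigh the task type and intent described above when deciding whether to hold, forward, or fall back, rather than using a fixed timeout alone.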
The system uses both generative AI and traditional ML models working in parallel.
The goal is to have sufficient signals and context before engaging the generative AI models, reducing the chance of irrelevant or incorrect responses.
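That gating idea can be sketched as a cheap classifier running ahead of the generative model. Everything here is hypothetical (the keyword classifier, intents, and confidence threshold); it only illustrates the shape of "gather signals first, generate second."

```python
def classify_intent(utterance):
    """Stand-in for a fast traditional-ML intent classifier."""
    keywords = {"cancel": "cancel_subscription", "refund": "request_refund"}
    for word, intent in keywords.items():
        if word in utterance.lower():
            return intent, 0.9
    return "unknown", 0.2

def handle(utterance, generate):
    """Only engage the slower, costlier generative model once the cheap
    classifiers have produced enough signal about what the caller wants."""
    intent, confidence = classify_intent(utterance)
    if confidence < 0.5:
        # Not enough context yet: hold and ask, rather than risk an
        # irrelevant generated response.
        return "Could you clarify what you need?"
    return generate(intent, utterance)

reply = handle("I want to cancel my plan", lambda i, u: f"handling {i}")
```

The traditional model here is trivial, but the division of labor matches the description: fast, deterministic models filter and contextualize; the generative model only runs with sufficient grounding.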
A key lesson from self-driving cars is the importance of virtual simulation environments for testing, and RealChar has built a controlled testing environment that serves multiple purposes.
Sean uses a self-driving car analogy: “What is the first thing you’re trying to do? You’re trying to ask a self-driving car to start, drive in a straight line, then stop.” Similarly, they started with simple tasks like making phone calls, saying hello, and hanging up, then progressively built up capability.
The system provides real-time observability, with the ability for humans to take over control, similar to autopilot systems.
This requires low-level WebSocket handling for audio processing, which they’ve built in-house. The audio processing component is particularly challenging because streaming audio has much stricter requirements than text streaming.
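The human-takeover control described above can be sketched as state on the call session. This is a simplified illustration with hypothetical names; the real system operates on streaming audio over WebSockets rather than in-memory lists.

```python
class CallSession:
    """Sketch of autopilot-style takeover: while `autopilot` is on, the AI's
    audio is forwarded to the phone line; a human observer can flip control
    mid-call and AI output is suppressed immediately."""

    def __init__(self):
        self.autopilot = True
        self.outbound = []  # stands in for the outbound audio stream

    def take_over(self):
        self.autopilot = False

    def on_ai_audio(self, chunk):
        # AI-generated audio reaches the caller only under autopilot.
        if self.autopilot:
            self.outbound.append(chunk)

    def on_human_audio(self, chunk):
        # The human operator's audio always wins, regardless of mode.
        self.outbound.append(chunk)

session = CallSession()
session.on_ai_audio(b"ai-1")
session.take_over()           # observer spots a problem and grabs control
session.on_ai_audio(b"ai-2")  # suppressed: a human now owns the call
session.on_human_audio(b"human-1")
```

The hard part in production is that the switch must take effect between audio frames, milliseconds apart, which is why the millisecond-level observability and in-house WebSocket handling matter.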
Sean is refreshingly candid about the current state of the technology and the challenges of productionizing LLM-based agents.
The non-deterministic nature of generative AI creates ongoing challenges.
Sean directly addresses a common pattern: “It’s very easy to create a really appealing demo with your AI agent that works only 1% of the time.” Making something work 90% of the time requires 90% of the engineering effort. This honesty about the gap between impressive demos and reliable production systems is notable.
Using the self-driving analogy, Sean estimates they can “drive straight for 15 miles”: basic functionality works, but complex scenarios like highway driving or detours (complex phone call routing, unexpected questions) aren’t fully solved yet.
The subscription pricing model (flat fee for unlimited phone calls) reflects a deliberate decision about customer experience. Sean explicitly considered the tension between customer engagement and API costs, ultimately prioritizing “ease of mind” over usage-based pricing that might discourage engagement.
This reveals an interesting LLMOps consideration: how token economics affect product design, and how a flat-fee product must manage the API costs underneath it.
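The tension is easy to see with a back-of-the-envelope margin model. Every figure below is hypothetical, chosen only to show why flat pricing shifts cost risk from the user to the provider.

```python
# All figures hypothetical: a flat fee decouples what the user pays from
# what each call costs the provider in model-API charges.
SUBSCRIPTION_FEE = 20.00   # flat monthly fee in dollars (assumed)
COST_PER_MINUTE = 0.05     # blended API cost per call minute (assumed)

def margin(calls_per_month, avg_minutes_per_call):
    """Monthly margin per subscriber under flat pricing."""
    api_cost = calls_per_month * avg_minutes_per_call * COST_PER_MINUTE
    return SUBSCRIPTION_FEE - api_cost

light_user = margin(5, 8)    # 5 calls x 8 min x $0.05 = $2 in API cost
heavy_user = margin(60, 10)  # 60 calls x 10 min x $0.05 = $30 in API cost
```

Under usage-based pricing the heavy user's cost would simply pass through; under a flat fee, the provider absorbs it, which is exactly the “ease of mind” trade-off Sean describes.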
The case study offers several valuable lessons for teams building real-time, voice-based LLM agents.
The product represents an ambitious attempt to bring self-driving car engineering rigor to consumer AI applications, acknowledging that reliable AI agents are still years away from handling all scenarios perfectly.
Snorkel developed a specialized benchmark dataset for evaluating AI agents in insurance underwriting, leveraging their expert network of Chartered Property and Casualty Underwriters (CPCUs). The benchmark simulates an AI copilot that assists junior underwriters by reasoning over proprietary knowledge, using multiple tools including databases and underwriting guidelines, and engaging in multi-turn conversations. The evaluation revealed significant performance variations across frontier models (single digits to ~80% accuracy), with notable error modes including tool use failures (36% of conversations) and hallucinations from pretrained domain knowledge, particularly from OpenAI models which hallucinated non-existent insurance products 15-45% of the time.
Thoughtly, a voice AI platform founded in late 2023, provides conversational AI agents for enterprise sales and customer support operations. The company orchestrates speech-to-text, large language models, and text-to-speech systems to handle millions of voice calls with sub-second latency requirements. By optimizing every layer of their stack—from telephony providers to LLM inference—and implementing sophisticated caching, conditional navigation, and evaluation frameworks, Thoughtly delivers 3x conversion rates over traditional methods and 15x ROI for customers. The platform serves enterprises with HIPAA and SOC 2 compliance while handling both inbound customer support and outbound lead activation at massive scale across multiple languages and regions.
Sierra, an AI agent platform company, discusses their comprehensive approach to deploying LLMs in production for customer service automation across voice and chat channels. The company addresses fundamental challenges in productionizing AI agents including non-deterministic behavior, latency requirements, and quality assurance through novel solutions like simulation-based testing that runs thousands of parallel test scenarios, speculative execution for voice latency optimization, and constellation-based multi-model orchestration where 10-20 different models handle various aspects of each conversation. Their outcome-based pricing model aligns incentives with customer success, while their hybrid no-code/code platform enables both business and technical teams to collaboratively build, test, and deploy agents. The platform serves large enterprise customers across multiple industries, with agents handling millions of customer interactions in production environments.