Voice AI, Multi-Agent, Streaming
Overview
A fully autonomous podcast system where two AI hosts have natural, flowing conversations with each other — no human in the loop.
The Challenge
What if AI could host its own podcast? Not a scripted chatbot reading prompts, but two distinct AI personalities that converse naturally, interrupt each other, and build on ideas together.
Features
- Dual AI Hosts — Two distinct AI personalities with different viewpoints
- Streaming Avatars — Real-time visual presence with lip-sync
- Live Transcription — Captions as the conversation unfolds
- Themed Studio Lighting — Dynamic lighting responding to conversation mood
- Audience Q&A — Viewers submit questions that hosts address live
- Auto-Generated Summaries — Episode highlights compiled without manual editing
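The audience Q&A feature implies some queueing between viewer submissions and the hosts' dialogue loop. A minimal sketch, assuming questions arrive as RTM-style channel messages and are drained one at a time between turns — every name here is hypothetical, not the project's actual API:

```typescript
// Hypothetical moderation queue sitting between viewer messages and the hosts.
interface Question {
  viewer: string;
  text: string;
}

class QuestionQueue {
  private pending: Question[] = [];
  private seen = new Set<string>(); // dedupe repeated submissions

  // Called for each incoming channel message; returns true if accepted.
  submit(q: Question): boolean {
    const key = q.text.trim().toLowerCase();
    if (key.length === 0 || this.seen.has(key)) return false;
    this.seen.add(key);
    this.pending.push(q);
    return true;
  }

  // Called by the dialogue loop between turns; undefined when queue is empty.
  next(): Question | undefined {
    return this.pending.shift();
  }
}
```

Keeping the queue outside the LLM pipeline means a flood of duplicate submissions never reaches the hosts, and the dialogue loop stays in control of *when* a question is woven in.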
Technical Stack
| Layer | Technologies |
|---|---|
| Framework | Next.js 15 · React 19 · TypeScript 5.8 |
| Styling | TailwindCSS 4 · Dark mode |
| State | Zustand 5 |
| Agora SDKs | agora-rtc-sdk-ng · agora-rtm-sdk v2 |
| AI Pipeline | Conversational AI Engine · LLM (OpenAI/Anthropic/Gemini) · TTS (ElevenLabs/Microsoft/OpenAI) · ASR (Deepgram/Microsoft/Agora) |
| Server APIs | Next.js Route Handlers (app/api/*) for tokens, agent lifecycle, podcast lifecycle, and uploads |
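As a sketch of the Server APIs layer, a token route handler could look roughly like this. The real project would build tokens with Agora's token library and app certificate, so the HMAC signing below is only a stand-in for illustration, and the route path, request shape, and `APP_SECRET` env var are assumptions:

```typescript
// Sketch of app/api/token/route.ts as a standard Web-API Route Handler.
// NOTE: real Agora tokens must be built with Agora's official token builder;
// the HMAC below is a placeholder so the handler shape is runnable on its own.
import { createHmac } from "node:crypto";

const APP_SECRET = process.env.APP_SECRET ?? "dev-secret"; // assumed env var

export function issueToken(channel: string, uid: number, ttlSeconds = 3600) {
  if (!channel || !Number.isInteger(uid) || uid < 0) {
    throw new Error("channel and a non-negative integer uid are required");
  }
  const expiresAt = Math.floor(Date.now() / 1000) + ttlSeconds;
  const payload = `${channel}:${uid}:${expiresAt}`;
  const signature = createHmac("sha256", APP_SECRET).update(payload).digest("hex");
  return { token: `${payload}:${signature}`, expiresAt };
}

// Next.js Route Handlers export HTTP-method functions that take a Request.
export async function POST(req: Request): Promise<Response> {
  try {
    const { channel, uid } = await req.json();
    return new Response(JSON.stringify(issueToken(channel, uid)), {
      headers: { "content-type": "application/json" },
    });
  } catch (err) {
    return new Response(JSON.stringify({ error: (err as Error).message }), {
      status: 400,
      headers: { "content-type": "application/json" },
    });
  }
}
```

Keeping token issuance server-side is the point of the Route Handler layer: the app certificate (or here, the placeholder secret) never ships to the browser.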
Repo
What I Learned
Building multi-agent systems that feel natural requires careful attention to:
- Turn-taking and interruption handling
- Personality consistency across long conversations
- Latency optimization for real-time feel
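The turn-taking and interruption handling above can be sketched as a small state machine, assuming a two-host scheme where a barge-in is only honored after the current speaker has held the floor for a grace period, and the floor is force-yielded at a hard turn cap. All names and timing constants here are illustrative, not the project's actual implementation:

```typescript
// Hypothetical two-host turn-taking controller: tracks who holds the
// "floor" and decides whether an interruption attempt should succeed.
type Host = "hostA" | "hostB";

interface TurnState {
  speaker: Host;          // who currently holds the floor
  speakingSince: number;  // ms timestamp when the current turn began
}

const GRACE_MS = 1500;     // assumed: barge-ins inside this window are ignored
const MAX_TURN_MS = 20000; // assumed: hard cap before the floor is yielded

function other(h: Host): Host {
  return h === "hostA" ? "hostB" : "hostA";
}

// Returns the next state given a timer tick and an optional interruption attempt.
function nextTurn(state: TurnState, now: number, interrupter?: Host): TurnState {
  const held = now - state.speakingSince;
  // Forced hand-off: the speaker has exceeded the turn cap.
  if (held >= MAX_TURN_MS) {
    return { speaker: other(state.speaker), speakingSince: now };
  }
  // Barge-in: only honored after the grace period, and only from the other host.
  if (interrupter && interrupter !== state.speaker && held >= GRACE_MS) {
    return { speaker: interrupter, speakingSince: now };
  }
  return state; // no change: too early to interrupt, or self-interruption
}
```

The grace period is what makes interruptions feel like conversation rather than chaos: without it, two eager LLM hosts will talk over each other on every response.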