Intervoo

Live AI interview copilot

Principal Engineer | 2024 - Present | Solo | 4 min read

Overview

Intervoo is a live AI interview copilot. It listens to an interview in real time, recognizes when a question is asked, and quietly suggests a personalized answer based on the candidate's resume and the company they're interviewing with. A separate practice mode helps candidates prepare beforehand and builds a personal story library in the background, which makes future copilot sessions more accurate. I designed, built, and shipped it end-to-end.

The Challenge

A live interview is the worst possible environment for an AI tool. Latency has to be under a second or the conversation moves on. Answers have to actually sound like the candidate, not a generic chatbot. And privacy is non-negotiable — nobody wants their interview audio sitting on someone else's server. Most existing tools fail one or more of these. The hard part wasn't the AI — it was building the whole experience around those three constraints.

The Solution

Two coupled experiences sharing one design: a practice mode that helps candidates prepare and quietly extracts their best stories, and a live copilot mode that uses those stories — plus the target company's culture — to generate personalized suggestions in real time. Voice processing happens at the edge, audio is never stored, and the entire system is designed to disappear behind the conversation rather than interrupt it.

Architecture

Three real-time loops: voice in, AI out, and a practice flywheel that quietly improves both. Privacy is enforced at the system boundary, not bolted on after.

Client

Browser-side voice capture, Streaming UI, Question detection

Voice

Real-time transcription, Speaker separation, Zero audio retention

Streaming LLM responses, Personal context grounding, Company culture awareness

Data

Encrypted user data, Story library, Privacy-first storage

Request Flow

Candidate speaks

System recognizes a question

Personal context is assembled silently

AI suggestion streams to the candidate

Practice sessions quietly improve future suggestions
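The "system recognizes a question" step can be illustrated with a small heuristic. This is only a sketch under assumptions: the `Segment` shape, the `QUESTION_OPENERS` list, and the `isQuestion` name are hypothetical, and the real detector may work quite differently.

```typescript
// Hypothetical shape of one finalized transcript segment from the voice layer.
interface Segment {
  speaker: "interviewer" | "candidate";
  text: string;
}

// Assumed list of openers that usually signal a question or prompt in an interview.
const QUESTION_OPENERS =
  /^(who|what|when|where|why|how|can|could|would|do|does|did|is|are|have|tell me|walk me through|describe|give me)\b/i;

// Returns true when an interviewer segment looks like a question.
// Candidate speech is ignored so the copilot never reacts to its own user.
function isQuestion(segment: Segment): boolean {
  if (segment.speaker !== "interviewer") return false;
  const text = segment.text.trim();
  if (text.length === 0) return false;
  return text.endsWith("?") || QUESTION_OPENERS.test(text);
}
```

A cheap check like this can run on every segment without adding latency; anything it flags would then move on to context assembly.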

Key Decisions & Tradeoffs

Streaming, not request/response

Why: Sub-second perceived latency was non-negotiable. Streaming the response token-by-token lets the candidate start reading almost immediately, instead of waiting for a full answer to load.
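To make the streaming idea concrete, here is a minimal sketch of consuming a streamed response body chunk by chunk. The function names `readTextChunks` and `streamSuggestion` are hypothetical; the only platform APIs used are the standard `ReadableStream` reader and `TextDecoder`, and the body would typically be `response.body` from a `fetch` call.

```typescript
// Reads a byte stream and yields decoded text as soon as each chunk arrives.
async function* readTextChunks(
  body: ReadableStream<Uint8Array>,
): AsyncGenerator<string> {
  const reader = body.getReader();
  const decoder = new TextDecoder();
  try {
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      // stream: true keeps multi-byte characters intact across chunk boundaries.
      const text = decoder.decode(value, { stream: true });
      if (text) yield text;
    }
    // Flush any bytes the decoder was still holding.
    const tail = decoder.decode();
    if (tail) yield tail;
  } finally {
    reader.releaseLock();
  }
}

// Hands each chunk to the UI immediately and records the time to the first chunk.
async function streamSuggestion(
  body: ReadableStream<Uint8Array>,
  onChunk: (text: string) => void,
): Promise<{ fullText: string; firstChunkMs: number | null }> {
  const start = Date.now();
  let firstChunkMs: number | null = null;
  let fullText = "";
  for await (const chunk of readTextChunks(body)) {
    if (firstChunkMs === null) firstChunkMs = Date.now() - start;
    fullText += chunk;
    onChunk(chunk);
  }
  return { fullText, firstChunkMs };
}
```

The candidate sees the first words as soon as `onChunk` fires, so the number that matters is `firstChunkMs` rather than the time to the full answer.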

One model, hand-written orchestration

Why: Tried frameworks early on. Every layer added latency, complexity, and cost without earning its keep. Stripping it back to one streaming endpoint with custom prompt logic was faster, simpler, and far easier to debug.
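As an illustration of what hand-written prompt logic can look like, the sketch below assembles one prompt string from the pieces the copilot is described as using. The `PromptContext` fields, the `buildPrompt` name, and the instruction wording are all assumptions made for this example, not the production prompt.

```typescript
// Hypothetical inputs; field names are illustrative only.
interface PromptStory {
  title: string;
  summary: string;
}

interface PromptContext {
  question: string;
  resumeHighlights: string[];
  stories: PromptStory[];
  companyValues: string[];
}

// Builds a single prompt string with plain string operations and no framework.
function buildPrompt(ctx: PromptContext): string {
  const stories = ctx.stories
    .map((s, i) => `${i + 1}. ${s.title}: ${s.summary}`)
    .join("\n");
  return [
    "You are helping a candidate answer an interview question in their own voice.",
    "Use only the facts below. Keep the answer short enough to read at a glance.",
    "",
    `Question: ${ctx.question}`,
    "",
    "Resume highlights:",
    ...ctx.resumeHighlights.map((h) => `- ${h}`),
    "",
    "Candidate stories:",
    stories || "(none yet)",
    "",
    `Company values: ${ctx.companyValues.join(", ") || "(unknown)"}`,
  ].join("\n");
}
```

Because the whole prompt is one visible string, tuning it means editing a few lines rather than working through an abstraction layer.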

Practice as the data engine

Why: Practice answers are stories the user has already validated. Quietly extracting them into a personal library makes future copilot answers feel personal without forcing the user to fill out a profile. Two features, one shared flywheel.
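One simple way the copilot could pull from that library is keyword overlap between the detected question and each saved story. This is a sketch under assumptions: the `SavedStory` shape, the stop-word list, and `rankStories` are hypothetical, and a production version might use embeddings or another retrieval method instead.

```typescript
// Hypothetical record for a story extracted from a practice session.
interface SavedStory {
  id: string;
  title: string;
  text: string;
  tags: string[];
}

// Small illustrative stop-word list; a real one would be longer.
const STOP_WORDS = new Set([
  "the", "and", "you", "about", "time", "tell", "with", "for",
  "when", "your", "did", "how", "what", "was", "that",
]);

// Lowercases, splits on non-alphanumerics, and drops short or common words.
function keywords(text: string): Set<string> {
  return new Set(
    text
      .toLowerCase()
      .split(/[^a-z0-9]+/)
      .filter((w) => w.length > 2 && !STOP_WORDS.has(w)),
  );
}

// Returns up to `limit` stories that share at least one keyword with the question,
// ordered by how many keywords they share.
function rankStories(question: string, library: SavedStory[], limit = 3): SavedStory[] {
  const wanted = keywords(question);
  return library
    .map((story) => {
      const have = keywords(`${story.title} ${story.text} ${story.tags.join(" ")}`);
      let score = 0;
      wanted.forEach((w) => {
        if (have.has(w)) score += 1;
      });
      return { story, score };
    })
    .filter((entry) => entry.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map((entry) => entry.story);
}
```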

Zero audio storage, ever

Why: A live interview tool that uploads audio is a non-starter for trust. Audio is processed in the browser and discarded. Only what the user explicitly saves persists.
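The "only what the user explicitly saves persists" rule can be sketched as an in-memory session object. Everything here is hypothetical (`EphemeralSession`, `TranscriptLine`, and the method names); the point is only that the session accepts text, never audio, and hands back nothing but saved lines when it ends.

```typescript
// Hypothetical transcript line held in memory for the duration of a session.
interface TranscriptLine {
  id: number;
  text: string;
}

class EphemeralSession {
  private lines: TranscriptLine[] = [];
  private savedIds = new Set<number>();
  private nextId = 1;

  // Accepts transcribed text only; raw audio is never passed in or retained.
  addLine(text: string): number {
    const id = this.nextId++;
    this.lines.push({ id, text });
    return id;
  }

  // Marks a line the user explicitly chose to keep. Unknown ids are ignored.
  markSaved(id: number): void {
    if (this.lines.some((line) => line.id === id)) this.savedIds.add(id);
  }

  // Ends the session: returns only the saved lines, then clears all in-memory state.
  end(): TranscriptLine[] {
    const kept = this.lines.filter((line) => this.savedIds.has(line.id));
    this.lines = [];
    this.savedIds.clear();
    return kept;
  }
}
```

With this shape, persistence is opt-in by construction: the storage layer only ever receives the return value of `end()`.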

Resume as single source of truth

Why: Earlier iterations fragmented user data across multiple tables. Consolidating on the resume as the one source of truth eliminated entire classes of sync bugs and made personalization straightforward.

Database-level access control from day one

Why: Multi-tenant data with AI in the loop is a leaking-data accident waiting to happen. Enforcing isolation at the database level means a bug in any API route can't cross user boundaries.
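For illustration, database-level isolation on Postgres is usually expressed as a Row Level Security policy. The sketch below keeps the SQL in a TypeScript constant, as a migration script might. The `stories` table, the `user_id` column, and the policy name are assumptions; `auth.uid()` is the Supabase helper that returns the authenticated user's id.

```typescript
// Hypothetical migration text; table, column, and policy names are assumptions.
const STORIES_RLS_MIGRATION = `
alter table stories enable row level security;

create policy "stories_owner_only"
  on stories
  for all
  using (auth.uid() = user_id)
  with check (auth.uid() = user_id);
`;
```

With a policy like this in place, a query that forgets its `where user_id = ...` clause still returns only the caller's rows, because the filter is applied by the database rather than by each API route.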

What I took away from this project

Lessons that still shape how I build — in my own words.

Frameworks cost more than they save on latency-critical paths

I started Intervoo on a popular LLM orchestration framework because it seemed irresponsible not to. Three weeks in, every prompt I wanted to tune meant fighting an abstraction, and the streaming was worse than a raw fetch. Ripped it out over a weekend, dropped to one streaming endpoint with hand-written prompt logic, and perceived latency dropped noticeably. The lesson that keeps coming back: on the hot path, frameworks are renting complexity you'll want to own later anyway.

Privacy constraints make the product, not break it

"Zero audio storage" sounded like a limitation at first. It turned out to be the clearest positioning I had. Once I committed to it, a lot of architectural questions answered themselves: processing at the edge, ephemeral sessions, no recording features. Constraints at the start of a project are a gift if you treat them as axioms instead of problems.

Practice and copilot shouldn't be separate features

The first version of Intervoo had two product surfaces that didn't talk to each other. Users would practice, and then show up to the live copilot with no personal context. Coupling them through a shared story library — where practice quietly builds the data the copilot uses — is what turned two decent features into one compounding product.

Impact

Sub-sec latency: streaming response time

0 sec of audio stored anywhere

Modes: practice + live copilot

Solo build, end-to-end

Technology Stack

Frontend

Next.js, React, TypeScript, Tailwind CSS

AI & Voice

Streaming LLMs, Real-time STT, WebRTC

Backend

Supabase, PostgreSQL, Edge Functions

Infra

Stripe billing, Netlify

Key Features

Streaming LLM pipeline that delivers token-by-token responses with sub-second time-to-first-token

Real-time voice layer: browser-side audio capture, low-latency transcription, live question detection

Personalization engine that grounds every suggestion in the candidate's own story library and target company context

Privacy-first architecture: audio processed at the edge and never persisted — only transcripts the user explicitly saves

Two coupled product surfaces sharing one data model: practice mode quietly extracts stories, live mode uses them

Multi-tenant Postgres with Row Level Security — a bug in any API route can't cross user boundaries

Stripe billing, webhooks, and subscription state wired into a Next.js full-stack on Supabase

Solo build: product, architecture, AI engineering, full-stack, design, infra, deployment
