Tavus built the most lifelike real-time conversational video on the planet. But a photoreal agent that can only talk doesn’t justify the GPU bill. We’ve already built the agentic harness that turns those conversations into captured leads, booked meetings, and completed on-page actions — live, today, on real businesses.
01
The cost problem
Tavus’s Phoenix-3 renders a full-face, micro-expression, lip-synced human via Gaussian diffusion at conversational latency. That demands datacenter GPUs held warm per call. The render is worth it only if the conversation drives a measurable business outcome.
Three models make the face feel alive: Phoenix-3 (rendering), Raven-0 (perception), and Sparrow-0 (turn-taking at roughly 600 ms). This is the hard, defensible part, and it’s the GPU-hungry part.
Real-time video diffusion can’t be cheaply batched or sharded the way an LLM can. Every active call burns premium GPU. That’s a structural cost, not a rounding error.
A beautiful agent that can only chat is a demo. To justify the spend it has to do things — capture the lead, book the meeting, move the visitor down the funnel.
02
What we already built
This is production code running today. Your models render and take turns; our harness wires the agent into the business so the conversation produces outcomes.
The agent opens a secure name/email/message form mid-conversation, and the submitted lead fans out to the business’s Slack, webhook (HMAC-signed), and CRM in real time.
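A minimal sketch of what HMAC-signing the webhook leg of that fan-out can look like, assuming Node's built-in crypto module and SHA-256. The `Lead` shape, the secret, and the function names are illustrative assumptions, not the production code.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical lead shape; the real payload fields are not specified in the text.
interface Lead {
  name: string;
  email: string;
  message: string;
}

// Sender side: sign the exact body string that goes over the wire.
function signBody(body: string, secret: string): string {
  return createHmac("sha256", secret).update(body, "utf8").digest("hex");
}

// Receiver side: recompute the signature and compare in constant time.
function verifyBody(body: string, secret: string, signature: string): boolean {
  const expected = Buffer.from(signBody(body, secret), "hex");
  const received = Buffer.from(signature, "hex");
  // timingSafeEqual throws on unequal lengths, so check the length first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}

const lead: Lead = { name: "Ada", email: "ada@example.com", message: "Call me back" };
const body = JSON.stringify(lead);
// The signature would travel in a request header of the owner's choosing.
const signature = signBody(body, "demo-secret");
```

The receiver verifies against the raw request body, not a re-serialized object, because any change in key order or whitespace changes the digest.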
On an owner-allowlisted field, the agent types the value the visitor gives it (React-friendly native setter + input/change events). It NEVER submits — the human reviews and sends.
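The native-setter technique mentioned above can be sketched as follows. React tracks an input's value through a property it installs on the element instance, so the sketch calls the prototype's own `value` setter and then dispatches `input` and `change` events. The function names and the allowlist shape are assumptions for illustration, and nothing here submits a form.

```typescript
// Minimal structural type so the sketch does not depend on DOM typings.
interface FieldLike {
  dispatchEvent(event: Event): boolean;
}

// Call the prototype's native `value` setter, bypassing any instance-level override.
function setNativeValue(el: object, value: string): boolean {
  const proto = Object.getPrototypeOf(el);
  if (proto === null) return false;
  const setter = Object.getOwnPropertyDescriptor(proto, "value")?.set;
  if (!setter) return false;
  setter.call(el, value);
  return true;
}

// Hypothetical entry point: fill one allowlisted field and notify the page.
function fillField(
  el: FieldLike,
  fieldId: string,
  value: string,
  allowlist: ReadonlySet<string>,
): boolean {
  if (!allowlist.has(fieldId)) return false; // owner has not allowlisted this field
  if (!setNativeValue(el, value)) return false;
  // Bubbling events let framework listeners attached higher in the tree see the change.
  el.dispatchEvent(new Event("input", { bubbles: true }));
  el.dispatchEvent(new Event("change", { bubbles: true }));
  return true; // the visitor still has to review and submit
}
```

In a browser, `el` would be an `HTMLInputElement` or `HTMLTextAreaElement`, whose prototypes own the `value` accessor that this sketch looks up.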
Allowlisted host-page routes, button clicks, and section scrolls — the agent can take a visitor to pricing, open a menu, or expand a booking widget on the live site.
The agent detects scheduling intent and opens the owner’s calendar (Cal.com / Calendly / Google) so the visitor picks a slot without leaving the conversation.
Full-text search over the business’s verified website, profile, and documents (Postgres FTS) so answers are grounded — not hallucinated.
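As a rough illustration of grounding answers with Postgres full-text search, the sketch below builds a parameterized query using the built-in `websearch_to_tsquery`, `ts_rank`, and `ts_headline` functions. The `documents` table, its columns, and the `buildSearchQuery` helper are hypothetical; the real schema is not described in the text.

```typescript
// Shape accepted by common Postgres clients: SQL text plus positional values.
interface SqlQuery {
  text: string;
  values: unknown[];
}

// Hypothetical schema: documents(business_id, title, body, search_vector tsvector).
function buildSearchQuery(businessId: string, question: string, limit = 5): SqlQuery {
  const text = `
    SELECT title,
           ts_headline('english', body, q) AS snippet,
           ts_rank(search_vector, q) AS rank
    FROM documents, websearch_to_tsquery('english', $2) AS q
    WHERE business_id = $1
      AND search_vector @@ q
    ORDER BY rank DESC
    LIMIT $3`;
  // The visitor's words are only ever passed as a bound parameter.
  return { text, values: [businessId, question, limit] };
}

const query = buildSearchQuery("biz_1", "weekend opening hours");
```

The top-ranked snippets would then be passed to the model as context, so the agent answers from the business's verified content.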
Tier-gated CRM routing and specialist hand-off, with cross-session memory so a returning visitor is recognized.
An admin "AI test circuit" runs a battery against every agent handler (markers, tools, keys, delivery) and tracks a rolling uptime score — so the harness is observably healthy.
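One simple way to compute a rolling uptime score is a fixed-size window over the most recent test-circuit runs. This is a sketch under that assumption; the window size and the class name are illustrative, and the actual scoring rule is not specified in the text.

```typescript
// Keeps the last `windowSize` pass/fail results and reports the pass percentage.
class RollingUptime {
  private readonly windowSize: number;
  private results: boolean[] = [];

  constructor(windowSize = 100) {
    this.windowSize = windowSize;
  }

  record(passed: boolean): void {
    this.results.push(passed);
    // Drop the oldest result once the window is full.
    if (this.results.length > this.windowSize) this.results.shift();
  }

  // Percentage of passing runs in the window, or null before any run is recorded.
  score(): number | null {
    if (this.results.length === 0) return null;
    const passed = this.results.filter(Boolean).length;
    return (passed / this.results.length) * 100;
  }
}
```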
Every host-page action is gated with defence in depth: it is validated independently at the API, the tool executor, the loader, and the effects client. No payment/password fields, ever.
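A sketch of the kind of check each layer could run independently before a field is touched. The `FieldDescriptor` shape, the name heuristics, and the `gate` function are assumptions for illustration; the `cc-*` and `*-password` values are standard HTML `autocomplete` tokens.

```typescript
// Hypothetical description of a host-page field as seen by a validation layer.
interface FieldDescriptor {
  id: string;
  type: string;
  name?: string;
  autocomplete?: string;
}

type GateResult = { ok: true } | { ok: false; reason: string };

// Deliberately conservative: a false positive only means the agent declines to type.
function isSensitive(field: FieldDescriptor): boolean {
  const tokens = (field.autocomplete ?? "").toLowerCase().split(/\s+/).filter(Boolean);
  const label = `${field.id} ${field.name ?? ""}`.toLowerCase();
  return (
    field.type.toLowerCase() === "password" ||
    tokens.some((t) => t.startsWith("cc-") || t.endsWith("password")) ||
    /passw|card|cvv|cvc|iban/.test(label)
  );
}

// The same gate would be called at the API, tool executor, loader, and effects client.
function gate(field: FieldDescriptor, allowlist: ReadonlySet<string>): GateResult {
  if (isSensitive(field)) return { ok: false, reason: "sensitive field" };
  if (!allowlist.has(field.id)) return { ok: false, reason: "not allowlisted" };
  return { ok: true };
}
```

Checking sensitivity before the allowlist means a payment or password field stays blocked even if an owner allowlists it by mistake.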
Already wired to Tavus. We register these as real function tools on the Tavus persona (layers.llm.tools), decode the conversation.tool_call events, and apply the effect in the visitor’s browser — so a Tavus video agent calls open_lead_form / fill_host_field / book_meeting and it just works.
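The decode step can be sketched roughly as below. The tool definition follows the common function-calling schema, and the event shape (a `properties` object carrying `name` and JSON-encoded `arguments`) is an assumption made for illustration; the authoritative payload is whatever the Tavus documentation specifies. Only `layers.llm.tools`, `conversation.tool_call`, and the three tool names come from the text above.

```typescript
type ToolName = "open_lead_form" | "fill_host_field" | "book_meeting";

// Example tool entry as it might appear under layers.llm.tools (parameters are hypothetical).
const fillHostFieldTool = {
  type: "function",
  function: {
    name: "fill_host_field",
    description: "Type a visitor-provided value into an allowlisted field.",
    parameters: {
      type: "object",
      properties: { field: { type: "string" }, value: { type: "string" } },
      required: ["field", "value"],
    },
  },
};

// Assumed event shape; treat every field as untrusted input.
interface ToolCallEvent {
  event_type: string;
  properties?: { name?: string; arguments?: string | Record<string, unknown> };
}

interface Effect {
  kind: ToolName;
  args: Record<string, unknown>;
}

const KNOWN_TOOLS: ReadonlySet<string> = new Set([
  "open_lead_form",
  "fill_host_field",
  "book_meeting",
]);

// Returns an effect for the browser to apply, or null if the event should be ignored.
function decodeToolCall(event: ToolCallEvent): Effect | null {
  if (event.event_type !== "conversation.tool_call") return null;
  const name = event.properties?.name;
  if (!name || !KNOWN_TOOLS.has(name)) return null;
  const raw = event.properties?.arguments ?? {};
  let args: unknown;
  try {
    args = typeof raw === "string" ? JSON.parse(raw) : raw;
  } catch {
    return null; // malformed JSON from the model is dropped, not guessed at
  }
  if (typeof args !== "object" || args === null || Array.isArray(args)) return null;
  return { kind: name as ToolName, args: args as Record<string, unknown> };
}
```

The returned effect would still pass through the host-page gating before anything happens on the page.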
03
Live proof, not slideware
Every business is a geographic #portal on hashtag.org. The same single embed code drops the agent onto the owner’s own website. The agent talks, qualifies, and acts — and the lead lands in the owner’s Slack before the call ends.
A single install powers both the on-site widget (talk to the business’s AI clone) and the #portal card on the spatial map. Backlinks + discovery come free.
Visitor intent → open_lead_form → submitted lead fans out to Slack, webhook, and CRM. The business feels the agent working immediately.
Each #portal carries a spatial data layer (local businesses, civic data, more) the agent can reason over — context Tavus agents don’t otherwise have.
The agent is prompted to qualify within 60 s, surface value, and close to the lead form or booking — it doesn’t wait to be asked.
04
Harness × Phoenix-3
Your video diffusion makes the interaction feel human enough to trust. Our harness makes that trust convert. Neither half is as valuable alone.
Tavus alone
A lifelike face that chats
Stunning. But the ROI conversation is hard: high GPU cost, no built-in path to a captured outcome.
Tavus × harness
A face that closes
Qualifies, fills forms, books, captures the lead, routes it to the CRM. Now the GPU spend maps directly to pipeline — an easy ROI story for every customer you sell to.
05
The cost flip
Our #space Chrome extension is already the distribution surface. For power users and enterprises with a capable GPU, a small native helper renders a distilled real-time avatar LOCALLY and streams frames straight into the extension, so the hosted per-call GPU cost for those calls goes to zero, right inside the browser.
The browser extension is the front-end (call UI, captions, the harness’s lead form). A native helper, installed once, does the CUDA render and talks to the extension over Chrome Native Messaging: browser distribution plus native GPU power, with no separate app to discover.
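Chrome Native Messaging frames each message as a 32-bit length in native byte order followed by UTF-8 JSON, and Chrome's documentation caps a single message from the host at 1 MB. The sketch below shows that framing on the helper side; the function names and message fields are hypothetical, and how video frames are chunked to fit the cap is left open.

```typescript
import { endianness } from "node:os";

const LITTLE_ENDIAN = endianness() === "LE";
// Documented Chrome limit for one host-to-browser message.
const MAX_HOST_MESSAGE_BYTES = 1024 * 1024;

// Encode one message: 4-byte native-order length prefix, then the JSON bytes.
function encodeFrame(message: unknown): Buffer {
  const json = Buffer.from(JSON.stringify(message), "utf8");
  if (json.length > MAX_HOST_MESSAGE_BYTES) {
    throw new Error("message exceeds the 1 MB host-to-browser limit");
  }
  const header = Buffer.alloc(4);
  if (LITTLE_ENDIAN) header.writeUInt32LE(json.length, 0);
  else header.writeUInt32BE(json.length, 0);
  return Buffer.concat([header, json]);
}

// Decode every complete message in a stdin buffer; return leftover bytes for the next read.
function decodeFrames(buffer: Buffer): { messages: unknown[]; rest: Buffer } {
  const messages: unknown[] = [];
  let offset = 0;
  while (buffer.length - offset >= 4) {
    const length = LITTLE_ENDIAN ? buffer.readUInt32LE(offset) : buffer.readUInt32BE(offset);
    if (buffer.length - offset - 4 < length) break; // body not fully received yet
    const body = buffer.subarray(offset + 4, offset + 4 + length);
    messages.push(JSON.parse(body.toString("utf8")));
    offset += 4 + length;
  }
  return { messages, rest: buffer.subarray(offset) };
}
```

Because the transport is a byte stream, the decoder keeps partial input in `rest` and retries once more bytes arrive.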
We’re validating real-time local inference on a 4090 (Windows/NVIDIA) first — distilled avatar models now hit 16–32 FPS on a single modern GPU. macOS / other GPUs follow.
For these users the expensive render is free to us. Hosted Tavus stays the premium path for everyone without a capable GPU — and the SAME harness drives outcomes on both.
06
The ask
We chose Tavus. We can route real conversational-video volume to you, showcase the harness on live businesses, and prove the ROI story your sales team needs.
What we bring
What we’re asking
You make AI faces feel human. We make them close business.
Let’s put them together and make every call worth the GPU.
Watch a live #portal agent