Before AI can guide human conversation, conversational state must be made legible. Common Ground is a real-time system that maps client language into structured, confidence-gated insight during the session, not after.
General-purpose transcription tools capture what was said. They don't distinguish signal from noise, client language from coach language, or a passing remark from a recurring theme. The result is a wall of text that requires a human to re-read, re-interpret, and manually extract what mattered.
For a coach managing multiple clients across dozens of sessions, that cognitive load compounds. And for an AI system meant to support coaching between sessions, unstructured transcripts are unusable.
Client phrases extracted verbatim and mapped spatially by meaning, not time. Confidence-gated so only signal surfaces automatically. Coach retains full override authority at every layer.
Not a summary of what happened. A structured view of what was said — legible to the coach and to downstream AI systems.
“The problem wasn’t capturing the conversation. It was making the conversation legible to a system without making the coach feel surveilled.”
The system processes live Zoom transcripts in real time, extracting meaningful phrases from client speech, classifying them by quadrant, and plotting them spatially. Only phrases that clear the confidence threshold surface on the map — everything else stays in the list layer for coach review.
Conversation Map · Executive Coaching Session
The pipeline moves from raw transcript to structured spatial insight in four deterministic steps. Each layer is auditable — the coach can see not just what surfaced, but why, and override any classification.
Layer 01
Extraction
Phrase extraction
Confidence ≥ 0.6
Max 2 phrases / utterance
Cross-quadrant allowed
Layer 02
Classification
Quadrant assignment
Confidence ≥ 0.7
Below threshold → list only
No black-box routing
Layer 03
Ranking
Exact recurrence
Normalized match
Recurring themes surface first
Coach override preserved
Layer 04
Spatial Mapping
Quadrant as structure
Recurrence drives proximity
Inline quadrant override
Human authority final
Zoom API integration feeds live transcript data into the Python/FastAPI backend. The React/Next.js frontend renders the conversation map in real time, updating as the session progresses without interrupting the coach's attention.
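The real-time constraint is mostly about not blocking: transcript events arrive continuously, and map updates must flow out without stalling ingestion. A minimal asyncio sketch of that ingest loop, independent of the actual Zoom and FastAPI wiring (the `Utterance` shape and `on_update` callback are assumptions for illustration):

```python
import asyncio
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Utterance:
    speaker: str      # "coach" or "client" -- only client language is mined
    text: str
    timestamp: float  # seconds from session start

async def ingest(queue: "asyncio.Queue[Optional[Utterance]]",
                 on_update: Callable[[dict], None]) -> None:
    """Consume live transcript events and emit incremental map updates.

    Coach speech is filtered here, before extraction, so the rest of
    the pipeline only ever sees client language.
    """
    while True:
        u = await queue.get()
        if u is None:                # sentinel: session ended
            break
        if u.speaker != "client":    # coach language never enters the pipeline
            continue
        on_update({"text": u.text, "ts": u.timestamp})
```

In the real system the queue would be fed by the Zoom integration and `on_update` would push over a socket to the React frontend; the point of the sketch is that filtering and emission are decoupled from transport.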
Four non-negotiables that shaped every decision — from confidence thresholds to what the map shows at a glance.
The coach sees every classification and can override any of them. The system proposes; the coach decides. This isn't a UX nicety — it's the design constraint that makes the tool trustworthy in a therapeutic context.
Every extracted phrase has a visible confidence score and an auditable classification path. No black-box summaries. A coach can always trace why something surfaced — and reject it if it doesn't feel right.
The system stores structured session objects — extracted phrases, quadrant assignments, confidence scores — not raw transcripts. What gets retained is meaning, not surveillance.
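In practice that means the persisted object has no transcript field at all. A sketch of what a stored session might look like (the schema below is illustrative, not the production model):

```python
import json
from dataclasses import dataclass, asdict, field

@dataclass
class StoredPhrase:
    text: str          # verbatim client phrase -- the only speech retained
    quadrant: str
    confidence: float
    recurrence: int

@dataclass
class SessionRecord:
    session_id: str
    phrases: list = field(default_factory=list)
    # Note what is absent: no raw transcript, no audio, no coach speech.

def persist(record: SessionRecord) -> str:
    """Serialize the structured session object; nothing else is written."""
    return json.dumps(asdict(record))
```

The privacy guarantee is structural, not procedural: the transcript cannot leak into storage because the record type has nowhere to put it.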
The map is designed to be glanceable during a session, not studied. A coach should be able to orient in two seconds and return to the client. Density is the enemy; spatial clarity is the goal.
No existing tool mapped conversational state in real time with coach-level trust requirements. This was a new category, not an iteration on transcription.
Below 0.7, phrases enter the list layer for manual coach review. Above it, they surface on the map automatically. The threshold is the trust architecture.
Every extracted phrase passes through four visible, overridable layers. The system's reasoning is never hidden from the human in the loop.
You make the system's reasoning visible and overridable at every step. Confidence scores aren't a nice-to-have — they're the mechanism that lets a human decide what to trust. Without that visibility, the tool becomes a black box that undermines the very relationship it's meant to support.
A transcript records what was said. Conversational state captures what mattered, in what context, with what frequency. Common Ground is the layer that converts the former into the latter — the infrastructure that makes downstream AI possible.
Because the coach's attention belongs to the client, not the screen. Ambient awareness is a design goal, not an aesthetic one. Every visual decision — density, color, spatial logic — is optimized for a human who needs to be present, not analytical.
“Before AI can guide human conversation, conversational state must be made legible. Common Ground is the layer that makes that possible.”