Before AI can guide human conversation, conversational state must be made legible. Common Ground is a real-time system that maps client language into structured, confidence-gated insight during the session, not after.
General-purpose transcription tools capture what was said. They don't distinguish signal from noise, client language from coach language, or a passing remark from a recurring theme. The result is a wall of text that requires a human to re-read, re-interpret, and manually extract what mattered.
For a coach managing multiple clients across dozens of sessions, that cognitive load compounds. And for an AI system meant to support coaching between sessions, unstructured transcripts are unusable.
Client phrases extracted verbatim and mapped spatially by meaning, not time. Confidence-gated so only signal surfaces automatically. Coach retains full override authority at every layer.
Not a summary of what happened. A structured view of what was said — legible to the coach and to downstream AI systems.
“The problem wasn’t capturing the conversation. It was making the conversation legible to a system without making the coach feel surveilled.”
The system processes live Zoom transcripts in real time, extracting meaningful phrases from client speech, classifying them by quadrant, and plotting them spatially. Only phrases that clear the confidence threshold surface on the map — everything else stays in the list layer for coach review.
Conversation Map · Executive Coaching Session
The pipeline moves from raw transcript to structured spatial insight in four deterministic steps. Each layer is auditable — the coach can see not just what surfaced, but why, and override any classification.
Layer 01
Extraction
Phrase extraction
Confidence ≥ 0.6
Max 2 phrases / utterance
Cross-quadrant allowed
Layer 02
Classification
Quadrant assignment
Confidence ≥ 0.7
Below threshold → list only
No black-box routing
Layer 03
Ranking
Exact recurrence
Normalized match
Recurring themes surface first
Coach override preserved
Layer 04
Spatial Mapping
Quadrant as structure
Recurrence drives proximity
Inline quadrant override
Human authority final
Zoom API integration feeds live transcript data into the Python/FastAPI backend. The React/Next.js frontend renders the conversation map in real time, updating as the session progresses without interrupting the coach's attention.
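The real-time constraint is mostly about not blocking: transcript events arrive continuously, and map updates must flow out without stalling ingestion. A minimal asyncio sketch of that ingest loop, independent of the actual Zoom and FastAPI wiring (the `Utterance` shape and `on_update` callback are assumptions for illustration):

```python
import asyncio
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Utterance:
    speaker: str      # "coach" or "client" -- only client language is mined
    text: str
    timestamp: float  # seconds from session start

async def ingest(queue: "asyncio.Queue[Optional[Utterance]]",
                 on_update: Callable[[dict], None]) -> None:
    """Consume live transcript events and emit incremental map updates.

    Coach speech is filtered here, before extraction, so the rest of
    the pipeline only ever sees client language.
    """
    while True:
        u = await queue.get()
        if u is None:                # sentinel: session ended
            break
        if u.speaker != "client":    # coach language never enters the pipeline
            continue
        on_update({"text": u.text, "ts": u.timestamp})
```

In the real system the queue would be fed by the Zoom integration and `on_update` would push over a socket to the React frontend; the point of the sketch is that filtering and emission are decoupled from transport.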
Four non-negotiables that shaped every decision — from confidence thresholds to what the map shows at a glance.
The coach sees every classification and can override any of them. The system proposes; the coach decides. This isn't a UX nicety — it's the design constraint that makes the tool trustworthy in a therapeutic context.
Every extracted phrase has a visible confidence score and an auditable classification path. No black-box summaries. A coach can always trace why something surfaced — and reject it if it doesn't feel right.
The system stores structured session objects — extracted phrases, quadrant assignments, confidence scores — not raw transcripts. What gets retained is meaning, not surveillance.
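In practice that means the persisted object has no transcript field at all. A sketch of what a stored session might look like (the schema below is illustrative, not the production model):

```python
import json
from dataclasses import dataclass, asdict, field

@dataclass
class StoredPhrase:
    text: str          # verbatim client phrase -- the only speech retained
    quadrant: str
    confidence: float
    recurrence: int

@dataclass
class SessionRecord:
    session_id: str
    phrases: list = field(default_factory=list)
    # Note what is absent: no raw transcript, no audio, no coach speech.

def persist(record: SessionRecord) -> str:
    """Serialize the structured session object; nothing else is written."""
    return json.dumps(asdict(record))
```

The privacy guarantee is structural, not procedural: the transcript cannot leak into storage because the record type has nowhere to put it.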
The map is designed to be glanceable during a session, not studied. A coach should be able to orient in two seconds and return to the client. Density is the enemy; spatial clarity is the goal.
No existing tool mapped conversational state in real time with coach-level trust requirements. This was a new category, not an iteration on transcription.
Below 0.7, phrases enter the list layer for manual coach review. Above it, they surface on the map automatically. The threshold is the trust architecture.
Every extracted phrase passes through four visible, overridable layers. The system's reasoning is never hidden from the human in the loop.
You make the system's reasoning visible and overridable at every step. Confidence scores aren't a nice-to-have — they're the mechanism that lets a human decide what to trust. Without that visibility, the tool becomes a black box that undermines the very relationship it's meant to support.
A transcript records what was said. Conversational state captures what mattered, in what context, with what frequency. Common Ground is the layer that converts the former into the latter — the infrastructure that makes downstream AI possible.
Because the coach's attention belongs to the client, not the screen. Ambient awareness is a design goal, not an aesthetic one. Every visual decision — density, color, spatial logic — is optimized for a human who needs to be present, not analytical.
“Before AI can guide human conversation, conversational state must be made legible. Common Ground is the layer that makes that possible.”