Sonal Labs · Paper № 001 · MMXXVI

Trustworthy memory, from the signal up.

Memory inherits every mistake the capture makes. Sonal Labs researches overlap-native attribution and builds the model and memory layer that depend on it.

Fig. 1 — Two speakers, overlapping · A → ink · B → accent

§ 01 · A scenario

Two people speak at once.
Whose symptom goes on the chart?

Transcript fragment · overlap event
Cardiology follow-up · 00:02:14
DR. PATEL
Any chest pain in the last week?
PATIENT
yes, on Tuesday, but
overlap
DR. PATEL
— and any shortness of breath?
overlap
PATIENT
— it went away quickly.

Standard AI output

“Patient denies chest pain.”

Result

A cardiac symptom goes unrecorded.

Conventional pipelines make a forced choice when two voices land in the same frame. They pick the dominant speaker, drop the other, and hand a plausible-looking transcript to whatever comes next — a discharge summary, a referral, a care plan.

The error doesn't stay in the transcript. It becomes a symptom on the wrong chart. A deposition that collapses two voices into one. A care plan the patient never agreed to.

A memory system that occasionally gets words wrong is imperfect. A memory system that assigns clinical facts to the wrong person is adversarial.
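The forced choice can be made concrete with a toy sketch. This is not Sonal's pipeline, just an illustration of the failure mode: a single-stream decoder that keeps only the loudest speaker per frame discards everything the quieter party says during overlap.

```python
# Toy illustration (not Sonal's pipeline) of dominant-speaker selection:
# per frame, keep only the speaker with the highest energy.
def dominant_speaker_track(frames):
    """frames: list of {speaker: energy} dicts -> one speaker kept per frame."""
    return [max(f, key=f.get) if f else None for f in frames]

# Overlap region: the patient ("yes, on Tuesday") speaks under the doctor.
frames = [
    {"DR": 0.8},                # doctor alone
    {"DR": 0.7, "PT": 0.5},     # overlap — doctor louder
    {"DR": 0.6, "PT": 0.4},     # overlap — doctor louder
    {"PT": 0.9},                # patient alone
]
print(dominant_speaker_track(frames))
# ['DR', 'DR', 'DR', 'PT'] — the patient's overlapped speech is dropped
```

Every frame where the patient under-speaks the doctor is attributed to the doctor; the "yes" never reaches the transcript.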

§ 02 · The stack

Three capabilities, one system.

I.

Overlap-Native Audio

A proprietary diarisation model built for the Cocktail Party Problem. Disentangles simultaneous voices, fingerprints speakers, stays stable through interruption and repair.

II.

Visual Intelligence

Optional video context — from smart glasses, CCTV, and cameras — fused with the audio track. Resolves ambiguity the microphone alone can't.

III.

Active Automation

Sonal doesn't just record. It infers intent and triggers workflows. Evidence-grounded action items, reminders, follow-ups, and API-level actions.

§ 03 · Featured research

Overlap-native speech.

Whitepaper · v1.0 · January 2026 · 15 pp · Draft by request

Overlap-Native Speaker Diarization
as the foundation for wearable audio-visual memory systems.

A survey of overlap-aware techniques (EEND, TS-VAD, OSD), a reproducible baseline on open foundations, and an overlap-stratified evaluation methodology.

Tim Uzua · Sonal Labs Research

The dominant paradigm in speech AI treats “meetings” as the canonical multi-speaker scenario. Real conversation doesn't live there. Life unfolds in morning walks, phone calls while driving, family dinners, errands, worship — precisely where current systems fail most catastrophically. The paper formalises a different starting point.
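The stratified methodology can be sketched in a few lines. Assuming frame-level binary speaker-activity matrices (frames × speakers) for reference and hypothesis, errors are reported per overlap stratum rather than as one aggregate; the function name and strata below are illustrative, not the paper's actual suite.

```python
# Minimal sketch of overlap-stratified scoring. ref/hyp are binary
# (frames × speakers) activity matrices; names are illustrative.
import numpy as np

def stratified_frame_error(ref: np.ndarray, hyp: np.ndarray) -> dict:
    """Frame error rate, reported separately per overlap stratum."""
    active = ref.sum(axis=1)  # number of reference speakers active per frame
    strata = {
        "silence": active == 0,
        "single":  active == 1,
        "overlap": active >= 2,
    }
    out = {}
    for name, mask in strata.items():
        if mask.sum() == 0:
            continue
        # A frame is wrong if any speaker's activity label is wrong.
        errors = (ref[mask] != hyp[mask]).any(axis=1)
        out[name] = float(errors.mean())
    return out

# Toy example: 6 frames, 2 speakers; frames 4-5 contain overlap.
ref = np.array([[0, 0], [1, 0], [1, 0], [0, 1], [1, 1], [1, 1]])
hyp = np.array([[0, 0], [1, 0], [1, 0], [0, 1], [1, 0], [1, 0]])  # drops B in overlap
print(stratified_frame_error(ref, hyp))
# {'silence': 0.0, 'single': 0.0, 'overlap': 1.0}
```

The aggregate error here is 2 frames in 6, roughly a third, and looks tolerable; the overlap stratum shows the system fails on every overlapped frame. That is the failure mode aggregate numbers hide.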

§ 04 · The model

Presence AI.

The native overlap LLM at the centre of every Sonal system. Its job is attribution — knowing, frame by frame, who is holding the floor and what they said.

Built on EEND, TS-VAD, and open ASR foundations. Measured on overlap-stratified benchmarks, not aggregate numbers that hide the failure mode.

Turn projection

Continuously predicts where the current turn is going and when it will end, conditioned on prosody, syntax and context.

Floor management

Holds, yields, asks to continue, or interrupts — the same decisions a good human interlocutor makes, as an explicit training objective.

Repair

Handles self- and other-repair. False starts, reformulations, mishearings — not errors, but structure to be preserved.

Audio-visual

Optional lip-movement and facial-cue conditioning for disambiguation when the acoustic signal alone isn't enough.
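The floor-management objective above can be sketched as a toy policy: given the turn-projection model's estimate that the current turn is ending, and whether the system has something to say, choose an action. The thresholds and action names are illustrative assumptions, not Presence AI's actual values.

```python
# Hypothetical floor-management policy (thresholds are illustrative,
# not Presence AI's): map projected end-of-turn probability to an action.
def floor_action(p_turn_end: float, has_pending_content: bool) -> str:
    if p_turn_end > 0.8:
        # Turn is almost certainly ending: speak if we have content.
        return "take_floor" if has_pending_content else "yield"
    if p_turn_end > 0.5 and has_pending_content:
        # Turn may be ending: signal intent without interrupting.
        return "ask_to_continue"
    # Turn is mid-flight: let the current speaker hold the floor.
    return "hold"

print(floor_action(0.9, True))   # take_floor
print(floor_action(0.3, True))   # hold
```

In the actual model these decisions are a training objective rather than hand-set thresholds; the sketch only shows the shape of the decision space.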

Evaluation licence available to research partners.

Inside Presence AI

§ 05 · Applied programmes

Four rooms where memory has to be right.

I.

Healthcare & Clinical

Scenario

Doctor discusses X-ray results (audio) + image of the X-ray (video).

Action

Generates a comprehensive patient visit summary — attributed, signed, auditable.

II.

Legal & Professional

Scenario

Multiple attorneys arguing (overlap event).

Action

Verbatim diarisation of who said what, with timestamps suitable for the record.

III.

Security & Home

Scenario

Camera sees FedEx truck + person dropping a box.

Action

Logs a ‘Parcel delivered’ event and alerts the homeowner via app.

IV.

Personal & Family

Scenario

Friend mentions an anniversary date in passing.

Action

Auto-creates a calendar event — ‘Buy a gift for Sarah's anniversary’ — with the clip as evidence.

§ 06 · Why now

The last hard problem in speech AI.

I.

ASR is commoditised.

Whisper, Canary, Parakeet, Nova. Basic transcription is no longer a moat — it's infrastructure. Costs down ~90 % since 2020.

II.

Multi-speaker is the frontier.

EEND, TS-VAD and overlap-aware techniques exist in research but aren't productised. This is the last hard problem in speech AI.

III.

Wearable demand is validated.

The hardware is ready and the market is growing 25 % a year. The missing piece is a software layer you can trust.

Paper № 001 · Overlap-Native Speaker Diarization — draft
In design · Overlap-stratified evaluation suite
Drafting · Presence AI model card
In design · Multimodal memory — vision + speech
Established MMXXVI · Companies House · research@sonallabs.com

§ 07 · Editor's note

Overlap-native diarisation is the technical foundation, not the product. The product is trustworthy memory.

— Sonal Labs Whitepaper № 001, § 7