Sonal Labs · Paper № 001 · MMXXVI

Trustworthy memory, from the signal up.

Memory inherits every mistake the capture makes. Sonal Labs researches overlap-native attribution and builds the model and memory layer that depend on it.

Fig. 1 — Two speakers, overlapping · A → ink · B → accent

§ 01 · A scenario

Two people speak at once.
Whose symptom goes on the chart?

Transcript fragment · overlap event
Cardiology follow-up · 00:02:14
DR. PATEL
Any chest pain in the last week?
PATIENT
yes, on Tuesday, but
overlap
DR. PATEL
— and any shortness of breath?
overlap
PATIENT
— it went away quickly.

Standard AI output

“Patient denies chest pain.”

Result

A cardiac symptom goes unrecorded.

Conventional pipelines make a forced choice when two voices land in the same frame. They pick the dominant speaker, drop the other, and hand a plausible-looking transcript to whatever comes next — a discharge summary, a referral, a care plan.

The error doesn't stay in the transcript. It becomes a symptom on the wrong chart. A deposition that collapses two voices into one. A care plan the patient never agreed to.

A memory system that occasionally gets words wrong is imperfect. A memory system that assigns clinical facts to the wrong person is adversarial.
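The forced choice can be made concrete with a toy sketch. This is not Sonal's pipeline, just an illustration of the failure mode: a single-stream decoder that keeps only the loudest speaker per frame discards everything the quieter party says during overlap.

```python
# Toy illustration (not Sonal's pipeline) of dominant-speaker selection:
# per frame, keep only the speaker with the highest energy.
def dominant_speaker_track(frames):
    """frames: list of {speaker: energy} dicts -> one speaker kept per frame."""
    return [max(f, key=f.get) if f else None for f in frames]

# Overlap region: the patient ("yes, on Tuesday") speaks under the doctor.
frames = [
    {"DR": 0.8},                # doctor alone
    {"DR": 0.7, "PT": 0.5},     # overlap — doctor louder
    {"DR": 0.6, "PT": 0.4},     # overlap — doctor louder
    {"PT": 0.9},                # patient alone
]
print(dominant_speaker_track(frames))
# ['DR', 'DR', 'DR', 'PT'] — the patient's overlapped speech is dropped
```

Every frame where the patient under-speaks the doctor is attributed to the doctor; the "yes" never reaches the transcript.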

§ 02 · The stack

Three capabilities, one system.

I.

Overlap-Native Audio

A proprietary diarisation model built for the Cocktail Party Problem. Disentangles simultaneous voices, fingerprints speakers, stays stable through interruption and repair.

II.

Visual Intelligence

Optional video context — from smart glasses, CCTV, and cameras — fused with the audio track. Resolves ambiguity the microphone alone can't.

III.

Active Automation

Sonal doesn't just record. It infers intent and triggers workflows. Evidence-grounded action items, reminders, follow-ups, and API-level actions.

§ 03 · Featured research

Overlap-native speech.

Whitepaper · v1.0 · January 2026 · 15 pp · Draft by request

Overlap-Native Speaker Diarization
as the foundation for wearable audio-visual memory systems.

A survey of overlap-aware techniques (EEND, TS-VAD, OSD), a reproducible baseline on open foundations, and an overlap-stratified evaluation methodology.

Tim Uzua · Sonal Labs Research

The dominant paradigm in speech AI treats “meetings” as the canonical multi-speaker scenario. Real conversation doesn't live there. Life unfolds in morning walks, phone calls while driving, family dinners, errands, worship — precisely where current systems fail most catastrophically. The paper formalises a different starting point.
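The stratified methodology can be sketched in a few lines. Assuming frame-level binary speaker-activity matrices (frames × speakers) for reference and hypothesis, errors are reported per overlap stratum rather than as one aggregate; the function name and strata below are illustrative, not the paper's actual suite.

```python
# Minimal sketch of overlap-stratified scoring. ref/hyp are binary
# (frames × speakers) activity matrices; names are illustrative.
import numpy as np

def stratified_frame_error(ref: np.ndarray, hyp: np.ndarray) -> dict:
    """Frame error rate, reported separately per overlap stratum."""
    active = ref.sum(axis=1)  # number of reference speakers active per frame
    strata = {
        "silence": active == 0,
        "single":  active == 1,
        "overlap": active >= 2,
    }
    out = {}
    for name, mask in strata.items():
        if mask.sum() == 0:
            continue
        # A frame is wrong if any speaker's activity label is wrong.
        errors = (ref[mask] != hyp[mask]).any(axis=1)
        out[name] = float(errors.mean())
    return out

# Toy example: 6 frames, 2 speakers; frames 4-5 contain overlap.
ref = np.array([[0, 0], [1, 0], [1, 0], [0, 1], [1, 1], [1, 1]])
hyp = np.array([[0, 0], [1, 0], [1, 0], [0, 1], [1, 0], [1, 0]])  # drops B in overlap
print(stratified_frame_error(ref, hyp))
# {'silence': 0.0, 'single': 0.0, 'overlap': 1.0}
```

The aggregate error here is 2 frames in 6, roughly a third, and looks tolerable; the overlap stratum shows the system fails on every overlapped frame. That is the failure mode aggregate numbers hide.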

§ 04 · The model

Presence AI.

The native overlap LLM at the centre of every Sonal system. Its job is attribution — knowing, frame by frame, who is holding the floor and what they said.

Built on EEND, TS-VAD, and open ASR foundations. Measured on overlap-stratified benchmarks, not aggregate numbers that hide the failure mode.

Turn projection

Continuously predicts where the current turn is going and when it will end, conditioned on prosody, syntax and context.

Floor management

Holds, yields, asks to continue, or interrupts — the same decisions a good human interlocutor makes, as an explicit training objective.

Repair

Handles self- and other-repair. False starts, reformulations, mishearings — not errors, but structure to be preserved.

Audio-visual

Optional lip-movement and facial-cue conditioning for disambiguation when the acoustic signal alone isn't enough.
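The floor-management objective above can be sketched as a toy policy: given the turn-projection model's estimate that the current turn is ending, and whether the system has something to say, choose an action. The thresholds and action names are illustrative assumptions, not Presence AI's actual values.

```python
# Hypothetical floor-management policy (thresholds are illustrative,
# not Presence AI's): map projected end-of-turn probability to an action.
def floor_action(p_turn_end: float, has_pending_content: bool) -> str:
    if p_turn_end > 0.8:
        # Turn is almost certainly ending: speak if we have content.
        return "take_floor" if has_pending_content else "yield"
    if p_turn_end > 0.5 and has_pending_content:
        # Turn may be ending: signal intent without interrupting.
        return "ask_to_continue"
    # Turn is mid-flight: let the current speaker hold the floor.
    return "hold"

print(floor_action(0.9, True))   # take_floor
print(floor_action(0.3, True))   # hold
```

In the actual model these decisions are a training objective rather than hand-set thresholds; the sketch only shows the shape of the decision space.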

Evaluation licence available to research partners.

Inside Presence AI

§ 05 · Applied programmes

Four rooms where memory has to be right.

I.

Healthcare & Clinical

Scenario

Doctor discusses X-ray results (audio) + image of the X-ray (video).

Action

Generates a comprehensive patient visit summary — attributed, signed, auditable.

II.

Legal & Professional

Scenario

Multiple attorneys arguing (overlap event).

Action

Verbatim diarisation of who said what, with timestamps suitable for the record.

III.

Security & Home

Scenario

Camera sees FedEx truck + person dropping a box.

Action

Logs a ‘Parcel delivered’ event and alerts the homeowner via app.

IV.

Personal & Family

Scenario

Friend mentions an anniversary date in passing.

Action

Auto-creates a calendar event — ‘Buy a gift for Sarah's anniversary’ — with the clip as evidence.

§ 06 · Why now

The last hard problem in speech AI.

I.

ASR is commoditised.

Whisper, Canary, Parakeet, Nova. Basic transcription is no longer a moat — it's infrastructure. Costs down ~90 % since 2020.

II.

Multi-speaker is the frontier.

EEND, TS-VAD and overlap-aware techniques exist in research but aren't productised. This is the last hard problem in speech AI.

III.

Wearable demand is validated.

The hardware is ready and the market is growing 25 % a year. The missing piece is a software layer you can trust.

Paper № 001 · Overlap-Native Speaker Diarization — draft
In design · Overlap-stratified evaluation suite
Drafting · Presence AI model card
In design · Multimodal memory — vision + speech
Established MMXXVI · Companies House · research@sonallabs.com

§ 07 · Editor's note

Overlap-native diarisation is the technical foundation, not the product. The product is trustworthy memory.

— Sonal Labs Whitepaper № 001, § 7