Smart glasses hear the world through your ears — conference names, speaker names, product jargon that frontier ASR models have never seen. At Gobi I built a per-event adaptation pipeline: for each conference domain, a lightweight LoRA adapter (r=16, α=32 on q_proj/v_proj) fine-tuned on fully synthetic speech.
The synthetic data pipeline generates entity-rich utterances with Gemini 2.5 and voices them with Cloud TTS Chirp 3 HD — roughly 26,000 utterances (55 hours) across five conference domains, no human recordings required. The adapters fixed 57 entity errors and cut entity-level WER from 9.91% to 8.46% raw (5.58% canonicalized), while the swap-in adapter design keeps base-model performance untouched outside the target domain.
Why it matters: wearables can’t ship a fine-tuned model per customer. Adapters that train on synthetic data in hours and hot-swap per event make domain adaptation operationally cheap.
Related reading: LoRA-Whisper (2406.06619), DAS (2501.12501), synthetic cross-accent augmentation (2303.00802).