From a0fcdf74da86cc0e75d0b3337c771cccde40315c Mon Sep 17 00:00:00 2001
From: Jared Vititoe
Date: Wed, 1 Jul 2026 00:46:39 -0400
Subject: [PATCH] feat(denoise): autoGainControl=false for the ML tier + docs

- CallEmbed sets `autoGainControl=false` for the ML noise-suppression tier so
  the browser's auto gain control doesn't fight the in-source ML model; the
  browser/off tiers keep AGC on.
- Docs: refresh the LOTUS_FEATURES noise-suppression section (browser-native
  default, quality-ordered dropdown, DFN3 ML default, attenuation floor,
  gate-after-ML, DFN level 60, AGC-off, the reliability fixes) and LOTUS_TODO
  P5-30 (mark tuning/reliability/AGC done; record GTCRN as
  researched-and-deferred).

Co-Authored-By: Claude Opus 4.8
---
 LOTUS_FEATURES.md                 | 32 ++++++++++++++++++++++++-------
 LOTUS_TODO.md                     |  8 ++++++--
 src/app/plugins/call/CallEmbed.ts |  4 ++++
 3 files changed, 35 insertions(+), 9 deletions(-)

diff --git a/LOTUS_FEATURES.md b/LOTUS_FEATURES.md
index 25a699c95..3018363d8 100644
--- a/LOTUS_FEATURES.md
+++ b/LOTUS_FEATURES.md
@@ -512,7 +512,7 @@ A comprehensive mic noise-suppression system in **Settings → General → Calls
 
 **Advanced Features & Test Options:**
 
-- **Multiple ML Models:** Toggle between **RNNoise** (standard hybrid) and **Speex** (legacy DSP-based) to compare artifact levels and suppression strength.
+- **Multiple ML Models:** Four in-source models, selectable from a dropdown **ordered by quality/CPU** (best first): **DeepFilterNet 3** (48 kHz, best), **DTLN** (16 kHz), **RNNoise** (48 kHz), **Speex** (48 kHz, lightest). The **tier default is Browser-native**; when a user opts into ML the default model is **DeepFilterNet 3**.
 - **Series Suppression (Combination):** Optional toggle to run the browser's native stationary noise filter _before_ the ML model. This allows testing the individual performance of the ML model vs the combined effectiveness at removing fan hum.
 - **Noise Gate:** Configurable hardware-style gate with a dB threshold. Hard-cuts all audio when input is below the threshold, ensuring absolute silence between sentences.
 - **Live Microphone Meter:** A real-time volume visualizer in the settings panel to help users accurately tune their Noise Gate threshold.
@@ -524,17 +524,35 @@ A comprehensive mic noise-suppression system in **Settings → General → Calls
 **Open-Source Models (all now in-source in the EC fork):**
 | Model | Transients (Clicks) | Voice Quality | CPU Usage (WASM) | Sample rate |
 | :--- | :--- | :--- | :--- | :--- |
-| **RNNoise** (default) | Poor | Moderate | < 5% | 48 kHz |
-| **Speex** | Poor | Low | < 5% | 48 kHz |
+| **DeepFilterNet 3** (ML default) | **Excellent** | **Very High** | 25-50%+ | 48 kHz |
 | **DTLN** | Good | High | 10-20% | 16 kHz |
-| **DeepFilterNet 3** | **Excellent** | **Very High** | 25-50%+ | 48 kHz |
+| **RNNoise** | Poor | Moderate | < 5% | 48 kHz |
+| **Speex** | Poor | Low | < 5% | 48 kHz |
 
 > **Update (2026-06):** with the EC fork live, denoise runs **inside** Element
 > Call as a LiveKit `TrackProcessor` and **all four models ship in-source**
 > (DTLN at 16 kHz, the rest at 48 kHz; the processor degrades to the raw mic
-> rather than ever going silent). The model picker selects between them. Real-call
-> **audio-quality** comparison across models is still the open verification item
-> (RNNoise output is known to be weak) — see `LOTUS_TESTING.md` §D2-1.
+> rather than ever going silent). The model picker selects between them.
+
+> **Update (2026-07) — quality, reliability & AEC/AGC:**
+>
+> - **Quality tuning** (addresses the "robotic/underwater" RNNoise reports):
+> a **dry/wet attenuation floor** (default ~-16 dB) blends a little raw mic
+> under the denoised signal so suppression can't fully collapse the noise
+> floor — applied only to the low-latency flat models (RNNoise/Speex); DTLN/DFN
+> would comb-filter, so they rely on their own level. The **noise gate now runs
+> after the ML stage**, and **DeepFilterNet 3 level 80 → 60**. Tunable via the
+> `lotusDenoiseFloor` param.
+> - **AEC/AGC:** browser **echo cancellation stays ON**, but the ML tier now sets
+> **auto gain control OFF** (`autoGainControl=false`) so the browser's dynamic
+> gain doesn't fight the ML model. Browser/off tiers keep AGC on. (Remote
+> playback stays on standard elements — no AEC-defeat vector.)
+> - **Reliability:** never-silent watchdog (auto-resume a suspended context),
+> `resume()` timeout (no track-lock deadlock), rejected-WASM-fetch eviction
+> (transient failures recover), activation off the local participant (works
+> solo), and init/build-failure leak fixes.
+> - Real-call **audio-quality** A/B (model choice, floor value, AGC on/off) is the
+> open by-ear validation item — see `LOTUS_TESTING.md` §D2-1.
 
 ### Files
 
diff --git a/LOTUS_TODO.md b/LOTUS_TODO.md
index a7f45c848..63357a72b 100644
--- a/LOTUS_TODO.md
+++ b/LOTUS_TODO.md
@@ -301,8 +301,12 @@ Features:
 
 **Models — all in-source in the fork:**
 
-- [x] **RNNoise** (48 kHz, default) · **Speex** (48 kHz) · **DTLN** (16 kHz) · **DeepFilterNet 3** (48 kHz) — all four wired and selectable.
-- [ ] **Open verification:** real-call **audio-quality** comparison across the four models (RNNoise output is known-weak). Track under the denoise quality project, `LOTUS_TESTING.md` §D2-1 / J2.
+- [x] **DeepFilterNet 3** (48 kHz, **ML default**) · **DTLN** (16 kHz) · **RNNoise** (48 kHz) · **Speex** (48 kHz) — all four wired and selectable; dropdown ordered best-quality first. Tier default is **Browser-native**.
+- [x] **Quality tuning (2026-07):** dry/wet **attenuation floor** (~-16 dB, RNNoise/Speex only — the "robotic" fix; DTLN/DFN would comb-filter), **gate-after-ML**, **DFN level 80→60**. Floor tunable via `lotusDenoiseFloor`.
+- [x] **AEC/AGC (2026-07):** echo-cancellation ON; **AGC OFF for the ML tier** (`autoGainControl=false`, threaded through EC `UrlParams`→`ConnectionFactory`) so browser AGC doesn't fight the model; playback confirmed no AEC-defeat.
+- [x] **Reliability (2026-07):** never-silent watchdog, resume-timeout, WASM-cache reject-eviction, activate-off-local-participant, init/build leak fixes.
+- [ ] **Open verification:** real-call by-ear **A/B** — model choice, floor value, AGC on/off (RNNoise known-weak historically). `LOTUS_TESTING.md` §D2-1 / J2.
+- [ ] **GTCRN (RESEARCHED — DEFERRED):** tiny MIT 16 kHz model that beats RNNoise, but **no drop-in browser package** — needs a ~1-week from-scratch build: `onnxruntime-web` (WASM, 1 thread) in a **Web Worker** (ORT can't run in an AudioWorklet — issue #13072) behind a custom AudioWorklet ring-buffer node presenting as an `AudioNode`; model `gtcrn_simple.onnx` (~300 KB, stateful — thread `conv/tra/inter` caches per frame); we write STFT/iSTFT (n_fft 512/hop 256). Assets ~3–4 MB via the `lotusDenoise()` vite plugin. Registration checklist known (both repos, incl. the 2nd `denoisePipeline.ts` used by the DenoiseTester). **Revisit only if low-power quality is insufficient after validating the current tuning.**
 - [ ] **Desktop-only / HW-gated (future):** FRCRN or NVIDIA Maxine (RTX/Tensor only) — impossible in-browser; would run in the Tauri Rust backend + bridge a virtual mic into the webview. Detect capability; web falls back to RNNoise.
 - **Excluded:** Krisp (LiveKit Cloud only); FRCRN/Maxine on web (GPU/server-bound).
 
diff --git a/src/app/plugins/call/CallEmbed.ts b/src/app/plugins/call/CallEmbed.ts
index 3500b1d05..a59b17881 100644
--- a/src/app/plugins/call/CallEmbed.ts
+++ b/src/app/plugins/call/CallEmbed.ts
@@ -174,6 +174,10 @@ export class CallEmbed {
         denoiseMode === 'browser' ||
         (denoiseMode === 'ml' && denoiseNativeNS)
       ).toString(),
+      // Turn the browser's auto gain control OFF for the ML tier only: its
+      // dynamic gain fights the in-source ML denoiser (pumping). Browser/off
+      // tiers keep the browser's normal capture pipeline (AGC on).
+      autoGainControl: (denoiseMode !== 'ml').toString(),
       audio: initialAudio.toString(),
       video: initialVideo.toString(),
       header: 'none',
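Reviewer note: the tier-to-flag mapping touched by the `CallEmbed.ts` hunk can be exercised in isolation with a minimal sketch like the one below. It assumes three tier values (`'off'`, `'browser'`, `'ml'`), which the commit message implies but the hunk does not spell out. The `captureFlags` helper, the `CaptureFlags` type, and the `noiseSuppression` key name are hypothetical stand-ins for illustration; only `autoGainControl`, `denoiseMode`, and `denoiseNativeNS` appear in the patch itself.

```typescript
// Illustrative sketch only, not the fork's actual code.
// Assumption: the denoise tier is one of these three values.
type DenoiseMode = 'off' | 'browser' | 'ml';

// Hypothetical shape: `noiseSuppression` is a placeholder key name, because
// the real key sits above the hunk and is not visible in the patch.
type CaptureFlags = {
  noiseSuppression: string;
  autoGainControl: string;
};

// Mirrors the two boolean expressions visible in the CallEmbed hunk.
function captureFlags(
  denoiseMode: DenoiseMode,
  denoiseNativeNS: boolean
): CaptureFlags {
  return {
    // Native noise suppression: on for the browser tier, or for the ML tier
    // when series suppression (native filter before the model) is enabled.
    noiseSuppression: (
      denoiseMode === 'browser' ||
      (denoiseMode === 'ml' && denoiseNativeNS)
    ).toString(),
    // AGC is switched off only for the ML tier so the browser's dynamic gain
    // does not fight the in-source model.
    autoGainControl: (denoiseMode !== 'ml').toString(),
  };
}

// Print the query-string fragment each tier would produce.
const tiers: DenoiseMode[] = ['off', 'browser', 'ml'];
for (const tier of tiers) {
  const query = new URLSearchParams(captureFlags(tier, false)).toString();
  console.log(`${tier}: ${query}`);
}
```

The flags are serialized as the strings `'true'` and `'false'` because they travel as URL parameters. Deriving `autoGainControl` from `denoiseMode !== 'ml'` keeps the off and browser tiers on the browser's default capture pipeline without a separate setting.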