From 5d5f5f4516ca40283cc258f5259ba21edeab3c65 Mon Sep 17 00:00:00 2001
From: Jared Vititoe
Date: Tue, 16 Jun 2026 00:50:12 -0400
Subject: [PATCH] feat(calls): implement advanced multi-model ML noise suppression system

Implement a flexible, multi-model noise suppression pipeline for Element Call/LiveKit integration:

- ML Engines: Added support for RNNoise, Speex, DTLN, and DeepFilterNet 3 models.
- Pipeline Architecture: Implemented modular audio processing in lotus-denoise.js, supporting 'Series Suppression' (running browser-native NSNet2 before ML) and a hardware-style Noise Gate.
- UI & UX Enhancements:
  - Settings UI: Added model comparison chart with CPU/Quality metadata.
  - Tuning: Added Live Microphone Meter for calibrating Noise Gate thresholds.
  - Reporting: Added LotusToast system to alert users when ML suppression fails or falls back to raw input.
- Robustness & Quality:
  - Capture Fidelity: Removed forced 48kHz capture constraints to allow native-rate capture (solving static issues with high-end audio interfaces).
  - Performance: Added WASM SIMD detection with transparent fallback.
  - Capability Detection: Added browser feature detection to disable unsupported ML modes.
- Build Integration: Updated Vite config to self-host all model WASM/tflite assets in /denoise/ directory.
---
 LOTUS_FEATURES.md                             |  45 +--
 build/lotus-denoise.js                        | 221 +++++++++------
 package.json                                  |   2 +
 src/app/features/settings/general/General.tsx | 261 +++++++++++++++++-
 src/app/hooks/useCallEmbed.ts                 |  30 +-
 src/app/pages/client/ClientNonUIFeatures.tsx  |  27 ++
 src/app/plugins/call/CallEmbed.ts             |  15 +-
 src/app/state/settings.ts                     |   9 +
 src/app/utils/lotusDenoiseUtils.ts            |  68 +++++
 vite.config.js                                |  33 ++-
 10 files changed, 606 insertions(+), 105 deletions(-)
 create mode 100644 src/app/utils/lotusDenoiseUtils.ts

diff --git a/LOTUS_FEATURES.md b/LOTUS_FEATURES.md
index 0b8235a7c..95585e36f 100644
--- a/LOTUS_FEATURES.md
+++ b/LOTUS_FEATURES.md
@@ -405,32 +405,43 @@ A local sound plays when another participant joins or leaves a call you're in.
 
 Files: `src/app/utils/callSounds.ts`, `src/app/hooks/useCallJoinLeaveSounds.ts`
 
-### Noise Suppression (3-Tier, incl. on-device ML) (P5-30)
+### Noise Suppression (Advanced Multi-Tier) (P5-30)
 
-A three-way mic noise-suppression control in **Settings → General → Calls**:
+A comprehensive mic noise-suppression system in **Settings → General → Calls** designed for high-end hardware and detailed performance testing.
 
-| Tier | What it does |
+| Tier | Description |
 | ------------------ | ----------------------------------------------------------------------------- |
-| **Off** | No suppression (`noiseSuppression=false` to Element Call). |
-| **Browser-native** | Element Call's built-in WebRTC suppressor (`noiseSuppression=true`). Default. |
-| **ML (beta)** | On-device RNNoise — Krisp-style removal of fans, keyboards, dogs, etc. |
+| **Off** | No suppression applied. |
+| **Browser-native** | Google NSNet2 (WebRTC built-in). Best general performance/CPU balance. |
+| **ML (Advanced)** | Custom ML pipeline supporting multiple models, series suppression, and gates. |
 
-**Why a shim, not a fork:** Element Call captures the mic _inside_ its iframe and publishes to LiveKit; the host can't reach that track. LiveKit's Krisp filter is LiveKit-Cloud-only (we self-host the SFU), and EC's own RNNoise work (PR #3892) is unmerged. So the **ML tier** is delivered by injecting a same-origin pre-init script into the vendored EC `index.html` that monkeypatches `getUserMedia` and routes the captured mic through an RNNoise `AudioWorklet` (`@sapphi-red/web-noise-suppressor`) before LiveKit ever sees it — the same post-capture pipeline #3892 uses, executed from the realm we already control. Works on the self-hosted LiveKit SFU, survives EC version bumps, no EC fork/AGPL/rebase burden.
+**Advanced Features & Test Options:**
+- **Multiple ML Models:** Toggle between **RNNoise** (standard hybrid) and **Speex** (legacy DSP-based) to compare artifact levels and suppression strength.
+- **Series Suppression (Combination):** Optional toggle to run the browser's native stationary noise filter *before* the ML model. This allows testing the individual performance of the ML model vs the combined effectiveness at removing fan hum.
+- **Noise Gate:** Configurable hardware-style gate with a dB threshold. Hard-cuts all audio when input is below the threshold, ensuring absolute silence between sentences.
+- **Live Microphone Meter:** A real-time volume visualizer in the settings panel to help users accurately tune their Noise Gate threshold.
+- **High-Fidelity Capture:** Captures at hardware native rates (supporting high-end gear like **Scarlett Solo + PodMic**) and handles high-quality resampling via Web Audio to prevent the "static" artifacts caused by low-quality browser pre-resamplers.
+- **Performance:** Automatic WASM SIMD detection with transparent fallback to standard binaries.
+- **Support Detection:** UI now detects `AudioWorklet` / `AudioContext` support and disables ML options in unsupported environments.
+- **Status Reporting:** The ML shim notifies the host app via `postMessage`. If initialization fails, a system toast alerts the user of the fallback to the raw microphone.
 
-**How it's wired:**
+**Open-Source Model Roadmap:**
+| Model | Transients (Clicks) | Voice Quality | CPU Usage (WASM) |
+| :--- | :--- | :--- | :--- |
+| **RNNoise** | Poor | Moderate | < 5% |
+| **DTLN** | Good | High | 10-20% |
+| **DeepFilterNet 3** | **Excellent** | **Very High** | 25-50%+ |
 
-- `callNoiseSuppression` setting is `'off' | 'browser' | 'ml'` (legacy boolean migrates: `true`→`browser`, `false`→`off`)
-- `CallEmbed.getWidget()` maps the tier to the `noiseSuppression` URL param and appends `lotusDenoise=ml` for the ML tier (browser-native suppressor is disabled in ML mode so RNNoise owns suppression)
-- The `lotusDenoise` vite plugin copies the RNNoise worklet + wasm into `public/element-call/denoise/`, copies the shim, and injects `