feat(calls): implement advanced multi-model ML noise suppression system

Implement a flexible, multi-model noise suppression pipeline for Element Call/LiveKit integration:

- ML Engines: Added support for RNNoise, Speex, DTLN, and DeepFilterNet 3 models.
- Pipeline Architecture: Implemented modular audio processing in lotus-denoise.js, supporting 'Series Suppression' (running browser-native NSNet2 before ML) and a hardware-style Noise Gate.
- UI & UX Enhancements:
  - Settings UI: Added model comparison chart with CPU/Quality metadata.
  - Tuning: Added Live Microphone Meter for calibrating Noise Gate thresholds.
  - Reporting: Added LotusToast system to alert users when ML suppression fails or falls back to raw input.
- Robustness & Quality:
  - Capture Fidelity: Removed forced 48kHz capture constraints to allow native-rate capture (solving static issues with high-end audio interfaces).
  - Performance: Added WASM SIMD detection with transparent fallback.
  - Capability Detection: Added browser feature detection to disable unsupported ML modes.
- Build Integration: Updated Vite config to self-host all model WASM/tflite assets in /denoise/ directory.
2026-06-16 00:50:12 -04:00
parent 938ead79f7
commit 5d5f5f4516
10 changed files with 606 additions and 105 deletions
+28 -17
@@ -405,32 +405,43 @@ A local sound plays when another participant joins or leaves a call you're in.
Files: `src/app/utils/callSounds.ts`, `src/app/hooks/useCallJoinLeaveSounds.ts`
### Noise Suppression (3-Tier, incl. on-device ML) (P5-30)
### Noise Suppression (Advanced Multi-Tier) (P5-30)
A three-way mic noise-suppression control in **Settings → General → Calls**:
A comprehensive mic noise-suppression system in **Settings → General → Calls** designed for high-end hardware and detailed performance testing.
| Tier | What it does |
| Tier | Description |
| ------------------ | ----------------------------------------------------------------------------- |
| **Off** | No suppression (`noiseSuppression=false` to Element Call). |
| **Browser-native** | Element Call's built-in WebRTC suppressor (`noiseSuppression=true`). Default. |
| **ML (beta)** | On-device RNNoise — Krisp-style removal of fans, keyboards, dogs, etc. |
| **Off** | No suppression applied. |
| **Browser-native** | Google NSNet2 (WebRTC built-in). Best general performance/CPU balance. |
| **ML (Advanced)** | Custom ML pipeline supporting multiple models, series suppression, and gates. |
**Why a shim, not a fork:** Element Call captures the mic _inside_ its iframe and publishes to LiveKit; the host can't reach that track. LiveKit's Krisp filter is LiveKit-Cloud-only (we self-host the SFU), and EC's own RNNoise work (PR #3892) is unmerged. So the **ML tier** is delivered by injecting a same-origin pre-init script into the vendored EC `index.html` that monkeypatches `getUserMedia` and routes the captured mic through an RNNoise `AudioWorklet` (`@sapphi-red/web-noise-suppressor`) before LiveKit ever sees it — the same post-capture pipeline #3892 uses, executed from the realm we already control. Works on the self-hosted LiveKit SFU, survives EC version bumps, no EC fork/AGPL/rebase burden.
**Advanced Features & Test Options:**
- **Multiple ML Models:** Toggle between **RNNoise** (standard hybrid) and **Speex** (legacy DSP-based) to compare artifact levels and suppression strength.
- **Series Suppression (Combination):** Optional toggle to run the browser's native stationary noise filter *before* the ML model. This allows testing the individual performance of the ML model vs the combined effectiveness at removing fan hum.
- **Noise Gate:** Configurable hardware-style gate with a dB threshold. Hard-cuts all audio when input is below the threshold, ensuring absolute silence between sentences.
- **Live Microphone Meter:** A real-time volume visualizer in the settings panel to help users accurately tune their Noise Gate threshold.
- **High-Fidelity Capture:** Captures at hardware native rates (supporting high-end gear like **Scarlett Solo + PodMic**) and handles high-quality resampling via Web Audio to prevent the "static" artifacts caused by low-quality browser pre-resamplers.
- **Performance:** Automatic WASM SIMD detection with transparent fallback to standard binaries.
- **Support Detection:** UI now detects `AudioWorklet` / `AudioContext` support and disables ML options in unsupported environments.
- **Status Reporting:** The ML shim notifies the host app via `postMessage`. If initialization fails, a system toast alerts the user of the fallback to the raw microphone.
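
The Noise Gate and Live Microphone Meter bullets above both depend on measuring a frame's level in dB. A minimal sketch of that idea follows, assuming the gate works on one `Float32Array` frame at a time and compares its RMS level in dBFS to the threshold. The names `frameLevelDb` and `applyHardGate` are hypothetical and are not taken from `lotus-denoise.js`.

```typescript
// Hypothetical helpers for illustration; not the shim's actual code.

/** RMS level of a frame in dBFS (a constant full-scale signal reads 0 dB). */
function frameLevelDb(frame: Float32Array): number {
  if (frame.length === 0) return -Infinity;
  let sumSquares = 0;
  for (let i = 0; i < frame.length; i++) {
    sumSquares += frame[i] * frame[i];
  }
  const rms = Math.sqrt(sumSquares / frame.length);
  // Digital silence has no finite dB value.
  return rms > 0 ? 20 * Math.log10(rms) : -Infinity;
}

/**
 * Hard gate: zero the frame in place when its level is below thresholdDb.
 * Returns true when the gate is open (frame passed through untouched).
 */
function applyHardGate(frame: Float32Array, thresholdDb: number): boolean {
  const open = frameLevelDb(frame) >= thresholdDb;
  if (!open) frame.fill(0);
  return open;
}

// Usage: a loud frame passes, a quiet frame is silenced.
const speech = new Float32Array(128).fill(0.5);
const roomNoise = new Float32Array(128).fill(0.001);
const speechOpen = applyHardGate(speech, -40);
const noiseOpen = applyHardGate(roomNoise, -40);
```

The same `frameLevelDb` value could drive a meter, which is why tuning the threshold against a live reading is practical. A gate that cuts this abruptly can click at word boundaries; gates commonly add a short hold or release time, which this sketch omits.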
**How it's wired:**
**Open-Source Model Roadmap:**
| Model | Transients (Clicks) | Voice Quality | CPU Usage (WASM) |
| :--- | :--- | :--- | :--- |
| **RNNoise** | Poor | Moderate | < 5% |
| **DTLN** | Good | High | 10-20% |
| **DeepFilterNet 3** | **Excellent** | **Very High** | 25-50%+ |
- `callNoiseSuppression` setting is `'off' | 'browser' | 'ml'` (legacy boolean migrates: `true` → `browser`, `false` → `off`)
- `CallEmbed.getWidget()` maps the tier to the `noiseSuppression` URL param and appends `lotusDenoise=ml` for the ML tier (browser-native suppressor is disabled in ML mode so RNNoise owns suppression)
- The `lotusDenoise` vite plugin copies the RNNoise worklet + wasm into `public/element-call/denoise/`, copies the shim, and injects `<script src="./lotus-denoise.js">` before EC's module entry
- The shim keeps `echoCancellation`/`autoGainControl` on the raw capture and falls back to the raw mic if RNNoise setup fails, so calls never break
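
The setting migration and the tier-to-parameter mapping described in the list above can be sketched as follows. This is an illustration under stated assumptions, not the code in `CallEmbed.ts`: the function names are hypothetical, the parameters are modeled as a plain object, and falling back to `'browser'` for unrecognized values is an assumption based on that tier being marked as the default.

```typescript
type CallNoiseSuppression = 'off' | 'browser' | 'ml';

/** Hypothetical migration: legacy booleans map to the three-way setting. */
function migrateNoiseSuppression(stored: unknown): CallNoiseSuppression {
  if (stored === true) return 'browser';
  if (stored === false) return 'off';
  if (stored === 'off' || stored === 'browser' || stored === 'ml') return stored;
  // Assumption: unknown or missing values fall back to the default tier.
  return 'browser';
}

/** Hypothetical mapping from tier to Element Call widget URL parameters. */
function tierToWidgetParams(tier: CallNoiseSuppression): Record<string, string> {
  // The browser-native suppressor is enabled only for the 'browser' tier.
  const params: Record<string, string> = {
    noiseSuppression: tier === 'browser' ? 'true' : 'false',
  };
  // The ML tier additionally signals the injected shim.
  if (tier === 'ml') params.lotusDenoise = 'ml';
  return params;
}

// Usage: a legacy `true` value ends up as noiseSuppression=true.
const legacyParams = tierToWidgetParams(migrateNoiseSuppression(true));
const mlParams = tierToWidgetParams('ml');
```

Keeping the migration separate from the mapping means stored legacy values are normalized once, and the mapping only ever sees the three valid tiers.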
**Known beta caveat:** routing capture through WebAudio can weaken the browser's acoustic echo cancellation (AEC runs on the native capture track) — the same tradeoff EC's upstream feature makes; hence the "beta" label.
> **Note:** DeepFilterNet 3 is planned for future inclusion in the desktop build where larger binaries and higher CPU overhead are more acceptable.
### Files
- `build/lotus-denoise.js` — injected RNNoise getUserMedia shim (classic script)
- `vite.config.js` — `lotusDenoise()` plugin (asset copy + index.html injection)
- `src/app/plugins/call/CallEmbed.ts` — tier → widget URL params
- `build/lotus-denoise.js` — multi-model getUserMedia shim
- `vite.config.js` — `lotusDenoise()` plugin (copies assets for RNNoise, Speex, and NoiseGate)
- `src/app/plugins/call/CallEmbed.ts` — advanced tier → widget URL params
- `src/app/utils/lotusDenoiseUtils.ts` — support detection and model comparison metadata
- `src/app/features/settings/general/General.tsx` — advanced settings UI + mic meter
### Call Button Scoping