6634b2b8a2
Audit/repair of the multi-model denoise work so it actually builds and only exposes working, self-hosted models.

- Complete the DTLN/DFN3 revert: uninstall @workadventure/noise-suppression and deepfilternet3-noise-filter (package.json + lockfile), drop the unused DTLN asset-copy block from vite.config.js (was shipping ~2MB of unused tflite/wasm), and narrow DenoiseModelId to the bundled models (rnnoise, speex). Coerce any retired persisted model value back to the default.
- Fix General.tsx CI typecheck failures introduced by the denoise UI: restore three imports the rewrite deleted (useDateFormatItems, SequenceCardStyle, useTauriUpdater), add the missing denoise/sound imports, and correct hallucinated Folds props (Text has no variant/bold; Box uses alignItems/justifyContent). tsc now passes with 0 errors.
- Harden the vite denoise plugin: required RNNoise/Speex/gate assets and the shim now fail the build loudly if missing (instead of a silent warn that shipped a broken ML feature), and the index.html shim injection is verified.
- CI: move the cinny-desktop submodule bump into ci.yml as a `trigger-desktop` job gated on `needs: build`, and delete the standalone trigger-desktop.yml. A failing push no longer kicks off the slow Tauri builds in parallel.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
4.6 KiB
Engineering Review: Multi-Model ML Noise Suppression Upgrade (P5-30)
Overview
This PR implements a robust, modular, and high-fidelity client-side audio processing pipeline for noise suppression (NS) within Lotus Chat. It addresses issues with static noise artifacts, suboptimal sample rate resampling, and the lack of transparency in the audio processing chain.
1. Architectural Changes
1.1 Audio Processing Pipeline (lotus-denoise.js)
- Decoupled Initialization: The shim now treats the audio chain as a configurable graph: Source → Noise Gate (optional) → ML Model → LiveKit.
- Series Processing: We enabled the browser-native suppressor (the noiseSuppression capture constraint) to run in series with the ML model. The native engine handles stationary noise (fan hum) efficiently, while the ML model focuses on transient "life" noise (keyboard clicks, mouse taps).
- Hardware Fidelity: Removed forced 48kHz capture constraints in getUserMedia. This allows high-end audio interfaces (e.g., Rode/Scarlett at 48kHz) to pass raw audio without low-quality browser-level resampling, which was previously creating "static" artifacts.
- SIMD Optimization: Added runtime WebAssembly.validate checks to detect SIMD support. The pipeline dynamically selects rnnoise_simd.wasm over standard WASM if supported, reducing CPU utilization.
- Failure Resilience: Wrapped the entire graph initialization in Promise.all + try/catch. If any component (WASM loading, AudioWorklet initialization) fails, the shim sends a postMessage failure report and falls back to the raw microphone stream, ensuring calls never drop due to suppression errors.
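The SIMD selection and the fallback path described above can be sketched roughly as follows. This is an illustration under assumptions, not the code in lotus-denoise.js: the non-SIMD file name rnnoise.wasm, the function names, the injected loaders/connect/report callbacks, and the message shape are all hypothetical.

```javascript
// Tiny WASM module whose only function returns a v128 value.
// It validates only on engines that implement WASM SIMD.
const SIMD_PROBE = new Uint8Array([
  0, 97, 115, 109, 1, 0, 0, 0, 1, 5, 1, 96, 0, 1, 123, 3, 2, 1, 0,
  10, 10, 1, 8, 0, 65, 0, 253, 15, 253, 98, 11,
]);

// Choose the RNNoise binary. "rnnoise.wasm" is an assumed name for the
// non-SIMD build; only "rnnoise_simd.wasm" is named in this review.
function pickRnnoiseBinary(validate = (bytes) => WebAssembly.validate(bytes)) {
  let simd = false;
  try {
    simd = validate(SIMD_PROBE);
  } catch {
    simd = false; // treat any probe error as "no SIMD"
  }
  return simd ? "rnnoise_simd.wasm" : "rnnoise.wasm";
}

// Build the processed stream, or fall back to the raw microphone stream.
// `loaders` are async functions (WASM fetch, worklet registration, ...),
// `connect` wires their results into the graph, and `report` stands in
// for the postMessage call to the host (all hypothetical parameters).
async function initDenoise({ rawStream, loaders, connect, report }) {
  try {
    const parts = await Promise.all(loaders.map((load) => load()));
    return { stream: connect(rawStream, parts), fallback: false };
  } catch (err) {
    const reason = String((err && err.message) || err);
    report({ type: "lotus-denoise-status", ok: false, reason });
    return { stream: rawStream, fallback: true };
  }
}
```

Because Promise.all rejects as soon as any loader rejects, a single failed component is enough to route the call onto the raw stream instead of leaving a half-built graph.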
1.2 Multi-Model Support
Added support for 4 distinct processing models:
- RNNoise (Mozilla): Default lightweight hybrid model.
- Speex (Legacy): DSP-based fallback for extremely low-CPU requirements.
- DTLN (Balanced): Deep learning model (~15% CPU). Improved transient handling.
- DeepFilterNet 3 (Pro): Studio-grade Deep Learning (~25-50%+ CPU). Designed for high-fidelity noise removal.
2. Infrastructure & Build Integration (vite.config.js)
- Automated Asset Pipeline: Added rules to copy model assets (TFLite models, WASM runtimes) from node_modules into the denoise/ directory during build.
- CI-Friendly: The copy logic now includes console.warn fallbacks for missing assets to prevent build failures in environments where npm install hasn't yet finished, facilitating robust CI/CD integration.
- Self-Hosting: All assets are explicitly served from the /denoise/ path, ensuring full privacy and avoiding external CDN dependencies at runtime.
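A rough sketch of the copy planning step is shown below, assuming a simple manifest of source and destination paths. The manifest shape, the `strict` option, and the function name are hypothetical; a real vite.config.js would pass something like fs.existsSync as `exists` and perform the copy itself.

```javascript
// Split an asset manifest into files to copy and files that are missing.
// `exists` is injected so the logic can be exercised without touching disk.
// With strict = false, missing assets only produce a console.warn (the
// behaviour described above); with strict = true, a missing required
// asset aborts the build instead.
function planAssetCopy(assets, exists, { strict = false } = {}) {
  const copy = [];
  const missing = [];
  for (const asset of assets) {
    (exists(asset.from) ? copy : missing).push(asset);
  }
  const fatal = missing.filter((asset) => strict && asset.required);
  if (fatal.length > 0) {
    const names = fatal.map((asset) => asset.from).join(", ");
    throw new Error(`Missing required denoise assets: ${names}`);
  }
  for (const asset of missing) {
    console.warn(`[denoise] skipping missing asset: ${asset.from}`);
  }
  return { copy, missing };
}
```

The trade-off is that a warn-only build stays green when dependencies are absent but can ship without the model files, whereas the strict mode surfaces the problem at build time.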
3. UI & UX Improvements
3.1 Settings & Tuning (General.tsx)
- Capability Detection: Created lotusDenoiseUtils.ts to verify support for AudioContext and AudioWorklet. The ML option is programmatically disabled in unsupported browsers (e.g., Safari/Mobile) with a clear requirement list.
- Comparison Chart: Added a UI table listing Model, CPU Usage, Quality, and Transient Handling to allow users to make informed decisions based on their hardware.
- Live Tuning: Added a MicMeter component using an AnalyserNode to provide real-time visual feedback, enabling users to calibrate the Noise Gate Threshold (-100dB to 0dB) precisely to their microphone's noise floor.
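The capability check and the meter math can be illustrated with the sketch below. It is not the contents of lotusDenoiseUtils.ts or MicMeter: the function names are hypothetical, and it assumes the meter reads time-domain samples in [-1, 1] (for example via AnalyserNode.getFloatTimeDomainData) and shows an RMS level clamped to the threshold range.

```javascript
// Report which Web Audio features are missing in the given environment.
function checkDenoiseSupport(env = globalThis) {
  const missing = [];
  if (typeof env.AudioContext !== "function") missing.push("AudioContext");
  if (typeof env.AudioWorkletNode !== "function") missing.push("AudioWorklet");
  return { supported: missing.length === 0, missing };
}

const MIN_DB = -100; // lower bound of the gate threshold slider
const MAX_DB = 0; // full scale

// Convert time-domain samples in [-1, 1] to an RMS level in dB,
// clamped to [MIN_DB, MAX_DB] so silence maps to -100 rather than -Infinity.
function levelDb(samples) {
  if (samples.length === 0) return MIN_DB;
  let sumSquares = 0;
  for (const s of samples) sumSquares += s * s;
  const rms = Math.sqrt(sumSquares / samples.length);
  const db = 20 * Math.log10(rms);
  return Math.min(MAX_DB, Math.max(MIN_DB, db));
}

// The gate passes audio only when the measured level reaches the threshold.
function gateIsOpen(samples, thresholdDb) {
  return levelDb(samples) >= thresholdDb;
}
```

Under these assumptions, a user would set the threshold slightly above the level the meter shows while the room is quiet, so background noise stays below the gate and speech opens it.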
3.2 Error Reporting
- Inter-Iframe Comms: The shim now reports status and failures to the parent LotusChat host via window.parent.postMessage.
- System Toasts: Added LotusDenoiseFeature in ClientNonUIFeatures.tsx. It listens for these events and triggers a non-intrusive system toast if the noise suppression falls back to raw mic, ensuring users know their microphone status.
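One possible shape for this message exchange is sketched below. The message type string, the payload fields, and both function names are assumptions made for illustration; the review does not specify the actual protocol used by the shim or by LotusDenoiseFeature.

```javascript
const DENOISE_MESSAGE_TYPE = "lotus-denoise-status"; // hypothetical identifier

// Shim side: build the payload that would be sent with
//   window.parent.postMessage(buildStatusMessage(false, "..."), hostOrigin)
function buildStatusMessage(ok, reason = "") {
  return { type: DENOISE_MESSAGE_TYPE, ok, reason };
}

// Host side: given a "message" event, return toast text for a fallback,
// or null when the event is unrelated, from another origin, or a success.
// It would be called from window.addEventListener("message", handler).
function toastForEvent(event, expectedOrigin) {
  if (event.origin !== expectedOrigin) return null;
  const data = event.data;
  if (!data || typeof data !== "object") return null;
  if (data.type !== DENOISE_MESSAGE_TYPE || data.ok) return null;
  const detail = data.reason ? ` (${data.reason})` : "";
  return `Noise suppression unavailable, using raw microphone${detail}.`;
}
```

Checking event.origin before reading the payload keeps unrelated frames from triggering toasts in the host.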
4. Technical Debt & Safety
- Settings Persistence: Added strongly-typed settings fields for callDenoiseModel, callDenoiseNativeNS, callDenoiseGate, and callDenoiseGateThreshold to settings.ts.
- Clean Teardown: Improved cleanup() logic in lotus-denoise.js to ensure the AudioContext and MediaStreamTracks are properly released, preventing potential memory leaks or microphone "hanging" after calls.
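As an illustration of how the four persisted fields might be validated when loaded, here is a minimal sketch. The field names come from this review; the default values, the model id strings, and the function name are assumptions, and the real settings.ts may differ.

```javascript
// Assumed defaults; only "RNNoise is the default model" is stated in this review.
const DEFAULT_DENOISE_SETTINGS = Object.freeze({
  callDenoiseModel: "rnnoise",
  callDenoiseNativeNS: true,
  callDenoiseGate: false,
  callDenoiseGateThreshold: -50,
});

// Return a complete, valid settings object from whatever was persisted.
// Unknown or retired model ids fall back to the default model, and the
// gate threshold is clamped to the -100..0 dB range the UI exposes.
function normalizeDenoiseSettings(stored, models = ["rnnoise", "speex"]) {
  const s = stored && typeof stored === "object" ? stored : {};
  const d = DEFAULT_DENOISE_SETTINGS;
  const bool = (value, fallback) => (typeof value === "boolean" ? value : fallback);
  const threshold = Number.isFinite(s.callDenoiseGateThreshold)
    ? Math.min(0, Math.max(-100, s.callDenoiseGateThreshold))
    : d.callDenoiseGateThreshold;
  return {
    callDenoiseModel: models.includes(s.callDenoiseModel) ? s.callDenoiseModel : d.callDenoiseModel,
    callDenoiseNativeNS: bool(s.callDenoiseNativeNS, d.callDenoiseNativeNS),
    callDenoiseGate: bool(s.callDenoiseGate, d.callDenoiseGate),
    callDenoiseGateThreshold: threshold,
  };
}
```

Passing the list of available models as a parameter means a stored value for a model that is no longer bundled resolves to the default rather than to a missing asset.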
Testing Instructions for Senior Engineer
- Calibration: Go to Settings, enable ML NS, toggle on Noise Gate, and click "Test Microphone". Confirm the meter reflects real-time audio.
- Validation: Test "Series Suppression ON" vs "OFF" with a fan running in the background to confirm native NS is effectively handling the stationary noise.
- Fallback Test: Introduce a malformed model request (via devtools console) to verify the System Toast notification functions.