Low-Latency Edge-AI Speech Enhancement for Hearing Accessibility and Real-Time Audio Interfaces
CLARITY was developed as part of the EMPOWER Student Design Challenge and Assistive Technologies Initiative.
| Name | Role | GitHub |
|---|---|---|
| Pushkar Chaturvedy | Project Lead, Systems Design, Edge-AI Integration | @QCodeR-Innovate |
| Munukutla Vamsi | Hardware & Embedded Systems | @ThinkCrafty |
| Shreerang Joshi | Research & Validation, DFN Fine Tuning and Optimisation | @ThePurpleCode |
| Dhruv Kumar | Product Design & Documentation | @dhruvnarayan311-dot |
Traditional Frequency Modulation (FM) assistive audio systems can significantly improve speech intelligibility for students with hearing impairments, but they often rely on proprietary protocols and expensive closed hardware. CLARITY is an open-source alternative built to make low-latency classroom audio enhancement more accessible.
The system combines a Seeedstudio ReSpeaker Lite front-end (with XMOS-based hardware DSP) and a Raspberry Pi 4 Model B running a native DeepFilterNet3 inference pipeline. A custom LADSPA plugin wraps ONNX Runtime so the model can run in a real-time Linux audio path with asynchronous worker-thread execution and ring-buffer-based audio transfer. The result is a practical prototype for assistive listening, speech separation, conferencing and other low-power edge audio applications.
In addition to the technical documentation provided within this repository, supplementary project materials are available below.
The complete project presentation includes:
- System architecture visualizations
- Engineering design rationale
- Experimental validation
- Spectrogram analysis
- Embedded audio comparisons and demonstrations
📊 Presentation: View Canva Presentation
The presentation contains audio samples and comparative demonstrations that cannot be embedded directly within GitHub documentation.
CLARITY was created to solve a real problem: the need for clear speech in settings where standard commercial audio hardware is not an option. Rather than stopping at a one-off demo, we set out to build a complete system that others can understand, reproduce, modify, and extend.
This repository is structured so that the README itself acts as:
- a research-paper style technical overview,
- a build-and-replicate guide,
- a user manual,
- and an onboarding document for contributors.
The CLARITY repository is organized into focused technical documents that cover different aspects of the system.
| Document | Description |
|---|---|
| Hardware Platform | Hardware architecture, component selection, and engineering rationale |
| Product Design | User-centric design philosophy, beamforming strategy, wearability, and deployment model |
| Future Directions | Research extensions, optimization opportunities, and future applications |
| Item | Details |
|---|---|
| Project name | CLARITY |
| Core use case | Classroom hearing accessibility |
| Secondary use cases | Video conferencing, voice enhancement, low-latency audio tools |
| Front-end hardware | Seeedstudio ReSpeaker Lite |
| Compute node | Raspberry Pi 4 Model B |
| Core model | DeepFilterNet3 |
| Runtime format | ONNX |
| Audio plugin layer | LADSPA |
| Audio subsystem | ALSA |
| Design goal | Low-cost, low-latency, open-source speech enhancement |
- Two-stage audio pipeline split between hardware pre-processing and edge compute.
- Real-time Linux audio integration using LADSPA and ALSA.
- Asynchronous inference thread to keep the audio callback responsive.
- Edge-friendly speech enhancement using DeepFilterNet3.
- Modular architecture that can be adapted for assistive devices, conferencing, and wearable audio systems.
- Open repository layout intended for long-term maintenance and reproducibility.
CLARITY was developed as a proof-of-concept edge-AI speech enhancement platform for hearing accessibility. While the current implementation demonstrates the feasibility of low-cost, low-latency speech enhancement, several opportunities exist for extending the research and expanding the product ecosystem.
Potential future directions include:
- Domain-specific model retraining for classroom environments
- Quantization and model compression for wearable deployment
- Extended beamforming through larger microphone arrays
- Real-time speaker tracking and localization
- Bluetooth Low Energy and wireless hearing-aid integration
- Multi-speaker separation and conversational awareness
- Edge-AI communication devices and conferencing systems
- Assistive technologies beyond hearing accessibility
For a detailed roadmap and research discussion, see docs/FUTURE_DIRECTIONS.md.
Note: These figures are expected to live inside the assets/ folder, with file names exactly as listed in the figure table.
CLARITY is intentionally split into two layers:
- Hardware pre-processing layer: the ReSpeaker Lite handles front-end capture, beamforming, and noise suppression.
- Edge-AI enhancement layer: the Raspberry Pi 4 runs the DeepFilterNet3 model through an ONNX-based LADSPA plugin.
This division keeps the real-time audio path deterministic while allowing the neural inference workload to run asynchronously.
Acoustic environment
│
▼
Seeedstudio ReSpeaker Lite
(hardware DSP / mic array)
│
▼
Raspberry Pi 4
(LADSPA audio callback + ONNX worker thread)
│
▼
Enhanced speech output
ReSpeaker Lite
│
▼
Hardware DSP (AEC + AGC + Beamforming)
│
▼
Raspberry Pi 4
│
▼
LADSPA Plugin
│
▼
ONNX Runtime
│
▼
DeepFilterNet3
│
▼
Enhanced Speech
| Layer | Component | Role |
|---|---|---|
| Input hardware | Seeedstudio ReSpeaker Lite | Microphone array, beamforming, hardware noise suppression |
| Edge compute | Raspberry Pi 4 Model B | Runs the low-latency inference pipeline |
| ML model | DeepFilterNet3 | Speech enhancement / deep filtering |
| Runtime | ONNX Runtime | Executes the model efficiently on-device |
| Audio plugin | LADSPA | Integrates processing into Linux audio flow |
| Audio backend | ALSA | Provides low-level real-time audio access |
| Build tooling | C / Rust / Python toolchain | Native plugin and runtime support |
CLARITY leverages a two-stage edge-computing architecture consisting of a dedicated audio front-end DSP and an embedded AI inference engine.
| Component | Purpose |
|---|---|
| Raspberry Pi 4 Model B | Edge AI inference and runtime orchestration |
| ReSpeaker Lite (XMOS XU316) | Beamforming, AEC, AGC, and noise suppression |
| DeepFilterNet3 | Neural speech enhancement |
| ONNX Runtime | Efficient model inference |
| LADSPA | Low-latency Linux audio integration |
CLARITY/
│
├── assets/
│ ├── design_pipeline.png
│ ├── dfn_architecture.png
│ ├── onnx_ladspa_plugin.png
│ ├── raspberry-pi-4.png
│ ├── respeaker-lite.jpg
│ └── spectrogram_analysis.png
│
├── docs/
│ ├── FUTURE_DIRECTIONS.md
│ ├── HARDWARE.md
│ ├── PRODUCT_DESIGN.md
│ └── Product_Design_Documentation_Susmission.pdf
│
├── hardware/
│ └── (CAD models, enclosure designs, assembly resources)
│
├── software/
│ └── (source code, deployment scripts, model integration files)
│
├── .gitignore
├── LICENSE
├── README.md
└── THIRD_PARTY_LICENSES.md
| Directory / File | Purpose |
|---|---|
| assets/ | Project figures, architecture diagrams, hardware images, and validation results |
| docs/ | Technical documentation, product design rationale, hardware notes, and supporting reports |
| hardware/ | CAD models, enclosure designs, PCB resources, and future hardware artifacts |
| software/ | Source code, deployment scripts, inference pipeline components, and runtime integrations |
| README.md | Primary project documentation and replication guide |
| LICENSE | MIT License |
| THIRD_PARTY_LICENSES.md | Licensing information for external dependencies and referenced projects |
The ReSpeaker Lite captures audio through a dual-MEMS microphone array. Its onboard XMOS processor handles early-stage processing such as:
- beamforming,
- acoustic echo cancellation,
- automatic gain control,
- and noise suppression.
This reduces the burden on the Raspberry Pi and helps preserve speech structure before neural enhancement begins.
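The beamforming algorithm inside the XMOS firmware is not documented in this repository. To illustrate the general idea only, here is a textbook two-microphone delay-and-sum beamformer in Rust. The function name `delay_and_sum` and the integer steering delay are assumptions for illustration; the actual DSP very likely uses fractional delays and adaptive processing.

```rust
/// Textbook two-microphone delay-and-sum beamformer (illustration only).
/// `delay` is an assumed integer steering delay in samples: a wavefront from the
/// steered direction reaches `mic_a` that many samples before `mic_b`.
pub fn delay_and_sum(mic_a: &[f32], mic_b: &[f32], delay: usize) -> Vec<f32> {
    let n = mic_a.len().min(mic_b.len());
    (0..n)
        .map(|i| {
            // Delay mic_a so both channels line up; treat samples before t = 0 as silence.
            let a = if i >= delay { mic_a[i - delay] } else { 0.0 };
            // Average the aligned channels: coherent speech adds, uncorrelated noise partly cancels.
            0.5 * (a + mic_b[i])
        })
        .collect()
}
```

Sound arriving from the steered direction adds coherently, while noise that is uncorrelated between the two microphones is partially averaged out. For a far-field source, the steering delay in samples is roughly `spacing * sin(angle) / speed_of_sound * sample_rate`.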
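The edge compute node receives the cleaner signal and applies DeepFilterNet3. The model operates in the frequency domain on a compact, perceptually motivated representation. It first shapes the spectral envelope with gains on ERB-scaled bands, then uses deep filtering to refine the periodic (harmonic) structure of speech. Working on a small set of bands rather than on every frequency bin keeps the compute cost low enough for embedded deployment.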
The overall flow is:
Noisy input
→ STFT
→ spectral features / ERB envelope
→ deep filtering / periodicity refinement
→ ISTFT
→ cleaned speech
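To make the two refinement stages above more concrete, here is a minimal Rust sketch of their arithmetic. It follows the deep-filtering formulation described in the DeepFilterNet papers as we understand it. First, real-valued gains are applied per ERB-like band. Second, a short complex filter is applied across neighbouring frames for each low-frequency bin, Y(k, f) = Σ C(i, f) · X(k - i + l, f) summed over the filter taps i, where l is the lookahead. The names `C32`, `apply_band_gains`, and `deep_filter_bin` are hypothetical. The gains and coefficients are taken as inputs here (in DeepFilterNet3 the network predicts them), and this is not the upstream implementation.

```rust
use std::ops::{Add, Mul};

/// Minimal complex number so the sketch needs no external crates.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct C32 {
    pub re: f32,
    pub im: f32,
}

impl Add for C32 {
    type Output = C32;
    fn add(self, o: C32) -> C32 {
        C32 { re: self.re + o.re, im: self.im + o.im }
    }
}

impl Mul for C32 {
    type Output = C32;
    fn mul(self, o: C32) -> C32 {
        C32 { re: self.re * o.re - self.im * o.im, im: self.re * o.im + self.im * o.re }
    }
}

impl Mul<f32> for C32 {
    type Output = C32;
    fn mul(self, g: f32) -> C32 {
        C32 { re: self.re * g, im: self.im * g }
    }
}

/// Stage 1: scale every bin of band b by the real gain `gains[b]`.
/// Band b covers bins `band_edges[b]..band_edges[b + 1]` (ERB-like bands, assumed).
pub fn apply_band_gains(spec: &mut [C32], band_edges: &[usize], gains: &[f32]) {
    for (b, &g) in gains.iter().enumerate() {
        for v in &mut spec[band_edges[b]..band_edges[b + 1]] {
            *v = *v * g;
        }
    }
}

/// Stage 2: deep filtering of one bin, Y(k, f) = sum_i C(i, f) * X(k - i + l, f).
/// `history` holds the N frames X(k + l - N + 1, f) ..= X(k + l, f), oldest first,
/// and `coefs[i]` is the complex tap applied to X(k - i + l, f).
pub fn deep_filter_bin(history: &[C32], coefs: &[C32]) -> C32 {
    assert_eq!(history.len(), coefs.len(), "one coefficient per frame");
    let n = coefs.len();
    (0..n).fold(C32 { re: 0.0, im: 0.0 }, |acc, i| acc + coefs[i] * history[n - 1 - i])
}
```

Because each deep-filter output combines several frames with complex weights, it can correct phase as well as magnitude. A single real gain per band cannot do that, which is why the second stage targets harmonic speech structure.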
The LADSPA plugin is designed around a split-thread architecture:
- Audio callback thread: strict real-time audio handling
- Worker thread: ONNX inference and frame processing
A ring-buffer arrangement decouples real-time input/output from model compute so the audio callback does not block.
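The split-thread design can be sketched in Rust with only the standard library. This is a minimal illustration under assumptions, not the plugin's actual code. `SpscRing` is a hypothetical lock-free single-producer/single-consumer ring buffer that stores f32 samples as raw bits in `AtomicU32` slots, so no `unsafe` is needed. `enhance` is a placeholder standing in for ONNX Runtime inference on DeepFilterNet3.

```rust
use std::sync::atomic::{AtomicU32, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Lock-free single-producer/single-consumer ring buffer of f32 samples.
pub struct SpscRing {
    slots: Vec<AtomicU32>,
    head: AtomicUsize, // next slot to write (written only by the producer)
    tail: AtomicUsize, // next slot to read (written only by the consumer)
}

impl SpscRing {
    pub fn new(capacity: usize) -> Self {
        // One extra slot stays empty so "full" and "empty" are distinguishable.
        SpscRing {
            slots: (0..capacity + 1).map(|_| AtomicU32::new(0)).collect(),
            head: AtomicUsize::new(0),
            tail: AtomicUsize::new(0),
        }
    }

    /// Never blocks: returns false if the ring is full (an overrun).
    pub fn push(&self, x: f32) -> bool {
        let head = self.head.load(Ordering::Relaxed);
        let next = (head + 1) % self.slots.len();
        if next == self.tail.load(Ordering::Acquire) {
            return false;
        }
        self.slots[head].store(x.to_bits(), Ordering::Relaxed);
        self.head.store(next, Ordering::Release); // publish the sample to the consumer
        true
    }

    /// Never blocks: returns None if the ring is empty (an underrun).
    pub fn pop(&self) -> Option<f32> {
        let tail = self.tail.load(Ordering::Relaxed);
        if tail == self.head.load(Ordering::Acquire) {
            return None;
        }
        let x = f32::from_bits(self.slots[tail].load(Ordering::Relaxed));
        self.tail.store((tail + 1) % self.slots.len(), Ordering::Release); // free the slot
        Some(x)
    }
}

/// Hypothetical stand-in for DeepFilterNet3 inference via ONNX Runtime.
fn enhance(x: f32) -> f32 {
    0.5 * x
}

/// Offline demo: the calling thread plays the audio callback, a worker thread plays the model.
pub fn run_pipeline(samples: &[f32]) -> Vec<f32> {
    let input = Arc::new(SpscRing::new(64));
    let output = Arc::new(SpscRing::new(64));
    let (w_in, w_out) = (Arc::clone(&input), Arc::clone(&output));
    let total = samples.len();

    let worker = thread::spawn(move || {
        let mut processed = 0;
        while processed < total {
            match w_in.pop() {
                Some(x) => {
                    let y = enhance(x);
                    // The worker (not the callback) is allowed to wait for space.
                    while !w_out.push(y) {
                        thread::yield_now();
                    }
                    processed += 1;
                }
                None => thread::yield_now(),
            }
        }
    });

    let mut out = Vec::with_capacity(total);
    let mut next = 0;
    while out.len() < total {
        // "Callback" side: one non-blocking push per pass. A real callback would
        // drop the sample on overrun; this demo simply retries on the next pass.
        if next < total && input.push(samples[next]) {
            next += 1;
        }
        while let Some(y) = output.pop() {
            out.push(y);
        }
        thread::yield_now();
    }
    worker.join().expect("worker thread panicked");
    out
}
```

Because `push` and `pop` never wait, the callback side does only a bounded amount of work per period. The design choice is to drop or pad (overrun/underrun) rather than stall the stream. A real plugin would move whole hops of samples per call rather than one sample at a time.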
| Metric | Target / Observation |
|---|---|
| End-to-end latency | Sub-200 ms target |
| Model window latency | ~40 ms (algorithmic latency floor set by the model) |
| Runtime mode | Real-time streaming |
| Edge deployment | Raspberry Pi class hardware |
| Design priority | Preserve intelligibility without muting ambient context |
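The two latency rows can be related with simple arithmetic. The Rust sketch below is a hypothetical latency budget. The parameter values used with it are assumptions to check against the exported model's configuration and the actual ALSA setup, not measured values: 48 kHz sampling, a 960-sample window, a 480-sample hop, and two frames of lookahead (DeepFilterNet3 defaults as we understand them), plus ALSA buffers of two 480-frame periods per direction.

```rust
/// Convert a frame (sample) count to milliseconds.
pub fn frames_to_ms(frames: usize, sample_rate: usize) -> f64 {
    frames as f64 * 1000.0 / sample_rate as f64
}

/// Algorithmic latency of an STFT-domain model: one analysis window plus `lookahead` hops.
pub fn model_latency_ms(fft_size: usize, hop: usize, lookahead: usize, sample_rate: usize) -> f64 {
    frames_to_ms(fft_size + lookahead * hop, sample_rate)
}

/// Rough end-to-end budget: model latency, ALSA capture and playback buffers
/// (`periods` periods of `period` frames, counted once per direction),
/// and `queued_hops` hops waiting in the ring buffers for the worker thread.
pub fn end_to_end_ms(
    model_ms: f64,
    period: usize,
    periods: usize,
    queued_hops: usize,
    hop: usize,
    sample_rate: usize,
) -> f64 {
    let alsa_ms = 2.0 * frames_to_ms(period * periods, sample_rate);
    let queue_ms = frames_to_ms(queued_hops * hop, sample_rate);
    model_ms + alsa_ms + queue_ms
}
```

Under those assumptions, `model_latency_ms(960, 480, 2, 48_000)` evaluates to 40 ms, consistent with the ~40 ms model figure. `end_to_end_ms(40.0, 480, 2, 2, 480, 48_000)` evaluates to 100 ms, leaving headroom under the sub-200 ms target. Worker inference time is not modeled; if the worker falls behind the hop rate, the queue term keeps growing until the ring buffer overruns.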
| Component | Specification | Approx. cost (INR) |
|---|---|---|
| Raspberry Pi 4 Model B | Quad-core ARM Cortex-A72, 4 GB RAM | 5,800 |
| Seeedstudio ReSpeaker Lite | XMOS XU316, 2-mic array | 2,560 |
| MicroSD card | 16 GB Class 10 | 500 |
| Wired earphones | Low-impedance 3.5 mm monitoring | 600 |
| Power supply | 5V 3A USB-C adapter | 500 |
| Total | | ~9,960 |
Figures: Raspberry Pi 4 Model B; ReSpeaker Lite (XMOS XU316)
For detailed hardware specifications and engineering rationale, see docs/HARDWARE.md.
CLARITY builds upon the excellent DeepFilterNet framework developed by the original authors.
Original Repository:
🔗 https://github.com/rikorose/deepfilternet
DeepFilterNet provides the low-complexity, full-band speech enhancement architecture that serves as the core neural processing engine of this project.
Please consider starring and citing the original repository if this project proves useful.
Before building, make sure you have:
- Raspberry Pi OS Lite (64-bit) or another lightweight Linux environment
- ALSA access
- build-essential / cmake / clang toolchain
- Rust toolchain
- Python with package installation access
- a working ReSpeaker Lite connection
- a DeepFilterNet3 model export compatible with ONNX Runtime
Use a clean Raspberry Pi OS Lite installation to avoid unnecessary desktop overhead.
A minimal ALSA configuration can route the default audio device directly to hardware:
pcm.!default {
type hw
card 1
}
ctl.!default {
type hw
card 1
}
Adjust card 1 to match your connected ReSpeaker Lite device; arecord -l lists the available capture cards and their numbers.
sudo apt-get update && sudo apt-get install -y \
build-essential \
cmake \
ladspa-sdk \
libasound2-dev \
llvm-dev \
clang
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source "$HOME/.cargo/env"
pip install maturin poetry
git clone https://github.com/rikorose/deepfilternet.git
cd deepfilternet
cd ladspa
cargo build --release
sudo cp target/release/libasound_module_ladspa_df.so /usr/lib/ladspa/
export DEEPFILTERNET_MODEL="models/DeepFilterNet3"
deep-filter --compensate-delay --pf audio_input_stream.wav -o output/
Install Raspberry Pi OS Lite (64-bit) to the MicroSD card.
Attach the ReSpeaker Lite and verify that the device is visible to ALSA.
Ensure that no unnecessary higher-level audio services are interfering with the stream.
Compile the LADSPA plugin and copy the resulting shared library into the LADSPA path.
Use controlled recordings first, then live classroom audio, then long-duration stress tests.
The clearest visual evidence of the system's effect is a spectrogram comparison between the raw and the enhanced audio. In a successful run:
- broadband background noise should be reduced,
- speech harmonics should stay visible,
- formant structure should remain intact,
- the output should sound clearer without becoming unnaturally gated.
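A rough quantitative complement to the visual check can be sketched in Rust. This is an illustration under assumptions, not the project's validation tooling. `spectrogram_db` computes a Hann-windowed magnitude spectrogram with a naive DFT, and `median_db` takes the median time-frequency cell level as a crude proxy for the background floor. Both names are hypothetical, and any window or hop sizes passed to them are placeholders.

```rust
use std::f32::consts::PI;

/// Magnitude spectrogram in dB (Hann window, naive O(N^2) DFT).
/// Adequate for short validation clips; a real tool would use an FFT library.
/// Returns one Vec of `win / 2 + 1` bin levels per frame.
pub fn spectrogram_db(x: &[f32], win: usize, hop: usize) -> Vec<Vec<f32>> {
    assert!(win > 0 && hop > 0, "window and hop must be positive");
    let hann: Vec<f32> = (0..win)
        .map(|n| 0.5 - 0.5 * (2.0 * PI * n as f32 / win as f32).cos())
        .collect();
    let mut frames: Vec<Vec<f32>> = Vec::new();
    let mut start = 0;
    while start + win <= x.len() {
        let frame: Vec<f32> = (0..win).map(|n| x[start + n] * hann[n]).collect();
        let bins: Vec<f32> = (0..=win / 2)
            .map(|k| {
                let (mut re, mut im) = (0.0f32, 0.0f32);
                for (n, &v) in frame.iter().enumerate() {
                    // Reduce k*n modulo win to keep the angle small and accurate in f32.
                    let ang = -2.0 * PI * ((k * n) % win) as f32 / win as f32;
                    re += v * ang.cos();
                    im += v * ang.sin();
                }
                // Small floor avoids log10(0) on silent frames.
                10.0 * (re * re + im * im + 1e-12).log10()
            })
            .collect();
        frames.push(bins);
        start += hop;
    }
    frames
}

/// Median level over all time-frequency cells: a crude noise-floor proxy.
pub fn median_db(spec: &[Vec<f32>]) -> f32 {
    let mut cells: Vec<f32> = spec.iter().flatten().copied().collect();
    assert!(!cells.is_empty(), "need at least one frame");
    cells.sort_by(|a, b| a.total_cmp(b));
    cells[cells.len() / 2]
}
```

Comparing `median_db` for the raw and enhanced versions of the same clip gives one number for how far the background floor dropped. The levels are relative, not calibrated. The number says nothing about whether harmonics and formants survived, so it complements the checklist above rather than replacing it.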
LADSPA makes it possible to integrate signal processing into Linux audio pipelines with a small and focused plugin layer.
ONNX gives a portable way to run the model without tying the project to one training framework.
DeepFilterNet3 provides a strong quality-to-efficiency balance for embedded speech enhancement.
Real-time audio systems cannot afford callback stalls. Separating the compute thread from the callback thread keeps the stream stable.
- Performance depends on the exact Raspberry Pi model and system load.
- Latency can vary if ALSA is not configured cleanly.
- Model quality may change across different acoustic environments.
- Wearable deployment will require stronger power optimization than the current prototype.
| Next step | Goal |
|---|---|
| Quantization-aware training | Reduce model size and improve throughput |
| Structured pruning | Lower compute cost on edge devices |
| Domain adaptation | Improve performance in Indian classroom acoustics |
| Buffer tuning | Reduce jitter and improve determinism |
| Power optimization | Move toward battery-friendly deployment |
| Wearable form factor | Explore ultra-low-power alternatives to Raspberry Pi-class compute |
Although the project began as an assistive classroom system, the same architecture can support:
- meeting-room voice enhancement,
- conferencing microphones,
- edge audio assistants,
- wearable speech clarification,
- and other low-latency speech separation tools.
Contributions are welcome. Useful areas include:
- audio pipeline improvements,
- real-time performance tuning,
- model compression,
- hardware casing and integration,
- documentation,
- and benchmark reporting.
- Fork the repository.
- Create a feature branch.
- Make your changes.
- Test with real audio where possible.
- Submit a pull request with a clear explanation of the change.
This project builds on ideas and tooling from the broader open-source audio and machine learning ecosystem, including:
- DeepFilterNet
- LADSPA
- ONNX Runtime
- ALSA
- Raspberry Pi
- Seeedstudio ReSpeaker Lite
Speech enhancement framework:
https://github.com/rikorose/deepfilternet
Embedded computing platform:
https://www.raspberrypi.com/products/raspberry-pi-4-model-b/
ReSpeaker Lite microphone array:
https://www.seeedstudio.com/ReSpeaker-Lite-p-5928.html
Cross-platform machine learning inference engine:
- Pushkar Chaturvedy
- Munukutla Surya Vamsi Kalyan
- Shreerang Joshi
- Dhruv Kumar
For questions, suggestions, or collaboration opportunities, please reach out through GitHub Issues or contact the contributors listed above.
Contributions, bug reports, feature requests, and documentation improvements are welcome.
Please open an issue before submitting significant architectural changes to ensure alignment with the project's roadmap.
This project is released under the MIT License.
The MIT License permits unrestricted use, modification, distribution, and commercial adoption of this work, provided that the original copyright notice and license text are included in all copies or substantial portions of the software.
For full license details, see the LICENSE file in the repository.
© 2026 Pushkar Chaturvedy and Contributors.
This README is written to serve as the public-facing technical entry point for the CLARITY repository. It is structured so that a new contributor can understand the idea, inspect the architecture, and reproduce the build with minimal context.
- hardware/assembly-notes.md
- software/build.md
- software/runtime.md
- documents/benchmark-report.md
- documents/architecture-notes.md
| Figure | File path |
|---|---|
| System concept / topology | assets/design_pipeline.png |
| LADSPA / ONNX threading pipeline | assets/onnx_ladspa_plugin.png |
| DeepFilterNet3 architecture | assets/dfn_architecture.png |
| Spectrogram validation | assets/spectrogram_analysis.png |
CLARITY is a research and educational project and is not certified as a medical device.
The system is intended for experimentation, prototyping, and academic research purposes only.






