CLARITY

Low-Latency Edge-AI Speech Enhancement System for Hearing Accessibility

Low-Latency Edge-AI Speech Enhancement for Hearing Accessibility and Real-Time Audio Interfaces

Team

CLARITY was developed as part of the EMPOWER Student Design Challenge and Assistive Technologies Initiative.

Name	Role	GitHub
Pushkar Chaturvedy	Project Lead, Systems Design, Edge-AI Integration	@QCodeR-Innovate
Munukutla Vamsi	Hardware & Embedded Systems	@ThinkCrafty
Shreerang Joshi	Research & Validation, DFN Fine Tuning and Optimisation	@ThePurpleCode
Dhruv Kumar	Product Design & Documentation	@dhruvnarayan311-dot

Abstract

Traditional Frequency Modulation (FM) assistive audio systems can significantly improve speech intelligibility for students with hearing impairments, but they often rely on proprietary protocols and expensive closed hardware. CLARITY is an open-source alternative built to make low-latency classroom audio enhancement more accessible.

The system combines a Seeedstudio ReSpeaker Lite front-end (with XMOS-based hardware DSP) and a Raspberry Pi 4 Model B running a native DeepFilterNet3 inference pipeline. A custom LADSPA plugin wraps ONNX Runtime so the model can run in a real-time Linux audio path with asynchronous worker-thread execution and ring-buffer-based audio transfer. The result is a practical prototype for assistive listening, speech separation, conferencing and other low-power edge audio applications.

Project Resources

In addition to the technical documentation provided within this repository, supplementary project materials are available below.

Presentation & Demonstration

The complete project presentation includes:

System architecture visualizations
Engineering design rationale
Experimental validation
Spectrogram analysis
Embedded audio comparisons and demonstrations

📊 Presentation: View Canva Presentation

The presentation contains audio samples and comparative demonstrations that cannot be embedded directly within GitHub documentation.

Why this project exists

CLARITY was created to solve a real problem: the need for clear speech in places where standard commercial audio hardware isn't an option. Rather than just building a proof-of-concept, we designed a complete system that anyone can easily grasp, reproduce, modify, and expand from scratch.

This repository is structured so that the README itself acts as:

a research-paper style technical overview,
a build-and-replicate guide,
a user manual,
and an onboarding document for contributors.

Documentation

The CLARITY repository is organized into focused technical documents that cover different aspects of the system.

Document	Description
Hardware Platform	Hardware architecture, component selection, and engineering rationale
Product Design	User-centric design philosophy, beamforming strategy, wearability, and deployment model
Future Directions	Research extensions, optimization opportunities, and future applications

At a glance

Item	Details
Project name	CLARITY
Core use case	Classroom hearing accessibility
Secondary use cases	Video conferencing, voice enhancement, low-latency audio tools
Front-end hardware	Seeedstudio ReSpeaker Lite
Compute node	Raspberry Pi 4 Model B
Core model	DeepFilterNet3
Runtime format	ONNX
Audio plugin layer	LADSPA
Audio subsystem	ALSA
Design goal	Low-cost, low-latency, open-source speech enhancement

Key features

Two-stage audio pipeline split between hardware pre-processing and edge compute.
Real-time Linux audio integration using LADSPA and ALSA.
Asynchronous inference thread to keep the audio callback responsive.
Edge-friendly speech enhancement using DeepFilterNet3.
Modular architecture that can be adapted for assistive devices, conferencing, and wearable audio systems.
Open repository layout intended for long-term maintenance and reproducibility.

Future Directions

CLARITY was developed as a proof-of-concept edge-AI speech enhancement platform for hearing accessibility. While the current implementation demonstrates the feasibility of low-cost, low-latency speech enhancement, several opportunities exist for extending the research and expanding the product ecosystem.

Potential future directions include:

Domain-specific model retraining for classroom environments
Quantization and model compression for wearable deployment
Extended beamforming through larger microphone arrays
Real-time speaker tracking and localization
Bluetooth Low Energy and wireless hearing-aid integration
Multi-speaker separation and conversational awareness
Edge-AI communication devices and conferencing systems
Assistive technologies beyond hearing accessibility

For a detailed roadmap and research discussion, see:

📖 Future Directions

Visual overview

1) System concept and end-to-end topology

2) LADSPA / ONNX pipeline and worker-thread routing

3) DeepFilterNet3 architecture under the hood

4) Validation spectrograms: raw input vs deep-filtered output

Note: These figures are expected to live inside the assets/ folder exactly as named above.

System architecture

CLARITY is intentionally split into two layers:

Hardware pre-processing layer
The ReSpeaker Lite handles front-end capture, beamforming, and noise suppression.
Edge-AI enhancement layer
The Raspberry Pi 4 runs the DeepFilterNet3 model through an ONNX-based LADSPA plugin.

This division keeps the real-time audio path deterministic while allowing the neural inference workload to run asynchronously.

Architecture summary

Acoustic environment
        │
        ▼
Seeedstudio ReSpeaker Lite
(hardware DSP / mic array)
        │
        ▼
Raspberry Pi 4
(LADSPA audio callback + ONNX worker thread)
        │
        ▼
Enhanced speech output

ReSpeaker Lite │ ▼ Hardware DSP (AEC + AGC + Beamforming) │ ▼ Raspberry Pi 4 │ ▼ LADSPA Plugin │ ▼ ONNX Runtime │ ▼ DeepFilterNet3 │ ▼ Enhanced Speech

Hardware and software stack

Layer	Component	Role
Input hardware	Seeedstudio ReSpeaker Lite	Microphone array, beamforming, hardware noise suppression
Edge compute	Raspberry Pi 4 Model B	Runs the low-latency inference pipeline
ML model	DeepFilterNet3	Speech enhancement / deep filtering
Runtime	ONNX Runtime	Executes the model efficiently on-device
Audio plugin	LADSPA	Integrates processing into Linux audio flow
Audio backend	ALSA	Provides low-level real-time audio access
Build tooling	C / Rust / Python toolchain	Native plugin and runtime support

Hardware Platform

CLARITY leverages a two-stage edge-computing architecture consisting of a dedicated audio front-end DSP and an embedded AI inference engine.

Component	Purpose
Raspberry Pi 4 Model B	Edge AI inference and runtime orchestration
ReSpeaker Lite (XMOS XU316)	Beamforming, AEC, AGC, and noise suppression
DeepFilterNet3	Neural speech enhancement
ONNX Runtime	Efficient model inference
LADSPA	Low-latency Linux audio integration

Repository structure

Repository Structure

CLARITY/
│
├── assets/
│   ├── design_pipeline.png
│   ├── dfn_architecture.png
│   ├── onnx_ladspa_plugin.png
│   ├── raspberry-pi-4.png
│   ├── respeaker-lite.jpg
│   └── spectrogram_analysis.png
│
├── docs/
│   ├── FUTURE_DIRECTIONS.md
│   ├── HARDWARE.md
│   ├── PRODUCT_DESIGN.md
│   └── Product_Design_Documentation_Susmission.pdf
│
├── hardware/
│   └── (CAD models, enclosure designs, assembly resources)
│
├── software/
│   └── (source code, deployment scripts, model integration files)
│
├── .gitignore
├── LICENSE
├── README.md
└── THIRD_PARTY_LICENSES.md

Directory Overview

Directory / File	Purpose
`assets/`	Project figures, architecture diagrams, hardware images, and validation results
`docs/`	Technical documentation, product design rationale, hardware notes, and supporting reports
`hardware/`	CAD models, enclosure designs, PCB resources, and future hardware artifacts
`software/`	Source code, deployment scripts, inference pipeline components, and runtime integrations
`README.md`	Primary project documentation and replication guide
`LICENSE`	MIT License
`THIRD_PARTY_LICENSES.md`	Licensing information for external dependencies and referenced projects

How it works

Stage 1: hardware DSP pre-processing

The ReSpeaker Lite captures audio through a dual-MEMS microphone array. Its onboard XMOS processor handles early-stage processing such as:

beamforming,
acoustic echo cancellation,
automatic gain control,
and noise suppression.

This reduces the burden on the Raspberry Pi and helps preserve speech structure before neural enhancement begins.

Stage 2: DeepFilterNet3 enhancement

The edge compute node receives the cleaner signal and applies DeepFilterNet3. The model operates in the frequency domain using a perceptually meaningful representation, which makes it efficient enough for embedded deployment.

The overall flow is:

Noisy input
  → STFT
  → spectral features / ERB envelope
  → deep filtering / periodicity refinement
  → ISTFT
  → cleaned speech

Real-time threading model

The LADSPA plugin is designed around a split-thread architecture:

Audio callback thread: strict real-time audio handling
Worker thread: ONNX inference and frame processing

A ring-buffer arrangement decouples real-time input/output from model compute so the audio callback does not block.

Performance goals

Metric	Target / Observation
End-to-end latency	Sub-200 ms target
Model window latency	~40 ms model boundary
Runtime mode	Real-time streaming
Edge deployment	Raspberry Pi class hardware
Design priority	Preserve intelligibility without muting ambient context

Bill of materials

Component	Specification	Approx. cost (INR)
Raspberry Pi 4 Model B	Quad-core ARM Cortex-A72, 4 GB RAM	5,800
Seeedstudio ReSpeaker Lite	XMOS XU316, 2-mic array	2,560
MicroSD card	16 GB Class 10	500
Wired earphones	Low-impedance 3.5 mm monitoring	600
Power supply	5V 3A USB-C adapter	500
Total		~9,960

Hardware Overview

Raspberry Pi 4 Model B

ReSpeaker Lite (XMOS XU316)

For detailed hardware specifications and engineering rationale, see:

📖 Hardware Documentation

DeepFilterNet

CLARITY builds upon the excellent DeepFilterNet framework developed by the original authors.

Original Repository:

🔗 https://github.com/rikorose/deepfilternet

DeepFilterNet provides the low-complexity full-band speech enhancement architecture that serves as the core neural speech separation engine used in this project.

Please consider starring and citing the original repository if this project proves useful.

Prerequisites

Before building, make sure you have:

Raspberry Pi OS Lite (64-bit) or another lightweight Linux environment
ALSA access
build-essential / cmake / clang toolchain
Rust toolchain
Python with package installation access
a working ReSpeaker Lite connection
a DeepFilterNet3 model export compatible with ONNX Runtime

Quick start

1) Prepare the system

Use a clean Raspberry Pi OS Lite installation to avoid unnecessary desktop overhead.

2) Configure ALSA

A minimal ALSA configuration can route the default audio device directly to hardware:

pcm.!default {
    type hw
    card 1
}

ctl.!default {
    type hw
    card 1
}

Adjust card 1 to match your connected ReSpeaker Lite device.

3) Install dependencies

sudo apt-get update && sudo apt-get install -y \
    build-essential \
    cmake \
    ladspa-sdk \
    libasound2-dev \
    llvm-dev \
    clang

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source "$HOME/.cargo/env"

pip install maturin poetry

4) Build the LADSPA module

git clone https://github.com/rikorose/deepfilternet.git
cd deepfilternet
cd ladspa
cargo build --release
sudo cp target/release/libasound_module_ladspa_df.so /usr/lib/ladspa/

5) Run inference

export DEEPFILTERNET_MODEL="models/DeepFilterNet3"
deep-filter --compensate-delay --pf audio_input_stream.wav -o output/

Full replication guide

Step 1: flash the operating system

Install Raspberry Pi OS Lite (64-bit) to the MicroSD card.

Step 2: connect the input hardware

Attach the ReSpeaker Lite and verify that the device is visible to ALSA.

Step 3: confirm low-latency audio routing

Ensure that no unnecessary higher-level audio services are interfering with the stream.

Step 4: build the native module

Compile the LADSPA plugin and copy the resulting shared library into the LADSPA path.

Step 5: validate with known audio samples

Use controlled recordings first, then live classroom audio, then long-duration stress tests.

Validation and testing

The best visual proof of the system is the spectrogram comparison between raw audio and enhanced audio.

What to look for

broadband background noise should be reduced,
speech harmonics should stay visible,
formant structure should remain intact,
the output should sound clearer without becoming unnaturally gated.

Spectrogram reference

Design notes

Why LADSPA

LADSPA makes it possible to integrate signal processing into Linux audio pipelines with a small and focused plugin layer.

Why ONNX

ONNX gives a portable way to run the model without tying the project to one training framework.

Why DeepFilterNet3

DeepFilterNet3 provides a strong quality-to-efficiency balance for embedded speech enhancement.

Why asynchronous inference

Real-time audio systems cannot afford callback stalls. Separating the compute thread from the callback thread keeps the stream stable.

Known limitations

Performance depends on the exact Raspberry Pi model and system load.
Latency can vary if ALSA is not configured cleanly.
Model quality may change across different acoustic environments.
Wearable deployment will require stronger power optimization than the current prototype.

Roadmap

Next step	Goal
Quantization-aware training	Reduce model size and improve throughput
Structured pruning	Lower compute cost on edge devices
Domain adaptation	Improve performance in Indian classroom acoustics
Buffer tuning	Reduce jitter and improve determinism
Power optimization	Move toward battery-friendly deployment
Wearable form factor	Explore ultra-low-power alternatives to Raspberry Pi-class compute

Applications beyond classrooms

Although the project began as an assistive classroom system, the same architecture can support:

meeting-room voice enhancement,
conferencing microphones,
edge audio assistants,
wearable speech clarification,
and other low-latency speech separation tools.

Contributing

Contributions are welcome. Useful areas include:

audio pipeline improvements,
real-time performance tuning,
model compression,
hardware casing and integration,
documentation,
and benchmark reporting.

Suggested contribution workflow

Fork the repository.
Create a feature branch.
Make your changes.
Test with real audio where possible.
Submit a pull request with a clear explanation of the change.

Acknowledgements

This project builds on ideas and tooling from the broader open-source audio and machine learning ecosystem, including:

DeepFilterNet
LADSPA
ONNX Runtime
ALSA
Raspberry Pi
Seeedstudio ReSpeaker Lite

DeepFilterNet

Speech enhancement framework:

https://github.com/rikorose/deepfilternet

Raspberry Pi Foundation

Embedded computing platform:

https://www.raspberrypi.com/products/raspberry-pi-4-model-b/

Seeed Studio

ReSpeaker Lite microphone array:

https://www.seeedstudio.com/ReSpeaker-Lite-p-5928.html

ONNX Runtime

Cross-platform machine learning inference engine:

https://onnxruntime.ai/

Authors

Pushkar Chaturvedy
Munukutla Surya Vamsi Kalyan
Shreerang Joshi
Dhruv Kumar

For questions, suggestions, or collaboration opportunities, please reach out through GitHub Issues or contact the contributors listed above.

Contributing

Contributions, bug reports, feature requests, and documentation improvements are welcome.

Please open an issue before submitting significant architectural changes to ensure alignment with the project's roadmap.

License

This project is released under the MIT License.

The MIT License permits unrestricted use, modification, distribution, and commercial adoption of this work, provided that the original copyright notice and license text are included in all copies or substantial portions of the software.

For full license details, see the LICENSE file in the repository.

Contact / project notes

This README is written to serve as the public-facing technical entry point for the CLARITY repository. It is structured so that a new contributor can understand the idea, inspect the architecture, and reproduce the build with minimal context.

Suggested next files

hardware/assembly-notes.md
software/build.md
software/runtime.md
documents/benchmark-report.md
documents/architecture-notes.md

Figure placeholders recap

Figure	File path
System concept / topology	`assets/design_pipeline.png`
LADSPA / ONNX threading pipeline	`assets/onnx_ladspa_plugin.png`
DeepFilterNet3 architecture	`assets/dfn_architecture.png`
Spectrogram validation	`assets/spectrogram_analysis.png`

Disclaimer

CLARITY is a research and educational project and is not certified as a medical device.

The system is intended for experimentation, prototyping, and academic research purposes only.

Name		Name	Last commit message	Last commit date
Latest commit History 25 Commits
assets		assets
docs		docs
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md

Folders and files

Latest commit

History

Repository files navigation

CLARITY

Low-Latency Edge-AI Speech Enhancement System for Hearing Accessibility

Team

Abstract

Project Resources

Presentation & Demonstration

Why this project exists

Documentation

At a glance

Key features

Future Directions

Visual overview

1) System concept and end-to-end topology

2) LADSPA / ONNX pipeline and worker-thread routing

3) DeepFilterNet3 architecture under the hood

4) Validation spectrograms: raw input vs deep-filtered output

System architecture

Architecture summary

Hardware and software stack

Hardware Platform

Repository structure

Repository Structure

Directory Overview

How it works

Stage 1: hardware DSP pre-processing

Stage 2: DeepFilterNet3 enhancement

Real-time threading model

Performance goals

Bill of materials

Hardware Overview

DeepFilterNet

Prerequisites

Quick start

1) Prepare the system

2) Configure ALSA

3) Install dependencies

4) Build the LADSPA module

5) Run inference

Full replication guide

Step 1: flash the operating system

Step 2: connect the input hardware

Step 3: confirm low-latency audio routing

Step 4: build the native module

Step 5: validate with known audio samples

Validation and testing

What to look for

Spectrogram reference

Design notes

Why LADSPA

Why ONNX

Why DeepFilterNet3

Why asynchronous inference

Known limitations

Roadmap

Applications beyond classrooms

Contributing

Suggested contribution workflow

Acknowledgements

DeepFilterNet

Raspberry Pi Foundation

Seeed Studio

ONNX Runtime

Authors

Contributing

License

Contact / project notes

Suggested next files

Figure placeholders recap

Disclaimer

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Packages