AffectiSense is an intelligent, clinically oriented, and modality-resilient mental health assessment platform. By fusing neurophysiological (EEG), vocal biomarker (audio), and behavioral (facial expressions/video) signals, AffectiSense provides objective, highly interpretable depression screening with calibrated confidence metrics.
Unlike traditional multimodal frameworks that fail when certain inputs are missing, AffectiSense is designed to degrade gracefully, producing meaningful severity and diagnostic predictions whether given a single modality (e.g., audio only) or the full EAV triplet.
- Modality Resilience by Design: Uses an Attention Bottleneck Fusion Core trained with Modality Dropout ($p = 0.3$). The system dynamically supports any subset of the available inputs (Audio, Video, EEG), which keeps it usable in messy, real-world healthcare environments.
- Interpretability & Trust (XAI): Moves away from opaque black boxes. Multimodal attention weights are projected back to specific time-frequency ranges, vocal biomarkers, or facial action units (AUs), and mapped to DSM-5 symptom clusters.
- Calibrated Confidence: Implements uncertainty estimation through Monte Carlo Dropout and input quality-assessment checks. Clinicians are told explicitly when to trust the system's output and when to review it.
- Privacy-by-Design (HIPAA Aware): Visual streams are processed locally via landmark detection. Raw video frames are discarded immediately and only lightweight, de-identified spatial coordinates are saved, to protect patient privacy.
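The modality dropout idea above can be illustrated with a minimal sketch. It assumes each modality is dropped independently with probability 0.3 during training and that at least one modality is always kept; the function name, the dictionary layout, and the keep-at-least-one rule are illustrative assumptions, not the project's actual code.

```python
import random

MODALITIES = ("audio", "video", "eeg")


def apply_modality_dropout(embeddings, p=0.3, rng=random):
    """Randomly hide available modalities during training (hypothetical helper).

    embeddings: dict mapping modality name -> embedding, or None if missing.
    Returns (masked_embeddings, availability) where availability is a 0/1
    list in MODALITIES order that a fusion core could embed as a flag.
    """
    present = [m for m in MODALITIES if embeddings.get(m) is not None]
    # Drop each present modality independently with probability p.
    kept = [m for m in present if rng.random() >= p]
    # Assumption: never drop everything, keep one present modality at random.
    if present and not kept:
        kept = [rng.choice(present)]
    masked = {m: (embeddings[m] if m in kept else None) for m in MODALITIES}
    availability = [1 if m in kept else 0 for m in MODALITIES]
    return masked, availability
```

At inference time the same availability flags would simply reflect which sensors were actually recorded, with no random dropping.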
                    [ AVAILABLE SENSORS ]
                   /          |          \
          (optional)          |          (optional)
             EEG            Audio           Video
              |               |               |
          [EEGNet]        [HuBERT]       [MediaPipe]
              |               |               |
          GAT Graph    Prosody/Spectral   Landmarks
          Features        Embeddings     & AU Dynamics
               \              |              /
                \             |             /
        ┌──────────────────────────────────────────┐
        │     Modality Availability Embeddings     │
        ├──────────────────────────────────────────┤
        │     Stochastic Modality Dropout Core     │
        ├──────────────────────────────────────────┤
        │     Cross-Attention Bottleneck Fusion    │
        └────────────────────┬─────────────────────┘
                             |
           ┌─────────────────┴─────────────────┐
           │                                   │
  [ Clinical Predictor ]              [ Confidence Head ]
   ├── Binary Classification           ├── Epistemic Uncertainty
   ├── PHQ-8 Severity Score            └── Input Quality Calibrator
   └── Attention Map Projection
           │
           ▼
  [ DSM-5 Grounded Narrative ]
   └── RAG Agent Clinical Co-Pilot
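As a rough illustration of the epistemic uncertainty branch of the Confidence Head, the sketch below applies Monte Carlo Dropout to a toy logistic head: dropout stays active at inference, the forward pass is repeated, and the spread of the predictions is taken as the uncertainty. The toy model, function names, dropout rate, and number of passes are all assumptions made for illustration.

```python
import math
import random
import statistics


def stochastic_forward(features, weights, bias, drop_p, rng):
    """One forward pass of a toy logistic head with dropout left ON."""
    keep = 1.0 - drop_p
    total = bias
    for x, w in zip(features, weights):
        if rng.random() < keep:
            total += (x / keep) * w  # inverted-dropout scaling
    return 1.0 / (1.0 + math.exp(-total))


def mc_dropout_predict(features, weights, bias, drop_p=0.2, passes=50, seed=0):
    """Return (mean probability, standard deviation) over stochastic passes."""
    rng = random.Random(seed)
    samples = [
        stochastic_forward(features, weights, bias, drop_p, rng)
        for _ in range(passes)
    ]
    return statistics.fmean(samples), statistics.pstdev(samples)
```

A large standard deviation relative to a chosen threshold could then be used to flag a prediction for clinician review; that threshold would need to be calibrated on held-out data.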
- Audio (Vocal Biomarkers): Custom projection head over frozen self-supervised foundation embeddings (HuBERT-Base, layers 6 to 9) to capture speech rate, tone, rhythm, and acoustic biomarkers of depression.
- Video (Facial Expression Dynamics): Temporal Vision Transformer over locally extracted MediaPipe FaceMesh coordinates (468 landmarks + 52 FACS blendshapes) to trace fine-grained facial action unit (AU) dynamics, gaze patterns, and blink behaviors.
- EEG (Neurophysiology - Phase 2): Hybrid EEGNet + Graph Attention Network (GAT) mapping functional connectivity across 128-channel or wearable 3-electrode systems.
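To make the video representation concrete, here is a small sketch of a per-frame feature record that keeps only landmark coordinates and blendshape scores, never pixels. It assumes each of the 468 landmarks carries (x, y, z) coordinates; the class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

NUM_LANDMARKS = 468    # landmark count stated in the list above
NUM_BLENDSHAPES = 52   # blendshape count stated in the list above


@dataclass(frozen=True)
class FrameFeatures:
    """De-identified features for one video frame (no raw image data)."""

    landmarks: List[Tuple[float, float, float]]  # assumed (x, y, z) per point
    blendshapes: List[float]                     # one score per blendshape

    def to_vector(self) -> List[float]:
        """Flatten into a single vector of 468 * 3 + 52 = 1456 values."""
        if len(self.landmarks) != NUM_LANDMARKS:
            raise ValueError("unexpected landmark count")
        if len(self.blendshapes) != NUM_BLENDSHAPES:
            raise ValueError("unexpected blendshape count")
        flat = [coord for point in self.landmarks for coord in point]
        return flat + list(self.blendshapes)
```

A sequence of such vectors, one per frame, is the kind of input a temporal transformer would consume.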
Instead of directly concatenating high-dimensional embeddings, the modality embeddings are compressed through a small set of latent bottleneck tokens. This forces the network to distill complementary signals across modalities and limits the influence of noisy or corrupted inputs.
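The bottleneck idea can be sketched in a few lines of plain Python. In this simplified version, a fixed set of latent tokens acts as queries in single-head cross-attention over whatever modality tokens are available, with no learned projections; the function names and the identity projections are assumptions for illustration, and a real model would use learned weights in a tensor library.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def bottleneck_fuse(latents, modality_tokens):
    """Cross-attend latent bottleneck tokens over available modality tokens.

    latents: list of B query vectors of dimension d.
    modality_tokens: dict of modality name -> list of d-dim token vectors,
        or None / empty list when that modality is missing.
    Returns B fused vectors, so the output size does not depend on how many
    modalities or tokens were supplied.
    """
    tokens = [t for toks in modality_tokens.values() if toks for t in toks]
    if not tokens:
        raise ValueError("at least one modality must be available")
    d = len(latents[0])
    fused = []
    for query in latents:
        weights = softmax([dot(query, key) / math.sqrt(d) for key in tokens])
        fused.append(
            [sum(w * v[i] for w, v in zip(weights, tokens)) for i in range(d)]
        )
    return fused
```

Because missing modalities simply contribute no tokens, the same call works for audio-only input and for the full triplet.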
AffectiSense is developed and validated using leading clinical and behavioral datasets:
- DAIC-WOZ / AVEC: Multimodal clinical interview corpus containing raw audio, video features, and transcriptions mapped to PHQ-8 scores.
- MODMA: Multi-modal Open Dataset for Mental-disorder Analysis containing high-density 128-channel EEG alongside audio signals.
- D-Vlog: Large-scale spontaneous video logs labeled for depression classification.
- CMU-MOSEI: Multimodal sentiment and emotion-intensity corpus of in-the-wild monologue videos, used for emotional and behavioral speech expressions in natural environments.
.
├── backend/
│   ├── app/
│   │   ├── api/            # API routing and gateways
│   │   ├── core/           # Configs, secrets, and system state
│   │   ├── models/         # PyTorch multimodal layers & fusion core
│   │   ├── pipelines/      # Modality feature processing & orchestration
│   │   ├── schemas/        # Request/Response Pydantic models
│   │   └── utils/          # Signal processing helpers
│   ├── requirements.txt    # CPU-optimized ML & web stack
│   └── Dockerfile
├── frontend/               # Next.js 14 Premium Clinical UI (Next Steps)
├── configs/                # Hyperparameters, pipelines & thresholds
├── data/
│   ├── raw/                # Protected raw local files
│   └── models/             # Locally cached model checkpoints
└── README.md
- Restructured the repository into a clean monorepo architecture.
- Defined Pydantic validation, pipeline schemas, and config files.
- Implement full local CPU-optimized feature-extraction pipelines (Librosa + MediaPipe).
- Design attention bottleneck fusion and modality dropout model.
- Deploy FastAPI engine with streaming-ready upload handlers.
- Create Next.js 14 clinical dashboard with diagnostic gauges and attention heatmap overlays.
- Integrate frozen pre-trained foundation models (HuBERT/Whisper) for audio features.
- Implement sequence-to-sequence temporal video transformer.
- Evaluate all 7 modality combinations (A, V, E, AV, AE, VE, AVE) on standard open datasets.
- Add EEGNet graph-network support for neurophysiological inputs.
- Integrate RAG-grounded LLM clinical co-pilot for automated diagnostic draft reports.
- Complete end-to-end HIPAA security hardening (audit logging, RBAC, KMS encryption).
- Containerize production deployments with Docker and serve them via Kubernetes and Triton.
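The seven modality combinations named in the evaluation item above are the non-empty subsets of {A, V, E}. A minimal sketch for enumerating them in an evaluation loop follows; the function name is illustrative.

```python
from itertools import combinations

MODALITY_CODES = ("A", "V", "E")  # Audio, Video, EEG


def modality_subsets(codes=MODALITY_CODES):
    """Return every non-empty combination of modality codes, smallest first."""
    return [
        "".join(combo)
        for size in range(1, len(codes) + 1)
        for combo in combinations(codes, size)
    ]


print(modality_subsets())  # ['A', 'V', 'E', 'AV', 'AE', 'VE', 'AVE']
```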
- Python 3.10+
- FFmpeg (for audio/video encoding)
- Clone the repository and navigate to the directory:
    git clone https://github.com/adityamhaske/AffectiSense.git
    cd AffectiSense
- Set up a virtual environment and install backend dependencies:
    python -m venv .venv
    source .venv/bin/activate
    pip install -r backend/requirements.txt
- Run the local backend server:
    uvicorn backend.app.api.main:app --reload
This platform is intended strictly for educational, portfolio, and academic research purposes. It is not a diagnostic medical device. It should only be evaluated as an auxiliary clinical support tool.
Licensed under the MIT License.