A comprehensive, production-ready framework for multi-task deep learning in surgical video analysis, featuring instance segmentation, phase recognition, skill assessment, and video processing capabilities.
Cataract-LMM is an enterprise-grade AI framework designed for large-scale, multi-center surgical video analysis. Built on modern software engineering principles, this repository provides state-of-the-art deep learning models for comprehensive analysis of cataract surgery videos.
This framework implements methodologies from cutting-edge research in computer-assisted surgery, providing validated approaches for:
- Surgical Instance Segmentation using YOLO, Mask R-CNN, and SAM architectures
- Surgical Phase Recognition with Video Transformers, 3D CNNs, and temporal models
- Surgical Skill Assessment through multi-modal analysis and performance metrics
- Video Processing with GPU-accelerated pipelines for medical video data

Key highlights:
- Production-Ready: Enterprise-grade architecture with comprehensive testing and CI/CD
- Multi-Task Learning: Unified framework supporting four core surgical analysis tasks
- Scalable Design: Microservices-ready architecture with containerization support
- Medical Compliance: HIPAA-aware design patterns and secure data handling
- Research-to-Production: Seamless transition from research notebooks to production deployment
- Quick Start
- Features
- Architecture
- Installation
- Usage Examples
- Development
- Model Zoo
- Configuration
- Testing
- Documentation
- Contributing
- License
- Citation
- Author
- Support & Community
- Roadmap
- Python 3.8+
- CUDA 11.8+ (for GPU acceleration)
- FFmpeg (for video processing)
- Docker (optional, for containerized deployment)
# Clone the repository
git clone https://github.com/MJAHMADEE/Cataract_LMM.git
cd Cataract_LMM
# Install using Poetry (recommended)
cd codes
poetry install
# Activate virtual environment
poetry shell
# Or install using pip
pip install -r requirements.txt
# Validate installation
python setup.py --validate-only

# Video processing (run from codes/)
cd surgical-video-processing
python main.py --input path/to/video.mp4 --output ./results --config configs/default.yaml

# Instance segmentation (run from codes/)
cd surgical-instance-segmentation
python inference/predictor.py --model yolo --input data/images/

# Phase recognition (run from codes/)
cd surgical-phase-recognition
python validation/training_framework.py --config configs/default.yaml --mode train

# Skill assessment (run from codes/)
cd surgical-skill-assessment
python main.py --config configs/comprehensive.yaml --mode evaluate

| Component | Models | Key Features |
|---|---|---|
| Instance Segmentation | YOLO v8/11, Mask R-CNN, SAM | Real-time surgical instrument detection and segmentation |
| Phase Recognition | Video Transformers, 3D CNNs, TeCNO | 11-phase surgical workflow analysis |
| Skill Assessment | Multi-modal CNNs, Attention Models | Objective surgical skill evaluation |
| Video Processing | GPU-Accelerated Pipelines | Medical-grade video preprocessing and enhancement |
- Modular Architecture: Microservices-ready design with clear separation of concerns
- Security First: HIPAA-aware design patterns, secure credential management
- Comprehensive Testing: 85%+ test coverage with unit, integration, and E2E tests
- CI/CD Pipeline: Automated testing, security scanning, and deployment workflows
- Monitoring & Observability: Structured logging, metrics collection, and health checks
- Containerization: Multi-stage Docker builds with security hardening
- Rich Documentation: Comprehensive guides, API references, and examples
- Configuration Management: YAML-based configuration with validation
- Development Tools: Pre-commit hooks, linting, formatting, and type checking
- Dependency Management: Poetry-based modern Python packaging
- Development Environment: VS Code integration with debugging support
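
The "YAML-based configuration with validation" item above could be realised in many ways. The sketch below shows one of them, assuming the YAML file has already been parsed into a plain dictionary and borrowing key names from the video-processing example in the Configuration section. `ProcessingConfig` and `load_processing_config` are hypothetical names, not part of the framework's documented API, and the `[0, 1]` range for `quality_threshold` is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple


@dataclass(frozen=True)
class ProcessingConfig:
    """Hypothetical typed view of the `processing` section of a config file."""

    target_resolution: Tuple[int, int]
    fps: int
    quality_threshold: float

    def __post_init__(self) -> None:
        width, height = self.target_resolution
        if width <= 0 or height <= 0:
            raise ValueError(f"target_resolution must be positive, got {self.target_resolution}")
        if self.fps <= 0:
            raise ValueError(f"fps must be positive, got {self.fps}")
        # Assumption: quality scores are normalised to the range [0, 1].
        if not 0.0 <= self.quality_threshold <= 1.0:
            raise ValueError(f"quality_threshold must be in [0, 1], got {self.quality_threshold}")


def load_processing_config(raw: Dict[str, Any]) -> ProcessingConfig:
    """Validate an already-parsed config dict (assumed to come from a YAML file)."""
    try:
        section = raw["processing"]
        width, height = section["target_resolution"]
        return ProcessingConfig(
            target_resolution=(int(width), int(height)),
            fps=int(section["fps"]),
            quality_threshold=float(section["quality_threshold"]),
        )
    except KeyError as exc:
        raise ValueError(f"missing required config key: {exc}") from exc


config = load_processing_config(
    {"processing": {"target_resolution": [1920, 1080], "fps": 30, "quality_threshold": 0.75}}
)
print(config)
```

Failing fast at load time means a typo or an out-of-range value surfaces before any video is decoded, rather than partway through a long processing run.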
graph TB
A[Video Input] --> B[Video Processing Pipeline]
B --> C[Frame Extraction & Preprocessing]
C --> D[Multi-Task Analysis Engine]
D --> E[Instance Segmentation]
D --> F[Phase Recognition]
D --> G[Skill Assessment]
E --> H[Surgical Instruments]
F --> I[Surgery Phases]
G --> J[Skill Metrics]
H --> K[Clinical Decision Support]
I --> K
J --> K
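
The fan-out and merge shown in the diagram can be sketched in plain Python. This is only an illustration of the data flow: the three task functions are stubs standing in for the real models, frames are represented by integer indices, and every name here (`run_pipeline`, `AnalysisResult`, `preprocess`) is hypothetical rather than taken from the codebase.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Frame = int  # placeholder: a real pipeline would pass image arrays, not frame indices


@dataclass
class AnalysisResult:
    """Merged per-task outputs, standing in for the diagram's final node."""

    outputs: Dict[str, object]


def preprocess(video_path: str, num_frames: int = 4) -> List[Frame]:
    """Stub for 'Frame Extraction & Preprocessing': returns dummy frame indices."""
    return list(range(num_frames))


def run_pipeline(
    video_path: str, tasks: Dict[str, Callable[[List[Frame]], object]]
) -> AnalysisResult:
    """Fan the same preprocessed frames out to every task, then merge the results."""
    frames = preprocess(video_path)
    return AnalysisResult(outputs={name: task(frames) for name, task in tasks.items()})


# Hypothetical stand-ins for the three analysis branches in the diagram.
tasks = {
    "instance_segmentation": lambda frames: [f"instruments@{i}" for i in frames],
    "phase_recognition": lambda frames: ["unknown_phase"] * len(frames),
    "skill_assessment": lambda frames: {"frames_seen": len(frames)},
}

result = run_pipeline("path/to/video.mp4", tasks)
for name, output in result.outputs.items():
    print(name, "->", output)
```

Keeping the tasks behind a common callable signature is one way to let each branch be developed and tested on its own while sharing a single preprocessing pass.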
Cataract_LMM/
├── README.md  # Project overview and documentation
├── LICENSE  # License (CC BY-NC-ND 4.0)
├── CONTRIBUTING.md  # Contribution guidelines
├── .gitignore  # Git ignore patterns
├── codes/  # Main codebase
│   ├── surgical-video-processing/  # Video preprocessing and enhancement
│   │   ├── core/  # Core processing algorithms
│   │   ├── pipelines/  # Processing pipelines
│   │   ├── metadata/  # Video metadata management
│   │   ├── quality_control/  # Quality assurance tools
│   │   └── configs/  # Configuration files
│   ├── surgical-instance-segmentation/  # Instance segmentation models
│   │   ├── models/  # YOLO, Mask R-CNN, SAM implementations
│   │   ├── training/  # Training pipelines
│   │   ├── inference/  # Real-time inference engines
│   │   ├── evaluation/  # Model evaluation tools
│   │   └── data/  # Dataset utilities
│   ├── surgical-phase-recognition/  # Phase classification models
│   │   ├── models/  # Video Transformers, 3D CNNs, TeCNO
│   │   ├── validation/  # Training and validation frameworks
│   │   ├── preprocessing/  # Video preprocessing
│   │   ├── analysis/  # Result analysis tools
│   │   └── configs/  # Model configurations
│   ├── surgical-skill-assessment/  # Skill evaluation framework
│   │   ├── models/  # Skill assessment models
│   │   ├── engine/  # Training and inference engines
│   │   ├── utils/  # Analysis utilities
│   │   └── configs/  # Assessment configurations
│   ├── tests/  # Comprehensive test suite
│   ├── docs/  # Documentation source
│   ├── docker/  # Docker configurations
│   ├── reports/  # Analysis reports
│   ├── pyproject.toml  # Python project configuration
│   ├── Dockerfile  # Container definition
│   ├── Makefile  # Development automation
│   └── setup.py  # Project setup script
├── .github/  # GitHub configurations
│   └── workflows/  # CI/CD pipelines
└── security_scanning_demo.ipynb  # Security analysis notebook
| Component | Minimum | Recommended |
|---|---|---|
| Python | 3.8 | 3.11+ |
| RAM | 16GB | 32GB+ |
| GPU Memory | 8GB | 24GB+ |
| Storage | 50GB | 500GB+ |
| CUDA | 11.8 | 12.0+ |
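
Before installing, it can help to confirm the parts of these requirements that are checkable without third-party packages. The script below is a minimal sketch using only the standard library: it checks the Python version against the table's minimum and looks for the `ffmpeg` and `docker` executables named in the prerequisites. GPU memory and CUDA are deliberately left out, since checking them needs PyTorch or vendor tools. The function name `check_environment` is illustrative.

```python
import shutil
import sys

MIN_PYTHON = (3, 8)  # minimum Python version from the requirements table


def check_environment() -> dict:
    """Return a name -> bool report of checks that need no third-party packages."""
    return {
        "python": sys.version_info >= MIN_PYTHON,
        "ffmpeg": shutil.which("ffmpeg") is not None,
        "docker (optional)": shutil.which("docker") is not None,
    }


report = check_environment()
for name, ok in report.items():
    print(f"{name:<18} {'OK' if ok else 'missing'}")
```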
# Install Poetry
curl -sSL https://install.python-poetry.org | python3 -
# Clone and setup
git clone https://github.com/MJAHMADEE/Cataract_LMM.git
cd Cataract_LMM/codes
# Install dependencies
poetry install --extras "dev docs"
# Activate environment
poetry shell

# Create environment
conda create -n cataract-lmm python=3.11
conda activate cataract-lmm
# Clone and install
git clone https://github.com/MJAHMADEE/Cataract_LMM.git
cd Cataract_LMM/codes
pip install -r requirements.txt

# Build container
docker build -t cataract-lmm:latest .
# Run interactive container
docker run -it --gpus all -v $(pwd)/data:/app/data cataract-lmm:latest

# Run comprehensive validation
python setup.py --validate-only
# Run tests
pytest tests/ -v
# Check GPU availability
python -c "import torch; print(f'CUDA Available: {torch.cuda.is_available()}')"

from surgical_video_processing import VideoProcessor, QualityController

# Initialize processor with configuration
processor = VideoProcessor("configs/high_quality.yaml")

# Process surgical video
result = processor.process_video(
    input_path="data/surgery_video.mp4",
    output_dir="outputs/processed/",
    apply_deidentification=True,
    quality_threshold=0.8
)
print(f"Processed {result.frame_count} frames")
print(f"Quality score: {result.average_quality:.3f}")

from surgical_instance_segmentation import SegmentationPredictor

# Load pre-trained model
predictor = SegmentationPredictor(
    model_type="yolo_v8",
    device="cuda"
)

# Segment surgical instruments
results = predictor.predict_batch(
    image_paths=["frame001.jpg", "frame002.jpg"],
    confidence_threshold=0.7,
    save_visualizations=True
)

# Extract detections
for result in results:
    print(f"Detected {len(result.boxes)} instruments")
    print(f"Classes: {result.class_names}")

from surgical_phase_recognition import PhaseClassifier

# Initialize phase recognition model
classifier = PhaseClassifier(
    model_name="video_transformer",
    config_path="configs/phase_recognition.yaml"
)

# Classify surgical phases in video sequence
phases = classifier.classify_sequence(
    video_path="data/surgery_complete.mp4",
    sequence_length=16,
    overlap=0.5
)

# Display phase timeline
for phase in phases:
    print(f"Time: {phase.timestamp:.2f}s - Phase: {phase.name}")

from surgical_skill_assessment import SkillEvaluator

# Initialize skill assessment framework
evaluator = SkillEvaluator("configs/skill_assessment.yaml")

# Assess surgical performance
assessment = evaluator.evaluate_surgery(
    video_path="data/complete_surgery.mp4",
    phase_annotations="data/phases.json",
    surgeon_level="resident"  # resident, fellow, attending
)

# Generate skill report
report = evaluator.generate_report(assessment)
print(f"Overall Score: {report.overall_score}/100")
print(f"Efficiency: {report.efficiency_score}/10")
print(f"Precision: {report.precision_score}/10")

# Clone repository
git clone https://github.com/MJAHMADEE/Cataract_LMM.git
cd Cataract_LMM/codes
# Install development dependencies
poetry install --extras "dev"
# Setup pre-commit hooks
pre-commit install
# Run development server
make dev-server

# Format code
make format
# Run linting
make lint
# Type checking
make type-check
# Security scanning
make security-scan
# Run all quality checks
make quality

# Run unit tests
make test
# Run with coverage
make test-coverage
# Run integration tests
make test-integration
# Run end-to-end tests
make test-e2e
# Generate coverage report
make coverage-report

make help # Show all available commands
make install # Install dependencies
make clean # Clean build artifacts
make build # Build distribution packages
make docker-build # Build Docker image
make docker-run # Run Docker container
make docs-build # Build documentation
make docs-serve # Serve documentation locally

Instance segmentation:

| Model | mAP@0.5:0.95 |
|---|---|
| YOLOv11 ⭐ | 73.9% |
| YOLOv8 | 73.8% |
| SAM | 56.0% |
| SAM2 | 55.2% |
| Mask R-CNN | 53.7% |
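
The metric in the table header, mAP@0.5:0.95, conventionally (in the COCO sense) means average precision averaged over ten IoU thresholds from 0.50 to 0.95 in steps of 0.05. The sketch below builds that threshold list and shows at which thresholds a single prediction would count as a match. It uses axis-aligned boxes with made-up coordinates purely for illustration; whether the figures above were computed on boxes or masks is not stated here, and this is not a full AP computation.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)

# COCO-style IoU thresholds: 0.50, 0.55, ..., 0.95
IOU_THRESHOLDS: List[float] = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - intersection
    return intersection / union if union > 0 else 0.0


# Made-up example boxes, not taken from the dataset.
prediction: Box = (10.0, 10.0, 50.0, 50.0)
ground_truth: Box = (15.0, 10.0, 55.0, 50.0)

iou = box_iou(prediction, ground_truth)
matched_at = [t for t in IOU_THRESHOLDS if iou >= t]
print(f"IoU = {iou:.3f}; counts as a match at thresholds {matched_at}")
```

A prediction that overlaps loosely with the ground truth contributes only at the lower thresholds, which is why this averaged metric is stricter than mAP at a single IoU of 0.5.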

Phase recognition:

| Model | Backbone | Accuracy | F1-Score | Precision | Recall |
|---|---|---|---|---|---|
| MViT-B ⭐ | - | 85.7% | 77.1% | 77.1% | 78.5% |
| Swin-T | - | 85.5% | 76.2% | 77.5% | 77.2% |
| CNN + GRU | EfficientNet-B5 | 82.1% | 71.3% | 76.0% | 70.4% |
| CNN + TeCNO | EfficientNet-B5 | 81.7% | 71.2% | 75.1% | 71.2% |
| CNN + LSTM | EfficientNet-B5 | 81.5% | 70.0% | 76.4% | 69.4% |

Skill assessment:

| Model | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| TimeSformer ⭐ | 82.5% | 86.0% | 82.0% | 83.9% |
| R3D-18 | 81.7% | 82.4% | 84.9% | 83.6% |
| Slow R50 | 80.0% | 81.8% | 81.8% | 81.8% |
| X3D-M | 80.0% | 83.9% | 78.8% | 81.3% |
| R(2+1)D-18 | 72.9% | 79.3% | 76.7% | 78.0% |
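
F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R). As a quick sanity check, the sketch below applies that formula to the precision and recall columns of the table above and prints the result next to the reported F1. Because the inputs are already rounded to one decimal place, small differences of about 0.1 are expected. The identity need not hold for macro-averaged multi-class metrics, so this check is only meaningful where F1 was derived from the same precision and recall values.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %, reported F1 %) copied from the table above
rows = {
    "TimeSformer": (86.0, 82.0, 83.9),
    "R3D-18": (82.4, 84.9, 83.6),
    "Slow R50": (81.8, 81.8, 81.8),
    "X3D-M": (83.9, 78.8, 81.3),
    "R(2+1)D-18": (79.3, 76.7, 78.0),
}

for model, (precision, recall, reported) in rows.items():
    computed = f1_score(precision, recall)
    print(f"{model:<12} computed F1 = {computed:.2f}  (reported {reported})")
```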
The framework uses YAML-based configuration for all components:
processing:
  target_resolution: [1920, 1080]
  fps: 30
  quality_threshold: 0.75
deidentification:
  enabled: true
  blur_faces: true
  remove_text: true
output:
  format: "mp4"
  compression: "h264"
  quality: "high"

model:
  architecture: "yolov8"
  size: "medium"
  pretrained: true
training:
  epochs: 100
  batch_size: 16
  learning_rate: 0.001
data:
  classes: ["forceps", "scissors", "needle_holder", "suction"]
  augmentation:
    enabled: true
    rotation: 15
    scaling: [0.8, 1.2]

# Create .env file
cp .env.example .env
# Edit configuration
CUDA_VISIBLE_DEVICES=0,1
WANDB_PROJECT=cataract-lmm
DATA_ROOT=/path/to/data
OUTPUT_DIR=/path/to/outputs
LOG_LEVEL=INFO

tests/
├── unit/  # Unit tests for individual components
├── integration/  # Integration tests for module interactions
├── e2e/  # End-to-end workflow tests
├── performance/  # Performance and benchmarking tests
├── security/  # Security and vulnerability tests
├── fixtures/  # Test data and fixtures
└── conftest.py  # Pytest configuration
# Run all tests
pytest
# Run specific test category
pytest tests/unit/
pytest tests/integration/
pytest tests/e2e/
# Run with coverage
pytest --cov=. --cov-report=html
# Run performance tests
pytest tests/performance/ --benchmark-only
# Run with specific markers
pytest -m "gpu" --gpu-required
pytest -m "slow" --timeout=300

# pytest.ini
[pytest]
testpaths = tests
python_files = test_*.py
python_classes = Test*
python_functions = test_*
markers =
    unit: Unit tests
    integration: Integration tests
    e2e: End-to-end tests
    gpu: Tests requiring GPU
    slow: Slow running tests
    security: Security tests
addopts =
    --strict-markers
    --verbose
    --tb=short
    --cov-report=term-missing

- User Guide: Getting started, tutorials, and examples
- API Reference: Comprehensive API documentation
- Developer Guide: Contributing, architecture, and development setup
- Model Documentation: Model architectures, performance metrics, and usage
- Security Guide: Security considerations and best practices
# Install documentation dependencies
poetry install --extras "docs"
# Build documentation
cd docs
make html
# Serve documentation locally
make serve
# Build PDF documentation
make latexpdf

- Documentation Site: https://cataract-lmm.readthedocs.io
- API Reference: https://cataract-lmm.readthedocs.io/api/
- Tutorials: https://cataract-lmm.readthedocs.io/tutorials/
- Model Zoo: https://cataract-lmm.readthedocs.io/models/
We welcome contributions from the surgical AI community! Please see our CONTRIBUTING.md for detailed guidelines.
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit changes (git commit -m 'Add amazing feature')
- Push to branch (git push origin feature/amazing-feature)
- Open a Pull Request
# Setup development environment
make dev-setup
# Run pre-commit checks
pre-commit run --all-files
# Run tests before committing
make test-all
# Submit pull request
gh pr create --title "Feature: Add amazing feature"

- Python Style: Black formatter
- Import Sorting: isort
- Linting: Flake8 with medical AI conventions
- Type Checking: MyPy for type safety
- Documentation: Google style docstrings
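
As an illustration of these conventions, the function below combines type hints that MyPy can check with a Google-style docstring (`Args`, `Returns`, `Raises` sections). The function itself, `mean_confidence`, is a made-up example and does not come from the codebase.

```python
from typing import Sequence


def mean_confidence(scores: Sequence[float], threshold: float = 0.0) -> float:
    """Average the confidence scores that meet a threshold.

    Args:
        scores: Detection confidence scores, each expected in [0, 1].
        threshold: Scores below this value are ignored.

    Returns:
        The mean of the retained scores, or 0.0 if none are retained.

    Raises:
        ValueError: If any score falls outside [0, 1].
    """
    if any(not 0.0 <= s <= 1.0 for s in scores):
        raise ValueError("scores must lie in [0, 1]")
    kept = [s for s in scores if s >= threshold]
    return sum(kept) / len(kept) if kept else 0.0


print(mean_confidence([0.9, 0.4, 0.8], threshold=0.7))
```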
This repository is governed by specific licensing terms to ensure the proper use of both the software framework and the surgical dataset.
The Cataract-LMM dataset is open-access and released under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (CC BY-NC-ND 4.0). This license permits non-commercial use, sharing, and distribution with proper attribution, but prohibits commercial use and derivative works. See DATA_LICENSE.md for details.
The software framework, scripts, and codebase associated with this project are licensed under the CC BY-NC-ND 4.0 License. See the LICENSE file for details.
Update: Our manuscript is officially published!
The methodology, algorithmic baselines, and technical validation of this dataset have been published in Scientific Data (Nature Portfolio). The earlier preprint remains available on arXiv (arXiv:2510.16371), but we ask that any research or systems using this dataset cite the final peer-reviewed journal version.
Please use your preferred format from the options below to cite our work:
APA: Ahmadi, M. J., Gandomi, I., Abdi, P., Mohammadi, S.-F., Taslimi, A., Khodaparast, M., Hashemi, H., Tavakoli, M., & Taghirad, H. D. (2026). Cataract-LMM Large-Scale Multi-Source Multi-Task Benchmark for Deep Learning in Surgical Video Analysis. Scientific Data. https://doi.org/10.1038/s41597-026-07464-0

MLA: Ahmadi, Mohammad Javad, et al. "Cataract-LMM Large-Scale Multi-Source Multi-Task Benchmark for Deep Learning in Surgical Video Analysis." Scientific Data, 23 May 2026, https://doi.org/10.1038/s41597-026-07464-0.

Chicago: Ahmadi, Mohammad Javad, Iman Gandomi, Parisa Abdi, Seyed-Farzad Mohammadi, Amirhossein Taslimi, Mehdi Khodaparast, Hassan Hashemi, Mahdi Tavakoli, and Hamid D. Taghirad. "Cataract-LMM Large-Scale Multi-Source Multi-Task Benchmark for Deep Learning in Surgical Video Analysis." Scientific Data (2026). https://doi.org/10.1038/s41597-026-07464-0.

Harvard: Ahmadi, M.J., Gandomi, I., Abdi, P., Mohammadi, S.F., Taslimi, A., Khodaparast, M., Hashemi, H., Tavakoli, M. and Taghirad, H.D., 2026. Cataract-LMM Large-Scale Multi-Source Multi-Task Benchmark for Deep Learning in Surgical Video Analysis. Scientific Data. Available at: https://doi.org/10.1038/s41597-026-07464-0.

Vancouver: Ahmadi MJ, Gandomi I, Abdi P, Mohammadi SF, Taslimi A, Khodaparast M, Hashemi H, Tavakoli M, Taghirad HD. Cataract-LMM Large-Scale Multi-Source Multi-Task Benchmark for Deep Learning in Surgical Video Analysis. Scientific Data. 2026 May 23. doi: 10.1038/s41597-026-07464-0.

IEEE: M. J. Ahmadi et al., "Cataract-LMM Large-Scale Multi-Source Multi-Task Benchmark for Deep Learning in Surgical Video Analysis," Scientific Data, May 2026, doi: 10.1038/s41597-026-07464-0.
@article{Ahmadi2026CataractLMM,
  title={Cataract-LMM Large-Scale Multi-Source Multi-Task Benchmark for Deep Learning in Surgical Video Analysis},
  author={Ahmadi, Mohammad Javad and Gandomi, Iman and Abdi, Parisa and Mohammadi, Seyed-Farzad and Taslimi, Amirhossein and Khodaparast, Mehdi and Hashemi, Hassan and Tavakoli, Mahdi and Taghirad, Hamid D.},
  journal={Scientific Data},
  year={2026},
  month={May},
  doi={10.1038/s41597-026-07464-0},
  url={https://doi.org/10.1038/s41597-026-07464-0}
}

Mohammad Javad Ahmadi
I welcome collaborations, technical inquiries regarding the dataset, and discussions on advancing AI in medical applications. Feel free to connect with me through any of the channels below:
- Academic Email: mjahmadi@email.kntu.ac.ir
- Personal Email: mjahmadee@gmail.com
- Documentation: Refer to individual README files in each module
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Email: mjahmadee@gmail.com
- ✅ Multi-task surgical video analysis framework
- ✅ Instance segmentation with YOLO/Mask R-CNN/SAM
- ✅ Phase recognition with Video Transformers
- ✅ Skill assessment framework
- ✅ Production-ready CI/CD pipeline
- Real-time inference optimization
- Multi-GPU distributed training
- Model quantization and pruning
- REST API and web interface
- Advanced analytics dashboard
- Multi-modal learning (video + audio + sensor data)
- Federated learning across institutions
- Real-time surgical guidance system
- Integration with surgical robots
- Multi-language support