
rbr7/Auto-DataScientists


Auto-DataScientists

AutoDataScientists: Self-Organizing Agent Teams for Clinical, Healthcare, and Biomedical Data Science


Research use only. This system is for methods research and exploratory analysis. It is not a medical device, is not FDA/CE cleared, and must not be used for diagnosis, treatment decisions, or any clinical purpose without independent validation and the appropriate regulatory clearance. See Disclaimer.

AutoDataScientists is a decentralized team of AI agents that runs end-to-end data science on clinical, healthcare, and biomedical problems: ingesting messy real-world data (EHR, multi-omics, imaging, trials), proposing and critiquing analysis plans, building and validating models, and producing interpretable, reproducible write-ups.

Unlike a single autonomous agent that follows one analysis trajectory, AutoDataScientists agents self-organize into teams around competing hypotheses and modeling strategies and critique each other's analysis plans before spending compute. They share results, dead ends, and intermediate artifacts, so the system avoids redundant work and sustains parallel exploration as evidence accumulates over hours or days. The domain guardrails (privacy/de-identification, statistical rigor, subgroup fairness, and leakage detection) are first-class steps in the workflow, not afterthoughts.

This repository packages the system as Claude Code subagents coordinating through a local message-board / workspace server. The orchestrator is a pure coordinator: it launches agents and harvests their results, and it never analyzes data itself.


Why biomedical data science is different

Generic AutoML tends to fail in biomedicine for predictable, domain-specific reasons. AutoDataScientists is built to respect them:

  • Privacy & governance: PHI/PII handling, de-identification, IRB/ethics approval, and data-use agreements gate what leaves the machine. No identifiable data is sent to external APIs without a BAA / equivalent.
  • Statistical rigor over leaderboard chasing: proper study design, confounder handling, multiple-testing correction, calibration, and time-to-event (survival) framing rather than naive accuracy.
  • Leakage is everywhere: patient-level and site-level grouping in cross-validation, temporal splits for prospective questions, and explicit checks for target leakage from clinical workflows.
  • Batch & site effects: harmonization across assays, platforms, and hospitals before modeling.
  • Fairness & equity: performance reported across demographic and clinical subgroups, not just in aggregate.
  • Interpretability & biological plausibility: feature importance, pathway/enrichment context, and clinically meaningful explanations are required outputs, not optional.
  • Reproducibility: every run logs data versions, seeds, parameters, and decisions.
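
The patient-level grouping mentioned under leakage can be illustrated with a short sketch. This is not the repository's implementation: `grouped_kfold` and the sample `patient_ids` are hypothetical names chosen for illustration, and the greedy balancing rule is an assumption. In practice a library splitter such as scikit-learn's `GroupKFold` serves the same purpose.

```python
from collections import defaultdict


def grouped_kfold(patient_ids, n_folds=5):
    """Yield (train_idx, test_idx) pairs where no patient spans both sides.

    patient_ids[i] is the patient that row i belongs to. Whole patients are
    assigned to folds greedily, largest first, to keep fold sizes balanced.
    """
    rows_by_patient = defaultdict(list)
    for row, pid in enumerate(patient_ids):
        rows_by_patient[pid].append(row)
    if n_folds > len(rows_by_patient):
        raise ValueError("n_folds exceeds the number of distinct patients")

    folds = [[] for _ in range(n_folds)]
    # Largest patients first; each goes to the currently smallest fold.
    ordered = sorted(rows_by_patient, key=lambda p: (-len(rows_by_patient[p]), str(p)))
    for pid in ordered:
        smallest = min(range(n_folds), key=lambda f: len(folds[f]))
        folds[smallest].extend(rows_by_patient[pid])

    for k in range(n_folds):
        test_idx = sorted(folds[k])
        train_idx = sorted(r for j, fold in enumerate(folds) if j != k for r in fold)
        yield train_idx, test_idx


# Hypothetical example: 8 encounters from 4 patients.
patient_ids = ["p1", "p1", "p2", "p3", "p3", "p3", "p4", "p2"]
splits = list(grouped_kfold(patient_ids, n_folds=2))
for train_idx, test_idx in splits:
    train_patients = {patient_ids[i] for i in train_idx}
    test_patients = {patient_ids[i] for i in test_idx}
    # The same patient never appears on both sides of a split.
    assert train_patients.isdisjoint(test_patients)
```

Splitting by row instead of by patient would let repeat encounters from one person land in both train and test, which tends to inflate validation scores. The same idea extends to site-level grouping by passing site identifiers instead of patient identifiers.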

What it does

The agents cover the full data-science lifecycle for a given task:

  1. Ingest & profile: load data; map to standards (FHIR / OMOP / HGNC / SNOMED-CT / LOINC); profile schema, missingness, and distributions.
  2. De-identify & govern: verify de-identification, flag PHI, record consent/IRB constraints.
  3. Design: frame the question (predictive vs. causal vs. descriptive), define endpoints, run power/sample-size sanity checks, and fix the validation protocol.
  4. Prepare: QC, batch-effect correction/harmonization, feature selection, dimensionality reduction, embeddings.
  5. Model: propose candidate approaches, run sweeps, train with grouping-aware cross-validation.
  6. Validate: leakage audit, external/temporal validation, calibration, subgroup/fairness analysis.
  7. Interpret & report: feature attributions, biological/clinical context, limitations, and a reproducible report.
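
As one concrete instance of the subgroup analysis in the Validate step, the sketch below reports AUROC overall and per subgroup. It is a minimal illustration under stated assumptions, not the system's validator: the function names and the toy labels, scores, and subgroup tags are all hypothetical, and the pairwise AUROC is quadratic in sample size, so it only suits small examples.

```python
from collections import defaultdict


def auroc(labels, scores):
    """AUROC as the probability a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None  # undefined when one class is missing
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auroc_by_subgroup(labels, scores, subgroups):
    """Return {"overall": ..., <subgroup>: ...} so gaps are visible side by side."""
    buckets = defaultdict(lambda: ([], []))
    for y, s, g in zip(labels, scores, subgroups):
        buckets[g][0].append(y)
        buckets[g][1].append(s)
    report = {"overall": auroc(labels, scores)}
    for g, (ys, ss) in sorted(buckets.items()):
        report[g] = auroc(ys, ss)
    return report


# Hypothetical toy data: the model ranks subgroup A well and subgroup B poorly.
labels = [1, 0, 1, 0, 1, 0, 1, 0]
scores = [0.9, 0.2, 0.8, 0.3, 0.4, 0.6, 0.7, 0.5]
subgroups = ["A", "A", "A", "A", "B", "B", "B", "B"]
report = auroc_by_subgroup(labels, scores, subgroups)
print(report)  # {'overall': 0.875, 'A': 1.0, 'B': 0.5}
```

Here the aggregate figure of 0.875 hides the fact that ranking within subgroup B is no better than chance, which is the kind of gap an aggregate-only report would miss.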

Use cases

Clinical

  • Risk prediction (ICU mortality, 30-day readmission, sepsis early warning, length-of-stay)
  • EHR phenotyping and cohort definition
  • Clinical deterioration / time-to-event modeling

Personalized / precision medicine

  • Patient subtyping and stratification
  • Treatment-response prediction
  • Polygenic risk scoring and pharmacogenomics

Biomedical

  • Biomarker discovery and prioritization
  • Drug-response and target-identification analyses
  • Assay/screen data modeling

Computational biology

  • Single-cell cell-type annotation and differential expression
  • Variant effect prediction and interpretation
  • Multi-omics integration (genomics + transcriptomics + proteomics)

Supported data modalities

EHR / claims (FHIR, OMOP CDM) · genomics (WGS/WES, variants) · transcriptomics (bulk & single-cell) · proteomics / metabolomics · medical imaging (radiology, digital pathology) · clinical-trial data · wearable / sensor time series.

Each modality has its own loader and QC profile under task-<name>/. Start with the bundled examples and add your own.


How it works

A lightweight coordination layer hosts shared workspaces and a message board; agents post proposals, critiques, and results there. Roles include:

| Agent | Responsibility |
| --- | --- |
| Data Steward | Ingestion, standards mapping, de-identification & PHI checks, missingness, batch detection |
| Biostatistician | Study design, confounders, power, multiple testing, survival/causal framing |
| Feature / Representation | Feature selection, harmonization, embeddings, dimensionality reduction |
| Modeler(s) | Candidate models and sweeps; grouping-aware cross-validation (parallel teams) |
| Validator / Critic | Leakage audits, external/temporal validation, calibration, subgroup fairness |
| Translator | Feature attribution, pathway/clinical context, limitations, final report |
| Orchestrator | Pure coordinator: launches agents, harvests results, never analyzes data |

Agents self-organize around promising directions and must pass peer critique before consuming compute, so the system explores several strategies in parallel without duplicating effort.
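
The critique-before-compute rule can be pictured with a small in-memory sketch. Everything here is hypothetical: `Board`, `Proposal`, the agent names, and the approval rule (at least one approval and no open objection) are assumptions made for illustration, and they do not describe the actual coordination server's API.

```python
from dataclasses import dataclass, field


@dataclass
class Proposal:
    author: str
    plan: str
    critiques: list = field(default_factory=list)  # (critic, approved, note)

    def approved(self, min_approvals=1):
        """A plan may run only with enough approvals and no open objection."""
        votes = [ok for _, ok, _ in self.critiques]
        return votes.count(True) >= min_approvals and False not in votes


@dataclass
class Board:
    proposals: dict = field(default_factory=dict)

    def post(self, pid, author, plan):
        self.proposals[pid] = Proposal(author, plan)

    def critique(self, pid, critic, approved, note=""):
        proposal = self.proposals[pid]
        if critic == proposal.author:
            raise ValueError("agents cannot review their own proposal")
        proposal.critiques.append((critic, approved, note))

    def runnable(self):
        """Proposal ids that have passed peer critique."""
        return [pid for pid, p in self.proposals.items() if p.approved()]


board = Board()
board.post("m1", "modeler-a", "gradient boosting with patient-grouped CV")
board.post("m2", "modeler-b", "logistic regression using discharge codes")
board.critique("m1", "validator", True)
board.critique("m2", "validator", False, "discharge codes leak the outcome")
print(board.runnable())  # ['m1']
```

Only the proposal that survived review is eligible to consume compute; the rejected one stays on the board with its objection attached, so other agents can see the dead end rather than rediscover it.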


Setup

Prerequisites: Python 3.10+, Node.js 22+ (for npx), and the Claude Code CLI (claude).

# 1. Start the local coordination server (agents coordinate through this)
#    Replace with your chosen message-board/workspace server.
npx <coordination-server> start

# 2. Python dependencies
pip install -r requirements.txt

Data security: run on infrastructure approved for your data classification. Configure external model access so that no identifiable data leaves the environment without a BAA / DUA in place.


Quickstart

From the repo root, in a shell separate from the one running the coordination server:

claude -p "Read runbook.md and execute. Task: task-readmission-risk. Run name: readmit_v1."
claude -p "Read runbook.md and execute. Task: task-singlecell-annotation. Run name: scrna_v1."
claude -p "Read runbook.md and execute. Task: task-biomarker-discovery. Run name: biomarker_v1."

Each launch materializes a new sibling directory ../<run-name>/ with its own copy of the system, agents, workspace, and logs, so the template stays clean across runs. Hardware/data requirements vary per task; see each task-<name>/README.md.


Adding a new task

Drop a task-<name>/ directory at the repo root with:

  1. TASK.md: the spec. YAML frontmatter sets task_type (e.g. ehr-risk, omics-classification, survival, imaging, singlecell), name, endpoint, and validation (e.g. patient-grouped-cv, temporal-split). The body describes the data, cohort, constraints, and success criteria.
  2. LAUNCH.md: fills the workflow hooks the runbook references (launch_command, deident_policy, cv_strategy, fairness_subgroups, leakage_checks, promotion_criteria, exit_condition, …). Easiest path: copy the closest bundled task-*/LAUNCH.md and edit.

Optionally add a download_data.sh / loader to fetch the dataset. Then launch it as in the Quickstart, naming task-<name> as the task.
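
To make the TASK.md layout concrete, here is a small sketch that reads flat frontmatter and checks the four fields named above. It is an illustration only: `read_frontmatter` is a hypothetical helper, the example `name` and `endpoint` values are invented, and treating all four keys as mandatory is an assumption (the runbook may be more lenient). The parser handles only flat `key: value` lines, so real YAML features would need a proper YAML library.

```python
REQUIRED_KEYS = ("task_type", "name", "endpoint", "validation")  # assumed mandatory


def read_frontmatter(text):
    """Parse flat `key: value` frontmatter between leading '---' lines.

    Deliberately minimal: no nesting, lists, or quoting rules.
    Returns (metadata dict, body text).
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("TASK.md must start with a '---' frontmatter block")
    meta = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            body = "\n".join(lines[i + 1:]).strip()
            break
        if line.strip():
            key, sep, value = line.partition(":")
            if not sep:
                raise ValueError(f"not a key: value line: {line!r}")
            meta[key.strip()] = value.strip()
    else:
        raise ValueError("frontmatter block is never closed")
    missing = [k for k in REQUIRED_KEYS if k not in meta]
    if missing:
        raise ValueError(f"missing frontmatter keys: {missing}")
    return meta, body


# Hypothetical TASK.md; the name and endpoint values are placeholders.
EXAMPLE_TASK_MD = """\
---
task_type: ehr-risk
name: readmission-risk
endpoint: 30-day readmission
validation: patient-grouped-cv
---
Cohort, data sources, constraints, and success criteria go here.
"""

meta, body = read_frontmatter(EXAMPLE_TASK_MD)
print(meta["task_type"], meta["validation"])
```

A check like this, run before launch, would catch a missing endpoint or validation field early instead of partway through a long run.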


Results

| Benchmark | Metric | AutoDataScientists | Strongest baseline | Δ |
| --- | --- | --- | --- | --- |
| e.g. MIMIC readmission | AUROC | TBD | TBD | TBD |
| e.g. single-cell annotation | macro-F1 | TBD | TBD | TBD |
| e.g. ProteinGym subset | Spearman | TBD | TBD | TBD |

Report subgroup/fairness breakdowns and calibration alongside headline metrics.
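
For the calibration part of that reporting, two common summaries are the Brier score and the expected calibration error (ECE). The sketch below is one plausible way to compute them and is not taken from the repository: the function names, the equal-width binning, and the toy predictions are assumptions for illustration.

```python
def brier_score(labels, probs):
    """Mean squared gap between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)


def expected_calibration_error(labels, probs, n_bins=10):
    """Weighted mean |observed rate - mean prediction| over equal-width bins."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(labels, probs):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((y, p))
    total = len(labels)
    ece = 0.0
    for members in bins:
        if members:
            observed = sum(y for y, _ in members) / len(members)
            predicted = sum(p for _, p in members) / len(members)
            ece += len(members) / total * abs(observed - predicted)
    return ece


# Hypothetical predictions: correctly ranked but under-confident.
labels = [0, 0, 1, 1]
probs = [0.1, 0.4, 0.6, 0.9]
brier = brier_score(labels, probs)
ece = expected_calibration_error(labels, probs, n_bins=2)
print(round(brier, 3), round(ece, 3))
```

In this toy case the ranking is perfect (AUROC of 1.0), yet the probabilities are not well calibrated, which is why calibration is worth reporting next to a discrimination metric rather than instead of it.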


Data handling & compliance

  • Use only data you are authorized to use, under an approved IRB/ethics protocol and any applicable DUA.
  • De-identify per HIPAA Safe Harbor / Expert Determination or your local equivalent (e.g. GDPR) before analysis.
  • Keep an audit log of data access and agent decisions.
  • Do not transmit identifiable data to third-party services without a Business Associate Agreement (or equivalent).
  • This repository ships no patient data; example tasks reference public/synthetic datasets only.
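
One way to keep the audit log described above is as an append-only list in which each record carries a hash of its content and of the previous record. This is a sketch under stated assumptions rather than the system's logging code: `append_event`, `verify`, the field names, and the example file path are all hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone


def append_event(log, actor, action, detail):
    """Append a record whose hash covers its content and the previous hash."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "actor": actor,
        "action": action,
        "detail": detail,
        "prev": log[-1]["hash"] if log else "",
    }
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    record["hash"] = hashlib.sha256(payload).hexdigest()
    log.append(record)
    return record


def verify(log):
    """Return True if every record matches its hash and links to its predecessor."""
    prev = ""
    for record in log:
        body = {k: v for k, v in record.items() if k != "hash"}
        payload = json.dumps(body, sort_keys=True).encode("utf-8")
        digest = hashlib.sha256(payload).hexdigest()
        if record["prev"] != prev or digest != record["hash"]:
            return False
        prev = record["hash"]
    return True


audit_log = []
# Hypothetical events; the path is a placeholder, not a file in this repository.
append_event(audit_log, "data-steward", "read", "task-readmission-risk/cohort.csv")
append_event(audit_log, "validator", "decision", "rejected plan: target leakage")
print(verify(audit_log))  # True

audit_log[0]["detail"] = "edited after the fact"
print(verify(audit_log))  # False
```

Chaining the hashes means an edit to an earlier record is detectable later. It does not prevent truncation of the most recent records, so a real deployment would also need protected storage for the log.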

Citation


License


Disclaimer

This software is provided for research and educational purposes only. It is not a medical device and has not been reviewed or cleared by any regulatory authority. Outputs may be incorrect, biased, or incomplete and must be independently validated. Nothing produced by this system constitutes medical advice or a substitute for the judgment of qualified clinicians. The authors accept no liability for use of this software in clinical or operational settings.
