[Feature Request] Add a short "clinical assistant RAG & agent safety checklist" doc (docs only) #282

Hi Biomni team,

First, amazing work.  
Biomni is one of the few biomedical agent projects that feels both ambitious and grounded in real clinical / research workflows.

I’m the author of **WFGY** and its **16-mode RAG Failure ProblemMap**:

- WFGY main repo: https://github.com/onestardao/WFGY  
- 16-mode ProblemMap (RAG failure checklist): https://github.com/onestardao/WFGY/blob/main/ProblemMap/README.md  

The ProblemMap has been cited or integrated by:

- **Harvard MIMS Lab – ToolUniverse** (LLM tools benchmark; WFGY in robustness / RAG debugging section)  
- **Univ. of Innsbruck – Rankify** (RAG toolkit; uses WFGY for RAG / re-ranking troubleshooting)  
- **QCRI LLM Lab – Multimodal RAG Survey** (lists WFGY ProblemMap as a semantic failure-mode taxonomy)

### Why I think this matters for Biomni

From what I can see in the paper and repo, Biomni stitches together:

- literature / guideline retrieval,  
- logic for combining multiple sources,  
- safety / uncertainty handling.

In real deployments, the hardest failures to debug are not “hallucination” in the abstract but very specific modes, such as:

- No.1 hallucination & chunk drift (retriever pulls a related but wrong trial)  
- No.2 interpretation collapse (query is mis-parsed: phenotype vs diagnosis vs intervention)  
- No.5 semantic ≠ embedding (embedding space under-represents rare conditions or negation)  
- No.6 logic collapse & recovery (conflicting studies; agent gets stuck or flips answer)  
- No.11 symbolic collapse (protocol / formula reasoning breaks)

I have been using the ProblemMap as a **“semantic clinic” for RAG / agent pipelines**, especially in high-risk domains like medicine.

### Proposal: a small debugging section + optional notebook

Would you be open to:

1. Adding a short **“RAG failure checklist”** section in Biomni’s docs that:
   - lists a few dominant failure modes relevant to biomedical agents (e.g. No.1 / No.2 / No.5 / No.6 / No.11),  
   - links to the full ProblemMap so users can do deeper triage.

2. Potentially hosting a minimal Jupyter notebook under `examples/` that shows:
   - how to log Biomni failures,  
   - how to tag them with ProblemMap codes,  
   - how that feeds back into dataset curation or agent prompt / tool adjustments.

All of this would be **MIT-licensed, docs-first**, and I can keep the notebook self-contained so it doesn’t interfere with your core code.
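
As a rough illustration of item 2, here is a minimal sketch of what the notebook's logging and tagging step could look like. Everything in it is an assumption for discussion: `FailureRecord`, `tally`, `format_summary`, and the example records are hypothetical names and made-up data, and none of it calls Biomni's actual API. Only the code numbers and labels come from the list earlier in this issue.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Tuple

# ProblemMap codes mentioned in this issue (labels copied from the list above).
PROBLEM_MAP = {
    1: "hallucination & chunk drift",
    2: "interpretation collapse",
    5: "semantic ≠ embedding",
    6: "logic collapse & recovery",
    11: "symbolic collapse",
}


@dataclass
class FailureRecord:
    """One manually reviewed failure (hypothetical schema, not a Biomni type)."""
    query: str        # what the user asked the agent
    observed: str     # what went wrong, in the reviewer's words
    codes: List[int]  # ProblemMap codes assigned during review
    logged_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def __post_init__(self) -> None:
        # Reject codes that are not in the table above.
        unknown = [c for c in self.codes if c not in PROBLEM_MAP]
        if unknown:
            raise ValueError(f"unknown ProblemMap codes: {unknown}")


def tally(records: List[FailureRecord]) -> List[Tuple[int, int]]:
    """Count how many records carry each code, most frequent first."""
    counts = Counter(c for r in records for c in sorted(set(r.codes)))
    return counts.most_common()


def format_summary(records: List[FailureRecord]) -> List[str]:
    """Render the tally as lines suitable for an internal review note."""
    return [f"No.{code} {PROBLEM_MAP[code]}: {n}" for code, n in tally(records)]


# Made-up example records, for illustration only.
records = [
    FailureRecord("therapy options for a rare phenotype",
                  "query parsed as a diagnosis; rare condition poorly retrieved",
                  codes=[2, 5]),
    FailureRecord("dosing evidence for drug X",
                  "retriever returned a related but different trial",
                  codes=[1]),
    FailureRecord("is finding Y an intervention or an outcome?",
                  "intervention and phenotype were confused",
                  codes=[2]),
]

for line in format_summary(records):
    print(line)
```

The summary lines are the kind of output that could feed the review and prioritization steps, for example deciding whether to work on query parsing (No.2) before retrieval (No.1). A real notebook would replace the hand-written records with failures captured from actual Biomni runs.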

### Expected benefit

For Biomni users, especially in research hospitals or labs, having a **structured failure taxonomy** next to the agent can:

- make internal reviews easier (“we see mostly No.2 + No.5 failures”),  
- help prioritize improvements,  
- and make the system more defensible in front of clinical governance / ethics boards.

If this direction seems useful, I’d be happy to draft a small PR for you to review.  
If not, thank you anyway for pushing biomedical agents forward. I learned a lot from your work.
