[Feature Request] Add a short "clinical assistant RAG & agent safety checklist" doc (docs only) #282

@onestardao

Hi Biomni team,

First, amazing work.
Biomni is one of the few biomedical agent projects that feels both ambitious and grounded in real clinical / research workflows.

I’m the author of WFGY and its 16-mode RAG Failure ProblemMap.

The ProblemMap has been cited or integrated by:

  • Harvard MIMS Lab – ToolUniverse (LLM tools benchmark; WFGY in robustness / RAG debugging section)
  • Univ. of Innsbruck – Rankify (RAG toolkit; uses WFGY for RAG / re-ranking troubleshooting)
  • QCRI LLM Lab – Multimodal RAG Survey (lists WFGY ProblemMap as a semantic failure-mode taxonomy)

Why I think this matters for Biomni

From what I can see in the paper and repo, Biomni stitches together:

  • literature / guideline retrieval,
  • logic for combining multiple sources,
  • safety / uncertainty handling.

In real deployments, the hardest failures are not “hallucination in the abstract” but very specific modes like:

  • No.1 hallucination & chunk drift (retriever pulls a related but wrong trial)
  • No.2 interpretation collapse (query is mis-parsed: phenotype vs diagnosis vs intervention)
  • No.5 semantic ≠ embedding (embedding space under-represents rare conditions or negation)
  • No.6 logic collapse & recovery (conflicting studies; agent gets stuck or flips answer)
  • No.11 symbolic collapse (protocol / formula reasoning breaks)

I have been using the ProblemMap as a “semantic clinic” for RAG / agent pipelines, especially in high-risk domains like medicine.

Proposal: a small debugging section + optional notebook

Would you be open to:

  1. Adding a short “RAG failure checklist” section in Biomni’s docs that:

    • lists a few dominant failure modes relevant to biomedical agents (e.g. No.1 / No.2 / No.5 / No.6 / No.11),
    • links to the full ProblemMap so users can do deeper triage.
  2. Potentially hosting a minimal Jupyter notebook under examples/ that shows:

    • how to log Biomni failures,
    • how to tag them with ProblemMap codes,
    • how that feeds back into dataset curation or agent prompt / tool adjustments.
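
A minimal sketch of what the logging and tagging cells in such a notebook might look like. Everything here is an assumption for illustration: `FailureRecord`, `tag`, and `to_jsonl` are hypothetical names, nothing below calls Biomni's actual API, and only the five failure modes named in this issue are included.

```python
import json
from dataclasses import dataclass, asdict, field

# Labels copied from the failure modes listed in this issue.
# The remaining ProblemMap modes are intentionally left out.
PROBLEM_MAP = {
    1: "hallucination & chunk drift",
    2: "interpretation collapse",
    5: "semantic ≠ embedding",
    6: "logic collapse & recovery",
    11: "symbolic collapse",
}


@dataclass
class FailureRecord:
    """One observed agent failure (hypothetical schema, not a Biomni type)."""
    query: str                      # what the user asked
    observed: str                   # short note on what went wrong
    codes: list[int] = field(default_factory=list)  # ProblemMap numbers

    def tag(self, code: int) -> None:
        """Attach a ProblemMap code, rejecting unknown ones and duplicates."""
        if code not in PROBLEM_MAP:
            raise ValueError(f"unknown ProblemMap code: {code}")
        if code not in self.codes:
            self.codes.append(code)


def to_jsonl(records: list[FailureRecord]) -> str:
    """Serialize records as JSON Lines so they can be appended to a log file."""
    return "\n".join(json.dumps(asdict(r)) for r in records)


# Illustrative entry; the query and note are made up for this sketch.
log = [FailureRecord(query="example clinical question",
                     observed="retriever returned a related but wrong trial")]
log[0].tag(1)
print(to_jsonl(log))
```

Keeping the log as JSON Lines means each tagged failure is one self-describing row, which is easy to filter later when deciding what to add to an evaluation set or which prompt or tool to adjust.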

All of this would be MIT-licensed and docs-first, and I can keep the notebook self-contained so it doesn’t interfere with your core code.

Expected benefit

For Biomni users, especially in research hospitals or labs, having a structured failure taxonomy next to the agent can:

  • make internal reviews easier (“we see mostly No.2 + No.5 failures”),
  • help prioritize improvements,
  • and make the system more defensible in front of clinical governance / ethics boards.
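
As a rough illustration of how a review summary like “we see mostly No.2 + No.5 failures” could be produced, here is a small sketch that tallies tagged failures. The function name `summarize` and the sample data are hypothetical; the input is assumed to be one list of ProblemMap numbers per logged failure.

```python
from collections import Counter


def summarize(tagged_failures: list[list[int]]) -> list[tuple[str, int, float]]:
    """Return (label, count, share of all tags) per code, most frequent first."""
    counts = Counter(code for codes in tagged_failures for code in codes)
    total = sum(counts.values())
    return [(f"No.{code}", n, round(n / total, 2))
            for code, n in counts.most_common()]


# Made-up tags for five failures; a real review would read these from a log.
sample = [[2, 5], [2], [5], [1], [2, 5]]
for label, n, share in summarize(sample):
    print(label, n, share)
```

A failure can carry more than one code, so the shares are fractions of all tags rather than of all failures.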

If this direction seems useful, I’d be happy to draft a small PR you can review.
If not, thank you anyway for pushing biomedical agents forward – I learned a lot from your work.
