Hi Biomni team,
First of all, amazing work.
Biomni is one of the few biomedical agent projects that feels both ambitious and grounded in real clinical / research workflows.
I’m the author of WFGY and its 16-mode RAG Failure ProblemMap.
The ProblemMap has been cited or integrated by:
- Harvard MIMS Lab – ToolUniverse (LLM tools benchmark; WFGY in robustness / RAG debugging section)
- Univ. of Innsbruck – Rankify (RAG toolkit; uses WFGY for RAG / re-ranking troubleshooting)
- QCRI LLM Lab – Multimodal RAG Survey (lists WFGY ProblemMap as a semantic failure-mode taxonomy)
Why I think this matters for Biomni
From what I can see in the paper and repo, Biomni stitches together:
- literature / guideline retrieval,
- logic for combining multiple sources,
- safety / uncertainty handling.
In real deployments, the hardest failures are not “hallucination in the abstract” but very specific modes like:
- No.1 hallucination & chunk drift (retriever pulls a related but wrong trial)
- No.2 interpretation collapse (query is mis-parsed: phenotype vs diagnosis vs intervention)
- No.5 semantic ≠ embedding (embedding space under-represents rare conditions or negation)
- No.6 logic collapse & recovery (conflicting studies; agent gets stuck or flips answer)
- No.11 symbolic collapse (protocol / formula reasoning breaks)
I have been using the ProblemMap as a “semantic clinic” for RAG / agent pipelines, especially in high-risk domains like medicine.
Proposal: a small debugging section + optional notebook
Would you be open to:
- Adding a short “RAG failure checklist” section in Biomni’s docs that:
  - lists a few dominant failure modes relevant to biomedical agents (e.g. No.1 / No.2 / No.5 / No.6 / No.11),
  - links to the full ProblemMap so users can do deeper triage.
- Potentially hosting a minimal Jupyter notebook under examples/ that shows:
  - how to log Biomni failures,
  - how to tag them with ProblemMap codes,
  - how that feeds back into dataset curation or agent prompt / tool adjustments.
All of this would be MIT-licensed, docs-first, and I can keep the notebook self-contained so it doesn’t interfere with your core code.
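To make the notebook idea concrete, here is a rough sketch of what the logging and tagging step could look like. It is only an illustration under my own assumptions: `FailureRecord`, `tag`, and `summarize` are hypothetical names, nothing here calls Biomni's actual API, and the code table contains only the five modes mentioned in this issue rather than the full 16-mode ProblemMap.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field

# Only the five modes named in this issue; labels copied from the list above.
PROBLEM_MAP = {
    1: "hallucination & chunk drift",
    2: "interpretation collapse",
    5: "semantic ≠ embedding",
    6: "logic collapse & recovery",
    11: "symbolic collapse",
}


@dataclass
class FailureRecord:
    """One logged failure (hypothetical schema, not a Biomni type)."""
    query: str                      # what the user asked the agent
    observed: str                   # short note on what went wrong
    codes: list[int] = field(default_factory=list)  # ProblemMap numbers


def tag(record: FailureRecord, *codes: int) -> FailureRecord:
    """Attach ProblemMap codes to a record, rejecting codes not in the table."""
    unknown = [c for c in codes if c not in PROBLEM_MAP]
    if unknown:
        raise ValueError(f"unknown ProblemMap code(s): {unknown}")
    for code in codes:
        if code not in record.codes:   # avoid double-counting the same mode
            record.codes.append(code)
    return record


def summarize(records: list[FailureRecord]) -> list[tuple[str, int]]:
    """Count how often each mode was tagged, most frequent first."""
    counts = Counter(code for r in records for code in r.codes)
    return [(f"No.{code} {PROBLEM_MAP[code]}", n)
            for code, n in counts.most_common()]


if __name__ == "__main__":
    log = [
        tag(FailureRecord("q1", "retrieved a related but wrong trial"), 1),
        tag(FailureRecord("q2", "phenotype parsed as diagnosis"), 2, 5),
        tag(FailureRecord("q3", "intervention parsed as diagnosis"), 2),
    ]
    for label, n in summarize(log):
        print(f"{label}: {n}")
```

The summary is the part that would feed an internal review: a sorted count per mode is enough to see which failure types dominate, and the same records could later be exported for dataset curation or prompt adjustments.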
Expected benefit
For Biomni users, especially in research hospitals or labs, having a structured failure taxonomy next to the agent can:
- make internal reviews easier (“we see mostly No.2 + No.5 failures”),
- help prioritize improvements,
- and make the system more defensible in front of clinical governance / ethics boards.
If this direction seems useful, I’d be happy to draft a small PR you can review.
If not, thank you anyway for pushing biomedical agents forward – I learned a lot from your work.