feat: hackathon/implement SBML Model Annotation and Knowledge Graph Integration (Team Sanofi US) by sahneh · Pull Request #138 · VirtualPatientEngine/AIAgents4Pharma

sahneh · 2025-03-08T03:41:57Z

For authors

Description

Please:

Provide a summary of the modifications made and any associated issue (if applicable).
Include relevant context and motivation for the changes.
If this relates to a change in any website's frontend, kindly attach a screenshot of the adjustment from your localhost.
List any dependencies necessary for implementing this change.

Contributors:

Faryad Sahneh
Travis Ahn-Horst
Mahasweta Bhattacharya

This PR adds functionality to annotate SBML models using LLMs and integrate them with Biomedical Knowledge Graphs by:

Developing a multi-step annotation process using OCR and LLM-based entity recognition
Establishing connections between model species and ontological entities via Bio-Ontology API and UMLS mappings
Bridging the gap between dynamic biological processes (SBML) and static knowledge repositories (BKGs)
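
As a rough illustration of the ontology lookup step, the sketch below queries the NCBO BioPortal search endpoint for a species name and keeps the top hits. It is not the code from this PR: the function names, the `limit` default, and the choice of the `/search` endpoint are assumptions, and the scripts in the PR may use a different endpoint or client. URL building and response parsing are kept separate from the network call, which needs a personal API key.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Public BioPortal REST endpoint for term search.
BIOPORTAL_SEARCH_URL = "https://data.bioontology.org/search"


def build_search_url(term, ontologies=None):
    """Build a search URL for one species name (hypothetical helper)."""
    params = {"q": term}
    if ontologies:
        # Restrict the search to selected ontology acronyms, e.g. GO, CHEBI.
        params["ontologies"] = ",".join(ontologies)
    return f"{BIOPORTAL_SEARCH_URL}?{urlencode(params)}"


def parse_search_response(payload, limit=3):
    """Keep label, IRI and ontology link for the first `limit` hits."""
    hits = []
    for item in payload.get("collection", [])[:limit]:
        hits.append(
            {
                "label": item.get("prefLabel"),
                "iri": item.get("@id"),
                "ontology": item.get("links", {}).get("ontology"),
            }
        )
    return hits


def search_bioportal(term, api_key, ontologies=None):
    """Run the search; requires network access and a BioPortal API key."""
    request = Request(
        build_search_url(term, ontologies),
        headers={"Authorization": f"apikey token={api_key}"},
    )
    with urlopen(request, timeout=30) as response:
        return parse_search_response(json.load(response))
```

Calling `search_bioportal("IL6", api_key="<your key>", ontologies=["GO", "CHEBI"])` would return up to three candidate terms for one species; the ontology acronyms here are only examples.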

Files added:

Readme file proposing a framework that treats SBML models as first-class nodes in knowledge graphs
Processing scripts for Bio-Ontology API and UMLS extraction
JSON output files containing species annotations with ontology IDs
Integration methodology for connecting with PrimeKG
CSV file mapping SBML species to nodes of PrimeKG

The main files to look at are the following:

species_dict_annotated.json: Species dictionary enriched with bio-ontology annotations
species2primekg_map.csv: A final mapping between PrimeKG nodes and species
species_dict_umls.json: Species dictionary with UMLS codes
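
To make the relationship between these files concrete, here is a minimal sketch of how annotated species could be joined to PrimeKG nodes by identifier. The schema of the species dictionary (`source` and `id` keys) is hypothetical, not the actual layout of `species_dict_annotated.json`, and the node columns are assumed to follow the PrimeKG node table (`node_index`, `node_id`, `node_type`, `node_name`, `node_source`). The sample rows are illustrative only.

```python
import csv
import io

# Illustrative node rows; the node_index values are made up.
SAMPLE_NODES_CSV = """node_index,node_id,node_type,node_name,node_source
0,3569,gene/protein,IL6,NCBI
1,7124,gene/protein,TNF,NCBI
"""

# Hypothetical annotation layout: species name -> list of ontology IDs.
SAMPLE_SPECIES = {
    "IL6": [{"source": "NCBI", "id": "3569"}],
    "Unmapped": [{"source": "NCBI", "id": "0000"}],
}


def load_primekg_index(nodes_csv_text):
    """Index node rows by (node_source, node_id) for exact-ID lookup."""
    reader = csv.DictReader(io.StringIO(nodes_csv_text))
    return {(row["node_source"], row["node_id"]): row for row in reader}


def map_species_to_primekg(species_annotations, node_index):
    """Return one mapping row per species annotation found in the index."""
    rows = []
    for species, annotations in species_annotations.items():
        for annotation in annotations:
            node = node_index.get((annotation["source"], annotation["id"]))
            if node is not None:
                rows.append(
                    {
                        "species": species,
                        "node_index": node["node_index"],
                        "node_name": node["node_name"],
                        "node_type": node["node_type"],
                    }
                )
    return rows


if __name__ == "__main__":
    index = load_primekg_index(SAMPLE_NODES_CSV)
    for row in map_species_to_primekg(SAMPLE_SPECIES, index):
        print(row)
```

Species without a matching node are simply left out of the result, which makes it easy to count how many species remain unmapped.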

Fixes # (issue) Mention the issue number.

Type of change

Please delete options that are not relevant.

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
This change requires a documentation update

How Has This Been Tested?

Please describe the tests you conducted to verify your changes. These may involve creating new test scripts or updating existing ones.

Added new test(s) in the tests folder
Added new function(s) to an existing test(s) (e.g.: tests/testX.py)
No new tests added (Please explain the rationale in this case)

Checklist

My code follows the style guidelines mentioned in the Code/DevOps guides
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation (e.g. MkDocs)
My changes generate no new warnings
I have added or updated tests (in the tests folder) that prove my fix is effective or that my feature works
New and existing tests pass locally with my changes
Any dependent changes have been merged and published in downstream modules

For reviewers

Checklist pre-approval

Is there enough documentation?
If a new feature has been added, or a bug fixed, has a test been added to confirm good behavior?
Does the test(s) successfully test edge/corner cases?
Does the PR pass the tests? (if the repository has continuous integration)

Checklist post-approval

Does this PR merge develop into main? If so, please make sure to add a prefix (feat/fix/chore) and/or a suffix BREAKING CHANGE (if it's a major release) to your commit message.
Does this PR close an issue? If so, please make sure to descriptively close this issue when the PR is merged.

Checklist post-merge

When you approve of the PR, merge and close it (Read this article to know about different merge methods on GitHub)
Did this PR merge develop into main and is it supposed to run an automated release workflow (if applicable)? If so, please make sure to check under the "Actions" tab to see if the workflow has been initiated, and return later to verify that it has completed successfully.

dmccloskey

Great explanation of the problem and some useful tools to continue towards a complete solution @sahneh and the rest of Team Sanofi US 👏.

I appreciate the two approaches and their implementations:

Lookup using the BioPortal API
Lookup in UMLS using SciSpaCy

Both run after the OCR-processed PDF article and the SBML species descriptions are used to prompt an LLM for a more complete description that can be used for lookup.
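
The description-building step mentioned above can be sketched roughly as follows: read the species out of the SBML file, then put them into a single prompt together with the full OCR text of the article. This is an assumption about the general shape of the step, not the PR's notebook code; `extract_species`, `build_annotation_prompt`, and the prompt wording are hypothetical, and the LLM call itself is left out.

```python
import xml.etree.ElementTree as ET

# Tiny made-up SBML document used only for demonstration.
SAMPLE_SBML = """
<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core" level="3" version="1">
  <model id="demo">
    <listOfSpecies>
      <species id="s1" name="IL6" compartment="c"/>
      <species id="s2" compartment="c"/>
    </listOfSpecies>
  </model>
</sbml>
"""


def extract_species(sbml_text):
    """Return {species id: name}, falling back to the id if name is missing."""
    root = ET.fromstring(sbml_text.strip())
    species = {}
    for element in root.iter():
        # Compare the local tag name so any SBML level/version namespace works.
        if element.tag.rsplit("}", 1)[-1] == "species":
            sid = element.get("id")
            species[sid] = element.get("name") or sid
    return species


def build_annotation_prompt(article_text, species):
    """Combine the whole article and the species list into one prompt."""
    species_lines = "\n".join(
        f"- {sid}: {name}" for sid, name in sorted(species.items())
    )
    return (
        "You are annotating the species of an SBML model.\n"
        "For each species below, write a one-sentence description of the "
        "biological entity it represents, based on the article.\n\n"
        f"Species:\n{species_lines}\n\n"
        f"Article (OCR text):\n{article_text}\n"
    )


if __name__ == "__main__":
    print(build_annotation_prompt("...", extract_species(SAMPLE_SBML)))
```

The resulting descriptions would then be the text passed to either lookup approach.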

I noticed that the API calls to BioPortal were not just for lookup but also for enriching the species with their ontology annotations. If you had additional time, was the idea to also do some type of semantic search between the enriched annotations (after textual embedding) and the descriptions extracted from the article/SBML model?

dmccloskey · 2025-03-17T08:41:52Z

Well explained👍

Thanks for the great feedback @dmccloskey. Excellent observations!

Our approach was guided by two key principles:

Leverage reasoning-focused LLMs with complete context rather than PDF RAG. Our justification was that SBML annotation requires holistic understanding of biological systems rather than fragmented inferences from text chunks. This approach also eliminates many of the technical challenges associated with making a RAG pipeline work properly.

Utilize established biomedical ontology tools instead of relying solely on semantic search. Biological entity mapping requires nuanced understanding that goes beyond simple text similarity, and there's a rich ecosystem of specialized technologies in this domain that provide significant advantages.

Regarding the enriched descriptions: they serve complementary purposes aligned with our goal of connecting SBML models to knowledge graphs:

The ontological annotations create structured connections to KGs through standardized identifiers

The textual descriptions make the models more accessible to LLMs.

For future work, semantic search would indeed be valuable. The following use cases are particularly interesting:

Post-filter annotations for more precise connections

Enable "white space exploration" beyond the explicit SBML model boundaries (across species, pathways, or disease contexts)
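
A minimal sketch of the post-filtering idea, under stated assumptions: each candidate annotation carries a textual definition, and candidates whose definition is too dissimilar from the species description are dropped. The token-count "embedding" below is a toy stand-in for a real text embedding model, and the function names, candidate IDs, and the 0.2 threshold are all hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a text embedding: lowercase token counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def post_filter(description, candidates, threshold=0.2):
    """Keep candidate IDs scoring at least `threshold`, best match first."""
    query = embed(description)
    scored = [(cosine(query, embed(c["definition"])), c["id"]) for c in candidates]
    kept = [pair for pair in scored if pair[0] >= threshold]
    return [cid for _, cid in sorted(kept, key=lambda pair: pair[0], reverse=True)]


if __name__ == "__main__":
    description = "interleukin 6 cytokine secreted by macrophages"
    candidates = [
        {"id": "CAND:A", "definition": "a cytokine interleukin involved in inflammation"},
        {"id": "CAND:B", "definition": "a metal ion transporter in the membrane"},
    ]
    print(post_filter(description, candidates))
```

With a real embedding model, only `embed` would change; the thresholding and ranking stay the same.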

Overall, the point is that the combination of KG-friendly ontological mapping and LLM-friendly textual descriptions creates a solid bridge between computational models and broader biological knowledge.

sahneh and others added 15 commits March 7, 2025 17:58

the source articles

5bec03d

update notebooks

9290b63

bioontolgoy api

69bd83a

umls approach using scispacy

f88b1c0

renamed to umls

7ecc838

extracting the annotations of species from bioontology

030414a

cleaned up notebook

b7ff70c

readme file

9475e10

species annot. from bioontology mapped to primekg nodes

7428cd7

added GEXO to the ontology to cover entrez ids in PrimeKG

b3858f7

Merge branch 'sbml-annotator-us' of https://github.com/sahneh/AIAgent…

a19dfe0

…s4Pharma into sbml-annotator-us

cleaning

006b2e4

final analysis of primekg mapping

14e76c3

update readme

2643051

typo in readme

2bf497a

gurdeep330 requested review from awmulyadi, dmccloskey and lilijap March 8, 2025 06:31

gurdeep330 assigned sahneh Mar 8, 2025

gurdeep330 added T2B T2KG labels Mar 8, 2025

gurdeep330 changed the title ~~feat/Implement SBML Model Annotation and Knowledge Graph Integration~~ feat: hackathon/implement SBML Model Annotation and Knowledge Graph Integration (Team Sanofi US) Mar 10, 2025

dmccloskey reviewed Mar 17, 2025

sahneh wants to merge 15 commits into VirtualPatientEngine:main from sahneh:sbml-annotator-us

4 participants
