feat: hackathon/kg model hack Jack Saleh Sandeep (Team Galway)#137
SandeepRed wants to merge 2 commits into
…gene/protein nodes related to those GO and semantic search for entrez
dmccloskey
left a comment
Really cool work @SandeepRed and the rest of Team Galway!
Can you please confirm my high-level understanding of the proposed solution to map from the PDF article to PrimeKG?
1. Extract disease terms from the PDF using OpenAI Textual Embeddings
2. Extract species descriptions from the PDF using OpenAI Textual Embeddings
3. Extract disease subgraph from PrimeKG by matching disease terms from step 1 to GO terms and descriptions in PrimeKG
4. Extract gene/protein nodes linked to GO terms from step 3
5. Embed the gene/protein descriptions from step 4 using OpenAI Textual Embeddings
6. Compare the extracted species description embeddings from step 2 to the gene/protein description embeddings from step 5 using FAISS.
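For readers following along, the final comparison step can be sketched roughly as below. This is only an illustration, not the code in this PR: a brute-force cosine-similarity search in plain Python stands in for the FAISS index, and the toy 3-dimensional vectors stand in for OpenAI embeddings. The names `cosine`, `top_matches`, `species_embeddings`, and `gene_embeddings` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_matches(query_vecs, candidate_vecs, k=1):
    """For each query, return the k candidates with the highest cosine similarity.

    Brute-force stand-in (assumption) for a FAISS nearest-neighbour search.
    """
    results = {}
    for q_name, q_vec in query_vecs.items():
        scored = [
            (cosine(q_vec, c_vec), c_name)
            for c_name, c_vec in candidate_vecs.items()
        ]
        scored.sort(reverse=True)  # highest similarity first
        results[q_name] = [(name, score) for score, name in scored[:k]]
    return results


# Toy vectors (assumption): real embeddings would have far more dimensions.
species_embeddings = {
    "species_A": [0.9, 0.1, 0.0],
    "species_B": [0.0, 0.2, 0.9],
}
gene_embeddings = {
    "gene_1": [1.0, 0.0, 0.0],
    "gene_2": [0.0, 0.0, 1.0],
    "gene_3": [0.5, 0.5, 0.5],
}

matches = top_matches(species_embeddings, gene_embeddings, k=2)
for species, hits in matches.items():
    print(species, hits)
```

With real embeddings, the same ranking could presumably be delegated to a FAISS inner-product index over normalized vectors, which avoids the full pairwise scan.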
Dear Douglas,

Yes, that is a perfect summary. Apologies for the delay: it was a bank holiday weekend and I was unwell.

Regarding the extraction of species descriptions from the PDF using OpenAI Textual Embeddings: our disease-based approach didn't work as well as expected (surprisingly, likely due to missing mappings), so we went with using high-level GO terms for the subgraph instead. We wanted all of this fully automated, maybe merging the disease and GO subgraphs or using a weighted scoring approach, and utilizing metadata like descriptions.
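One way the weighted scoring idea might look is sketched below. It is purely illustrative and not part of this PR: the function `combine_scores`, the weights, and the per-node scores are all hypothetical, and a node missing from one subgraph is assumed to score 0 there.

```python
def combine_scores(disease_scores, go_scores, w_disease=0.4, w_go=0.6):
    """Merge per-node scores from two subgraphs into one weighted ranking.

    The weights are arbitrary placeholders (assumption), not tuned values.
    """
    combined = {}
    for node in set(disease_scores) | set(go_scores):
        combined[node] = (
            w_disease * disease_scores.get(node, 0.0)
            + w_go * go_scores.get(node, 0.0)
        )
    # Highest combined score first.
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical similarity scores for nodes found in each subgraph.
disease_scores = {"gene_1": 0.8, "gene_2": 0.3}
go_scores = {"gene_2": 0.9, "gene_3": 0.5}

ranked = combine_scores(disease_scores, go_scores)
for node, score in ranked:
    print(node, round(score, 2))
```

A node found in both subgraphs accumulates both contributions, so agreement between the disease and GO views is rewarded without discarding nodes that only one view finds.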
For authors
Description
Ideally, we would embed the whole text of the article and run a vector search for GO nodes.
Final output: species_gene_matches.csv