Genome-AI-Pathway-Designer

An AI-powered Metabolic Engineering Assistant designed to map biochemical pathways, extract enzymatic data, and optimize protein selection for synthetic biology. This tool leverages Neo4j Knowledge Graphs, ESM-2 Transformer Embeddings, and LLMs to provide detailed blueprints for producing high-value compounds like Astaxanthin.


Features

  • **Metabolic Pathfinding:** Automatically identifies the shortest path between a precursor (e.g., beta-Carotene) and a target molecule (e.g., Astaxanthin).
  • **Enzyme Extraction:** Integrates with KEGG and UniProt to retrieve the EC numbers and amino acid sequences for each reaction step.
  • **Transformer Embeddings:** Uses Meta's facebook/esm2_t6_8M_UR50D model to convert protein sequences into 320-dimensional vectors for functional similarity search.
  • **Graph-Based Intelligence:** Stores complex biological relationships in Neo4j, allowing for the detection of "metabolic leaks" and co-factor requirements.
  • **Genome-Host Alignment:** Evaluates enzyme compatibility based on source organism data to ensure pathways are stable within a specific microbial host.
  • **Automated Ingestion:** Streamlined pipeline for crawling KEGG reaction maps and enriching the graph with real-time protein data.
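
The pathfinding idea can be illustrated without a database. The sketch below is a minimal breadth-first search over a hand-written toy graph; it is not the project's Neo4j query, and `TOY_PATHWAY` and `shortest_path` are hypothetical names introduced only for this example. The compound names follow the commonly described carotenoid route and are not pulled from KEGG.

```python
from collections import deque

# Toy reaction graph (illustrative only, not loaded from KEGG or Neo4j):
# each key is a compound, each value lists compounds one reaction step away.
TOY_PATHWAY = {
    "beta-Carotene": ["Echinenone", "beta-Cryptoxanthin"],
    "Echinenone": ["Canthaxanthin"],
    "beta-Cryptoxanthin": ["Zeaxanthin"],
    "Canthaxanthin": ["Adonirubin"],
    "Zeaxanthin": ["Adonixanthin"],
    "Adonirubin": ["Astaxanthin"],
    "Adonixanthin": ["Astaxanthin"],
}


def shortest_path(graph, source, target):
    """Return a path with the fewest reaction steps, or None if unreachable."""
    queue = deque([[source]])
    visited = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


path = shortest_path(TOY_PATHWAY, "beta-Carotene", "Astaxanthin")
print(" -> ".join(path))
```

Breadth-first search finds a fewest-steps path when every reaction counts as one hop. A weighted search would be needed if steps were scored by, say, enzyme efficiency.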

Architecture

  • **Knowledge Graph (Neo4j):** Acts as the "Brain," storing nodes for Compounds, Reactions, and Enzymes, and edges representing metabolic transformations.
  • **Protein Language Model (ESM-2):** Applies a Transformer architecture to understand the "biological grammar" of enzymes and generate functional embeddings.
  • **Bio-Informatics Extractor:** A custom ingestion engine that bridges the gap between raw web-based databases (KEGG/UniProt) and a structured graph.
  • **Vector Search:** Enables mathematical comparison of enzymes to find the most efficient catalysts across different genomes.
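
As a rough illustration of the vector search component, the sketch below ranks candidate enzymes by cosine similarity to a query embedding. It assumes cosine similarity as the comparison metric, which the README does not state. Random placeholder vectors stand in for real ESM-2 embeddings, and the names `rank_candidates`, `enzyme_A`, `enzyme_B`, and `enzyme_C` are hypothetical.

```python
import math
import random

EMBEDDING_DIM = 320  # vector size stated above for esm2_t6_8M_UR50D


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(query, candidates):
    """Sort candidate names by similarity to the query, best match first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Placeholder data: random vectors, NOT real protein embeddings.
rng = random.Random(0)
query = [rng.gauss(0, 1) for _ in range(EMBEDDING_DIM)]
candidates = {
    "enzyme_A": [x + rng.gauss(0, 0.1) for x in query],           # slightly perturbed copy
    "enzyme_B": [rng.gauss(0, 1) for _ in range(EMBEDDING_DIM)],  # unrelated vector
    "enzyme_C": [-x for x in query],                              # opposite direction
}

ranking = rank_candidates(query, candidates)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In a real deployment the vectors would come from the embedding step and the ranking would typically be delegated to a vector index rather than computed in a Python loop.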

Getting Started

Prerequisites

  • Python 3.11+
  • Docker (optional, for containerized deployment)
  • Access to OpenAI API or other LLM providers like Ollama
  • Vector database setup (e.g., ChromaDB)

Installation

  1. Clone the repository:

    git clone https://github.com/SilasPenda/Genome-AI-Pathway-Designer
    cd Genome-AI-Pathway-Designer
    
  2. Create & activate virtual environment:

    python -m venv .venv
    source .venv/bin/activate      # Linux & macOS
    .venv\Scripts\activate         # Windows
    
  3. Install requirements:

    python -m pip install --upgrade pip
    pip install -r requirements.txt
    
  4. Create a .env file and add your credentials.

  5. Ingest biochemical data and generate AI embeddings:

    python ingestion/graph_ingestor.py
    
  6. Launch the API:

    uvicorn deployment.api:app --reload
    
  7. Start the app:

    streamlit run deployment/app.py
