An AI-powered Metabolic Engineering Assistant designed to map biochemical pathways, extract enzymatic data, and optimize protein selection for synthetic biology. This tool leverages Neo4j Knowledge Graphs, ESM-2 Transformer Embeddings, and LLMs to provide detailed blueprints for producing high-value compounds like Astaxanthin.
- **Metabolic Pathfinding:** Automatically identifies the shortest path between a precursor (e.g., beta-Carotene) and a target molecule (e.g., Astaxanthin).
- **Enzyme Extraction:** Integrates with KEGG and UniProt to retrieve the EC numbers and amino acid sequences for each reaction step.
- **Transformer Embeddings:** Uses Meta's `facebook/esm2_t6_8M_UR50D` model to convert protein sequences into 320-dimensional vectors for functional similarity search.
- **Graph-Based Intelligence:** Stores complex biological relationships in Neo4j, enabling the detection of "metabolic leaks" and cofactor requirements.
- **Genome-Host Alignment:** Evaluates enzyme compatibility using source-organism data to help ensure the pathway remains stable within a specific microbial host.
- **Automated Ingestion:** A streamlined pipeline that crawls KEGG reaction maps and enriches the graph with live protein data.
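The pathfinding and leak-detection features above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the tool's implementation: the reaction graph is a small hand-written toy (compound and enzyme names are illustrative, not pulled from KEGG), "shortest" is taken to mean fewest reaction steps, and a "metabolic leak" is assumed to mean any reaction that drains an on-route compound into a product off the chosen route. In the Neo4j-backed tool the pathfinding would presumably be a Cypher `shortestPath` query instead; its node labels and relationship types are not documented here.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# substrate -> [(product, enzyme), ...]
Graph = Dict[str, List[Tuple[str, str]]]

# Toy reaction graph for illustration only (assumption: NOT curated KEGG data).
REACTIONS: Graph = {
    "beta-carotene": [("echinenone", "CrtW"), ("beta-cryptoxanthin", "CrtZ")],
    "echinenone": [("canthaxanthin", "CrtW")],
    "canthaxanthin": [("adonirubin", "CrtZ")],
    "adonirubin": [("astaxanthin", "CrtZ")],
    "beta-cryptoxanthin": [("zeaxanthin", "CrtZ")],
    "zeaxanthin": [("adonixanthin", "CrtW")],
    "adonixanthin": [("astaxanthin", "CrtW")],
}


def shortest_path(graph: Graph, source: str, target: str) -> Optional[List[str]]:
    """Breadth-first search: return the route with the fewest reaction steps."""
    parent: Dict[str, Optional[str]] = {source: None}
    queue = deque([source])
    while queue:
        compound = queue.popleft()
        if compound == target:
            # Walk back through the parent links to rebuild the route.
            path: List[str] = []
            node: Optional[str] = compound
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for product, _enzyme in graph.get(compound, []):
            if product not in parent:  # first visit is via a shortest route
                parent[product] = compound
                queue.append(product)
    return None  # target unreachable from source


def find_leaks(graph: Graph, path: List[str]) -> List[Tuple[str, str, str]]:
    """Hypothetical leak check: reactions draining an on-route compound off the route."""
    on_path = set(path)
    leaks = []
    for compound in path[:-1]:  # the target itself is the desired sink
        for product, enzyme in graph.get(compound, []):
            if product not in on_path:
                leaks.append((compound, product, enzyme))
    return leaks


if __name__ == "__main__":
    route = shortest_path(REACTIONS, "beta-carotene", "astaxanthin")
    print(" -> ".join(route))
    print(find_leaks(REACTIONS, route))
```

Breadth-first search fits here because every reaction counts as one step; if edges were weighted (for example by cofactor cost), Dijkstra's algorithm would be the natural swap.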
- **Knowledge Graph (Neo4j):** Acts as the "brain," storing nodes for compounds, reactions, and enzymes, with edges representing metabolic transformations.
- **Protein Language Model (ESM-2):** Applies a Transformer architecture to learn the "biological grammar" of enzymes and generate functional embeddings.
- **Bioinformatics Extractor:** A custom ingestion engine that bridges raw web-based databases (KEGG/UniProt) and the structured graph.
- **Vector Search:** Enables mathematical comparison of enzyme embeddings to identify functionally similar candidate catalysts across different genomes.
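Vector search over enzyme embeddings can be illustrated with cosine similarity. This is a minimal sketch under stated assumptions: per-residue embeddings are mean-pooled into one vector per protein (a common choice for ESM-2 output, but not confirmed for this tool), and toy 3-dimensional vectors with hypothetical enzyme names stand in for real 320-dimensional `esm2_t6_8M_UR50D` embeddings.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def mean_pool(residue_embeddings: Sequence[Sequence[float]]) -> Vector:
    """Average per-residue vectors into one fixed-length protein embedding (assumed pooling)."""
    n = len(residue_embeddings)
    dim = len(residue_embeddings[0])
    return [sum(row[i] for row in residue_embeddings) / n for i in range(dim)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_candidates(query: Vector, library: Dict[str, Vector], top_k: int = 3) -> List[Tuple[str, float]]:
    """Return the top_k library entries most similar to the query, best first."""
    scores = [(name, cosine_similarity(query, vec)) for name, vec in library.items()]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:top_k]


# Hypothetical enzyme embeddings (toy 3-D stand-ins for 320-D ESM-2 vectors).
LIBRARY: Dict[str, Vector] = {
    "enzyme_A": [1.0, 1.0, 0.0],
    "enzyme_B": [1.0, 0.0, 1.0],
    "enzyme_C": [0.0, 0.0, 1.0],
}

if __name__ == "__main__":
    # Two toy "residues" for the query protein.
    query = mean_pool([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
    for name, score in rank_candidates(query, LIBRARY):
        print(f"{name}: {score:.3f}")
```

Cosine similarity ignores vector magnitude, so the ranking reflects direction in embedding space. A high score suggests related function, but it does not by itself measure catalytic efficiency.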
- Python 3.11+
- Docker (optional, for containerized deployment)
- Access to the OpenAI API or another LLM provider, such as Ollama
- Vector database setup (e.g., ChromaDB)
1. Clone the repository:
   ```bash
   git clone https://github.com/SilasPenda/Policy-Compliance-Agent
   cd Policy-Compliance-Agent
   ```
2. Create and activate a virtual environment:
   ```bash
   python -m venv .venv
   source .venv/bin/activate    # Linux & Mac
   ./.venv/Scripts/activate     # Windows
   ```
3. Install the requirements:
   ```bash
   python -m pip install --upgrade pip
   pip install -r requirements.txt
   ```
4. Create a `.env` file and add your credentials.
5. Ingest biochemical data and generate AI embeddings:
   ```bash
   python ingestion/graph_ingestor.py
   ```
6. Launch the API:
   ```bash
   uvicorn deployment.api:app --reload
   ```
7. Start the app:
   ```bash
   streamlit run deployment/app.py
   ```