Complete technology stack and dependencies for the Safety-by-Design Entrainer Selection Framework
| Component | Version | Purpose |
|-----------|---------|---------|
| Python | 3.11+ | Core runtime |
| pip | Latest | Package management |
| venv | Built-in | Virtual environment |
| Library | Version | Purpose |
|---------|---------|---------|
| NumPy | ≥1.24 | Numerical computing |
| Pandas | ≥2.0 | Data manipulation |
| SciPy | ≥1.11 | Scientific computing |
| Matplotlib | ≥3.7 | Visualization |
| Seaborn | ≥0.12 | Statistical visualization |
Phase-Specific Dependencies
| Library | Purpose |
|---------|---------|
| pubchempy | PubChem API access |
| rdkit | Molecular fingerprints & clustering |
| scikit-learn | K-means clustering |
| hdbscan | Density-based clustering |
Phase II: Multi-Vector Selection
| Library | Purpose |
|---------|---------|
| google-generativeai | Gemini API integration |
| neo4j | Graph database driver |
| chromadb | Vector embeddings storage |
| langchain | RAG orchestration |
Engine B: TRIZ Multi-Agent
| Library | Purpose |
|---------|---------|
| google-generativeai | Gemini API for agents |
| pydantic | Data validation |
| asyncio | Async agent coordination (standard library, no install needed) |
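As a rough illustration of what async agent coordination with `asyncio` can look like, the sketch below runs several agents concurrently and collects their results in order. The `run_agent` stub and the agent names are hypothetical; a real agent would presumably call the Gemini API where the stub only yields control.

```python
import asyncio


async def run_agent(name: str, prompt: str) -> dict:
    """Hypothetical agent stub; a real agent would call an LLM API here."""
    await asyncio.sleep(0)  # yield control, standing in for network I/O
    return {"agent": name, "proposal": f"{name} response to: {prompt}"}


async def coordinate(prompt: str, agent_names: list[str]) -> list[dict]:
    """Run all agents concurrently; gather() keeps results in input order."""
    tasks = [run_agent(name, prompt) for name in agent_names]
    return await asyncio.gather(*tasks)


# Hypothetical agent names, used only for illustration.
results = asyncio.run(coordinate("separate ethanol/water", ["analyst", "critic"]))
for result in results:
    print(result["agent"], "->", result["proposal"])
```

`asyncio.gather` returns results in the order the awaitables were passed, so downstream code can pair each result with its agent without extra bookkeeping.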
Engine C: Cheminformatics
| Library | Purpose |
|---------|---------|
| rdkit | Molecular descriptors |
| mordred | Extended descriptors |
| scikit-learn | Diversity selection |
Phase III: Graph Traversal
| Library | Purpose |
|---------|---------|
| neo4j | Graph database operations |
| networkx | Graph algorithms |
| rdkit | Tanimoto similarity |
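For readers unfamiliar with the metric, Tanimoto similarity between two binary fingerprints is the number of shared on-bits divided by the number of bits set in either. The sketch below is a dependency-free illustration on plain sets of bit indices, not the framework's implementation; in practice RDKit's `DataStructs.TanimotoSimilarity` would be used on real fingerprints. The example fingerprints are made up.

```python
def tanimoto(bits_a: set[int], bits_b: set[int]) -> float:
    """Tanimoto (Jaccard) coefficient between two sets of on-bit indices."""
    if not bits_a and not bits_b:
        return 1.0  # convention chosen for this sketch: two empty sets match
    shared = len(bits_a & bits_b)
    return shared / (len(bits_a) + len(bits_b) - shared)


# Made-up fingerprints: 2 shared bits, 5 distinct bits overall.
fp_a = {1, 4, 7, 9}
fp_b = {1, 4, 8}
print(tanimoto(fp_a, fp_b))  # 2 / 5 = 0.4
```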
Phase IV: Bayesian Optimization
| Library | Version | Purpose |
|---------|---------|---------|
| botorch | ≥0.9 | Multi-objective BO |
| gpytorch | ≥1.11 | Gaussian processes |
| torch | ≥2.0 | Deep learning backend |
Phase V: Process Simulation
| Component | Purpose |
|-----------|---------|
| pythoncom | COM automation (Windows; part of pywin32) |
| win32com | DWSIM interface (part of pywin32) |
| thermo | UNIFAC calculations |
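Because the COM components above exist only on Windows, code that touches them usually needs a platform guard. The sketch below shows one possible guard; the function names and the backend labels `"dwsim-com"` and `"thermo-only"` are hypothetical, and treating `thermo` as the non-Windows path is an assumption rather than documented behaviour of the framework.

```python
import sys


def com_automation_supported(platform: str = sys.platform) -> bool:
    """True on Windows, where pywin32 (pythoncom, win32com) can be installed."""
    return platform == "win32"


def pick_simulation_backend(platform: str = sys.platform) -> str:
    """Return a hypothetical backend label for the current platform."""
    # Assumption: without COM, only the pure-Python thermo path is usable.
    return "dwsim-com" if com_automation_supported(platform) else "thermo-only"


print(pick_simulation_backend())
```

Passing the platform as a parameter keeps the guard testable on any operating system.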
| Database | Version | Purpose |
|----------|---------|---------|
| Neo4j Community | 5.x | Graph storage for molecular relationships |
| ChromaDB | Latest | Vector embeddings for RAG |
| SQLite | Built-in | Local caching and results storage |
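To make the SQLite caching role concrete, here is a minimal key-value cache built on the standard-library `sqlite3` module. The table name, schema, and helper names are assumptions made for illustration; the framework's actual cache layout is not described here.

```python
import json
import sqlite3


def open_cache(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a cache database with a single key-value table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, value TEXT NOT NULL)"
    )
    return conn


def cache_put(conn: sqlite3.Connection, key: str, value) -> None:
    """Store a JSON-serialisable value, overwriting any existing entry."""
    conn.execute(
        "INSERT OR REPLACE INTO cache (key, value) VALUES (?, ?)",
        (key, json.dumps(value)),
    )
    conn.commit()


def cache_get(conn: sqlite3.Connection, key: str):
    """Return the cached value, or None if the key is absent."""
    row = conn.execute("SELECT value FROM cache WHERE key = ?", (key,)).fetchone()
    return json.loads(row[0]) if row else None


conn = open_cache()  # in-memory here; pass a file path to persist between runs
cache_put(conn, "compound:ethanol", {"name": "ethanol"})
print(cache_get(conn, "compound:ethanol"))
```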
| API | Purpose | Rate Limits |
|-----|---------|-------------|
| PubChem PUG REST | Chemical data retrieval | 5 req/sec |
| Google Gemini | LLM for Graph-RAG & TRIZ | Varies by tier |
| ChemSpider | Supplementary chemical data | API key required |
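One simple way to stay under a limit such as 5 requests per second is to space calls at least 0.2 s apart on the client side. The sketch below is an illustrative throttle, not part of any of the listed API clients; the `RateLimiter` class and its parameters are hypothetical names.

```python
import time


class RateLimiter:
    """Spaces calls at least 1/max_per_second seconds apart."""

    def __init__(self, max_per_second: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self._clock = clock  # injectable so the limiter can be tested without waiting
        self._sleep = sleep
        self._next_allowed = 0.0

    def wait(self) -> float:
        """Sleep until the next call is allowed; return the seconds slept."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay > 0:
            self._sleep(delay)
        # Schedule the following slot relative to when this call actually runs.
        self._next_allowed = max(now, self._next_allowed) + self.min_interval
        return delay


# Hypothetical usage: call wait() immediately before each PubChem request.
pubchem_limiter = RateLimiter(max_per_second=5)
pubchem_limiter.wait()
```

Even spacing is stricter than a burst-tolerant token bucket, which makes it a conservative choice when the server-side accounting window is unknown.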
| Software | Version | Platform |
|----------|---------|----------|
| DWSIM | 8.x | Windows (COM automation) |
| Tool | Purpose |
|------|---------|
| black | Code formatting |
| isort | Import sorting |
| flake8 | Linting |
| mypy | Type checking |
| pre-commit | Git hooks |
| Tool | Purpose |
|------|---------|
| pytest | Test framework |
| pytest-cov | Coverage reporting |
| pytest-asyncio | Async test support |
| hypothesis | Property-based testing |
| Tool | Purpose |
|------|---------|
| mkdocs | Documentation site |
| mkdocs-material | Theme |
| mkdocstrings | API documentation |
Python >= 3.11
Neo4j >= 5.0
DWSIM >= 8.0 (Windows only)
Python 3.11.x (tested)
Neo4j 5.15.0
DWSIM 8.6.8
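The minimum and tested versions listed above can be checked programmatically at startup. The sketch below compares dotted version strings numerically; the helper names are hypothetical, and it assumes purely numeric versions such as `5.15.0` (pre-release suffixes like `rc1` are not handled).

```python
import sys

MIN_PYTHON = (3, 11)


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '5.15.0' into (5, 15, 0); assumes numeric dotted versions only."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(found: str, minimum: str) -> bool:
    """Compare two dotted versions, padding the shorter one with zeros."""
    a, b = parse_version(found), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


print("Python OK:", sys.version_info[:2] >= MIN_PYTHON)
print("Neo4j OK:", meets_minimum("5.15.0", "5.0"))  # tested version vs minimum
print("DWSIM OK:", meets_minimum("8.6.8", "8.0"))   # tested version vs minimum
```

Comparing integer tuples avoids the string-comparison trap where `"5.9"` would sort after `"5.15"`.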
| Component | Minimum | Recommended |
|-----------|---------|-------------|
| RAM | 8 GB | 16 GB |
| Storage | 10 GB | 50 GB |
| GPU | Not required | CUDA-capable (for BoTorch) |
RDKit requires conda or a pre-built wheel:
# Option 1: Conda (recommended)
conda install -c conda-forge rdkit
# Option 2: pip (if available for your platform)
pip install rdkit
# Using Docker
docker run -d \
  --name neo4j \
  -p 7474:7474 -p 7687:7687 \
  -e NEO4J_AUTH=neo4j/password \
  neo4j:5-community
Download DWSIM from dwsim.org
Install with COM automation support enabled
Register COM objects (usually automatic)
entrainer-selection
├── Core
│   ├── numpy, pandas, scipy
│   └── pydantic, pyyaml
├── Cheminformatics
│   ├── rdkit
│   ├── pubchempy
│   └── mordred
├── Machine Learning
│   ├── torch
│   ├── botorch
│   └── gpytorch
├── Graph/RAG
│   ├── neo4j
│   ├── chromadb
│   └── langchain
├── LLM
│   └── google-generativeai
└── Simulation
    ├── thermo
    └── pywin32 (Windows)
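The dependency groups in the tree above can be turned into a quick installation check. The sketch below uses the standard-library `importlib.util.find_spec` to report which modules are missing per group; the `DEPENDENCY_GROUPS` mapping and the helper names are illustrative. Import names differ from package names in three cases: `pyyaml` is imported as `yaml`, `google-generativeai` as `google.generativeai`, and `pywin32` is probed via `win32com`.

```python
from importlib.util import find_spec

# Import names (not pip package names) grouped as in the dependency tree.
DEPENDENCY_GROUPS = {
    "Core": ["numpy", "pandas", "scipy", "pydantic", "yaml"],
    "Cheminformatics": ["rdkit", "pubchempy", "mordred"],
    "Machine Learning": ["torch", "botorch", "gpytorch"],
    "Graph/RAG": ["neo4j", "chromadb", "langchain"],
    "LLM": ["google.generativeai"],
    "Simulation": ["thermo", "win32com"],  # win32com exists on Windows only
}


def is_installed(module_name: str) -> bool:
    """Return True if the module can be located without fully importing it."""
    try:
        return find_spec(module_name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package (e.g. 'google') is itself missing.
        return False


def missing_by_group(groups: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each group name to the modules that could not be found."""
    return {
        group: [name for name in modules if not is_installed(name)]
        for group, modules in groups.items()
    }


for group, missing in missing_by_group(DEPENDENCY_GROUPS).items():
    print(f"{group}: {'all present' if not missing else 'missing ' + ', '.join(missing)}")
```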