This project loads clinical trial data from ClinicalTrials.gov into a Neo4j graph database and models the relationships between trials, conditions, interventions, sponsors, and collaborators. It enables graph-based exploration of clinical research data.
# Clone the repository
git clone https://github.com/Gabrielm3/clinical-trials-knowledge-graph
cd clinical-trials-knowledge-graph
# Start Neo4j database
docker-compose up -d neo4j
# Wait for Neo4j to be ready (about 30 seconds)
docker-compose logs -f neo4j
# Install Python dependencies (local development)
pip install -r requirements.txt
# Load data into Neo4j
python scripts/load_to_neo4j.py
# Access Neo4j Browser at http://localhost:7474
# Login: neo4j / clinicaltrials123

Prerequisites:
- Docker & Docker Compose
- Python 3.11+ (for local development)
- Git
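
The quick-start step that waits for Neo4j can be scripted instead of watching the logs. The sketch below polls a TCP port until it accepts connections. The helper name `wait_for_port` and its defaults are hypothetical and not part of this repository. An open port only shows that the server is listening, so treat it as a rough readiness signal.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # create_connection raises OSError while nothing is listening yet.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False


# Example (assumes the Neo4j Browser port from this README):
#   wait_for_port("localhost", 7474)
```

Calling `wait_for_port("localhost", 7474)` before running the loader would replace the manual 30-second wait.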
Each clinical trial is connected to related entities through clear relationships:
graph TD
T[Trial] --> |HAS_CONDITION| C[Condition]
T --> |HAS_INTERVENTION| I[Intervention]
T --> |SPONSORED_BY| S[Sponsor]
T --> |COLLABORATED_BY| CB[Collaborator]
Cypher Schema:
(Trial)-[:HAS_CONDITION]->(Condition)
(Trial)-[:HAS_INTERVENTION]->(Intervention)
(Trial)-[:SPONSORED_BY]->(Sponsor)
(Trial)-[:COLLABORATED_BY]->(Collaborator)

# Start only Neo4j
docker-compose up -d neo4j
# Start everything (Neo4j + App)
docker-compose --profile app up -d
# View logs
docker-compose logs -f neo4j
# Stop services
docker-compose down
# Clean up (removes volumes)
docker-compose down -v

A sample CSV row like:
| Field | Value |
|---|---|
| NCT Number | NCT001 |
| Conditions | Diabetes; Obesity |
| Interventions | Metformin |
| Sponsor | NIH |
| Collaborators | Harvard; UCSF |
Creates this graph structure:
graph LR
T["Trial<br/>NCT001"] --> |HAS_CONDITION| D[Diabetes]
T --> |HAS_CONDITION| O[Obesity]
T --> |HAS_INTERVENTION| M[Metformin]
T --> |SPONSORED_BY| N[NIH]
T --> |COLLABORATED_BY| H[Harvard]
T --> |COLLABORATED_BY| U[UCSF]
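
The mapping from one CSV row to graph edges can be sketched in Python. This is a minimal illustration, not the code in `scripts/load_to_neo4j.py`: the function name `row_to_edges` and the `FIELD_TO_RELATIONSHIP` table are hypothetical, and the semicolon separator for multi-valued fields is assumed from the sample row.

```python
# Hypothetical helper: column names and relationship types follow the sample
# row and schema in this README; the ";" separator is assumed from the sample.
FIELD_TO_RELATIONSHIP = {
    "Conditions": "HAS_CONDITION",
    "Interventions": "HAS_INTERVENTION",
    "Sponsor": "SPONSORED_BY",
    "Collaborators": "COLLABORATED_BY",
}


def row_to_edges(row):
    """Return (trial_id, relationship_type, target_name) triples for one CSV row."""
    trial_id = row["NCT Number"].strip()
    edges = []
    for field, rel_type in FIELD_TO_RELATIONSHIP.items():
        # Split multi-valued fields on ";" and skip empty entries.
        for value in (row.get(field) or "").split(";"):
            name = value.strip()
            if name:
                edges.append((trial_id, rel_type, name))
    return edges


sample_row = {
    "NCT Number": "NCT001",
    "Conditions": "Diabetes; Obesity",
    "Interventions": "Metformin",
    "Sponsor": "NIH",
    "Collaborators": "Harvard; UCSF",
}

for edge in row_to_edges(sample_row):
    print(edge)
```

For the sample row this yields six triples, one per arrow in the diagram.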
- Open Neo4j Browser at http://localhost:7474
- Log in with the credentials neo4j/clinicaltrials123
- Run this Cypher query to view a subgraph:
MATCH (t:Trial)-[r]->(n)
RETURN t, r, n
LIMIT 50

// Find trials for a specific condition
MATCH (t:Trial)-[:HAS_CONDITION]->(c:Condition {name: "Coronavirus Infections"})
RETURN t.title, t.status
LIMIT 10
// Most common sponsors
MATCH (s:Sponsor)<-[:SPONSORED_BY]-(t:Trial)
RETURN s.name, count(t) as trial_count
ORDER BY trial_count DESC
LIMIT 10
// Trials with multiple conditions
MATCH (t:Trial)-[:HAS_CONDITION]->(c:Condition)
WITH t, count(c) as condition_count
WHERE condition_count > 1
RETURN t.title, condition_count
ORDER BY condition_count DESC

# Local development setup
python -m venv venv
source venv/bin/activate # or venv\Scripts\activate on Windows
pip install -r requirements.txt
# Copy environment file
cp .env.example .env
# Edit .env with your credentials
# Run locally
python scripts/load_to_neo4j.py

├── data/
│ └── ctg-studies.csv # Clinical trials dataset
├── scripts/
│ └── load_to_neo4j.py # Data loading script
├── docker-compose.yml # Docker services configuration
├── Dockerfile # Application container
├── requirements.txt # Python dependencies
├── config.py # Configuration management
└── .env # Environment variables
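
As an illustration of how `config.py` and `.env` might fit together, the sketch below reads connection settings from environment variables. The variable names `NEO4J_URI`, `NEO4J_USER`, and `NEO4J_PASSWORD`, the `Neo4jSettings` class, and the default values are assumptions made for this example; check `.env.example` for the names the project actually uses. Loading `.env` into the process environment is left out.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Neo4jSettings:
    uri: str
    user: str
    password: str


def load_settings(env=None):
    """Build settings from environment variables (names are assumed, not confirmed)."""
    env = os.environ if env is None else env
    password = env.get("NEO4J_PASSWORD")
    if not password:
        raise RuntimeError("NEO4J_PASSWORD is not set; copy .env.example to .env first")
    return Neo4jSettings(
        # 7687 is Neo4j's default Bolt port; the compose file may map another one.
        uri=env.get("NEO4J_URI", "bolt://localhost:7687"),
        user=env.get("NEO4J_USER", "neo4j"),
        password=password,
    )


# Passing a plain dict keeps the example independent of the real environment.
settings = load_settings({"NEO4J_PASSWORD": "example-password"})
print(settings.uri, settings.user)
```

Failing early when the password is missing gives a clearer error than a later authentication failure from the database.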
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
MIT License
