This folder includes download_github_zip.py, a helper script to:
- Download one or more dataset links
- Convert GitHub blob URLs to direct download URLs
- Extract .zip files into this data/ directory
- Delete the downloaded .zip file after successful extraction
Requirements:
- Python 3
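The blob-to-direct URL conversion mentioned above can be illustrated with a short sketch. It is not taken from download_github_zip.py: the function name to_direct_url is hypothetical, and the sketch assumes the raw.githubusercontent.com form of a direct link, which is one common way to address a file in a public repository.

```python
from urllib.parse import urlsplit, urlunsplit


def to_direct_url(url):
    """Turn a GitHub 'blob' page URL into a direct download URL.

    Hypothetical helper for illustration; the real script may differ.
    URLs that are not GitHub blob links are returned unchanged.
    """
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    # Expected layout: <owner>/<repo>/blob/<ref>/<path...>
    if parts.netloc != "github.com" or len(segments) < 5 or segments[2] != "blob":
        return url
    owner, repo, _blob, ref, *file_path = segments
    raw_path = "/".join([owner, repo, ref, *file_path])
    return urlunsplit(("https", "raw.githubusercontent.com", "/" + raw_path, "", ""))


# Example call using one of the dataset links from this guide.
print(to_direct_url(
    "https://github.com/vast-challenge/2025-data/blob/main/MC1_release.zip"
))
```

Returning non-blob URLs unchanged means links that already point at a file pass through untouched.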
From the project root (VASTKnowledgeGraphVisualization), run:
python3 data/download_github_zip.py <url1> [url2 ...]
To download a single file:
python3 data/download_github_zip.py \
https://github.com/vast-challenge/2025-data/blob/main/MC1_release.zip
To download multiple files:
python3 data/download_github_zip.py \
https://github.com/vast-challenge/2025-data/blob/main/MC1_release.zip \
https://github.com/vast-challenge/2025-data/blob/main/DC_release.zip
Options:
- --dry-run: show what will be downloaded without downloading
- --output-dir <path>: save and extract into another directory
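As a rough illustration of how the --dry-run and --output-dir options and the extract-then-delete behaviour could fit together, here is a minimal sketch. It is not the script's actual code: build_parser, extract_and_remove, and main are hypothetical names, the default output directory is an assumption, and the download step itself is left out.

```python
import argparse
import zipfile
from pathlib import Path


def build_parser():
    # Flags mirror the options described in this guide; the default
    # directory ("data") is an assumption made for this sketch.
    parser = argparse.ArgumentParser(description="Download and extract dataset archives.")
    parser.add_argument("urls", nargs="+", help="one or more dataset links")
    parser.add_argument("--dry-run", action="store_true",
                        help="show what will be downloaded without downloading")
    parser.add_argument("--output-dir", type=Path, default=Path("data"),
                        help="save and extract into another directory")
    return parser


def extract_and_remove(zip_path, output_dir):
    """Extract zip_path into output_dir, then delete the archive."""
    output_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(output_dir)
        names = archive.namelist()
    # Reached only if extraction raised no exception, so the archive
    # is removed only after a successful extraction.
    zip_path.unlink()
    return names


def main(argv):
    args = build_parser().parse_args(argv)
    for url in args.urls:
        target = args.output_dir / url.rsplit("/", 1)[-1]
        if args.dry_run:
            print(f"would download {url} -> {target}")
        else:
            # Download step omitted in this sketch; once `target` exists:
            # extract_and_remove(target, args.output_dir)
            print(f"download of {url} is not implemented in this sketch")
```

Deleting the archive outside the `with` block, after `extractall` has returned, keeps the .zip file on disk whenever extraction fails.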
Example:
python3 data/download_github_zip.py \
--dry-run \
https://github.com/vast-challenge/2025-data/blob/main/MC1_release.zip
create_datasets.py creates two additional knowledge graphs that can be used to try out the application:
- genre_influence.json: relationships between songs and albums from the VAST 2025 MC1 dataset, organized by musical genre.
- asoiaf_interaction.json: undirected interaction graph of characters from George R. R. Martin's A Song of Ice and Fire series, based on data from https://github.com/mathbeveridge/asoiaf by Andrew Beveridge, released under CC BY-NC-SA 4.0.
Requirements:
- Python 3
- NetworkX
We recommend running ./dev.sh the first time to create the Python virtual environment, and then activating that environment.
From the project root (VASTKnowledgeGraphVisualization), run:
./dev.sh
source api/venv/bin/activate
From the project root (VASTKnowledgeGraphVisualization), run:
python3 data/create_datasets.py
You will find the datasets in the data directory.
--output-dir <path>: save the datasets into another directory.
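To give a feel for what an undirected interaction graph such as asoiaf_interaction.json involves, the sketch below merges interaction rows so that (A, B) and (B, A) count as the same edge. It is an illustration only: create_datasets.py lists NetworkX as a requirement, while this sketch uses just the standard library; the function name build_undirected_graph, the nodes/links JSON layout, and the sample rows are all assumptions and do not come from the real script or dataset.

```python
import json
from collections import defaultdict


def build_undirected_graph(interactions):
    """Merge (source, target, weight) rows into an undirected weighted graph.

    Rows (a, b) and (b, a) are treated as the same edge and their
    weights are summed. The output layout is hypothetical.
    """
    weights = defaultdict(int)
    for source, target, weight in interactions:
        key = tuple(sorted((source, target)))  # order-independent edge key
        weights[key] += weight
    nodes = sorted({name for edge in weights for name in edge})
    return {
        "nodes": [{"id": name} for name in nodes],
        "links": [{"source": a, "target": b, "weight": w}
                  for (a, b), w in sorted(weights.items())],
    }


# Placeholder rows for illustration only; not taken from the real dataset.
rows = [("A", "B", 3), ("B", "A", 2), ("B", "C", 1)]
graph = build_undirected_graph(rows)
print(json.dumps(graph, indent=2))
```

Sorting each pair before using it as a key is what makes the graph undirected: the direction in which an interaction was recorded no longer matters.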
Contributions to this folder, from the git history:
- Salvo Rinzivillo — the download script and this guide.
- Giulia Fabiani — the dataset creation script and this guide.