This repository hosts material related to the creation of the City River Spaces (CRiSp) dataset.
Clone this repository, which contains all the scripts required to build the dataset, and move into its directory:
git clone https://github.com/CityRiverSpaces/CRiSpDataset.git
cd CRiSpDataset
Set up an environment with all the required dependencies. This can be achieved either via a local install or by using Docker or Apptainer containers.
For a local install, R must be installed (see, e.g., the instructions on CRAN).
It is recommended to install the required dependencies in a project environment, which can be set up using renv. In the R console:
# install.packages("renv")
renv::init(bare = TRUE)
When prompted for which files to use for dependency discovery, you can select 1 (DESCRIPTION file only). Then restart the R session.
It is then easiest to install the project dependencies using devtools:
install.packages("devtools")
devtools::install_deps()
We also provide a Docker image that includes all the required dependencies. The image is published on the GitHub Container Registry (GHCR); see the image page.
To start a container from the published image:
-
Docker should be installed and running.
-
In a terminal window, run the following command to pull the image from the registry:
docker pull ghcr.io/cityriverspaces/crispdataset:latest
-
Start an interactive R session within the container with:
docker run --rm -it ghcr.io/cityriverspaces/crispdataset:latest
NOTE: the --rm option removes the container when it exits, while the -i and -t options enable the interactive session.
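For non-interactive runs, such as the Rscript commands used in the build steps, a small wrapper can save some typing. The sketch below is hypothetical: the function crisp_docker, the variable CRISP_IMAGE and the DRY_RUN switch are not part of the repository. It mirrors the docker run commands used to build the dataset, but omits -i and -t, which are only needed for an interactive session.

```shell
# Hypothetical wrapper (not part of the repository) for running one R script
# non-interactively in the published CRiSp image. Source this file to use it.
CRISP_IMAGE="${CRISP_IMAGE:-ghcr.io/cityriverspaces/crispdataset:latest}"

crisp_docker() {
  # -i and -t are omitted: they are only needed for an interactive session.
  if [ -n "${DRY_RUN:-}" ]; then
    # Dry run: print the command instead of executing it.
    echo docker run --rm "$CRISP_IMAGE" Rscript "$@"
  else
    docker run --rm "$CRISP_IMAGE" Rscript "$@"
  fi
}
```

For example, crisp_docker ./scripts/01-city_rivers_table.R runs the first build step, while setting DRY_RUN=1 beforehand only prints the command that would be executed.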
The published Docker image can also be used with Apptainer (e.g. on DelftBlue):
-
Pull the image from GHCR and convert it to SIF format:
apptainer pull crispdataset.sif docker://ghcr.io/cityriverspaces/crispdataset:latest
-
Start an interactive R session within the (Apptainer) container with:
apptainer run crispdataset.sif
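On a shared system the conversion to SIF only needs to happen once. A minimal sketch, assuming the SIF file is kept in the current directory: the hypothetical crisp_apptainer function (not part of the repository) pulls crispdataset.sif only if it is missing, then forwards its arguments to apptainer run. Because an existing SIF file is reused, delete it to force a fresh pull of the latest image.

```shell
# Hypothetical helper (not part of the repository): pull the image once,
# then reuse the local SIF file for every later run.
CRISP_SIF="${CRISP_SIF:-crispdataset.sif}"
CRISP_URI="docker://ghcr.io/cityriverspaces/crispdataset:latest"

crisp_apptainer() {
  # ${DRY_RUN:+echo} expands to "echo" when DRY_RUN is set and non-empty,
  # so each command is printed instead of run; otherwise it expands to nothing.
  if [ ! -f "$CRISP_SIF" ]; then
    ${DRY_RUN:+echo} apptainer pull "$CRISP_SIF" "$CRISP_URI" || return 1
  fi
  # Arguments (e.g. Rscript <script>) are forwarded to "apptainer run";
  # without arguments this starts the interactive R session.
  ${DRY_RUN:+echo} apptainer run "$CRISP_SIF" "$@"
}
```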
Ideally, a GitHub release of the repository should be published before starting to build a new version of the
dataset, because publishing the release triggers the building of the container image, which is then
published with the same tag. For testing purposes, the image building process can also
be initiated manually (see "Run workflow" in
the Actions tab).
Publishing a release of the GitHub repository also triggers a webhook that archives a snapshot of the repository on Zenodo.
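Since each release publishes an image with the same tag, a dataset build can use the image that matches the current checkout instead of latest. A sketch under that assumption: crisp_release_tag and crisp_image_ref are hypothetical helpers, and git describe --tags --exact-match prints a tag only when HEAD sits exactly on a tagged commit.

```shell
# Hypothetical helpers (not part of the repository).

# Print the tag that HEAD points at, or nothing if HEAD is not exactly on a tag.
crisp_release_tag() {
  git describe --tags --exact-match 2>/dev/null || true
}

# Print the image reference for a release tag, falling back to "latest"
# when the tag is missing or empty.
crisp_image_ref() {
  echo "ghcr.io/cityriverspaces/crispdataset:${1:-latest}"
}
```

For example, docker pull "$(crisp_image_ref "$(crisp_release_tag)")" pulls the image of the checked-out release, or latest when the checkout is not on a release tag.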
A new version of the dataset is built via the following steps (see the scripts folder for more information on each script):
-
Clean the city population dataset from Eurostat (see the data folder). This step can be run on a local workstation using Docker as:
docker run --rm -it ghcr.io/cityriverspaces/crispdataset:latest Rscript ./scripts/01-city_rivers_table.R
-
Download all the required input datasets: OpenStreetMap and Copernicus DEM GLO-30. This step can also be run on a local workstation using Docker as:
docker run --rm -it ghcr.io/cityriverspaces/crispdataset:latest Rscript ./scripts/02-download_input_datasets.R
-
Generate the delineations for the cities. To run the delineations on DelftBlue, copy the scripts and the retrieved input datasets to the cluster, e.g. to /scratch:
scp -r ../CRiSpDataset delftblue:/scratch/fnattino/.
Log in to DelftBlue and pull the latest Apptainer image with all the required dependencies:
ssh delftblue
cd /scratch/fnattino/CRiSpDataset
apptainer pull crispdataset.sif docker://ghcr.io/cityriverspaces/crispdataset:latest
Submit all delineations to the queue:
bash ./scripts/03-generate_corridor.bash --slurm
-
Once all jobs have completed, collect all delineation output in a single file:
apptainer run crispdataset.sif Rscript ./scripts/04-merge-output.R
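Step 4 should only run after every delineation job has left the queue. A minimal sketch, assuming the submission script returns as soon as the jobs are queued and that you have no other jobs on the cluster: the hypothetical wait_for_slurm_jobs function (not part of the repository) polls squeue until the user's queue is empty. If the job IDs were available, SLURM job dependencies (sbatch --dependency) would be an alternative to polling.

```shell
# Hypothetical helper (not part of the repository): block until the current
# user has no jobs left in the SLURM queue.
# Caveat: this waits for *all* of the user's jobs, not only the delineations.
wait_for_slurm_jobs() {
  interval="${1:-60}"              # seconds between two polls
  user="${USER:-$(id -un)}"
  while :; do
    # -h drops the header line, -u restricts the listing to one user.
    jobs_left="$(squeue -h -u "$user")" || return 1   # squeue failed: give up
    if [ -z "$jobs_left" ]; then
      return 0                                         # queue is empty
    fi
    sleep "$interval"
  done
}
```

For example: wait_for_slurm_jobs 300 && apptainer run crispdataset.sif Rscript ./scripts/04-merge-output.R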
The file ./output/crisp-dataset.gpkg contains the new version of the dataset, and it can be released as a new version of the
"CRiSp Dataset" record in the "CityRiverSpaces" Community on Zenodo.
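Before uploading, a quick sanity check can catch a missing or corrupted output file. GeoPackage files are SQLite databases, so a valid file starts with the header "SQLite format 3". The hypothetical check_dataset function below (not part of the repository) checks that header and prints the file's MD5 checksum, assuming GNU coreutils md5sum is available; the checksum can be compared with the one Zenodo lists for the uploaded file.

```shell
# Hypothetical pre-release check (not part of the repository).
check_dataset() {
  f="${1:-./output/crisp-dataset.gpkg}"
  if [ ! -s "$f" ]; then
    echo "missing or empty: $f" >&2
    return 1
  fi
  # A GeoPackage is an SQLite database, whose files start with "SQLite format 3".
  if [ "$(head -c 15 "$f")" != "SQLite format 3" ]; then
    echo "not an SQLite/GeoPackage file: $f" >&2
    return 1
  fi
  # Print the MD5 checksum (GNU coreutils md5sum; on macOS use "md5").
  md5sum "$f"
}
```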