This repository contains Python scripts that can be used to convert datasets from Rijkswaterstaat Waterinfo into SQL files that can be imported into various SQL-based database systems
Historical data from RWS Waterinfo can be downloaded here
An example dataset has also been included in this repository (example.csv) under the terms of its public domain status. This example dataset contains the temperature & wind speed data from 1 January 2025 until 30 April 2025 at 4 different locations.
Open a command line and navigate to the folder where you have cloned the repository, and run the main Python script by typing python main.py "C:\data\example.csv", where example.csv is the location of the waterinfo dataset (or just the filename if it is located in the same folder as the Python script).
You can optionally choose to normalize the database using --normalize, and you can choose to remove unused columns using --drop-nulls. Example: python main.py example.csv --normalize --drop-nulls. Normalization is described in more detail below.
The SQL output will be saved in the same directory as the input file, and can then be imported into an SQL-based system such as PostgreSQL or SQLite.
When choosing to normalize the database, many of the columns containing various metadata will be moved to several new tables, with foreign key columns used in the main table (measurements) They are as follows:
- id
- meetapparaat_omschrijving
- meetapparaat_code
- bemonsteringsapparaat_omschrijving
- bemonsteringsapparaat_code
- plaatsbepalingsapparaat_omschrijving
- plaatsbepalingsapparaat_code
- id
- bemonsteringssoort_omschrijving
- bemonsteringssoort_code
- id
- groepering_omschrijving
- groepering_code
- groepering_kanaal
- groepering_type
- id
- meetpunt_identificatie
- locatie_code
- epsg
- x
- y
- id
- notite_omschrijving
- notitie_code
- id
- orgaan_omschrijving
- orgaan_code
- id
- grootheid_omschrijving
- grootheid_code
- parameter_omschrijving
- parameter_code
- hoedanigheid_omschrijving
- hoedanigheid_code
- compartiment_omschrijving
- compartiment_code
- id
- typering_omschrijving
- typering_code
- id
- waardebepalingsmethode_omschrijving
- waardebepalingsmethode_code
- waardebewerkingmethode_omschrijving
- waardebewerkingmethode_code
- id
- monster_identificatie
- waarnemingdatum
- waarnemingtijd
- limietsymbool
- numeriekewaarde
- alfanumeriekewaarde
- kwaliteitsoordeel_code
- referentie
- statuswaarde
- opdrachtgevende_instantie
- bemonsteringshoogte
- referentievlak
- taxon_code
- locatie_id
- groepering_id
- parameter_id
- waardebepaling_id
- apparatuur_id
- orgaan_id
- notitie_id
- typering_id
- bemonsteringssoort_id
The datasets from Waterinfo contain 50 columns, regardless of which data you have requested. However, many of these columns contain (meta)data exclusive to parameters related to chemistry and biology. Therefore, if you request basic data such as temperature, many of the columns will not be in use, and will always contain NULL values. If you enable the drop NULL columns option, these columns will not be used in the output SQL
If you want to, for all measurements, retrieve the date, time, measured value, and name of the location at which the measurement was made, you can perform the following SQL join query
SELECT
l.meetpunt_identificatie,
m.waarnemingdatum,
m.waarnemingtijd,
m.numeriekewaarde
FROM
measurements m
JOIN
locatie l ON m.locatie_id = l.locatie_id;