
pfei/hubeau-data


hubeau-data


Typed, modern Python client for the Hub'Eau water data APIs.

Hub'Eau exposes 15+ REST APIs for French national water data — but no official typed Python client exists. This library fills that gap: Pydantic v2 models, strict typing, and a clean interface ready for data science workflows.

Installation

From PyPI (latest stable release):

pip install hubeau-data
# or
uv add hubeau-data

For development (clone + editable install with all tools):

git clone https://github.com/pfei/hubeau-data.git
cd hubeau-data
uv sync                # core dependencies
uv sync --all-extras   # also install the optional extras ([dataframe], [geo], [viz])
uv run ruff check .          # lint
uv run mypy .                # type check
uv run pytest -m "not live"  # fast mocked tests — no network required
uv run pytest -m "live" -s   # live integration tests against real Hub'Eau APIs

Quickstart

from hubeau_data.client import HubeauClient
from hubeau_data.models.hydrometrie import ObservationTrParams
from hubeau_data.models.qualite_rivieres import StationPcParams

client = HubeauClient()

# Hydrométrie — real-time observations
# Note: use code_entite (not code_station) to filter observations_tr
params = ObservationTrParams(code_entite=["O001004003"], grandeur_hydro=["Q"], size=3)
observations = client.hydrometrie.get_observations_tr(params=params)
print(observations.count)                        # total records available server-side
print(observations.data[0].date_obs, observations.data[0].resultat_obs)
print(observations.next_cursor)                  # pass to next call to paginate

# Qualité Rivières — water quality stations
stations = client.qualite_rivieres.get_stations(
    params=StationPcParams(code_departement=["75"], size=3)
)
print(stations.count)
print(stations.data[0].code_station, stations.data[0].libelle_station)

# Eau potable — drinking water analyses for a commune
from hubeau_data.models.eau_potable import ResultatEauPotableParams
resultats = client.eau_potable.get_resultats_dis(
    params=ResultatEauPotableParams(code_commune=["75056"], size=5)
)
print(resultats.data[0].libelle_parametre, resultats.data[0].resultat_numerique)

# Phytopharmaceutiques — national pesticide sales
from hubeau_data.models.phytopharmaceutiques import VenteSubstanceParams
ventes = client.phytopharmaceutiques.get_ventes_substances(
    params=VenteSubstanceParams(type_territoire="National", size=5)
)
print(ventes.data[0].libelle_substance, ventes.data[0].quantite, ventes.data[0].annee)

# API health check — works on every API
report = client.hydrometrie.check_health(n_requests=3)
print(report.summary())

# Data coverage — spot-check stations
cov = client.hydrometrie.data_coverage(code_station="O001004003")
print(cov.summary())

Async client

For bulk data collection — e.g. fetching many stations before inserting into a database — AsyncHubeauClient mirrors the sync client and supports asyncio.gather() for parallel requests. Concurrency is capped per API via an asyncio.Semaphore (default: 5, configurable via max_concurrent):

import asyncio
from hubeau_data.async_client import AsyncHubeauClient
from hubeau_data.models.hydrometrie import ObservationTrParams

async def main():
    # max_concurrent=3: at most 3 simultaneous requests to the hydrometrie API
    async with AsyncHubeauClient(max_concurrent=3) as client:
        codes = ["O001004003", "K418001001", "A1234567"]
        tasks = [
            client.hydrometrie.get_observations_tr(
                params=ObservationTrParams(code_entite=[c], grandeur_hydro=["Q"], size=10)
            )
            for c in codes
        ]
        results = await asyncio.gather(*tasks)
        for code, obs in zip(codes, results, strict=True):
            print(code, obs.count, "total /", len(obs.data), "fetched")

asyncio.run(main())

All 11 APIs are available on AsyncHubeauClient with the same method names as the sync client (get_sites, get_stations, etc.) — just await them. Retry logic (tenacity) applies to async requests too. check_health and data_coverage are sync-only (diagnostic tools, not bulk operations).
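The per-API concurrency cap can be pictured with plain asyncio. This is a sketch of the technique named above (an asyncio.Semaphore held around each request while asyncio.gather() schedules everything at once), not the library's own code: fake_request stands in for an HTTP call, fetch_all is a hypothetical name, and the in-flight counter exists only to show that the cap holds.

```python
import asyncio


async def fetch_all(codes: list[str], max_concurrent: int = 5) -> tuple[list[str], int]:
    """Run one fake request per code, never more than max_concurrent at once."""
    semaphore = asyncio.Semaphore(max_concurrent)
    in_flight = 0
    peak = 0

    async def fake_request(code: str) -> str:
        nonlocal in_flight, peak
        async with semaphore:  # waits here while max_concurrent requests are running
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stands in for network latency
            in_flight -= 1
            return f"observations for {code}"

    # gather() starts every task immediately; the semaphore does the throttling.
    results = await asyncio.gather(*(fake_request(c) for c in codes))
    return list(results), peak


results, peak = asyncio.run(fetch_all([f"S{i:03d}" for i in range(10)], max_concurrent=3))
print(len(results), peak)  # 10 3
```

asyncio.gather() returns results in the order the awaitables were passed, which is why the zip over codes and results in the example above pairs each station with its own response.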

API Coverage

| API | Status | Notes |
|---|---|---|
| Hydrométrie | ✅ Supported | Sites, stations, real-time and elaborated observations |
| Qualité des cours d'eau | ⚠️ Partial | Stations and analyses. Upstream API has known stability issues |
| Piézométrie | ✅ Supported | Stations, chroniques, chroniques temps réel |
| Qualité des nappes | ⚠️ Partial | Stations and analyses. Known 503/timeout issues |
| Écoulement | ✅ Supported | Stations, observations, campaigns |
| Température | ✅ Supported | Stations and chroniques |
| Prélèvements en eau | ✅ Supported | Ouvrages, points de prélèvement, chroniques |
| Hydrobiologie | ✅ Supported | Stations, indices (IBGN/IBMR/IBD/IPR), taxons |
| Poisson | ✅ Supported | Stations, indicateurs IPR/IPR+, observations, operations |
| Qualité eau potable | ✅ Supported | Communes/UDI links, analysis results |
| Phytopharmaceutiques | ✅ Supported | Purchases and sales by substance and product |
| Surveillance Littoral | 🚫 Skipped | API being decommissioned by Hub'Eau |
| Indicateurs Services | 🚧 Maintenance | API under maintenance (see services.eaufrance.fr) |

All supported APIs expose check_health(n_requests) and data_coverage(...). Every supported API is available on both HubeauClient (sync) and AsyncHubeauClient (async); the async client omits check_health and data_coverage.
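data_coverage reports data availability windows. One plausible way to derive such windows, assumed here for illustration rather than taken from the library, is to sort the observation dates and start a new window whenever the gap between consecutive dates exceeds a tolerance. coverage_windows and max_gap are hypothetical names.

```python
from datetime import date, timedelta


def coverage_windows(
    dates: list[date], max_gap: timedelta = timedelta(days=1)
) -> list[tuple[date, date]]:
    """Group observation dates into (start, end) windows, split at gaps > max_gap."""
    if not dates:
        return []
    ordered = sorted(set(dates))  # deduplicate, then walk in chronological order
    windows: list[tuple[date, date]] = []
    start = prev = ordered[0]
    for current in ordered[1:]:
        if current - prev > max_gap:
            # Gap too large: close the running window and open a new one.
            windows.append((start, prev))
            start = current
        prev = current
    windows.append((start, prev))  # close the final window
    return windows


obs = [date(2024, 1, d) for d in (1, 2, 3, 10, 11, 20)]
for start, end in coverage_windows(obs):
    print(start, "->", end)
# 2024-01-01 -> 2024-01-03
# 2024-01-10 -> 2024-01-11
# 2024-01-20 -> 2024-01-20
```

The tolerance matters: daily series would use a one-day gap, while sparse quality analyses would need a much wider one to avoid reporting every sample as its own window.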

Features

  • Pydantic v2 models for all responses — strict runtime validation, IDE autocomplete
  • Typed query Params models for every endpoint — no more **kwargs
  • Sync (HubeauClient) and async (AsyncHubeauClient) clients, same method names
  • Automatic retry with exponential backoff (tenacity) on transient errors — Hub'Eau APIs have known stability issues
  • check_health(n_requests) — latency stats per endpoint, healthy ratio
  • data_coverage(...) — data availability windows per station or territory
  • Optional extras: [dataframe], [geo], [viz] — install only what you need
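Retries are delegated to tenacity. The underlying idea, exponential backoff, fits in a few lines of plain Python. The stand-in below is not the library's configuration (its attempt count, wait bounds and retried exceptions are not documented here); retry_with_backoff, attempts, base, cap and retry_on are hypothetical names, and the sleep function is injectable so the example runs instantly.

```python
import time
from collections.abc import Callable
from typing import TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    attempts: int = 4,
    base: float = 0.5,
    cap: float = 8.0,
    retry_on: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying transient errors with waits of base * 2**n, capped at cap."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(min(cap, base * 2**attempt))
    raise AssertionError("unreachable")


# A flaky call that fails twice (like a transient 503) and then succeeds.
calls = {"n": 0}


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("503 Service Unavailable")
    return "ok"


waits: list[float] = []
result = retry_with_backoff(flaky, sleep=waits.append)
print(result, waits)  # ok [0.5, 1.0]
```

Doubling the wait gives an overloaded upstream time to recover, and the cap keeps a long outage from turning into multi-minute sleeps.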

Stack

  • Python 3.13+, mypy --strict, ruff, uv, hatchling, src-layout
  • httpx + tenacity for resilient sync/async HTTP
  • pytest-httpx mocked test suite — CI runs without network dependency

Examples & Scripts

uv run python examples/demo.py
uv run jupyter lab            # open examples/demo.ipynb

Health check scripts for every API under scripts/<api>/check_health.py:

uv run python scripts/hydrometrie/check_health.py --n-requests 3 --random
uv run python scripts/qualite_rivieres/check_health.py --n-requests 2
uv run python scripts/eau_potable/check_health.py --commune 75056
uv run python scripts/phytopharmaceutiques/check_health.py
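The internals of these scripts are not shown here. The hypothetical sketch below only illustrates the kind of summary check_health produces (latency statistics and a healthy ratio) and parses the --n-requests and --random flags used above with argparse; HealthSummary, parse_args and summarize are illustrative names, and the timings are made up in place of real requests.

```python
from __future__ import annotations

import argparse
import statistics
from dataclasses import dataclass


@dataclass
class HealthSummary:
    """Hypothetical per-endpoint summary: latency stats plus healthy ratio."""

    n_requests: int
    healthy_ratio: float
    min_ms: float
    median_ms: float
    max_ms: float


def parse_args(argv: list[str] | None = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Spot-check a Hub'Eau API.")
    parser.add_argument("--n-requests", type=int, default=3, help="requests per endpoint")
    parser.add_argument("--random", action="store_true", help="pick random stations")
    return parser.parse_args(argv)


def summarize(latencies_ms: list[float], ok: list[bool]) -> HealthSummary:
    """Reduce one endpoint's timings and success flags to a summary."""
    return HealthSummary(
        n_requests=len(ok),
        healthy_ratio=sum(ok) / len(ok),  # fraction of requests that succeeded
        min_ms=min(latencies_ms),
        median_ms=statistics.median(latencies_ms),
        max_ms=max(latencies_ms),
    )


args = parse_args(["--n-requests", "3", "--random"])
# Made-up timings standing in for args.n_requests real requests.
summary = summarize([120.0, 95.0, 3010.0], [True, True, False])
print(args.n_requests, args.random, summary.healthy_ratio, summary.median_ms)
```

Reporting the median alongside min and max keeps a single slow or timed-out request from hiding the typical latency.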

Exploration scripts under scripts/qualite_rivieres/ and scripts/hydrometrie/.

Roadmap

  • Full Hub'Eau API coverage (11 APIs implemented)
  • check_health and data_coverage on all APIs
  • Typed Params models for every endpoint
  • Automatic retry with exponential backoff (tenacity)
  • Async client (AsyncHubeauClient, all 11 APIs)
  • Optional dependency groups — pandas, geopandas, matplotlib as extras
  • CHANGELOG.md + CONTRIBUTING.md
  • PyPI release (0.1.0, 0.2.0)
  • PagedResponse[T] — all get_* methods expose count, data, next_cursor
  • Rate limiting in async client (Semaphore)
  • Full audit of query parameter names across remaining APIs

License

MIT © Pierre Feilles
