This project delivers a production-ready statistical framework for assessing national macroinvertebrate trends using 30+ years of Environment Agency (EA) open data. The primary challenge addressed is measurement heterogeneity: reconciling legacy categorical placeholders (e.g., AB indices) with modern numeric counts and sub-sampled data.
By implementing Seasonal Censored-Normal Generalized Additive Mixed Models (CNORM GAMMs), this pipeline produces season-aware national indicators that respect interval uncertainty and phenology.
- Language: R
- Modeling: mgcv (Censored GAMMs with REML smoothing)
- Data Engineering: Apache Arrow (for high-performance Parquet processing)
- Visualization: ggplot2, gratia, and sf for spatial mapping
- Interval-Censored Likelihood: Every observation is treated as an interval $[L_i, U_i]$ on a variance-stabilizing $\sqrt{\cdot}$ scale to account for one-significant-figure rounding and categorical bins.
- Seasonal Phenology: Separate thin-plate regression splines for Spring and Autumn to capture divergent life-cycle dynamics.
- Precision Weighting: Implemented $w_i \propto 1 / (\mathrm{width}_i + 1)$ to ensure exact counts carry more statistical weight than broad categorical ranges.
- Spatial Heterogeneity: Integrated site-level random intercepts to produce site-marginal national curves, ensuring trends reflect population-level change rather than site-specific noise.
- Model Triangulation: Validated findings across four aligned models: Presence/Absence (Binomial), Ordered Categorical (OCAT), Censored-Poisson, and the headline CNORM.
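
The interval construction and precision weighting described above can be sketched in base R. This is a minimal illustration, not the pipeline's actual code: the column names, the three toy records, and their count bounds are all hypothetical, and it assumes the interval width is measured on the square-root scale and that weights are rescaled to mean 1.

```r
# Toy records with HYPOTHETICAL bounds on the raw count scale:
# an exact count, a count rounded to one significant figure,
# and a broad categorical bin.
obs <- data.frame(
  record      = c("exact count", "rounded to 1 s.f.", "categorical bin"),
  lower_count = c(7, 25, 10),
  upper_count = c(7, 34, 99)
)

# Interval endpoints [L_i, U_i] on the square-root scale.
obs$L <- sqrt(obs$lower_count)
obs$U <- sqrt(obs$upper_count)

# Interval width (ASSUMPTION: measured on the sqrt scale).
obs$width <- obs$U - obs$L

# Precision weights w_i proportional to 1 / (width_i + 1),
# rescaled to mean 1 (the rescaling is an assumption).
raw_w <- 1 / (obs$width + 1)
obs$w <- raw_w / mean(raw_w)

print(obs[, c("record", "L", "U", "width", "w")])
```

The exact count has zero width and so receives the largest weight, while the categorical bin receives the smallest. The censored-normal family in mgcv (`cnorm`) accepts a two-column response, so a table like this could in principle supply `cbind(L, U)` as the response and `w` as prior weights; the model formula actually used in this project is not shown here.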
Across five focal families (Aphelocheiridae, Brachycentridae, Cordulegastridae, Odontoceridae, Potamanthidae), the analysis reveals a characteristic "dip and recovery":
- General Trend: Significant national increase through the 1990s to a mid-2000s peak, followed by stabilization and a recent recovery into the 2020s.
- Spatial Insight: For the Aphelocheiridae family, approximately 97% of studied sites showed positive abundance shifts between the early ($\le 2005$) and recent ($\ge 2006$) windows.
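
A site-level shift summary of this kind can be illustrated with a short base R sketch. It is a simplified stand-in under stated assumptions: the data are invented, the column names are hypothetical, and the shift is computed from raw per-window means rather than from whatever model-based site estimates the analysis actually used.

```r
# HYPOTHETICAL site-level abundance on the sqrt scale (values invented).
site_obs <- data.frame(
  site = rep(c("S1", "S2", "S3", "S4"), each = 4),
  year = rep(c(1998, 2003, 2010, 2019), times = 4),
  sqrt_abund = c(1.0, 1.2, 2.0, 2.4,
                 0.5, 0.7, 1.1, 1.3,
                 2.0, 2.2, 1.8, 1.6,
                 0.0, 0.4, 0.9, 1.5)
)

# Assign each record to the early (<= 2005) or recent (>= 2006) window.
site_obs$window <- ifelse(site_obs$year <= 2005, "early", "recent")

# Mean abundance per site and window (a site-by-window matrix).
window_means <- tapply(
  site_obs$sqrt_abund,
  list(site_obs$site, site_obs$window),
  mean
)

# Per-site shift and the share of sites with a positive shift.
shift <- window_means[, "recent"] - window_means[, "early"]
share_positive <- mean(shift > 0)

print(round(shift, 2))
print(share_positive)
```

In this toy data three of the four sites increase between windows, so `share_positive` is 0.75; the 97% figure reported above comes from the project's own data, not from this sketch.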