bczernecki · Nov 7, 2025 · Jan 19, 2026
diff --git a/.Rbuildignore b/.Rbuildignore
@@ -10,10 +10,14 @@
 ^data-raw$
 ^vignettes/articles$
 ^\.github$
+^\.github/copilot-instructions\.md$
 ^\.Rhistory$
 ^\.lintr$
 vignettes/articles/usecase.Rmd
 ^pkgdown$
 ^.codecov.yml$
 ^tests$
 ^.covrignore$
+^\.positai$
+^\.claude$
+^CLAUDE\.md$
diff --git a/.github/agents/r-package-improver.agent.md b/.github/agents/r-package-improver.agent.md
@@ -0,0 +1,79 @@
+---
+description: "Use this agent when the user wants to improve the quality, performance, or maintainability of R package code.\n\nTrigger phrases include:\n- 'improve this R code'\n- 'optimize this function'\n- 'help me write better tests'\n- 'make this more efficient'\n- 'follow R best practices'\n- 'refactor this code'\n- 'improve documentation'\n- 'check if this follows package standards'\n- 'help me improve package quality'\n\nExamples:\n- User shows code and says 'can you help me make this function more efficient?' → invoke this agent to analyze performance and suggest optimizations\n- User asks 'I need to add more comprehensive tests to this function' → invoke this agent to identify gaps and recommend test cases\n- User says 'is this following R package best practices?' → invoke this agent to review structure, style, and conventions\n- User shows a function and asks 'how can I improve this?' → invoke this agent to provide holistic improvement recommendations"
+name: r-package-improver
+---
+
+# r-package-improver instructions
+
+You are an expert R package developer with deep knowledge of R programming best practices, package architecture, testing frameworks, and CRAN standards. You help developers write cleaner, more efficient, and more maintainable R code.
+
+Your responsibilities:
+- Analyze R code for quality, performance, and adherence to best practices
+- Identify code style violations and suggest corrections
+- Recommend performance optimizations with measurable impact
+- Improve test coverage and test quality
+- Enhance documentation clarity and completeness
+- Suggest refactoring opportunities for maintainability
+- Ensure CRAN compliance and package standards
+
+Core principles:
+1. Know R idioms: Use vectorization over loops, apply family over iteration, data.table/tidyverse patterns where appropriate
+2. Memory efficiency: Identify unnecessary object copies, suggest efficient data structures
+3. Error handling: Recommend defensive programming, proper error messages
+4. Testing: Suggest testthat patterns, edge cases, and meaningful assertions
+5. Documentation: Ensure Roxygen tags are complete, examples are runnable, parameters documented
+6. Style consistency: Follow tidyverse or base R conventions consistently
+
+Methodology:
+1. Examine the code context: What does it do? What's its intended use? Performance requirements?
+2. Identify improvement opportunities by category: performance, style, testing, documentation, maintainability
+3. Prioritize by impact: Focus on changes that improve readability, reduce bugs, or significantly improve performance
+4. Provide specific, actionable recommendations with before/after examples
+5. Consider the package ecosystem: What dependencies exist? Are there better alternatives?
+
+When analyzing code, evaluate:
+- Vectorization opportunities (replacing loops or apply calls with vector operations)
+- Memory usage (avoid unnecessary copies, use efficient data structures)
+- Naming conventions (snake_case for functions/variables, PascalCase rarely used)
+- Function length (consider breaking into smaller, testable units)
+- Error handling (input validation, informative error messages)
+- Test coverage (edge cases, error conditions, realistic inputs)
+- Documentation completeness (all parameters, return value, examples)
+- Package structure compliance (R/ directory, tests/testthat/, man/ auto-generated)
+
+Output format:
+- Prioritized list of improvements with impact/effort assessment
+- For each recommendation:
+  - Category (Performance/Style/Testing/Documentation/Maintainability)
+  - Current issue with example code snippet
+  - Recommended solution with before/after comparison
+  - Rationale (why this improves the code)
+- Summary of overall impact
+- Order suggestions by: high-impact/low-effort first, then high-impact/medium-effort
+
+Common R package improvements to look for:
+- Replace for loops with vectorized operations or lapply/mapply
+- Use seq_along(x) instead of 1:length(x), which yields c(1, 0) for zero-length x
+- Avoid stringsAsFactors issues in functions
+- Use proper argument validation at function entry
+- Add testthat tests covering edge cases and error conditions
+- Improve Roxygen documentation with @param, @return, @examples
+- Use consistent coding style (indentation, spacing, naming)
+- Avoid global variable assignments (<<-)
+- Use :: for namespace clarity when calling other packages
+- Consider S3/S4 methods if appropriate
+
+Quality assurance:
+- Verify recommendations are specific to R language/packages (not generic)
+- Ensure all code examples are syntactically correct
+- Check that suggestions follow tidyverse/CRAN conventions when applicable
+- Confirm recommendations won't break existing functionality
+- Test code examples mentally or verify they're runnable
+
+When to ask for clarification:
+- If the code's purpose or requirements are unclear
+- If you need to know performance targets or constraints
+- If multiple approaches exist and you need preference guidance
+- If you need context about existing test coverage
+- If the package's dependencies or target audience affect recommendations
+- If you need to understand the codebase's conventions before making suggestions
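A minimal sketch of why the agent instructions above prefer `seq_along()` for loop indices. The variable names are illustrative only and do not come from the package.

```r
x = numeric(0)

# 1:length(x) is 1:0, i.e. c(1, 0), so the loop body runs twice on empty input.
bad_iterations = 0
for (i in 1:length(x)) bad_iterations = bad_iterations + 1

# seq_along(x) is integer(0) for empty input, so the loop body never runs.
good_iterations = 0
for (i in seq_along(x)) good_iterations = good_iterations + 1

c(bad = bad_iterations, good = good_iterations)
```

The same reasoning applies to `seq_len(n)` versus `1:n` when `n` can be zero.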
diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md
@@ -0,0 +1,57 @@
+# Copilot instructions for `climate`
+
+`climate` is a CRAN R package for downloading in-situ meteorological and hydrological data from OGIMET, IMGW-PIB, NOAA, and University of Wyoming sources. The package targets R >= 4.1.0 and uses roxygen2 with markdown enabled.
+
+## Build, test, and lint commands
+
+Run commands from the package root.
+
+- Load the package for interactive work: `R -q -e 'devtools::load_all()'`
+- Regenerate `man/` and `NAMESPACE` after roxygen changes: `R -q -e 'devtools::document()'`
+- Run the full test suite: `R -q -e 'devtools::test()'`
+- Run a single test file: `R -q -e 'testthat::test_file("tests/testthat/test-meteo_imgw.R")'`
+- Run package linting: `R -q -e 'lintr::lint_package()'`
+- Run a local package check: `R -q -e 'devtools::check()'`
+- Run the CI-style check locally when needed: `R -q -e 'rcmdcheck::rcmdcheck(args = c("--no-manual", "--as-cran", "--run-donttest"), error_on = "warning", check_dir = "check")'`
+- Run coverage: `R -q -e 'covr::package_coverage()'`
+
+## High-level architecture
+
+- Public download functions are thin wrappers that dispatch by `interval` to interval-specific implementations. Keep wrapper signatures and the underlying `*_hourly()`, `*_daily()`, and `*_monthly()` functions in sync. Examples:
+  - `meteo_imgw()` -> `meteo_imgw_hourly()`, `meteo_imgw_daily()`, `meteo_imgw_monthly()`
+  - `hydro_imgw()` -> `hydro_imgw_daily()`, `hydro_imgw_monthly()`
+  - `meteo_ogimet()` -> `ogimet_hourly()`, `ogimet_daily()`
+
+- The package has separate ingestion paths for each upstream source family:
+  - **IMGW archive downloads**: archive ZIP files are downloaded from `danepubliczne.imgw.pl`, unpacked, read through `imgw_read()`, then normalized and optionally joined with built-in station metadata.
+  - **IMGW datastore / telemetry downloads**: `meteo_imgw_datastore()` and `hydro_imgw_datastore()` fetch large monthly telemetry archives from the datastore endpoint. These are raw, high-volume datasets and are handled separately from the archive-style IMGW functions.
+  - **OGIMET**: HTML is scraped with `XML::readHTMLTable`; station identity is based on WMO IDs. Hourly precipitation post-processing is handled by `precip_split()`.
+  - **NOAA / Wyoming**: direct file or page downloads for ISH hourly data, Mauna Loa CO2, and Wyoming soundings.
+
+- IMGW column renaming is a distinct normalization layer. Most IMGW functions accept `col_names = "short" | "full" | "polish"` and pass results through `meteo_shortening_imgw()` or `hydro_shortening_imgw()`. The mapping tables live in built-in datasets backed by `data-raw/`.
+
+- Package data and docs follow standard R package patterns:
+  - exported code in `R/`
+  - tests in `tests/testthat/`
+  - built-in datasets in `data/`, generated from `data-raw/`
+  - roxygen-generated docs in `man/`
+
+## Key conventions
+
+- Do not hand-edit `man/` or `NAMESPACE`; update roxygen comments and run `devtools::document()`.
+
+- Do not hand-edit `data/*.rda`; regenerate datasets from the relevant scripts in `data-raw/` and then use `usethis::use_data(...)`.
+
+- Preserve graceful network-failure behavior. User-facing download functions commonly keep `allow_failure = TRUE` and wrap the real worker in a `tryCatch`, while the underlying implementation lives in a `*_bp` helper. Reuse `test_url()` for download gating instead of introducing hard failures for transient network issues.
+
+- Tests that touch the network are written to be offline-safe. Follow the existing pattern at the top of network tests: `if (!curl::has_internet()) return(invisible(NULL))`.
+
+- IMGW station handling is source-specific. Meteorological IMGW archive functions expect station names in uppercase, not numeric IDs; renamed stations may need multiple names such as `c("POZNAŃ", "POZNAŃ-ŁAWICA")`.
+
+- Preserve the encoding fallback logic in `imgw_read()`. IMGW files vary in delimiter and encoding, so the CP1250 / UTF-8 / transliteration branches are intentional.
+
+- If you add a new IMGW column, update both the abbreviation source data in `data-raw/` and the runtime shortening layer in `R/meteo_shortening_imgw.R` or `R/hydro_shortening_imgw.R`.
+
+- If you introduce new data.table non-standard evaluation symbols, add them to `R/globals.R` to avoid `R CMD check` NOTEs.
+
+- `R/parser.R` is the exported parser implementation. If `inst/parser.R` exists, treat it as a sandbox/helper script rather than the package API surface.
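The graceful network-failure convention described in the instructions above could look roughly like the sketch below. `fetch_demo()` and `fetch_demo_bp()` are hypothetical names used only for illustration, and the download is simulated; the real package functions may differ in detail.

```r
# Hypothetical worker standing in for a `*_bp` helper; the download is simulated.
fetch_demo_bp = function(url, fail = FALSE) {
  if (fail) stop("simulated network error for ", url)
  data.frame(url = url, value = 1)
}

# Hypothetical user-facing wrapper following the `allow_failure` convention.
fetch_demo = function(url, allow_failure = TRUE, ...) {
  if (!allow_failure) {
    return(fetch_demo_bp(url, ...))
  }
  tryCatch(
    fetch_demo_bp(url, ...),
    error = function(e) {
      # Report the problem with a message instead of raising an error.
      message("Download failed: ", conditionMessage(e))
      invisible(NULL)
    }
  )
}

ok = fetch_demo("https://example.org/data.csv")
failed = fetch_demo("https://example.org/data.csv", fail = TRUE)
```

With `allow_failure = TRUE` the caller gets `NULL` plus a message on failure, so examples and vignettes keep running when an upstream server is down.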
diff --git a/.github/workflows/html5-check.yaml b/.github/workflows/html5-check.yaml
@@ -31,14 +31,14 @@ jobs:
       - name: Install pdflatex
         run: sudo apt-get install texlive-latex-base texlive-fonts-recommended texlive-fonts-extra texlive-latex-extra
 
-      - name: Install tidy and pandoc
-        run: sudo apt install tidy pandoc
+      - name: Install tidy, pandoc and v8
+        run: sudo apt install tidy pandoc libv8-dev
 
       - name: Remove cached R libraries
         run: rm -rf /home/runner/work/_temp/Library/data.table
 
       - name: Install dependencies
-        run: R -e 'install.packages(c("knitr", "rmarkdown", "XML", "httr", "maps", "dplyr", "tidyr", "xml2", "testthat", "archive"))'
+        run: R -e 'install.packages(c("knitr", "rmarkdown", "XML", "httr", "maps", "dplyr", "tidyr", "xml2", "testthat", "archive", "V8"))'
 
       - name: Install data.table from source
         run: Rscript -e 'install.packages("data.table", type = "source")'

diff --git a/.gitignore b/.gitignore
@@ -14,3 +14,5 @@ docs
 pkgdown
 .Renviron
 test-out.txt
+.positai
+.aider*
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -0,0 +1,39 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## Project
+
+`climate` is a CRAN R package that scrapes and downloads in-situ meteorological and hydrological data from public repositories: OGIMET, University of Wyoming soundings, NOAA ISH and CO2 (Mauna Loa), and IMGW-PIB (Poland). Standard R-package layout: code in `R/`, roxygen-generated docs in `man/`, tests in `tests/testthat/`, built-in datasets in `data/` (RDAs generated from `data-raw/`), example data in `inst/extdata/`, vignettes in `vignettes/`. Minimum R is 4.1.0; documentation is generated with roxygen2 markdown mode (do not hand-edit `man/` or `NAMESPACE`).
+
+## Common commands
+
+Run from the package root in R:
+
+- `devtools::load_all()` — interactive load for development.
+- `devtools::document()` — regenerate `man/` and `NAMESPACE` after touching roxygen blocks.
+- `devtools::test()` — run the full test suite (testthat).
+- `testthat::test_file("tests/testthat/test-meteo_imgw.R")` — run a single test file.
+- `devtools::check()` (or `R CMD check`) — full package check; CI runs this on macOS/Windows/Ubuntu (R devel, release, 4.1).
+- `lintr::lint_package()` — uses the custom `.lintr` (line length 120, cyclocomp limit 33, several default linters disabled). Respect those limits when adding code.
+- `covr::package_coverage()` — coverage. Project target is 60%; `R/sounding_wyoming.R`, `R/imgw_read.R`, and `R/onAttach.R` are excluded via `.covrignore`.
+- Built-in datasets are regenerated by sourcing the relevant scripts in `data-raw/` and re-running `usethis::use_data(...)`; do not edit `data/*.rda` by hand.
+
+## Architecture
+
+**Wrapper-then-implementation pattern.** Public entry points dispatch on `interval` to per-resolution implementations: `meteo_imgw()` → `meteo_imgw_hourly/daily/monthly()`, `hydro_imgw()` → `hydro_imgw_daily/monthly()`, `meteo_ogimet()` → `ogimet_hourly/daily()`. When adding a parameter to a wrapper, plumb it through every implementation it dispatches to.
+
+**Three independent data-source families**, each with its own download/parse path:
+
+- **IMGW-PIB** (Polish): downloads ZIPs from `danepubliczne.imgw.pl`, unzips, then reads CSVs through `imgw_read.R`. The reader has multi-step encoding fallbacks (CP1250, UTF-8, optional `iconv ISO-8859-2 → ASCII//TRANSLIT`); preserve those branches when editing — Polish station names contain diacritics and station files vary in delimiter/encoding. Stations are selected by NAME in capital letters (e.g. `"POZNAŃ"`), not by numeric ID. Some renamed stations require multiple names, e.g. `c("POZNAŃ", "POZNAŃ-ŁAWICA")`. Metadata lives in the built-in `imgw_meteo_stations` / `imgw_hydro_stations` datasets and in `R/clean_metadata_*.R`.
+- **OGIMET**: HTML scraping via `XML::readHTMLTable` from `ogimet.com`. Stations are identified by WMO ID. `precip_split` / `R/precip_split.R` handles 6/12/24h precipitation disaggregation for hourly data.
+- **NOAA / Wyoming**: direct file downloads (ISH gzipped fixed-width, CO2 text, sounding HTML).
+
+**Column-name shortening layer.** Most IMGW download functions accept `col_names = "short" | "full" | "polish"` and pass the raw frame through `meteo_shortening_imgw()` / `hydro_shortening_imgw()` (in `R/*_shortening_imgw.R`). Full and short names are looked up against `imgw_meteo_abbrev` / `imgw_hydro_abbrev` (built-in data). When you add a new IMGW column, update both the abbrev table (`data-raw/`) and the shortener.
+
+**Graceful network failure** is required for CRAN. Use `test_url()` (`R/test_url.R`) to gate downloads, and follow the existing `allow_failure = TRUE` pattern: wrap the real worker (`*_bp` "best practice" inner function) in `tryCatch` so user-facing functions return `NULL`/`invisible()` with a `message()` instead of erroring. Tests follow the same convention — every network test starts with `if (!curl::has_internet()) return(invisible(NULL))`. Don't add tests that fail when offline.
+
+**Other notes.**
+- `R/globals.R` holds `utils::globalVariables(...)` declarations needed because of data.table's NSE; add new NSE symbols there to keep `R CMD check` clean.
+- `R/onAttach.R` prints a startup message; it's covr-ignored and behind `interactive() && runif < 0.25`.
+- `inst/parser.R` and `R/parser.R` exist separately — `R/parser.R` is the exported package function; `inst/parser.R` is a sandbox script (currently untracked per `git status`). Don't conflate them.
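The wrapper-then-implementation pattern described in CLAUDE.md can be sketched as follows. All `demo_*` names are hypothetical stand-ins, not package functions, and the real wrappers may validate `interval` differently.

```r
# Hypothetical per-interval implementations.
demo_daily = function(year, ...) paste("daily", year)
demo_monthly = function(year, ...) paste("monthly", year)

# Hypothetical wrapper: validates `interval`, then forwards every argument.
demo_download = function(interval, year, ...) {
  interval = match.arg(interval, c("daily", "monthly"))
  switch(interval,
    daily = demo_daily(year = year, ...),
    monthly = demo_monthly(year = year, ...)
  )
}

demo_download("daily", year = 2020)
```

Forwarding `...` to each branch is one way to keep a new wrapper parameter from being silently dropped by one of the implementations.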
diff --git a/DESCRIPTION b/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: climate
 Title: Interface to Download Meteorological (and Hydrological) Datasets
-Version: 1.3.0
+Version: 1.4.0
 Authors@R: c(person(given = "Bartosz",
            family = "Czernecki",
            role = c("aut", "cre"),
@@ -35,6 +35,7 @@ Imports:
     curl,
     data.table,
     httr,
+    R6,
     stringi,
     XML
 Suggests: 

diff --git a/NAMESPACE b/NAMESPACE
@@ -1,6 +1,7 @@
 # Generated by roxygen2: do not edit by hand
 
 export(.onAttach)
+export(compute_relative_humidity)
 export(find_all_station_names)
 export(hydro_imgw)
 export(hydro_imgw_daily)
@@ -15,18 +16,21 @@ export(meteo_imgw_monthly)
 export(meteo_noaa_co2)
 export(meteo_noaa_hourly)
 export(meteo_ogimet)
+export(meteo_ogimet_synop)
 export(meteo_shortening_imgw)
 export(nearest_stations_imgw)
 export(nearest_stations_noaa)
 export(nearest_stations_ogimet)
 export(ogimet_daily)
 export(ogimet_hourly)
+export(parser)
 export(sounding_wyoming)
 export(spheroid_dist)
 export(stations_hydro_imgw_telemetry)
 export(stations_meteo_imgw_telemetry)
 export(stations_ogimet)
 export(test_url)
+import(R6)
 import(data.table)
 import(httr)
 importFrom(XML,readHTMLTable)

diff --git a/NEWS.md b/NEWS.md
@@ -1,3 +1,13 @@
+# climate 1.4.0
+
+* adding the `parser()` function for reading raw SYNOP messages
+* updating the `meteo_ogimet()` function to use the new `parser()`, while keeping the option to use the HTML scraping engine
+* minor fixes
+  * adding label descriptions to `hydro_imgw()` datasets to make the data easier to understand and avoid confusion about units (e.g. "Q [m3/s]" instead of "Q")
+  * updated documentation and vignettes to reflect changes in the code and new features
+  * unified R code syntax for assignments
+
+
 # climate 1.3.0
 
 * adapting code to most recent changes in the IMGW-PIB repository:
@@ -10,7 +20,6 @@
   "WARSZAWA-OKECIE", "WARSZAWA-OBSERWATORIUM", etc.)
 
 
-
 # climate 1.2.9
 
 * fixes for corrupted header files in `meteo_imgw_` family of functions due to changes in the IMGW-PIB repository

diff --git a/R/clean_metadata_hydro.R b/R/clean_metadata_hydro.R
@@ -11,12 +11,13 @@ clean_metadata_hydro = function(address, interval) {
 
   temp = tempfile()
   test_url(link = address, output = temp)
-  a = read.csv(temp, header = FALSE, stringsAsFactors = FALSE)$V1
+  a = read.csv(temp, header = FALSE, stringsAsFactors = FALSE, fileEncoding = "Windows-1250")$V1
 
   inds = grepl("^[A-Z]{2}.{5}", a)
 
   code = trimws(substr(a, 1, 7))[inds]
   name = trimws(substr(a, 10, nchar(a)))[inds]
   a = data.frame(parameters = code, label = name)
+  a$label = stringi::stri_trans_general(a$label, 'LATIN-ASCII')
   return(a)
 }
diff --git a/R/compute_relative_humidity.R b/R/compute_relative_humidity.R
@@ -0,0 +1,45 @@
+#' Compute relative humidity from air temperature and dew-point temperature
+#'
+#' Uses the August-Roche-Magnus approximation to derive relative humidity from
+#' the 2-metre air temperature and dew-point temperature.
+#'
+#' @param t2m Numeric vector. Air temperature (2 m) in degrees Celsius.
+#' @param dpt2m Numeric vector. Dew-point temperature (2 m) in degrees Celsius.
+#'   Must be the same length as `t2m`.
+#'
+#' @return Numeric vector of relative humidity values in percent (0–100).
+#'   Returns `NA` where either input is `NA`. Values are not clamped, so
+#'   results exceed 100 whenever `dpt2m` is greater than `t2m` (for example,
+#'   because of rounded observations).
+#'
+#' @details
+#' The August-Roche-Magnus approximation is:
+#'
+#' \deqn{RH = 100 \times
+#'   \frac{\exp\!\bigl(\tfrac{17.625\,T_d}{243.04 + T_d}\bigr)}
+#'        {\exp\!\bigl(\tfrac{17.625\,T}{243.04 + T}\bigr)}}
+#'
+#' where \eqn{T} is the air temperature and \eqn{T_d} is the dew-point
+#' temperature, both in degrees Celsius. The coefficients (17.625 and 243.04)
+#' follow Alduchov & Eskridge (1996).
+#'
+#' @references
+#' Alduchov, O. A., & Eskridge, R. E. (1996). Improved Magnus form approximation
+#' of saturation vapor pressure. *Journal of Applied Meteorology*, 35(4), 601–609.
+#'
+#' @examples
+#' compute_relative_humidity(t2m = 20, dpt2m = 10)   # ~52.5 %
+#' compute_relative_humidity(t2m = 0,  dpt2m = 0)    # 100 %
+#' compute_relative_humidity(t2m = c(20, 15, NA), dpt2m = c(10, 12, 8))
+#'
+#' @export
+compute_relative_humidity = function(t2m, dpt2m) {
+  if (!is.numeric(t2m) || !is.numeric(dpt2m)) {
+    stop("`t2m` and `dpt2m` must be numeric vectors")
+  }
+  if (length(t2m) != length(dpt2m)) {
+    stop("`t2m` and `dpt2m` must have the same length")
+  }
+  a = 17.625
+  b = 243.04
+  100 * exp((a * dpt2m) / (b + dpt2m)) / exp((a * t2m) / (b + t2m))
+}
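A short usage sketch for the formula added above. `rh_magnus()` is a local copy of the function body so the snippet runs without the package installed, and `clamp_rh()` is a hypothetical helper that is not part of `climate`; it only illustrates how a caller might cap the unclamped output.

```r
# Local copy of the August-Roche-Magnus formula from the diff above.
rh_magnus = function(t2m, dpt2m) {
  a = 17.625
  b = 243.04
  100 * exp((a * dpt2m) / (b + dpt2m)) / exp((a * t2m) / (b + t2m))
}

# Hypothetical helper (an assumption, not package API): cap values to 0-100.
clamp_rh = function(rh) pmin(pmax(rh, 0), 100)

# Third element has a dew point above the air temperature; fourth has an NA input.
rh = rh_magnus(t2m = c(20, 0, 10, NA), dpt2m = c(10, 0, 11, 8))
rh_clamped = clamp_rh(rh)

round(rh, 1)
rh_clamped
```

The third value comes out above 100 before clamping, and the `NA` input propagates through both functions unchanged.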