4 changes: 4 additions & 0 deletions .Rbuildignore
@@ -10,10 +10,14 @@
^data-raw$
^vignettes/articles$
^\.github$
^\.github/copilot-instructions\.md$
^\.Rhistory$
^\.lintr$
vignettes/articles/usecase.Rmd
^pkgdown$
^.codecov.yml$
^tests$
^.covrignore$
^\.positai$
^\.claude$
^CLAUDE\.md$
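
As far as we know, `R CMD build` tests each `.Rbuildignore` line as a case-insensitive Perl-compatible regular expression against file paths relative to the package root. Under that assumption, the difference between an unanchored pattern such as `CLAUDE.md$` and an anchored, escaped `^CLAUDE\.md$` can be checked directly. The paths below are hypothetical examples.

```r
# Hypothetical candidate paths, relative to the package root.
paths = c("CLAUDE.md", "docs/CLAUDE.md", "CLAUDExmd")

# Unanchored, unescaped: "." matches any character and any directory prefix is allowed.
grepl("CLAUDE.md$", paths, perl = TRUE, ignore.case = TRUE)
# TRUE TRUE TRUE

# Anchored and escaped: only the top-level CLAUDE.md file matches.
grepl("^CLAUDE\\.md$", paths, perl = TRUE, ignore.case = TRUE)
# TRUE FALSE FALSE
```
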
79 changes: 79 additions & 0 deletions .github/agents/r-package-improver.agent.md
@@ -0,0 +1,79 @@
---
description: "Use this agent when the user wants to improve the quality, performance, or maintainability of R package code.\n\nTrigger phrases include:\n- 'improve this R code'\n- 'optimize this function'\n- 'help me write better tests'\n- 'make this more efficient'\n- 'follow R best practices'\n- 'refactor this code'\n- 'improve documentation'\n- 'check if this follows package standards'\n- 'help me improve package quality'\n\nExamples:\n- User shows code and says 'can you help me make this function more efficient?' → invoke this agent to analyze performance and suggest optimizations\n- User asks 'I need to add more comprehensive tests to this function' → invoke this agent to identify gaps and recommend test cases\n- User says 'is this following R package best practices?' → invoke this agent to review structure, style, and conventions\n- User shows a function and asks 'how can I improve this?' → invoke this agent to provide holistic improvement recommendations"
name: r-package-improver
---

# r-package-improver instructions

You are an expert R package developer with deep knowledge of R programming best practices, package architecture, testing frameworks, and CRAN standards. You help developers write cleaner, more efficient, and more maintainable R code.

Your responsibilities:
- Analyze R code for quality, performance, and adherence to best practices
- Identify code style violations and suggest corrections
- Recommend performance optimizations with measurable impact
- Improve test coverage and test quality
- Enhance documentation clarity and completeness
- Suggest refactoring opportunities for maintainability
- Ensure CRAN compliance and package standards

Core principles:
1. Know R idioms: Prefer vectorized operations over explicit loops, the apply family over manual iteration, and data.table/tidyverse patterns where appropriate
2. Memory efficiency: Identify unnecessary object copies, suggest efficient data structures
3. Error handling: Recommend defensive programming and informative error messages
4. Testing: Suggest testthat patterns, edge cases, and meaningful assertions
5. Documentation: Ensure Roxygen tags are complete, examples are runnable, parameters documented
6. Style consistency: Follow tidyverse or base R conventions consistently

Methodology:
1. Examine the code context: What does it do? What's its intended use? Performance requirements?
2. Identify improvement opportunities by category: performance, style, testing, documentation, maintainability
3. Prioritize by impact: Focus on changes that improve readability, reduce bugs, or significantly improve performance
4. Provide specific, actionable recommendations with before/after examples
5. Consider the package ecosystem: What dependencies exist? Are there better alternatives?

When analyzing code, evaluate:
- Vectorization opportunities (replacing loops or apply calls with vector operations)
- Memory usage (avoid unnecessary copies, use efficient data structures)
- Naming conventions (snake_case for functions/variables, PascalCase rarely used)
- Function length (consider breaking into smaller, testable units)
- Error handling (input validation, informative error messages)
- Test coverage (edge cases, error conditions, realistic inputs)
- Documentation completeness (all parameters, return value, examples)
- Package structure compliance (R/ directory, tests/testthat/, man/ auto-generated)

Output format:
- Prioritized list of improvements with impact/effort assessment
- For each recommendation:
  - Category (Performance/Style/Testing/Documentation/Maintainability)
  - Current issue with example code snippet
  - Recommended solution with before/after comparison
  - Rationale (why this improves the code)
- Summary of overall impact
- Order suggestions by: high-impact/low-effort first, then high-impact/medium-effort

Common R package improvements to look for:
- Replace for loops with vectorized operations or lapply/mapply
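- Use seq_along(x) instead of 1:length(x) or seq(1:length(x)), both of which misbehave when x is empty
- Avoid stringsAsFactors surprises in data.frame() and read.csv() calls (the default became FALSE in R 4.0.0)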
- Use proper argument validation at function entry
- Add testthat tests covering edge cases and error conditions
- Improve Roxygen documentation with @param, @return, @examples
- Use consistent coding style (indentation, spacing, naming)
- Avoid global variable assignments (<<-)
- Use :: for namespace clarity when calling other packages
- Consider S3/S4 methods if appropriate
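
A minimal before/after sketch of two items from the list above, the `seq_along()` rule and replacing an explicit loop with a vectorized operation. The function names are hypothetical and chosen only for illustration.

```r
# Before: 1:length(x) is c(1, 0) for an empty x, so the loop runs twice
# and silently returns NA instead of an empty result.
squares_loop = function(x) {
  out = numeric(length(x))
  for (i in 1:length(x)) {
    out[i] = x[i]^2
  }
  out
}

# After (1): seq_along(x) is empty for an empty x, so the loop never runs.
squares_seq = function(x) {
  out = numeric(length(x))
  for (i in seq_along(x)) {
    out[i] = x[i]^2
  }
  out
}

# After (2): `^` is already vectorized, so no loop is needed at all.
squares_vec = function(x) x^2

squares_loop(numeric(0))  # NA
squares_seq(numeric(0))   # numeric(0)
squares_vec(c(1, 2, 3))   # 1 4 9
```
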

Quality assurance:
- Verify recommendations are specific to R language/packages (not generic)
- Ensure all code examples are syntactically correct
- Check that suggestions follow tidyverse/CRAN conventions when applicable
- Confirm recommendations won't break existing functionality
- Test code examples mentally or verify they're runnable

When to ask for clarification:
- If the code's purpose or requirements are unclear
- If you need to know performance targets or constraints
- If multiple approaches exist and you need preference guidance
- If you need context about existing test coverage
- If the package's dependencies or target audience affect recommendations
- If you need to understand the codebase's conventions before making suggestions
57 changes: 57 additions & 0 deletions .github/copilot-instructions.md
@@ -0,0 +1,57 @@
# Copilot instructions for `climate`

`climate` is a CRAN R package for downloading in-situ meteorological and hydrological data from OGIMET, IMGW-PIB, NOAA, and University of Wyoming sources. The package targets R >= 4.1.0 and uses roxygen2 with markdown enabled.

## Build, test, and lint commands

Run commands from the package root.

- Load the package for interactive work: `R -q -e 'devtools::load_all()'`
- Regenerate `man/` and `NAMESPACE` after roxygen changes: `R -q -e 'devtools::document()'`
- Run the full test suite: `R -q -e 'devtools::test()'`
- Run a single test file: `R -q -e 'testthat::test_file("tests/testthat/test-meteo_imgw.R")'`
- Run package linting: `R -q -e 'lintr::lint_package()'`
- Run a local package check: `R -q -e 'devtools::check()'`
- Run the CI-style check locally when needed: `R -q -e 'rcmdcheck::rcmdcheck(args = c("--no-manual", "--as-cran", "--run-donttest"), error_on = "warning", check_dir = "check")'`
- Run coverage: `R -q -e 'covr::package_coverage()'`

## High-level architecture

- Public download functions are thin wrappers that dispatch by `interval` to interval-specific implementations. Keep wrapper signatures and the underlying `*_hourly()`, `*_daily()`, and `*_monthly()` functions in sync. Examples:
  - `meteo_imgw()` -> `meteo_imgw_hourly()`, `meteo_imgw_daily()`, `meteo_imgw_monthly()`
  - `hydro_imgw()` -> `hydro_imgw_daily()`, `hydro_imgw_monthly()`
  - `meteo_ogimet()` -> `ogimet_hourly()`, `ogimet_daily()`

- The package has separate ingestion paths for each upstream source family:
  - **IMGW archive downloads**: archive ZIP files are downloaded from `danepubliczne.imgw.pl`, unpacked, read through `imgw_read()`, then normalized and optionally joined with built-in station metadata.
  - **IMGW datastore / telemetry downloads**: `meteo_imgw_datastore()` and `hydro_imgw_datastore()` fetch large monthly telemetry archives from the datastore endpoint. These are raw, high-volume datasets and are handled separately from the archive-style IMGW functions.
  - **OGIMET**: HTML is scraped with `XML::readHTMLTable`; station identity is based on WMO IDs. Hourly precipitation post-processing is handled by `precip_split()`.
  - **NOAA / Wyoming**: direct file or page downloads for ISH hourly data, Mauna Loa CO2, and Wyoming soundings.

- IMGW column renaming is a distinct normalization layer. Most IMGW functions accept `col_names = "short" | "full" | "polish"` and pass results through `meteo_shortening_imgw()` or `hydro_shortening_imgw()`. The mapping tables live in built-in datasets backed by `data-raw/`.

- Package data and docs follow standard R package patterns:
  - exported code in `R/`
  - tests in `tests/testthat/`
  - built-in datasets in `data/`, generated from `data-raw/`
  - roxygen-generated docs in `man/`

## Key conventions

- Do not hand-edit `man/` or `NAMESPACE`; update roxygen comments and run `devtools::document()`.

- Do not hand-edit `data/*.rda`; regenerate datasets from the relevant scripts in `data-raw/` and then use `usethis::use_data(...)`.

- Preserve graceful network-failure behavior. User-facing download functions commonly keep `allow_failure = TRUE` and wrap the real worker in a `tryCatch`, while the underlying implementation lives in a `*_bp` helper. Reuse `test_url()` for download gating instead of introducing hard failures for transient network issues.

- Tests that touch the network are written to be offline-safe. Follow the existing pattern at the top of network tests: `if (!curl::has_internet()) return(invisible(NULL))`.

- IMGW station handling is source-specific. Meteorological IMGW archive functions expect station names in uppercase, not numeric IDs; renamed stations may need multiple names such as `c("POZNAŃ", "POZNAŃ-ŁAWICA")`.

- Preserve the encoding fallback logic in `imgw_read()`. IMGW files vary in delimiter and encoding, so the CP1250 / UTF-8 / transliteration branches are intentional.

- If you add a new IMGW column, update both the abbreviation source data in `data-raw/` and the runtime shortening layer in `R/meteo_shortening_imgw.R` or `R/hydro_shortening_imgw.R`.

- If you introduce new data.table non-standard evaluation symbols, add them to `R/globals.R` to avoid `R CMD check` NOTES.

- `R/parser.R` is the exported parser implementation. If `inst/parser.R` exists, treat it as a sandbox/helper script rather than the package API surface.
6 changes: 3 additions & 3 deletions .github/workflows/html5-check.yaml
@@ -31,14 +31,14 @@ jobs:
- name: Install pdflatex
run: sudo apt-get install texlive-latex-base texlive-fonts-recommended texlive-fonts-extra texlive-latex-extra

- name: Install tidy and pandoc
run: sudo apt install tidy pandoc
- name: Install tidy, pandoc and v8
run: sudo apt install tidy pandoc libv8-dev

- name: Remove cached R libraries
run: rm -rf /home/runner/work/_temp/Library/data.table

- name: Install dependencies
run: R -e 'install.packages(c("knitr", "rmarkdown", "XML", "httr", "maps", "dplyr", "tidyr", "xml2", "testthat", "archive"))'
run: R -e 'install.packages(c("knitr", "rmarkdown", "XML", "httr", "maps", "dplyr", "tidyr", "xml2", "testthat", "archive", "V8"))'

- name: Install data.table from source
run: Rscript -e 'install.packages("data.table", type = "source")'
2 changes: 2 additions & 0 deletions .gitignore
@@ -14,3 +14,5 @@ docs
pkgdown
.Renviron
test-out.txt
.positai
.aider*
39 changes: 39 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,39 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project

`climate` is a CRAN R package that scrapes and downloads in-situ meteorological and hydrological data from public repositories: OGIMET, University of Wyoming soundings, NOAA ISH and CO2 (Mauna Loa), and IMGW-PIB (Poland). Standard R-package layout: code in `R/`, roxygen-generated docs in `man/`, tests in `tests/testthat/`, built-in datasets in `data/` (RDAs generated from `data-raw/`), example data in `inst/extdata/`, vignettes in `vignettes/`. Minimum R is 4.1.0; documentation is generated with roxygen2 markdown mode (do not hand-edit `man/` or `NAMESPACE`).

## Common commands

Run from the package root in R:

- `devtools::load_all()` — interactive load for development.
- `devtools::document()` — regenerate `man/` and `NAMESPACE` after touching roxygen blocks.
- `devtools::test()` — run the full test suite (testthat).
- `testthat::test_file("tests/testthat/test-meteo_imgw.R")` — run a single test file.
- `devtools::check()` (or `R CMD check`) — full package check; CI runs this on macOS/Windows/Ubuntu (R devel, release, 4.1).
- `lintr::lint_package()` — uses the custom `.lintr` (line length 120, cyclocomp limit 33, several default linters disabled). Respect those limits when adding code.
- `covr::package_coverage()` — coverage. Project target is 60%; `R/sounding_wyoming.R`, `R/imgw_read.R`, and `R/onAttach.R` are excluded via `.covrignore`.
- Built-in datasets are regenerated by sourcing the relevant scripts in `data-raw/` and re-running `usethis::use_data(...)`; do not edit `data/*.rda` by hand.

## Architecture

**Wrapper-then-implementation pattern.** Public entry points dispatch on `interval` to per-resolution implementations: `meteo_imgw()` → `meteo_imgw_hourly/daily/monthly()`, `hydro_imgw()` → `hydro_imgw_daily/monthly()`, `meteo_ogimet()` → `ogimet_hourly/daily()`. When adding a parameter to a wrapper, plumb it through every implementation it dispatches to.
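
A minimal sketch of the wrapper-then-implementation idea. The `demo_*` functions are hypothetical stand-ins for the per-interval implementations, and validating with `match.arg()` and dispatching with `switch()` is an assumption about one reasonable way to write such a wrapper, not a copy of the package's code. Any new wrapper parameter has to be added to the signature and to every forwarded call.

```r
# Hypothetical per-interval implementations (stand-ins for *_hourly/_daily/_monthly).
demo_hourly = function(year, station = NULL) {
  data.frame(interval = "hourly", year = year, stringsAsFactors = FALSE)
}
demo_daily = function(year, station = NULL) {
  data.frame(interval = "daily", year = year, stringsAsFactors = FALSE)
}
demo_monthly = function(year, station = NULL) {
  data.frame(interval = "monthly", year = year, stringsAsFactors = FALSE)
}

# Hypothetical thin wrapper: validate `interval`, then forward every argument.
demo_wrapper = function(interval = c("hourly", "daily", "monthly"),
                        year, station = NULL) {
  interval = match.arg(interval)
  switch(interval,
    hourly = demo_hourly(year = year, station = station),
    daily = demo_daily(year = year, station = station),
    monthly = demo_monthly(year = year, station = station)
  )
}

demo_wrapper("daily", year = 2020)$interval  # "daily"
```

An unsupported value such as `"weekly"` fails early in `match.arg()` instead of reaching an implementation.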

**Three independent data-source families**, each with its own download/parse path:

- **IMGW-PIB** (Polish): downloads ZIPs from `danepubliczne.imgw.pl`, unzips, then reads CSVs through `imgw_read.R`. The reader has multi-step encoding fallbacks (CP1250, UTF-8, optional `iconv ISO-8859-2 → ASCII//TRANSLIT`); preserve those branches when editing — Polish station names contain diacritics and station files vary in delimiter/encoding. Stations are selected by NAME in capital letters (e.g. `"POZNAŃ"`), not by numeric ID. Some renamed stations require multiple names, e.g. `c("POZNAŃ", "POZNAŃ-ŁAWICA")`. Metadata lives in the built-in `imgw_meteo_stations` / `imgw_hydro_stations` datasets and in `R/clean_metadata_*.R`.
- **OGIMET**: HTML scraping via `XML::readHTMLTable` from `ogimet.com`. Stations are identified by WMO ID. `precip_split` / `R/precip_split.R` handles 6/12/24h precipitation disaggregation for hourly data.
- **NOAA / Wyoming**: direct file downloads (ISH gzipped fixed-width, CO2 text, sounding HTML).

**Column-name shortening layer.** Most IMGW download functions accept `col_names = "short" | "full" | "polish"` and pass the raw frame through `meteo_shortening_imgw()` / `hydro_shortening_imgw()` (in `R/*_shortening_imgw.R`). Full and short names are looked up against `imgw_meteo_abbrev` / `imgw_hydro_abbrev` (built-in data). When you add a new IMGW column, update both the abbrev table (`data-raw/`) and the shortener.
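
A minimal sketch of a lookup-based shortener, assuming the abbreviation table maps each full header to a short name. The table `abbrev_demo`, its `full`/`short` columns, and `shorten_columns_demo()` are hypothetical; they do not reproduce `imgw_meteo_abbrev` or `meteo_shortening_imgw()`, and the sketch covers only `"short"` and `"full"` (no `"polish"`).

```r
# Hypothetical abbreviation table; the real ones are built from data-raw/.
abbrev_demo = data.frame(
  full = c("Station code", "Year", "Mean daily temperature"),
  short = c("id", "yy", "t2m_mean_daily"),
  stringsAsFactors = FALSE
)

# Hypothetical shortener: rename known headers, keep unknown ones unchanged.
shorten_columns_demo = function(df, abbrev, col_names = c("short", "full")) {
  col_names = match.arg(col_names)
  if (col_names == "full") {
    return(df)
  }
  idx = match(names(df), abbrev$full)
  known = !is.na(idx)
  names(df)[known] = abbrev$short[idx[known]]
  df
}

raw = data.frame("Station code" = 1, "Mean daily temperature" = 2.5,
                 "Unmapped" = 0, check.names = FALSE)
names(shorten_columns_demo(raw, abbrev_demo))
# "id" "t2m_mean_daily" "Unmapped"
```

This also shows why a new IMGW column needs a table entry: without one, its header passes through unshortened.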

**Graceful network failure** is required for CRAN. Use `test_url()` (`R/test_url.R`) to gate downloads, and follow the existing `allow_failure = TRUE` pattern: wrap the real worker (`*_bp` "best practice" inner function) in `tryCatch` so user-facing functions return `NULL`/`invisible()` with a `message()` instead of erroring. Tests follow the same convention — every network test starts with `if (!curl::has_internet()) return(invisible(NULL))`. Don't add tests that fail when offline.
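
A minimal sketch of the `allow_failure` pattern. `demo_bp()` and `safe_download_demo()` are hypothetical: the worker always fails to simulate an unreachable server, and the message text is illustrative rather than the package's own. The sketch leaves out `test_url()`.

```r
# Hypothetical inner worker standing in for a *_bp helper; always fails here.
demo_bp = function(year) {
  stop("cannot open URL (simulated network failure)")
}

# Hypothetical user-facing wrapper following the allow_failure pattern.
safe_download_demo = function(year, allow_failure = TRUE) {
  if (allow_failure) {
    tryCatch(demo_bp(year), error = function(e) {
      # Report the problem as a message instead of raising an error.
      message("Problems with downloading data: ", conditionMessage(e))
      invisible(NULL)
    })
  } else {
    # Let the error propagate, e.g. while debugging.
    demo_bp(year)
  }
}

res = safe_download_demo(2020)  # emits a message, does not error
is.null(res)                    # TRUE
```
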

**Other notes.**
- `R/globals.R` holds `utils::globalVariables(...)` declarations needed because of data.table's NSE; add new NSE symbols there to keep `R CMD check` clean.
- `R/onAttach.R` prints a startup message; it's covr-ignored and shown only when `interactive() && runif(1) < 0.25`.
- `inst/parser.R` and `R/parser.R` exist separately — `R/parser.R` is the exported package function; `inst/parser.R` is a sandbox script (currently untracked per `git status`). Don't conflate them.
3 changes: 2 additions & 1 deletion DESCRIPTION
@@ -1,6 +1,6 @@
Package: climate
Title: Interface to Download Meteorological (and Hydrological) Datasets
Version: 1.3.0
Version: 1.4.0
Authors@R: c(person(given = "Bartosz",
family = "Czernecki",
role = c("aut", "cre"),
@@ -35,6 +35,7 @@ Imports:
curl,
data.table,
httr,
R6,
stringi,
XML
Suggests:
4 changes: 4 additions & 0 deletions NAMESPACE
@@ -1,6 +1,7 @@
# Generated by roxygen2: do not edit by hand

export(.onAttach)
export(compute_relative_humidity)
export(find_all_station_names)
export(hydro_imgw)
export(hydro_imgw_daily)
@@ -15,18 +16,21 @@ export(meteo_imgw_monthly)
export(meteo_noaa_co2)
export(meteo_noaa_hourly)
export(meteo_ogimet)
export(meteo_ogimet_synop)
export(meteo_shortening_imgw)
export(nearest_stations_imgw)
export(nearest_stations_noaa)
export(nearest_stations_ogimet)
export(ogimet_daily)
export(ogimet_hourly)
export(parser)
export(sounding_wyoming)
export(spheroid_dist)
export(stations_hydro_imgw_telemetry)
export(stations_meteo_imgw_telemetry)
export(stations_ogimet)
export(test_url)
import(R6)
import(data.table)
import(httr)
importFrom(XML,readHTMLTable)
11 changes: 10 additions & 1 deletion NEWS.md
@@ -1,3 +1,13 @@
# climate 1.4.0

* adding the `parser()` function for reading raw SYNOP messages
* updating the `meteo_ogimet()` function to use the new `parser()`, while keeping the option to use the HTML scraping engine
* minor fixes
* adding label descriptions to `hydro_imgw()` datasets to ease understanding of the data and avoid confusion with units (e.g. "Q [m3/s]" instead of "Q")
* updated documentation and vignettes to reflect changes in the code and new features
* unified R code syntax for assignments


# climate 1.3.0

* adapting code to most recent changes in the IMGW-PIB repository:
@@ -10,7 +20,6 @@
"WARSZAWA-OKECIE", "WARSZAWA-OBSERWATORIUM", etc.)



# climate 1.2.9

* fixes for corrupted header files in `meteo_imgw_` family of functions due to changes in the IMGW-PIB repository
3 changes: 2 additions & 1 deletion R/clean_metadata_hydro.R
@@ -11,12 +11,13 @@ clean_metadata_hydro = function(address, interval) {

  temp = tempfile()
  test_url(link = address, output = temp)
  a = read.csv(temp, header = FALSE, stringsAsFactors = FALSE)$V1
  a = read.csv(temp, header = FALSE, stringsAsFactors = FALSE, fileEncoding = "Windows-1250")$V1

  inds = grepl("^[A-Z]{2}.{5}", a)

  code = trimws(substr(a, 1, 7))[inds]
  name = trimws(substr(a, 10, nchar(a)))[inds]
  a = data.frame(parameters = code, label = name)
  a$label = stringi::stri_trans_general(a$label, 'LATIN-ASCII')
  return(a)
}
45 changes: 45 additions & 0 deletions R/compute_relative_humidity.R
@@ -0,0 +1,45 @@
#' Compute relative humidity from air temperature and dew-point temperature
#'
#' Uses the August-Roche-Magnus approximation to derive relative humidity from
#' the 2-metre air temperature and dew-point temperature.
#'
#' @param t2m Numeric vector. Air temperature (2 m) in degrees Celsius.
#' @param dpt2m Numeric vector. Dew-point temperature (2 m) in degrees Celsius.
#' Must be the same length as `t2m`.
#'
#' @return Numeric vector of relative humidity values in percent (0–100).
#' Returns `NA` where either input is `NA`. Values are not clamped: inputs
#' with `dpt2m` greater than `t2m` yield values above 100.
#'
#' @details
#' The August-Roche-Magnus approximation is:
#'
#' \deqn{RH = 100 \times
#' \frac{\exp\!\bigl(\tfrac{17.625\,T_d}{243.04 + T_d}\bigr)}
#' {\exp\!\bigl(\tfrac{17.625\,T}{243.04 + T}\bigr)}}
#'
#' where \eqn{T} is the air temperature and \eqn{T_d} is the dew-point
#' temperature, both in degrees Celsius. The coefficients (17.625 and 243.04)
#' follow Alduchov & Eskridge (1996).
#'
#' @references
#' Alduchov, O. A., & Eskridge, R. E. (1996). Improved Magnus form approximation
#' of saturation vapor pressure. *Journal of Applied Meteorology*, 35(4), 601–609.
#'
#' @examples
#' compute_relative_humidity(t2m = 20, dpt2m = 10) # ~52.5 %
#' compute_relative_humidity(t2m = 0, dpt2m = 0) # 100 %
#' compute_relative_humidity(t2m = c(20, 15, NA), dpt2m = c(10, 12, 8))
#'
#' @export
compute_relative_humidity = function(t2m, dpt2m) {
  if (!is.numeric(t2m) || !is.numeric(dpt2m)) {
    stop("`t2m` and `dpt2m` must be numeric vectors")
  }
  if (length(t2m) != length(dpt2m)) {
    stop("`t2m` and `dpt2m` must have the same length")
  }
  a = 17.625
  b = 243.04
  100 * exp((a * dpt2m) / (b + dpt2m)) / exp((a * t2m) / (b + t2m))
}
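
A minimal sketch of property checks one might add for `compute_relative_humidity()`, written with base `stopifnot()` so it runs without testthat. The function body is copied from above so the block is self-contained; the expected value for T = 20 °C, Td = 10 °C (about 52.54 %) follows from the Magnus formula with the coefficients 17.625 and 243.04.

```r
compute_relative_humidity = function(t2m, dpt2m) {
  if (!is.numeric(t2m) || !is.numeric(dpt2m)) {
    stop("`t2m` and `dpt2m` must be numeric vectors")
  }
  if (length(t2m) != length(dpt2m)) {
    stop("`t2m` and `dpt2m` must have the same length")
  }
  a = 17.625
  b = 243.04
  100 * exp((a * dpt2m) / (b + dpt2m)) / exp((a * t2m) / (b + t2m))
}

# Saturation: a dew point equal to the air temperature gives 100 %.
stopifnot(isTRUE(all.equal(compute_relative_humidity(c(-10, 0, 25), c(-10, 0, 25)),
                           c(100, 100, 100))))
# Worked value from the roxygen example.
stopifnot(abs(compute_relative_humidity(20, 10) - 52.54) < 0.05)
# NA propagates element-wise.
stopifnot(identical(is.na(compute_relative_humidity(c(20, NA), c(10, 8))), c(FALSE, TRUE)))
# Invalid inputs are rejected.
stopifnot(inherits(tryCatch(compute_relative_humidity("20", 10), error = function(e) e), "error"))
stopifnot(inherits(tryCatch(compute_relative_humidity(1:2, 1), error = function(e) e), "error"))
```
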