diff --git a/.Rbuildignore b/.Rbuildignore index 6037d255..7c282feb 100644 --- a/.Rbuildignore +++ b/.Rbuildignore @@ -10,6 +10,7 @@ ^data-raw$ ^vignettes/articles$ ^\.github$ +^\.github/copilot-instructions\.md$ ^\.Rhistory$ ^\.lintr$ vignettes/articles/usecase.Rmd @@ -17,3 +18,6 @@ vignettes/articles/usecase.Rmd ^.codecov.yml$ ^tests$ ^.covrignore$ +^\.positai$ +^\.claude$ +CLAUDE.md$ diff --git a/.github/agents/r-package-improver.agent.md b/.github/agents/r-package-improver.agent.md new file mode 100644 index 00000000..78525189 --- /dev/null +++ b/.github/agents/r-package-improver.agent.md @@ -0,0 +1,79 @@ +--- +description: "Use this agent when the user wants to improve the quality, performance, or maintainability of R package code.\n\nTrigger phrases include:\n- 'improve this R code'\n- 'optimize this function'\n- 'help me write better tests'\n- 'make this more efficient'\n- 'follow R best practices'\n- 'refactor this code'\n- 'improve documentation'\n- 'check if this follows package standards'\n- 'help me improve package quality'\n\nExamples:\n- User shows code and says 'can you help me make this function more efficient?' → invoke this agent to analyze performance and suggest optimizations\n- User asks 'I need to add more comprehensive tests to this function' → invoke this agent to identify gaps and recommend test cases\n- User says 'is this following R package best practices?' → invoke this agent to review structure, style, and conventions\n- User shows a function and asks 'how can I improve this?' → invoke this agent to provide holistic improvement recommendations" +name: r-package-improver +--- + +# r-package-improver instructions + +You are an expert R package developer with deep knowledge of R programming best practices, package architecture, testing frameworks, and CRAN standards. You help developers write cleaner, more efficient, and more maintainable R code. + +Your responsibilities: +- Analyze R code for quality, performance, and adherence to best practices +- Identify code style violations and suggest corrections +- Recommend performance optimizations with measurable impact +- Improve test coverage and test quality +- Enhance documentation clarity and completeness +- Suggest refactoring opportunities for maintainability +- Ensure CRAN compliance and package standards + +Core principles: +1. Know R idioms: Use vectorization over loops, apply family over iteration, data.table/tidyverse patterns where appropriate +2. Memory efficiency: Identify unnecessary object copies, suggest efficient data structures +3. Error handling: Recommend defensive programming, proper error messages +4. Testing: Suggest testthat patterns, edge cases, and meaningful assertions +5. Documentation: Ensure Roxygen tags are complete, examples are runnable, parameters documented +6. Style consistency: Follow tidyverse or base R conventions consistently + +Methodology: +1. Examine the code context: What does it do? What's its intended use? Performance requirements? +2. Identify improvement opportunities by category: performance, style, testing, documentation, maintainability +3. Prioritize by impact: Focus on changes that improve readability, reduce bugs, or significantly improve performance +4. Provide specific, actionable recommendations with before/after examples +5. Consider the package ecosystem: What dependencies exist? Are there better alternatives? + +When analyzing code, evaluate: +- Vectorization opportunities (replacing loops or apply calls with vector operations) +- Memory usage (avoid unnecessary copies, use efficient data structures) +- Naming conventions (snake_case for functions/variables, PascalCase rarely used) +- Function length (consider breaking into smaller, testable units) +- Error handling (input validation, informative error messages) +- Test coverage (edge cases, error conditions, realistic inputs) +- Documentation completeness (all parameters, return value, examples) +- Package structure compliance (R/ directory, tests/testthat/, man/ auto-generated) + +Output format: +- Prioritized list of improvements with impact/effort assessment +- For each recommendation: + - Category (Performance/Style/Testing/Documentation/Maintainability) + - Current issue with example code snippet + - Recommended solution with before/after comparison + - Rationale (why this improves the code) +- Summary of overall impact +- Order suggestions by: high-impact/low-effort first, then high-impact/medium-effort + +Common R package improvements to look for: +- Replace for loops with vectorized operations or lapply/mapply +- Use seq_along() instead of seq(1:length(x)) +- Avoid stringsAsFactors issues in functions +- Use proper argument validation at function entry +- Add testthat tests covering edge cases and error conditions +- Improve Roxygen documentation with @param, @return, @examples +- Use consistent coding style (indentation, spacing, naming) +- Avoid global variable assignments (<<-) +- Use :: for namespace clarity when calling other packages +- Consider S3/S4 methods if appropriate + +Quality assurance: +- Verify recommendations are specific to R language/packages (not generic) +- Ensure all code examples are syntactically correct +- Check that suggestions follow tidyverse/CRAN conventions when applicable +- Confirm recommendations won't break existing functionality +- Test code examples mentally or verify they're runnable + +When to ask for clarification: +- If the code's purpose or requirements are unclear +- If you need to know performance targets or constraints +- If multiple approaches exist and you need preference guidance +- If you need context about existing test coverage +- If the package's dependencies or target audience affect recommendations +- If you need to understand the codebase's conventions before making suggestions diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md new file mode 100644 index 00000000..b21f957b --- /dev/null +++ b/.github/copilot-instructions.md @@ -0,0 +1,57 @@ +# Copilot instructions for `climate` + +`climate` is a CRAN R package for downloading in-situ meteorological and hydrological data from OGIMET, IMGW-PIB, NOAA, and University of Wyoming sources. The package targets R >= 4.1.0 and uses roxygen2 with markdown enabled. + +## Build, test, and lint commands + +Run commands from the package root. + +- Load the package for interactive work: `R -q -e 'devtools::load_all()'` +- Regenerate `man/` and `NAMESPACE` after roxygen changes: `R -q -e 'devtools::document()'` +- Run the full test suite: `R -q -e 'devtools::test()'` +- Run a single test file: `R -q -e 'testthat::test_file("tests/testthat/test-meteo_imgw.R")'` +- Run package linting: `R -q -e 'lintr::lint_package()'` +- Run a local package check: `R -q -e 'devtools::check()'` +- Run the CI-style check locally when needed: `R -q -e 'rcmdcheck::rcmdcheck(args = c("--no-manual", "--as-cran", "--run-donttest"), error_on = "warning", check_dir = "check")'` +- Run coverage: `R -q -e 'covr::package_coverage()'` + +## High-level architecture + +- Public download functions are thin wrappers that dispatch by `interval` to interval-specific implementations. Keep wrapper signatures and the underlying `*_hourly()`, `*_daily()`, and `*_monthly()` functions in sync. Examples: + - `meteo_imgw()` -> `meteo_imgw_hourly()`, `meteo_imgw_daily()`, `meteo_imgw_monthly()` + - `hydro_imgw()` -> `hydro_imgw_daily()`, `hydro_imgw_monthly()` + - `meteo_ogimet()` -> `ogimet_hourly()`, `ogimet_daily()` + +- The package has separate ingestion paths for each upstream source family: + - **IMGW archive downloads**: archive ZIP files are downloaded from `danepubliczne.imgw.pl`, unpacked, read through `imgw_read()`, then normalized and optionally joined with built-in station metadata. + - **IMGW datastore / telemetry downloads**: `meteo_imgw_datastore()` and `hydro_imgw_datastore()` fetch large monthly telemetry archives from the datastore endpoint. These are raw, high-volume datasets and are handled separately from the archive-style IMGW functions. + - **OGIMET**: HTML is scraped with `XML::readHTMLTable`; station identity is based on WMO IDs. Hourly precipitation post-processing is handled by `precip_split()`. + - **NOAA / Wyoming**: direct file or page downloads for ISH hourly data, Mauna Loa CO2, and Wyoming soundings. + +- IMGW column renaming is a distinct normalization layer. Most IMGW functions accept `col_names = "short" | "full" | "polish"` and pass results through `meteo_shortening_imgw()` or `hydro_shortening_imgw()`. The mapping tables live in built-in datasets backed by `data-raw/`. + +- Package data and docs follow standard R package patterns: + - exported code in `R/` + - tests in `tests/testthat/` + - built-in datasets in `data/`, generated from `data-raw/` + - roxygen-generated docs in `man/` + +## Key conventions + +- Do not hand-edit `man/` or `NAMESPACE`; update roxygen comments and run `devtools::document()`. + +- Do not hand-edit `data/*.rda`; regenerate datasets from the relevant scripts in `data-raw/` and then use `usethis::use_data(...)`. + +- Preserve graceful network-failure behavior. User-facing download functions commonly keep `allow_failure = TRUE` and wrap the real worker in a `tryCatch`, while the underlying implementation lives in a `*_bp` helper. Reuse `test_url()` for download gating instead of introducing hard failures for transient network issues. + +- Tests that touch the network are written to be offline-safe. Follow the existing pattern at the top of network tests: `if (!curl::has_internet()) return(invisible(NULL))`. + +- IMGW station handling is source-specific. Meteorological IMGW archive functions expect station names in uppercase, not numeric IDs; renamed stations may need multiple names such as `c("POZNAŃ", "POZNAŃ-ŁAWICA")`. + +- Preserve the encoding fallback logic in `imgw_read()`. IMGW files vary in delimiter and encoding, so the CP1250 / UTF-8 / transliteration branches are intentional. + +- If you add a new IMGW column, update both the abbreviation source data in `data-raw/` and the runtime shortening layer in `R/meteo_shortening_imgw.R` or `R/hydro_shortening_imgw.R`. + +- If you introduce new data.table non-standard evaluation symbols, add them to `R/globals.R` to avoid `R CMD check` NOTES. + +- `R/parser.R` is the exported parser implementation. If `inst/parser.R` exists, treat it as a sandbox/helper script rather than the package API surface. diff --git a/.github/workflows/html5-check.yaml b/.github/workflows/html5-check.yaml index 0ba7814a..ce1445e1 100644 --- a/.github/workflows/html5-check.yaml +++ b/.github/workflows/html5-check.yaml @@ -31,14 +31,14 @@ jobs: - name: Install pdflatex run: sudo apt-get install texlive-latex-base texlive-fonts-recommended texlive-fonts-extra texlive-latex-extra - - name: Install tidy and pandoc - run: sudo apt install tidy pandoc + - name: Install tidy, pandoc and v8 + run: sudo apt install tidy pandoc libv8-dev - name: Remove cached R libraries run: rm -rf /home/runner/work/_temp/Library/data.table - name: Install dependencies - run: R -e 'install.packages(c("knitr", "rmarkdown", "XML", "httr", "maps", "dplyr", "tidyr", "xml2", "testthat", "archive"))' + run: R -e 'install.packages(c("knitr", "rmarkdown", "XML", "httr", "maps", "dplyr", "tidyr", "xml2", "testthat", "archive" , "V8"))' - name: Install data.table from source run: Rscript -e 'install.packages("data.table", type = "source")' diff --git a/.gitignore b/.gitignore index 74fac250..17050e89 100644 --- a/.gitignore +++ b/.gitignore @@ -14,3 +14,5 @@ docs pkgdown .Renviron test-out.txt +.positai +.aider* diff --git a/CLAUDE.md b/CLAUDE.md new file mode 100644 index 00000000..7379afe7 --- /dev/null +++ b/CLAUDE.md @@ -0,0 +1,39 @@ +# CLAUDE.md + +This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. + +## Project + +`climate` is a CRAN R package that scrapes and downloads in-situ meteorological and hydrological data from public repositories: OGIMET, University of Wyoming soundings, NOAA ISH and CO2 (Mauna Loa), and IMGW-PIB (Poland). Standard R-package layout: code in `R/`, roxygen-generated docs in `man/`, tests in `tests/testthat/`, built-in datasets in `data/` (RDAs generated from `data-raw/`), example data in `inst/extdata/`, vignettes in `vignettes/`. Minimum R is 4.1.0; documentation is generated with roxygen2 markdown mode (do not hand-edit `man/` or `NAMESPACE`). + +## Common commands + +Run from the package root in R: + +- `devtools::load_all()` — interactive load for development. +- `devtools::document()` — regenerate `man/` and `NAMESPACE` after touching roxygen blocks. +- `devtools::test()` — run the full test suite (testthat). +- `testthat::test_file("tests/testthat/test-meteo_imgw.R")` — run a single test file. +- `devtools::check()` (or `R CMD check`) — full package check; CI runs this on macOS/Windows/Ubuntu (R devel, release, 4.1). +- `lintr::lint_package()` — uses the custom `.lintr` (line length 120, cyclocomp limit 33, several default linters disabled). Respect those limits when adding code. +- `covr::package_coverage()` — coverage. Project target is 60%; `R/sounding_wyoming.R`, `R/imgw_read.R`, and `R/onAttach.R` are excluded via `.covrignore`. +- Built-in datasets are regenerated by sourcing the relevant scripts in `data-raw/` and re-running `usethis::use_data(...)`; do not edit `data/*.rda` by hand. + +## Architecture + +**Wrapper-then-implementation pattern.** Public entry points dispatch on `interval` to per-resolution implementations: `meteo_imgw()` → `meteo_imgw_hourly/daily/monthly()`, `hydro_imgw()` → `hydro_imgw_daily/monthly()`, `meteo_ogimet()` → `ogimet_hourly/daily()`. When adding a parameter to a wrapper, plumb it through every implementation it dispatches to. + +**Three independent data-source families**, each with its own download/parse path: + +- **IMGW-PIB** (Polish): downloads ZIPs from `danepubliczne.imgw.pl`, unzips, then reads CSVs through `imgw_read.R`. The reader has multi-step encoding fallbacks (CP1250, UTF-8, optional `iconv ISO-8859-2 → ASCII//TRANSLIT`); preserve those branches when editing — Polish station names contain diacritics and station files vary in delimiter/encoding. Stations are selected by NAME in capital letters (e.g. `"POZNAŃ"`), not by numeric ID. Some renamed stations require multiple names, e.g. `c("POZNAŃ", "POZNAŃ-ŁAWICA")`. Metadata lives in the built-in `imgw_meteo_stations` / `imgw_hydro_stations` datasets and in `R/clean_metadata_*.R`. +- **OGIMET**: HTML scraping via `XML::readHTMLTable` from `ogimet.com`. Stations are identified by WMO ID. `precip_split` / `R/precip_split.R` handles 6/12/24h precipitation disaggregation for hourly data. +- **NOAA / Wyoming**: direct file downloads (ISH gzipped fixed-width, CO2 text, sounding HTML). + +**Column-name shortening layer.** Most IMGW download functions accept `col_names = "short" | "full" | "polish"` and pass the raw frame through `meteo_shortening_imgw()` / `hydro_shortening_imgw()` (in `R/*_shortening_imgw.R`). Full and short names are looked up against `imgw_meteo_abbrev` / `imgw_hydro_abbrev` (built-in data). When you add a new IMGW column, update both the abbrev table (`data-raw/`) and the shortener. + +**Graceful network failure** is required for CRAN. Use `test_url()` (`R/test_url.R`) to gate downloads, and follow the existing `allow_failure = TRUE` pattern: wrap the real worker (`*_bp` "best practice" inner function) in `tryCatch` so user-facing functions return `NULL`/`invisible()` with a `message()` instead of erroring. Tests follow the same convention — every network test starts with `if (!curl::has_internet()) return(invisible(NULL))`. Don't add tests that fail when offline. + +**Other notes.** +- `R/globals.R` holds `utils::globalVariables(...)` declarations needed because of data.table's NSE; add new NSE symbols there to keep `R CMD check` clean. +- `R/onAttach.R` prints a startup message; it's covr-ignored and behind `interactive() && runif < 0.25`. +- `inst/parser.R` and `R/parser.R` exist separately — `R/parser.R` is the exported package function; `inst/parser.R` is a sandbox script (currently untracked per `git status`). Don't conflate them. diff --git a/DESCRIPTION b/DESCRIPTION index d45741ae..a8216b54 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -1,6 +1,6 @@ Package: climate Title: Interface to Download Meteorological (and Hydrological) Datasets -Version: 1.3.0 +Version: 1.4.0 Authors@R: c(person(given = "Bartosz", family = "Czernecki", role = c("aut", "cre"), @@ -35,6 +35,7 @@ Imports: curl, data.table, httr, + R6, stringi, XML Suggests: diff --git a/NAMESPACE b/NAMESPACE index 399347f5..f8c30462 100644 --- a/NAMESPACE +++ b/NAMESPACE @@ -1,6 +1,7 @@ # Generated by roxygen2: do not edit by hand export(.onAttach) +export(compute_relative_humidity) export(find_all_station_names) export(hydro_imgw) export(hydro_imgw_daily) @@ -15,18 +16,21 @@ export(meteo_imgw_monthly) export(meteo_noaa_co2) export(meteo_noaa_hourly) export(meteo_ogimet) +export(meteo_ogimet_synop) export(meteo_shortening_imgw) export(nearest_stations_imgw) export(nearest_stations_noaa) export(nearest_stations_ogimet) export(ogimet_daily) export(ogimet_hourly) +export(parser) export(sounding_wyoming) export(spheroid_dist) export(stations_hydro_imgw_telemetry) export(stations_meteo_imgw_telemetry) export(stations_ogimet) export(test_url) +import(R6) import(data.table) import(httr) importFrom(XML,readHTMLTable) diff --git a/NEWS.md b/NEWS.md index ce48e349..f484f134 100644 --- a/NEWS.md +++ b/NEWS.md @@ -1,3 +1,13 @@ +# climate 1.4.0 + +* adding the `parser()` function for reading raw SYNOP messages +* updating the `meteo_ogimet()` function to use the new `parser()`, but also keep possibility to use HTML scraping engine +* minor fixes + * adding label description to `hydro_imgw()` datasets to easen understanding of the data and avoid confusion with units (e.g. "Q [m3/s]" instead of "Q") + * updated documentation and vignettes to reflect changes in the code and new features + * unified R code syntax for assignments + + # climate 1.3.0 * adapting code to most recent changes in the IMGW-PIB repository: @@ -10,7 +20,6 @@ "WARSZAWA-OKECIE", "WARSZAWA-OBSERWATORIUM", etc.) - # climate 1.2.9 * fixes for corrupted header files in `meteo_imgw_` family of functions due to changes in the IMGW-PIB repository diff --git a/R/clean_metadata_hydro.R b/R/clean_metadata_hydro.R index 3dd9a22f..5bace841 100644 --- a/R/clean_metadata_hydro.R +++ b/R/clean_metadata_hydro.R @@ -11,12 +11,13 @@ clean_metadata_hydro = function(address, interval) { temp = tempfile() test_url(link = address, output = temp) - a = read.csv(temp, header = FALSE, stringsAsFactors = FALSE)$V1 + a = read.csv(temp, header = FALSE, stringsAsFactors = FALSE, fileEncoding = "Windows-1250")$V1 inds = grepl("^[A-Z]{2}.{5}", a) code = trimws(substr(a, 1, 7))[inds] name = trimws(substr(a, 10, nchar(a)))[inds] a = data.frame(parameters = code, label = name) + a$label = stringi::stri_trans_general(a$label, 'LATIN-ASCII') return(a) } diff --git a/R/compute_relative_humidity.R b/R/compute_relative_humidity.R new file mode 100644 index 00000000..d0cc4145 --- /dev/null +++ b/R/compute_relative_humidity.R @@ -0,0 +1,45 @@ +#' Compute relative humidity from air temperature and dew-point temperature +#' +#' Uses the August-Roche-Magnus approximation to derive relative humidity from +#' the 2-metre air temperature and dew-point temperature. +#' +#' @param t2m Numeric vector. Air temperature (2 m) in degrees Celsius. +#' @param dpt2m Numeric vector. Dew-point temperature (2 m) in degrees Celsius. +#' Must be the same length as `t2m`. +#' +#' @return Numeric vector of relative humidity values in percent (0–100). +#' Returns `NA` where either input is `NA`. Values are not clamped, so +#' rounding errors may produce results marginally outside 0–100. +#' +#' @details +#' The August-Roche-Magnus approximation is: +#' +#' \deqn{RH = 100 \times +#' \frac{\exp\!\bigl(\tfrac{17.625\,T_d}{243.04 + T_d}\bigr)} +#' {\exp\!\bigl(\tfrac{17.625\,T}{243.04 + T}\bigr)}} +#' +#' where \eqn{T} is the air temperature and \eqn{T_d} is the dew-point +#' temperature, both in degrees Celsius. The coefficients (17.625 and 243.04) +#' follow Alduchov & Eskridge (1996). +#' +#' @references +#' Alduchov, O. A., & Eskridge, R. E. (1996). Improved Magnus form approximation +#' of saturation vapor pressure. *Journal of Applied Meteorology*, 35(4), 601–609. +#' +#' @examples +#' compute_relative_humidity(t2m = 20, dpt2m = 10) # ~52 % +#' compute_relative_humidity(t2m = 0, dpt2m = 0) # 100 % +#' compute_relative_humidity(t2m = c(20, 15, NA), dpt2m = c(10, 12, 8)) +#' +#' @export +compute_relative_humidity = function(t2m, dpt2m) { + if (!is.numeric(t2m) || !is.numeric(dpt2m)) { + stop("`t2m` and `dpt2m` must be numeric vectors") + } + if (length(t2m) != length(dpt2m)) { + stop("`t2m` and `dpt2m` must have the same length") + } + a = 17.625 + b = 243.04 + 100 * exp((a * dpt2m) / (b + dpt2m)) / exp((a * t2m) / (b + t2m)) +} diff --git a/R/hydro_imgw_daily.R b/R/hydro_imgw_daily.R index 09f2a29c..159ec04a 100644 --- a/R/hydro_imgw_daily.R +++ b/R/hydro_imgw_daily.R @@ -115,7 +115,10 @@ hydro_imgw_daily_bp = function(year, data1 = data1[, c(1:7, 10, 8:9)] } - colnames(data1) = meta[[1]][, 1] + colnames(data1) = meta[[1]]$parameters + for (labs in seq_along(meta[[1]]$parameters)) { + attr(data1[[labs]], "label") = meta[[1]]$label[[labs]] + } codz_data = rbind(codz_data, data1) } # end of codz_ @@ -129,7 +132,10 @@ hydro_imgw_daily_bp = function(year, unzip(zipfile = temp, exdir = temp2) file2 = paste(temp2, dir(temp2), sep = "/")[1] data2 = imgw_read(translit, file2) - colnames(data2) = gsub(x = meta[[2]][, 1], "^ZJ", "CO") # rename colnames starting with ^ZJ to be changed to ^CO: + colnames(data2) = gsub(x = meta[[2]]$parameters, "^ZJ", "CO") # rename colnames starting with ^ZJ to be changed to ^CO: + for (labs in seq_along(meta[[2]]$parameters)) { + attr(data2[[labs]], "label") = meta[[2]]$label[[labs]] + } zjaw_data = rbind(zjaw_data, data2) } @@ -157,9 +163,9 @@ hydro_imgw_daily_bp = function(year, } # end of loop for years (if more than 1 specified) all_data = do.call(rbind, all_data) - all_data[all_data == 9999] = NA - all_data[all_data == 99999.999] = NA - all_data[all_data == 99.9] = NA + all_data[all_data == 9999] = NA + all_data[all_data == 99999.999] = NA + all_data[all_data == 99.9] = NA all_data[all_data == 999] = NA if (coords) { @@ -182,5 +188,20 @@ hydro_imgw_daily_bp = function(year, #all_data = hydro_shortening_imgw(all_data, col_names = col_names, ...) all_data = unique(all_data) rownames(all_data) = 1:nrow(all_data) + + # Final pass: re-apply label attributes (rbind / data.table conversions / merge can drop them). + for (i in seq_len(nrow(meta[[1]]))) { + p = meta[[1]]$parameters[i] + if (p %in% colnames(all_data)) { + attr(all_data[[p]], "label") = meta[[1]]$label[i] + } + } + for (i in seq_len(nrow(meta[[2]]))) { + p = gsub("^ZJ", "CO", meta[[2]]$parameters[i]) + if (p %in% colnames(all_data)) { + attr(all_data[[p]], "label") = meta[[2]]$label[i] + } + } + return(all_data) } \ No newline at end of file diff --git a/R/hydro_imgw_monthly.R b/R/hydro_imgw_monthly.R index 8f863773..0bba899a 100644 --- a/R/hydro_imgw_monthly.R +++ b/R/hydro_imgw_monthly.R @@ -86,14 +86,18 @@ hydro_imgw_monthly_bp = function(year, unzip(zipfile = temp, exdir = temp2) file1 = paste(temp2, dir(temp2), sep = "/")[1] data1 = imgw_read(translit, file1) - colnames(data1) = meta[, 1] + colnames(data1) = meta$parameters + for (labs in seq_along(meta$parameters)) { + attr(data1[[labs]], "label") = meta$label[[labs]] + } all_data[[i]] = data1 } all_data = do.call(rbind, all_data) - - all_data[all_data == 9999] = NA - all_data[all_data == 99999.999] = NA - all_data[all_data == 99.9] = NA + all_data[all_data == 9999] = NA + all_data[all_data == 99999.999] = NA + all_data[all_data == 99.9] = NA + all_data[all_data == 999] = NA + colnames(all_data) = meta[, 1] # coords if (coords) { @@ -127,5 +131,13 @@ hydro_imgw_monthly_bp = function(year, #all_data = hydro_shortening_imgw(all_data, col_names = col_names, ...) + # Final pass: re-apply label attributes (rbind / merge can drop them). + for (i in seq_len(nrow(meta))) { + p = meta$parameters[i] + if (p %in% colnames(all_data)) { + attr(all_data[[p]], "label") = meta$label[i] + } + } + return(all_data) } diff --git a/R/meteo_imgw.R b/R/meteo_imgw.R index b8cc50d0..f53fe598 100644 --- a/R/meteo_imgw.R +++ b/R/meteo_imgw.R @@ -1,62 +1,90 @@ #' Meteorological data from the IMGW-PIB official repository #' #' Downloading hourly, daily, and monthly meteorological data from the -#' SYNOP / CLIMATE / PRECIP stations available in the danepubliczne.imgw.pl collection. +#' SYNOP / CLIMATE / PRECIP stations, or sub-hourly (10-minute) telemetry data from the +#' automated network, all available in the danepubliczne.imgw.pl collection. #' -#' @param interval temporal resolution of the data ("hourly", "daily", "monthly") -#' @param rank rank of the stations: "synop" (default), "climate" or "precip" -#' @param year vector of years (e.g., 1966:2000) +#' @param interval temporal resolution of the data: `"hourly"`, `"daily"`, or `"monthly"`. +#' Not used when `rank = "telemetry"` (telemetry data are always at 10-minute intervals). +#' Defaults to `NULL`, which is only valid together with `rank = "telemetry"`. +#' @param rank rank of the stations: `"synop"` (default), `"climate"`, `"precip"`, or `"telemetry"`. +#' Use `"telemetry"` for the automated IMGW datastore network (data available since 2008). +#' @param year vector of years (e.g., `1966:2000`). +#' For `rank = "telemetry"` all years must be >= 2008. #' @param status leave the columns with measurement and observation statuses -#' (default status = FALSE - i.e. the status columns are deleted) -#' @param coords add coordinates of the station (logical value TRUE or FALSE) -#' @param col_names three types of column names possible: "short" - default, values with shorten names, -#' "full" - full English description, "polish" - original names in the dataset -#' @param ... other parameters that may be passed to the 'shortening' function that shortens column names +#' (default `FALSE` — status columns are deleted). Not used when `rank = "telemetry"`. +#' @param coords add coordinates of the station (logical value `TRUE` or `FALSE`). +#' Default `FALSE`. +#' @param col_names column name style: `"short"` (default), `"full"` (English descriptions), +#' or `"polish"` (original dataset names). Not used when `rank = "telemetry"`. #' @param station name of meteorological station(s). -#' It accepts names (characters in CAPITAL LETTERS). Stations' IDs (numeric) are no longer supported. -#' Please note that station names may change over time and thus sometimes 2 names -#' are required in some cases, e.g. `c("POZNAŃ", "POZNAŃ-ŁAWICA")`. +#' For ranks `"synop"`, `"climate"`, `"precip"`: station name(s) in CAPITAL LETTERS. +#' Please note that station names may change over time — sometimes two names are required, +#' e.g. `c("POZNAŃ", "POZNAŃ-ŁAWICA")`. +#' For `rank = "telemetry"`: station name(s) as listed by `stations_meteo_imgw_telemetry()`. +#' `NULL` (default) downloads all available stations. +#' @param parameters character vector of parameter codes to download. +#' Only used when `rank = "telemetry"`. `NULL` (default) downloads all available parameters. +#' Accepted values: `"wd"`, `"t2m"`, `"t0m"`, `"rr_24h"`, `"rr_1h"`, `"rr_10min"`, +#' `"ws"`, `"ws_max"`, `"gust"`, `"rh"`, `"water_in_snow"`. +#' @param ... other parameters passed to the column-shortening function. Not used when +#' `rank = "telemetry"`. #' @export -#' @return A data.frame with columns describing the meteorological parameters -#' (e.g. temperature, wind speed, precipitation) where each row represent a measurement, -#' depending on the interval, at a given hour, month or year. -#' If `coords = TRUE` additional two -#' columns with geographic coordinates are added. +#' @return A data.frame with meteorological parameters where each row is a measurement. +#' For ranks `"synop"`, `"climate"`, `"precip"`: measurements at a given hour, day, or month, +#' depending on `interval`. If `coords = TRUE` two additional coordinate columns are appended. +#' For `rank = "telemetry"`: a data.table with 10-minute interval observations (not +#' expert-validated). If `coords = TRUE` columns `name`, `lon`, `lat`, and `alt` are appended. #' @examples #' \donttest{ #' x = meteo_imgw("monthly", year = 2018, coords = TRUE) #' head(x) +#' +#' # Telemetry (10-minute) data from automated stations (available since 2008): +#' tel = meteo_imgw(rank = "telemetry", year = 2022, +#' parameters = "t2m", +#' station = "HALA GĄSIENICOWA") +#' head(tel) #' } -meteo_imgw = function(interval, +meteo_imgw = function(interval = NULL, rank = "synop", year, status = FALSE, coords = FALSE, station = NULL, - col_names = "short", ...) { + col_names = "short", + parameters = NULL, + ...) { + if (rank == "telemetry") { + return(meteo_imgw_datastore(year = year, + parameters = parameters, + stations = station, + coords = coords)) + } + if (is.null(interval)) { + stop("The `interval` argument is required for rank '", rank, + "'. Use 'hourly', 'daily', or 'monthly'.") + } if (interval == "daily") { - # daily - result = meteo_imgw_daily(rank = rank, - year = year, - status = status, - coords = coords, - station = station, + result = meteo_imgw_daily(rank = rank, + year = year, + status = status, + coords = coords, + station = station, col_names = col_names, ...) } else if (interval == "monthly") { - #monthly - result = meteo_imgw_monthly(rank = rank, - year = year, - status = status, - coords = coords, - station = station, + result = meteo_imgw_monthly(rank = rank, + year = year, + status = status, + coords = coords, + station = station, col_names = col_names, ...) } else if (interval == "hourly") { - #hourly - result = meteo_imgw_hourly(rank = rank, - year = year, - status = status, - coords = coords, - station = station, + result = meteo_imgw_hourly(rank = rank, + year = year, + status = status, + coords = coords, + station = station, col_names = col_names, ...) } else { stop("Wrong `interval` value. It should be either 'hourly', 'daily', or 'monthly'.") diff --git a/R/meteo_imgw_datastore.R b/R/meteo_imgw_datastore.R index b81973f5..ff119f21 100644 --- a/R/meteo_imgw_datastore.R +++ b/R/meteo_imgw_datastore.R @@ -23,7 +23,7 @@ #' \item "rh" - relative humidity (%) #' \item "water_in_snow" - water equivalent of melted snow cover (mm) #' } -#' @param stations - character vector with station names as visible in the `meteo_imgw_telemetry_stations()`. +#' @param stations - character vector with station names as visible in the `stations_meteo_imgw_telemetry()`. #' Default `NULL` means to download data for all available stations. #' @param coords - logical - whether to append the dataset with station full name, longitude, latitude and altitude. Default: TRUE #' @param allow_failure logical - whether to proceed or stop on failure. By default set to TRUE (i.e. don't stop on error). For debugging purposes change to FALSE diff --git a/R/meteo_ogimet.R b/R/meteo_ogimet.R index 62c30c4a..2360143a 100644 --- a/R/meteo_ogimet.R +++ b/R/meteo_ogimet.R @@ -1,86 +1,230 @@ -#' Scrapping meteorological (Synop) data from the Ogimet webpage +#' Download meteorological (Synop) data from the Ogimet service #' -#' Downloading hourly or daily (meteorological) data from the Synop stations available at https://www.ogimet.com/ +#' Unified entry point for downloading hourly or daily meteorological data +#' from [Ogimet](https://www.ogimet.com/). Two backends are supported: +#' +#' - **`"synop"`** (default for hourly): Downloads raw SYNOP messages from the +#' Ogimet `getsynop` endpoint and decodes them with [parser()]. Supports +#' station mode (one or more WMO IDs) and/or country mode (`country_name`). +#' A default output columns are described in the **synop output** section below, but +#' can be enhanced optionally with `simplified = FALSE` or `return_list = TRUE` +#' to include more of decoded SYNOP fields. +#' +#' - **`"html"`** (default for daily): Scrapes pre-formatted summary tables +#' from the Ogimet `gsynres` endpoint using [XML::readHTMLTable()]. +#' Output columns are described in the **html output** section below. +#' +#' @param interval `"hourly"` (default) or `"daily"` — time resolution to retrieve. +#' @param date Length-2 character or Date vector giving the start and end of +#' the requested period, e.g. `c("2018-05-01", "2018-07-01")`. Defaults to +#' the last 30 days. +#' @param station WMO ID(s) of the station(s) to download. Character or numeric +#' vector. Not required when `country_name` is provided (SYNOP path only). +#' @param country_name Optional character string. When provided, the SYNOP path +#' downloads all Ogimet stations for the named country in a single request +#' (e.g. `"Poland"`, `"Germany"`), and `station` is ignored. Valid only with +#' `source = "synop"` (or the default hourly path). +#' @param source Character. Backend to use: `"synop"` (raw SYNOP decoding) or +#' `"html"` (HTML scraping). When `NULL` (default) the backend is chosen +#' automatically: `"synop"` for `interval = "hourly"`, `"html"` for +#' `interval = "daily"`. +#' @param ... Optional named arguments: +#' \describe{ +#' \item{`allow_failure`}{Logical. When `TRUE` (default) network or parsing +#' errors are caught and a message is emitted; when `FALSE` errors +#' propagate.} +#' \item{`simplified`}{Logical. Applies to `source = "synop"` only. When +#' `TRUE` (default) a compact 20-column `data.frame` is returned (see +#' **synop output** below). When `FALSE` the full [parser()] output is +#' returned with 30+ columns.} +#' \item{`coords`}{Logical. Add geographical coordinates (`Lon`, `Lat`) to +#' the output. Applies to `source = "html"` only; a warning is emitted +#' for `source = "synop"`. Default `FALSE`.} +#' \item{`precip_split`}{Logical. Split the precipitation field into +#' separate `pr6`, `pr12`, and `pr24` columns. Valid only for +#' `interval = "hourly"` with `source = "html"`; a warning is emitted +#' otherwise. Default `TRUE`.} +#' \item{`return_list`}{Logical. Applies to `source = "synop"` only. When +#' `TRUE` a named list with elements `data` (compact 20-column +#' `data.frame`) and `full` (30+ column parser output) is returned +#' instead of a `data.frame`. A warning is emitted when used with +#' `source = "html"`. Default `FALSE`.} +#' } #' -#' @param interval 'daily' or 'hourly' dataset to retrieve - given as character -#' @param date start and finish date (e.g., date = c("2018-05-01", "2018-07-01")) - character or Date class object. If not provided last 30 days are used. -#' @param coords add geographical coordinates of the station (logical value TRUE or FALSE) -#' @param station WMO ID of meteorological station(s). Character or numeric vector -#' @param precip_split whether to split precipitation fields into 6/12/24h -#' @param allow_failure logical - whether to proceed or stop on failure. By default set to TRUE (i.e. don't stop on error). For debugging purposes change to FALSE -#' numeric fields (logical value TRUE (default) or FALSE); valid only for hourly time step #' @importFrom XML readHTMLTable -#' +#' #' @export -#' @return A data.frame of measured values with columns describing the meteorological parameters (e.g. air temperature, wind speed, cloudines). -#' Depending on the interval, at a given hour or day. Different parameters are returned for daily and hourly datasets. -#' \enumerate{ -#' \item station_ID - WMO station identifier -#' \item Lon - longitude -#' \item Lat - latitude -#' \item Date - date (and time) of observations -#' \item TC - air temperature at 2 metres above ground level. Values given in Celsius degrees -#' \item TdC - dew point temperature at 2 metres above ground level. Values given in Celsius degrees -#' \item TmaxC - maximum air temperature at 2 metres above ground level. Values given in Celsius degrees -#' \item TminC - minimum air temperature at 2 metres above ground level. Values given in Celsius degrees -#' \item ddd - wind direction -#' \item ffkmh - wind speed in km/h -#' \item Gustkmh - wind gust in km/h -#' \item P0hpa - air pressure at elevation of the station in hPa -#' \item PseahPa - sea level pressure in hPa -#' \item PTnd - pressure tendency in hPa -#' \item Nt - total cloud cover -#' \item Nh - cloud cover by high-level cloud fraction -#' \item HKm - height of cloud base -#' \item InsoD1 - insolation in hours -#' \item Viskm - visibility in kilometres -#' \item Snowcm - depth of snow cover in centimetres -#' \item pr6 - precicipitation totals in 6 hours -#' \item pr12 - precicipitation totals in 12 hours -#' \item pr24 - precicipitation totals in 24 hours -#' \item TemperatureCAvg - average air temperature at 2 metres above ground level. Values given in Celsius degrees -#' \item TemperatureCMax - maximum air temperature at 2 metres above ground level. Values given in Celsius degrees -#' \item TemperatureCMin - minimum air temperature at 2 metres above ground level. Values given in Celsius degrees -#' \item TdAvgC - average dew point temperature at 2 metres above ground level. Values given in Celsius degrees -#' \item HrAvg - average relative humidity. Values given in % -#' \item WindkmhDir - wind direction -#' \item WindkmhInt - wind speed in km/h -#' \item WindkmhGust - wind gust in km/h -#' \item PresslevHp - Sea level pressure in hPa -#' \item Precmm - precipitation totals in mm -#' \item TotClOct - total cloudiness in octants -#' \item lowClOct - cloudiness by low level clouds in octants -#' \item SunD1h - sunshine duration in hours -#' \item PreselevHp - atmospheric pressure measured at altitude of station in hPa -#' \item SnowDepcm - depth of snow cover in centimetres -#' } - - -#' @examples +#' +#' @return +#' **synop output** (`source = "synop"`, `simplified = TRUE` or `return_list = TRUE` `$data`): +#' A `data.frame` with one row per decoded SYNOP observation and 20 columns: +#' `date` (POSIXct UTC), `station`, `t2m`, `dpt2m`, `rel_hum`, `tmax`, +#' `tmin`, `wd`, `ws`, `gust`, `press`, `slp`, `press_tend`, `precip`, +#' `Nt`, `Nh`, `N_base`, `insol`, `visibility`, `snow`. +#' +#' **synop output** (`source = "synop"`, `simplified = FALSE`): +#' A `data.frame` with 30+ columns from [parser()], prefixed by `station_id` +#' and `Date`. +#' +#' **html output** (`source = "html"`, `interval = "hourly"`): +#' A `data.frame` with columns: `station_ID`, optionally `Lon`/`Lat`, +#' `Date`, `TC`, `TdC`, `TmaxC`, `TminC`, `ddd`, `ffkmh`, `Gustkmh`, +#' `P0hPa`, `PseahPa`, `PTnd`, `Nt`, `Nh`, `HKm`, `InsoD1`, `Viskm`, +#' `Snowcm`, and (when `precip_split = TRUE`) `pr6`, `pr12`, `pr24`. +#' +#' **html output** (`source = "html"`, `interval = "daily"`): +#' A `data.frame` with columns: `station_ID`, optionally `Lon`/`Lat`, +#' `Date`, `TemperatureCAvg`, `TemperatureCMax`, `TemperatureCMin`, +#' `TdAvgC`, `HrAvg`, `WindkmhDir`, `WindkmhInt`, `WindkmhGust`, +#' `PresslevHp`, `PreselevHp`, `Precmm`, `SunD1h`, `SnowDepcm`, +#' `TotClOct`, `lowClOct`, `VisKm`. +#' +#' Returns `NULL` invisibly on failure when `allow_failure = TRUE`. +#' +#' @examples #' \donttest{ -#' # downloading daily data for New York - La Guardia (last 30 days by default) -#' new_york = meteo_ogimet(interval = "daily", -#' station = 72503, -#' coords = TRUE) +#' # Hourly SYNOP data for Poznan-Lawica (default source = "synop") +#' poznan_h = meteo_ogimet(interval = "hourly", +#' station = 12330, +#' date = c("2009-12-01", "2009-12-04")) +#' +#' # Daily HTML summaries for New York - La Guardia (default source = "html") +#' new_york = meteo_ogimet(interval = "daily", +#' station = 72503, +#' coords = TRUE) +#' +#' # Hourly with full parser output as a list +#' poznan_list = meteo_ogimet(interval = "hourly", +#' station = 12330, +#' date = c("2009-12-01", "2009-12-04"), +#' return_list = TRUE) +#' head(poznan_list$data) # simplified +#' head(poznan_list$full) # all parser columns +#' +#' # Country mode: all Polish stations for one day +#' germany = meteo_ogimet(interval = "hourly", +#' country_name = "Germany", +#' date = c("2009-12-15", "2009-12-15")) +#' +#' # Force SYNOP backend for daily data +#' poznan_d = meteo_ogimet(interval = "daily", +#' station = 12330, +#' date = c("2009-12-01", "2009-12-04"), +#' source = "synop") +#' +#' # Force HTML backend for hourly data +#' poznan_h2 = meteo_ogimet(interval = "hourly", +#' station = 12330, +#' date = c("2019-06-01", "2019-06-08"), +#' source = "html", +#' coords = TRUE) #' } #' -meteo_ogimet = function(interval, date = c(Sys.Date() - 30, Sys.Date()), - coords = FALSE, - station, - precip_split = TRUE, - allow_failure = TRUE) { - if (interval == "daily") { - # daily - if (!precip_split) { - warning("The `precip_split` argument is only valid for hourly time step", call. = FALSE) - } - all_data = ogimet_daily(date = date, coords = coords, station = station, allow_failure = allow_failure) - } else if (interval == "hourly") { - #hourly - all_data = ogimet_hourly(date = date, coords = coords, station = station, - precip_split = precip_split, allow_failure = allow_failure) - } else{ +meteo_ogimet = function(interval = "hourly", + date = c(Sys.Date() - 30, Sys.Date()), + station = NULL, + country_name = NULL, + source = NULL, + ...) { + + dots = list(...) + allow_failure = if (!is.null(dots$allow_failure)) dots$allow_failure else TRUE + simplified = if (!is.null(dots$simplified)) dots$simplified else TRUE + coords = if (!is.null(dots$coords)) dots$coords else FALSE + precip_split = if (!is.null(dots$precip_split)) dots$precip_split else TRUE + return_list = if (!is.null(dots$return_list)) dots$return_list else FALSE + + if (!interval %in% c("hourly", "daily")) { stop("Wrong `interval` value. It should be either 'hourly' or 'daily'") } - return(all_data) + + if (!is.null(source)) { + source = match.arg(source, c("synop", "html")) + } + + effective_source = if (!is.null(source)) { + source + } else if (interval == "hourly") { + "synop" + } else { + "html" + } + + # Warn for HTML-only params used with SYNOP + if (effective_source == "synop") { + if (isTRUE(coords)) { + warning("`coords` is not supported for source = 'synop' and will be ignored.", call. = FALSE) + } + if (!isTRUE(precip_split)) { + warning("`precip_split` is not supported for source = 'synop' and will be ignored.", call. = FALSE) + } + } + + # Warn for SYNOP-only params used with HTML + if (effective_source == "html") { + if (isTRUE(return_list)) { + warning("`return_list` is only supported for source = 'synop' and will be ignored.", call. = FALSE) + } + if (!is.null(country_name)) { + warning("`country_name` is only supported for source = 'synop' and will be ignored.", call. = FALSE) + } + } + + # ── HTML backend ───────────────────────────────────────────────────────────── + if (effective_source == "html") { + if (interval == "daily") { + if (!precip_split) { + warning("The `precip_split` argument is only valid for hourly time step", call. = FALSE) + } + return(ogimet_daily(date = date, coords = coords, + station = station, allow_failure = allow_failure)) + } else { + return(ogimet_hourly(date = date, coords = coords, station = station, + precip_split = precip_split, allow_failure = allow_failure)) + } + } + + # ── SYNOP backend ───────────────────────────────────────────────────────────── + if (return_list) { + # Fetch full (unsimplified) output, then build both simplified and full views. + full_data = if (allow_failure) { + tryCatch( + meteo_ogimet_synop_bp(station = station, date = date, + country = NULL, country_name = country_name, + simplified = FALSE), + error = function(e) { + message(paste("Problems with downloading data.", + "Run function with argument allow_failure = FALSE", + "to see more details")) + invisible(NULL) + } + ) + } else { + meteo_ogimet_synop_bp(station = station, date = date, + country = NULL, country_name = country_name, + simplified = FALSE) + } + + if (is.null(full_data) || nrow(full_data) == 0) return(invisible(NULL)) + return(list(data = .synop_simplify(full_data), full = full_data)) + } + + # Standard SYNOP path — return a data.frame + if (allow_failure) { + tryCatch( + meteo_ogimet_synop_bp(station = station, date = date, + country = NULL, country_name = country_name, + simplified = simplified), + error = function(e) { + message(paste("Problems with downloading data.", + "Run function with argument allow_failure = FALSE", + "to see more details")) + invisible(NULL) + } + ) + } else { + meteo_ogimet_synop_bp(station = station, date = date, + country = NULL, country_name = country_name, + simplified = simplified) + } } diff --git a/R/meteo_ogimet_synop.R b/R/meteo_ogimet_synop.R new file mode 100644 index 00000000..db7f1cdf --- /dev/null +++ b/R/meteo_ogimet_synop.R @@ -0,0 +1,399 @@ +#' Download and decode raw SYNOP messages from the Ogimet getsynop service +#' +#' Downloads raw SYNOP messages from the Ogimet `getsynop` endpoint and decodes +#' them into a tidy `data.frame` using the [parser()] function. Two retrieval +#' modes are supported: +#' +#' - **Station mode** (`station` provided): fetches messages for one or more +#' WMO station IDs. +#' URL form: `http://www.ogimet.com/cgi-bin/getsynop?block=&begin=&end=` +#' +#' - **Country mode** (`country_name` provided): fetches messages for all +#' Ogimet stations in a country in a single request. +#' URL form: `http://www.ogimet.com/cgi-bin/getsynop?begin=&end=&state=` +#' +#' When both `station` and `country_name` are supplied, `country_name` takes +#' precedence and a warning is issued. +#' +#' Each line of the response is a comma-separated record: +#' `station_id,year,month,day,hour,minute,`. +#' The SYNOP message is decoded via [parser()] with `as_data_frame = TRUE`. +#' +#' @param station Numeric or character vector of WMO station IDs. Optional when +#' `country_name` is provided; required otherwise. +#' @param date Character or Date vector of length 2 giving the start and end of +#' the requested period, e.g. `c("2009-12-01", "2009-12-04")`. Defaults to +#' the last 30 days. +#' @param country Optional; passed to [parser()] for country-specific +#' precipitation indicator decoding (e.g. `"RU"`). Single string or `NULL` +#' (default). This is distinct from `country_name`. +#' @param country_name Optional character string naming the country whose +#' stations should be downloaded, as recognised by Ogimet (e.g. +#' `"Poland"`, `"Germany"`, `"France"`). When provided, the `state=` Ogimet +#' parameter is used and `station` is ignored. The full date range is +#' fetched in a single request. +#' @param simplified Logical. When `TRUE` (default) returns a compact `data.frame` with +#' 20 standardised columns (see **Value** below). When `FALSE` the +#' full parser output is returned. +#' @param allow_failure Logical. When `TRUE` (default) network errors are caught +#' and a message is emitted; when `FALSE` errors propagate to the caller. +#' +#' @return By default (`simplified = TRUE`), a compact `data.frame` with one +#' row per decoded SYNOP observation. Columns: +#' +#' * `date` — Observation date-time (`POSIXct`, UTC). +#' * `station` — WMO station identifier (character). +#' * `t2m` — Air temperature at 2 m (°C). +#' * `dpt2m` — Dew-point temperature at 2 m (°C). +#' * `rel_hum` — Relative humidity (%), derived via [compute_relative_humidity()]. +#' * `tmax` — Daily maximum temperature from Section 3 (°C). +#' * `tmin` — Daily minimum temperature from Section 3 (°C). +#' * `wd` — Wind direction (degrees). +#' * `ws` — Wind speed (m/s or knots, per `wind_unit`). +#' * `gust` — Highest gust speed from Section 3, same unit as `ws`. +#' * `press` — Station-level pressure (hPa). +#' * `slp` — Sea-level pressure (hPa). +#' * `press_tend` — 3-hour pressure change (hPa). +#' * `precip` — Precipitation amount (mm). +#' * `Nt` — Total cloud cover (oktas, 0–8) from the `Nddff` group. +#' * `Nh` — Cover of low clouds (genera Sc, St, Cu, Cb) in oktas (0–8), +#' from Section 1 group `8NhCLCMCH`; `NA` when not reported. +#' * `N_base` — Height of base of lowest observed cloud (m). +#' * `insol` — Daily sunshine duration (hours). +#' * `visibility` — Horizontal visibility (m). +#' * `snow` — Total snow depth (cm); 0 for trace amounts. +#' +#' When `simplified = FALSE`, a `data.frame` with the first two columns +#' `station_id` (WMO identifier, character) and `Date` (`POSIXct`, UTC), +#' followed by all columns produced by [parser()] with `as_data_frame = TRUE`: +#' `station_type`, `region`, `obs_day`, `obs_hour`, `wind_unit`, +#' `wind_estimated`, `visibility`, `cloud_cover`, `wind_direction`, +#' `wind_speed`, `air_temperature`, `dewpoint_temperature`, +#' `station_pressure`, `sea_level_pressure`, `pressure_tendency`, +#' `pressure_change`, `precipitation_amount`, `precipitation_time`, +#' `cloud_base_min`, `cloud_base_max`, `low_cloud_type`, `middle_cloud_type`, +#' `high_cloud_type`, `low_cloud_amount`, `maximum_temperature`, +#' `minimum_temperature`, `gust`, `sunshine_duration`, +#' `snow_depth`, `snow_depth_state`, `source`. +#' +#' Returns `NULL` invisibly when the download fails and `allow_failure = TRUE`. +#' +#' @export +#' +#' @examples +#' \donttest{ +#' # Station mode: Poznan-Lawica (Poland) +#' poznan = meteo_ogimet_synop(station = 12330, +#' date = c("2009-12-01", "2009-12-04")) +#' head(poznan) +#' +#' # Station mode: multiple stations +#' two_stations = meteo_ogimet_synop(station = c(12330, 12375), +#' date = c("2019-06-01", "2019-06-03")) +#' head(two_stations) +#' +#' # Country mode: all Polish stations for one day +#' poland = meteo_ogimet_synop(country_name = "Poland", +#' date = c("2009-12-15", "2009-12-15")) +#' head(poland) +#' +#' # Simplified view +#' poznan_simple = meteo_ogimet_synop(station = 12330, +#' date = c("2009-12-01", "2009-12-04"), +#' simplified = TRUE) +#' head(poznan_simple) +#' } +#' +meteo_ogimet_synop = function(station = NULL, + date = c(Sys.Date() - 30, Sys.Date()), + country = NULL, + country_name = NULL, + simplified = TRUE, + allow_failure = TRUE) { + .Deprecated("meteo_ogimet", + msg = paste0("'meteo_ogimet_synop()' is deprecated. ", + "Use 'meteo_ogimet(source = \"synop\")' instead.")) + if (allow_failure) { + tryCatch( + meteo_ogimet_synop_bp(station = station, date = date, + country = country, country_name = country_name, + simplified = simplified), + error = function(e) { + message(paste("Problems with downloading data.", + "Run function with argument allow_failure = FALSE", + "to see more details")) + invisible(NULL) + } + ) + } else { + meteo_ogimet_synop_bp(station = station, date = date, + country = country, country_name = country_name, + simplified = simplified) + } +} + +#' @keywords internal +#' @noRd +meteo_ogimet_synop_bp = function(station, date, country, country_name, simplified) { + + if (is.null(station) && is.null(country_name)) { + stop("Provide at least one of `station` or `country_name`.") + } + + if (!is.null(station) && !is.null(country_name)) { + warning("`station` is ignored when `country_name` is provided.", call. = FALSE) + station = NULL + } + + if (!curl::has_internet()) { + message("No internet connection!") + return(invisible(NULL)) + } + + all_results = list() + + begin_date = as.Date(min(date)) + end_date = as.Date(max(date)) + + if (!is.null(country_name)) { + # ── Country mode ───────────────────────────────────────────────────────── + # Single request (auto-split if response exceeds 200 000 rows). + message(paste("Downloading country:", country_name)) + url_tmpl = paste0( + "http://www.ogimet.com/cgi-bin/getsynop?begin=%s", + "&end=%s", + "&state=", utils::URLencode(country_name, reserved = TRUE) + ) + chunk = .ogimet_synop_fetch_decode(url_tmpl, begin_date, end_date, + label = country_name, country = country, + use_csv_station_id = TRUE) + if (!is.null(chunk)) all_results[[length(all_results) + 1L]] = chunk + + } else { + # ── Station mode ────────────────────────────────────────────────────────── + for (station_nr in station) { + message(paste("station:", station_nr)) + url_tmpl = paste0( + "http://www.ogimet.com/cgi-bin/getsynop?block=", station_nr, + "&begin=%s&end=%s" + ) + chunk = .ogimet_synop_fetch_decode(url_tmpl, begin_date, end_date, + label = station_nr, country = country, + use_csv_station_id = FALSE, + station_id_override = as.character(station_nr)) + if (!is.null(chunk)) all_results[[length(all_results) + 1L]] = chunk + + if (!identical(station_nr, station[length(station)])) Sys.sleep(20) + } + } + + if (length(all_results) == 0) return(invisible(NULL)) + + out = do.call(rbind, all_results) + rownames(out) = NULL + + out = out[which(!is.na(out$Date) & + as.Date(out$Date) >= as.Date(min(date)) & + as.Date(out$Date) <= as.Date(max(date))), ] + out = unique(out) + rownames(out) = NULL + + if (simplified) { + out = .synop_simplify(out) + } + + out +} + +# Internal helper: convert full SYNOP parser output to the compact 20-column data.frame. +# Used both by meteo_ogimet_synop_bp (simplified=TRUE) and by meteo_ogimet (return_list=TRUE). +#' @keywords internal +#' @noRd +.synop_simplify = function(out) { + data.frame( + date = out$Date, + station = out$station_id, + t2m = out$air_temperature, + dpt2m = out$dewpoint_temperature, + rel_hum = round(compute_relative_humidity(out$air_temperature, + out$dewpoint_temperature), 1), + tmax = out$maximum_temperature, + tmin = out$minimum_temperature, + wd = out$wind_direction, + ws = out$wind_speed, + gust = out$gust, + press = out$station_pressure, + slp = out$sea_level_pressure, + press_tend = out$pressure_change, + precip = out$precipitation_amount, + Nt = out$cloud_cover, + Nh = out$low_cloud_amount, + N_base = out$cloud_base_min, + insol = out$sunshine_duration, + visibility = out$visibility, + snow = out$snow_depth, + stringsAsFactors = FALSE + ) +} + +# Recursive raw-line fetcher. +# +# Builds the URL from `url_tmpl` (a sprintf template with two %s slots for +# begin and end timestamps), GETs it, and returns the non-empty trimmed lines. +# Ogimet caps responses at 200 000 rows server-side; receiving exactly that many +# lines signals truncation, so the date range is halved and each half is fetched +# recursively. When the range can no longer be bisected (begin_date == end_date) +# a warning is issued and the truncated chunk is returned as-is. +.ogimet_synop_raw_lines = function(url_tmpl, begin_date, end_date, label) { + begin_str = paste0(format(begin_date, "%Y%m%d"), "0000") + end_str = paste0(format(end_date, "%Y%m%d"), "2359") + url = sprintf(url_tmpl, begin_str, end_str) + message(url) + + resp = tryCatch( + httr::GET( + url, + httr::add_headers( + `User-Agent` = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:143.0) Gecko/20100101 Firefox/143.0", + `Accept` = "text/plain,text/html,*/*", + `Accept-Language` = "en-US,en;q=0.5", + `Referer` = "http://www.ogimet.com/getsynop.phtml" + ) + ), + error = function(e) NULL + ) + + if (is.null(resp) || httr::http_error(resp)) { + message(paste("Could not retrieve data for:", label)) + return(NULL) + } + + body = httr::content(resp, as = "text", encoding = "UTF-8") + if (is.na(body) || !nzchar(trimws(body))) { + message(paste("Empty response for:", label)) + return(NULL) + } + + lines = strsplit(body, "\n")[[1]] + lines = trimws(lines[nzchar(trimws(lines))]) + + # 200 000 rows is the Ogimet server cap: receiving that many means the + # response was truncated, so split the date range and retry each half. + if (length(lines) >= 200000L) { + if (begin_date >= end_date) { + warning( + paste0("Response for ", label, " hit the 200 000-row server limit but", + " the date range cannot be split further. Returning partial results."), + call. = FALSE + ) + return(lines) + } + mid_date = begin_date + as.integer((end_date - begin_date) / 2L) + message(paste0( + "Server limit reached (200 000 rows); splitting date range: [", + begin_date, ", ", mid_date, "] and [", mid_date + 1L, ", ", end_date, "]" + )) + lo = .ogimet_synop_raw_lines(url_tmpl, begin_date, mid_date, label) + hi = .ogimet_synop_raw_lines(url_tmpl, mid_date + 1L, end_date, label) + return(c(lo, hi)) + } + + lines +} + +# Internal: fetch, (recursively) split if server limit is hit, parse lines, decode SYNOP. +# `url_tmpl` - sprintf template; two %s slots for begin/end timestamps +# `begin_date`/`end_date` - Date objects defining the requested range +# `label` - used in user-facing messages (station ID or country name) +# `use_csv_station_id` - TRUE -> station_id taken from field 1 of each CSV line +# FALSE -> station_id_override is used for every row +.ogimet_synop_fetch_decode = function(url_tmpl, begin_date, end_date, label, country, + use_csv_station_id, station_id_override = NULL) { + lines = .ogimet_synop_raw_lines(url_tmpl, begin_date, end_date, label) + + if (is.null(lines) || length(lines) == 0) { + message(paste("No SYNOP data returned for:", label)) + return(NULL) + } + + message(sprintf("Downloaded %d SYNOP messages for: %s", length(lines), label)) + + # Each line: station_id,year,month,day,hour,minute, + # Older records may omit the minute field (6 fields instead of 7). + parsed_lines = lapply(lines, function(line) { + parts = strsplit(line, ",", fixed = TRUE)[[1]] + n = length(parts) + if (n < 6) return(NULL) + + sid = if (use_csv_station_id) trimws(parts[1]) else station_id_override + yr = as.integer(parts[2]) + mo = as.integer(parts[3]) + dy = as.integer(parts[4]) + hr = as.integer(parts[5]) + + if (n >= 7) { + mn = as.integer(parts[6]) + synop_msg = paste(parts[7:n], collapse = ",") + } else { + mn = 0L + synop_msg = parts[6] + } + + dt = tryCatch( + as.POSIXct( + sprintf("%04d-%02d-%02d %02d:%02d", yr, mo, dy, hr, mn), + format = "%Y-%m-%d %H:%M", + tz = "UTC" + ), + error = function(e) NA + ) + + list(station_id = sid, Date = dt, synop = trimws(synop_msg)) + }) + + parsed_lines = Filter(Negate(is.null), parsed_lines) + if (length(parsed_lines) == 0) { + message(paste("Could not parse any lines for:", label)) + return(NULL) + } + + synop_msgs = vapply(parsed_lines, `[[`, character(1), "synop") + dates = do.call(c, lapply(parsed_lines, `[[`, "Date")) + station_ids = vapply(parsed_lines, `[[`, character(1), "station_id") + + n_msgs = length(synop_msgs) + pb = utils::txtProgressBar(min = 0L, max = n_msgs, style = 3L, file = stderr()) + on.exit(close(pb), add = TRUE) + + country_vec = if (is.null(country)) rep(list(NULL), n_msgs) else as.list(rep(country, length.out = n_msgs)) + + decoded_rows = vector("list", n_msgs) + for (i in seq_len(n_msgs)) { + decoded_rows[[i]] = tryCatch( + suppressMessages(parser(synop_msgs[[i]], country = country_vec[[i]], as_data_frame = TRUE)), + error = function(e) NULL + ) + utils::setTxtProgressBar(pb, i) + } + close(pb) + on.exit(NULL) # clear the on.exit so close() isn't called twice + + valid = !vapply(decoded_rows, is.null, logical(1)) + if (!any(valid)) { + message(paste("SYNOP decoding failed for all messages for:", label)) + return(NULL) + } + + decoded = do.call(rbind, decoded_rows[valid]) + dates = dates[valid] + station_ids = station_ids[valid] + + decoded$station_id = station_ids + + data.frame( + station_id = station_ids, + Date = dates, + decoded[, setdiff(names(decoded), "station_id")], + stringsAsFactors = FALSE + ) +} diff --git a/R/ogimet_daily.R b/R/ogimet_daily.R index a53774fa..ca2fb72c 100644 --- a/R/ogimet_daily.R +++ b/R/ogimet_daily.R @@ -54,7 +54,7 @@ ogimet_daily_bp = function(date = date, "UTC each day. Use the >>hour<< argument to change it \n" ) ) - data_station <- + data_station = data.frame( "Date" = character(), "TemperatureCMax" = character(), @@ -247,7 +247,6 @@ ogimet_daily_bp = function(date = date, }# end of looping for stations if (nrow(data_station) > 0) { - data_station = data_station[!duplicated(data_station), ] # converting character to proper field representation: @@ -255,13 +254,11 @@ ogimet_daily_bp = function(date = date, # get rid off "---" standing for missing/blank fields: data_station[which(data_station == "--" | data_station == "---" | data_station == "----" | data_station == "-----", arr.ind = TRUE)] = NA + cnames = c("TemperatureCMax", "TemperatureCMin", "TemperatureCAvg","TdAvgC" ,"HrAvg", + "WindkmhInt","WindkmhGust" ,"PresslevHp", "Precmm" , + "TotClOct", "lowClOct" ,"VisKm","station_ID") # other columns to numeric: - suppressWarnings(data_station[,c("TemperatureCMax", "TemperatureCMin", "TemperatureCAvg","TdAvgC" ,"HrAvg", - "WindkmhInt","WindkmhGust" ,"PresslevHp", "Precmm" , - "TotClOct", "lowClOct" ,"VisKm","station_ID")] <- - as.data.frame(sapply(data_station[,c("TemperatureCMax", "TemperatureCMin", "TemperatureCAvg","TdAvgC" ,"HrAvg", - "WindkmhInt","WindkmhGust" ,"PresslevHp", "Precmm" , - "TotClOct", "lowClOct" ,"VisKm","station_ID")], as.numeric))) + data_station[, cnames] = as.data.frame(sapply(data_station[, cnames], as.numeric)) # changing order of columns and removing blank records: if (coords) { diff --git a/R/ogimet_hourly.R b/R/ogimet_hourly.R index 6e3db395..5c435b74 100644 --- a/R/ogimet_hourly.R +++ b/R/ogimet_hourly.R @@ -202,7 +202,7 @@ ogimet_hourly_bp = function(date = date, "HKm", "InsoD1", "Viskm", "Snowcm", "station_ID") columns = colnames(data_station)[(colnames(data_station) %in% columns)] suppressWarnings(data_station[, columns] <- - as.data.frame(sapply(data_station[,columns], as.numeric))) + as.data.frame(sapply(data_station[, columns], as.numeric))) # changing order of columns and removing blank records: if (coords) { diff --git a/R/parser.R b/R/parser.R new file mode 100644 index 00000000..7e41c395 --- /dev/null +++ b/R/parser.R @@ -0,0 +1,2285 @@ +#' Parse SYNOP messages into structured lists or a data frame +#' +#' This function decodes SYNOP FM-12 meteorological messages which are commonly +#' used for reporting weather observations. +#' It parses one or more SYNOP messages and +#' returns their structured representation as generated by the `SYNOP` R6 +#' decoder. +#' +#' Currently, the decoder contains most of the core logic for parsing the main +#' sections of SYNOP messages that are commonly used in atmospheric sciences. +#' However, it does not yet cover all possible SYNOP groups and fields, +#' and some fields may be missing or incomplete. +#' +#' @param message Character vector with SYNOP messages. +#' @param country Optional; A single character value passed to the precipitation +#' indicator decoder to adjust country-specific behaviour (e.g. `"RU"`). +#' @param simplify Logical. If `TRUE` (default) and a single message is +#' provided, the function returns the decoded list directly instead of a +#' length-one list. Ignored when `as_data_frame = TRUE`. +#' @param as_data_frame Logical. If `TRUE`, return a `data.frame` with one row +#' per message and commonly-used decoded fields as columns. Missing or +#' unparsed fields are filled with `NA`. Default is `FALSE`. +#' +#' @return When `as_data_frame = FALSE` (default): a list of decoded SYNOP +#' messages, or the decoded list directly when `simplify = TRUE` and a single +#' message is supplied. When `as_data_frame = TRUE`: a `data.frame` with one +#' row per message and the following columns (all numeric/character as +#' appropriate, `NA` when not present in the message): +#' `station_type`, `station_id`, `region`, `obs_day`, `obs_hour`, +#' `wind_unit`, `wind_estimated`, `visibility`, `cloud_cover`, +#' `wind_direction`, `wind_speed`, `air_temperature`, `dewpoint_temperature`, +#' `station_pressure`, `sea_level_pressure`, `pressure_tendency`, +#' `pressure_change`, `precipitation_amount`, `precipitation_time`, +#' `cloud_base_min`, `cloud_base_max`, `low_cloud_type`, +#' `middle_cloud_type`, `high_cloud_type`, `low_cloud_amount`, +#' `maximum_temperature` (Section 3 daily maximum, °C), +#' `minimum_temperature` (Section 3 daily minimum, °C), +#' `gust` (highest gust speed from Section 3 group 910ff/911ff, in the wind unit of the message), +#' `cloudiness_height` (cloud cover in oktas of the highest cloud layer reported in Section 3, +#' i.e. cirrus/cirrocumulus/cirrostratus; `NA` when absent), +#' `sunshine_duration` (daily sunshine in hours, from Section 3 group 55SSS), +#' `snow_depth` (total snow depth in cm; 0 for trace amounts, `NA` for non-continuous cover or +#' unmeasurable depth), `snow_depth_state` (descriptive state of ground with snow/ice per WMO +#' code table 0975, e.g. `"Even layer of loose dry snow covering ground completely"`), +#' `source` (the original SYNOP message string). +#' Row names are sequential integers. +#' @examples +#' synop_code = "AAXX 01004 88889 12782 61506 10094 20047 30111 40197 53007 60001 81541" +#' parser(synop_code) +#' parser(rep(synop_code, 2), simplify = FALSE) +#' parser(synop_code, as_data_frame = TRUE) +#' parser(rep(synop_code, 2), as_data_frame = TRUE) +#' @import R6 +#' @export + +parser = function(message, country = NULL, simplify = TRUE, as_data_frame = FALSE) { + if (missing(message) || length(message) == 0) { + stop("`message` must contain at least one SYNOP string.") + } + + if (!is.character(message)) { + stop("`message` must be a character vector.") + } + + if (!is.null(country) && !(is.character(country) && length(country) %in% c(1, length(message)))) { + stop("`country` must be NULL, a single string, or a character vector matching the length of `message`.") + } + + country_vec = if (is.null(country)) rep(list(NULL), length(message)) else as.list(rep(country, length.out = length(message))) + + results = mapply( + function(msg, cntry) { + msg = trimws(msg) + if (nzchar(msg)) { + synop = SYNOP$new() + synop$country = cntry + synop$decode(msg) + } else { + message("Empty SYNOP message supplied; returning NULL.") + NULL + } + }, + message, + country_vec, + SIMPLIFY = FALSE + ) + + if (as_data_frame) { + rows = Map(.synop_to_row, results, message) + df = do.call(rbind, rows) + rownames(df) = NULL + return(df) + } + + if (simplify && length(results) == 1) { + return(results[[1]]) + } + + results +} + +# Internal helper: extract a deeply-nested value safely, returning NA on failure. +.sg = function(lst, ...) { + keys = c(...) + for (k in keys) { + if (is.null(lst) || !is.list(lst) || is.null(lst[[k]])) return(NA) + lst = lst[[k]] + } + lst +} + +# Internal helper: flatten one decoded SYNOP list into a single-row data.frame. +# `source` is the original SYNOP message string, added as the last column. +.synop_to_row = function(x, source = NA_character_) { + data.frame( + station_type = .sg(x, "station_type", "value"), + station_id = .sg(x, "station_id", "value"), + region = .sg(x, "region", "value"), + obs_day = .sg(x, "obs_time", "day", "value"), + obs_hour = .sg(x, "obs_time", "hour", "value"), + wind_unit = .sg(x, "wind_indicator", "unit"), + wind_estimated = .sg(x, "wind_indicator", "estimated"), + visibility = .sg(x, "visibility", "value"), + cloud_cover = .sg(x, "cloud_cover", "value"), + wind_direction = .sg(x, "surface_wind", "direction", "value"), + wind_speed = .sg(x, "surface_wind", "speed", "value"), + air_temperature = .sg(x, "air_temperature", "value"), + dewpoint_temperature = .sg(x, "dewpoint_temperature", "value"), + station_pressure = .sg(x, "station_pressure", "value"), + sea_level_pressure = .sg(x, "sea_level_pressure", "value"), + pressure_tendency = .sg(x, "pressure_tendency", "tendency", "value"), + pressure_change = .sg(x, "pressure_tendency", "change", "value"), + precipitation_amount = .sg(x, "precipitation_s1", "amount", "value"), + precipitation_time = .sg(x, "precipitation_s1", "time_before_obs", "value"), + cloud_base_min = .sg(x, "lowest_cloud_base", "min"), + cloud_base_max = .sg(x, "lowest_cloud_base", "max"), + low_cloud_type = .sg(x, "cloud_types", "low_cloud_type", "value"), + middle_cloud_type = .sg(x, "cloud_types", "middle_cloud_type", "value"), + high_cloud_type = .sg(x, "cloud_types", "high_cloud_type", "value"), + low_cloud_amount = .sg(x, "cloud_types", "low_cloud_amount", "value"), + maximum_temperature = .sg(x, "maximum_temperature", "value"), + minimum_temperature = .sg(x, "minimum_temperature", "value"), + gust = { + gusts = x[["highest_gust"]] + if (!is.null(gusts) && length(gusts) > 0) .sg(gusts[[1]], "speed", "value") else NA_real_ + }, + cloudiness_height = { + layers = x[["cloud_layer"]] + high_genera = c("Ci", "Cc", "Cs") + if (!is.null(layers) && length(layers) > 0) { + high_idx = which(vapply(layers, function(l) { + isTRUE(l[["cloud_genus"]][["value"]] %in% high_genera) + }, logical(1))) + if (length(high_idx) > 0) { + as.numeric(layers[[high_idx[1L]]][["cloud_cover"]][["value"]]) + } else NA_real_ + } else NA_real_ + }, + sunshine_duration = .sg(x, "sunshine", "value"), + snow_depth = .sg(x, "snow_depth", "depth", "value"), + snow_depth_state = .sg(x, "snow_depth", "state_of_ground"), + source = source, + stringsAsFactors = FALSE + ) +} + +################################################################################ +# observations.R +# +# Observation classes from SYNOP - R version +# +# This is an R port of pymetdecoder/synop/observations.py +# Adapted from Python to R using R6 classes and functional approach +################################################################################ + +################################################################################ +# BASE CLASSES +################################################################################ + +# Base Observation class +Observation = R6Class("Observation", + public = list( + null_char = "/", + code_len = NULL, + code_table = NULL, + unit = NULL, + valid_range = NULL, + + initialize = function(null_char = "/") { + self$null_char = null_char + }, + + # Check if value is available (not all null chars) + is_available = function(value, char = NULL) { + if (is.null(char)) char = self$null_char + if (is.null(value)) return(FALSE) + value_str = as.character(value) + !all(strsplit(value_str, "")[[1]] == char) + }, + + # Check if value is valid + is_valid = function(value, raise_exception = TRUE, name = NULL, ...) { + tryCatch({ + valid = private$check_valid(value, ...) + if (!valid && raise_exception) { + stop(paste0(value, " is not a valid code for ", ifelse(is.null(name), class(self)[1], name))) + } + valid + }, error = function(e) { + if (raise_exception) { + stop(e) + } + FALSE + }, warning = function(w) { + if (raise_exception) { + stop(w) + } + FALSE + }) + }, + + # Decode raw value + decode = function(raw, ...) { + kwargs = list(...) + + # Check if available + if (!self$is_available(raw)) { + return(NULL) + } + + # Check if valid + if (!self$is_valid(raw, raise_exception = FALSE, ...)) { + return(NULL) + } + + # Decode + tryCatch({ + self$decode_internal(raw, ...) + }, error = function(e) { + message(paste("Unable to decode:", raw)) + NULL + }) + }, + + # Encode observation + encode = function(data, ...) { + kwargs = list(...) + allow_none = ifelse(is.null(kwargs$allow_none), FALSE, kwargs$allow_none) + + tryCatch({ + if (is.null(data) || (is.list(data) && is.null(data$value))) { + if (allow_none || !is.null(self$code_table)) { + self$encode_internal(data, ...) + } else { + paste(rep(self$null_char, self$code_len), collapse = "") + } + } else { + self$encode_internal(data, ...) + } + }, error = function(e) { + message(paste("Unable to encode:", toString(data))) + paste(rep(self$null_char, self$code_len), collapse = "") + }) + }, + + # Internal decode method (to be overridden) + decode_internal = function(raw, ...) { + if (!is.null(self$components) && length(self$components) > 0) { + # Handle components + result = list() + for (comp in self$components) { + comp_class = comp[[4]] + comp_obj = comp_class$new() + result[[comp[[1]]]] = comp_obj$decode( + substr(raw, comp[[2]] + 1, comp[[2]] + comp[[3]]) + ) + } + result + } else { + self$decode_value(raw, ...) + } + }, + + # Internal encode method (to be overridden) + encode_internal = function(data, ...) { + if (!is.null(self$components)) { + # Handle components + result = character(0) + for (comp in self$components) { + comp_class = comp[[4]] + comp_obj = comp_class$new() + result = c(result, comp_obj$encode( + if (comp[[1]] %in% names(data)) data[[comp[[1]]]] else NULL + )) + } + paste(result, collapse = "") + } else { + self$encode_value(data, ...) + } + }, + + # Decode value (uses code table if available) + decode_value = function(val, ...) { + kwargs = list(...) + + # Check if value is available + if (!self$is_available(val)) { + return(NULL) + } + + # Get unit + unit = if (is.null(kwargs$unit)) self$unit else kwargs$unit + + # Get value from code table + if (!is.null(self$code_table)) { + out_val = tryCatch({ + self$code_table$decode(val, ...) + }, error = function(e) { + message(paste("Error decoding with code table:", val, "-", e$message)) + NULL + }, warning = function(w) { + message(paste("Warning decoding with code table:", val, "-", w$message)) + NULL + }, message = function(m) { + message(paste("Warning decoding with code table:", val, "-", trimws(conditionMessage(m)))) + NULL + }) + + if (!is.null(out_val) && !is.list(out_val)) { + out_val = list(value = out_val) + } + if (!is.null(out_val) && !("_code" %in% names(out_val))) { + code_val = suppressWarnings(as.integer(val)) + if (!is.na(code_val)) { + out_val[["_code"]] = code_val + } + } + } else { + # No code table - just convert to integer + out_val = tryCatch({ + code_val = suppressWarnings(as.integer(val)) + if (is.na(code_val)) { + return(NULL) + } + code_val + }, warning = function(w) { + NULL + }, error = function(e) { + NULL + }) + + if (is.null(out_val)) { + return(NULL) + } + + out_val = list(value = out_val) + } + + if (is.null(out_val)) return(NULL) + + # Convert to int if not a list + if (!is.list(out_val)) { + out_val = list(value = as.integer(out_val)) + } + + # Perform post conversion + out_val = self$decode_convert(out_val, ...) + + # Add unit if specified + if (!is.null(unit)) { + out_val$unit = unit + } + + out_val + }, + + # Encode value + encode_value = function(data, ...) { + # Get value from code table or data + if (!is.null(self$code_table)) { + out_val = self$code_table$encode(data) + } else { + out_val = if ("value" %in% names(data)) data$value else data + } + + # Convert value + out_val = self$encode_convert(out_val, ...) + + # Format code + if (is.null(self$code_len)) { + return(as.character(out_val)) + } + sprintf(paste0("%0", self$code_len, "d"), as.integer(out_val)) + }, + + # Conversion methods (to be overridden) + decode_convert = function(val, ...) { + val + }, + + encode_convert = function(val, ...) { + val + } + ), + + private = list( + check_valid = function(value, ...) { + tryCatch({ + # Check if value is available + if (!self$is_available(value)) { + return(TRUE) + } + + # Check valid range + if (!is.null(self$valid_range)) { + val_num = suppressWarnings(as.numeric(value)) + if (is.na(val_num)) { + return(FALSE) + } + if (val_num >= self$valid_range[1] && val_num <= self$valid_range[2]) { + return(TRUE) + } + return(FALSE) + } + + # If we reach here, assume valid + TRUE + }, error = function(e) { + FALSE + }, warning = function(w) { + FALSE + }) + } + ) +) + +################################################################################ +# SHARED CLASSES +################################################################################ + +CloudCover = R6Class("CloudCover", + inherit = Observation, + public = list( + initialize = function() { + super$initialize() + self$code_len = 1 + self$code_table = CodeTable2700$new() + self$unit = "okta" + } + ) +) + +CloudGenus = R6Class("CloudGenus", + inherit = Observation, + public = list( + initialize = function() { + super$initialize() + self$code_len = 1 + self$code_table = CodeTable0500$new() + } + ) +) + +Day = R6Class("Day", + inherit = Observation, + public = list( + initialize = function() { + super$initialize() + self$code_len = 2 + self$valid_range = c(1, 31) + } + ) +) + +DirectionCardinal = R6Class("DirectionCardinal", + inherit = Observation, + public = list( + initialize = function() { + super$initialize() + self$code_len = 1 + self$code_table = CodeTable0700$new() + } + ) +) + +DirectionDegrees = R6Class("DirectionDegrees", + inherit = Observation, + public = list( + initialize = function() { + super$initialize() + self$code_len = 2 + self$code_table = CodeTable0877$new() + self$unit = "deg" + } + ) +) + +Hour = R6Class("Hour", + inherit = Observation, + public = list( + initialize = function() { + super$initialize() + self$code_len = 2 + self$valid_range = c(0, 24) + } + ) +) + +Minute = R6Class("Minute", + inherit = Observation, + public = list( + initialize = function() { + super$initialize() + self$code_len = 2 + self$valid_range = c(0, 59) + } + ) +) + +SignedTemperature = R6Class("SignedTemperature", + inherit = Observation, + public = list( + initialize = function() { + super$initialize() + self$code_len = 4 + self$unit = "Celsius" + }, + + decode_internal = function(raw, ...) { + kwargs = list(...) + sign = kwargs$sign + + if (is.null(sign) || sign == "/") { + return(NULL) + } + + if (!sign %in% c("0", "1")) { + stop(paste(sign, "is not a valid temperature sign")) + } + + self$decode_value(raw, sign = sign) + }, + + decode_convert = function(val, ...) { + kwargs = list(...) + sign = kwargs$sign + if (is.null(sign)) return(val) + + factor = ifelse(sign == "0", 10, -10) + val$value = val$value / factor + val + }, + + encode_convert = function(val, ...) { + sign_char = ifelse(val >= 0, "0", "1") + abs_val = abs(val * 10) + paste0(sign_char, sprintf("%03d", as.integer(abs_val))) + } + ) +) + +Visibility = R6Class("Visibility", + inherit = Observation, + public = list( + initialize = function() { + super$initialize() + self$code_len = 2 + self$code_table = CodeTable4377$new() + self$unit = "m" + }, + + encode_internal = function(data, ...) { + kwargs = list(...) + use90 = ifelse(is.null(kwargs$use90), + ifelse("use90" %in% names(data), data$use90, FALSE), + kwargs$use90) + self$encode_value(data, use90 = use90) + } + ) +) + +################################################################################ +# CODE TABLE CLASSES (simplified versions) +################################################################################ + +# Base CodeTable class +CodeTable = R6Class("CodeTable", + public = list( + table_name = NULL, + + decode = function(value, ...) { + tryCatch({ + result = self$decode_internal(value, ...) + if (!is.null(result)) { + result$`_table` = self$table_name + } + result + }, error = function(e) { + message(paste("Unable to decode", value, "in", class(self)[1])) + NULL + }) + }, + + encode = function(value, ...) { + if (is.null(value)) return(NULL) + if (is.list(value) && "_code" %in% names(value)) { + return(value$`_code`) + } + self$encode_internal(value, ...) + }, + + decode_internal = function(value, ...) { + stop("decode_internal must be implemented in subclass") + }, + + encode_internal = function(value, ...) { + stop("encode_internal must be implemented in subclass") + } + ) +) + +# CodeTable2700 - Total cloud cover +CodeTable2700 = R6Class("CodeTable2700", + inherit = CodeTable, + public = list( + initialize = function() { + self$table_name = "2700" + }, + + decode_internal = function(N, ...) { + n = as.integer(N) + if (n == 9) { + list(value = NULL, obscured = TRUE, unit = "okta") + } else { + list(value = n, obscured = FALSE, unit = "okta") + } + }, + + encode_internal = function(data, ...) { + if (is.null(data$value)) { + if (data$obscured) return("9") + stop("Cannot encode cloud cover: value is NULL and obscured is FALSE") + } + as.character(data$value) + } + ) +) + +# CodeTable0500 - Genus of cloud +CodeTable0500 = R6Class("CodeTable0500", + inherit = CodeTable, + public = list( + values = c("Ci", "Cc", "Cs", "Ac", "As", "Ns", "Sc", "St", "Cu", "Cb"), + + initialize = function() { + self$table_name = "0500" + }, + + decode_internal = function(i, ...) { + idx = as.integer(i) + 1 + if (idx >= 1 && idx <= length(self$values)) { + list(value = self$values[idx]) + } else { + stop(paste("Invalid cloud genus code:", i)) + } + }, + + encode_internal = function(data, ...) { + val = if (is.list(data)) data$value else data + idx = which(self$values == val) + if (length(idx) == 0) { + stop(paste("Invalid cloud genus:", val)) + } + as.character(idx - 1) + } + ) +) + +# CodeTable0700 - Direction or bearing in one figure +CodeTable0700 = R6Class("CodeTable0700", + inherit = CodeTable, + public = list( + # NA placeholders preserve indexing: 0 = calm, 9 = allDirections (no compass value) + directions = c(NA_character_, "NE", "E", "SE", "S", "SW", "W", "NW", "N", NA_character_), + + initialize = function() { + self$table_name = "0700" + }, + + decode_internal = function(D, ...) { + if (D == "/") { + return(list(value = NULL, isCalmOrStationary = NULL, allDirections = NULL)) + } + + d = as.integer(D) + isCalmOrStationary = (d == 0) + allDirections = (d == 9) + + direction = if (d >= 0 && d < length(self$directions)) { + v = self$directions[d + 1] + if (is.na(v)) NULL else v + } else { + NULL + } + + list( + value = direction, + isCalmOrStationary = isCalmOrStationary, + allDirections = allDirections + ) + }, + + encode_internal = function(data, ...) { + if ("isCalmOrStationary" %in% names(data) && data$isCalmOrStationary) { + return("0") + } + if ("allDirections" %in% names(data) && data$allDirections) { + return("9") + } + if ("value" %in% names(data) && !is.null(data$value)) { + idx = which(self$directions == data$value) - 1 + if (length(idx) > 0) { + return(as.character(idx)) + } + } + stop("Cannot encode direction") + } + ) +) + +# CodeTable0877 - True direction in tens of degrees +CodeTable0877 = R6Class("CodeTable0877", + inherit = CodeTable, + public = list( + initialize = function() { + self$table_name = "0877" + }, + + decode_internal = function(dd, ...) { + dd_int = as.integer(dd) + calm = (dd_int == 0) + varAllUnknown = (dd_int == 99) + + if (calm) { + direction = NULL + } else if (varAllUnknown) { + direction = NULL + } else if (dd_int >= 1 && dd_int <= 36) { + direction = dd_int * 10 + } else { + stop(paste("Invalid direction code:", dd)) + } + + list( + value = direction, + varAllUnknown = varAllUnknown, + calm = calm + ) + }, + + encode_internal = function(data, ...) { + val = if (is.list(data)) data$value else data + if (is.null(val)) { + if ("calm" %in% names(data) && data$calm) return("00") + if ("varAllUnknown" %in% names(data) && data$varAllUnknown) return("99") + return("//") + } + code = round(val / 10) + if (code < 1) code = 0 + if (code > 36) code = 36 + sprintf("%02d", code) + } + ) +) + +# CodeTable4377 - Horizontal visibility at surface +CodeTable4377 = R6Class("CodeTable4377", + inherit = CodeTable, + public = list( + range90 = list( + c(0, 50), c(50, 200), c(200, 500), c(500, 1000), c(1000, 2000), + c(2000, 4000), c(4000, 10000), c(10000, 20000), c(20000, 50000), + c(50000, Inf) + ), + + initialize = function() { + self$table_name = "4377" + }, + + decode_internal = function(VV, ...) { + vv = as.integer(VV) + + if (vv >= 51 && vv <= 55) { + stop(paste("Invalid visibility code:", VV)) + } + + visibility = NULL + quantifier = NULL + + if (vv == 0) { + visibility = 100 + quantifier = "isLess" + } else if (vv <= 50) { + visibility = vv * 100 + } else if (vv <= 80) { + visibility = (vv - 50) * 1000 + } else if (vv <= 88) { + visibility = (vv - 74) * 5000 + } else if (vv == 89) { + visibility = 70000 + quantifier = "isGreater" + } else if (vv == 90) { + visibility = 50 + quantifier = "isLess" + } else if (vv == 91) { + visibility = 50 + } else if (vv == 92) { + visibility = 200 + } else if (vv == 93) { + visibility = 500 + } else if (vv == 94) { + visibility = 1000 + } else if (vv == 95) { + visibility = 2000 + } else if (vv == 96) { + visibility = 4000 + } else if (vv == 97) { + visibility = 10000 + } else if (vv == 98) { + visibility = 20000 + } else if (vv == 99) { + visibility = 50000 + quantifier = "isGreaterOrEqual" + } else { + stop(paste("Invalid visibility code:", VV)) + } + + use90 = (vv >= 90) + list( + value = visibility, + quantifier = quantifier, + use90 = use90 + ) + }, + + encode_internal = function(data, use90 = FALSE, ...) { + value = if (is.list(data)) data$value else data + quantifier = if (is.list(data) && "quantifier" %in% names(data)) data$quantifier else NULL + + if (use90) { + for (idx in seq_along(self$range90)) { + r = self$range90[[idx]] + if (value >= r[1] && value < r[2]) { + return(sprintf("%02d", idx + 89)) + } + } + } else { + if (value < 100) { + code = 0 + } else if (value <= 5000) { + code = floor(value / 100) + } else if (value <= 30000) { + code = floor(value / 1000) + 50 + } else if (value <= 70000 && is.null(quantifier)) { + code = floor(value / 5000) + 74 + } else { + code = 89 + } + return(sprintf("%02d", code)) + } + + stop(paste("Cannot encode visibility:", value)) + } + ) +) + +################################################################################ +# MAIN OBSERVATION CLASSES +################################################################################ + +# Temperature observation +Temperature = R6Class("Temperature", + inherit = Observation, + public = list( + initialize = function() { + super$initialize() + self$code_len = 4 + }, + + decode_internal = function(group, ...) { + sn = substr(group, 2, 2) + TTT = substr(group, 3, 5) + + # Fix trailing "/" (issue #10) + if (TTT != "///") { + TTT = sub("/$", "0", TTT) + } + + if (!sn %in% c("0", "1", "/")) { + message(paste(group, "is an invalid temperature group")) + return(NULL) + } + + temp_obs = SignedTemperature$new() + temp_obs$decode(TTT, sign = sn) + }, + + encode_internal = function(data, ...) { + temp_obs = SignedTemperature$new() + temp_obs$encode(data) + } + ) +) + +# Pressure observation +Pressure = R6Class("Pressure", + inherit = Observation, + public = list( + initialize = function() { + super$initialize() + self$code_len = 4 + self$unit = "hPa" + }, + + decode_convert = function(val, ...) { + val_int = as.integer(val$value) + val$value = (val_int / 10) + ifelse(val_int > 5000, 0, 1000) + val + }, + + encode_convert = function(val, ...) { + abs(val * 10) - ifelse(val >= 1000, 10000, 0) + } + ) +) + +# Surface wind observation +SurfaceWind = R6Class("SurfaceWind", + inherit = Observation, + public = list( + initialize = function() { + super$initialize() + self$code_len = 4 + }, + + decode_internal = function(ddff, ...) { + dd = substr(ddff, 1, 2) + ff = substr(ddff, 3, 4) + + dir_obs = DirectionDegrees$new() + direction = dir_obs$decode(dd) + + speed_obs = WindSpeed$new() + speed = speed_obs$decode(ff) + + # Sanity check: if wind is calm, it can't have a speed + if (!is.null(direction) && !is.null(direction$calm) && direction$calm && + !is.null(speed) && !is.null(speed$value) && speed$value > 0) { + message(paste("Wind is calm, yet has a speed (dd:", dd, ", ff:", ff, ")")) + speed = NULL + } + + list(direction = direction, speed = speed) + }, + + encode_internal = function(data, ...) { + dir_obs = DirectionDegrees$new() + speed_obs = WindSpeed$new() + + dd = dir_obs$encode(if ("direction" %in% names(data)) data$direction else NULL, allow_none = TRUE) + ff = speed_obs$encode(if ("speed" %in% names(data)) data$speed else NULL) + + paste0(dd, ff) + } + ) +) + +# Wind speed (simplified) +WindSpeed = R6Class("WindSpeed", + inherit = Observation, + public = list( + initialize = function() { + super$initialize() + self$code_len = 2 + }, + + decode_internal = function(ff, ...) { + # Decode wind speed - ff is just a numeric value + # Use the base decode_value method which handles numeric conversion + self$decode_value(ff, ...) + }, + + encode_internal = function(data, ...) { + if (is.null(data)) { + return(paste(rep(self$null_char, self$code_len), collapse = "")) + } + value = if (is.list(data)) data$value else data + if (!is.null(value) && value > 99) { + return(paste0("99 00", sprintf("%02d", value))) + } + sprintf("%02d", as.integer(value)) + } + ) +) + +################################################################################ +# SYNOP REPORT CLASS +################################################################################ + +# Base Report class +Report = R6Class("Report", + public = list( + not_implemented = list(), + + decode = function(message) { + tryCatch({ + self$decode_internal(message) + }, error = function(e) { + stop(paste("Decode error:", e$message)) + }) + }, + + decode_internal = function(message) { + stop("decode_internal must be implemented in subclass") + } + ) +) + +# SYNOP class - main class for decoding SYNOP messages +SYNOP = R6Class("SYNOP", + inherit = Report, + public = list( + country = NULL, + + initialize = function() { + self$not_implemented = list() + self$country = NULL + }, + + decode_internal = function(message) { + # Initialize data + data = list() + + # Split message into groups + groups = strsplit(message, " ")[[1]] + group_idx = 1 + + # Helper function to get next group + get_next_group = function() { + if (group_idx <= length(groups)) { + group = groups[group_idx] + group_idx <<- group_idx + 1 + return(group) + } + return(NULL) + } + + # Alias for convenience + next_group = get_next_group + + # SECTION 0: Station type, time, and identification + station_type = next_group() + if (is.null(station_type)) { + stop("Invalid SYNOP: missing station type") + } + + # For simplicity, assume AAXX format + data$station_type = list(value = station_type) + + # Get observation time and wind indicator (YYGGi) + yygii = next_group() + if (is.null(yygii) || nchar(yygii) < 5) { + stop("Invalid SYNOP: missing YYGGi group") + } + + # Decode observation time + obs_time = ObservationTime$new() + data$obs_time = obs_time$decode(substr(yygii, 1, 4)) + + # Decode wind indicator + wind_ind = WindIndicator$new() + data$wind_indicator = wind_ind$decode(substr(yygii, 5, 5)) + + # Get station ID + station_id_group = next_group() + if (is.null(station_id_group)) { + stop("Invalid SYNOP: missing station ID") + } + + data$station_id = list(value = station_id_group) + + # Decode region + tryCatch({ + region = Region$new() + result = region$decode(station_id_group) + if (!is.null(result)) { + data$region = result + } + }, error = function(e) { + message(paste("Error decoding region:", e$message)) + }) + + # Check if next group is NIL (station did not send data) + next_check = next_group() + if (!is.null(next_check) && (next_check == "NIL" || grepl("^NIL", next_check))) { + # Station did not send data - set remaining fields to NA + data$precipitation_indicator = NA + data$weather_indicator = NA + data$lowest_cloud_base = NA + data$visibility = NA + data$cloud_cover = NA + data$surface_wind = NA + data$air_temperature = NA + data$dewpoint_temperature = NA + data$relative_humidity = NA + data$station_pressure = NA + data$sea_level_pressure = NA + data$pressure_tendency = NA + data$precipitation_s1 = NA + data$present_weather = NA + data$past_weather = NA + data$cloud_types = NA + return(data) + } + + # SECTION 1: Main observations + section1 = next_check # Use the group we already got + if (is.null(section1) || nchar(section1) < 5) { + # If section1 is invalid, try to continue anyway + message("Invalid or missing section 1") + return(data) + } + + # Decode precipitation indicator, weather indicator, cloud base, visibility + tryCatch({ + precip_ind = PrecipitationIndicator$new() + result = precip_ind$decode(substr(section1, 1, 1), country = self$country) + if (!is.null(result)) { + data$precipitation_indicator = result + } + }, error = function(e) { + message(paste("Error decoding precipitation indicator:", e$message)) + }) + + tryCatch({ + weather_ind = WeatherIndicator$new() + result = weather_ind$decode(substr(section1, 2, 2)) + if (!is.null(result)) { + data$weather_indicator = result + } + }, error = function(e) { + message(paste("Error decoding weather indicator:", e$message)) + }) + + tryCatch({ + lowest_cloud = LowestCloudBase$new() + result = lowest_cloud$decode(substr(section1, 3, 3)) + if (!is.null(result)) { + data$lowest_cloud_base = result + } + }, error = function(e) { + message(paste("Error decoding lowest cloud base:", e$message)) + }) + + tryCatch({ + vis = Visibility$new() + result = vis$decode(substr(section1, 4, 5)) + if (!is.null(result)) { + data$visibility = result + } + }, error = function(e) { + message(paste("Error decoding visibility:", e$message)) + }) + + # Get cloud cover and wind (Nddff) + nddff = next_group() + if (!is.null(nddff) && nchar(nddff) >= 5) { + tryCatch({ + cloud = CloudCover$new() + result = cloud$decode(substr(nddff, 1, 1)) + if (!is.null(result)) { + data$cloud_cover = result + } + }, error = function(e) { + message(paste("Error decoding cloud cover from:", nddff, "-", e$message)) + }, message = function(m) { + message(paste("Warning decoding group:", nddff, "-", trimws(conditionMessage(m)))) + }) + + tryCatch({ + wind = SurfaceWind$new() + wind_data = wind$decode(substr(nddff, 2, 5)) + if (!is.null(wind_data)) { + if (!is.null(data$wind_indicator)) { + if (!is.null(wind_data$speed)) { + wind_data$speed$unit = data$wind_indicator$unit + } + } + data$surface_wind = wind_data + } + }, error = function(e) { + message(paste("Error decoding surface wind from:", nddff, "-", e$message)) + }, message = function(m) { + message(paste("Warning decoding group:", nddff, "-", trimws(conditionMessage(m)))) + }) + } + + # Parse section 1 groups (1sTTT, 2sTTT, 3P0P0P0, 4PPPP, etc.) + next_grp = next_group() + while (!is.null(next_grp)) { + if (grepl("^333|^444|^555", next_grp)) { + # Start of next section + break + } + + # Try to get header, handle errors gracefully + header = tryCatch({ + as.integer(substr(next_grp, 1, 1)) + }, error = function(e) { + message(paste("Unable to parse header from group:", next_grp)) + next_grp <<- next_group() + return(NULL) + }, warning = function(w) { + message(paste("Warning parsing header from group:", next_grp)) + next_grp <<- next_group() + return(NULL) + }) + + if (is.null(header) || is.na(header)) { + next_grp = next_group() + # Skip to next iteration + if (is.null(next_grp)) break + next + } + + tryCatch({ + if (header == 1) { + # Air temperature + temp = Temperature$new() + result = temp$decode(next_grp) + if (!is.null(result)) { + data$air_temperature = result + } + } else if (header == 2) { + # Dewpoint temperature or relative humidity + sn = substr(next_grp, 2, 2) + if (sn == "9") { + rel_hum = RelativeHumidity$new() + result = rel_hum$decode(substr(next_grp, 3, 5)) + if (!is.null(result)) { + data$relative_humidity = result + } + } else { + temp = Temperature$new() + result = temp$decode(next_grp) + if (!is.null(result)) { + data$dewpoint_temperature = result + } + } + } else if (header == 3) { + # Station pressure + press = Pressure$new() + result = press$decode(substr(next_grp, 2, 5)) + if (!is.null(result)) { + data$station_pressure = result + } + } else if (header == 4) { + # Sea level pressure + press = Pressure$new() + result = press$decode(substr(next_grp, 2, 5)) + if (!is.null(result)) { + data$sea_level_pressure = result + } + } else if (header == 5) { + # Pressure tendency + press_tend = PressureTendency$new() + result = press_tend$decode(next_grp) + if (!is.null(result)) { + data$pressure_tendency = result + } + } else if (header == 6) { + # Precipitation + if (!is.null(data$precipitation_indicator) && + data$precipitation_indicator$in_group_1) { + precip = Precipitation$new() + result = precip$decode(next_grp) + if (!is.null(result)) { + data$precipitation_s1 = result + } + } + } else if (header == 7) { + # Present and past weather + if (nchar(next_grp) >= 5) { + ww = Weather$new() + result = ww$decode(substr(next_grp, 2, 3), + time_before = list(value = 6, unit = "h"), + type = "present", + weather_indicator = if (!is.null(data$weather_indicator)) data$weather_indicator$value else NULL) + if (!is.null(result)) { + data$present_weather = result + } + result2 = ww$decode(substr(next_grp, 4, 4), type = "past", + weather_indicator = if (!is.null(data$weather_indicator)) data$weather_indicator$value else NULL) + result3 = ww$decode(substr(next_grp, 5, 5), type = "past", + weather_indicator = if (!is.null(data$weather_indicator)) data$weather_indicator$value else NULL) + if (!is.null(result2) || !is.null(result3)) { + data$past_weather = list(result2, result3) + } + } + } else if (header == 8) { + # Cloud types + cloud_types = CloudType$new() + result = cloud_types$decode(next_grp) + if (!is.null(result)) { + data$cloud_types = result + } + } + }, error = function(e) { + message(paste("Error decoding group:", next_grp, "-", e$message)) + # Continue to next group + }, warning = function(w) { + message(paste("Warning decoding group:", next_grp, "-", w$message)) + # Continue to next group + }, message = function(m) { + message(paste("Warning decoding group:", next_grp, "-", trimws(conditionMessage(m)))) + # Continue to next group + }) + + next_grp = next_group() + } + + # SECTION 3: Additional observations + if (!is.null(next_grp) && next_grp == "333") { + next_grp = next_group() + cloud_layers = list() + highest_gusts = list() + group_9 = list() # Collect group 9 codes + + while (!is.null(next_grp) && !grepl("^444|^555", next_grp)) { + # Try to get header, handle errors gracefully + header = tryCatch({ + as.integer(substr(next_grp, 1, 1)) + }, error = function(e) { + message(paste("Unable to parse header from group:", next_grp)) + return(NULL) + }, warning = function(w) { + message(paste("Warning parsing header from group:", next_grp)) + return(NULL) + }) + + if (is.null(header) || is.na(header)) { + next_grp = next_group() + # Skip to next iteration + if (is.null(next_grp)) break + next + } + + tryCatch({ + # Check if it's a group 9 code (9xxxx) + if (header == 9) { + group_9[[length(group_9) + 1]] = next_grp + } else if (header == 8) { + # Cloud layers + cloud_layer = CloudLayer$new() + result = cloud_layer$decode(next_grp) + if (!is.null(result)) { + cloud_layers[[length(cloud_layers) + 1]] = result + } + } else if (header == 1) { + # Maximum temperature + temp = Temperature$new() + result = temp$decode(next_grp) + if (!is.null(result)) { + data$maximum_temperature = result + } + } else if (header == 2) { + # Minimum temperature + temp = Temperature$new() + result = temp$decode(next_grp) + if (!is.null(result)) { + data$minimum_temperature = result + } + } else if (header == 4) { + # Snow depth: 4E'sss (WMO No. 306, Section 3) + # E' = state of ground with snow/ice (code table 0975) + # sss = total snow depth in whole cm, or special values: + # 000 / 997 -> trace (< 0.5 cm) + # 998 -> snow cover not continuous + # 999 -> depth cannot be measured (drifts) + .snow_ground_states = c( + "0" = "Ground predominantly covered by ice", + "1" = "Compact or wet snow covering less than one-half of the ground", + "2" = "Compact or wet snow covering at least one-half of the ground but not completely", + "3" = "Even layer of compact or wet snow covering ground completely", + "4" = "Uneven layer of compact or wet snow covering ground completely", + "5" = "Loose dry snow covering less than one-half of the ground", + "6" = "Loose dry snow covering at least one-half of the ground but not completely", + "7" = "Even layer of loose dry snow covering ground completely", + "8" = "Uneven layer of loose dry snow covering ground completely", + "9" = "Snow covering ground completely; deep drifts within or nearby" + ) + if (nchar(next_grp) >= 5) { + e_prime = substr(next_grp, 2, 2) + sss_raw = substr(next_grp, 3, 5) + if (sss_raw != "///") { + sss_int = suppressWarnings(as.integer(sss_raw)) + if (!is.na(sss_int)) { + depth_val = if (sss_int %in% c(0L, 997L)) { + 0 + } else if (sss_int %in% c(998L, 999L)) { + NA_real_ + } else { + as.numeric(sss_int) + } + special_val = switch(as.character(sss_int), + "0" = , "997" = "trace", + "998" = "not_continuous", + "999" = "unmeasurable", + NULL + ) + state_desc = unname(.snow_ground_states[e_prime]) + data$snow_depth = list( + state_of_ground = if (!is.na(state_desc)) state_desc else NA_character_, + depth = list( + value = depth_val, + unit = "cm", + special = special_val + ) + ) + } + } + } + } else if (header == 5) { + # Section 3 group 5: only 55SSS (daily sunshine in 1/10 h) is implemented. + # Pressure-change subgroups (j1 in 1..4) and radiation (j1 in 6..9) are skipped. + if (substr(next_grp, 2, 2) == "5" && nchar(next_grp) >= 5) { + sss = substr(next_grp, 3, 5) + if (sss != "///") { + sss_int = suppressWarnings(as.integer(sss)) + if (!is.na(sss_int) && sss_int >= 0 && sss_int <= 240) { + data$sunshine = list( + value = sss_int / 10, + unit = "h", + time_before_obs = list(value = 24, unit = "h") + ) + } + } + } + } + }, error = function(e) { + message(paste("Error decoding group:", next_grp, "-", e$message)) + # Continue to next group + }, warning = function(w) { + message(paste("Warning decoding group:", next_grp, "-", w$message)) + # Continue to next group + }, message = function(m) { + message(paste("Warning decoding group:", next_grp, "-", trimws(conditionMessage(m)))) + # Continue to next group + }) + + next_grp = next_group() + } + + # Parse group 9 codes (including highest gusts) + if (length(group_9) > 0) { + idx = 1 + while (idx <= length(group_9)) { + g = group_9[[idx]] + tryCatch({ + if (nchar(g) >= 3) { + j1 = substr(g, 2, 2) # Second character + j2 = substr(g, 3, 3) # Third character + + if (j1 == "1") { + # Group 91xx - highest gusts + if (j2 == "0") { + # 910ff - gust with 10 min period + if (is.null(data$highest_gust)) { + data$highest_gust = list() + } + gust = HighestGust$new() + gust_data = gust$decode(g, + unit = if (!is.null(data$wind_indicator)) data$wind_indicator$unit else NULL, + measure_period = list(value = 10, unit = "min") + ) + if (!is.null(gust_data)) { + data$highest_gust[[length(data$highest_gust) + 1]] = gust_data + } + idx = idx + 1 + } else if (j2 == "1") { + # 911ff - gust with time before obs + # Check if next group is direction (915dd) + if (idx < length(group_9)) { + next_g = group_9[[idx + 1]] + if (substr(next_g, 1, 3) == "915") { + gust_group = paste(g, next_g, sep = " ") + idx = idx + 2 # Skip next group + } else { + gust_group = g + idx = idx + 1 + } + } else { + gust_group = g + idx = idx + 1 + } + + if (is.null(data$highest_gust)) { + data$highest_gust = list() + } + gust = HighestGust$new() + gust_data = gust$decode(gust_group, + unit = if (!is.null(data$wind_indicator)) data$wind_indicator$unit else NULL, + time_before = list(value = 6, unit = "h") # Default time before + ) + if (!is.null(gust_data)) { + data$highest_gust[[length(data$highest_gust) + 1]] = gust_data + } + } else { + idx = idx + 1 + } + } else { + idx = idx + 1 + } + } else { + idx = idx + 1 + } + }, error = function(e) { + message(paste("Error decoding group 9 code:", g, "-", e$message)) + idx <<- idx + 1 + }, warning = function(w) { + message(paste("Warning decoding group 9 code:", g, "-", w$message)) + idx <<- idx + 1 + }, message = function(m) { + message(paste("Warning decoding group 9 code:", g, "-", trimws(conditionMessage(m)))) + idx <<- idx + 1 + }) + } + } + + if (length(cloud_layers) > 0) { + data$cloud_layer = cloud_layers + } + } + + return(data) + } + ) +) + +################################################################################ +# ADDITIONAL CLASSES NEEDED FOR SYNOP +################################################################################ + +# ObservationTime +ObservationTime = R6Class("ObservationTime", + inherit = Observation, + public = list( + components = list( + list("day", 0, 2, Day), + list("hour", 2, 2, Hour) + ), + + initialize = function() { + super$initialize() + self$code_len = 4 + } + ) +) + +# WindIndicator +WindIndicator = R6Class("WindIndicator", + inherit = Observation, + public = list( + initialize = function() { + super$initialize() + self$code_len = 1 + self$valid_range = c(1, 7) + }, + + decode_internal = function(iw, ...) { + iw_int = as.integer(iw) + if (iw == "/") { + list(value = NULL, unit = NULL, estimated = NULL) + } else { + list( + value = iw_int, + unit = ifelse(iw_int < 2, "m/s", "KT"), + estimated = (iw_int %in% c(0, 3)) + ) + } + } + ) +) + +# Region +Region = R6Class("Region", + inherit = Observation, + public = list( + decode_internal = function(raw, ...) { + raw_int = as.integer(raw) + + regions = list( + I = list(c(60000, 69998)), + II = list(c(20000, 20099), c(20200, 21998), c(23001, 25998), + c(28001, 32998), c(35001, 36998), c(38001, 39998), + c(40350, 48599), c(48800, 49998), c(50001, 59998)), + III = list(c(80001, 88998)), + IV = list(c(70001, 79998)), + V = list(c(48600, 48799), c(90001, 98998)), + VI = list(c(1, 19998), c(20100, 20199), c(22001, 22998), + c(26001, 27998), c(33001, 34998), c(37001, 37998), + c(40001, 40349)), + Antarctic = list(c(89001, 89998)) + ) + + for (reg_name in names(regions)) { + for (range in regions[[reg_name]]) { + if (raw_int >= range[1] && raw_int <= range[2]) { + return(list(value = reg_name)) + } + } + } + + stop(paste("Invalid region code:", raw)) + } + ) +) + +# PrecipitationIndicator +PrecipitationIndicator = R6Class("PrecipitationIndicator", + inherit = Observation, + public = list( + initialize = function() { + super$initialize() + self$code_len = 1 + }, + + decode_internal = function(i, ...) { + kwargs = list(...) + country = kwargs$country + i_int = as.integer(i) + + list( + value = i_int, + in_group_1 = (i %in% c("0", "1")) || (i == "6" && !is.null(country) && country == "RU"), + in_group_3 = (i %in% c("0", "2")) || (i == "7" && !is.null(country) && country == "RU") + ) + } + ) +) + +# WeatherIndicator +WeatherIndicator = R6Class("WeatherIndicator", + inherit = Observation, + public = list( + initialize = function() { + super$initialize() + self$code_len = 1 + self$valid_range = c(1, 7) + }, + + decode_internal = function(ix, ...) { + ix_int = ifelse(ix == "/", NULL, as.integer(ix)) + + list( + value = ix_int, + automatic = ifelse(is.null(ix_int) || ix_int < 3, FALSE, TRUE) + ) + } + ) +) + +# LowestCloudBase +LowestCloudBase = R6Class("LowestCloudBase", + inherit = Observation, + public = list( + initialize = function() { + super$initialize() + self$code_len = 1 + self$code_table = CodeTable1600$new() + self$unit = "m" + } + ) +) + +# CodeTable1600 +CodeTable1600 = R6Class("CodeTable1600", + inherit = CodeTable, + public = list( + ranges = list( + c(0, 50), c(50, 100), c(100, 200), c(200, 300), c(300, 600), + c(600, 1000), c(1000, 1500), c(1500, 2000), c(2000, 2500), c(2500, Inf) + ), + + initialize = function() { + self$table_name = "1600" + }, + + decode_internal = function(h, ...) { + h_int = as.integer(h) + if (h_int >= 0 && h_int < length(self$ranges)) { + range = self$ranges[[h_int + 1]] + # ifelse(test, yes, NULL) raises a warning that gets caught upstream and + # silently drops the result, so use plain if/else here. + if (is.infinite(range[2])) { + list(min = range[1], max = NULL, quantifier = "isGreaterOrEqual") + } else { + list(min = range[1], max = range[2], quantifier = NULL) + } + } else { + stop(paste("Invalid cloud base code:", h)) + } + } + ) +) + +# Precipitation +Precipitation = R6Class("Precipitation", + inherit = Observation, + public = list( + initialize = function() { + super$initialize() + self$code_len = 4 + }, + + decode_internal = function(group, ...) { + kwargs = list(...) + tenths = ifelse(is.null(kwargs$tenths), FALSE, kwargs$tenths) + + if (tenths) { + rrrr = substr(group, 2, 5) + amount = Amount24$new() + list( + amount = amount$decode(rrrr), + time_before_obs = list(value = 24, unit = "h") + ) + } else { + rrr = substr(group, 2, 4) + t = substr(group, 5, 5) + amount = Amount$new() + list( + amount = amount$decode(rrr), + time_before_obs = TimeBeforeObs$new()$decode(t) + ) + } + } + ) +) + +# Amount (simplified) +Amount = R6Class("Amount", + inherit = Observation, + public = list( + initialize = function() { + super$initialize() + self$code_len = 3 + self$code_table = CodeTable3590$new() + self$unit = "mm" + } + ) +) + +# Amount24 +Amount24 = R6Class("Amount24", + inherit = Observation, + public = list( + initialize = function() { + super$initialize() + self$code_len = 4 + self$code_table = CodeTable3590A$new() + self$unit = "mm" + } + ) +) + +# CodeTable3590 (simplified) +CodeTable3590 = R6Class("CodeTable3590", + inherit = CodeTable, + public = list( + initialize = function() { + self$table_name = "3590" + }, + + decode_internal = function(RRR, ...) { + rrr_int = as.integer(RRR) + if (rrr_int <= 988) { + list(value = rrr_int, quantifier = NULL, trace = FALSE) + } else if (rrr_int == 989) { + list(value = rrr_int, quantifier = "isGreaterOrEqual", trace = FALSE) + } else if (rrr_int == 990) { + list(value = 0, quantifier = NULL, trace = TRUE) + } else if (rrr_int >= 991 && rrr_int <= 999) { + list(value = (rrr_int - 990) / 10.0, quantifier = NULL, trace = FALSE) + } else { + stop(paste("Invalid precipitation code:", RRR)) + } + } + ) +) + +# CodeTable3590A (simplified) +CodeTable3590A = R6Class("CodeTable3590A", + inherit = CodeTable, + public = list( + initialize = function() { + self$table_name = "3590A" + }, + + decode_internal = function(RRRR, ...) { + rrrr_int = as.integer(RRRR) + if (rrrr_int <= 9998) { + list(value = round(rrrr_int * 0.1, 1), quantifier = NULL, trace = FALSE) + } else if (rrrr_int == 9999) { + list(value = 0, quantifier = NULL, trace = TRUE) + } else { + stop(paste("Invalid precipitation code:", RRRR)) + } + } + ) +) + +# TimeBeforeObs (simplified) +TimeBeforeObs = R6Class("TimeBeforeObs", + inherit = Observation, + public = list( + initialize = function() { + super$initialize() + self$code_len = 1 + self$code_table = CodeTable4019$new() + self$unit = "h" + } + ) +) + +# CodeTable4019 +CodeTable4019 = R6Class("CodeTable4019", + inherit = CodeTable, + public = list( + values = c(NULL, 6, 12, 18, 24, 1, 2, 3, 9, 15), + + initialize = function() { + self$table_name = "4019" + }, + + decode_internal = function(t, ...) { + t_int = as.integer(t) + 1 + if (t_int >= 1 && t_int <= length(self$values)) { + val = self$values[[t_int]] + if (!is.null(val)) { + list(value = val, unit = "h") + } else { + NULL + } + } else { + NULL + } + } + ) +) + +# PressureTendency +PressureTendency = R6Class("PressureTendency", + inherit = Observation, + public = list( + initialize = function() { + super$initialize() + self$code_len = 4 + }, + + decode_internal = function(group, ...) { + a = substr(group, 2, 2) + ppp = substr(group, 3, 5) + + tendency = Tendency$new() + change = Change$new() + + list( + tendency = tendency$decode(a), + change = change$decode(ppp, tendency = tendency$decode(a)) + ) + } + ) +) + +# Tendency (simplified) +Tendency = R6Class("Tendency", + inherit = Observation, + public = list( + initialize = function() { + super$initialize() + self$code_len = 1 + self$valid_range = c(0, 8) + } + ) +) + +# Change (simplified) +Change = R6Class("Change", + inherit = Observation, + public = list( + initialize = function() { + super$initialize() + self$code_len = 3 + self$unit = "hPa" + }, + + decode_convert = function(val, ...) { + kwargs = list(...) + tendency = kwargs$tendency + + if (is.list(tendency) && "value" %in% names(tendency)) { + factor = ifelse(tendency$value < 5, 10.0, -10.0) + val$value = val$value / factor + } + val + } + ) +) + +# Weather +Weather = R6Class("Weather", + inherit = Observation, + public = list( + initialize = function() { + super$initialize() + self$code_len = 2 + }, + + decode_internal = function(group, ...) { + kwargs = list(...) + w_type = kwargs$type + ix = kwargs$weather_indicator + + if (w_type == "present") { + table = ifelse(!is.null(ix) && ix %in% c(5, 6, 7), "4680", "4677") + } else if (w_type == "past") { + table = ifelse(!is.null(ix) && ix %in% c(5, 6, 7), "4531", "4561") + } else { + stop(paste("Invalid weather type:", w_type)) + } + + group_int = as.integer(group) + if (is.na(group_int)) { + return(NULL) + } + + result = list(value = group_int, `_table` = table) + if (!is.null(kwargs$time_before)) { + result$time_before_obs = kwargs$time_before + } + + result + } + ) +) + +# CloudType +CloudType = R6Class("CloudType", + inherit = Observation, + public = list( + initialize = function() { + super$initialize() + self$code_len = 4 + }, + + decode_internal = function(group, ...) { + nh = substr(group, 2, 2) + cl = substr(group, 3, 3) + cm = substr(group, 4, 4) + ch = substr(group, 5, 5) + + low_cloud = LowCloud$new() + middle_cloud = MiddleCloud$new() + high_cloud = HighCloud$new() + cloud_cover = CloudCover$new() + + result = list( + low_cloud_type = low_cloud$decode(cl), + middle_cloud_type = middle_cloud$decode(cm), + high_cloud_type = high_cloud$decode(ch) + ) + + cover = cloud_cover$decode(nh) + if (nh != "/") { + if (!is.null(result$low_cloud_type) && + result$low_cloud_type$value >= 1 && + result$low_cloud_type$value <= 9) { + result$low_cloud_amount = cover + } else if (!is.null(result$middle_cloud_type) && + result$middle_cloud_type$value >= 0 && + result$middle_cloud_type$value <= 9) { + result$middle_cloud_amount = cover + } else { + result$cloud_amount = cover + } + } + + result + } + ) +) + +# LowCloud, MiddleCloud, HighCloud (simplified) +LowCloud = R6Class("LowCloud", + inherit = Observation, + public = list( + initialize = function() { + super$initialize() + self$code_len = 1 + } + ) +) + +MiddleCloud = R6Class("MiddleCloud", + inherit = Observation, + public = list( + initialize = function() { + super$initialize() + self$code_len = 1 + } + ) +) + +HighCloud = R6Class("HighCloud", + inherit = Observation, + public = list( + initialize = function() { + super$initialize() + self$code_len = 1 + } + ) +) + +# CloudLayer +CloudLayer = R6Class("CloudLayer", + inherit = Observation, + public = list( + initialize = function() { + super$initialize() + self$code_len = 4 + }, + + decode_internal = function(group, ...) { + n = substr(group, 2, 2) + c = substr(group, 3, 3) + hh = substr(group, 4, 5) + + cloud_cover = CloudCover$new() + cloud_genus = CloudGenus$new() + height = Height$new() + + list( + cloud_cover = cloud_cover$decode(n), + cloud_genus = cloud_genus$decode(c), + cloud_height = height$decode(hh) + ) + } + ) +) + +# Height (simplified) +Height = R6Class("Height", + inherit = Observation, + public = list( + initialize = function() { + super$initialize() + self$code_len = 2 + self$code_table = CodeTable1677$new() + self$unit = "m" + } + ) +) + +# CodeTable1677 (simplified) +CodeTable1677 = R6Class("CodeTable1677", + inherit = CodeTable, + public = list( + initialize = function() { + self$table_name = "1677" + }, + + decode_internal = function(hh, ...) { + hh_int = as.integer(hh) + quantifier = NULL + + if (hh_int == 0) { + list(value = 30, quantifier = "isLess") + } else if (hh_int >= 1 && hh_int <= 50) { + list(value = hh_int * 30, quantifier = NULL) + } else if (hh_int >= 56 && hh_int <= 80) { + list(value = (hh_int - 50) * 300, quantifier = NULL) + } else if (hh_int >= 81 && hh_int <= 88) { + list(value = ((hh_int - 80) * 1500) + 9000, quantifier = NULL) + } else if (hh_int == 89) { + list(value = 21000, quantifier = "isGreater") + } else if (hh_int == 99) { + list(value = 21000, quantifier = "isGreater") + } else { + stop(paste("Invalid height code:", hh)) + } + } + ) +) + +# RelativeHumidity +RelativeHumidity = R6Class("RelativeHumidity", + inherit = Observation, + public = list( + initialize = function() { + super$initialize() + self$code_len = 3 + self$valid_range = c(0, 100) + self$unit = "%" + } + ) +) + +# HighestGust - Highest wind gust +HighestGust = R6Class("HighestGust", + inherit = Observation, + public = list( + initialize = function() { + super$initialize() + self$code_len = 2 + }, + + decode_internal = function(group, ...) { + kwargs = list(...) + + # Split group into separate groups if needed + groups = strsplit(group, " ")[[1]] + + # Get type, speed and direction + # Format: 910ff or 911ff, optionally followed by 915dd + t = NULL + ff = NULL + dd = NULL + + if (length(groups) > 0) { + # First group: 910ff or 911ff + first_group = groups[1] + if (nchar(first_group) >= 5) { + t = substr(first_group, 3, 3) + ff = substr(first_group, 4, 5) + } + } + + # Second group: 915dd (direction) + if (length(groups) > 1) { + second_group = groups[2] + if (nchar(second_group) >= 5 && substr(second_group, 1, 3) == "915") { + dd = substr(second_group, 4, 5) + } + } + + # Return values + time_before = kwargs$time_before + measure_period = kwargs$measure_period + + gust_obs = Gust$new() + dir_obs = DirectionDegrees$new() + + data = list( + speed = gust_obs$decode(ff, unit = kwargs$unit), + direction = dir_obs$decode(dd) + ) + + if (!is.null(time_before)) { + data$time_before_obs = time_before + } + if (!is.null(measure_period)) { + data$measure_period = measure_period + } + + data + }, + + encode_internal = function(data, ...) { + kwargs = list(...) + time_before = kwargs$time_before + measure_period = kwargs$measure_period + output = character(0) + + # Handle list of gusts or single gust + if (is.list(data) && "speed" %in% names(data)) { + data = list(data) # Convert single gust to list + } + + for (d in data) { + # Convert time before obs, if required + if ("time_before_obs" %in% names(d)) { + if (is.null(time_before) || + (!is.null(time_before) && !identical(d$time_before_obs, time_before))) { + time_before_obs = TimeBeforeObs$new() + tt = time_before_obs$encode(d$time_before_obs) + if (tt != "//") { + output = c(output, paste0("907", tt)) + } + } + prefix = "911" + } else if ("measure_period" %in% names(d)) { + if (identical(d$measure_period, list(value = 10, unit = "min"))) { + prefix = "910" + } else { + stop("Invalid value for measure_period") + } + } else { + prefix = "910" # Default + } + + # Convert the gust + gust_obs = Gust$new() + ff = gust_obs$encode(if ("speed" %in% names(d)) d$speed else NULL) + output = c(output, paste0(prefix, ff)) + + # Convert the direction + if ("direction" %in% names(d) && !is.null(d$direction)) { + dir_obs = DirectionDegrees$new() + dd = dir_obs$encode(d$direction) + output = c(output, paste0("915", dd)) + } + } + + paste(output, collapse = " ") + } + ) +) + +# Gust - Wind gust speed (internal class for HighestGust) +Gust = R6Class("Gust", + inherit = Observation, + public = list( + initialize = function() { + super$initialize() + self$code_len = 2 + }, + + decode_internal = function(ff, ...) { + # Decode wind gust speed - same as WindSpeed + self$decode_value(ff, ...) + }, + + encode_internal = function(data, ...) { + # Encode wind gust speed - same as WindSpeed + if (is.null(data)) { + return(paste(rep(self$null_char, self$code_len), collapse = "")) + } + value = if (is.list(data)) data$value else data + if (!is.null(value) && value > 99) { + return(paste0("99 00", sprintf("%02d", value))) + } + sprintf("%02d", as.integer(value)) + } + ) +) + +################################################################################ +# EXPORT FUNCTIONS +################################################################################ + +# Helper function to create observation instances +create_observation = function(class_name, ...) { + class_map = list( + "CloudCover" = CloudCover, + "CloudGenus" = CloudGenus, + "Day" = Day, + "DirectionCardinal" = DirectionCardinal, + "DirectionDegrees" = DirectionDegrees, + "Hour" = Hour, + "Minute" = Minute, + "SignedTemperature" = SignedTemperature, + "Visibility" = Visibility, + "Temperature" = Temperature, + "Pressure" = Pressure, + "SurfaceWind" = SurfaceWind, + "WindSpeed" = WindSpeed, + "SYNOP" = SYNOP + ) + + if (!class_name %in% names(class_map)) { + stop(paste("Unknown observation class:", class_name)) + } + + class_map[[class_name]]$new(...) +} diff --git a/R/zzz.R b/R/zzz.R index edeeeeda..afe03ef0 100644 --- a/R/zzz.R +++ b/R/zzz.R @@ -2,7 +2,7 @@ #' @return Empty env #' @keywords internal #' @noRd -env <- new.env(parent = emptyenv()) +env = new.env(parent = emptyenv()) globalVariables(c("DZ", "GG", "MC", "NSP", "POST.x", "ROK", "id", "..status_cols", "status_cols")) \ No newline at end of file diff --git a/README.md b/README.md index 61329e7e..97dbc2b7 100644 --- a/README.md +++ b/README.md @@ -42,57 +42,63 @@ install_github("bczernecki/climate") ### Meteorological data -- **meteo_ogimet()** - Downloading hourly and daily meteorological data from the SYNOP stations available in the ogimet.com collection. -Any meteorological (aka SYNOP) station working under the World Meteorological Organizaton framework after year 2000 should be accessible. +- 🌍 **meteo_ogimet()** - Downloading hourly and daily meteorological data from the SYNOP stations available in the ogimet.com collection. +Any meteorological (aka SYNOP) station working under the World Meteorological Organization framework after year 2000 should be accessible. +Two backends are available and selected automatically: raw **SYNOP decoding** (`source = "synop"`, default for `interval = "hourly"`) and **HTML scraping** (`source = "html"`, default for `interval = "daily"`). +Country-level bulk downloads are supported via the `country_name` argument (SYNOP backend only). -- **meteo_imgw()** - Downloading hourly, daily, and monthly meteorological data from the SYNOP/CLIMATE/PRECIP stations available in the danepubliczne.imgw.pl collection. -It is a wrapper for `meteo_monthly()`, `meteo_daily()`, and `meteo_hourly()`. If 10-min dataset is needed then consider using **`meteo_imgw_datastore()`** +- 🌍 **meteo_noaa_hourly()** - Downloading hourly NCEI/NOAA Integrated Surface Hourly (ISH) meteorological data - Some stations have > 100 years long history of observations -- **meteo_noaa_hourly()** - Downloading hourly NCEI/NOAA Integrated Surface Hourly (ISH) meteorological data - Some stations have > 100 years long history of observations +- 🌍 **meteo_noaa_co2()** - Downloading monthly CO2 measurements from Mauna Loa Observatory -- **meteo_noaa_co2()** - Downloading monthly CO2 measurements from Mauna Loa Observatory +- 🌍 **sounding_wyoming()** - Downloading measurements of the vertical profile of atmosphere (aka rawinsonde data) + +- 🇵🇱 **meteo_imgw()** - Downloading hourly, daily, and monthly meteorological data from the Polish met service across +all types of stations (i.e. SYNOP/CLIMATE/PRECIP ) available in the danepubliczne.imgw.pl collection. +It is a wrapper for `meteo_monthly()`, `meteo_daily()`, `meteo_hourly()` and `meteo_imgw_datastore()` which gives access from montly to even to 10-min dataset. -- **sounding_wyoming()** - Downloading measurements of the vertical profile of atmosphere (aka rawinsonde data) -### Hydrological data +### (Polish) Hydrological data -- **hydro_imgw()** - Downloading hourly, daily, and monthly hydrological data from stations available in the +- 🇵🇱 **hydro_imgw()** - Downloading hourly, daily, and monthly hydrological data from stations available in the danepubliczne.imgw.pl collection. It is a wrapper for previously developed set of functions such as: `hydro_monthly()`, and `hydro_daily()` -- **hydro_imgw_datastore()** - Downloading hourly and subhourly hydrological data from the IMGW-PIB hydro telemetry stations. +- 🇵🇱 **hydro_imgw_datastore()** - Downloading hourly and subhourly hydrological data from the IMGW-PIB hydro telemetry stations. ### Auxiliary functions and datasets -- **stations_ogimet()** - Downloading information about all stations available in the selected +- 🌍 **stations_ogimet()** - Downloading information about all stations available in the selected country in the Ogimet repository -- **nearest_stations_ogimet()** - Downloading information about nearest stations to the selected point using Ogimet repository -- **nearest_stations_noaa()** - Downloading information about nearest stations to the selected point available for the selected country in the NOAA ISH meteorological repository -- **nearest_stations_imgw()** - List of nearby meteorological or hydrological IMGW-PIB stations in Poland -- **imgw_meteo_stations** - Built-in metadata from the IMGW-PIB repository for meteorological stations, their geographical coordinates, and ID numbers -- **imgw_hydro_stations** - Built-in metadata from the IMGW-PIB repository for hydrological stations, their geographical coordinates, and ID numbers -- **stations_meteo_imgw_telemetry** - Downloading complete and up-to-date information about coordinates for IMGW-PIB telemetry meteorological stations -- **stations_hydro_imgw_telemetry** - Downloading complete and up-to-date information about coordinates for IMGW-PIB telemetry hydrological stations +- 🌍 **nearest_stations_ogimet()** - Downloading information about nearest stations to the selected point using Ogimet repository +- 🌍 **nearest_stations_noaa()** - Downloading information about nearest stations to the selected point available for the selected country in the NOAA ISH meteorological repository +- 🇵🇱 **nearest_stations_imgw()** - List of nearby meteorological or hydrological IMGW-PIB stations in Poland +- 🇵🇱 **imgw_meteo_stations** - Built-in metadata from the IMGW-PIB repository for meteorological stations, their geographical coordinates, and ID numbers +- 🇵🇱 **imgw_hydro_stations** - Built-in metadata from the IMGW-PIB repository for hydrological stations, their geographical coordinates, and ID numbers +- 🇵🇱 **stations_meteo_imgw_telemetry** - Downloading complete and up-to-date information about coordinates for IMGW-PIB telemetry meteorological stations +- 🇵🇱 **stations_hydro_imgw_telemetry** - Downloading complete and up-to-date information about coordinates for IMGW-PIB telemetry hydrological stations + +- 🌍 **parser()** - Decoding raw SYNOP meteorological messages into structured R lists or data frames. For a full walkthrough see the [SYNOP Messages vignette](https://bczernecki.github.io/climate/articles/synop_parser.html). ## Example 1 #### Download hourly dataset from NCEI/NOAA ISH meteorological repository: -``` r0 +``` r library(climate) -noaa <- meteo_noaa_hourly(station = "123300-99999", year = 2018:2019) # station ID: Poznan, Poland +noaa = meteo_noaa_hourly(station = "123300-99999", year = 2018:2019) # station ID: Poznan, Poland head(noaa) - -# year month day hour lon lat alt t2m dpt2m ws wd slp visibility -# 2019 1 1 0 16.85 52.417 84 3.3 2.3 5 220 1025.0 6000 -# 2019 1 1 1 16.85 52.417 84 3.7 3.0 4 220 1024.2 1500 -# 2019 1 1 2 16.85 52.417 84 4.2 3.6 4 220 1022.5 1300 -# 2019 1 1 3 16.85 52.417 84 5.2 4.6 5 240 1021.2 1900 ``` +| year | month | day | hour | lon | lat | alt | t2m | dpt2m | ws | wd | slp | visibility | +|------|-------|-----|------|-------|--------|-----|-----|-------|----|-----|--------|------------| +| 2019 | 1 | 1 | 0 | 16.85 | 52.417 | 84 | 3.3 | 2.3 | 5 | 220 | 1025.0 | 6000 | +| 2019 | 1 | 1 | 1 | 16.85 | 52.417 | 84 | 3.7 | 3.0 | 4 | 220 | 1024.2 | 1500 | +| 2019 | 1 | 1 | 2 | 16.85 | 52.417 | 84 | 4.2 | 3.6 | 4 | 220 | 1022.5 | 1300 | +| 2019 | 1 | 1 | 3 | 16.85 | 52.417 | 84 | 5.2 | 4.6 | 5 | 240 | 1021.2 | 1900 | + ## Example 2 -#### Finding a nearest meteorological stations in a given country using NCEI/NOAA ISH data source: +#### Finding a nearest meteorological stations in a given country using NCEI/NOAA ISH data source (used in Ex. 1): -``` r1 -library(climate) +``` r # find 100 nearest UK stations to longitude 1W and latitude 53N : nearest_stations_ogimet(country = "United+Kingdom", @@ -101,80 +107,98 @@ nearest_stations_ogimet(country = "United+Kingdom", point = c(-1, 53), no_of_stations = 100 ) - -# wmo_id station_names lon lat alt distance [km] -# 03354 Nottingham Weather Centre -1.250005 53.00000 117 28.04973 -# 03379 Cranwell -0.500010 53.03333 67 56.22175 -# 03377 Waddington -0.516677 53.16667 68 57.36093 -# 03373 Scampton -0.550011 53.30001 57 60.67897 -# 03462 Wittering -0.466676 52.61668 84 73.68934 -# 03544 Church Lawford -1.333340 52.36667 107 80.29844 -# ... ``` +| wmo_id | station_names | lon | lat | alt | distance [km] | +|--------|---------------------------|-----------|----------|-----|---------------| +| 03354 | Nottingham Weather Centre | -1.250005 | 53.00000 | 117 | 28.04973 | +| 03379 | Cranwell | -0.500010 | 53.03333 | 67 | 56.22175 | +| 03377 | Waddington | -0.516677 | 53.16667 | 68 | 57.36093 | +| 03373 | Scampton | -0.550011 | 53.30001 | 57 | 60.67897 | +| 03462 | Wittering | -0.466676 | 52.61668 | 84 | 73.68934 | +| 03544 | Church Lawford | -1.333340 | 52.36667 | 107 | 80.29844 | +| ... | ... | ... | ... | ... | ... | + ![100 nearest stations to given coordinates in UK](http://iqdata.eu/kolokwium/uk.png) ## Example 3 #### Downloading daily (or hourly) data from a global (OGIMET) repository knowing its ID (see also `nearest_stations_ogimet()`): ``` r -library(climate) +# Daily summary — uses HTML backend by default o = meteo_ogimet(date = c(Sys.Date() - 5, Sys.Date() - 1), interval = "daily", coords = FALSE, station = 12330) head(o) +``` -#> station_ID Date TemperatureCAvg TemperatureCMax TemperatureCMin TdAvgC HrAvg WindkmhDir -#> 3 12330 2019-12-21 8.8 13.2 4.9 5.3 79.3 SSE -#> 4 12330 2019-12-20 5.4 8.5 -1.2 4.5 92.4 ESE -#> 5 12330 2019-12-19 3.8 10.3 -3.0 1.9 89.6 SW -#> 6 12330 2019-12-18 6.3 9.0 2.2 4.1 84.8 S -#> 7 12330 2019-12-17 4.9 7.6 0.3 2.9 87.2 SSE -#> WindkmhInt WindkmhGust PresslevHp Precmm TotClOct lowClOct SunD1h VisKm SnowDepcm PreselevHp -#> 3 11.4 39.6 995.9 1.8 3.6 2.0 6.7 21.4 NA -#> 4 15.0 NA 1015.0 0.0 6.4 0.6 1.0 8.0 NA -#> 5 7.1 NA 1020.4 0.0 5.2 5.9 2.5 14.1 NA -#> 6 9.2 NA 1009.2 0.0 5.7 2.7 1.4 12.2 NA -#> 7 7.2 NA 1010.8 0.1 6.2 4.6 13.0 NA +| station_ID | Date | TempCAvg | TempCMax | TempCMin | TdAvgC | HrAvg | WindDir | WindInt | WindGust | PressHp | Precmm | TotClOct | lowClOct | SunD1h | VisKm | +|------------|------------|----------|----------|----------|--------|-------|---------|---------|----------|---------|--------|----------|----------|--------|-------| +| 12330 | 2019-12-21 | 8.8 | 13.2 | 4.9 | 5.3 | 79.3 | SSE | 11.4 | 39.6 | 995.9 | 1.8 | 3.6 | 2.0 | 6.7 | 21.4 | +| 12330 | 2019-12-20 | 5.4 | 8.5 | -1.2 | 4.5 | 92.4 | ESE | 15.0 | NA | 1015.0 | 0.0 | 6.4 | 0.6 | 1.0 | 8.0 | +| 12330 | 2019-12-19 | 3.8 | 10.3 | -3.0 | 1.9 | 89.6 | SW | 7.1 | NA | 1020.4 | 0.0 | 5.2 | 5.9 | 2.5 | 14.1 | +| 12330 | 2019-12-18 | 6.3 | 9.0 | 2.2 | 4.1 | 84.8 | S | 9.2 | NA | 1009.2 | 0.0 | 5.7 | 2.7 | 1.4 | 12.2 | +| 12330 | 2019-12-17 | 4.9 | 7.6 | 0.3 | 2.9 | 87.2 | SSE | 7.2 | NA | 1010.8 | 0.1 | 6.2 | 4.6 | NA | 13.0 | + +``` r +# Hourly observations — decoded from raw SYNOP messages by default +h = meteo_ogimet(date = c("2009-12-01", "2009-12-04"), + interval = "hourly", + station = 12330) +head(h) +``` + +| date | station | t2m | dpt2m | rel_hum | tmax | tmin | wd | ws | gust | press | slp | precip | Nt | snow | +|---------------------|---------|------|-------|---------|-------|-------|-----|-----|------|--------|--------|--------|----|------| +| 2009-12-01 00:00:00 | 12330 | 2.0 | 0.0 | 93 | NA | NA | 210 | 5 | NA | 1007.4 | 1016.3 | NA | 8 | NA | +| 2009-12-01 06:00:00 | 12330 | 1.0 | -1.0 | 92 | NA | NA | 200 | 3 | NA | 1009.8 | 1018.8 | NA | 8 | NA | +| 2009-12-01 12:00:00 | 12330 | 3.0 | 1.0 | 93 | 5.8 | 2.9 | 230 | 4 | NA | 1011.5 | 1020.4 | NA | 8 | NA | +| 2009-12-01 18:00:00 | 12330 | 2.0 | 0.0 | 93 | NA | NA | 240 | 3 | NA | 1013.3 | 1022.1 | NA | 7 | NA | + +``` r +# Country-level bulk download — all stations in a country for a given day +poland = meteo_ogimet(interval = "hourly", + country_name = "Poland", + date = c("2009-12-15", "2009-12-15")) +nrow(poland) #> several hundred rows (one per observation per station) ``` ## Example 4 #### Downloading monthly/daily/hourly meteorological/hydrological data from the Polish (IMGW-PIB) repository: -``` r3 +``` r m = meteo_imgw(interval = "monthly", rank = "synop", year = 2000, coords = TRUE) head(m) -#> rank id X Y station yy mm tmax_abs -#> 575 SYNOPTYCZNA 353230295 23.16228 53.10726 BIAŁYSTOK 2000 1 5.3 -#> 577 SYNOPTYCZNA 353230295 23.16228 53.10726 BIAŁYSTOK 2000 2 10.6 -#> 578 SYNOPTYCZNA 353230295 23.16228 53.10726 BIAŁYSTOK 2000 3 14.8 -#> 579 SYNOPTYCZNA 353230295 23.16228 53.10726 BIAŁYSTOK 2000 4 27.8 -#> 580 SYNOPTYCZNA 353230295 23.16228 53.10726 BIAŁYSTOK 2000 5 29.3 -#> 581 SYNOPTYCZNA 353230295 23.16228 53.10726 BIAŁYSTOK 2000 6 32.6 -#> tmax_mean tmin_abs tmin_mean t2m_mean_mon t5cm_min rr_monthly -#> 575 0.4 -16.5 -4.5 -2.1 -23.5 34.2 -#> 577 4.1 -10.4 -1.4 1.3 -12.9 25.4 -#> 578 6.2 -6.4 -1.0 2.4 -9.4 45.5 -#> 579 17.9 -4.6 4.7 11.5 -8.1 31.6 -#> 580 21.3 -4.3 5.7 13.8 -8.3 9.4 -#> 581 23.1 1.0 9.6 16.6 -1.8 36.4 +``` + +| rank | id | X | Y | station | yy | mm | tmax_abs | tmax_mean | tmin_abs | tmin_mean | t2m_mean_mon | t5cm_min | rr_monthly | +|-------|-----------|----------|----------|-----------|------|----|----------|-----------|----------|-----------|--------------|----------|------------| +| SYNOP | 353230295 | 23.16228 | 53.10726 | BIAŁYSTOK | 2000 | 1 | 5.3 | 0.4 | -16.5 | -4.5 | -2.1 | -23.5 | 34.2 | +| SYNOP | 353230295 | 23.16228 | 53.10726 | BIAŁYSTOK | 2000 | 2 | 10.6 | 4.1 | -10.4 | -1.4 | 1.3 | -12.9 | 25.4 | +| SYNOP | 353230295 | 23.16228 | 53.10726 | BIAŁYSTOK | 2000 | 3 | 14.8 | 6.2 | -6.4 | -1.0 | 2.4 | -9.4 | 45.5 | +| SYNOP | 353230295 | 23.16228 | 53.10726 | BIAŁYSTOK | 2000 | 4 | 27.8 | 17.9 | -4.6 | 4.7 | 11.5 | -8.1 | 31.6 | +| SYNOP | 353230295 | 23.16228 | 53.10726 | BIAŁYSTOK | 2000 | 5 | 29.3 | 21.3 | -4.3 | 5.7 | 13.8 | -8.3 | 9.4 | +| SYNOP | 353230295 | 23.16228 | 53.10726 | BIAŁYSTOK | 2000 | 6 | 32.6 | 23.1 | 1.0 | 9.6 | 16.6 | -1.8 | 36.4 | +``` r h = hydro_imgw(interval = "daily", year = 2010:2011) head(h) - id station riv_or_lake date hyy idhyy dd H Q T mm thick -1 150210180 ANNOPOL Wisła (2) 2009-11-01 2010 1 1 287 436 NA 11 NA -2 150210180 ANNOPOL Wisła (2) 2009-11-02 2010 1 2 282 412 NA 11 NA -3 150210180 ANNOPOL Wisła (2) 2009-11-03 2010 1 3 272 368 NA 11 NA -4 150210180 ANNOPOL Wisła (2) 2009-11-04 2010 1 4 268 352 NA 11 NA -5 150210180 ANNOPOL Wisła (2) 2009-11-05 2010 1 5 264 336 NA 11 NA -6 150210180 ANNOPOL Wisła (2) 2009-11-06 2010 1 6 260 320 NA 11 NA ``` +| id | station | riv_or_lake | date | hyy | idhyy | dd | H | Q | T | mm | thick | +|-----------|---------|-------------|------------|------|-------|----|-----|-----|----|----|-------| +| 150210180 | ANNOPOL | Wisła (2) | 2009-11-01 | 2010 | 1 | 1 | 287 | 436 | NA | 11 | NA | +| 150210180 | ANNOPOL | Wisła (2) | 2009-11-02 | 2010 | 1 | 2 | 282 | 412 | NA | 11 | NA | +| 150210180 | ANNOPOL | Wisła (2) | 2009-11-03 | 2010 | 1 | 3 | 272 | 368 | NA | 11 | NA | +| 150210180 | ANNOPOL | Wisła (2) | 2009-11-04 | 2010 | 1 | 4 | 268 | 352 | NA | 11 | NA | +| 150210180 | ANNOPOL | Wisła (2) | 2009-11-05 | 2010 | 1 | 5 | 264 | 336 | NA | 11 | NA | +| 150210180 | ANNOPOL | Wisła (2) | 2009-11-06 | 2010 | 1 | 6 | 260 | 320 | NA | 11 | NA | + ## Example 5 #### Create Walter & Lieth climatic diagram based on downloaded data -``` r4 +``` r library(climate) library(dplyr) @@ -192,13 +216,16 @@ monthly_summary = as.data.frame(t(monthly_summary[, c(5,2,3,4)])) monthly_summary = round(monthly_summary, 1) colnames(monthly_summary) = month.abb print(monthly_summary) +``` -# Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec -# prec 37.1 31.3 38.5 31.3 53.9 60.8 94.8 59.6 40.5 39.7 35.7 38.6 -# tmax 8.7 11.2 17.2 23.8 28.3 31.6 32.3 31.8 26.9 21.3 14.3 9.8 -# tmin -15.0 -11.9 -7.6 -3.3 1.0 5.8 8.9 7.5 2.7 -2.4 -5.2 -10.4 -# tavg -1.0 0.5 3.7 9.4 14.4 17.4 19.4 19.0 14.3 9.1 4.5 0.8 +| | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec | +|------|-------|-------|------|------|------|------|------|------|------|------|------|-------| +| prec | 37.1 | 31.3 | 38.5 | 31.3 | 53.9 | 60.8 | 94.8 | 59.6 | 40.5 | 39.7 | 35.7 | 38.6 | +| tmax | 8.7 | 11.2 | 17.2 | 23.8 | 28.3 | 31.6 | 32.3 | 31.8 | 26.9 | 21.3 | 14.3 | 9.8 | +| tmin | -15.0 | -11.9 | -7.6 | -3.3 | 1.0 | 5.8 | 8.9 | 7.5 | 2.7 | -2.4 | -5.2 | -10.4 | +| tavg | -1.0 | 0.5 | 3.7 | 9.4 | 14.4 | 17.4 | 19.4 | 19.0 | 14.3 | 9.1 | 4.5 | 0.8 | +``` r # create plot with use of the "climatol" package: climatol::diagwl(monthly_summary, mlab = "en", est = "POZNAŃ", alt = NA, @@ -210,7 +237,7 @@ climatol::diagwl(monthly_summary, mlab = "en", ## Example 6 #### Download monthly CO2 dataset from Mauna Loa observatory -``` r5 +``` r library(climate) library(ggplot2) library(ggthemes) @@ -256,12 +283,60 @@ res["Date"] = pd.TimedeltaIndex(res["Date"], unit="d") + dt.datetime(1970,1,1) res.head >>> res[res.columns[0:7]].head() -# station_ID Date TemperatureCAvg ... TemperatureCMin TdAvgC HrAvg -#0 72503.0 2022-06-15 23.5 ... 19.4 10.9 45.2 -#1 72503.0 2022-06-14 25.0 ... 20.6 16.1 59.0 -#2 72503.0 2022-06-13 20.4 ... 17.8 16.0 74.8 -#3 72503.0 2022-06-12 21.3 ... 18.3 12.0 57.1 -#4 72503.0 2022-06-11 22.6 ... 17.8 8.1 40.1 +``` + +| station_ID | Date | TemperatureCAvg | TemperatureCMin | TdAvgC | HrAvg | +|------------|------------|-----------------|-----------------|--------|-------| +| 72503.0 | 2022-06-15 | 23.5 | 19.4 | 10.9 | 45.2 | +| 72503.0 | 2022-06-14 | 25.0 | 20.6 | 16.1 | 59.0 | +| 72503.0 | 2022-06-13 | 20.4 | 17.8 | 16.0 | 74.8 | +| 72503.0 | 2022-06-12 | 21.3 | 18.3 | 12.0 | 57.1 | +| 72503.0 | 2022-06-11 | 22.6 | 17.8 | 8.1 | 40.1 | + +## Example 8 +#### Decode raw SYNOP messages with `parser()` + +The `parser()` function decodes FM-12 SYNOP meteorological messages into structured R objects. +For a detailed guide including all parameters and output formats, see the [SYNOP Messages vignette](https://bczernecki.github.io/climate/articles/synop_parser.html). + +```r +library(climate) + +synop_code = "AAXX 01004 88889 12782 61506 10094 20047 30111 40197 53007 60001 81541" + +# Decode a single message — returns a named list +result = parser(synop_code) +result$station_id$value #> "88889" +result$air_temperature$value #> 9.4 +result$wind_speed$value #> 6 +result$visibility$value #> 40000 +result$sea_level_pressure$value #> 1019.7 +``` + +```r +# Return a tidy data frame with one row per message +df = parser(synop_code, as_data_frame = TRUE) +df +``` + +| station_type | station_id | region | obs_day | obs_hour | wind_unit | wind_estimated | visibility | cloud_cover | wind_direction | wind_speed | air_temperature | dewpoint_temperature | +|--------------|------------|--------|---------|----------|-----------|----------------|------------|-------------|----------------|------------|-----------------|----------------------| +| AAXX | 88889 | III | 1 | 0 | KT | FALSE | 40000 | 6 | 150 | 6 | 9.4 | 4.7 | + +| station_pressure | sea_level_pressure | pressure_tendency | pressure_change | precipitation_amount | precipitation_time | cloud_base_min | cloud_base_max | low_cloud_type | middle_cloud_type | high_cloud_type | low_cloud_amount | source | +|------------------|--------------------|-------------------|-----------------|----------------------|--------------------|----------------|----------------|----------------|-------------------|-----------------|------------------|-------------------------| +| 1011.1 | 1019.7 | 0 | 7 | 0 | 6 | 1500 | 2000 | 5 | 4 | 1 | 1 | AAXX 01004 88889 12782… | + +```r +# Decode multiple SYNOP messages at once +msgs = c( + "AAXX 01004 88889 12782 61506 10094 20047 30111 40197 53007 60001 81541", + "AAXX 10124 26477 32560 83102 10156 20106 38528 40128 52003 60001 333 56017" +) +df2 = parser(msgs, as_data_frame = TRUE) +nrow(df2) #> 2 +df2$station_id #> c("88889", "26477") +df2$source # original SYNOP strings preserved in last column ``` diff --git a/data-raw/01_example.R b/data-raw/01_example.R index 829bdd5d..a7fee6f4 100644 --- a/data-raw/01_example.R +++ b/data-raw/01_example.R @@ -1,5 +1,5 @@ library(climate) -df <- meteo_ogimet(interval = "hourly", date = c("2018-01-01", "2018-12-31"), +df = meteo_ogimet(interval = "hourly", date = c("2018-01-01", "2018-12-31"), station = "01008") #> [1] "01008" #> |======================================================================| 100 % diff --git a/data-raw/02_example.R b/data-raw/02_example.R index 2a6513aa..fcf247be 100644 --- a/data-raw/02_example.R +++ b/data-raw/02_example.R @@ -1,16 +1,16 @@ library(climate) # downloading data -df <- meteo_ogimet(interval = "hourly", date = c("2018-01-01", "2018-12-31"), +df = meteo_ogimet(interval = "hourly", date = c("2018-01-01", "2018-12-31"), station = c("01008")) library(openair) # external package for plotting wind roses # converting wind direction from character into degrees -wdir <- data.frame(ddd = c("CAL", "N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE", +wdir = data.frame(ddd = c("CAL", "N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE", "S", "SSW", "SW", "WSW", "W", "WNW", "NW", "NNW"), dir = c(NA, 0:15 * 22.5), stringsAsFactors = FALSE) # changing the date column to the format required by the openair package -df$date <- as.POSIXct(df$Date, tz = "UTC") -df <- merge(df, wdir, by = "ddd", all.x = TRUE) # joining two datasets -df$ws <- df$ffkmh / 3.6 # converting to m/s from km/h -df$gust <- df$Gustkmh / 3.6 # converting to m/s from km/h +df$date = as.POSIXct(df$Date, tz = "UTC") +df = merge(df, wdir, by = "ddd", all.x = TRUE) # joining two datasets +df$ws = df$ffkmh / 3.6 # converting to m/s from km/h +df$gust = df$Gustkmh / 3.6 # converting to m/s from km/h windRose(mydata = df, ws = "ws", wd = "dir", type = "season", paddle = FALSE, main = "Svalbard Lufthavn (2018)", ws.int = 3, dig.lab = 3, layout = c(4, 1)) diff --git a/data-raw/04_example.R b/data-raw/04_example.R index 39cb8fee..55ca706f 100644 --- a/data-raw/04_example.R +++ b/data-raw/04_example.R @@ -1,6 +1,6 @@ library(climate) -profile <- sounding_wyoming(wmo_id = 12120,yy = 2019, mm = 4, dd = 4, hh = 0) -df <- profile[[1]] +profile = sounding_wyoming(wmo_id = 12120,yy = 2019, mm = 4, dd = 4, hh = 0) +df = profile[[1]] colnames(df)[c(1, 3:4)] = c("press", "temp", "dewpt") # changing column names RadioSonde::plotsonde(df, winds = FALSE, title = "2019-04-04 00UTC (LEBA, PL)", col = c("red", "blue"), lwd = 3) diff --git a/data-raw/05_example.R b/data-raw/05_example.R index 74e29201..2a2eb73d 100644 --- a/data-raw/05_example.R +++ b/data-raw/05_example.R @@ -5,7 +5,7 @@ library(sf) library(tmap) library(rnaturalearth) library(climate) -ms <- meteo_imgw("monthly", "synop", year = 1978:2017, coords = TRUE) +ms = meteo_imgw("monthly", "synop", year = 1978:2017, coords = TRUE) # calculating annual values ms %>% filter(!(mm > 5 && mm < 9 && t2m_mean_mon == 0)) %>% @@ -16,15 +16,15 @@ ms %>% spread(yy, annual_mean_t2m) %>% na.omit() -> trend # extracting trends -regression <- function(x) { - df <- data.frame(yy = 1978:2017, temp = as.numeric(x)) +regression = function(x) { + df = data.frame(yy = 1978:2017, temp = as.numeric(x)) coef(lm(temp ~ yy, data = df))[2] } -trend$coef <- round(apply(trend[, -1:-4], 1, regression) * 100, 1) -trend <- st_as_sf(trend, coords = c("X", "Y"), crs = 4326) +trend$coef = round(apply(trend[, -1:-4], 1, regression) * 100, 1) +trend = st_as_sf(trend, coords = c("X", "Y"), crs = 4326) # mapping the results -world <- ne_countries(scale = "medium", returnclass = "sf") -tm <- tm_shape(world) + tm_borders() + +world = ne_countries(scale = "medium", returnclass = "sf") +tm = tm_shape(world) + tm_borders() + tm_shape(trend, is.master = TRUE) + tm_dots(col = "coef", size = 4) + tm_shape(trend) + tm_text(text = "coef") tm diff --git a/data-raw/parametry_przyklad_synop.R b/data-raw/parametry_przyklad_synop.R deleted file mode 100644 index 892c3231..00000000 --- a/data-raw/parametry_przyklad_synop.R +++ /dev/null @@ -1,27 +0,0 @@ -library(imgw) -library(stringr) -synop <- meteo_daily("synop", year=2010) -daily <- synop -head(daily) - -abbrev <- read.csv("data-raw/parametry_skrot.csv", stringsAsFactors = F) -saveRDS(abbrev, file="data/abbrev.rda") -abbrev <- read.csv("data-raw/parametry_skrot.csv", stringsAsFactors = F) - -orig_columns <- trimws(gsub("\\s+", " ", colnames(daily))) # remove double spaces - -# fullname polish, no changes required: -abbrev$fullname[match(orig_columns, abbrev$fullname)] - -# abbrev english -colnames(synop) <- abbrev$abbr_ang[match(orig_columns, abbrev$fullname)] -head(synop) - -# fullname english -colnames(synop) <- abbrev$fullname_ang[match(orig_columns, abbrev$fullname)] -head(synop) - - -# zastanowic sie nad usunieciem zduplikowanych kolumn (Np. nazwa stacji) -synop <- synop[,!duplicated(colnames(synop))] -head(synop) diff --git a/data-raw/unique_meteo_parameters.R b/data-raw/unique_meteo_parameters.R index 0b6ce10b..31feb7bc 100644 --- a/data-raw/unique_meteo_parameters.R +++ b/data-raw/unique_meteo_parameters.R @@ -1,14 +1,14 @@ library(climate) library(stringr) -m_hs <- meteo_metadata_imgw("hourly", "synop") -m_hc <- meteo_metadata_imgw("hourly", "climate") -m_ds <- meteo_metadata_imgw("daily", "synop") -m_dc <- meteo_metadata_imgw("daily", "climate") -m_dp <- meteo_metadata_imgw("daily", "precip") -m_ms <- meteo_metadata_imgw("monthly", "synop") -m_mc <- meteo_metadata_imgw("monthly", "climate") -m_mp <- meteo_metadata_imgw("monthly", "precip") +m_hs = meteo_metadata_imgw("hourly", "synop") +m_hc = meteo_metadata_imgw("hourly", "climate") +m_ds = meteo_metadata_imgw("daily", "synop") +m_dc = meteo_metadata_imgw("daily", "climate") +m_dp = meteo_metadata_imgw("daily", "precip") +m_ms = meteo_metadata_imgw("monthly", "synop") +m_mc = meteo_metadata_imgw("monthly", "climate") +m_mp = meteo_metadata_imgw("monthly", "precip") all_meteo_metadata = dplyr::bind_rows( m_hs[[1]], @@ -32,7 +32,7 @@ unique_meteo_parameters = sort(unique_meteo_parameters) View(unique_meteo_parameters) # sprawdzenie czy stworzona recznie baza daje sie polaczyc left_joinem: -skroty <- read.csv("data-raw/parametry_skrot.csv", stringsAsFactors = F) -wsio <- data.frame(fullname = unique_meteo_parameters) -laczenie <- dplyr::left_join(wsio,skroty) +skroty = read.csv("data-raw/parametry_skrot.csv", stringsAsFactors = F) +wsio = data.frame(fullname = unique_meteo_parameters) +laczenie = dplyr::left_join(wsio,skroty) head(laczenie) diff --git a/man/compute_relative_humidity.Rd b/man/compute_relative_humidity.Rd new file mode 100644 index 00000000..1882581d --- /dev/null +++ b/man/compute_relative_humidity.Rd @@ -0,0 +1,44 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/compute_relative_humidity.R +\name{compute_relative_humidity} +\alias{compute_relative_humidity} +\title{Compute relative humidity from air temperature and dew-point temperature} +\usage{ +compute_relative_humidity(t2m, dpt2m) +} +\arguments{ +\item{t2m}{Numeric vector. Air temperature (2 m) in degrees Celsius.} + +\item{dpt2m}{Numeric vector. Dew-point temperature (2 m) in degrees Celsius. +Must be the same length as \code{t2m}.} +} +\value{ +Numeric vector of relative humidity values in percent (0–100). +Returns \code{NA} where either input is \code{NA}. Values are not clamped, so +rounding errors may produce results marginally outside 0–100. +} +\description{ +Uses the August-Roche-Magnus approximation to derive relative humidity from +the 2-metre air temperature and dew-point temperature. +} +\details{ +The August-Roche-Magnus approximation is: + +\deqn{RH = 100 \times + \frac{\exp\!\bigl(\tfrac{17.625\,T_d}{243.04 + T_d}\bigr)} + {\exp\!\bigl(\tfrac{17.625\,T}{243.04 + T}\bigr)}} + +where \eqn{T} is the air temperature and \eqn{T_d} is the dew-point +temperature, both in degrees Celsius. The coefficients (17.625 and 243.04) +follow Alduchov & Eskridge (1996). +} +\examples{ +compute_relative_humidity(t2m = 20, dpt2m = 10) # ~52 \% +compute_relative_humidity(t2m = 0, dpt2m = 0) # 100 \% +compute_relative_humidity(t2m = c(20, 15, NA), dpt2m = c(10, 12, 8)) + +} +\references{ +Alduchov, O. A., & Eskridge, R. E. (1996). Improved Magnus form approximation +of saturation vapor pressure. \emph{Journal of Applied Meteorology}, 35(4), 601–609. +} diff --git a/man/meteo_imgw.Rd b/man/meteo_imgw.Rd index 5e6bc867..a300d83e 100644 --- a/man/meteo_imgw.Rd +++ b/man/meteo_imgw.Rd @@ -5,52 +5,73 @@ \title{Meteorological data from the IMGW-PIB official repository} \usage{ meteo_imgw( - interval, + interval = NULL, rank = "synop", year, status = FALSE, coords = FALSE, station = NULL, col_names = "short", + parameters = NULL, ... ) } \arguments{ -\item{interval}{temporal resolution of the data ("hourly", "daily", "monthly")} +\item{interval}{temporal resolution of the data: \code{"hourly"}, \code{"daily"}, or \code{"monthly"}. +Not used when \code{rank = "telemetry"} (telemetry data are always at 10-minute intervals). +Defaults to \code{NULL}, which is only valid together with \code{rank = "telemetry"}.} -\item{rank}{rank of the stations: "synop" (default), "climate" or "precip"} +\item{rank}{rank of the stations: \code{"synop"} (default), \code{"climate"}, \code{"precip"}, or \code{"telemetry"}. +Use \code{"telemetry"} for the automated IMGW datastore network (data available since 2008).} -\item{year}{vector of years (e.g., 1966:2000)} +\item{year}{vector of years (e.g., \code{1966:2000}). +For \code{rank = "telemetry"} all years must be >= 2008.} \item{status}{leave the columns with measurement and observation statuses -(default status = FALSE - i.e. the status columns are deleted)} +(default \code{FALSE} — status columns are deleted). Not used when \code{rank = "telemetry"}.} -\item{coords}{add coordinates of the station (logical value TRUE or FALSE)} +\item{coords}{add coordinates of the station (logical value \code{TRUE} or \code{FALSE}). +Default \code{FALSE}.} \item{station}{name of meteorological station(s). -It accepts names (characters in CAPITAL LETTERS). Stations' IDs (numeric) are no longer supported. -Please note that station names may change over time and thus sometimes 2 names -are required in some cases, e.g. \code{c("POZNAŃ", "POZNAŃ-ŁAWICA")}.} +For ranks \code{"synop"}, \code{"climate"}, \code{"precip"}: station name(s) in CAPITAL LETTERS. +Please note that station names may change over time — sometimes two names are required, +e.g. \code{c("POZNAŃ", "POZNAŃ-ŁAWICA")}. +For \code{rank = "telemetry"}: station name(s) as listed by \code{stations_meteo_imgw_telemetry()}. +\code{NULL} (default) downloads all available stations.} -\item{col_names}{three types of column names possible: "short" - default, values with shorten names, -"full" - full English description, "polish" - original names in the dataset} +\item{col_names}{column name style: \code{"short"} (default), \code{"full"} (English descriptions), +or \code{"polish"} (original dataset names). Not used when \code{rank = "telemetry"}.} -\item{...}{other parameters that may be passed to the 'shortening' function that shortens column names} +\item{parameters}{character vector of parameter codes to download. +Only used when \code{rank = "telemetry"}. \code{NULL} (default) downloads all available parameters. +Accepted values: \code{"wd"}, \code{"t2m"}, \code{"t0m"}, \code{"rr_24h"}, \code{"rr_1h"}, \code{"rr_10min"}, +\code{"ws"}, \code{"ws_max"}, \code{"gust"}, \code{"rh"}, \code{"water_in_snow"}.} + +\item{...}{other parameters passed to the column-shortening function. Not used when +\code{rank = "telemetry"}.} } \value{ -A data.frame with columns describing the meteorological parameters -(e.g. temperature, wind speed, precipitation) where each row represent a measurement, -depending on the interval, at a given hour, month or year. -If \code{coords = TRUE} additional two -columns with geographic coordinates are added. +A data.frame with meteorological parameters where each row is a measurement. +For ranks \code{"synop"}, \code{"climate"}, \code{"precip"}: measurements at a given hour, day, or month, +depending on \code{interval}. If \code{coords = TRUE} two additional coordinate columns are appended. +For \code{rank = "telemetry"}: a data.table with 10-minute interval observations (not +expert-validated). If \code{coords = TRUE} columns \code{name}, \code{lon}, \code{lat}, and \code{alt} are appended. } \description{ Downloading hourly, daily, and monthly meteorological data from the -SYNOP / CLIMATE / PRECIP stations available in the danepubliczne.imgw.pl collection. +SYNOP / CLIMATE / PRECIP stations, or sub-hourly (10-minute) telemetry data from the +automated network, all available in the danepubliczne.imgw.pl collection. } \examples{ \donttest{ x = meteo_imgw("monthly", year = 2018, coords = TRUE) head(x) + + # Telemetry (10-minute) data from automated stations (available since 2008): + tel = meteo_imgw(rank = "telemetry", year = 2022, + parameters = "t2m", + station = "HALA GĄSIENICOWA") + head(tel) } } diff --git a/man/meteo_imgw_datastore.Rd b/man/meteo_imgw_datastore.Rd index 01ec7dad..579334b8 100644 --- a/man/meteo_imgw_datastore.Rd +++ b/man/meteo_imgw_datastore.Rd @@ -33,7 +33,7 @@ meteo_imgw_datastore( }} \item{stations}{\itemize{ -\item character vector with station names as visible in the \code{meteo_imgw_telemetry_stations()}. +\item character vector with station names as visible in the \code{stations_meteo_imgw_telemetry()}. Default \code{NULL} means to download data for all available stations. }} diff --git a/man/meteo_ogimet.Rd b/man/meteo_ogimet.Rd index 62995f66..b8c177c1 100644 --- a/man/meteo_ogimet.Rd +++ b/man/meteo_ogimet.Rd @@ -2,84 +2,140 @@ % Please edit documentation in R/meteo_ogimet.R \name{meteo_ogimet} \alias{meteo_ogimet} -\title{Scrapping meteorological (Synop) data from the Ogimet webpage} +\title{Download meteorological (Synop) data from the Ogimet service} \usage{ meteo_ogimet( - interval, + interval = "hourly", date = c(Sys.Date() - 30, Sys.Date()), - coords = FALSE, - station, - precip_split = TRUE, - allow_failure = TRUE + station = NULL, + country_name = NULL, + source = NULL, + ... ) } \arguments{ -\item{interval}{'daily' or 'hourly' dataset to retrieve - given as character} +\item{interval}{\code{"hourly"} (default) or \code{"daily"} — time resolution to retrieve.} -\item{date}{start and finish date (e.g., date = c("2018-05-01", "2018-07-01")) - character or Date class object. If not provided last 30 days are used.} +\item{date}{Length-2 character or Date vector giving the start and end of +the requested period, e.g. \code{c("2018-05-01", "2018-07-01")}. Defaults to +the last 30 days.} -\item{coords}{add geographical coordinates of the station (logical value TRUE or FALSE)} +\item{station}{WMO ID(s) of the station(s) to download. Character or numeric +vector. Not required when \code{country_name} is provided (SYNOP path only).} -\item{station}{WMO ID of meteorological station(s). Character or numeric vector} +\item{country_name}{Optional character string. When provided, the SYNOP path +downloads all Ogimet stations for the named country in a single request +(e.g. \code{"Poland"}, \code{"Germany"}), and \code{station} is ignored. Valid only with +\code{source = "synop"} (or the default hourly path).} -\item{precip_split}{whether to split precipitation fields into 6/12/24h} +\item{source}{Character. Backend to use: \code{"synop"} (raw SYNOP decoding) or +\code{"html"} (HTML scraping). When \code{NULL} (default) the backend is chosen +automatically: \code{"synop"} for \code{interval = "hourly"}, \code{"html"} for +\code{interval = "daily"}.} -\item{allow_failure}{logical - whether to proceed or stop on failure. By default set to TRUE (i.e. don't stop on error). For debugging purposes change to FALSE -numeric fields (logical value TRUE (default) or FALSE); valid only for hourly time step} +\item{...}{Optional named arguments: +\describe{ +\item{\code{allow_failure}}{Logical. When \code{TRUE} (default) network or parsing +errors are caught and a message is emitted; when \code{FALSE} errors +propagate.} +\item{\code{simplified}}{Logical. Applies to \code{source = "synop"} only. When +\code{TRUE} (default) a compact 20-column \code{data.frame} is returned (see +\strong{synop output} below). When \code{FALSE} the full \code{\link[=parser]{parser()}} output is +returned with 30+ columns.} +\item{\code{coords}}{Logical. Add geographical coordinates (\code{Lon}, \code{Lat}) to +the output. Applies to \code{source = "html"} only; a warning is emitted +for \code{source = "synop"}. Default \code{FALSE}.} +\item{\code{precip_split}}{Logical. Split the precipitation field into +separate \code{pr6}, \code{pr12}, and \code{pr24} columns. Valid only for +\code{interval = "hourly"} with \code{source = "html"}; a warning is emitted +otherwise. Default \code{TRUE}.} +\item{\code{return_list}}{Logical. Applies to \code{source = "synop"} only. When +\code{TRUE} a named list with elements \code{data} (compact 20-column +\code{data.frame}) and \code{full} (30+ column parser output) is returned +instead of a \code{data.frame}. A warning is emitted when used with +\code{source = "html"}. Default \code{FALSE}.} +}} } \value{ -A data.frame of measured values with columns describing the meteorological parameters (e.g. air temperature, wind speed, cloudines). -Depending on the interval, at a given hour or day. Different parameters are returned for daily and hourly datasets. -\enumerate{ -\item station_ID - WMO station identifier -\item Lon - longitude -\item Lat - latitude -\item Date - date (and time) of observations -\item TC - air temperature at 2 metres above ground level. Values given in Celsius degrees -\item TdC - dew point temperature at 2 metres above ground level. Values given in Celsius degrees -\item TmaxC - maximum air temperature at 2 metres above ground level. Values given in Celsius degrees -\item TminC - minimum air temperature at 2 metres above ground level. Values given in Celsius degrees -\item ddd - wind direction -\item ffkmh - wind speed in km/h -\item Gustkmh - wind gust in km/h -\item P0hpa - air pressure at elevation of the station in hPa -\item PseahPa - sea level pressure in hPa -\item PTnd - pressure tendency in hPa -\item Nt - total cloud cover -\item Nh - cloud cover by high-level cloud fraction -\item HKm - height of cloud base -\item InsoD1 - insolation in hours -\item Viskm - visibility in kilometres -\item Snowcm - depth of snow cover in centimetres -\item pr6 - precicipitation totals in 6 hours -\item pr12 - precicipitation totals in 12 hours -\item pr24 - precicipitation totals in 24 hours -\item TemperatureCAvg - average air temperature at 2 metres above ground level. Values given in Celsius degrees -\item TemperatureCMax - maximum air temperature at 2 metres above ground level. Values given in Celsius degrees -\item TemperatureCMin - minimum air temperature at 2 metres above ground level. Values given in Celsius degrees -\item TdAvgC - average dew point temperature at 2 metres above ground level. Values given in Celsius degrees -\item HrAvg - average relative humidity. Values given in \% -\item WindkmhDir - wind direction -\item WindkmhInt - wind speed in km/h -\item WindkmhGust - wind gust in km/h -\item PresslevHp - Sea level pressure in hPa -\item Precmm - precipitation totals in mm -\item TotClOct - total cloudiness in octants -\item lowClOct - cloudiness by low level clouds in octants -\item SunD1h - sunshine duration in hours -\item PreselevHp - atmospheric pressure measured at altitude of station in hPa -\item SnowDepcm - depth of snow cover in centimetres -} +\strong{synop output} (\code{source = "synop"}, \code{simplified = TRUE} or \code{return_list = TRUE} \verb{$data}): +A \code{data.frame} with one row per decoded SYNOP observation and 20 columns: +\code{date} (POSIXct UTC), \code{station}, \code{t2m}, \code{dpt2m}, \code{rel_hum}, \code{tmax}, +\code{tmin}, \code{wd}, \code{ws}, \code{gust}, \code{press}, \code{slp}, \code{press_tend}, \code{precip}, +\code{Nt}, \code{Nh}, \code{N_base}, \code{insol}, \code{visibility}, \code{snow}. + +\strong{synop output} (\code{source = "synop"}, \code{simplified = FALSE}): +A \code{data.frame} with 30+ columns from \code{\link[=parser]{parser()}}, prefixed by \code{station_id} +and \code{Date}. + +\strong{html output} (\code{source = "html"}, \code{interval = "hourly"}): +A \code{data.frame} with columns: \code{station_ID}, optionally \code{Lon}/\code{Lat}, +\code{Date}, \code{TC}, \code{TdC}, \code{TmaxC}, \code{TminC}, \code{ddd}, \code{ffkmh}, \code{Gustkmh}, +\code{P0hPa}, \code{PseahPa}, \code{PTnd}, \code{Nt}, \code{Nh}, \code{HKm}, \code{InsoD1}, \code{Viskm}, +\code{Snowcm}, and (when \code{precip_split = TRUE}) \code{pr6}, \code{pr12}, \code{pr24}. + +\strong{html output} (\code{source = "html"}, \code{interval = "daily"}): +A \code{data.frame} with columns: \code{station_ID}, optionally \code{Lon}/\code{Lat}, +\code{Date}, \code{TemperatureCAvg}, \code{TemperatureCMax}, \code{TemperatureCMin}, +\code{TdAvgC}, \code{HrAvg}, \code{WindkmhDir}, \code{WindkmhInt}, \code{WindkmhGust}, +\code{PresslevHp}, \code{PreselevHp}, \code{Precmm}, \code{SunD1h}, \code{SnowDepcm}, +\code{TotClOct}, \code{lowClOct}, \code{VisKm}. + +Returns \code{NULL} invisibly on failure when \code{allow_failure = TRUE}. } \description{ -Downloading hourly or daily (meteorological) data from the Synop stations available at https://www.ogimet.com/ +Unified entry point for downloading hourly or daily meteorological data +from \href{https://www.ogimet.com/}{Ogimet}. Two backends are supported: +} +\details{ +\itemize{ +\item \strong{\code{"synop"}} (default for hourly): Downloads raw SYNOP messages from the +Ogimet \code{getsynop} endpoint and decodes them with \code{\link[=parser]{parser()}}. Supports +station mode (one or more WMO IDs) and/or country mode (\code{country_name}). +A default output columns are described in the \strong{synop output} section below, but +can be enhanced optionally with \code{simplified = FALSE} or \code{return_list = TRUE} +to include more of decoded SYNOP fields. +\item \strong{\code{"html"}} (default for daily): Scrapes pre-formatted summary tables +from the Ogimet \code{gsynres} endpoint using \code{\link[XML:readHTMLTable]{XML::readHTMLTable()}}. +Output columns are described in the \strong{html output} section below. +} } \examples{ \donttest{ - # downloading daily data for New York - La Guardia (last 30 days by default) - new_york = meteo_ogimet(interval = "daily", - station = 72503, - coords = TRUE) + # Hourly SYNOP data for Poznan-Lawica (default source = "synop") + poznan_h = meteo_ogimet(interval = "hourly", + station = 12330, + date = c("2009-12-01", "2009-12-04")) + + # Daily HTML summaries for New York - La Guardia (default source = "html") + new_york = meteo_ogimet(interval = "daily", + station = 72503, + coords = TRUE) + + # Hourly with full parser output as a list + poznan_list = meteo_ogimet(interval = "hourly", + station = 12330, + date = c("2009-12-01", "2009-12-04"), + return_list = TRUE) + head(poznan_list$data) # simplified + head(poznan_list$full) # all parser columns + + # Country mode: all Polish stations for one day + germany = meteo_ogimet(interval = "hourly", + country_name = "Germany", + date = c("2009-12-15", "2009-12-15")) + + # Force SYNOP backend for daily data + poznan_d = meteo_ogimet(interval = "daily", + station = 12330, + date = c("2009-12-01", "2009-12-04"), + source = "synop") + + # Force HTML backend for hourly data + poznan_h2 = meteo_ogimet(interval = "hourly", + station = 12330, + date = c("2019-06-01", "2019-06-08"), + source = "html", + coords = TRUE) } } diff --git a/man/meteo_ogimet_synop.Rd b/man/meteo_ogimet_synop.Rd new file mode 100644 index 00000000..5fb1c22e --- /dev/null +++ b/man/meteo_ogimet_synop.Rd @@ -0,0 +1,129 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/meteo_ogimet_synop.R +\name{meteo_ogimet_synop} +\alias{meteo_ogimet_synop} +\title{Download and decode raw SYNOP messages from the Ogimet getsynop service} +\usage{ +meteo_ogimet_synop( + station = NULL, + date = c(Sys.Date() - 30, Sys.Date()), + country = NULL, + country_name = NULL, + simplified = TRUE, + allow_failure = TRUE +) +} +\arguments{ +\item{station}{Numeric or character vector of WMO station IDs. Optional when +\code{country_name} is provided; required otherwise.} + +\item{date}{Character or Date vector of length 2 giving the start and end of +the requested period, e.g. \code{c("2009-12-01", "2009-12-04")}. Defaults to +the last 30 days.} + +\item{country}{Optional; passed to \code{\link[=parser]{parser()}} for country-specific +precipitation indicator decoding (e.g. \code{"RU"}). Single string or \code{NULL} +(default). This is distinct from \code{country_name}.} + +\item{country_name}{Optional character string naming the country whose +stations should be downloaded, as recognised by Ogimet (e.g. +\code{"Poland"}, \code{"Germany"}, \code{"France"}). When provided, the \verb{state=} Ogimet +parameter is used and \code{station} is ignored. The full date range is +fetched in a single request.} + +\item{simplified}{Logical. When \code{TRUE} (default) returns a compact \code{data.frame} with +20 standardised columns (see \strong{Value} below). When \code{FALSE} the +full parser output is returned.} + +\item{allow_failure}{Logical. When \code{TRUE} (default) network errors are caught +and a message is emitted; when \code{FALSE} errors propagate to the caller.} +} +\value{ +By default (\code{simplified = TRUE}), a compact \code{data.frame} with one +row per decoded SYNOP observation. Columns: +\itemize{ +\item \code{date} — Observation date-time (\code{POSIXct}, UTC). +\item \code{station} — WMO station identifier (character). +\item \code{t2m} — Air temperature at 2 m (°C). +\item \code{dpt2m} — Dew-point temperature at 2 m (°C). +\item \code{rel_hum} — Relative humidity (\%), derived via \code{\link[=compute_relative_humidity]{compute_relative_humidity()}}. +\item \code{tmax} — Daily maximum temperature from Section 3 (°C). +\item \code{tmin} — Daily minimum temperature from Section 3 (°C). +\item \code{wd} — Wind direction (degrees). +\item \code{ws} — Wind speed (m/s or knots, per \code{wind_unit}). +\item \code{gust} — Highest gust speed from Section 3, same unit as \code{ws}. +\item \code{press} — Station-level pressure (hPa). +\item \code{slp} — Sea-level pressure (hPa). +\item \code{press_tend} — 3-hour pressure change (hPa). +\item \code{precip} — Precipitation amount (mm). +\item \code{Nt} — Total cloud cover (oktas, 0–8) from the \code{Nddff} group. +\item \code{Nh} — Cover of low clouds (genera Sc, St, Cu, Cb) in oktas (0–8), +from Section 1 group \verb{8NhCLCMCH}; \code{NA} when not reported. +\item \code{N_base} — Height of base of lowest observed cloud (m). +\item \code{insol} — Daily sunshine duration (hours). +\item \code{visibility} — Horizontal visibility (m). +\item \code{snow} — Total snow depth (cm); 0 for trace amounts. +} + +When \code{simplified = FALSE}, a \code{data.frame} with the first two columns +\code{station_id} (WMO identifier, character) and \code{Date} (\code{POSIXct}, UTC), +followed by all columns produced by \code{\link[=parser]{parser()}} with \code{as_data_frame = TRUE}: +\code{station_type}, \code{region}, \code{obs_day}, \code{obs_hour}, \code{wind_unit}, +\code{wind_estimated}, \code{visibility}, \code{cloud_cover}, \code{wind_direction}, +\code{wind_speed}, \code{air_temperature}, \code{dewpoint_temperature}, +\code{station_pressure}, \code{sea_level_pressure}, \code{pressure_tendency}, +\code{pressure_change}, \code{precipitation_amount}, \code{precipitation_time}, +\code{cloud_base_min}, \code{cloud_base_max}, \code{low_cloud_type}, \code{middle_cloud_type}, +\code{high_cloud_type}, \code{low_cloud_amount}, \code{maximum_temperature}, +\code{minimum_temperature}, \code{gust}, \code{sunshine_duration}, +\code{snow_depth}, \code{snow_depth_state}, \code{source}. + +Returns \code{NULL} invisibly when the download fails and \code{allow_failure = TRUE}. +} +\description{ +Downloads raw SYNOP messages from the Ogimet \code{getsynop} endpoint and decodes +them into a tidy \code{data.frame} using the \code{\link[=parser]{parser()}} function. Two retrieval +modes are supported: +} +\details{ +\itemize{ +\item \strong{Station mode} (\code{station} provided): fetches messages for one or more +WMO station IDs. +URL form: \verb{http://www.ogimet.com/cgi-bin/getsynop?block=&begin=&end=} +\item \strong{Country mode} (\code{country_name} provided): fetches messages for all +Ogimet stations in a country in a single request. +URL form: \verb{http://www.ogimet.com/cgi-bin/getsynop?begin=&end=&state=} +} + +When both \code{station} and \code{country_name} are supplied, \code{country_name} takes +precedence and a warning is issued. + +Each line of the response is a comma-separated record: +\verb{station_id,year,month,day,hour,minute,}. +The SYNOP message is decoded via \code{\link[=parser]{parser()}} with \code{as_data_frame = TRUE}. +} +\examples{ +\donttest{ + # Station mode: Poznan-Lawica (Poland) + poznan = meteo_ogimet_synop(station = 12330, + date = c("2009-12-01", "2009-12-04")) + head(poznan) + + # Station mode: multiple stations + two_stations = meteo_ogimet_synop(station = c(12330, 12375), + date = c("2019-06-01", "2019-06-03")) + head(two_stations) + + # Country mode: all Polish stations for one day + poland = meteo_ogimet_synop(country_name = "Poland", + date = c("2009-12-15", "2009-12-15")) + head(poland) + + # Simplified view + poznan_simple = meteo_ogimet_synop(station = 12330, + date = c("2009-12-01", "2009-12-04"), + simplified = TRUE) + head(poznan_simple) +} + +} diff --git a/man/parser.Rd b/man/parser.Rd new file mode 100644 index 00000000..8dcaf886 --- /dev/null +++ b/man/parser.Rd @@ -0,0 +1,67 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/parser.R +\name{parser} +\alias{parser} +\title{Parse SYNOP messages into structured lists or a data frame} +\usage{ +parser(message, country = NULL, simplify = TRUE, as_data_frame = FALSE) +} +\arguments{ +\item{message}{Character vector with SYNOP messages.} + +\item{country}{Optional; A single character value passed to the precipitation +indicator decoder to adjust country-specific behaviour (e.g. \code{"RU"}).} + +\item{simplify}{Logical. If \code{TRUE} (default) and a single message is +provided, the function returns the decoded list directly instead of a +length-one list. Ignored when \code{as_data_frame = TRUE}.} + +\item{as_data_frame}{Logical. If \code{TRUE}, return a \code{data.frame} with one row +per message and commonly-used decoded fields as columns. Missing or +unparsed fields are filled with \code{NA}. Default is \code{FALSE}.} +} +\value{ +When \code{as_data_frame = FALSE} (default): a list of decoded SYNOP +messages, or the decoded list directly when \code{simplify = TRUE} and a single +message is supplied. When \code{as_data_frame = TRUE}: a \code{data.frame} with one +row per message and the following columns (all numeric/character as +appropriate, \code{NA} when not present in the message): +\code{station_type}, \code{station_id}, \code{region}, \code{obs_day}, \code{obs_hour}, +\code{wind_unit}, \code{wind_estimated}, \code{visibility}, \code{cloud_cover}, +\code{wind_direction}, \code{wind_speed}, \code{air_temperature}, \code{dewpoint_temperature}, +\code{station_pressure}, \code{sea_level_pressure}, \code{pressure_tendency}, +\code{pressure_change}, \code{precipitation_amount}, \code{precipitation_time}, +\code{cloud_base_min}, \code{cloud_base_max}, \code{low_cloud_type}, +\code{middle_cloud_type}, \code{high_cloud_type}, \code{low_cloud_amount}, +\code{maximum_temperature} (Section 3 daily maximum, °C), +\code{minimum_temperature} (Section 3 daily minimum, °C), +\code{gust} (highest gust speed from Section 3 group 910ff/911ff, in the wind unit of the message), +\code{cloudiness_height} (cloud cover in oktas of the highest cloud layer reported in Section 3, +i.e. cirrus/cirrocumulus/cirrostratus; \code{NA} when absent), +\code{sunshine_duration} (daily sunshine in hours, from Section 3 group 55SSS), +\code{snow_depth} (total snow depth in cm; 0 for trace amounts, \code{NA} for non-continuous cover or +unmeasurable depth), \code{snow_depth_state} (descriptive state of ground with snow/ice per WMO +code table 0975, e.g. \code{"Even layer of loose dry snow covering ground completely"}), +\code{source} (the original SYNOP message string). +Row names are sequential integers. +} +\description{ +This function decodes SYNOP FM-12 meteorological messages which are commonly +used for reporting weather observations. +It parses one or more SYNOP messages and +returns their structured representation as generated by the \code{SYNOP} R6 +decoder. +} +\details{ +Currently, the decoder contains most of the core logic for parsing the main +sections of SYNOP messages that are commonly used in atmospheric sciences. +However, it does not yet cover all possible SYNOP groups and fields, +and some fields may be missing or incomplete. +} +\examples{ +synop_code = "AAXX 01004 88889 12782 61506 10094 20047 30111 40197 53007 60001 81541" +parser(synop_code) +parser(rep(synop_code, 2), simplify = FALSE) +parser(synop_code, as_data_frame = TRUE) +parser(rep(synop_code, 2), as_data_frame = TRUE) +} diff --git a/tests/testthat/test-compute_relative_humidity.R b/tests/testthat/test-compute_relative_humidity.R new file mode 100644 index 00000000..47990b7b --- /dev/null +++ b/tests/testthat/test-compute_relative_humidity.R @@ -0,0 +1,36 @@ +test_that("compute_relative_humidity returns 100% when t2m equals dpt2m", { + expect_equal(compute_relative_humidity(0, 0), 100) + expect_equal(compute_relative_humidity(20, 20), 100) + expect_equal(compute_relative_humidity(-10, -10), 100) +}) + +test_that("compute_relative_humidity returns plausible value for known inputs", { + rh = compute_relative_humidity(t2m = 20, dpt2m = 10) + expect_gt(rh, 50) + expect_lt(rh, 55) +}) + +test_that("compute_relative_humidity is vectorised", { + rh = compute_relative_humidity(t2m = c(20, 0), dpt2m = c(20, 0)) + expect_equal(rh, c(100, 100)) +}) + +test_that("compute_relative_humidity propagates NA", { + expect_true(is.na(compute_relative_humidity(NA_real_, 10))) + expect_true(is.na(compute_relative_humidity(20, NA_real_))) +}) + +test_that("compute_relative_humidity stops on non-numeric input", { + expect_error(compute_relative_humidity("20", 10)) + expect_error(compute_relative_humidity(20, "10")) +}) + +test_that("compute_relative_humidity stops on mismatched lengths", { + expect_error(compute_relative_humidity(c(20, 15), 10)) +}) + +test_that("compute_relative_humidity decreases as dpt2m decreases from t2m", { + rh_high = compute_relative_humidity(20, 18) + rh_low = compute_relative_humidity(20, 5) + expect_gt(rh_high, rh_low) +}) diff --git a/tests/testthat/test-hydro_imgw.R b/tests/testthat/test-hydro_imgw.R index 2892d468..429bef62 100644 --- a/tests/testthat/test-hydro_imgw.R +++ b/tests/testthat/test-hydro_imgw.R @@ -1,5 +1,5 @@ context("hydro_imgw") -y <- 2017 +y = 2017 test_that("hydro_imgw_not_available", { diff --git a/tests/testthat/test-hydro_metadata_imgw.R b/tests/testthat/test-hydro_metadata_imgw.R index 62982d40..ef2319ed 100644 --- a/tests/testthat/test-hydro_metadata_imgw.R +++ b/tests/testthat/test-hydro_metadata_imgw.R @@ -1,7 +1,7 @@ context("hydro-metadata") -h_d <- suppressWarnings(hydro_metadata_imgw("daily")) -h_m <- suppressWarnings(hydro_metadata_imgw("monthly")) +h_d = suppressWarnings(hydro_metadata_imgw("daily")) +h_m = suppressWarnings(hydro_metadata_imgw("monthly")) test_that("hydro-metadata works!", { if (is.list(h_d)) { diff --git a/tests/testthat/test-meteo_imgw.R b/tests/testthat/test-meteo_imgw.R index ba1940a3..ffb0fb0d 100644 --- a/tests/testthat/test-meteo_imgw.R +++ b/tests/testthat/test-meteo_imgw.R @@ -1,5 +1,5 @@ context("meteo_imgw") -y <- 2018 +y = 2018 test_that("meteo_imgw works!", { @@ -7,20 +7,20 @@ test_that("meteo_imgw works!", { message("No internet connection! \n") return(invisible(NULL)) } else { - x <- meteo_imgw("hourly", "synop", year = y) - x <- meteo_imgw("hourly", "climate", year = y) - expect_message(x <- meteo_imgw("hourly", "precip", year = y)) - x <- meteo_imgw("daily", "synop", year = y) - x <- meteo_imgw("daily", "climate", year = y) - x <- meteo_imgw("daily", "precip", year = y) - x <- meteo_imgw("monthly", "synop", year = y) - x <- meteo_imgw("monthly", "climate", year = y) - x <- meteo_imgw("monthly", "precip", year = y) - x <- meteo_imgw("monthly", "synop", year = y, status = TRUE) - x <- meteo_imgw("monthly", "synop", year = y, coords = TRUE) - x <- meteo_imgw("monthly", "synop", year = y, col_names = "full") - x <- meteo_imgw("monthly", "synop", year = y, coords = TRUE, col_names = "polish") - testthat::expect_message(x <- suppressWarnings(meteo_imgw_daily(rank = "synop", year = 2001, station = "blabla"))) + x = meteo_imgw("hourly", "synop", year = y) + x = meteo_imgw("hourly", "climate", year = y) + expect_message(meteo_imgw("hourly", "precip", year = y)) + x = meteo_imgw("daily", "synop", year = y) + x = meteo_imgw("daily", "climate", year = y) + x = meteo_imgw("daily", "precip", year = y) + x = meteo_imgw("monthly", "synop", year = y) + x = meteo_imgw("monthly", "climate", year = y) + x = meteo_imgw("monthly", "precip", year = y) + x = meteo_imgw("monthly", "synop", year = y, status = TRUE) + x = meteo_imgw("monthly", "synop", year = y, coords = TRUE) + x = meteo_imgw("monthly", "synop", year = y, col_names = "full") + x = meteo_imgw("monthly", "synop", year = y, coords = TRUE, col_names = "polish") + expect_message(suppressWarnings(meteo_imgw_daily(rank = "synop", year = 2001, station = "blabla"))) leszno = meteo_imgw(interval = "monthly", rank = "synop", year = 2020:2021, station = "LESZNO") testthat::expect_equal(nrow(leszno), 24) } diff --git a/tests/testthat/test-meteo_ogimet.R b/tests/testthat/test-meteo_ogimet.R index 2e95b70b..53669be3 100644 --- a/tests/testthat/test-meteo_ogimet.R +++ b/tests/testthat/test-meteo_ogimet.R @@ -11,19 +11,21 @@ test_that("meteo_ogimet works!", { expect_true(any(colnames(df) %in% c("Lon", "Lat"))) } - # expected at least 100 rows in hourly dataset: + # expected at least 100 rows in hourly HTML dataset (explicit source = "html"): Sys.sleep(20) - x = meteo_ogimet(interval = "hourly", date = c("2019-06-01", "2019-06-08"), - station = c(12330), coords = TRUE) + x = meteo_ogimet(interval = "hourly", source = "html", + date = c("2019-06-01", "2019-06-08"), + station = c(12330), coords = TRUE) if (is.data.frame(x)) { testthat::expect_true(nrow(x) > 100) } - # check if January is going to be downloaded not other dates are downloaded by accident: + # check if January is going to be downloaded (HTML path keeps $Date column): Sys.sleep(20) - y = meteo_ogimet(interval = "hourly", date = c("2019-01-01", "2019-01-05"), - station = 12120, coords = FALSE) + y = meteo_ogimet(interval = "hourly", source = "html", + date = c("2019-01-01", "2019-01-05"), + station = 12120, coords = FALSE) if (is.data.frame(y)) { testthat::expect_equal(unique(format(y$Date, "%Y")), "2019") @@ -32,15 +34,16 @@ test_that("meteo_ogimet works!", { # check error for non existing station or problem with downloading any reasonable data: - # wrong station ID: + # wrong station ID (HTML path emits a warning): Sys.sleep(20) - expect_warning(meteo_ogimet(interval = "hourly", date = c("2019-01-01", "2019-01-05"), - station = 999999, coords = FALSE, allow_failure = FALSE)) - # no date: + expect_warning(meteo_ogimet(interval = "hourly", source = "html", + date = c("2019-01-01", "2019-01-05"), + station = 999999, coords = FALSE, allow_failure = FALSE)) + # no date (HTML path): Sys.sleep(20) - expect_message(meteo_ogimet(interval = "hourly", - date = c(NA, NA), - station = 12120, coords = FALSE, allow_failure = TRUE)) + expect_message(meteo_ogimet(interval = "hourly", source = "html", + date = c(NA, NA), + station = 12120, coords = FALSE, allow_failure = TRUE)) # no values for the selected station Sys.sleep(20) @@ -53,10 +56,8 @@ test_that("meteo_ogimet works!", { # expect_equal(nrow(meteo_ogimet(station = "64556", interval = "daily", # date = c("2025-09-26", "2025-09-26"))), 1) # - # no interval provided: - expect_error(meteo_ogimet(station = "06683", - date = c("2020-02-01", "2020-02-01"), - coords = FALSE, allow_failure = TRUE)) + # no interval provided: now defaults to "hourly", so no error expected + # (test removed - interval has a default value of "hourly") # split works only for daily: Sys.sleep(20) @@ -101,3 +102,112 @@ test_that("meteo_ogimet works!", { } } }) + +# ── New unified interface tests ─────────────────────────────────────────────── + +test_that("meteo_ogimet hourly defaults to SYNOP and returns expected columns", { + if (!curl::has_internet()) return(invisible(NULL)) + skip_on_cran() + + result = meteo_ogimet(interval = "hourly", + station = 12330, + date = c("2009-12-01", "2009-12-04")) + if (is.null(result)) return(invisible(NULL)) + + expect_s3_class(result, "data.frame") + expect_true(all(c("date", "station", "t2m", "ws", "Nt") %in% names(result))) + expect_s3_class(result$date, "POSIXct") + expect_equal(attr(result$date, "tzone"), "UTC") + expect_true(nrow(result) > 0) +}) + +test_that("meteo_ogimet hourly SYNOP date column is clipped to requested range", { + if (!curl::has_internet()) return(invisible(NULL)) + skip_on_cran() + + result = meteo_ogimet(interval = "hourly", + station = 12330, + date = c("2009-12-01", "2009-12-04")) + if (is.null(result) || nrow(result) == 0) return(invisible(NULL)) + + expect_true(all(as.Date(result$date) >= as.Date("2009-12-01"))) + expect_true(all(as.Date(result$date) <= as.Date("2009-12-04"))) +}) + +test_that("meteo_ogimet warns when coords = TRUE used with SYNOP (hourly default)", { + expect_warning( + suppressMessages( + meteo_ogimet(interval = "hourly", station = 12330, + date = c("2009-12-01", "2009-12-01"), + coords = TRUE, allow_failure = TRUE) + ), + "coords" + ) +}) + +test_that("meteo_ogimet warns when precip_split = FALSE used with SYNOP", { + expect_warning( + suppressMessages( + meteo_ogimet(interval = "hourly", station = 12330, + date = c("2009-12-01", "2009-12-01"), + precip_split = FALSE, allow_failure = TRUE) + ), + "precip_split" + ) +}) + +test_that("meteo_ogimet warns when return_list = TRUE used with source = 'html'", { + expect_warning( + suppressMessages( + meteo_ogimet(interval = "daily", station = 12330, + date = c("2019-06-01", "2019-06-01"), + source = "html", return_list = TRUE, allow_failure = TRUE) + ), + "return_list" + ) +}) + +test_that("meteo_ogimet return_list = TRUE gives a named list with data and full", { + if (!curl::has_internet()) return(invisible(NULL)) + skip_on_cran() + + result = meteo_ogimet(interval = "hourly", + station = 12330, + date = c("2009-12-01", "2009-12-04"), + return_list = TRUE) + if (is.null(result)) return(invisible(NULL)) + + expect_type(result, "list") + expect_true(all(c("data", "full") %in% names(result))) + expect_s3_class(result$data, "data.frame") + expect_s3_class(result$full, "data.frame") + expect_true(all(c("date", "t2m") %in% names(result$data))) + expect_true("air_temperature" %in% names(result$full)) + expect_true(ncol(result$full) > ncol(result$data)) +}) + +test_that("meteo_ogimet source = 'synop' works for daily interval", { + if (!curl::has_internet()) return(invisible(NULL)) + skip_on_cran() + + result = meteo_ogimet(interval = "daily", + station = 12330, + date = c("2009-12-01", "2009-12-04"), + source = "synop") + if (is.null(result)) return(invisible(NULL)) + + expect_s3_class(result, "data.frame") + expect_true("date" %in% names(result)) + expect_true(nrow(result) > 0) +}) + +test_that("meteo_ogimet_synop emits a deprecation warning", { + expect_warning( + suppressMessages( + meteo_ogimet_synop(station = 12330, + date = c("2009-12-01", "2009-12-01"), + allow_failure = TRUE) + ), + "deprecated" + ) +}) diff --git a/tests/testthat/test-meteo_ogimet_synop.R b/tests/testthat/test-meteo_ogimet_synop.R new file mode 100644 index 00000000..f424d2c8 --- /dev/null +++ b/tests/testthat/test-meteo_ogimet_synop.R @@ -0,0 +1,192 @@ +test_that("meteo_ogimet_synop stops when neither station nor country_name is given", { + expect_error( + meteo_ogimet_synop(allow_failure = FALSE), + "station.*country_name" + ) +}) + +test_that(".ogimet_synop_raw_lines splits and recurses when server limit is reached", { + # Stub the HTTP layer so no network call is made. + # First call returns exactly 200 000 fake lines (server cap hit -> split). + # Recursive halves each return 5 lines (within limit). + call_count = 0L + line_tmpl = "12330,2009,12,01,00,00,AAXX 01004 12330 ///// /////" + fake_full = paste(rep(line_tmpl, 200000L), collapse = "\n") + fake_half = paste(rep(line_tmpl, 5L), collapse = "\n") + + with_mocked_bindings( + GET = function(url, ...) { + call_count <<- call_count + 1L + structure( + list(status_code = 200L, + content = if (call_count == 1L) fake_full else fake_half), + class = "response" + ) + }, + http_error = function(resp, ...) FALSE, + content = function(resp, ...) resp$content, + .package = "httr", + { + result = climate:::.ogimet_synop_raw_lines( + url_tmpl = "http://example.com/getsynop?begin=%s&end=%s", + begin_date = as.Date("2009-12-01"), + end_date = as.Date("2009-12-04"), + label = "test" + ) + # First call hit the limit -> two recursive calls -> 5 + 5 = 10 lines + expect_equal(length(result), 10L) + expect_equal(call_count, 3L) + } + ) +}) + +test_that("meteo_ogimet_synop warns when both station and country_name are given", { + if (!curl::has_internet()) return(invisible(NULL)) + skip_on_cran() + + # only check the warning; the download itself may or may not succeed + expect_warning( + meteo_ogimet_synop(station = 12330, + country_name = "Poland", + date = c("2009-12-15", "2009-12-15"), + allow_failure = TRUE), + "`station` is ignored" + ) +}) + +test_that("meteo_ogimet_synop station mode returns a data.frame with expected columns", { + if (!curl::has_internet()) return(invisible(NULL)) + skip_on_cran() + + result = meteo_ogimet_synop(station = 12330, + date = c("2009-12-01", "2009-12-04"), + simplified = FALSE) + + if (is.null(result)) return(invisible(NULL)) + + expect_s3_class(result, "data.frame") + expect_true("station_id" %in% names(result)) + expect_true("Date" %in% names(result)) + expect_true("air_temperature" %in% names(result)) + expect_true("wind_speed" %in% names(result)) + expect_true(nrow(result) > 0) +}) + +test_that("meteo_ogimet_synop simplified station mode returns expected columns", { + if (!curl::has_internet()) return(invisible(NULL)) + skip_on_cran() + + result = meteo_ogimet_synop(station = 12330, + date = c("2009-12-01", "2009-12-04")) + + if (is.null(result)) return(invisible(NULL)) + + expect_s3_class(result, "data.frame") + expect_true(all(c("date", "station", "t2m", "ws", "Nt") %in% names(result))) + expect_true(nrow(result) > 0) +}) + +test_that("meteo_ogimet_synop station mode Date column is POSIXct UTC", { + if (!curl::has_internet()) return(invisible(NULL)) + skip_on_cran() + + result = meteo_ogimet_synop(station = 12330, + date = c("2009-12-01", "2009-12-04")) + + if (is.null(result) || nrow(result) == 0) return(invisible(NULL)) + + expect_s3_class(result$date, "POSIXct") + expect_equal(attr(result$date, "tzone"), "UTC") +}) + +test_that("meteo_ogimet_synop station mode clips to requested date range", { + if (!curl::has_internet()) return(invisible(NULL)) + skip_on_cran() + + result = meteo_ogimet_synop(station = 12330, + date = c("2009-12-01", "2009-12-04")) + + if (is.null(result) || nrow(result) == 0) return(invisible(NULL)) + + expect_true(all(as.Date(result$date) >= as.Date("2009-12-01"))) + expect_true(all(as.Date(result$date) <= as.Date("2009-12-04"))) +}) + +test_that("meteo_ogimet_synop station mode handles allow_failure gracefully", { + if (!curl::has_internet()) return(invisible(NULL)) + skip_on_cran() + + expect_no_error( + meteo_ogimet_synop(station = 9999999, date = c("2009-12-01", "2009-12-02"), + allow_failure = TRUE) + ) +}) + +test_that("meteo_ogimet_synop station mode source column contains SYNOP strings", { + if (!curl::has_internet()) return(invisible(NULL)) + skip_on_cran() + + result = meteo_ogimet_synop(station = 12330, + date = c("2009-12-01", "2009-12-04"), + simplified = FALSE) + + if (is.null(result) || nrow(result) == 0) return(invisible(NULL)) + + expect_true("source" %in% names(result)) + expect_true(all(nzchar(result$source[!is.na(result$source)]))) +}) + +test_that("meteo_ogimet_synop country mode returns a data.frame for one day", { + if (!curl::has_internet()) return(invisible(NULL)) + skip_on_cran() + + result = meteo_ogimet_synop(country_name = "Poland", + date = c("2009-12-15", "2009-12-15"), + simplified = FALSE) + + if (is.null(result)) return(invisible(NULL)) + + expect_s3_class(result, "data.frame") + expect_true("station_id" %in% names(result)) + expect_true("Date" %in% names(result)) + expect_true("air_temperature" %in% names(result)) + # Poland has many SYNOP stations; expect multiple rows + expect_true(nrow(result) > 1) +}) + +test_that("meteo_ogimet_synop country mode Date column is POSIXct UTC", { + if (!curl::has_internet()) return(invisible(NULL)) + skip_on_cran() + + result = meteo_ogimet_synop(country_name = "Poland", + date = c("2009-12-15", "2009-12-15")) + + if (is.null(result) || nrow(result) == 0) return(invisible(NULL)) + + expect_s3_class(result$date, "POSIXct") + expect_equal(attr(result$date, "tzone"), "UTC") +}) + +test_that("meteo_ogimet_synop country mode clips to the requested date", { + if (!curl::has_internet()) return(invisible(NULL)) + skip_on_cran() + + result = meteo_ogimet_synop(country_name = "Poland", + date = c("2009-12-15", "2009-12-15")) + + if (is.null(result) || nrow(result) == 0) return(invisible(NULL)) + + expect_true(all(as.Date(result$date) == as.Date("2009-12-15"))) +}) + + +# ── Nt / Nh column mapping ──────────────────────────────────────────────────── + +test_that("parser Nt=cloud_cover and Nh=low_cloud_amount are decoded correctly", { + # 71703 -> Nddff: N=7 (total cloud cover 7 oktas) + # 85232 -> 8NhCLCMCH: Nh=5 (low cloud cover 5 oktas), CL=2 (Sc), CM=3, CH=2 + msg = "AAXX 15151 12120 42461 71703 11013 21016 30184 40192 58006 85232=" + row = parser(msg, as_data_frame = TRUE) + expect_equal(row$cloud_cover, 7) # Nt source + expect_equal(row$low_cloud_amount, 5) # Nh source +}) diff --git a/tests/testthat/test-nearest_stations_ogimet.R b/tests/testthat/test-nearest_stations_ogimet.R index 5da2deee..557ecebd 100644 --- a/tests/testthat/test-nearest_stations_ogimet.R +++ b/tests/testthat/test-nearest_stations_ogimet.R @@ -1,15 +1,15 @@ -context("meteo_imgw") +context("nearest_station_ogimet") test_that("nearest_stations_ogimet works!", { - x <- nearest_stations_ogimet(country = "United Kingdom", point = c(-10, -50), add_map = TRUE, no_of_stations = 10) + x = nearest_stations_ogimet(country = "United Kingdom", point = c(-10, -50), add_map = TRUE, no_of_stations = 10) if (is.data.frame(x) && ncol(x) > 5) { testthat::expect_equal(nrow(x), 10) } Sys.sleep(21) - x <- nearest_stations_ogimet(country = "Poland", point = c(10, 50), add_map = TRUE, no_of_stations = 10) + x = nearest_stations_ogimet(country = "Poland", point = c(10, 50), add_map = TRUE, no_of_stations = 10) if (is.data.frame(x) && ncol(x) > 5) { testthat::expect_equal(nrow(x), 10) @@ -22,9 +22,9 @@ test_that("nearest_stations_ogimet works!", { # allow_failure = FALSE, # no_of_stations = 10)) - x <- nearest_stations_ogimet(country = c("United Kingdom", "Poland"), point = c(0, 0), add_map = TRUE, no_of_stations = 150) + x = nearest_stations_ogimet(country = c("United Kingdom", "Poland"), point = c(0, 0), add_map = TRUE, no_of_stations = 150) if (is.data.frame(x) && ncol(x) > 5) { - expect_true(mean(x$distance) > 5000) + expect_true(mean(x$distance) > 5000) } }) diff --git a/tests/testthat/test-parser.R b/tests/testthat/test-parser.R new file mode 100644 index 00000000..184988fb --- /dev/null +++ b/tests/testthat/test-parser.R @@ -0,0 +1,980 @@ +## Reference SYNOP message used across multiple tests: +## AAXX 01004 88889 12782 61506 10094 20047 30111 40197 53007 60001 81541 +## Decoded (reference values verified interactively): +## station_type = "AAXX" +## obs_time = day 1, hour 0 +## wind_indicator= 4 (KT, anemometer) +## station_id = "88889" (region III) +## cloud_cover = 6 okta +## visibility = 40000 m +## wind dir = 150 deg, speed = 6 kt +## air_temp = 9.4 Celsius, dewpoint = 4.7 Celsius +## station_pres = 1011.1 hPa +## sea_lvl_pres = 1019.7 hPa +## precip_s1 = 0 mm + +SYNOP_MSG = "AAXX 01004 88889 12782 61506 10094 20047 30111 40197 53007 60001 81541" + +# ── input validation ────────────────────────────────────────────────────────── + +test_that("parser stops on missing message", { + expect_error(parser(), "`message` must contain at least one SYNOP string.") +}) + +test_that("parser stops on zero-length character vector", { + expect_error(parser(character(0)), "`message` must contain at least one SYNOP string.") +}) + +test_that("parser stops on non-character input", { + expect_error(parser(12345), "`message` must be a character vector.") +}) + +test_that("parser emits message and returns NULL for empty string", { + expect_message(parser(""), "Empty SYNOP message supplied") +}) + +test_that("parser stops when country length mismatches message length", { + expect_error( + parser(SYNOP_MSG, country = c("RU", "PL")), + "`country` must be NULL" + ) +}) + +# ── return-type behaviour ───────────────────────────────────────────────────── + +test_that("parser returns a list for a single message (simplify = TRUE)", { + result = parser(SYNOP_MSG) + expect_type(result, "list") +}) + +test_that("parser with simplify = FALSE wraps single message in a list of length 1", { + result = parser(SYNOP_MSG, simplify = FALSE) + expect_type(result, "list") + expect_length(result, 1) + expect_type(result[[1]], "list") +}) + +test_that("parser returns a list of n elements for n messages", { + result = parser(rep(SYNOP_MSG, 3), simplify = FALSE) + expect_type(result, "list") + expect_length(result, 3) +}) + +test_that("parser simplify = FALSE and TRUE are consistent for single message", { + r_simplified = parser(SYNOP_MSG, simplify = TRUE) + r_wrapped = parser(SYNOP_MSG, simplify = FALSE) + expect_identical(r_simplified, r_wrapped[[1]]) +}) + +# ── top-level field presence ────────────────────────────────────────────────── + +test_that("parsed result contains expected top-level fields", { + result = parser(SYNOP_MSG) + expected_fields = c( + "station_type", "obs_time", "wind_indicator", "station_id", + "precipitation_indicator", "weather_indicator", + "visibility", "cloud_cover", "surface_wind", + "air_temperature", "dewpoint_temperature", + "station_pressure", "sea_level_pressure" + ) + for (field in expected_fields) { + expect_true(field %in% names(result), info = paste("missing field:", field)) + } +}) + +# ── decoded values ──────────────────────────────────────────────────────────── + +test_that("parser decodes station type correctly", { + result = parser(SYNOP_MSG) + expect_equal(result$station_type$value, "AAXX") +}) + +test_that("parser decodes station ID correctly", { + result = parser(SYNOP_MSG) + expect_equal(result$station_id$value, "88889") +}) + +test_that("parser decodes observation time correctly", { + result = parser(SYNOP_MSG) + expect_equal(result$obs_time$day$value, 1) + expect_equal(result$obs_time$hour$value, 0) +}) + +test_that("parser decodes wind indicator correctly", { + result = parser(SYNOP_MSG) + expect_equal(result$wind_indicator$unit, "KT") + expect_false(result$wind_indicator$estimated) +}) + +test_that("parser decodes cloud cover correctly", { + result = parser(SYNOP_MSG) + expect_equal(result$cloud_cover$value, 6) + expect_equal(result$cloud_cover$unit, "okta") +}) + +test_that("parser decodes visibility correctly", { + result = parser(SYNOP_MSG) + expect_equal(result$visibility$value, 40000) + expect_equal(result$visibility$unit, "m") +}) + +test_that("parser decodes surface wind correctly", { + result = parser(SYNOP_MSG) + expect_equal(result$surface_wind$direction$value, 150) + expect_equal(result$surface_wind$speed$value, 6) +}) + +test_that("parser decodes air temperature correctly", { + result = parser(SYNOP_MSG) + expect_equal(result$air_temperature$value, 9.4, tolerance = 1e-6) + expect_equal(result$air_temperature$unit, "Celsius") +}) + +test_that("parser decodes dewpoint temperature correctly", { + result = parser(SYNOP_MSG) + expect_equal(result$dewpoint_temperature$value, 4.7, tolerance = 1e-6) +}) + +test_that("parser decodes station pressure correctly", { + result = parser(SYNOP_MSG) + expect_equal(result$station_pressure$value, 1011.1, tolerance = 0.05) + expect_equal(result$station_pressure$unit, "hPa") +}) + +test_that("parser decodes sea-level pressure correctly", { + result = parser(SYNOP_MSG) + expect_equal(result$sea_level_pressure$value, 1019.7, tolerance = 0.05) + expect_equal(result$sea_level_pressure$unit, "hPa") +}) + +test_that("parser decodes section-1 precipitation amount correctly", { + result = parser(SYNOP_MSG) + expect_equal(result$precipitation_s1$amount$value, 0) +}) + +# ── country parameter ───────────────────────────────────────────────────────── + +test_that("parser accepts a valid single-value country argument", { + result = parser(SYNOP_MSG, country = "RU") + expect_type(result, "list") + expect_true("station_type" %in% names(result)) +}) + +test_that("parser accepts country vector matching message length", { + result = parser(rep(SYNOP_MSG, 2), country = c("RU", "PL"), simplify = FALSE) + expect_length(result, 2) +}) + +# ── whitespace handling ─────────────────────────────────────────────────────── + +test_that("parser trims leading/trailing whitespace from messages", { + padded = paste0(" ", SYNOP_MSG, " ") + result = parser(padded) + expect_equal(result$station_id$value, "88889") +}) + +# ── multiple messages consistency ───────────────────────────────────────────── + +test_that("each element of a multi-message result matches the single-message result", { + single = parser(SYNOP_MSG) + multi = parser(rep(SYNOP_MSG, 2), simplify = FALSE) + expect_identical(multi[[1]], single) + expect_identical(multi[[2]], single) +}) + +# ── as_data_frame ───────────────────────────────────────────────────────────── + +test_that("as_data_frame = TRUE returns a data.frame for a single message", { + df = parser(SYNOP_MSG, as_data_frame = TRUE) + expect_s3_class(df, "data.frame") + expect_equal(nrow(df), 1L) +}) + +test_that("as_data_frame = TRUE returns n rows for n messages", { + df = parser(rep(SYNOP_MSG, 3), as_data_frame = TRUE) + expect_s3_class(df, "data.frame") + expect_equal(nrow(df), 3L) +}) + +test_that("as_data_frame result has expected column names", { + expected_cols = c( + "station_type", "station_id", "region", "obs_day", "obs_hour", + "wind_unit", "wind_estimated", "visibility", "cloud_cover", + "wind_direction", "wind_speed", "air_temperature", "dewpoint_temperature", + "station_pressure", "sea_level_pressure", "pressure_tendency", + "pressure_change", "precipitation_amount", "precipitation_time", + "cloud_base_min", "cloud_base_max", "low_cloud_type", + "middle_cloud_type", "high_cloud_type", "low_cloud_amount", "source" + ) + df = parser(SYNOP_MSG, as_data_frame = TRUE) + expect_true(all(expected_cols %in% names(df))) + expect_equal(tail(names(df), 1), "source") +}) + +test_that("as_data_frame result contains correct decoded values", { + df = parser(SYNOP_MSG, as_data_frame = TRUE) + expect_equal(df$station_type, "AAXX") + expect_equal(df$station_id, "88889") + expect_equal(df$region, "III") + expect_equal(df$obs_day, 1L) + expect_equal(df$obs_hour, 0L) + expect_equal(df$wind_unit, "KT") + expect_equal(df$wind_direction, 150) + expect_equal(df$wind_speed, 6L) + expect_equal(df$air_temperature, 9.4) + expect_equal(df$dewpoint_temperature, 4.7) + expect_equal(df$station_pressure, 1011.1, tolerance = 0.01) + expect_equal(df$sea_level_pressure, 1019.7, tolerance = 0.01) + expect_equal(df$cloud_cover, 6L) + expect_equal(df$visibility, 40000) + expect_equal(df$precipitation_amount, 0L) + expect_equal(df$source, SYNOP_MSG) +}) + +test_that("multi-row as_data_frame result is consistent across rows", { + df = parser(rep(SYNOP_MSG, 2), as_data_frame = TRUE) + expect_equal(rownames(df), c("1", "2")) + row1 = df[1, ] + row2 = df[2, ] + rownames(row1) = NULL + rownames(row2) = NULL + expect_identical(row1, row2) +}) + +test_that("as_data_frame row for NULL result contains all-NA numeric columns", { + df = suppressWarnings(parser("", as_data_frame = TRUE)) + numeric_cols = c("obs_day", "obs_hour", "visibility", "cloud_cover", + "wind_direction", "wind_speed", "air_temperature", + "station_pressure", "sea_level_pressure") + for (col in numeric_cols) { + expect_true(is.na(df[[col]]), label = paste("column", col, "is NA")) + } +}) + +test_that("as_data_frame has integer rownames (not SYNOP strings)", { + msgs = c(SYNOP_MSG, SYNOP_MSG) + df = parser(msgs, as_data_frame = TRUE) + expect_equal(rownames(df), c("1", "2")) +}) + +test_that("as_data_frame source column contains the original message strings", { + msgs = c(SYNOP_MSG, SYNOP_MSG) + df = parser(msgs, as_data_frame = TRUE) + expect_equal(df$source, msgs) +}) + +test_that("simplify is ignored when as_data_frame = TRUE", { + df_default = parser(SYNOP_MSG, as_data_frame = TRUE) + df_nosimply = parser(SYNOP_MSG, as_data_frame = TRUE, simplify = FALSE) + expect_s3_class(df_default, "data.frame") + expect_s3_class(df_nosimply, "data.frame") + expect_equal(nrow(df_default), 1L) + expect_equal(nrow(df_nosimply), 1L) +}) + +# ── SYNOP message variants ───────────────────────────────────────────────────── + +test_that("NIL station returns NA for all observation fields", { + result = parser("AAXX 01004 88889 NIL") + expect_true(is.na(result$visibility)) + expect_true(is.na(result$cloud_cover)) + expect_true(is.na(result$air_temperature)) + expect_true(is.na(result$precipitation_s1)) + expect_true(is.na(result$present_weather)) +}) + +test_that("relative humidity group (sn=9) is decoded", { + result = parser("AAXX 01004 88889 12782 61506 10094 29067") + expect_false(is.null(result$relative_humidity)) + expect_equal(result$relative_humidity$value, 67L) + expect_equal(result$relative_humidity$unit, "%") +}) + +test_that("weather group 7 (present and past weather) is decoded", { + # 71023: ww=10, W1=2, W2=3 + result = parser("AAXX 01004 88889 12782 61506 10094 71023") + expect_false(is.null(result$present_weather)) + expect_equal(result$present_weather$value, 10L) + expect_false(is.null(result$past_weather)) + expect_equal(length(result$past_weather), 2L) +}) + +test_that("section 3 maximum and minimum temperature are decoded", { + msg = paste(SYNOP_MSG, "333 10025 20012") + result = parser(msg) + expect_false(is.null(result$maximum_temperature)) + expect_equal(result$maximum_temperature$value, 2.5) + expect_false(is.null(result$minimum_temperature)) + expect_equal(result$minimum_temperature$value, 1.2) +}) + +test_that("section 3 sunshine (55SSS) is decoded", { + msg = paste(SYNOP_MSG, "333 55060") + result = parser(msg) + expect_false(is.null(result$sunshine)) + expect_equal(result$sunshine$value, 6.0) + expect_equal(result$sunshine$unit, "h") +}) + +test_that("section 3 cloud layer (8NChh) is decoded", { + msg = paste(SYNOP_MSG, "333 81656") + result = parser(msg) + expect_false(is.null(result$cloud_layer)) + expect_equal(length(result$cloud_layer), 1L) + expect_equal(result$cloud_layer[[1]]$cloud_genus$value, "Sc") +}) + +test_that("section 3 highest gust 910ff with 10-min period is decoded", { + msg = paste(SYNOP_MSG, "333 91020") + result = parser(msg) + expect_false(is.null(result$highest_gust)) + expect_equal(result$highest_gust[[1]]$speed$value, 20L) + expect_equal(result$highest_gust[[1]]$measure_period$value, 10) +}) + +test_that("section 3 highest gust 911ff followed by 915dd is decoded", { + msg = paste(SYNOP_MSG, "333 91120 91518") + result = parser(msg) + expect_false(is.null(result$highest_gust)) + gust = result$highest_gust[[1]] + expect_equal(gust$speed$value, 20L) + expect_equal(gust$direction$value, 180) +}) + +test_that("section 3 highest gust 911ff without direction group is decoded", { + msg = paste(SYNOP_MSG, "333 91120") + result = parser(msg) + expect_false(is.null(result$highest_gust)) + expect_equal(result$highest_gust[[1]]$speed$value, 20L) +}) + +test_that("section 3 unrecognised j2 in group 9 is skipped gracefully", { + # 91220 has j2=2, which is neither 0 nor 1 + msg = paste(SYNOP_MSG, "333 91220") + result = parser(msg) + # No crash; highest_gust should be absent or empty + expect_true(is.null(result$highest_gust) || length(result$highest_gust) == 0) +}) + +test_that("visibility VV=00 gives isLess 100m", { + result = parser("AAXX 01004 88889 12700 61506 10094") + expect_equal(result$visibility$value, 100) + expect_equal(result$visibility$quantifier, "isLess") +}) + +test_that("visibility VV=25 gives 2500m", { + result = parser("AAXX 01004 88889 12725 61506 10094") + expect_equal(result$visibility$value, 2500) +}) + +test_that("visibility VV=60 gives 10000m", { + result = parser("AAXX 01004 88889 12760 61506 10094") + expect_equal(result$visibility$value, 10000) +}) + +test_that("visibility VV=89 gives isGreater 70000m", { + result = parser("AAXX 01004 88889 12789 61506 10094") + expect_equal(result$visibility$value, 70000) + expect_equal(result$visibility$quantifier, "isGreater") +}) + +test_that("visibility VV=90 gives isLess 50m and use90=TRUE", { + result = parser("AAXX 01004 88889 12790 61506 10094") + expect_equal(result$visibility$value, 50) + expect_equal(result$visibility$quantifier, "isLess") + expect_true(result$visibility$use90) +}) + +test_that("visibility VV=91 gives 50m with use90=TRUE", { + result = parser("AAXX 01004 88889 12791 61506 10094") + expect_equal(result$visibility$value, 50) + expect_true(result$visibility$use90) +}) + +test_that("visibility VV=99 gives isGreaterOrEqual 50000m", { + result = parser("AAXX 01004 88889 12799 61506 10094") + expect_equal(result$visibility$value, 50000) + expect_equal(result$visibility$quantifier, "isGreaterOrEqual") +}) + +test_that("invalid visibility code 51-55 emits a message", { + expect_message(result <- parser("AAXX 01004 88889 12753 61506 10094")) + expect_null(result$visibility) +}) + +test_that("precipitation code 989 gives isGreaterOrEqual", { + result = parser("AAXX 01004 88889 12782 61506 10094 20047 69891") + expect_false(is.null(result$precipitation_s1)) + expect_equal(result$precipitation_s1$amount$quantifier, "isGreaterOrEqual") +}) + +test_that("precipitation code 990 gives trace", { + result = parser("AAXX 01004 88889 12782 61506 10094 20047 69901") + expect_false(is.null(result$precipitation_s1)) + expect_true(result$precipitation_s1$amount$trace) +}) + +test_that("precipitation code 993 gives 0.3 mm", { + result = parser("AAXX 01004 88889 12782 61506 10094 20047 69931") + expect_false(is.null(result$precipitation_s1)) + expect_equal(result$precipitation_s1$amount$value, 0.3) +}) + +test_that("calm wind with nonzero speed triggers a message", { + expect_message(parser("AAXX 01004 88889 12782 60015 10094")) +}) + +test_that("wind direction dd=99 (variable, all directions) is decoded", { + result = parser("AAXX 01004 88889 12782 69906 10094") + expect_true(result$surface_wind$direction$varAllUnknown) +}) + +# ── check_valid / is_valid paths ─────────────────────────────────────────────── + +test_that("is_valid returns TRUE for unavailable (slash) value", { + expect_true(Hour$new()$is_valid("//")) +}) + +test_that("is_valid returns FALSE for out-of-range value without raising", { + expect_false(Hour$new()$is_valid("99", raise_exception = FALSE)) +}) + +test_that("is_valid returns FALSE for non-numeric value when range set", { + expect_false(Hour$new()$is_valid("XY", raise_exception = FALSE)) +}) + +# ── decode error path ────────────────────────────────────────────────────────── + +test_that("decode emits message and returns NULL on internal error", { + expect_message(result <- SignedTemperature$new()$decode("094", sign = "X")) + expect_null(result) +}) + +# ── Observation.encode paths ─────────────────────────────────────────────────── + +test_that("encode returns null chars for NULL data when no code_table", { + # SurfaceWind has no code_table; NULL data → "////" (code_len=4) + result = SurfaceWind$new()$encode(NULL) + expect_equal(result, "////") +}) + +test_that("encode calls encode_internal for NULL data when code_table present", { + # CloudCover has code_table; NULL with obscured=TRUE → "9" via CodeTable2700 + result = CloudCover$new()$encode(list(value = NULL, obscured = TRUE)) + expect_equal(result, "9") +}) + +test_that("encode emits message and returns null char when encode_internal errors", { + # CodeTable2700 stops when value=NULL and obscured=FALSE + expect_message(result <- CloudCover$new()$encode(list(value = NULL, obscured = FALSE))) + expect_equal(result, "/") +}) + +# ── Observation.encode_internal component path ───────────────────────────────── + +test_that("encode_internal handles component-based classes (ObservationTime)", { + ot = ObservationTime$new() + # Must call encode_internal directly: data has no $value key so encode() treats it as null + result = ot$encode_internal(list(day = list(value = 15L), hour = list(value = 12L))) + expect_equal(result, "1512") +}) + +test_that("encode_internal uses null chars for missing component keys", { + ot = ObservationTime$new() + result = ot$encode_internal(list()) # neither day nor hour present + expect_equal(result, "////") +}) + +# ── decode_value paths ───────────────────────────────────────────────────────── + +test_that("decode_value returns NULL for unavailable value '/'", { + result = Hour$new()$decode_value("/") + expect_null(result) +}) + +test_that("decode_value emits message and returns NULL when code_table decode fails", { + # Code "10" exceeds CodeTable0500's index range → stop → message chain + expect_message(result <- CloudGenus$new()$decode("10")) + expect_null(result) +}) + +test_that("decode_value returns NULL for non-numeric string without code_table", { + result = Hour$new()$decode_value("XY") + expect_null(result) +}) + +# ── Temperature encode ───────────────────────────────────────────────────────── + +test_that("Temperature encodes a positive value correctly", { + temp = Temperature$new() + result = temp$encode(list(value = 9.4)) + expect_equal(result, "0094") +}) + +test_that("Temperature encodes a negative value correctly", { + temp = Temperature$new() + result = temp$encode(list(value = -9.4)) + expect_equal(result, "1094") +}) + +# ── Pressure encode ──────────────────────────────────────────────────────────── + +test_that("Pressure encodes a value >= 1000 hPa correctly", { + press = Pressure$new() + result = press$encode(list(value = 1019.7)) + expect_equal(result, "0197") +}) + +test_that("Pressure encodes a value < 1000 hPa correctly", { + press = Pressure$new() + result = press$encode(list(value = 978.5)) + expect_equal(result, "9785") +}) + +# ── Visibility encode via CodeTable4377 ──────────────────────────────────────── + +test_that("Visibility encodes < 100m to code 00", { + result = Visibility$new()$encode(list(value = 50, use90 = FALSE)) + expect_equal(result, "00") +}) + +test_that("Visibility encodes 5000m (<=5000) to correct code", { + result = Visibility$new()$encode(list(value = 5000, use90 = FALSE)) + expect_equal(result, "50") +}) + +test_that("Visibility encodes 10000m (5001-30000) to correct code", { + result = Visibility$new()$encode(list(value = 10000, use90 = FALSE)) + expect_equal(result, "60") +}) + +test_that("Visibility encodes > 70000m (isGreater quantifier) to 89", { + result = Visibility$new()$encode(list(value = 100000, quantifier = "isGreater", use90 = FALSE)) + expect_equal(result, "89") +}) + +test_that("CodeTable4377 encode_internal with use90=TRUE maps 50m to code 91", { + vt = CodeTable4377$new() + result = vt$encode_internal(list(value = 50), use90 = TRUE) + expect_equal(result, "91") +}) + +test_that("CodeTable4377 encode_internal with use90=TRUE maps 200m to code 92", { + vt = CodeTable4377$new() + result = vt$encode_internal(list(value = 200), use90 = TRUE) + expect_equal(result, "92") +}) + +test_that("CodeTable4377 encode_internal stops on unmatched use90=TRUE value", { + vt = CodeTable4377$new() + expect_error(vt$encode_internal(list(value = -1), use90 = TRUE), "Cannot encode visibility") +}) + +# ── SurfaceWind encode ───────────────────────────────────────────────────────── + +test_that("SurfaceWind encodes direction and speed correctly", { + sw = SurfaceWind$new() + # Must call encode_internal directly: complex data has no $value key + result = sw$encode_internal(list(direction = list(value = 150), speed = list(value = 6L))) + expect_equal(result, "1506") +}) + +# ── WindSpeed encode ─────────────────────────────────────────────────────────── + +test_that("WindSpeed encodes NULL with allow_none=TRUE to '//'", { + result = WindSpeed$new()$encode(NULL, allow_none = TRUE) + expect_equal(result, "//") +}) + +test_that("WindSpeed encodes speed > 99 using 99 prefix", { + result = WindSpeed$new()$encode(list(value = 120)) + expect_match(result, "^99 00120$") +}) + +test_that("WindSpeed encodes a normal speed", { + result = WindSpeed$new()$encode(list(value = 35)) + expect_equal(result, "35") +}) + +# ── CloudCover encode ────────────────────────────────────────────────────────── + +test_that("CloudCover encodes a numeric value", { + result = CloudCover$new()$encode(list(value = 6, obscured = FALSE)) + expect_equal(result, "6") +}) + +# ── CloudGenus decode and encode ─────────────────────────────────────────────── + +test_that("CloudGenus decodes code 6 to Sc", { + result = CloudGenus$new()$decode("6") + expect_equal(result$value, "Sc") +}) + +test_that("CloudGenus encodes Sc to code 6", { + result = CloudGenus$new()$encode(list(value = "Sc")) + expect_equal(result, "6") +}) + +test_that("CloudGenus emits message on invalid genus name", { + expect_message(result <- CloudGenus$new()$encode(list(value = "XX"))) + expect_equal(result, "/") +}) + +# ── DirectionCardinal decode and encode ──────────────────────────────────────── + +test_that("DirectionCardinal decodes 0 as calm", { + result = DirectionCardinal$new()$decode("0") + expect_true(result$isCalmOrStationary) + expect_null(result$value) +}) + +test_that("DirectionCardinal decodes 9 as all-directions", { + result = DirectionCardinal$new()$decode("9") + expect_true(result$allDirections) +}) + +test_that("DirectionCardinal decodes 1 as NE", { + result = DirectionCardinal$new()$decode("1") + expect_equal(result$value, "NE") +}) + +test_that("DirectionCardinal encodes calm flag to '0'", { + result = DirectionCardinal$new()$encode(list(isCalmOrStationary = TRUE)) + expect_equal(result, "0") +}) + +test_that("DirectionCardinal encodes all-directions flag to '9'", { + result = DirectionCardinal$new()$encode(list(isCalmOrStationary = FALSE, allDirections = TRUE)) + expect_equal(result, "9") +}) + +test_that("DirectionCardinal encodes NE to '1'", { + result = DirectionCardinal$new()$encode(list(isCalmOrStationary = FALSE, allDirections = FALSE, value = "NE")) + expect_equal(result, "1") +}) + +test_that("DirectionCardinal emits message on unresolvable direction", { + expect_message(result <- DirectionCardinal$new()$encode( + list(isCalmOrStationary = FALSE, allDirections = FALSE, value = NULL) + )) + expect_equal(result, "/") +}) + +# ── DirectionDegrees encode ──────────────────────────────────────────────────── + +test_that("DirectionDegrees encodes calm to '00'", { + result = DirectionDegrees$new()$encode(list(value = NULL, calm = TRUE)) + expect_equal(result, "00") +}) + +test_that("DirectionDegrees encodes varAllUnknown to '99'", { + result = DirectionDegrees$new()$encode(list(value = NULL, varAllUnknown = TRUE)) + expect_equal(result, "99") +}) + +test_that("DirectionDegrees encodes NULL with no flags to '//'", { + # CodeTable0877.encode_internal returns "//" for null-value with no flags; + # test via code table directly since encode_value mangles "/" strings + result = CodeTable0877$new()$encode_internal(list(value = NULL, calm = FALSE, varAllUnknown = FALSE)) + expect_equal(result, "//") +}) + +test_that("DirectionDegrees encodes a degree value", { + result = DirectionDegrees$new()$encode(list(value = 150)) + expect_equal(result, "15") +}) + +test_that("DirectionDegrees emits message on invalid direction code", { + expect_message(DirectionDegrees$new()$decode("37")) +}) + +# ── CodeTable4377 decode edge cases ─────────────────────────────────────────── + +test_that("CodeTable4377 decodes VV=00 to isLess 100m", { + result = CodeTable4377$new()$decode("00") + expect_equal(result$value, 100) + expect_equal(result$quantifier, "isLess") +}) + +test_that("CodeTable4377 decodes VV=25 to 2500m", { + result = CodeTable4377$new()$decode("25") + expect_equal(result$value, 2500) + expect_null(result$quantifier) +}) + +test_that("CodeTable4377 decodes VV=60 to 10000m", { + result = CodeTable4377$new()$decode("60") + expect_equal(result$value, 10000) +}) + +test_that("CodeTable4377 decodes VV=89 to isGreater 70000m", { + result = CodeTable4377$new()$decode("89") + expect_equal(result$value, 70000) + expect_equal(result$quantifier, "isGreater") +}) + +test_that("CodeTable4377 decodes VV=90 to isLess 50m with use90", { + result = CodeTable4377$new()$decode("90") + expect_equal(result$value, 50) + expect_equal(result$quantifier, "isLess") + expect_true(result$use90) +}) + +test_that("CodeTable4377 decodes VV=92 to 200m with use90", { + result = CodeTable4377$new()$decode("92") + expect_equal(result$value, 200) + expect_true(result$use90) +}) + +test_that("CodeTable4377 decodes VV=98 to 20000m with use90", { + result = CodeTable4377$new()$decode("98") + expect_equal(result$value, 20000) + expect_true(result$use90) +}) + +test_that("CodeTable4377 decodes VV=99 to isGreaterOrEqual 50000m", { + result = CodeTable4377$new()$decode("99") + expect_equal(result$value, 50000) + expect_equal(result$quantifier, "isGreaterOrEqual") +}) + +test_that("CodeTable4377 emits message on invalid code 53", { + expect_message(result <- CodeTable4377$new()$decode("53")) + expect_null(result) +}) + +# ── CodeTable1677 decode (Height) edge cases ─────────────────────────────────── + +test_that("Height decodes hh=00 to isLess 30m", { + result = Height$new()$decode("00") + expect_equal(result$value, 30) + expect_equal(result$quantifier, "isLess") +}) + +test_that("Height decodes hh=25 to 750m", { + result = Height$new()$decode("25") + expect_equal(result$value, 750) +}) + +test_that("Height decodes hh=56 to 1800m", { + result = Height$new()$decode("56") + expect_equal(result$value, 1800) +}) + +test_that("Height decodes hh=82 to 12000m", { + result = Height$new()$decode("82") + expect_equal(result$value, 12000) +}) + +test_that("Height decodes hh=89 to isGreater 21000m", { + result = Height$new()$decode("89") + expect_equal(result$value, 21000) + expect_equal(result$quantifier, "isGreater") +}) + +test_that("Height decodes hh=99 to isGreater 21000m", { + result = Height$new()$decode("99") + expect_equal(result$value, 21000) + expect_equal(result$quantifier, "isGreater") +}) + +test_that("Height emits message on invalid code 55 (gap between ranges)", { + expect_message(result <- Height$new()$decode("55")) + expect_null(result) +}) + +# ── Amount24 / CodeTable3590A decode ────────────────────────────────────────── + +test_that("Amount24 decodes normal value to tenths of mm", { + result = Amount24$new()$decode("0500") + expect_equal(result$value, 50.0) + expect_false(result$trace) +}) + +test_that("Amount24 decodes 9999 as trace", { + result = Amount24$new()$decode("9999") + expect_equal(result$value, 0) + expect_true(result$trace) +}) + +# ── Precipitation.decode_internal with tenths=TRUE ───────────────────────────── + +test_that("Precipitation decodes with tenths=TRUE using Amount24", { + precip = Precipitation$new() + result = precip$decode("60010", tenths = TRUE) + expect_false(is.null(result$amount)) + expect_equal(result$time_before_obs$value, 24) +}) + +# ── LowestCloudBase decode with invalid code ─────────────────────────────────── + +test_that("LowestCloudBase emits message on out-of-range code", { + # CodeTable1600 only has 10 entries (codes 0-9); code 10 is out of range + # LowestCloudBase.decode("9") returns the last valid entry without message + # so use two-digit code which as.integer truncates to the first character anyway; + # instead test via CodeTable1600 directly with an out-of-range integer string + expect_message(result <- LowestCloudBase$new()$decode_value("a")) + expect_null(result) # non-numeric string → NA integer → NULL via decode_value path +}) + +test_that("Region emits message on station ID outside all defined ranges", { + expect_message(result <- Region$new()$decode("99999")) + expect_null(result) +}) + +# ── Gust encode ──────────────────────────────────────────────────────────────── + +test_that("Gust encodes NULL with allow_none=TRUE to '//'", { + result = Gust$new()$encode(NULL, allow_none = TRUE) + expect_equal(result, "//") +}) + +test_that("Gust encodes speed > 99 using 99 prefix", { + result = Gust$new()$encode(list(value = 120)) + expect_match(result, "^99 00120$") +}) + +test_that("Gust encodes normal speed", { + result = Gust$new()$encode(list(value = 35)) + expect_equal(result, "35") +}) + +# ── HighestGust encode ──────────────────────────────────────────────────────── + +test_that("HighestGust encodes gust with 10-min measure_period", { + hg = HighestGust$new() + # Must call encode_internal directly: complex data has no $value key + result = hg$encode_internal(list(speed = list(value = 20), measure_period = list(value = 10, unit = "min"))) + expect_equal(result, "91020") +}) + +test_that("HighestGust encodes gust with time_before_obs using 911 prefix", { + hg = HighestGust$new() + result = hg$encode_internal(list( + speed = list(value = 20), + time_before_obs = list("_code" = "5") + )) + expect_match(result, "91120") +}) + +test_that("HighestGust encodes gust with direction appended as 915dd", { + hg = HighestGust$new() + result = hg$encode_internal(list( + speed = list(value = 25), + measure_period = list(value = 10, unit = "min"), + direction = list(value = 180) + )) + expect_match(result, "91025") + expect_match(result, "91518") +}) + +test_that("HighestGust encode_internal stops on invalid measure_period", { + hg = HighestGust$new() + expect_error(hg$encode_internal(list( + speed = list(value = 20), + measure_period = list(value = 5, unit = "min") # only 10 min is valid + )), "Invalid value for measure_period") +}) + +# ── create_observation ──────────────────────────────────────────────────────── + +test_that("create_observation returns the correct R6 class instance", { + obj = create_observation("Temperature") + expect_true(inherits(obj, "Temperature")) +}) + +test_that("create_observation stops on unknown class name", { + expect_error(create_observation("UnknownClass"), "Unknown observation class") +}) + +# ── Minute class ────────────────────────────────────────────────────────────── + +test_that("Minute decodes a valid minute value", { + result = Minute$new()$decode("30") + expect_equal(result$value, 30L) +}) + +test_that("Minute returns NULL for out-of-range value", { + result = Minute$new()$decode("60") + expect_null(result) +}) + +# ── is_available ────────────────────────────────────────────────────────────── + +test_that("is_available returns FALSE for NULL", { + expect_false(Hour$new()$is_available(NULL)) +}) + +test_that("is_available returns FALSE for all-slash string", { + expect_false(Hour$new()$is_available("//")) +}) + +test_that("is_available returns TRUE for a valid string", { + expect_true(Hour$new()$is_available("12")) +}) + +# ── WindIndicator decode ────────────────────────────────────────────────────── + +test_that("WindIndicator decodes iw=3 as KT estimated", { + result = WindIndicator$new()$decode("3") + expect_equal(result$unit, "KT") + expect_true(result$estimated) +}) + +# ── Hour encode ─────────────────────────────────────────────────────────────── + +test_that("Hour encode_convert passes through via encode", { + result = Hour$new()$encode(list(value = 12L)) + expect_equal(result, "12") +}) + +# ── Snow depth (Section 3, group 4E'sss) ───────────────────────────────────── + +test_that("snow depth: trace (sss=997) is decoded to 0 with correct state", { + msg = "AAXX 15061 12530 11225 80000 11012 21012 39997 40204 56006 69902 72022 885// 333 11011 21017 3/102 47997 79999 93097=" + row = parser(msg, as_data_frame = TRUE) + expect_equal(row$snow_depth, 0) + expect_equal(row$snow_depth_state, "Even layer of loose dry snow covering ground completely") +}) + +test_that("snow depth: actual depth is decoded correctly", { + msg = "AAXX 01004 88889 12782 61506 10094 20047 30111 40197 53007 60001 81541 333 40055=" + row = parser(msg, as_data_frame = TRUE) + expect_equal(row$snow_depth, 55) + expect_equal(row$snow_depth_state, "Ground predominantly covered by ice") +}) + +test_that("snow depth: non-continuous (sss=998) returns NA", { + msg = "AAXX 01004 88889 12782 61506 10094 20047 30111 40197 53007 60001 81541 333 42998=" + row = parser(msg, as_data_frame = TRUE) + expect_true(is.na(row$snow_depth)) + expect_equal(row$snow_depth_state, "Compact or wet snow covering at least one-half of the ground but not completely") +}) + +test_that("snow depth: unmeasurable (sss=999) returns NA", { + msg = "AAXX 01004 88889 12782 61506 10094 20047 30111 40197 53007 60001 81541 333 43999=" + row = parser(msg, as_data_frame = TRUE) + expect_true(is.na(row$snow_depth)) +}) + +test_that("snow depth: absent in message gives NA columns", { + msg = "AAXX 01004 88889 12782 61506 10094 20047 30111 40197 53007 60001 81541" + row = parser(msg, as_data_frame = TRUE) + expect_true(is.na(row$snow_depth)) + expect_true(is.na(row$snow_depth_state)) +}) + +# ── Nddff message chain ──────────────────────────────────────────────────────── + +test_that("invalid wind direction in Nddff emits full context message chain", { + # Group 88695: N=8 (cloud cover), dd=86 (invalid direction), ff=95 (wind speed) + # Expected chain: "Warning decoding group: 88695 - Warning decoding with code table: 86 - ..." + expect_message( + parser("AAXX 10061 11035 11234 88695 11020 21015="), + "Warning decoding group: 88695" + ) + expect_message( + parser("AAXX 10061 11035 11234 88695 11020 21015="), + "Warning decoding with code table: 86" + ) +}) diff --git a/vignettes/articles/pl.Rmd b/vignettes/articles/pl.Rmd index 74763b3a..8df2d56b 100644 --- a/vignettes/articles/pl.Rmd +++ b/vignettes/articles/pl.Rmd @@ -83,10 +83,10 @@ kolejnych kolumnach. ```{r filtering, eval=TRUE, include=TRUE} h2 = h %>% - filter(MCWSKEX == 3) %>% - select(id, PSNZWP, X, Y, MCROKH, MCPRZP) %>% - group_by(MCROKH, id, PSNZWP, X, Y) %>% - summarise(srednie_roczne_Q = round(mean(MCPRZP, na.rm = TRUE),1)) %>% + dplyr::filter(MCWSKEX == 3) %>% + dplyr::select(id, PSNZWP, X, Y, MCROKH, MCPRZP) %>% + dplyr::group_by(MCROKH, id, PSNZWP, X, Y) %>% + dplyr::summarise(srednie_roczne_Q = round(mean(MCPRZP, na.rm = TRUE),1)) %>% spread(MCROKH, srednie_roczne_Q) ``` diff --git a/vignettes/articles/synop_parser.Rmd b/vignettes/articles/synop_parser.Rmd new file mode 100644 index 00000000..4b47eec4 --- /dev/null +++ b/vignettes/articles/synop_parser.Rmd @@ -0,0 +1,213 @@ +--- +title: "SYNOP Messages" +--- + +```{r setup, include=FALSE} +knitr::opts_chunk$set(echo = TRUE, eval = FALSE) +``` + +## Overview + +SYNOP is a standardized format for reporting weather observations usually every hour. +This vignette demonstrates how to work with SYNOP FM-12 meteorological messages using the +**climate** package. + +The package provides two complementary functions for handling SYNOP data: + +- **`parser()`** — parse raw SYNOP message strings into structured data +- **`meteo_ogimet(source = "synop")`** — download raw SYNOP messages from Ogimet and decode them automatically + +> **Note:** The older `meteo_ogimet_synop()` function is deprecated. Use `meteo_ogimet(source = "synop")` instead. + +## The `parser()` Function + +The `parser()` function decodes raw SYNOP messages into either a tidy data frame or a nested list structure. + +### Basic Usage - data frame output: + + +```{r} +suppressMessages(library(climate)) +``` + + +```{r echo=TRUE} +# A simple SYNOP message +synop_code = "AAXX 01004 88889 12782 61506 10094 20047 30111 40197 53007 60001 81541" + +# Parse into a data frame (default for analysis) +df_single = parser(synop_code, as_data_frame = TRUE) +head(df_single) +``` + +### Multiple Messages + +To parse multiple SYNOP messages at once: + +```{r echo=TRUE} +# Parse multiple messages +messages = c( + "AAXX 01004 88889 12782 61506 10094 20047 30111 40197 53007 60001 81541", + "AAXX 01006 12330 32782 61506 10094 20047 30111 40197 53007 60001 81541" +) + +# Multiple messages as a data frame +df_multiple = parser(messages, as_data_frame = TRUE) +head(df_multiple) +``` + +### Available Columns + +When `as_data_frame = TRUE`, the returned data frame includes the following columns: + +**Identification and metadata:** +- `station_type` — station classification code +- `station_id` — WMO station identifier +- `region` — geographic region code +- `obs_day` — observation day of month +- `obs_hour` — observation hour (UTC) + +**Wind:** +- `wind_unit` — either "m/s" or "knots" +- `wind_estimated` — whether wind is estimated +- `wind_direction` — direction in degrees (0–360°) +- `wind_speed` — speed in the reported unit + +**Temperature and moisture:** +- `air_temperature` — air temperature (°C) +- `dewpoint_temperature` — dew point (°C) + +**Pressure:** +- `station_pressure` — station-level pressure (hPa) +- `sea_level_pressure` — sea-level pressure (hPa) +- `pressure_tendency` — pressure change indicator +- `pressure_change` — 3-hour pressure change (hPa) + +**Clouds:** +- `cloud_cover` — total cloud cover (oktas, 0–8) +- `low_cloud_type` — type of low clouds (Sc, St, Cu, Cb) +- `middle_cloud_type` — type of middle clouds (Ac, As, Ns) +- `high_cloud_type` — type of high clouds (Ci, Cc, Cs) +- `low_cloud_amount` — cover of low clouds (oktas) +- `visibility` — horizontal visibility (m) +- `cloud_base_min`, `cloud_base_max` — cloud base height range (m) + +**Precipitation and other:** +- `precipitation_amount` — total precipitation (mm) +- `precipitation_time` — precipitation period (hours) +- `maximum_temperature` — daily maximum (°C, from Section 3) +- `minimum_temperature` — daily minimum (°C, from Section 3) +- `gust` — highest wind gust (in the message's wind unit) +- `sunshine_duration` — daily sunshine duration (hours) +- `snow_depth` — total snow depth (cm) +- `snow_depth_state` — descriptive snow/ground state + +### Nested List Output + +For more granular access most of decoded field, use the default list output: + +```{r echo=TRUE} +# Single message as a nested list +decoded = parser(synop_code) +str(decoded) +``` + +To parse multiple messages and keep each as a separate list element: + +```{r} +# Returns a list of two decoded messages +decoded_list = parser(messages, simplify = FALSE) + +# Access the first message +decoded_list[[1]] +``` + +### Country-Specific Decoding + +Some SYNOP groups (e.g., precipitation indicators) are country-specific. Pass the `country` parameter to adjust decoding: + +```{r echo=TRUE} +# Parse with Russia-specific precipitation decoding +decoded_ru = parser(synop_code, country = "RU", as_data_frame = TRUE) + +# Compare with default +decoded_default = parser(synop_code, as_data_frame = TRUE) +``` + +## The `meteo_ogimet()` Function — SYNOP Backend + +The `meteo_ogimet()` function (with `source = "synop"`) downloads SYNOP messages from +[Ogimet](http://www.ogimet.com) and automatically decodes them using `parser()`. + +### Station Mode: Single or Multiple Stations + +Fetch SYNOP data for one or more WMO station IDs: + +```{r echo=TRUE} +# can be used with multiple stations: +two_stations = meteo_ogimet( + source = "synop", + station = c(12330, 12375), + date = c("2024-01-01", "2024-01-10") +) +head(two_stations) +``` + +### Country mode: all stations in a country + +Fetch SYNOP data for all Ogimet stations in a country for a given time range: + +```{r} +# All Polish stations for a single day +poland = meteo_ogimet( + source = "synop", + country_name = "Poland", + date = c("2024-01-15", "2024-01-15") +) +head(poland) +``` + +### Simplified vs. full data frame output + +By default, `meteo_ogimet()` returns a simplified data frame with the most commonly used variables. +To get all decoded SYNOP fields, use `simplified = FALSE`: + +```{r echo=TRUE} +# Full parser output +full = meteo_ogimet( + source = "synop", + station = 12330, + date = c("2024-01-01", "2024-01-05"), + simplified = FALSE +) +head(full) +``` + +To obtain both the simplified and full data frames in one call, use `return_list = TRUE`: + +```{r echo=TRUE} +result = meteo_ogimet( + source = "synop", + station = 12330, + date = c("2024-01-01", "2024-01-05"), + return_list = TRUE +) +# Compact 20-column view +names(result$data) +# Full 30+ column parser output +names(result$full) +``` + + + +## Notes and Limitations + +- **Ogimet server limits:** The Ogimet server caps responses at 200,000 rows. If a date range would exceed this, the function automatically splits the range and recursively fetches each half. +- **Station delays:** When fetching multiple stations in station mode, a 20-second delay is inserted between requests to avoid overloading the Ogimet server. +- **Missing observations:** Not all SYNOP variables are reported at all times. Missing fields are returned as `NA`. +- **Historical data:** Ogimet typically provides historical SYNOP data since approx. 1999/2000; availability varies by station and country. + +## See Also + +- Daily and hourly OGIMET data via the HTML backend: `meteo_ogimet(source = "html")` +- WMO SYNOP documentation for FM-12: [FM 12-XIII Ext. SYNOP](https://library.wmo.int/records/item/35713-manual-on-codes-volume-i-1-international-codes) diff --git a/vignettes/articles/usecase_ogimet.Rmd b/vignettes/articles/usecase_ogimet.Rmd index 0eb99d46..d33472af 100644 --- a/vignettes/articles/usecase_ogimet.Rmd +++ b/vignettes/articles/usecase_ogimet.Rmd @@ -1,46 +1,207 @@ --- -title: "Use Case - Ogimet database" +title: "Ogimet meteorological database" output: html_document --- ```{r setup, include=FALSE} -knitr::opts_chunk$set(echo = TRUE) +knitr::opts_chunk$set(echo = TRUE, eval = FALSE) ``` -## Ogimet - download and visualize wind patterns over Svalbard +Most of meteorological stations working under the umbrella of the World Meteorlogical Organization (WMO) report their observations in the SYNOP format. The **climate** package provides a convenient interface to download and decode SYNOP messages from the [Ogimet](https://www.ogimet.com/) database, which aggregates data from thousands of stations worldwide. Therefore, it is often the most preferable source of meteorological information being a valuable resource for researchers, students, and weather enthusiasts looking to access historical and real-time meteorological data for analysis, visualization, and educational purposes. This information is most often used as a complementary source to national meteorological services that is delivered free of charge and without any API key requirements. -1. Downloading hourly data from the Ogimet repository for the defined time frame (2018/01/01-2018/12/31); chosen station: Svalbard Lufthavn -2. Using external package 'openair' to visualize the downloaded results +The `meteo_ogimet()` function is the unified entry point for downloading data from [Ogimet](https://www.ogimet.com/). +It uses two backends (automatically) selected by the `interval` argument: -```{r,warning=FALSE} +- **`interval = "hourly"`** → **SYNOP backend** (default): downloads and decodes raw SYNOP messages via + the Ogimet `getsynop` endpoint. Returns clean units used in meteorology (`ws` in m/s, `wd` in degrees, `t2m` + in °C, POSIXct `date` in UTC). +- **`interval = "daily"`** → **HTML backend** (default): scrapes pre-formatted daily summary tables + generated in the Ogimet web service. + +Use `source = "synop"` or `source = "html"` to override the default for any interval. + +--- + +## 1. Hourly wind patterns over Svalbard (SYNOP backend) + +Download a full year of hourly observations and visualize wind patterns and temperature advection +using the [openair](https://davidcarslaw.github.io/openair/) package. + +With the SYNOP backend the output already contains: + +- `ws` — wind speed in **m/s** +- `wd` — wind direction in **degrees** (no character-to-degrees conversion needed) +- `t2m` — air temperature in °C +- `date` — POSIXct timestamp in UTC (the column name openair expects) + +```{r svalbard-download, echo=TRUE} library(climate) -# downloading data -df <- meteo_ogimet(interval = "hourly", - date = c("2018-01-01", "2018-12-31"), - station = "01008") -# loading external packages: +# Download hourly data for Svalbard Lufthavn (WMO 01008) +df = meteo_ogimet(interval = "hourly", + date = c("2018-01-01", "2018-12-31"), + station = "01008") +head(df[, c("date", "station", "t2m", "ws", "wd")]) +``` + +```{r svalbard-windrose, echo=TRUE, message=FALSE, warning=FALSE} +library(openair) + +# Wind rose by season — no unit conversion needed +windRose(mydata = df, + ws = "ws", + wd = "wd", + type = "season", + paddle = FALSE, + main = "Svalbard Lufthavn (2018)", + ws.int = 3, + dig.lab = 3, + layout = c(4, 1)) +``` + +```{r svalbard-timeplot, echo=TRUE} +# Temporal overview of temperature and wind speed +timePlot(df, pollutant = c("t2m", "ws")) +``` + +```{r svalbard-polar, echo=TRUE} +# Which wind sectors bring warm / cold air masses? +polarPlot(df, + pollutant = "t2m", + x = "ws", + wd = "wd", + k = 50, + force.positive = FALSE, + type = "season", + layout = c(4, 1), + resolution = "fine", + normalise = FALSE) +``` + +--- + +## 2. Daily summaries — multi-station temperature comparison + +The HTML backend (`interval = "daily"`) returns pre-aggregated daily statistics (Tmax, Tmin, Tavg, +precipitation, etc.) for one or more stations. + +```{r daily-download, echo=TRUE} +library(climate) + +# Daily summaries for two Polish stations: Poznan (12330) and Warsaw (12375) +daily = meteo_ogimet(interval = "daily", + date = c("2023-06-01", "2023-08-31"), + station = c(12330, 12375), + coords = TRUE) +head(daily) +``` + +```{r daily-plot, echo=TRUE, message=FALSE} +library(ggplot2) + +# Compare average temperatures between the two stations +ggplot(daily, aes(x = as.Date(Date), y = TemperatureCAvg, + colour = factor(station_ID), group = station_ID)) + + geom_line() + + labs(title = "Daily average temperature — summer 2023", + x = "Date", + y = "Temperature (°C)", + colour = "Station (WMO ID)") + + theme_bw() +``` + +--- + +## 3. Country-level bulk download + +The SYNOP backend supports downloading all Ogimet stations for an entire country in a single +request via the `country_name` argument. This is useful for spatial analysis. + +```{r country-download, echo=TRUE} +library(climate) + +# All stations in Poland for a single day +poland = meteo_ogimet(interval = "hourly", + country_name = "Poland", + date = c("2023-12-15", "2023-12-15")) + +cat("Stations:", length(unique(poland$station)), "\n") +cat("Observations:", nrow(poland), "\n") +head(poland[, c("date", "station", "t2m", "ws", "slp")]) +``` + +```{r country-map, echo=TRUE, message=FALSE, warning=FALSE} +# Quick spatial overview — requires coordinates from stations_ogimet() library(dplyr) -library(openair) # external package for plotting wind roses -# converting wind direction from character into degress required by most -wdir <- data.frame(ddd = c("CAL","N","NNE","NE","ENE","E","ESE","SE","SSE", - "S","SSW","SW","WSW","W","WNW","NW","NNW"), - dir = c(NA, 0:15 * 22.5), stringsAsFactors = FALSE) -# changing date column to the format required by openair package: -df$Date <- as.POSIXct(df$Date, tz = "UTC") -df$date <- df$Date -df <- left_join(df, wdir) +station_meta = stations_ogimet(country = "Poland", date = Sys.Date()) + +# Join decoded observations with coordinates +poland_geo = poland %>% + group_by(station) %>% + summarise(t2m_mean = mean(t2m, na.rm = TRUE), .groups = "drop") %>% + left_join(station_meta, by = c("station" = "wmo_id")) + +ggplot(poland_geo, aes(x = lon, y = lat, colour = t2m_mean, size = t2m_mean)) + + geom_point(alpha = 0.7) + + scale_colour_distiller(palette = "RdYlBu", direction = -1, name = "T2m (°C)") + + coord_quickmap() + + labs(title = "Mean temperature — Poland, 2023-12-15") + + theme_bw() + + theme(legend.position = "right") +``` + +--- + +## 4. Full SYNOP output with `return_list` -df$ws <- df$ffkmh / 3.6 # conversion to m/s from km/h -df$gust <- as.numeric(df$Gustmax) / 3.6 # conversion to m/s from km/h -windRose(mydata = df, ws = "ws", wd = "dir", type = "season", paddle = FALSE, - main = "Svalbard Lufthavn (2018)", ws.int = 3, dig.lab = 3, layout = c(4, 1)) +Setting `return_list = TRUE` returns a named list with two data frames: `$data` (the compact +20-column simplified view) and `$full` (the complete parser output with 30+ columns including +cloud types, precipitation period, snow state, etc.). -# do we miss any data? -summaryPlot(df[ ,c("date", "TC", "ws", "gust")]) +```{r return-list, echo=TRUE} +library(climate) + +result = meteo_ogimet(interval = "hourly", + station = 12330, + date = c("2023-06-01", "2023-06-03"), + return_list = TRUE) + +# Compact view +names(result$data) + +# Full parser output +names(result$full) + +# Variables only available in the full output +result$full[1, c("low_cloud_type", "middle_cloud_type", "high_cloud_type", + "precipitation_time", "snow_depth_state", "source")] +``` + +--- + +## 5. Selecting the backend explicitly -# which sectors are responsible for warm/cold air mass advection: -polarPlot(df, pollutant = "TC", x = "ws", wd = "dir", k = 50, force.positive = FALSE, - type = "season", layout = c(4, 1), resolution = "fine", normalise = FALSE) +Override the automatic backend selection with the `source` argument: + +```{r backend-override, echo=TRUE} +library(climate) + +# Force SYNOP backend for daily data +poznan_synop = meteo_ogimet(interval = "daily", + station = 12330, + date = c("2023-06-01", "2023-06-07"), + source = "synop") +head(poznan_synop[, c("date", "station", "t2m", "tmax", "tmin", "precip")]) + +# Force HTML backend for hourly data (returns km/h wind speed, character wind direction, etc.) +poznan_html = meteo_ogimet(interval = "hourly", + station = 12330, + date = c("2023-06-01", "2023-06-03"), + source = "html", + coords = TRUE, + precip_split = TRUE) +head(poznan_html[, c("station_ID", "Date", "TC", "ffkmh", "ddd", "pr6", "pr12", "pr24")]) ``` + +> **Note:** `meteo_ogimet_synop()` is deprecated. Use `meteo_ogimet(source = "synop")` instead. diff --git a/vignettes/getstarted.Rmd b/vignettes/getstarted.Rmd index 721241e1..25f741a0 100644 --- a/vignettes/getstarted.Rmd +++ b/vignettes/getstarted.Rmd @@ -14,51 +14,77 @@ knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) -old <- options(scipen = 999) +old = options(scipen = 999) ``` -The goal of the **climate** R package is to automatize downloading of meteorological +The goal of the **climate** R package is to automatize downloading of *in-situ* meteorological and hydrological data from publicly available repositories: -- OGIMET [(ogimet.com)](http://ogimet.com/index.phtml.en) -- University of Wyoming - atmospheric vertical profiling data (http://weather.uwyo.edu/upperair/). +- OGIMET [(ogimet.com)](http://ogimet.com/index.phtml.en) - up-to-date collection of SYNOP dataset +- University of Wyoming - atmospheric vertical profiling data (http://weather.uwyo.edu/upperair/) - Polish Institute of Meteorology and Water Management - National Research Institute [(IMGW-PIB)](https://dane.imgw.pl/) -- National Oceanic & Atmospheric Administration - Earth System Research Laboratory - Global - Monitoring Division [(NOAA)](https://gml.noaa.gov/ccgg/trends/) -- National Oceanic & Atmospheric Administration - National Climatic Data Center - Integrated Surface Hourly (ISH) [(NOAA)](https://www.ncei.noaa.gov/pub/data/noaa/) +- National Oceanic & Atmospheric Administration - Earth System Research Laboratories - Global Monitoring Laboratory [(NOAA)](https://gml.noaa.gov/ccgg/trends/) +- National Centers for Environmental Information, National Oceanic & Atmospheric Administration - National Climatic Data Center - Integrated Surface Hourly (ISH) [(NOAA)](https://www.ncei.noaa.gov/pub/data/noaa/) ## Functions -The **climate** package consists of ten main functions - three for meteorological data, one for hydrological data and six auxiliary functions and datasets: - ### Meteorological data - -- **meteo_ogimet()** - Downloading hourly and daily meteorological data from the SYNOP stations available in the ogimet.com collection. -Any meteorological (aka SYNOP) station working under the World Meteorological Organizaton (WMO) framework after year 2000 should be accessible. -- **meteo_imgw()** - Downloading hourly, daily, and monthly meteorological data from the SYNOP/CLIMATE/PRECIP stations available in the dane.imgw.pl collection. -It is a wrapper for `meteo_monthly()`, `meteo_daily()`, and `meteo_hourly()` +- **meteo_ogimet()** - Downloading hourly and daily meteorological data from the SYNOP stations +available in the ogimet.com collection. Any meteorological (aka SYNOP) station working under the +World Meteorological Organization (WMO) framework after year 2000 should be accessible. +Two backends are available and selected automatically: raw **SYNOP decoding** (`source = "synop"`, +default for `interval = "hourly"`) and **HTML scraping** (`source = "html"`, default for +`interval = "daily"`). Country-level bulk downloads are supported via the `country_name` argument +(SYNOP backend only). + +- **meteo_imgw()** - Downloading hourly, daily, and monthly meteorological data from the +SYNOP/CLIMATE/PRECIP stations available in the danepubliczne.imgw.pl collection. +It is a wrapper for `meteo_monthly()`, `meteo_daily()`, `meteo_hourly()` and +`meteo_imgw_datastore()` which gives access from monthly to even 10-min datasets. + +- **meteo_noaa_hourly()** - Downloading hourly NCEI/NOAA Integrated Surface Hourly (ISH) +meteorological data - some stations have > 100 years of observations. + +- **meteo_noaa_co2()** - Downloading monthly CO2 measurements from Mauna Loa Observatory. -- **meteo_noaa_hourly()** - Downloading hourly NOAA Integrated Surface Hourly (ISH) meteorological data - Some stations have > 100 years long history of observations +- **sounding_wyoming()** - Downloading measurements of the vertical profile of atmosphere +(aka rawinsonde data). -- **sounding_wyoming()** - Downloading measurements of the vertical profile of atmosphere (aka rawinsonde data) +- **parser()** - Decoding raw FM-12 SYNOP meteorological messages into structured R lists or +data frames. For a full walkthrough see the +[SYNOP Messages vignette](https://bczernecki.github.io/climate/articles/synop_parser.html). ### Hydrological data -- **hydro_imgw()** - Downloading hourly, daily, and monthly hydrological data from the SYNOP / CLIMATE / PRECIP stations available in the -danepubliczne.imgw.pl collection. -It is a wrapper for `hydro_monthly()`, and `hydro_daily()` +- **hydro_imgw()** - Downloading daily and monthly hydrological data from stations available in the +danepubliczne.imgw.pl collection. It is a wrapper for `hydro_monthly()` and `hydro_daily()`. + +- **hydro_imgw_datastore()** - Downloading hourly and sub-hourly hydrological data from the +IMGW-PIB hydro telemetry stations. ### Auxiliary functions and datasets - **stations_ogimet()** - Downloading information about all stations available in the selected country in the Ogimet repository -- **nearest_stations_ogimet()** - Downloading information about nearest stations to the selected point -available for the selected country in the Ogimet repository -- **imgw_meteo_stations** - Built-in metadata from the IMGW-PIB repository for meteorological stations, their geographical coordinates, and ID numbers -- **imgw_hydro_stations** - Built-in metadata from the IMGW-PIB repository for hydrological stations, their geographical coordinates, and ID numbers -- **imgw_meteo_abbrev** - Dictionary explaining variables available for meteorological stations (from the IMGW-PIB repository) -- **imgw_hydro_abbrev** - Dictionary explaining variables available for hydrological stations (from the IMGW-PIB repository) +- **nearest_stations_ogimet()** - Downloading information about nearest stations to the selected +point using the Ogimet repository +- **nearest_stations_noaa()** - Downloading information about nearest stations to the selected +point in the NOAA ISH meteorological repository +- **nearest_stations_imgw()** - List of nearby meteorological or hydrological IMGW-PIB stations +in Poland +- **imgw_meteo_stations** - Built-in metadata from the IMGW-PIB repository for meteorological +stations, their geographical coordinates, and ID numbers +- **imgw_hydro_stations** - Built-in metadata from the IMGW-PIB repository for hydrological +stations, their geographical coordinates, and ID numbers +- **stations_meteo_imgw_telemetry** - Downloading complete and up-to-date coordinates for +IMGW-PIB telemetry meteorological stations +- **stations_hydro_imgw_telemetry** - Downloading complete and up-to-date coordinates for +IMGW-PIB telemetry hydrological stations +- **imgw_meteo_abbrev** - Dictionary explaining variables available for meteorological stations +(from the IMGW-PIB repository) +- **imgw_hydro_abbrev** - Dictionary explaining variables available for hydrological stations +(from the IMGW-PIB repository) ## Examples @@ -73,7 +99,7 @@ Finding a 50 nearest meteorological stations for a given coordinates in a given ``` {r stations , eval=T, fig.width=7,fig.height=7, fig.fullwidth=TRUE} library(climate) -ns = nearest_stations_ogimet(country = c("United Kingdom", "France"), +ns = nearest_stations_ogimet(country = c("United Kingdom"), point = c(-3, 50), no_of_stations = 50, add_map = TRUE) @@ -111,9 +137,9 @@ df = readRDS(system.file("extdata/vignettes/svalbard_noaa.rds", package = "clima df = meteo_noaa_hourly(station = "010080-99999", year = 2016) # You can also download the same (but more granular) data with Ogimet.com (example for year 2016): -# df = meteo_ogimet(interval = "hourly", +# df = meteo_ogimet(interval = "hourly", # date = c("2016-01-01", "2016-12-31"), -# station = c("01008")) +# station = "01008") ``` ``` {r noaa-kable,eval=T} @@ -131,7 +157,7 @@ colnames(df2)[c(1, 3:4)] = c("PRESS", "TEMP", "DEWPT") # changing column names ``` ```{r sonda, eval=F, include=T} -profile_demo <- sounding_wyoming(wmo_id = 12120, +profile_demo = sounding_wyoming(wmo_id = 12120, yy = 2000, mm = 3, dd = 23, @@ -215,6 +241,46 @@ knitr::kable(head(h2), caption = "Exemplary data frame of hydrological preprocesssing.") ``` +### Example 7 +Downloading monthly CO2 measurements from Mauna Loa Observatory and plotting the Keeling Curve: + +```{r co2, eval=FALSE, include=TRUE} +library(climate) +library(ggplot2) + +co2 = meteo_noaa_co2() +co2$date = ISOdate(co2$yy, co2$mm, 1) + +ggplot(co2, aes(date, co2_avg)) + + geom_line() + + geom_smooth() + + theme_bw() + + labs(title = "Carbon Dioxide (CO2)", + subtitle = "Mauna Loa Observatory", + x = "", y = "ppm") +``` + +### Example 8 +Decoding raw SYNOP meteorological messages with **parser()**: + +```{r parser, eval=TRUE, include=TRUE} +library(climate) + +synop_code = "AAXX 01004 88889 12782 61506 10094 20047 30111 40197 53007 60001 81541" + +# Decode a single message — returns a named list +result = parser(synop_code) +result$air_temperature$value # 9.4°C +result$wind_speed$value # 6 kt +result$sea_level_pressure$value # 1019.7 hPa +``` + +```{r parser-df, eval=TRUE, include=TRUE} +# Return a tidy data frame with one row per message +df_parser = parser(synop_code, as_data_frame = TRUE) +knitr::kable(df_parser[, 1:8]) +``` + ## Acknowledgment Ogimet.com, University of Wyoming, and Institute of Meteorology and Water Management - National Research Institute (IMGW-PIB), National Oceanic & Atmospheric Administration (NOAA) - Earth System Research Laboratories - Global Monitoring Laboratory, Global Monitoring Division and Integrated Surface Hourly (NOAA ISH) are the sources of the data.