Analyses that use every available FLUXNET site are likely to be common, and at least the daily and hourly data are too large to work with in memory, so it might be prudent to make the next version of this package work primarily with duckdb. There are now examples in https://github.com/EcosystemEcologyLab/fluxnet-annual-2026 of how to ingest all resolutions of data and a manifest into a duckdb database, and of how to check for and apply updates.
I can imagine a workflow as follows:
- flux_listall()
- flux_download()
- flux_extract()
- flux_build_db(): constructs a duckdb database by ingesting available CSVs
- flux_update_db(): compares flux_discover_files() with manifest stored in database and upserts data.
- con <- flux_connect(): opens a connection to the duckdb database at a default location
- annual <- tbl(con, "annual")
- flux_qc(annual): returns a lazy tibble
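
To make the build/update/connect steps a bit more concrete, here is a rough sketch of how the manifest comparison and upsert could work with DBI, duckdb, and dplyr (with dbplyr installed for the lazy table). Everything in it is an assumption for illustration: the `sketch_*` helpers are hypothetical stand-ins for flux_discover_files() and flux_update_db(), the md5-based manifest, the `source_file` column, and the toy `site`/`year`/`gpp` columns are made up, and the linked repo may well do this differently.

```r
library(DBI)
library(duckdb)
library(dplyr)

# Hypothetical stand-in for flux_discover_files(): list CSVs with a checksum.
sketch_discover_files <- function(dir) {
  paths <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
  data.frame(
    path = paths,
    md5 = unname(tools::md5sum(paths)),
    stringsAsFactors = FALSE
  )
}

# Hypothetical stand-in for flux_update_db(): ingest files that are new or
# whose checksum differs from the manifest table, replacing their old rows.
sketch_update_db <- function(con, dir, table = "annual") {
  dbExecute(con, "CREATE TABLE IF NOT EXISTS manifest (path VARCHAR, md5 VARCHAR)")
  found <- sketch_discover_files(dir)
  known <- dbReadTable(con, "manifest")
  is_known <- paste(found$path, found$md5) %in% paste(known$path, known$md5)
  changed <- found[!is_known, ]

  for (i in seq_len(nrow(changed))) {
    path <- changed$path[i]
    rows <- read.csv(path)
    rows$source_file <- path  # assumed column, used to replace rows per file

    # "Upsert" as delete-then-insert, keyed on the source file.
    if (dbExistsTable(con, table)) {
      dbExecute(
        con,
        sprintf("DELETE FROM %s WHERE source_file = ?", table),
        params = list(path)
      )
    }
    dbWriteTable(con, table, rows, append = TRUE)

    dbExecute(con, "DELETE FROM manifest WHERE path = ?", params = list(path))
    dbWriteTable(con, "manifest", changed[i, ], append = TRUE)
  }
  invisible(nrow(changed))
}

# Toy data standing in for extracted FLUXNET CSVs.
csv_dir <- tempfile("fluxnet_csv_")
dir.create(csv_dir)
write.csv(
  data.frame(site = "A", year = 2020:2021, gpp = c(1.5, 2.5)),
  file.path(csv_dir, "A.csv"),
  row.names = FALSE
)

# What flux_connect() might do, here with a temporary database file.
con <- dbConnect(duckdb(), dbdir = tempfile(fileext = ".duckdb"))

n_first <- sketch_update_db(con, csv_dir)   # first run ingests A.csv
n_second <- sketch_update_db(con, csv_dir)  # second run finds nothing to do

# Lazy table: the filter runs in duckdb, only the result is collected.
annual <- tbl(con, "annual")
recent <- annual |>
  filter(year >= 2021) |>
  collect()

dbDisconnect(con, shutdown = TRUE)
```

Delete-then-insert per source file keeps the sketch independent of any primary key on the data tables; a real implementation might prefer a proper key and duckdb's own CSV reader so that large files never pass through R.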