
fluxnet-citations

A portable R script that produces BibTeX citations, a network-conditional acknowledgments file, and a review-flags file for a list of FLUXNET sites you have downloaded. It draws on the manifest from your FLUXNET shuttle download as its metadata source — not a live query to the shuttle.


Why citations matter for FLUXNET data

FLUXNET is not a single monolithic dataset. The measurements behind each tower site were collected by an individual site team — typically a principal investigator plus students, technicians, and collaborators — who installed the instruments, maintained them through seasons and storms, and spent years quality-controlling the records. A single FLUXNET site entry can represent a decade or more of on-the-ground work before any of its data are made available for synthesis.

The data hubs — AmeriFlux, ICOS, TERN, SAEON, JapanFlux, KoFlux, and others — coordinate distribution, standardize processing through the ONEFlux pipeline, and host the downloads. They are essential infrastructure. But they do not own the underlying measurements; those belong to the site teams. The hubs require user attribution as a condition of access, and this is not bureaucracy: it is how site teams demonstrate impact, justify continued funding, recruit students, and get credit for work that often spans careers.

The required attribution goes beyond citing the hub or the ONEFlux pipeline paper. It requires an individual site-level reference for every tower whose data appears in your analysis. A study using 50 FLUXNET sites needs 50 site-level citations in its reference list — one per tower — plus the relevant hub-level and synthesis references. Assembling those 50 citations by hand from the FLUXNET registry is tedious and error-prone. That is what this tool does for you.

Failing to attribute individual sites correctly is unfair to the researchers whose work you used, and it may breach the data use agreements you accepted when you downloaded the data. Both outcomes are avoidable.


What this tool does and doesn't do

Does:

  • Reads a manifest CSV from a completed FLUXNET shuttle download.
  • Produces a .bib file with one @misc BibTeX entry per site, formatted correctly for the site's hub family (AmeriFlux, ICOS, TERN, SAEON), plus mandated synthesis references appended at the end.
  • Produces a _acknowledgments.md file with network-conditional acknowledgment text ready to paste into a manuscript.
  • Produces a _review_flags.md file noting per-site issues that need human attention before the bibliography is finalised (parse errors, missing identifiers, preserved upstream typos, no-author ICOS entries).
  • Subsetting: the optional site_ids_csv parameter restricts citations to a named subset of the manifest. Any site ID in that list that is not present in the manifest triggers an explicit error, because you cannot cite what you have not downloaded.
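The subset check described above can be sketched in base R as follows. This illustrates the behaviour and is not the script's own code: check_site_subset() is a hypothetical name, and the manifest's site identifier column is assumed to be called site_id.

```r
# Hypothetical helper illustrating the subset check; not the script's own code.
# Assumes the manifest's site identifier column is named site_id.
check_site_subset <- function(site_ids, manifest) {
  site_ids <- unique(trimws(site_ids))
  missing <- setdiff(site_ids, manifest$site_id)
  if (length(missing) > 0) {
    # Refuse to continue: every requested site must have been downloaded.
    stop("Site IDs not found in manifest: ",
         paste(missing, collapse = ", "), call. = FALSE)
  }
  manifest[manifest$site_id %in% site_ids, , drop = FALSE]
}

manifest <- data.frame(site_id = c("US-Ha1", "DE-Tha", "AU-How"))
cited <- check_site_subset(c("US-Ha1", "AU-How"), manifest)
nrow(cited)  # 2
```

Failing loudly, rather than silently dropping unknown IDs, keeps the bibliography tied to what is actually in the manifest.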

Does not:

  • Download FLUXNET data. A completed download is a prerequisite.
  • Query flux_listall() or the shuttle at citation-generation time. Citations reflect the manifest's metadata at download time. This is intentional: pulling fresh metadata later could produce citations that don't match what was actually downloaded. Six months of upstream changes — new sites added, data re-versioned, author typos corrected, PI lists updated — could silently corrupt your bibliography.
  • Verify citations against the BIF (site information) files on disk.
  • Check author affiliations or resolve persistent identifiers (DOI, handle) against their registries.
  • Produce non-BibTeX output formats.
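To see why a frozen manifest matters, you can compare a saved manifest with a fresh listing. The sketch below is a hypothetical helper, manifest_drift(), and not part of this tool; it assumes both tables carry site_id and product_citation columns and returns every site whose citation string was added, removed, or changed.

```r
# Hypothetical helper: report sites whose citation metadata differs between a
# saved manifest and a later listing. Assumes site_id and product_citation columns.
manifest_drift <- function(saved, current) {
  cols <- c("site_id", "product_citation")
  merged <- merge(saved[, cols], current[, cols], by = "site_id",
                  all = TRUE, suffixes = c("_saved", "_current"))
  old <- merged$product_citation_saved
  new <- merged$product_citation_current
  # Drifted: missing from one table (or has no citation), or the text changed.
  changed <- is.na(old) | is.na(new) | old != new
  merged[changed, , drop = FALSE]
}

# Usage (commented out: requires a saved manifest and the fluxnet package):
# saved   <- read.csv("fluxnet_shuttle_snapshot_20260601T120000.csv")
# current <- fluxnet::flux_listall()
# manifest_drift(saved, current)
```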

Installation

R 4.0 or later (R 4.6 recommended).

Three packages from CRAN:

```r
install.packages(c("dplyr", "readr", "stringr"))
```

No other R dependencies. No Python. No environment variables.
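If you are unsure whether the three packages are already available, a quick base-R check such as the following lists any that still need installing. This is a sketch; missing_packages() is a hypothetical helper.

```r
# Hypothetical helper: return the packages in pkgs that cannot be loaded.
missing_packages <- function(pkgs = c("dplyr", "readr", "stringr")) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages()
if (length(to_install) > 0) {
  message("Missing: ", paste(to_install, collapse = ", "),
          ". Install with install.packages().")
}
```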

To also run the shuttle download workflow that produces the manifest in the first place, install Eric Scott's fluxnet R package:

```r
install.packages("pak")
pak::pak("EcosystemEcologyLab/fluxnet-package")
```

The example/ directory in this repo contains a frozen 10-site manifest so you can run the tool and inspect the outputs without installing the fluxnet package or downloading any data.


Quick start (example dataset)

```r
# From the fluxnet-citations/ directory:
setwd("example")

source("../generate_fluxnet_citations.R")

generate_fluxnet_citations(
  site_ids_csv  = "example_sites.csv",
  manifest_path = "example_manifest.csv",
  output_prefix = "output/example"
)
```

Three files are written to example/output/:

  • output/example.bib
  • output/example_acknowledgments.md
  • output/example_review_flags.md

See example/README.md for a full description of the expected outputs.
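Because every output shares the output_prefix, a short check like this one confirms that all three files were written. It is a sketch with a hypothetical helper, expected_outputs(); run it from the example/ directory after the call above.

```r
# Hypothetical helper: the three paths written for a given output_prefix.
expected_outputs <- function(output_prefix) {
  paste0(output_prefix, c(".bib", "_acknowledgments.md", "_review_flags.md"))
}

# Run from example/ after generate_fluxnet_citations() has finished.
outputs <- expected_outputs("output/example")
setNames(file.exists(outputs), basename(outputs))
```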


Real use

Step 1 — Download FLUXNET data

Download your sites using the shuttle. The recommended path is through the fluxnet R package, which handles authentication and batch management:

```r
library(fluxnet)

# Always save the manifest before downloading; the timestamped name matches
# the fluxnet_shuttle_snapshot_<YYYYMMDDTHHMMSS>.csv convention.
manifest <- flux_listall()
readr::write_csv(manifest, "fluxnet_shuttle_snapshot_20260601T120000.csv")

flux_download(
  file_list_df  = manifest,
  download_dir  = "data/raw"
)
```

Save flux_listall() output to a CSV file before calling flux_download(). That saved CSV is your manifest. The manifest freezes the metadata state — author names, product IDs, citation strings — as they existed when you downloaded the data. Without it, you have no durable record of that state. flux_download() itself does not save a manifest; if you skip this step, you cannot recover it exactly later.

Step 2 — Identify your manifest

The manifest is the CSV you saved in step 1. If you follow the naming convention used by write_snapshot() and the Python shuttle CLI, its filename follows the pattern fluxnet_shuttle_snapshot_<YYYYMMDDTHHMMSS>.csv. Keep it alongside your data files or in a version-controlled location.
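If you keep several snapshots, the timestamp embedded in the filename can be recovered with a small helper like this sketch. snapshot_time() is hypothetical, and reading the timestamp as local time is an assumption that matches names built with format(Sys.time(), "%Y%m%dT%H%M%S").

```r
# Hypothetical helper: parse the timestamp from snapshot filenames of the form
# fluxnet_shuttle_snapshot_<YYYYMMDDTHHMMSS>.csv (interpreted as local time).
snapshot_time <- function(paths) {
  pattern <- "^fluxnet_shuttle_snapshot_([0-9]{8}T[0-9]{6})\\.csv$"
  files <- basename(paths)
  ok <- grepl(pattern, files)
  if (!all(ok)) {
    stop("Not snapshot filenames: ", paste(files[!ok], collapse = ", "),
         call. = FALSE)
  }
  as.POSIXct(sub(pattern, "\\1", files), format = "%Y%m%dT%H%M%S", tz = "")
}

snapshot_time("fluxnet_shuttle_snapshot_20260601T120000.csv")
```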

See Where is my manifest? below for details.

Step 3 — Prepare an optional site list (if citing a subset)

If your analysis used only a subset of your downloaded sites, create a one-column CSV with a site_id header:

```csv
site_id
US-Ha1
DE-Tha
AU-How
```

If you want citations for all sites in your manifest, skip this step.
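One way to write this file from R without extra packages is shown below. It is a sketch; row.names = FALSE and quote = FALSE keep the file to exactly a header plus one unquoted ID per line.

```r
# Write a one-column site list: a site_id header, then one ID per line.
sites <- data.frame(site_id = c("US-Ha1", "DE-Tha", "AU-How"))
write.csv(sites, "my_sites.csv", row.names = FALSE, quote = FALSE)
```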

Step 4 — Generate citations

```r
source("generate_fluxnet_citations.R")

generate_fluxnet_citations(
  site_ids_csv  = "my_sites.csv",          # omit to cite all manifest sites
  manifest_path = "fluxnet_shuttle_snapshot_20260601T120000.csv",
  output_prefix = "outputs/citations/fluxnet_2026"
)
```

Where is my manifest?

The manifest is the CSV produced by flux_listall() and saved before your download run.

Behavioral rule: Always save your flux_listall() output to a CSV file before calling flux_download(). That saved CSV is your manifest. The shuttle does not save one for you automatically — you must do it explicitly.

```r
manifest <- flux_listall()
readr::write_csv(manifest, paste0(
  "fluxnet_shuttle_snapshot_",
  format(Sys.time(), "%Y%m%dT%H%M%S"),
  ".csv"
))
```

If you used the paper-repo workflow (01_download.R): the manifest was saved automatically by write_snapshot() before each download run. It lives in data/snapshots/fluxnet_shuttle_snapshot_<timestamp>.csv in your repository clone. The most recent snapshot written before or during your download run is the correct input. There may be several timestamped snapshots in data/snapshots/ if you ran the download in multiple batches; use the one that matches the run in question.
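When data/snapshots/ holds several timestamped snapshots, the choice can be made mechanical. The sketch below uses a hypothetical helper, pick_snapshot(), that returns the latest snapshot written at or before the end of a given download run; it assumes the filename timestamps are in local time.

```r
# Hypothetical helper: the newest snapshot in snapshot_dir written at or
# before run_end (a POSIXct). Filename timestamps are read as local time.
pick_snapshot <- function(snapshot_dir, run_end) {
  pattern <- "^fluxnet_shuttle_snapshot_([0-9]{8}T[0-9]{6})\\.csv$"
  files <- list.files(snapshot_dir, pattern = pattern)
  stamps <- as.POSIXct(sub(pattern, "\\1", files),
                       format = "%Y%m%dT%H%M%S", tz = "")
  eligible <- !is.na(stamps) & stamps <= run_end
  if (!any(eligible)) {
    stop("No snapshot in ", snapshot_dir, " written at or before ",
         format(run_end), call. = FALSE)
  }
  file.path(snapshot_dir, files[eligible][which.max(stamps[eligible])])
}

# Example: a run that finished at 18:00 on 20 June 2026.
# pick_snapshot("data/snapshots", as.POSIXct("2026-06-20 18:00:00"))
```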

If you downloaded with the Python shuttle CLI directly: check whether you ran fluxnet-shuttle listall -o <dir> before downloading. If you did, the output is fluxnet_shuttle_snapshot_<timestamp>.csv in the directory you specified. If you did not save a manifest before downloading, there is no exact recovery path; your best option is to call flux_listall() now and save it, accepting that some metadata may have drifted since your download.


Output files

All three outputs share the same prefix you pass to output_prefix.

{prefix}.bib — BibTeX entries, one @misc per site. A %% NOTICE block at the top of the file explains the provenance. Sites are followed by mandated synthesis references (@article entries) that vary by the networks present in your site list.
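To sanity-check the .bib file, you can tally its entry types: the number of @misc entries should equal the number of sites you cite, with the @article synthesis references on top. This sketch uses a hypothetical helper, bib_entry_types(), and assumes each entry starts on its own line with @type{.

```r
# Hypothetical helper: count BibTeX entry types (misc, article, ...) in a .bib
# file. Assumes each entry begins on its own line with "@type{".
bib_entry_types <- function(bib_path) {
  lines <- readLines(bib_path, warn = FALSE)
  starts <- regmatches(lines,
                       regexpr("^[[:space:]]*@[A-Za-z]+[[:space:]]*\\{", lines))
  table(tolower(gsub("[@{[:space:]]", "", starts)))
}

# bib_entry_types("outputs/citations/fluxnet_2026.bib")
```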

{prefix}_acknowledgments.md — Network-conditional acknowledgment text. Each hub family (AmeriFlux, ICOS, TERN/OzFlux, ChinaFlux/KoFlux, SAEON) has its own paragraph, generated only when sites from that family are present. A data availability statement with [SOURCE] and [DATE] placeholders appears at the end; fill these in before submission.
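The placeholders can be filled by hand, or with a small helper such as this sketch. fill_placeholders() and has_placeholders() are hypothetical; they do a literal text substitution of [SOURCE] and [DATE] and overwrite the file in place.

```r
# Hypothetical helpers: literal substitution of the [SOURCE] and [DATE]
# placeholders (file overwritten in place), and a check for leftovers.
fill_placeholders <- function(path, source_text, date_text) {
  txt <- readLines(path, warn = FALSE)
  txt <- gsub("[SOURCE]", source_text, txt, fixed = TRUE)
  txt <- gsub("[DATE]", date_text, txt, fixed = TRUE)
  writeLines(txt, path)
  invisible(txt)
}

has_placeholders <- function(path) {
  txt <- readLines(path, warn = FALSE)
  any(grepl("[SOURCE]", txt, fixed = TRUE) | grepl("[DATE]", txt, fixed = TRUE))
}
```

Running has_placeholders() as a last step before submission catches a file that was never filled in.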

{prefix}_review_flags.md — Sites that need human review before the bibliography is finalised. Common flags:

  • ICOS/JPF/KOF sites with no author block in their citation string
  • Preserved upstream typos (e.g., AU-Cow's FLUXNEXT, US-Hsm's Kyle_Delwiche) — flagged and preserved verbatim, not silently corrected
  • Sites missing a product_id (entry generated, but no persistent identifier)
  • Parse errors (rare; indicate an unexpected citation format in the manifest)
  • A checklist of mandated references to verify before submission
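Sites without a product_id can also be spotted in the manifest before you run the tool. This sketch uses a hypothetical helper, sites_missing_product_id(); it assumes the manifest has site_id and product_id columns and treats NA or blank values as missing.

```r
# Hypothetical pre-check: site IDs whose product_id is NA or blank.
# Assumes the manifest has site_id and product_id columns.
sites_missing_product_id <- function(manifest) {
  pid <- trimws(as.character(manifest$product_id))
  manifest$site_id[is.na(pid) | pid == ""]
}

# manifest <- read.csv("fluxnet_shuttle_snapshot_20260601T120000.csv")
# sites_missing_product_id(manifest)
```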

Limitations and known issues

  • A manifest is required. Users who have not downloaded FLUXNET data, or who downloaded without saving a manifest, cannot use this tool. This is intentional: the manifest is the only durable record of what you actually downloaded and what the metadata said at that time.

  • Upstream typos are preserved, not corrected. AU-Cow's product citation contains "FLUXNEXT" (a typo for "FLUXNET") and US-Hsm's citation contains the author name "Kyle_Delwiche" (with a literal underscore). These are reproduced verbatim from the manifest and flagged in _review_flags.md. They reflect upstream data — correcting them silently would make your citations diverge from the registered record.

  • No automated verification. Citations are taken from the manifest's product_citation field without cross-checking against BIF files or the FLUXNET registry. The _review_flags.md output is not optional reading before submission — it exists specifically to flag cases that need a human check.

  • BibTeX only. No RIS, CSL, or plain-text output formats.
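If you want to locate the two known upstream typos yourself, a literal search over the manifest's product_citation strings is enough. This sketch uses a hypothetical helper, find_known_typos(), that only reports which sites contain each string; it deliberately changes nothing, consistent with the preserve-verbatim policy above.

```r
# Hypothetical helper: report which sites contain each known upstream typo.
# Uses a literal (fixed) search and never modifies the citation text.
find_known_typos <- function(site_id, citation,
                             typos = c("FLUXNEXT", "Kyle_Delwiche")) {
  hits <- lapply(setNames(typos, typos),
                 function(t) site_id[grepl(t, citation, fixed = TRUE)])
  hits[lengths(hits) > 0]
}

# manifest <- read.csv("fluxnet_shuttle_snapshot_20260601T120000.csv")
# find_known_typos(manifest$site_id, manifest$product_citation)
```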


Citing this tool and the underlying work

See CITATION.md for:

  • How to cite this tool (placeholder; to be updated when the FLUXNET 2026 annual paper publishes)
  • How to cite Eric Scott's fluxnet R package
  • The Pastorello et al. 2020 ONEFlux pipeline reference (placeholder; to be updated with the FLUXNET 2026 synthesis citation)
  • A note on why site-level citations are required and not optional

License

MIT. See LICENSE.


Contact and contributions

Issues and pull requests are welcome on GitHub: https://github.com/EcosystemEcologyLab/fluxnet-citations

For direct contact: David J. P. Moore — davidjpmoore@arizona.edu
