Currently flux_qc() flags overly gap-filled rows only in aggregated (daily and coarser) data. For the Annual Paper pipeline we need QC filtering at all temporal resolutions. Not sure how many sites have HR rather than HH files, but it's possible that some sites can't resolve eddies in the HH time window. I remember that Ray Leuning refused to integrate to HH, and there might be other sites with this logic today.
I had an AI read through several pages of QA/QC flag descriptions and it noticed a few things that I have been trying to think through.
Half-hourly / hourly data (HH/HR):
At HH/HR resolution the _QC flag is an integer (0–3):
0 = measured
1 = good quality gap-fill (MDS)
2 = medium quality gap-fill
3 = poor quality gap-fill
Would like flux_qc() to accept a threshold argument for HH/HR data (default: keep _QC <= 1) and flag or drop rows that don't meet it.
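A minimal base-R sketch of what the HH/HR behaviour could look like. The helper name qc_filter_hh(), its arguments, and the qc_pass output column are hypothetical and not part of the package; NEE_VUT_REF_QC is used only as an example QC column, and treating a missing flag as a failure is an assumption.

```r
# Hypothetical helper (not the package API): flag or drop HH/HR rows
# based on the integer _QC flag (0 = measured ... 3 = poor gap-fill).
qc_filter_hh <- function(data, qc_col = "NEE_VUT_REF_QC", threshold = 1L,
                         action = c("flag", "drop")) {
  action <- match.arg(action)
  stopifnot(qc_col %in% names(data))
  # Assumption: a missing QC value counts as failing the check.
  pass <- !is.na(data[[qc_col]]) & data[[qc_col]] <= threshold
  if (action == "drop") {
    return(data[pass, , drop = FALSE])
  }
  # Default: keep every row and record the result in a logical column.
  data$qc_pass <- pass
  data
}

# Toy half-hourly data with one row per QC level.
hh <- data.frame(
  TIMESTAMP_START = c("202001010000", "202001010030",
                      "202001010100", "202001010130"),
  NEE_VUT_REF = c(1.2, 0.8, -0.3, 2.1),
  NEE_VUT_REF_QC = c(0L, 1L, 2L, 3L)
)

flagged_hh <- qc_filter_hh(hh)                 # adds qc_pass, keeps all rows
kept_hh <- qc_filter_hh(hh, action = "drop")   # keeps only _QC <= 1
```

Flagging by default keeps the row count unchanged, so downstream code can decide whether to filter on qc_pass.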
Aggregated data (DD, WW, MM, YY):
At coarser resolutions the _QC flag is a fraction (0–1) representing the proportion of underlying HH/HR records that were measured or good-quality gap-fills. Ideally flux_qc() would take configurable per-resolution thresholds with sensible defaults, e.g.:
flux_qc(
  data,
  threshold_hh = 1,    # keep _QC <= 1 at HH/HR
  threshold_dd = 0.75, # keep _QC > 0.75 at daily
  threshold_ww = 0.75, # keep _QC > 0.75 at weekly
  threshold_mm = 0.75, # keep _QC > 0.75 at monthly
  threshold_yy = 0.75  # keep _QC > 0.75 at annual
)
The default of 0.75 is stricter than the FLUXNET published convention of 0.5 - I feel better with a higher default but I may walk it back.
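The two comparison directions (integer flag, lower is better; fractional flag, higher is better) could be handled along these lines. This is only a sketch: default_thresholds and qc_pass_by_resolution() are hypothetical names, the resolution codes are assumed to be passed in explicitly, and missing QC values are assumed to fail.

```r
# Hypothetical defaults mirroring the proposed arguments; HR reuses the HH value.
default_thresholds <- c(HH = 1, HR = 1, DD = 0.75, WW = 0.75, MM = 0.75, YY = 0.75)

# Hypothetical helper: return TRUE for records that meet the threshold
# for the given temporal resolution.
qc_pass_by_resolution <- function(qc, resolution, thresholds = default_thresholds) {
  stopifnot(resolution %in% names(thresholds))
  threshold <- thresholds[[resolution]]
  if (resolution %in% c("HH", "HR")) {
    # Integer flag: keep values at or below the threshold.
    pass <- qc <= threshold
  } else {
    # Fractional flag: keep values strictly above the threshold.
    pass <- qc > threshold
  }
  # Assumption: a missing QC value counts as failing.
  !is.na(pass) & pass
}

dd_qc <- c(1.00, 0.80, 0.75, 0.40, NA)
dd_pass <- qc_pass_by_resolution(dd_qc, "DD")
hh_pass <- qc_pass_by_resolution(c(0, 1, 2, 3), "HH")
```

Because the aggregated comparison is strict, a row with _QC exactly equal to 0.75 is not kept, which matches the comments in the proposed call. Switching to >= would be a one-character change if that turns out to be preferable.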
ERA-Interim fills:
Would also like a way to identify records where consolidated meteorological variables were ERA-Interim filled (_F_QC == 2) rather than MDS filled (_F_QC == 1). The default behaviour would be to flag these rather than drop them, giving the user the choice.
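A flag-only version might look something like the sketch below. The helper flag_era_fills() and the _IS_ERA_FILL output suffix are hypothetical; TA_F and SW_IN_F are used as example variables. It assumes HH/HR data, where _F_QC is an integer flag, and treats a missing flag as not ERA-filled.

```r
# Hypothetical helper: for every column ending in _F_QC, add a logical
# <variable>_IS_ERA_FILL column marking ERA-Interim fills (_F_QC == 2).
# No rows are dropped.
flag_era_fills <- function(data, suffix = "_F_QC") {
  qc_cols <- names(data)[endsWith(names(data), suffix)]
  for (qc_col in qc_cols) {
    # Strip the suffix to recover the variable name, e.g. "TA" from "TA_F_QC".
    var <- substr(qc_col, 1, nchar(qc_col) - nchar(suffix))
    # Assumption: a missing flag is treated as not ERA-filled.
    data[[paste0(var, "_IS_ERA_FILL")]] <-
      !is.na(data[[qc_col]]) & data[[qc_col]] == 2
  }
  data
}

# Toy meteorological data with mixed fill sources.
met <- data.frame(
  TA_F = c(12.1, 11.8, 10.9),
  TA_F_QC = c(0L, 1L, 2L),
  SW_IN_F = c(0, 15.2, 80.4),
  SW_IN_F_QC = c(2L, 2L, 0L)
)

met_flagged <- flag_era_fills(met)
```

Users who do want to drop these records can then subset on the new logical columns themselves.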
I plan to implement a stopgap version of this in the paper analysis repository in the meantime and will share it when it's done.