Add processing pipeline for realtime sbd data #180

@MBARIMike


The STOQS software is still being used to process realtime Short Burst Data (SBD) received during LRAUV deployments. This processing needs to move to the auv-python project so that we can fully retire the STOQS lrauvNc4ToNetcdf.py code. Here is the plan that Claude and I devised:

Add realtime SBD shore.nc4 processing pipeline

Background

The legacy lrauvNc4ToNetcdf.py script processed two LRAUV data sources:

  • Delayed mode — full-resolution .nc4 log files from missionlogs/. This path has been migrated to the auv-python pipeline (extract → combine → align → resample → products).
  • Realtime — decimated shore.nc4 files telemetered via SBD from realtime/sbdlogs/. This path has not been ported.

Realtime data is valuable because it's available during a deployment, not only after vehicle recovery. This issue tracks implementing the realtime pipeline in auv-python.

What shore.nc4 files look like

  • Location: realtime/sbdlogs/YYYY/YYYYMM/YYYYMMDDTHHMMSS/shore.nc4
  • One file per SBD transmission; many small files span a single deployment
  • Data is already binned/averaged at ~2 s intervals onboard the vehicle
  • Variables include bin_mean_* and bin_median_* prefixes alongside raw values
  • Primary time axis: depth_time (unlike full-resolution files which use time_time)
  • Legacy processing method: processNc4FileDecimated() in lrauvNc4ToNetcdf.py

Proposed approach: lean standalone pipeline

Because shore.nc4 files are already decimated, the full 5-stage delayed-mode pipeline (extract → combine → align → resample → products) is unnecessary. A 3-stage pipeline is appropriate:

shore.nc4 files  (many per deployment, in realtime/sbdlogs/)
  ↓ sbd2netcdf.py
     1. Discover shore.nc4 files within deployment date window
     2. Download each via pooch (same pattern as nc42netcdfs.py)
     3. Open with xarray, read groups per SBD_PARMS
     4. Rename variables: <group>_<variable> convention (see below)
     5. Concatenate over time → single xr.Dataset
     6. Resample to 1S grid
     7. Write _sbd_1S.nc with CF metadata
  ↓ create_products.py  (existing, with minor extension for bin_mean_* variables)
  ↓ archive.py  (extend to handle realtime/ paths)
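Step 1 (discovery) can be sketched from the directory layout alone, since each shore.nc4 sits in a directory named with its YYYYMMDDTHHMMSS timestamp. This is a minimal sketch, not the planned implementation: the function names are hypothetical, the example paths are made up, and treating --end as an inclusive day is an assumption that should be checked against how process_lrauv.py handles its date window.

```python
from datetime import datetime, timedelta
from pathlib import PurePosixPath


def sbd_log_time(path: str) -> datetime:
    """Parse the YYYYMMDDTHHMMSS directory name that contains a shore.nc4 file."""
    return datetime.strptime(PurePosixPath(path).parent.name, "%Y%m%dT%H%M%S")


def select_in_window(paths, start: str, end: str):
    """Return paths whose log time falls in [start, end], sorted by time.

    start and end are YYYYMMDD strings; the end day is treated as
    inclusive (an assumption, not confirmed by the issue).
    """
    t0 = datetime.strptime(start, "%Y%m%d")
    t1 = datetime.strptime(end, "%Y%m%d") + timedelta(days=1)
    return sorted(
        (p for p in paths if t0 <= sbd_log_time(p) < t1),
        key=sbd_log_time,
    )


# Hypothetical paths following the realtime/sbdlogs/ layout
candidates = [
    "realtime/sbdlogs/2026/202604/20260405T235959/shore.nc4",
    "realtime/sbdlogs/2026/202604/20260412T120000/shore.nc4",
    "realtime/sbdlogs/2026/202604/20260406T000000/shore.nc4",
    "realtime/sbdlogs/2026/202604/20260413T000000/shore.nc4",
]
selected = select_in_window(candidates, "20260406", "20260412")
```

Sorting by log time before concatenation keeps the later time-axis merge (step 5) monotonic, provided individual transmissions do not overlap.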

Variable naming convention

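Variable names from shore.nc4 are preserved as-is and prefixed with the group name, lowercased and with underscores removed (CTD_Seabird becomes ctdseabird). This is the same <group>_<variable> convention used for delayed-mode data. The _ (backseat) group maps to the prefix backseat. Root-level (/) coordinates keep their names, and other root-level variables take the prefix universals.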

| Group | Source variable | Output variable |
|---|---|---|
| CTD_Seabird | bin_mean_sea_water_temperature | ctdseabird_bin_mean_sea_water_temperature |
| CTD_Seabird | bin_median_sea_water_salinity | ctdseabird_bin_median_sea_water_salinity |
| WetLabsBB2FL | bin_mean_mass_concentration_of_chlorophyll_in_sea_water | wetlabsbb2fl_bin_mean_mass_concentration_of_chlorophyll_in_sea_water |
| _ (backseat) | planktivore_diatoms | backseat_planktivore_diatoms |
| / (root coords) | depth, latitude, longitude | unchanged (coordinates) |
| / (root) | platform_pitch_angle | universals_platform_pitch_angle |
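The mapping in the table above could be expressed as one small function. This is a sketch under assumptions: the function name output_name and the COORDINATES set are hypothetical, and the rule "lowercase the group name and drop underscores" is inferred from the example rows rather than taken from the delayed-mode code.

```python
# Root-level variables that stay unprefixed because they are coordinates
# (taken from the table; the real list may be longer).
COORDINATES = {"depth", "latitude", "longitude"}


def output_name(group: str, variable: str) -> str:
    """Map a (group, variable) pair from shore.nc4 to its output name."""
    if group == "/":
        # Root group: coordinates unchanged, everything else gets "universals_"
        return variable if variable in COORDINATES else f"universals_{variable}"
    if group == "_":
        # Backseat group
        prefix = "backseat"
    else:
        # e.g. "CTD_Seabird" -> "ctdseabird" (inferred from the table rows)
        prefix = group.replace("_", "").lower()
    return f"{prefix}_{variable}"


temperature = output_name("CTD_Seabird", "bin_mean_sea_water_temperature")
diatoms = output_name("_", "planktivore_diatoms")
pitch = output_name("/", "platform_pitch_angle")
```

Keeping the rule in one function means SBD_PARMS only needs to list group and source variable names, and the output names follow from them.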

New files

  • src/data/process_lrauv_sbd.py — entry point; accepts --auv_name, --start, --end, --clobber, -v. Mirrors process_lrauv.py in structure.
  • src/data/sbd2netcdf.py — core module; SbdExtract class handles discovery, download, concatenation, and resampling.
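A possible shape for the process_lrauv_sbd.py command line, using only the flags listed above. The help strings, defaults, required flags, and the choice to make -v a repeatable count are all assumptions; the real script should follow whatever process_lrauv.py already does.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the flags named in this issue (sketch only)."""
    parser = argparse.ArgumentParser(
        description="Process realtime SBD shore.nc4 files for an LRAUV deployment"
    )
    parser.add_argument("--auv_name", required=True, help="Vehicle name, e.g. ahi")
    parser.add_argument("--start", required=True, help="Start date, YYYYMMDD")
    parser.add_argument("--end", required=True, help="End date, YYYYMMDD")
    parser.add_argument(
        "--clobber", action="store_true", help="Overwrite existing output files"
    )
    # Assumption: verbosity as a repeatable flag (-v, -vv)
    parser.add_argument("-v", "--verbose", action="count", default=0)
    return parser


args = build_parser().parse_args(
    ["--auv_name", "ahi", "--start", "20260406", "--end", "20260412", "-v"]
)
```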

Files to extend

  • src/data/create_products.py — add a helper that strips bin_mean_ / bin_median_ prefixes when resolving variable → colormap/label/column mappings, so existing lookup tables cover all bin_* variants automatically.
  • src/data/archive.py — add realtime/sbdlogs/ path handling alongside existing missionlogs/ paths.
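The prefix-stripping helper for create_products.py might look like the following. The names base_variable, COLORMAPS, and lookup_colormap are hypothetical, and the single table entry is invented for illustration; the sketch only shows how one regular expression can fold both bin_* variants onto the existing lookup keys.

```python
import re

# Matches "bin_mean_" or "bin_median_" at the start of the name or
# directly after the group prefix.
_BIN_PREFIX = re.compile(r"(^|_)bin_(mean|median)_")


def base_variable(name: str) -> str:
    """Strip one bin_mean_/bin_median_ marker, keeping the group prefix."""
    return _BIN_PREFIX.sub(r"\1", name, count=1)


# Hypothetical lookup table keyed by delayed-mode variable names
COLORMAPS = {"ctdseabird_sea_water_temperature": "thermal"}


def lookup_colormap(name: str, default: str = "viridis") -> str:
    """Resolve a colormap for a variable, ignoring any bin_* marker."""
    return COLORMAPS.get(base_variable(name), default)


cmap = lookup_colormap("ctdseabird_bin_mean_sea_water_temperature")
```

Names without a bin_* marker pass through unchanged, so the same lookup path serves delayed-mode and realtime variables.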

Open questions

  1. Need to inspect an actual shore.nc4 file (xr.open_dataset(..., group='CTD_Seabird')) to confirm exact group names and variable names before writing SBD_PARMS.
  2. When both bin_mean_* and raw values exist for the same quantity, should both be written to the output, or only the binned version?

Acceptance criteria

  • process_lrauv_sbd.py --auv_name ahi --start 20260406 --end 20260412 -v runs end-to-end on a deployment with realtime/sbdlogs/ data
  • Output _sbd_1S.nc is CF-compliant and uses <group>_<variable> naming with backseat_ prefix for _-group variables
  • create_products.py plots bin_mean_* variables with correct colormaps and column placement
  • Plots compare sensibly against delayed-mode plots for the same deployment period
  • archive.py copies output files to the correct realtime/sbdlogs/ paths
