The STOQS software is still being used for processing realtime short burst data (SBD) received during LRAUV deployments. This needs to be moved over to the auv-python project so that we can fully retire the STOQS lrauvNc4ToNetcdf.py code. Here is the plan that Claude & I devised:
Add realtime SBD shore.nc4 processing pipeline
Background
The legacy lrauvNc4ToNetcdf.py script processed two LRAUV data sources:
- Delayed mode: full-resolution .nc4 log files from missionlogs/. This path has been migrated to the auv-python pipeline (extract → combine → align → resample → products).
- Realtime: decimated shore.nc4 files telemetered via SBD from realtime/sbdlogs/. This path has not been ported.
Realtime data is valuable because it's available during a deployment, not only after vehicle recovery. This issue tracks implementing the realtime pipeline in auv-python.
What shore.nc4 files look like
- Location: realtime/sbdlogs/YYYY/YYYYMM/YYYYMMDDTHHMMSS/shore.nc4
- One file per SBD transmission; many small files span a single deployment
- Data is already binned/averaged at ~2 s intervals onboard the vehicle
- Variables include bin_mean_* and bin_median_* prefixes alongside raw values
- Primary time axis: depth_time (unlike full-resolution files, which use time_time)
- Legacy processing method: processNc4FileDecimated() in lrauvNc4ToNetcdf.py
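The directory layout above can drive file discovery for a deployment window. The following is a minimal sketch using only the standard library; the function names (sbd_relative_path, transmission_time, in_window) are hypothetical and not part of auv-python, and it assumes the YYYYMMDDTHHMMSS directory name is the transmission timestamp.

```python
from datetime import datetime
from pathlib import PurePosixPath

# Directory name format taken from the documented layout
STAMP_FORMAT = "%Y%m%dT%H%M%S"


def sbd_relative_path(stamp: datetime) -> PurePosixPath:
    """Build the relative shore.nc4 path for one transmission timestamp."""
    return PurePosixPath(
        "realtime/sbdlogs",
        stamp.strftime("%Y"),
        stamp.strftime("%Y%m"),
        stamp.strftime(STAMP_FORMAT),
        "shore.nc4",
    )


def transmission_time(path: str) -> datetime:
    """Parse the YYYYMMDDTHHMMSS directory that contains shore.nc4."""
    return datetime.strptime(PurePosixPath(path).parent.name, STAMP_FORMAT)


def in_window(paths, start: datetime, end: datetime):
    """Return paths whose directory timestamp lies in [start, end], oldest first."""
    selected = [p for p in paths if start <= transmission_time(p) <= end]
    return sorted(selected, key=transmission_time)
```

Sorting by the parsed timestamp rather than by string keeps the ordering correct even if the candidate paths come from different roots or mirrors.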
Proposed approach: lean standalone pipeline
Because shore.nc4 files are already decimated, the full 5-stage delayed-mode pipeline (extract → combine → align → resample → products) is unnecessary. A 3-stage pipeline is appropriate:
shore.nc4 files (many per deployment, in realtime/sbdlogs/)
↓ sbd2netcdf.py
1. Discover shore.nc4 files within deployment date window
2. Download each via pooch (same pattern as nc42netcdfs.py)
3. Open with xarray, read groups per SBD_PARMS
4. Rename variables: <group>_<variable> convention (see below)
5. Concatenate over time → single xr.Dataset
6. Resample to 1S grid
7. Write _sbd_1S.nc with CF metadata
↓ create_products.py (existing, with minor extension for bin_mean_* variables)
↓ archive.py (extend to handle realtime/ paths)
Variable naming convention
Variable names from shore.nc4 are preserved as-is from the source, prefixed with the lowercased group name — the same <group>_<variable> convention used for delayed-mode data. The _ (root/backseat) group maps to the prefix backseat.
| Group | Source variable | Output variable |
|---|---|---|
| CTD_Seabird | bin_mean_sea_water_temperature | ctdseabird_bin_mean_sea_water_temperature |
| CTD_Seabird | bin_median_sea_water_salinity | ctdseabird_bin_median_sea_water_salinity |
| WetLabsBB2FL | bin_mean_mass_concentration_of_chlorophyll_in_sea_water | wetlabsbb2fl_bin_mean_mass_concentration_of_chlorophyll_in_sea_water |
| _ (backseat) | planktivore_diatoms | backseat_planktivore_diatoms |
| / (root coords) | depth, latitude, longitude | unchanged (coordinates) |
| / (root) | platform_pitch_angle | universals_platform_pitch_angle |
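The mapping in the table can be expressed as one small function. This is a sketch only: output_name is a hypothetical helper, and the rule that underscores are dropped from group names (CTD_Seabird → ctdseabird) is inferred from the table rows rather than confirmed against the delayed-mode code.

```python
# Root-level variables that stay unprefixed because they act as coordinates
ROOT_COORDINATES = {"depth", "latitude", "longitude"}


def output_name(group: str, variable: str) -> str:
    """Map a (group, source variable) pair to its output variable name."""
    if group == "/":
        # Root coordinates are unchanged; other root variables get 'universals_'
        if variable in ROOT_COORDINATES:
            return variable
        return f"universals_{variable}"
    if group == "_":
        # The '_' group holds backseat variables
        return f"backseat_{variable}"
    # Assumption inferred from the table: drop underscores, then lowercase
    prefix = group.replace("_", "").lower()
    return f"{prefix}_{variable}"


print(output_name("CTD_Seabird", "bin_mean_sea_water_temperature"))
```

Keeping the rule in one place makes it easy to adjust once the actual group names in a shore.nc4 file have been inspected.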
New files
- src/data/process_lrauv_sbd.py: entry point; accepts --auv_name, --start, --end, --clobber, -v. Mirrors process_lrauv.py in structure.
- src/data/sbd2netcdf.py: core module; SbdExtract class handles discovery, download, concatenation, and resampling.
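The command-line surface of the entry point might look like the sketch below. Only the flag names come from this issue; the YYYYMMDD date parsing, the required/optional choices, and treating -v as a repeatable count are assumptions, and build_parser is a hypothetical name that may not match how process_lrauv.py is organized.

```python
import argparse
from datetime import datetime


def parse_day(text: str) -> datetime:
    """Parse a YYYYMMDD string; argparse reports a ValueError as a usage error."""
    return datetime.strptime(text, "%Y%m%d")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Process realtime SBD shore.nc4 files for one LRAUV deployment"
    )
    parser.add_argument("--auv_name", required=True, help="vehicle name, e.g. ahi")
    parser.add_argument("--start", required=True, type=parse_day, help="first day, YYYYMMDD")
    parser.add_argument("--end", required=True, type=parse_day, help="last day, YYYYMMDD")
    parser.add_argument("--clobber", action="store_true", help="overwrite existing output")
    # Assumption: each -v raises the verbosity level by one
    parser.add_argument("-v", dest="verbose", action="count", default=0)
    return parser


args = build_parser().parse_args(
    ["--auv_name", "ahi", "--start", "20260406", "--end", "20260412", "-v"]
)
print(args.auv_name, args.start.date(), args.end.date(), args.verbose)
```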
Files to extend
- src/data/create_products.py: add a helper that strips bin_mean_ / bin_median_ prefixes when resolving variable → colormap/label/column mappings, so existing lookup tables cover all bin_* variants automatically.
- src/data/archive.py: add realtime/sbdlogs/ path handling alongside existing missionlogs/ paths.
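One possible shape for the prefix-stripping helper is sketched here. The names lookup_key, colormap_for, and the COLORMAPS table with its entries are hypothetical stand-ins for whatever lookup tables create_products.py actually uses. Because output names carry the group prefix first, the sketch removes the bin_mean_ / bin_median_ marker wherever it follows the start of the name or an underscore.

```python
import re

# Matches 'bin_mean_' or 'bin_median_' at the start of a name or after an underscore
_BIN_MARKER = re.compile(r"(^|_)bin_(?:mean|median)_")

# Hypothetical lookup table keyed by delayed-mode variable names
COLORMAPS = {
    "ctdseabird_sea_water_temperature": "thermal",
    "ctdseabird_sea_water_salinity": "haline",
}


def lookup_key(name: str) -> str:
    """Drop the first bin_mean_/bin_median_ marker so bin_* variants share one key."""
    return _BIN_MARKER.sub(r"\1", name, count=1)


def colormap_for(name: str, default: str = "viridis") -> str:
    """Resolve a colormap for a variable, falling back to a default."""
    return COLORMAPS.get(lookup_key(name), default)


print(colormap_for("ctdseabird_bin_mean_sea_water_temperature"))
```

Normalizing the key at lookup time means the tables themselves need no new entries for the binned variants.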
Open questions
- Need to inspect an actual shore.nc4 file (xr.open_dataset(..., group='CTD_Seabird')) to confirm exact group names and variable names before writing SBD_PARMS.
- When both bin_mean_* and raw values exist for the same quantity, should both be written to the output, or only the binned version?
Acceptance criteria
- process_lrauv_sbd.py --auv_name ahi --start 20260406 --end 20260412 -v runs end-to-end on a deployment with realtime/sbdlogs/ data
- _sbd_1S.nc is CF-compliant and uses <group>_<variable> naming with backseat_ prefix for _-group variables
- create_products.py plots bin_mean_* variables with correct colormaps and column placement
- archive.py copies output files to the correct realtime/sbdlogs/ paths