vessel_satellite_radiance: duplicate TIME collision when two vessels operate simultaneously #289

## Problem

The `vessel_satellite_radiance_delayed_qc` (and `vessel_satellite_radiance_derived_product`) dataset config combines data from **two simultaneously-operating vessels** into a **single Zarr store** using `TIME` as the only append dimension:

```
paths:
  s3://imos-data/IMOS/SRS/OC/radiometer/VMQ9273_Solander      # RV Solander
  s3://imos-data/IMOS/SRS/OC/radiometer/VLHJ_Southern-Surveyor  # RV Southern Surveyor
```

When both vessels are at sea at the same time, their `TIME` values overlap. The current code path in `_write_ds` detects the overlap and calls `_handle_duplicate_regions`, which **overwrites** the existing data at those TIME positions with the new batch's data. This means whichever vessel was processed **last silently replaces** the other vessel's observations — **silent data loss**.

The `_find_duplicated_values` method detects this post-write and logs a WARNING, but:
1. The warning fires once per batch, so the log fills with near-identical lines that differ only in UUID
2. A misleading TODO comment claims this may be acceptable: `# Not necessarily an issue. For example, some SOOP dataset, same TIME, 2 different NetCDF files, 2 different vessel and location.`
3. It doesn't name the specific conflicting source files
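A warning that collapses the repeats and names the conflicting files might look like the sketch below. It assumes the colliding records can be gathered as `(time, platform_code, filename)` tuples. That input layout, the logger name, the function names and the example filenames are all hypothetical and not part of the codebase.

```python
import logging
from collections import defaultdict

logger = logging.getLogger("vessel_satellite_radiance")  # hypothetical logger name


def summarise_time_collisions(records):
    """Group duplicated TIME values by the sources that share them.

    `records` is an iterable of (time, platform_code, filename) tuples, a
    hypothetical stand-in for the store's TIME / platform_code / filename values.
    Returns {((platform_code, filename), ...): [time, ...]} for TIME values
    claimed by more than one source.
    """
    sources_by_time = defaultdict(set)
    for time, platform_code, filename in records:
        sources_by_time[time].add((platform_code, filename))

    conflicts = defaultdict(list)
    for time, sources in sources_by_time.items():
        if len(sources) > 1:
            # Same set of conflicting sources -> same key, so each set is reported once.
            conflicts[tuple(sorted(sources))].append(time)
    return dict(conflicts)


def log_time_collisions(records):
    """Emit one WARNING per conflicting set of files instead of one per batch."""
    conflicts = summarise_time_collisions(records)
    for sources, times in conflicts.items():
        logger.warning(
            "%d duplicated TIME value(s) (first %s) shared by: %s",
            len(times),
            min(times),
            ", ".join(f"{code} {name}" for code, name in sources),
        )
    return conflicts


records = [
    ("2010-01-01T00:10", "VMQ9273", "solander_example.nc"),
    ("2010-01-01T00:10", "VLHJ", "southern_surveyor_example.nc"),
    ("2010-01-01T00:20", "VMQ9273", "solander_example.nc"),
    ("2010-01-01T00:20", "VLHJ", "southern_surveyor_example.nc"),
    ("2010-01-01T00:30", "VLHJ", "southern_surveyor_example.nc"),
]
conflicts = log_time_collisions(records)
```

Here the two shared TIME values produce a single WARNING that names both files, rather than one line per batch.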

The config already tracks vessel identity via `platform_code` (global attribute → per-TIME variable, e.g. `VMQ9273` / `VLHJ`), so the information exists to detect and handle this.

## Impact

Any TIME step where both vessels were active retains data only from whichever vessel's batch was processed last. If a re-run processes the batches in a different order, the surviving vessel can flip, so the result is non-deterministic.
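The order dependence can be reproduced with a toy model in which the store is a dict keyed by TIME and a duplicate-region write replaces whatever is already there. The dict store and the sample values are illustrative assumptions. The real path writes Zarr regions through `_handle_duplicate_regions`.

```python
# Toy model (assumption): the store is a dict keyed by TIME, and writing an
# overlapping TIME simply replaces the existing record (last writer wins).
def write_batch(store, batch):
    """Append new TIME values and overwrite overlapping ones."""
    for time, record in batch.items():
        store[time] = record  # overlapping TIME: previous vessel's record is lost
    return store


solander = {
    "2010-01-01T00:00": ("VMQ9273", 0.12),
    "2010-01-01T00:10": ("VMQ9273", 0.15),
}
southern_surveyor = {
    "2010-01-01T00:10": ("VLHJ", 0.40),
    "2010-01-01T00:20": ("VLHJ", 0.42),
}

# Same inputs, two processing orders.
run_a = write_batch(write_batch({}, solander), southern_surveyor)
run_b = write_batch(write_batch({}, southern_surveyor), solander)

print(run_a["2010-01-01T00:10"])  # Southern Surveyor survives
print(run_b["2010-01-01T00:10"])  # Solander survives
```

Both runs end with three TIME values, and neither contains both vessels' observations at `00:10`.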

## Options

### Option 1 — Split into two separate dataset configs

Create:
- `vessel_satellite_radiance_VMQ9273_Solander_delayed_qc`
- `vessel_satellite_radiance_VLHJ_Southern-Surveyor_delayed_qc`

Each config points to one vessel's S3 path, producing a separate Zarr store. No TIME collisions possible.

**Pros:** Cleanest, minimal code changes, fully correct.
**Cons:** Two configs to maintain; downstream consumers query both stores.
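The proposed names are simply the vessel directory names taken from the S3 paths, so the split can be derived mechanically. A minimal sketch, assuming a plain `{dataset_name: [paths]}` mapping (the real config schema is not shown in this issue, and `split_by_vessel` is a hypothetical helper):

```python
def split_by_vessel(base_name, suffix, paths):
    """Map each vessel path to its own dataset name: <base>_<vessel dir>_<suffix>."""
    configs = {}
    for path in paths:
        vessel_dir = path.rstrip("/").rsplit("/", 1)[-1]  # e.g. "VMQ9273_Solander"
        configs[f"{base_name}_{vessel_dir}_{suffix}"] = [path]
    return configs


PATHS = [
    "s3://imos-data/IMOS/SRS/OC/radiometer/VMQ9273_Solander",
    "s3://imos-data/IMOS/SRS/OC/radiometer/VLHJ_Southern-Surveyor",
]

delayed_qc = split_by_vessel("vessel_satellite_radiance", "delayed_qc", PATHS)
for name, vessel_paths in delayed_qc.items():
    print(name, vessel_paths)
```

Calling it again with `"derived_product"` as the suffix would cover the second combined config.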

### Option 2 — Add `platform` as a fixed second dimension (recommended)

Restructure the Zarr store to have dims `(TIME, platform)` (or `(platform, TIME)`):
- `TIME` remains the **append dimension** — `_handle_duplicate_regions` works unchanged
- `platform` is a **fixed 2-value dim** (`["VMQ9273", "VLHJ"]`) — no append logic needed

Implementation in preprocessing:
1. Determine the vessel's `platform_code` from the file's global attributes
2. Expand the 1D `(TIME,)` variables to `(platform, TIME)` with `NaN` fill for the inactive platform
3. Set the `platform` coordinate accordingly
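Steps 1 and 2 can be sketched with numpy as a stand-in for the per-variable arrays. The fixed `PLATFORMS` order, the `expand_to_platform` helper and the sample values are assumptions for illustration, and the `(platform, TIME)` layout is used.

```python
import numpy as np

# Assumed fixed platform axis; the order must be identical for every batch.
PLATFORMS = ["VMQ9273", "VLHJ"]


def expand_to_platform(values, platform_code, platforms=PLATFORMS):
    """Turn a 1D (TIME,) float array into a 2D (platform, TIME) array.

    The row for `platform_code` holds `values`; every other row is NaN.
    """
    values = np.asarray(values, dtype="float64")
    if values.ndim != 1:
        raise ValueError(f"expected a 1D (TIME,) array, got shape {values.shape}")
    if platform_code not in platforms:
        raise ValueError(f"unknown platform_code {platform_code!r}")
    out = np.full((len(platforms), values.size), np.nan)
    out[platforms.index(platform_code)] = values
    return out


# Step 1: platform_code would come from the file's global attributes.
global_attrs = {"platform_code": "VLHJ"}  # illustrative values
radiance = expand_to_platform([0.40, 0.42], global_attrs["platform_code"])
```

In xarray the same reshaping is probably closer to `ds.expand_dims(platform=[code]).reindex(platform=PLATFORMS)`. A NaN fill only suits float variables, so string variables such as `filename` would need an explicit fill value.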

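The Zarr region-write logic (`region={TIME: slice(start, end)}`) still works because `platform` is a fixed dimension: `_write_ds` and `_handle_duplicate_regions` only iterate over the TIME dimension. One caveat: with xarray's `to_zarr(region=...)`, dimensions not listed in `region` are written over their full extent, so a batch carrying NaN for the inactive platform would overwrite that platform's existing values at the overlapping TIME positions. The write therefore also needs to restrict `platform` to the incoming vessel's index (or merge with the existing values first) for the no-data-loss guarantee to hold.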

**Pros:** Single store, all vessels co-located, no data loss, `platform` is a queryable coordinate.
**Cons:** Requires preprocessing changes to expand 1D files to 2D; schema update; migration of existing stores.
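How the NaN-filled row interacts with the region write matters. If, as in xarray's `to_zarr(region=...)`, a dimension missing from `region` is written over its full extent, a TIME-only region also writes the inactive platform's NaN row. A numpy array standing in for the Zarr array shows the difference. Array sizes and values are illustrative.

```python
import numpy as np

PLATFORMS = ["VMQ9273", "VLHJ"]

# Stand-in for the existing store: (platform, TIME) = 2 x 4, Solander written at every TIME.
store = np.full((2, 4), np.nan)
store[0] = [0.10, 0.11, 0.12, 0.13]

# Incoming Southern Surveyor batch overlapping TIME indices 1-2, already expanded
# to (platform, TIME) with NaN in the Solander row.
row = PLATFORMS.index("VLHJ")
batch = np.full((2, 2), np.nan)
batch[row] = [0.40, 0.41]
time_region = slice(1, 3)

# (a) Region restricts TIME only; platform is written in full.
naive = store.copy()
naive[:, time_region] = batch  # Solander values at TIME 1-2 become NaN

# (b) Region restricts TIME and platform to the incoming vessel's row.
safe = store.copy()
safe[row : row + 1, time_region] = batch[row : row + 1]
```

In xarray terms, option (b) presumably corresponds to adding `"platform": slice(row, row + 1)` to `region` and writing only that platform's slice of the batch.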

### Option 3 — Protective guard in `_write_ds` (short-term mitigation)

Detect cross-file TIME collisions at write time using the `filename` / `platform_code` variables already in the store. If `common_append_dim_values > 0` AND the `platform_code` at those positions in the store ≠ the incoming batch's `platform_code` → log a clear ERROR and **skip** (don't overwrite) instead of calling `_handle_duplicate_regions`.

This prevents silent data loss while a proper fix is designed.
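A minimal sketch of the guard, assuming each batch comes from a single vessel and that the store's `platform_code` and `filename` per TIME can be read into a `store_index` dict. `store_index`, `guard_duplicate_regions`, the logger name and the example filenames are hypothetical. The function only decides whether `_handle_duplicate_regions` may run.

```python
import logging

logger = logging.getLogger("vessel_satellite_radiance")  # hypothetical logger name


def guard_duplicate_regions(store_index, batch_times, batch_platform, batch_filename):
    """Return True if overlapping TIME values may be overwritten, False to skip.

    store_index: hypothetical {time: (platform_code, filename)} built from the
    store's existing platform_code / filename variables.
    """
    conflicts = {}
    for time in batch_times:
        existing = store_index.get(time)
        if existing is None:
            continue  # TIME not yet in the store: nothing to protect
        platform_code, filename = existing
        if platform_code != batch_platform:
            conflicts.setdefault((platform_code, filename), []).append(time)

    # One ERROR per conflicting source file, naming both sides of the collision.
    for (platform_code, filename), times in conflicts.items():
        logger.error(
            "TIME collision: %s (%s) overlaps %d value(s) of %s (%s), first at %s; "
            "skipping overwrite",
            batch_filename,
            batch_platform,
            len(times),
            filename,
            platform_code,
            min(times),
        )
    return not conflicts


store_index = {
    "2010-01-01T00:10": ("VMQ9273", "solander_example.nc"),
    "2010-01-01T00:20": ("VMQ9273", "solander_example.nc"),
}

# Different vessel at an existing TIME: blocked and logged.
cross_vessel = guard_duplicate_regions(
    store_index, ["2010-01-01T00:20", "2010-01-01T00:30"], "VLHJ", "southern_surveyor_example.nc"
)
# Same vessel (e.g. a reprocessed file): overwrite proceeds as today.
same_vessel = guard_duplicate_regions(
    store_index, ["2010-01-01T00:10"], "VMQ9273", "solander_reprocessed_example.nc"
)
```

Keying the check on `platform_code` rather than on any TIME overlap keeps legitimate same-vessel reprocessing working while blocking cross-vessel overwrites.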

## Recommendation

1. **Immediately (separate PR):** implement Option 3 as a protective guard to stop silent overwriting and log the conflicting files clearly
2. **Preferred fix:** implement Option 2 (add `platform` fixed dim) — stays as a single dataset, correct semantics
3. **Alternative:** Option 1 (split configs) if Option 2 proves too complex

cc @lbesnard
