Add Phase 3: stereo image fidelity (inter-channel coherence) #19

Merged: nschimme merged 2 commits into master from phase3-stereo-image-fidelity on Jun 13, 2026
@nschimme nschimme commented Jun 13, 2026

Motivation

ViSQOL audio mode (Phase 2) is effectively monaural: decoding the same AAC to stereo vs. mono yields a near-identical MOS (measured: 2.9248 ≈ 2.9241). It scores per-frame spectral fidelity and is blind to the stereo image.

This biases the suite toward stereo collapse: forced Intensity Stereo (--joint 2) discards the L/R relationship to bank bits for spectral fidelity ViSQOL rewards, so it can out-score Mixed Mode (--joint 3) on MOS while being worse for stereo material. Every reward the suite gives on music currently pushes encoders toward mono collapse.

What this adds

A Phase 3 — Stereo Image Fidelity step that measures the property MOS cannot: how faithfully the inter-channel relationship is reconstructed.

  • phase3_stereo.py — computes a windowed (50 ms) inter-channel coherence error between reference and decoded output, time-aligned for codec delay via cross-correlation. Lower = truer stereo image. Runs only on stereo scenarios (skips mono speech), parallelized; needs only ffmpeg + numpy, no ViSQOL. Writes ic_err per matrix entry.
  • run_benchmark.py — wired in as Phase 3 (runs locally after MOS), with a --skip-stereo opt-out.
  • compare_results.py — surfaces a Stereo Image Δ summary line (sign matches MOS: positive = candidate truer stereo), a per-scenario Stereo Δ column, and a Worst Stereo Drop outlier.
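
The metric described above can be sketched roughly as follows. This is a minimal illustration, not the code in phase3_stereo.py: the PR does not show its `coherence()` definition, so the zero-lag normalized cross-correlation used here is an assumption, as is the 48 kHz sample rate used to turn 50 ms into a sample count.

```python
import numpy as np

SAMPLE_RATE = 48_000               # assumed; the PR does not state the rate
FRAME = int(0.050 * SAMPLE_RATE)   # 50 ms analysis window


def coherence(left, right, eps=1e-12):
    """Zero-lag normalized cross-correlation of one window, in [-1, 1].

    Assumption: one common definition of inter-channel coherence, chosen
    for illustration only.
    """
    num = float(np.dot(left, right))
    den = float(np.sqrt(np.dot(left, left) * np.dot(right, right)))
    return num / den if den > eps else 0.0   # silent window: treat as 0


def coherence_error(ref_l, ref_r, dec_l, dec_r, frame=FRAME):
    """Mean absolute per-window coherence difference (lower is better)."""
    n = min(len(ref_l), len(ref_r), len(dec_l), len(dec_r))
    errs = [
        abs(coherence(ref_l[s:s + frame], ref_r[s:s + frame])
            - coherence(dec_l[s:s + frame], dec_r[s:s + frame]))
        for s in range(0, n - frame + 1, frame)   # full windows only
    ]
    return float(np.mean(errs)) if errs else None
```

Under this definition, a decoder that collapses uncorrelated left/right channels to mono moves the per-window coherence from about 0 to 1, so the error approaches 1, while a bit-exact reconstruction scores 0.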

Validation

On music_low (49 stereo files @ 64 kbps), forced-IS vs. Mixed Mode:

| Metric | forced-IS | Mixed Mode | Result |
|---|---|---|---|
| Monaural ViSQOL MOS | 3.3473 | 3.3494 | parity (metric is blind to stereo) |
| Inter-channel coherence error | 0.1534 | 0.1499 | Mixed better |
| Per-file coherence win rate | 16/49 | 33/49 | Mixed ~2:1 |

The files where Mixed Mode "loses" most on monaural ViSQOL are exactly where it wins most on coherence — confirming the new metric captures a real property MOS misses.

Scope / caveats

  • Reported, not gating. It surfaces stereo regressions without failing CI, since coherence error is a proxy — the gold standard remains a subjective MUSHRA/ABX listening test. Easy to make gating on a Worst Stereo Drop threshold later if desired.
  • --skip-mos currently also skips Phase 3 (early return); the full-benchmark path is unaffected.
  • Unrelated local phase2_mos.py changes are intentionally excluded from this PR.

🤖 Generated with Claude Code

Summary by Sourcery

Add a new stereo image fidelity analysis phase to the audio benchmark and surface its results alongside existing MOS metrics.

New Features:

  • Introduce a Phase 3 stereo image fidelity computation that measures inter-channel coherence error for stereo samples using ffmpeg and numpy.
  • Add a --skip-stereo command-line option to skip the new stereo image analysis phase in the benchmark runner.
  • Expose stereo image deltas and worst-case stereo regressions in the comparison report, including per-scenario stereo metrics.

Enhancements:

  • Extend result aggregation to track and summarize stereo image coherence statistics across suites and scenarios.

ViSQOL "audio" mode is effectively monaural — decoding the same AAC to
stereo vs. mono yields a near-identical MOS, so it is blind to the stereo
image. This biases the suite toward stereo collapse: forced Intensity
Stereo discards the L/R relationship to bank bits for spectral fidelity
ViSQOL rewards, and can out-score Mixed Mode on MOS while being worse for
stereo material.

Add a Phase 3 step that measures the property MOS cannot: a windowed
inter-channel coherence error (time-aligned for codec delay; lower = truer
stereo image) between reference and decoded output.
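
The time alignment mentioned above could look something like the sketch below. It is an illustration under assumptions, not the PR's implementation: the function names, the FFT-based correlation, and the `max_lag` search bound of 4096 samples are all hypothetical choices.

```python
import numpy as np


def estimate_delay(ref, dec, max_lag=4096):
    """Lag in samples at which dec best matches ref (positive = dec is late).

    Uses FFT-based circular cross-correlation with zero padding, which is
    much faster than np.correlate on long signals.
    """
    n = min(len(ref), len(dec))
    if n < 2:
        raise ValueError("need at least two samples to estimate a delay")
    max_lag = min(max_lag, n - 1)
    a = dec[:n] - np.mean(dec[:n])
    b = ref[:n] - np.mean(ref[:n])
    size = 1 << (2 * n - 1).bit_length()          # pad so lags do not wrap
    corr = np.fft.irfft(np.fft.rfft(a, size) * np.conj(np.fft.rfft(b, size)), size)
    # corr[k] holds lag +k; corr[size - k] holds lag -k.
    lags = np.concatenate((np.arange(0, max_lag + 1), np.arange(-max_lag, 0)))
    vals = np.concatenate((corr[:max_lag + 1], corr[-max_lag:]))
    return int(lags[np.argmax(vals)])


def align(ref, dec, max_lag=4096):
    """Trim ref and dec so that they start at the same signal position."""
    lag = estimate_delay(ref, dec, max_lag)
    if lag > 0:
        dec = dec[lag:]
    elif lag < 0:
        ref = ref[-lag:]
    n = min(len(ref), len(dec))
    return ref[:n], dec[:n]
```

For stereo material, one reasonable approach is to estimate the lag once (for example on the left channel or on the L+R sum) and apply the same trim to both channels, so the alignment step itself cannot change the inter-channel relationship.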

- phase3_stereo.py: compute ic_err per stereo entry (skips mono speech),
  parallelized; needs only ffmpeg + numpy, no ViSQOL.
- run_benchmark.py: run as Phase 3 after MOS, with --skip-stereo opt-out.
- compare_results.py: surface "Stereo Image Δ" in the summary, a per-
  scenario "Stereo Δ" column, and a "Worst Stereo Drop" outlier.

Reported, not gating: it surfaces stereo regressions without failing CI,
since coherence error is a proxy (the gold standard remains a subjective
MUSHRA/ABX listening test), not a perceptual ground truth.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
sourcery-ai Bot commented Jun 13, 2026

Reviewer's Guide

Adds Phase 3 stereo image fidelity measurement based on inter-channel coherence error, wires it into the benchmark pipeline, and surfaces new stereo metrics in the comparison report alongside MOS and throughput.

Flow diagram for benchmark phases including new stereo Phase 3

flowchart TD
    A[run_benchmark main] --> B[Parse args]
    B --> C[Phase 1: phase1_encode.py]
    C --> D{skip_mos?}
    D -->|yes| G{skip_stereo?}
    D -->|no| E[Phase 2: phase2_mos.py via container]
    E --> G{skip_stereo?}
    G -->|yes| H[Benchmark complete]
    G -->|no| F[Phase 3: phase3_stereo.py local]
    F --> H[Benchmark complete]

Flow diagram for Phase 3 stereo coherence computation

flowchart TD
    A[phase3_stereo main] --> B[Load results_json and matrix]
    B --> C[Filter pending non speech stereo entries]
    C --> D{Any pending?}
    D -->|no| I[Exit: No pending stereo computations]
    D -->|yes| E[ProcessPoolExecutor over pending]
    E --> F[compute_single for each key]
    F --> G[decode_stereo ref and aac]
    G --> H[coherence_error compute ic_err]
    H --> E
    E --> J[Collect ic_err results]
    J --> K[Update matrix entries ic_err]
    K --> L[Write updated results_json]
    L --> M[Print Phase 3 complete]
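
The fan-out step in the diagram above can be sketched as below. The layout of the results matrix (a dict of entries with a `scenario` field) and the helper names are assumptions made for illustration; only `ProcessPoolExecutor`, `compute_single`, and the `ic_err` field come from the PR.

```python
from concurrent.futures import ProcessPoolExecutor


def pending_keys(matrix, stereo_scenarios):
    """Keys of stereo entries that do not yet carry an ic_err."""
    return [
        key for key, entry in matrix.items()
        if entry.get("scenario") in stereo_scenarios
        and entry.get("ic_err") is None
    ]


def score_pending(matrix, stereo_scenarios, compute_single,
                  executor_cls=ProcessPoolExecutor):
    """Compute ic_err for every pending entry; return how many were scored.

    compute_single(key) must return a float or None. With the default
    process pool it has to be a picklable, module-level function.
    """
    keys = pending_keys(matrix, stereo_scenarios)
    if not keys:
        return 0                       # nothing pending
    with executor_cls() as pool:
        results = list(pool.map(compute_single, keys))
    scored = 0
    for key, err in zip(keys, results):
        if err is not None:            # failed decodes stay pending
            matrix[key]["ic_err"] = err
            scored += 1
    return scored
```

Passing the executor class as a parameter is a design choice for this sketch: it lets a test substitute `concurrent.futures.ThreadPoolExecutor` and a lambda without needing picklable workers.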

File-Level Changes

Change: Introduce Phase 3 stereo image fidelity calculation using inter-channel coherence error for stereo scenarios.
Files: phase3_stereo.py
Details:
  • Add new phase3_stereo.py module implementing windowed inter-channel coherence error between reference and decoded audio, including ffmpeg-based decoding and codec-delay alignment via cross-correlation.
  • Use SCENARIOS config and existing AAC path resolution to select non-speech (stereo) samples and locate encoded files.
  • Parallelize per-sample coherence computation via ProcessPoolExecutor, skipping mono/speech scenarios and already-scored entries, and persist results by writing ic_err back into the main results JSON.

Change: Wire stereo image fidelity as an optional Phase 3 in the benchmark runner.
Files: run_benchmark.py
Details:
  • Extend CLI with --skip-stereo flag to optionally skip stereo image computation.
  • Locate the new phase3_stereo.py script and invoke it after Phase 2, passing paths to the results JSON, AAC output, and external reference audio directory.

Change: Report stereo image fidelity metrics in the comparison summary and per-scenario tables.
Files: compare_results.py
Details:
  • Track aggregate inter-channel coherence deltas and counts, plus worst stereo regression, when comparing base and candidate results.
  • Compute and emit high-level "Stereo Image Δ" and "Worst Stereo Drop" lines in the summary when enough data is present, with small thresholds to filter noise.
  • Extend the scenario performance aggregation to maintain per-scenario stereo coherence deltas and show them in a new "Stereo Δ" column in the table.
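
A rough sketch of the comparison arithmetic follows. It assumes the delta is computed as base `ic_err` minus candidate `ic_err`, which is inferred from the PR's stated sign convention (positive = candidate has the truer stereo image); the function names, the dict inputs, and the `noise_floor` value are hypothetical.

```python
def stereo_deltas(base, cand):
    """Per-key delta: base ic_err minus candidate ic_err.

    ic_err is lower-is-better, so a positive delta means the candidate
    reproduces the stereo image more faithfully than the base.
    """
    return {
        key: base[key] - cand[key]
        for key in base.keys() & cand.keys()
        if base[key] is not None and cand[key] is not None
    }


def summarize(deltas, noise_floor=1e-3):
    """Mean delta plus the worst regression, if it exceeds the noise floor."""
    if not deltas:
        return None
    mean_delta = sum(deltas.values()) / len(deltas)
    worst_key = min(deltas, key=deltas.get)      # most negative delta
    worst = deltas[worst_key]
    return {
        "stereo_image_delta": mean_delta,
        "worst_stereo_drop": (worst_key, worst) if worst < -noise_floor else None,
    }
```

If the metric were ever made gating, a check on `worst_stereo_drop` against a chosen threshold would be one natural place to do it.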

@sourcery-ai sourcery-ai Bot left a comment

Hey - I've found 1 security issue, 2 other issues, and left some high level feedback:

Security issues:

  • Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'. (link)

General comments:

  • In phase3_stereo.decode_stereo, consider using check=True or at least logging stderr on ffmpeg failures so that decoding issues are visible instead of silently returning None.
  • In phase3_stereo.read_stereo, using a context manager (with wave.open(path, 'rb') as w:) would ensure the file handle is always closed even if an exception is raised while reading.
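
One way to act on the first comment is sketched below. The body of `decode_stereo` is not shown in the review, so the ffmpeg arguments, the logger name, and the `run_logged` helper are assumptions; the point is only that a failed decode leaves a trace in the log before `None` is returned.

```python
import logging
import subprocess

log = logging.getLogger("phase3_stereo")   # hypothetical logger name


def run_logged(cmd):
    """Run cmd; on failure, log its stderr and return False."""
    try:
        r = subprocess.run(cmd, capture_output=True, text=True, errors="replace")
    except OSError as exc:                 # e.g. the binary is not installed
        log.warning("could not start %s: %s", cmd[0], exc)
        return False
    if r.returncode != 0:
        log.warning("%s exited with %d: %s", cmd[0], r.returncode, r.stderr.strip())
        return False
    return True


def decode_stereo(src, out, sample_rate=48000):
    """Decode src to a 16-bit stereo WAV at out; return out, or None on failure."""
    cmd = [
        "ffmpeg", "-y", "-loglevel", "error",
        "-i", src,
        "-ac", "2", "-ar", str(sample_rate),
        "-c:a", "pcm_s16le",
        out,
    ]
    return out if run_logged(cmd) else None
```

Logging rather than raising keeps one bad file from aborting a parallel run, while still making the failure visible.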
## Individual Comments

### Comment 1
<location path="phase3_stereo.py" line_range="70-79" />
<code_context>
+    return out if r.returncode == 0 else None
+
+
+def read_stereo(path):
+    w = wave.open(path, "rb")
+    ch = w.getnchannels()
+    raw = w.readframes(w.getnframes())
+    w.close()
+    a = np.frombuffer(raw, dtype=np.int16).astype(np.float64)
+    if ch >= 2:
+        a = a.reshape(-1, ch)
+        return a[:, 0], a[:, 1]
+    return a, a  # mono source: both channels identical
+
+
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Use a context manager when opening wave files to avoid leaking file descriptors on errors.

If `wave.open` or `readframes` raises, `w.close()` is never called and the descriptor may be leaked. Using `with wave.open(path, "rb") as w:` ensures the handle is always closed and matches common Python style.

```suggestion
def read_stereo(path):
    with wave.open(path, "rb") as w:
        ch = w.getnchannels()
        raw = w.readframes(w.getnframes())
    a = np.frombuffer(raw, dtype=np.int16).astype(np.float64)
    if ch >= 2:
        a = a.reshape(-1, ch)
        return a[:, 0], a[:, 1]
    return a, a  # mono source: both channels identical
```
</issue_to_address>

### Comment 2
<location path="phase3_stereo.py" line_range="114-119" />
<code_context>
+    m = min(len(rL), len(dL))
+    rL, rR, dL, dR = rL[:m], rR[:m], dL[:m], dR[:m]
+
+    errs = []
+    for s in range(0, m - FRAME, FRAME):
+        ec = abs(coherence(rL[s:s + FRAME], rR[s:s + FRAME])
+                 - coherence(dL[s:s + FRAME], dR[s:s + FRAME]))
+        errs.append(ec)
+    return float(np.mean(errs)) if errs else None
+
+
</code_context>
<issue_to_address>
**issue:** Very short stereo clips never get an `ic_err` because frames shorter than `FRAME` are discarded.

When `m <= FRAME`, `range(0, m - FRAME, FRAME)` is empty, so the loop never executes, `errs` stays empty, and `coherence_error` returns `None` even for valid stereo. These short items never get an `ic_err` and remain `pending`. Consider handling this case by computing a coherence error over the whole segment (e.g., shrinking the effective frame for the last/only window) or adding an explicit fallback path that still computes a single error value.
</issue_to_address>
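
A possible shape for that fallback is sketched here. `window_starts` is a hypothetical helper, and `coherence` is passed in as a parameter because its definition is not shown in the review. The sketch also uses `m - frame + 1` as the range bound, since the original `range(0, m - FRAME, FRAME)` skips the final full window whenever `m` is an exact multiple of `FRAME`.

```python
import numpy as np


def window_starts(n, frame):
    """Start offsets of the analysis windows for a clip of n samples.

    Full windows only, except that a clip shorter than one frame gets a
    single window covering the whole clip.
    """
    if n <= 0:
        return []
    if n < frame:
        return [0]
    return list(range(0, n - frame + 1, frame))


def coherence_error(rL, rR, dL, dR, frame, coherence):
    """Mean absolute per-window coherence difference, or None for empty input."""
    m = min(len(rL), len(dL))
    rL, rR, dL, dR = rL[:m], rR[:m], dL[:m], dR[:m]
    errs = [
        abs(coherence(rL[s:s + frame], rR[s:s + frame])
            - coherence(dL[s:s + frame], dR[s:s + frame]))
        for s in window_starts(m, frame)
    ]
    return float(np.mean(errs)) if errs else None
```

With this change only a genuinely empty clip returns `None`, so short stereo items no longer stay pending indefinitely.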

### Comment 3
<location path="run_benchmark.py" line_range="277-282" />
<code_context>
        subprocess.run([
            sys.executable, phase3_script,
            args.output,
            os.path.join(script_dir, "output"),
            os.path.join(script_dir, "data", "external"),
        ], check=True)
</code_context>
<issue_to_address>
**security (python.lang.security.audit.dangerous-subprocess-use-audit):** Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'.

*Source: opengrep*
</issue_to_address>
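
For context on this finding: when `subprocess.run` receives a list and `shell=True` is not set, each element is handed to the child process as one argument and no shell parses it. The small check below illustrates that behaviour; `echo_arg` is a hypothetical helper written for this example. Note also that the Python standard library function for quoting shell strings is `shlex.quote()`; there is no `shlex.escape()`, and quoting only matters when a single command string is passed to a shell.

```python
import subprocess
import sys


def echo_arg(value):
    """Pass value as one argv element to a child Python and return what it saw."""
    r = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", value],
        capture_output=True, text=True, check=True,
    )
    return r.stdout.rstrip("\n")


if __name__ == "__main__":
    # Shell metacharacters arrive literally; nothing after ';' is executed.
    print(echo_arg("results.json; echo pwned"))
```

An argument such as `args.output` can still point at an unexpected file path, so validating it may be worthwhile, but that is a separate concern from command injection.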


@nschimme nschimme merged commit 950c372 into master Jun 13, 2026
3 checks passed
@nschimme nschimme deleted the phase3-stereo-image-fidelity branch June 13, 2026 14:31