Add Phase 3: stereo image fidelity (inter-channel coherence)#19
Conversation
ViSQOL "audio" mode is effectively monaural — decoding the same AAC to stereo vs. mono yields a near-identical MOS, so it is blind to the stereo image. This biases the suite toward stereo collapse: forced Intensity Stereo discards the L/R relationship to bank bits for spectral fidelity ViSQOL rewards, and can out-score Mixed Mode on MOS while being worse for stereo material. Add a Phase 3 step that measures the property MOS cannot: a windowed inter-channel coherence error (time-aligned for codec delay; lower = truer stereo image) between reference and decoded output. - phase3_stereo.py: compute ic_err per stereo entry (skips mono speech), parallelized; needs only ffmpeg + numpy, no ViSQOL. - run_benchmark.py: run as Phase 3 after MOS, with --skip-stereo opt-out. - compare_results.py: surface "Stereo Image Δ" in the summary, a per- scenario "Stereo Δ" column, and a "Worst Stereo Drop" outlier. Reported, not gating: it surfaces stereo regressions without failing CI, since coherence error is a proxy (the gold standard remains a subjective MUSHRA/ABX listening test), not a perceptual ground truth. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Reviewer's GuideAdds Phase 3 stereo image fidelity measurement based on inter-channel coherence error, wires it into the benchmark pipeline, and surfaces new stereo metrics in the comparison report alongside MOS and throughput. Flow diagram for benchmark phases including new stereo Phase 3flowchart TD
A[run_benchmark main] --> B[Parse args]
B --> C[Phase 1: phase1_encode.py]
C --> D{skip_mos?}
D -->|yes| G{skip_stereo?}
D -->|no| E[Phase 2: phase2_mos.py via container]
E --> G{skip_stereo?}
G -->|yes| H[Benchmark complete]
G -->|no| F[Phase 3: phase3_stereo.py local]
F --> H[Benchmark complete]
Flow diagram for Phase 3 stereo coherence computationflowchart TD
A[phase3_stereo main] --> B[Load results_json and matrix]
B --> C[Filter pending non speech stereo entries]
C --> D{Any pending?}
D -->|no| I[Exit: No pending stereo computations]
D -->|yes| E[ProcessPoolExecutor over pending]
E --> F[compute_single for each key]
F --> G[decode_stereo ref and aac]
G --> H[coherence_error compute ic_err]
H --> E
E --> J[Collect ic_err results]
J --> K[Update matrix entries ic_err]
K --> L[Write updated results_json]
L --> M[Print Phase 3 complete]
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey - I've found 1 security issue, 2 other issues, and left some high level feedback:
Security issues:
- Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'. (link)
General comments:
- In
phase3_stereo.decode_stereo, consider usingcheck=Trueor at least loggingstderron ffmpeg failures so that decoding issues are visible instead of silently returningNone. - In
phase3_stereo.read_stereo, using a context manager (with wave.open(path, 'rb') as w:) would ensure the file handle is always closed even if an exception is raised while reading.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `phase3_stereo.decode_stereo`, consider using `check=True` or at least logging `stderr` on ffmpeg failures so that decoding issues are visible instead of silently returning `None`.
- In `phase3_stereo.read_stereo`, using a context manager (`with wave.open(path, 'rb') as w:`) would ensure the file handle is always closed even if an exception is raised while reading.
## Individual Comments
### Comment 1
<location path="phase3_stereo.py" line_range="70-79" />
<code_context>
+ return out if r.returncode == 0 else None
+
+
+def read_stereo(path):
+ w = wave.open(path, "rb")
+ ch = w.getnchannels()
+ raw = w.readframes(w.getnframes())
+ w.close()
+ a = np.frombuffer(raw, dtype=np.int16).astype(np.float64)
+ if ch >= 2:
+ a = a.reshape(-1, ch)
+ return a[:, 0], a[:, 1]
+ return a, a # mono source: both channels identical
+
+
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Use a context manager when opening wave files to avoid leaking file descriptors on errors.
If `wave.open` or `readframes` raises, `w.close()` is never called and the descriptor may be leaked. Using `with wave.open(path, "rb") as w:` ensures the handle is always closed and matches common Python style.
```suggestion
def read_stereo(path):
with wave.open(path, "rb") as w:
ch = w.getnchannels()
raw = w.readframes(w.getnframes())
a = np.frombuffer(raw, dtype=np.int16).astype(np.float64)
if ch >= 2:
a = a.reshape(-1, ch)
return a[:, 0], a[:, 1]
return a, a # mono source: both channels identical
```
</issue_to_address>
### Comment 2
<location path="phase3_stereo.py" line_range="114-119" />
<code_context>
+ m = min(len(rL), len(dL))
+ rL, rR, dL, dR = rL[:m], rR[:m], dL[:m], dR[:m]
+
+ errs = []
+ for s in range(0, m - FRAME, FRAME):
+ ec = abs(coherence(rL[s:s + FRAME], rR[s:s + FRAME])
+ - coherence(dL[s:s + FRAME], dR[s:s + FRAME]))
+ errs.append(ec)
+ return float(np.mean(errs)) if errs else None
+
+
</code_context>
<issue_to_address>
**issue:** Very short stereo clips never get an `ic_err` because frames shorter than `FRAME` are discarded.
When `m < FRAME`, the loop never executes, `errs` stays empty, and `coherence_error` returns `None` even for valid stereo, so these short items never get an `ic_err` and remain `pending`. Consider handling `m < FRAME` by computing a coherence error over the whole segment (e.g., shrinking the effective frame for the last/only window) or adding an explicit fallback path that still computes a single error value.
</issue_to_address>
### Comment 3
<location path="run_benchmark.py" line_range="277-282" />
<code_context>
subprocess.run([
sys.executable, phase3_script,
args.output,
os.path.join(script_dir, "output"),
os.path.join(script_dir, "data", "external"),
], check=True)
</code_context>
<issue_to_address>
**security (python.lang.security.audit.dangerous-subprocess-use-audit):** Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'.
*Source: opengrep*
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| def read_stereo(path): | ||
| w = wave.open(path, "rb") | ||
| ch = w.getnchannels() | ||
| raw = w.readframes(w.getnframes()) | ||
| w.close() | ||
| a = np.frombuffer(raw, dtype=np.int16).astype(np.float64) | ||
| if ch >= 2: | ||
| a = a.reshape(-1, ch) | ||
| return a[:, 0], a[:, 1] | ||
| return a, a # mono source: both channels identical |
There was a problem hiding this comment.
suggestion (bug_risk): Use a context manager when opening wave files to avoid leaking file descriptors on errors.
If wave.open or readframes raises, w.close() is never called and the descriptor may be leaked. Using with wave.open(path, "rb") as w: ensures the handle is always closed and matches common Python style.
| def read_stereo(path): | |
| w = wave.open(path, "rb") | |
| ch = w.getnchannels() | |
| raw = w.readframes(w.getnframes()) | |
| w.close() | |
| a = np.frombuffer(raw, dtype=np.int16).astype(np.float64) | |
| if ch >= 2: | |
| a = a.reshape(-1, ch) | |
| return a[:, 0], a[:, 1] | |
| return a, a # mono source: both channels identical | |
| def read_stereo(path): | |
| with wave.open(path, "rb") as w: | |
| ch = w.getnchannels() | |
| raw = w.readframes(w.getnframes()) | |
| a = np.frombuffer(raw, dtype=np.int16).astype(np.float64) | |
| if ch >= 2: | |
| a = a.reshape(-1, ch) | |
| return a[:, 0], a[:, 1] | |
| return a, a # mono source: both channels identical |
| errs = [] | ||
| for s in range(0, m - FRAME, FRAME): | ||
| ec = abs(coherence(rL[s:s + FRAME], rR[s:s + FRAME]) | ||
| - coherence(dL[s:s + FRAME], dR[s:s + FRAME])) | ||
| errs.append(ec) | ||
| return float(np.mean(errs)) if errs else None |
There was a problem hiding this comment.
issue: Very short stereo clips never get an ic_err because frames shorter than FRAME are discarded.
When m < FRAME, the loop never executes, errs stays empty, and coherence_error returns None even for valid stereo, so these short items never get an ic_err and remain pending. Consider handling m < FRAME by computing a coherence error over the whole segment (e.g., shrinking the effective frame for the last/only window) or adding an explicit fallback path that still computes a single error value.
| subprocess.run([ | ||
| sys.executable, phase3_script, | ||
| args.output, | ||
| os.path.join(script_dir, "output"), | ||
| os.path.join(script_dir, "data", "external"), | ||
| ], check=True) |
There was a problem hiding this comment.
security (python.lang.security.audit.dangerous-subprocess-use-audit): Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'.
Source: opengrep
Motivation
ViSQOL
audiomode (Phase 2) is effectively monaural — decoding the same AAC to stereo vs. mono yields a near-identical MOS (measured: 2.9248 ≡ 2.9241). It scores per-frame spectral fidelity and is blind to the stereo image.This biases the suite toward stereo collapse: forced Intensity Stereo (
--joint 2) discards the L/R relationship to bank bits for spectral fidelity ViSQOL rewards, so it can out-score Mixed Mode (--joint 3) on MOS while being worse for stereo material. Every reward the suite gives on music currently pushes encoders toward mono collapse.What this adds
A Phase 3 — Stereo Image Fidelity step that measures the property MOS cannot: how faithfully the inter-channel relationship is reconstructed.
phase3_stereo.py— computes a windowed (50 ms) inter-channel coherence error between reference and decoded output, time-aligned for codec delay via cross-correlation. Lower = truer stereo image. Runs only on stereo scenarios (skips mono speech), parallelized; needs onlyffmpeg+numpy, no ViSQOL. Writesic_errper matrix entry.run_benchmark.py— wired in as Phase 3 (runs locally after MOS), with a--skip-stereoopt-out.compare_results.py— surfaces a Stereo Image Δ summary line (sign matches MOS: positive = candidate truer stereo), a per-scenario Stereo Δ column, and a Worst Stereo Drop outlier.Validation
On
music_low(49 stereo files @ 64 kbps), forced-IS vs. Mixed Mode:The files where Mixed Mode "loses" most on monaural ViSQOL are exactly where it wins most on coherence — confirming the new metric captures a real property MOS misses.
Scope / caveats
Worst Stereo Dropthreshold later if desired.--skip-moscurrently also skips Phase 3 (early return); the full-benchmark path is unaffected.phase2_mos.pychanges are intentionally excluded from this PR.🤖 Generated with Claude Code
Summary by Sourcery
Add a new stereo image fidelity analysis phase to the audio benchmark and surface its results alongside existing MOS metrics.
New Features:
Enhancements: