Add Phase 3: stereo image fidelity (inter-channel coherence) by nschimme · Pull Request #19 · nschimme/faac-benchmark

nschimme · 2026-06-13T13:37:19Z

Motivation

ViSQOL audio mode (Phase 2) is effectively monaural: decoding the same AAC to stereo vs. mono yields a near-identical MOS (measured: 2.9248 vs. 2.9241). It scores per-frame spectral fidelity and is blind to the stereo image.

This biases the suite toward stereo collapse: forced Intensity Stereo (--joint 2) discards the L/R relationship to bank bits for the spectral fidelity that ViSQOL rewards, so it can out-score Mixed Mode (--joint 3) on MOS while being worse for stereo material. Every reward the suite gives on music currently pushes encoders toward mono collapse.

What this adds

A Phase 3 (Stereo Image Fidelity) step that measures what MOS cannot: how faithfully the inter-channel relationship is reconstructed.

- phase3_stereo.py: computes a windowed (50 ms) inter-channel coherence error between reference and decoded output, time-aligned for codec delay via cross-correlation. Lower = truer stereo image. Runs only on stereo scenarios (skips mono speech), parallelized; needs only ffmpeg + numpy, no ViSQOL. Writes ic_err per matrix entry.
- run_benchmark.py: wired in as Phase 3 (runs locally after MOS), with a --skip-stereo opt-out.
- compare_results.py: surfaces a Stereo Image Δ summary line (sign matches MOS: positive = candidate truer stereo), a per-scenario Stereo Δ column, and a Worst Stereo Drop outlier.
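The metric can be sketched in Python roughly as follows. This is a minimal illustration, not the code in phase3_stereo.py: it assumes the per-window "coherence" is the zero-lag normalized cross-correlation of L and R, estimates codec delay from the peak of an FFT cross-correlation of the L+R downmixes, and uses a hypothetical `max_lag_ms` search bound. Each 50 ms window contributes |coherence(ref) - coherence(decoded)|, and the file score is their mean.

```python
import numpy as np

FRAME_MS = 50  # window length from the PR description


def estimate_delay(ref, dec, max_lag):
    """Samples by which `dec` trails `ref`, from an FFT cross-correlation peak."""
    max_lag = min(max_lag, len(ref) - 1, len(dec) - 1)
    n = len(ref) + len(dec) - 1
    nfft = 1 << (n - 1).bit_length()  # next power of two >= n, so no circular wrap
    xc = np.fft.irfft(np.fft.rfft(dec, nfft) * np.conj(np.fft.rfft(ref, nfft)), nfft)
    # xc[k] holds lag +k; lag -k wraps around to xc[nfft - k].
    lags = np.concatenate([np.arange(max_lag + 1), np.arange(-max_lag, 0)])
    vals = np.concatenate([xc[:max_lag + 1], xc[nfft - max_lag:]])
    return int(lags[np.argmax(vals)])


def align(ref_l, ref_r, dec_l, dec_r, max_lag):
    """Trim both signals so the decoded output lines up with the reference."""
    d = estimate_delay(ref_l + ref_r, dec_l + dec_r, max_lag)
    if d > 0:  # decoded output trails the reference (e.g. encoder priming)
        dec_l, dec_r = dec_l[d:], dec_r[d:]
    elif d < 0:
        ref_l, ref_r = ref_l[-d:], ref_r[-d:]
    m = min(len(ref_l), len(dec_l))
    return ref_l[:m], ref_r[:m], dec_l[:m], dec_r[:m]


def coherence(l, r, eps=1e-12):
    """Assumed definition: zero-lag normalized cross-correlation, in [-1, 1]."""
    return float(np.dot(l, r) / (np.sqrt(np.dot(l, l) * np.dot(r, r)) + eps))


def coherence_error(ref_l, ref_r, dec_l, dec_r, sr, max_lag_ms=100):
    """Mean per-window |coherence(ref) - coherence(dec)|; lower = truer image."""
    frame = int(sr * FRAME_MS / 1000)
    rl, rr, dl, dr = align(ref_l, ref_r, dec_l, dec_r, int(sr * max_lag_ms / 1000))
    errs = [abs(coherence(rl[s:s + frame], rr[s:s + frame])
                - coherence(dl[s:s + frame], dr[s:s + frame]))
            for s in range(0, len(rl) - frame + 1, frame)]
    return float(np.mean(errs)) if errs else None
```

A decode identical to the reference but shifted by the codec delay scores about 0, while collapsing both channels to the downmix drives decoded coherence toward 1 and raises the error.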

Validation

On music_low (49 stereo files @ 64 kbps), forced-IS vs. Mixed Mode:

Metric	forced-IS	Mixed Mode	Note
Monaural ViSQOL MOS	3.3473	3.3494	parity (metric is blind to stereo)
Inter-channel coherence error	0.1534	0.1499	Mixed better
Per-file coherence win rate	16/49	33/49	Mixed ~2:1

The files where Mixed Mode "loses" most on monaural ViSQOL are exactly where it wins most on coherence — confirming the new metric captures a real property MOS misses.
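The aggregate rows in the table reduce to a mean and a pairwise count over files scored under both configurations. A sketch, assuming per-file ic_err values are available as plain `{filename: ic_err}` dicts (a hypothetical layout) and that exact ties count for neither side:

```python
def summarize(a, b):
    """Compare two {filename: ic_err} maps over their shared files (lower is better)."""
    common = sorted(set(a) & set(b))
    if not common:
        raise ValueError("no files scored under both configurations")
    return {
        "n": len(common),
        "mean_a": sum(a[f] for f in common) / len(common),
        "mean_b": sum(b[f] for f in common) / len(common),
        "wins_a": sum(a[f] < b[f] for f in common),  # ties count for neither side
        "wins_b": sum(b[f] < a[f] for f in common),
    }
```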

Scope / caveats

Reported, not gating. It surfaces stereo regressions without failing CI, since coherence error is only a proxy; the gold standard remains a subjective MUSHRA/ABX listening test. It would be easy to make it gating later via a Worst Stereo Drop threshold, if desired.
--skip-mos currently also skips Phase 3 (early return); the full-benchmark path is unaffected.
Unrelated local phase2_mos.py changes are intentionally excluded from this PR.

🤖 Generated with Claude Code

Summary by Sourcery

Add a new stereo image fidelity analysis phase to the audio benchmark and surface its results alongside existing MOS metrics.

New Features:

Introduce a Phase 3 stereo image fidelity computation that measures inter-channel coherence error for stereo samples using ffmpeg and numpy.
Add a command-line option to run or skip the new stereo image analysis phase in the benchmark runner.
Expose stereo image deltas and worst-case stereo regressions in the comparison report, including per-scenario stereo metrics.

Enhancements:

Extend result aggregation to track and summarize stereo image coherence statistics across suites and scenarios.

ViSQOL "audio" mode is effectively monaural: decoding the same AAC to stereo vs. mono yields a near-identical MOS, so it is blind to the stereo image. This biases the suite toward stereo collapse: forced Intensity Stereo discards the L/R relationship to bank bits for the spectral fidelity that ViSQOL rewards, and can out-score Mixed Mode on MOS while being worse for stereo material.

Add a Phase 3 step that measures the property MOS cannot: a windowed inter-channel coherence error (time-aligned for codec delay; lower = truer stereo image) between reference and decoded output.

- phase3_stereo.py: compute ic_err per stereo entry (skips mono speech), parallelized; needs only ffmpeg + numpy, no ViSQOL.
- run_benchmark.py: run as Phase 3 after MOS, with --skip-stereo opt-out.
- compare_results.py: surface "Stereo Image Δ" in the summary, a per-scenario "Stereo Δ" column, and a "Worst Stereo Drop" outlier.

Reported, not gating: it surfaces stereo regressions without failing CI, since coherence error is a proxy (the gold standard remains a subjective MUSHRA/ABX listening test), not a perceptual ground truth.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

sourcery-ai · 2026-06-13T13:37:25Z

Reviewer's Guide

Adds Phase 3 stereo image fidelity measurement based on inter-channel coherence error, wires it into the benchmark pipeline, and surfaces new stereo metrics in the comparison report alongside MOS and throughput.

Flow diagram for benchmark phases including new stereo Phase 3

flowchart TD
    A[run_benchmark main] --> B[Parse args]
    B --> C[Phase 1: phase1_encode.py]
    C --> D{skip_mos?}
    D -->|yes| G{skip_stereo?}
    D -->|no| E[Phase 2: phase2_mos.py via container]
    E --> G{skip_stereo?}
    G -->|yes| H[Benchmark complete]
    G -->|no| F[Phase 3: phase3_stereo.py local]
    F --> H[Benchmark complete]
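The dispatch drawn in the diagram above, with the two skip flags checked independently, could look like the sketch below. It is illustrative only: `plan_phases` is a hypothetical helper, and per the PR description the merged runner currently returns early on --skip-mos, so in the actual code that flag also skips Phase 3.

```python
import argparse


def plan_phases(argv):
    """Return the phase scripts to run, in order, for the given CLI arguments."""
    p = argparse.ArgumentParser()
    p.add_argument("--skip-mos", action="store_true")
    p.add_argument("--skip-stereo", action="store_true")
    args = p.parse_args(argv)
    phases = ["phase1_encode.py"]
    if not args.skip_mos:
        phases.append("phase2_mos.py")  # runs in the ViSQOL container
    if not args.skip_stereo:
        phases.append("phase3_stereo.py")  # runs locally: ffmpeg + numpy only
    return phases
```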

Flow diagram for Phase 3 stereo coherence computation

flowchart TD
    A[phase3_stereo main] --> B[Load results_json and matrix]
    B --> C[Filter pending non speech stereo entries]
    C --> D{Any pending?}
    D -->|no| I[Exit: No pending stereo computations]
    D -->|yes| E[ProcessPoolExecutor over pending]
    E --> F[compute_single for each key]
    F --> G[decode_stereo ref and aac]
    G --> H[coherence_error compute ic_err]
    H --> E
    E --> J[Collect ic_err results]
    J --> K[Update matrix entries ic_err]
    K --> L[Write updated results_json]
    L --> M[Print Phase 3 complete]
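The loop in this diagram can be sketched as below, assuming a hypothetical results layout of `{"matrix": {key: {"scenario": ..., "ic_err": ...}}}` and caller-supplied `compute_single` and `is_speech` callables (the real module derives speech scenarios from its SCENARIOS config). Entries that fail to score are left without ic_err, so a later run picks them up again.

```python
import json
from concurrent.futures import ProcessPoolExecutor


def pending_keys(matrix, is_speech):
    """Keys of stereo (non-speech) entries that have no ic_err yet."""
    return [k for k, e in matrix.items()
            if not is_speech(e["scenario"]) and e.get("ic_err") is None]


def run_phase3(results_path, compute_single, is_speech, executor=None):
    """Score pending entries in parallel and write ic_err back into the JSON.

    compute_single(key) -> float or None must be a module-level function when
    the default ProcessPoolExecutor is used, so that it can be pickled.
    """
    with open(results_path) as f:
        results = json.load(f)
    matrix = results["matrix"]
    keys = pending_keys(matrix, is_speech)
    if not keys:
        print("No pending stereo computations")
        return 0
    if executor is None:
        with ProcessPoolExecutor() as pool:
            scores = list(pool.map(compute_single, keys))
    else:
        scores = list(executor.map(compute_single, keys))
    for key, score in zip(keys, scores):
        if score is not None:  # failures stay pending and are retried next run
            matrix[key]["ic_err"] = score
    with open(results_path, "w") as f:
        json.dump(results, f, indent=2)
    print("Phase 3 complete")
    return sum(s is not None for s in scores)
```

The optional `executor` argument lets the function be exercised with a ThreadPoolExecutor in tests; the default ProcessPoolExecutor requires `compute_single` to be picklable.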

File-Level Changes

Change	Details	Files
Introduce Phase 3 stereo image fidelity calculation using inter-channel coherence error for stereo scenarios.	Add new phase3_stereo.py module implementing windowed inter-channel coherence error between reference and decoded audio, including ffmpeg-based decoding and codec-delay alignment via cross-correlation. Use SCENARIOS config and existing AAC path resolution to select non-speech (stereo) samples and locate encoded files. Parallelize per-sample coherence computation via ProcessPoolExecutor, skipping mono/speech scenarios and already-scored entries, and persist results by writing ic_err back into the main results JSON.	`phase3_stereo.py`
Wire stereo image fidelity as an optional Phase 3 in the benchmark runner.	Extend CLI with --skip-stereo flag to optionally skip stereo image computation. Locate the new phase3_stereo.py script and invoke it after Phase 2, passing paths to the results JSON, AAC output, and external reference audio directory.	`run_benchmark.py`
Report stereo image fidelity metrics in the comparison summary and per-scenario tables.	Track aggregate inter-channel coherence deltas and counts, plus worst stereo regression, when comparing base and candidate results. Compute and emit high-level "Stereo Image Δ" and "Worst Stereo Drop" lines in the summary when enough data is present, with small thresholds to filter noise. Extend the scenario performance aggregation to maintain per-scenario stereo coherence deltas and show them in a new "Stereo Δ" column in the table.	`compare_results.py`
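The comparison logic described in the last row can be sketched as follows. The sign convention follows the PR description (positive = candidate truer stereo), so each per-entry delta is base ic_err minus candidate ic_err, since lower error is better. The `noise` and `min_count` thresholds and the entry layout are hypothetical stand-ins for the "small thresholds" and "enough data" conditions mentioned above.

```python
def stereo_summary(base, cand, noise=1e-3, min_count=1):
    """Aggregate stereo-image deltas between two result matrices.

    base/cand map entry keys to dicts that may carry an "ic_err" value.
    Returns None when fewer than min_count entries are scored on both sides.
    """
    deltas = {}
    for key, b in base.items():
        c = cand.get(key)
        if c is None or b.get("ic_err") is None or c.get("ic_err") is None:
            continue  # unscored on either side (e.g. mono speech)
        deltas[key] = b["ic_err"] - c["ic_err"]  # > 0: candidate has the truer image
    if len(deltas) < max(min_count, 1):
        return None
    worst_key = min(deltas, key=deltas.get)
    worst = deltas[worst_key]
    return {
        "stereo_image_delta": sum(deltas.values()) / len(deltas),
        "count": len(deltas),
        # Only report a drop that clears the noise floor.
        "worst_stereo_drop": (worst_key, worst) if worst < -noise else None,
    }
```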


sourcery-ai

Hey - I've found 1 security issue, 2 other issues, and left some high level feedback:

Security issues:

Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'. (link)

General comments:

In phase3_stereo.decode_stereo, consider using check=True or at least logging stderr on ffmpeg failures so that decoding issues are visible instead of silently returning None.
In phase3_stereo.read_stereo, using a context manager (with wave.open(path, 'rb') as w:) would ensure the file handle is always closed even if an exception is raised while reading.
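For the first general comment, one option is to keep decode_stereo's None-on-failure contract but log ffmpeg's stderr. The sketch below assumes the decoder writes 16-bit PCM WAV (matching the np.int16 read in read_stereo) and uses only standard ffmpeg options; the argument names are illustrative. Switching to check=True instead would raise subprocess.CalledProcessError, which propagates out of the worker pool unless the caller catches it.

```python
import logging
import subprocess

log = logging.getLogger("phase3_stereo")


def decode_stereo(src, out_wav):
    """Decode `src` to 16-bit stereo PCM WAV at `out_wav`; return out_wav or None."""
    cmd = ["ffmpeg", "-nostdin", "-v", "error", "-y", "-i", src,
           "-ac", "2", "-c:a", "pcm_s16le", out_wav]  # -ac 2 duplicates a mono source
    try:
        r = subprocess.run(cmd, capture_output=True, text=True, errors="replace")
    except FileNotFoundError:  # ffmpeg binary missing from PATH
        log.error("ffmpeg not found; cannot decode %s", src)
        return None
    if r.returncode != 0:
        # Surface the reason instead of failing silently.
        log.error("ffmpeg failed on %s (exit %d): %s", src, r.returncode, r.stderr.strip())
        return None
    return out_wav
```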

Prompt for AI Agents

Please address the comments from this code review:

## Overall Comments
- In `phase3_stereo.decode_stereo`, consider using `check=True` or at least logging `stderr` on ffmpeg failures so that decoding issues are visible instead of silently returning `None`.
- In `phase3_stereo.read_stereo`, using a context manager (`with wave.open(path, 'rb') as w:`) would ensure the file handle is always closed even if an exception is raised while reading.

## Individual Comments

### Comment 1
<location path="phase3_stereo.py" line_range="70-79" />
<code_context>
+    return out if r.returncode == 0 else None
+
+
+def read_stereo(path):
+    w = wave.open(path, "rb")
+    ch = w.getnchannels()
+    raw = w.readframes(w.getnframes())
+    w.close()
+    a = np.frombuffer(raw, dtype=np.int16).astype(np.float64)
+    if ch >= 2:
+        a = a.reshape(-1, ch)
+        return a[:, 0], a[:, 1]
+    return a, a  # mono source: both channels identical
+
+
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Use a context manager when opening wave files to avoid leaking file descriptors on errors.

If `wave.open` or `readframes` raises, `w.close()` is never called and the descriptor may be leaked. Using `with wave.open(path, "rb") as w:` ensures the handle is always closed and matches common Python style.

```suggestion
def read_stereo(path):
    with wave.open(path, "rb") as w:
        ch = w.getnchannels()
        raw = w.readframes(w.getnframes())
    a = np.frombuffer(raw, dtype=np.int16).astype(np.float64)
    if ch >= 2:
        a = a.reshape(-1, ch)
        return a[:, 0], a[:, 1]
    return a, a  # mono source: both channels identical
```
</issue_to_address>

### Comment 2
<location path="phase3_stereo.py" line_range="114-119" />
<code_context>
+    m = min(len(rL), len(dL))
+    rL, rR, dL, dR = rL[:m], rR[:m], dL[:m], dR[:m]
+
+    errs = []
+    for s in range(0, m - FRAME, FRAME):
+        ec = abs(coherence(rL[s:s + FRAME], rR[s:s + FRAME])
+                 - coherence(dL[s:s + FRAME], dR[s:s + FRAME]))
+        errs.append(ec)
+    return float(np.mean(errs)) if errs else None
+
+
</code_context>
<issue_to_address>
**issue:** Very short stereo clips never get an `ic_err` because frames shorter than `FRAME` are discarded.

When `m < FRAME`, the loop never executes, `errs` stays empty, and `coherence_error` returns `None` even for valid stereo, so these short items never get an `ic_err` and remain `pending`. Consider handling `m < FRAME` by computing a coherence error over the whole segment (e.g., shrinking the effective frame for the last/only window) or adding an explicit fallback path that still computes a single error value.
</issue_to_address>

### Comment 3
<location path="run_benchmark.py" line_range="277-282" />
<code_context>
        subprocess.run([
            sys.executable, phase3_script,
            args.output,
            os.path.join(script_dir, "output"),
            os.path.join(script_dir, "data", "external"),
        ], check=True)
</code_context>
<issue_to_address>
**security (python.lang.security.audit.dangerous-subprocess-use-audit):** Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'.

*Source: opengrep*
</issue_to_address>


sourcery-ai · 2026-06-13T13:38:28Z

+def read_stereo(path):
+    w = wave.open(path, "rb")
+    ch = w.getnchannels()
+    raw = w.readframes(w.getnframes())
+    w.close()
+    a = np.frombuffer(raw, dtype=np.int16).astype(np.float64)
+    if ch >= 2:
+        a = a.reshape(-1, ch)
+        return a[:, 0], a[:, 1]
+    return a, a  # mono source: both channels identical


suggestion (bug_risk): Use a context manager when opening wave files to avoid leaking file descriptors on errors.

If wave.open or readframes raises, w.close() is never called and the descriptor may be leaked. Using with wave.open(path, "rb") as w: ensures the handle is always closed and matches common Python style.


sourcery-ai · 2026-06-13T13:38:28Z

+    errs = []
+    for s in range(0, m - FRAME, FRAME):
+        ec = abs(coherence(rL[s:s + FRAME], rR[s:s + FRAME])
+                 - coherence(dL[s:s + FRAME], dR[s:s + FRAME]))
+        errs.append(ec)
+    return float(np.mean(errs)) if errs else None


issue: Very short stereo clips never get an ic_err because frames shorter than FRAME are discarded.

When m < FRAME, the loop never executes, errs stays empty, and coherence_error returns None even for valid stereo, so these short items never get an ic_err and remain pending. Consider handling m < FRAME by computing a coherence error over the whole segment (e.g., shrinking the effective frame for the last/only window) or adding an explicit fallback path that still computes a single error value.
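One possible fallback, sketched below under the same assumed coherence definition (zero-lag normalized cross-correlation): scan full windows with an inclusive bound, and score a clip shorter than one window as a single window. Note that the reviewed loop bound `range(0, m - FRAME, FRAME)` also drops the last full window whenever m is an exact multiple of FRAME (including the only window when m == FRAME); `m - FRAME + 1` keeps it. A single very short window gives a noisy coherence estimate, so such scores deserve some caution.

```python
import numpy as np


def frame_starts(m, frame):
    """Window start offsets covering every full window of an m-sample signal."""
    if m <= 0:
        return []
    if m < frame:
        return [0]  # fallback: a clip shorter than one window is scored as one window
    return list(range(0, m - frame + 1, frame))  # +1 keeps the last full window


def coherence(x, y, eps=1e-12):
    # Assumed definition: zero-lag normalized cross-correlation of two channels.
    return float(np.dot(x, y) / (np.sqrt(np.dot(x, x) * np.dot(y, y)) + eps))


def coherence_error(rL, rR, dL, dR, frame):
    m = min(len(rL), len(dL))
    rL, rR, dL, dR = rL[:m], rR[:m], dL[:m], dR[:m]
    errs = [abs(coherence(rL[s:s + frame], rR[s:s + frame])
                - coherence(dL[s:s + frame], dR[s:s + frame]))
            for s in frame_starts(m, frame)]
    return float(np.mean(errs)) if errs else None
```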

sourcery-ai · 2026-06-13T13:38:28Z

+        subprocess.run([
+            sys.executable, phase3_script,
+            args.output,
+            os.path.join(script_dir, "output"),
+            os.path.join(script_dir, "data", "external"),
+        ], check=True)


security (python.lang.security.audit.dangerous-subprocess-use-audit): Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'.

Source: opengrep
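The flagged call passes a list argv to subprocess.run without shell=True, so no shell parses the arguments: each list element reaches the child process verbatim, and shell metacharacters in a path are inert. Note that the standard-library helper is shlex.quote() (there is no shlex.escape()), and it is meant for building command strings for a shell; applied to list-form argv it would embed literal quote characters in the path. The small demonstration below (the `echo_argv` helper is purely illustrative) shows both effects.

```python
import shlex
import subprocess
import sys


def echo_argv(*args):
    """Run a child Python (list argv, no shell) that prints each argument on its own line."""
    child = "import sys; print('\\n'.join(sys.argv[1:]))"
    out = subprocess.run([sys.executable, "-c", child, *args],
                         capture_output=True, text=True, check=True).stdout
    return out.splitlines()


tricky = "output dir; echo pwned $(whoami)"
print(echo_argv(tricky))               # the path arrives unchanged; nothing is executed
print(echo_argv(shlex.quote(tricky)))  # the added quotes become part of the path itself
```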

sourcery-ai Bot reviewed Jun 13, 2026


feedback

5051ac9

nschimme merged commit 950c372 into master Jun 13, 2026
3 checks passed

nschimme deleted the phase3-stereo-image-fidelity branch June 13, 2026 14:31

nschimme merged 2 commits into master from phase3-stereo-image-fidelity

nschimme commented Jun 13, 2026 • edited by sourcery-ai Bot

sourcery-ai Bot commented Jun 13, 2026 • edited
