fix scalefactor delta overflows to comply with AAC spec (ISO 14496-3)#99
Conversation
This commit addresses two bugs in BlocQuant()'s scale factor range pass that allowed scalefactor differences to exceed the strict ±60 limit required by the AAC specification: 1. Fix PNS delta predictor: PNS scale factors share the HCB_PNS codebook and require a separate delta predictor. Without it, the first PNS band's delta was computed relative to the regular global_gain, producing out-of-bounds deltas. A dedicated `lastpns` predictor is now initialized 90 steps below `global_gain` so the first PNS entry fits comfortably within the ±60 constraint. 2. Enforce limits on all active bands: The quantizer previously only clamped deltas for HCB_ESC bands. The condition is now updated to `(book != HCB_ZERO) && (book != HCB_NONE)` to ensure the ±60 limit is enforced for every active scalefactor band, regardless of the Huffman codebook. Additionally, this refactors the clamping logic into a centralized `clamp_sf_diff()` inline function and replaces hardcoded scalefactor magic numbers with named constants in `huff2.h`.
|
Context on why The AAC Scalefactor Delta BugThe Bug: "The Broken Bridge"AAC uses Huffman Book 12 to encode the difference (delta) between band volumes.
The Scenario
The Fix: "Two Tracks & Guardrails"
Why Speech MOS DroppedIn speech, "noise" (consonants like S or F) is often louder than background hiss.
|
|
Okay, I discovered that this is partially responsible for #40. I'm working on the remaining fix. |
|
Okay, this fixes #40 and possibly other cases too. No performance drop: https://github.com/nschimme/faac/actions/runs/24285523096 |
|
Hopefully the above two fixes also let me finally got Pseudo SBR to work. I kept getting blocked 😅 |
High-energy transients were producing quantized indices > 8191, exceeding the AAC escape sequence limit and corrupting the bitstream. - Peak Detection: Tracks bandmaxe (max spectral magnitude) per band. - Gain Limiting: Proactively caps sfacfix in qlevel if the peak coefficient would exceed the representable range. - Sync-Lock: Floors the scalefactor and re-derives the final gain to ensure encoder-decoder reconstruction alignment.
| qfunc = quantize_scalar; | ||
|
|
||
| /* 2^0.25 (1.50515 dB) step from AAC specs */ | ||
| sfstep = 1.0 / FAAC_LOG10(FAAC_SQRT(FAAC_SQRT(2.0))); |
There was a problem hiding this comment.
Could you declare them as static const in the file header?
There was a problem hiding this comment.
Unfortunately Clang still rejects this as its more standards compliant than GCC and in the C language, global or static variables must be initialized with a constant expression.
This is great, thank you very much! |
Isn't this a pretty made up scenario to explain why |
You're right that the 140/50 split is just an illustrative example to show how the ±60 limit gets breached. Any two values with a wide enough gap would trigger the same out-of-bounds bug. However, the value 90 isn't something I introduced for this PR; it’s actually the original behavior of the encoder. If you check the legacy code in // Original code
lastpns = coder->global_gain - 90; |
| lastsf = coder->global_gain; | ||
| lastis = 0; | ||
| lastpns = coder->global_gain - 90; | ||
| lastpns = coder->global_gain - SF_PNS_OFFSET; |
There was a problem hiding this comment.
Note that SF_PNS_OFFSET was in the original code
fabiangreffrath
left a comment
There was a problem hiding this comment.
Very well done, thank you!
This commit addresses two bugs in BlocQuant()'s scale factor range pass that allowed scalefactor differences to exceed the strict ±60 limit required by the AAC specification:
lastpnspredictor is now initialized 90 steps belowglobal_gainso the first PNS entry fits comfortably within the ±60 constraint.(book != HCB_ZERO) && (book != HCB_NONE)to ensure the ±60 limit is enforced for every active scalefactor band, regardless of the Huffman codebook.Additionally, this refactors the clamping logic into a centralized
clamp_sf_diff()inline function and replaces hardcoded scalefactor magic numbers with named constants inhuff2.h.This causes a slight regression to speech where they accidentally did better, but overall all other cases we see a slight MOS improvement at the cost of 1% CPU throughput that we need to pay anyway for correctness: https://github.com/nschimme/faac/actions/runs/24264165540