fix(audio): remove atomic CAS from 48kHz envelope tick #40

Open
Aristide021 wants to merge 5 commits into magenta:main from Aristide021:fix/audio-thread-envelope-cas-removal

Aristide021 commented Jun 5, 2026

What

Replaces the per-sample compare_exchange_weak loop in ExponentialEnvelope::tick() with plain float arithmetic and a single atomic bool flag for cross-thread reset signaling.

Before: ExponentialEnvelope::value was std::atomic<float>. On every sample the audio thread ran a CAS retry loop, and the UI thread wrote 0.0f directly into the atomic -- a race between two writers at 48kHz.

After: value is a plain float owned exclusively by the audio thread. The UI thread sets reset_env_trigger_ (std::atomic<bool>) with release ordering; the audio thread exchanges it to false with acquire ordering once per block, then writes value = 0.0f on its own thread.
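
The ownership split described above can be sketched roughly as follows. This is a minimal illustration, not the code in the PR: `EnvelopeSketch`, `RunnerSketch`, `process_block`, and the `coeff` value are hypothetical names and numbers chosen for the example; only `reset_env_trigger_`, `reset_env_`, `value`, `tick()`, and `trigger_reset()` come from the description.

```cpp
#include <atomic>

// Hypothetical stand-in for ExponentialEnvelope; the real struct lives in
// core/include/magentart/envelope.h and may carry more state than shown here.
struct EnvelopeSketch {
    float value = 0.0f;    // plain float, touched only by the audio thread
    float coeff = 0.001f;  // assumed one-pole smoothing coefficient

    // One-pole step toward `target`; no atomics involved.
    float tick(float target) {
        value += coeff * (target - value);
        return value;
    }
};

struct RunnerSketch {
    EnvelopeSketch reset_env_;
    std::atomic<bool> reset_env_trigger_{false};

    // UI thread: raises the flag and never touches reset_env_.value.
    void trigger_reset() {
        reset_env_trigger_.store(true, std::memory_order_release);
    }

    // Audio thread: consume the flag once per block, then run the sample loop.
    void process_block(float* out, int frames, float target) {
        if (reset_env_trigger_.exchange(false, std::memory_order_acquire)) {
            reset_env_.value = 0.0f;  // written on the owning thread
        }
        for (int i = 0; i < frames; ++i) {
            out[i] = reset_env_.tick(target);
        }
    }
};
```

With this shape, a reset requested mid-block takes effect at the next block boundary rather than mid-sample-loop, which is the trade-off the once-per-block exchange implies.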

Changes

  • ExponentialEnvelope: extracted into core/include/magentart/envelope.h so it can be included and tested independently without pulling in MLX/TFLite machinery
  • ExponentialEnvelope::value: std::atomic<float> -> float
  • ExponentialEnvelope::tick(): remove CAS loop, plain one-pole math
  • RealtimeRunner: add reset_env_trigger_ (std::atomic<bool>)
  • trigger_reset() / trigger_transport_reset(): store(true, release) on the flag instead of writing the envelope value directly
  • read_audio_stereo: consume trigger with exchange(false, acquire) at block boundary before the sample loop
  • trigger_transport_reset(): the envelope trigger is stored unconditionally, even if the reset_request_ CAS fails. This is intentional: setting the bool is idempotent, and the user-path trigger_reset() has already armed it
  • depthformer.py: add from __future__ import annotations to fix TypeError: type 'mlx.core.Dtype' is not subscriptable on Python 3.12 (pre-existing issue, unrelated to the audio fix)
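
To make the unconditional store in trigger_transport_reset() concrete, here is a hedged sketch. The `ResetRequest` enum, its states, the `bool` return value, and the struct name are all assumptions made for illustration; the description only names `reset_request_`, `reset_env_trigger_`, and the two trigger functions.

```cpp
#include <atomic>

// Hypothetical request states; the real type of reset_request_ is not shown
// in this PR description.
enum class ResetRequest { None, User, Transport };

struct ResetSignalsSketch {
    std::atomic<ResetRequest> reset_request_{ResetRequest::None};
    std::atomic<bool> reset_env_trigger_{false};

    // User path: records the request and arms the envelope trigger.
    void trigger_reset() {
        reset_request_.store(ResetRequest::User, std::memory_order_release);
        reset_env_trigger_.store(true, std::memory_order_release);
    }

    // Transport path: claims the request slot only if nothing is pending,
    // but arms the envelope trigger either way.
    bool trigger_transport_reset() {
        ResetRequest expected = ResetRequest::None;
        const bool claimed = reset_request_.compare_exchange_strong(
            expected, ResetRequest::Transport,
            std::memory_order_release, std::memory_order_relaxed);
        // Storing true when the flag is already true changes nothing,
        // so a failed CAS needs no special handling here.
        reset_env_trigger_.store(true, std::memory_order_release);
        return claimed;
    }
};
```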

Why

Two bugs in the original code:

  1. Real-time safety violation. The UI thread was writing reset_env_.value.store(0.0f, relaxed) concurrently with the audio thread's CAS loop on the same atomic. The audio thread could win the CAS and have the result immediately overwritten by the UI store with no way to detect it.

  2. Unnecessary atomic ops. The audio thread is the sole writer of the envelope during normal operation. The CAS loop existed to handle the UI reset case, but the right fix is to separate ownership, not serialize two writers through an atomic. This eliminates ~96,000 atomic ops/sec (one CAS attempt per sample at 48kHz).
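
For contrast, the pre-fix pattern probably looked something like the sketch below. The actual loop body is not quoted in this description, so the load-then-CAS shape, the `coeff` value, and the struct name are assumptions; only the use of std::atomic<float>, compare_exchange_weak, and the relaxed UI store are taken from the text.

```cpp
#include <atomic>

// Assumed shape of the old envelope: the value is shared through an atomic
// and every sample pays for a load plus at least one CAS attempt.
struct AtomicEnvelopeSketch {
    std::atomic<float> value{0.0f};
    float coeff = 0.001f;  // assumed one-pole smoothing coefficient

    float tick(float target) {
        float old_v = value.load(std::memory_order_relaxed);
        float new_v;
        do {
            // On CAS failure old_v is refreshed with the current value,
            // so the step is recomputed from whatever another thread wrote.
            new_v = old_v + coeff * (target - old_v);
        } while (!value.compare_exchange_weak(old_v, new_v,
                                              std::memory_order_relaxed));
        return new_v;
    }
};
```

A UI-side `value.store(0.0f, std::memory_order_relaxed)` can land between any two iterations of this loop, which is the two-writer situation the PR removes.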

Memory ordering

trigger_reset() and trigger_transport_reset() use memory_order_release on both stores (reset_request_ and reset_env_trigger_). read_audio_stereo uses memory_order_acquire on the exchange. This gives the audio thread a formal happens-before edge on all UI-thread state preceding the trigger. The subsequent plain float write to reset_env_.value happens on the audio thread after the exchange and is covered by single-thread sequencing -- no additional barrier needed.

Note on cost: release stores compile to stlr on AArch64 vs str for relaxed. The pipeline difference is negligible (~1 cycle on M-series) but not zero -- "negligible cost" is the honest framing.
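
The happens-before edge can be illustrated with a small two-thread sketch. The `payload` variable and the function name are hypothetical and stand in for "UI-thread state preceding the trigger"; the spin loop exists only so the demo terminates, whereas the real audio callback polls the flag once per block.

```cpp
#include <atomic>
#include <thread>

// Returns the payload as observed by the consumer once it sees the flag.
int publish_and_consume() {
    int payload = 0;                 // plain, non-atomic data (hypothetical)
    std::atomic<bool> flag{false};
    int seen = -1;

    std::thread ui([&] {
        payload = 42;                                  // sequenced before the store
        flag.store(true, std::memory_order_release);   // publishes payload
    });
    std::thread audio([&] {
        // Demo-only spin; exchange returns the previous value of the flag.
        while (!flag.exchange(false, std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        // The acquire exchange read the value written by the release store,
        // so the write to payload is visible here.
        seen = payload;
    });
    ui.join();
    audio.join();
    return seen;
}
```

If both orderings were relaxed, the flag itself would still be race-free, but nothing would order the read of `payload` after its write.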

Tests

Added core/src/envelope_reset_test.cpp. The test includes envelope.h directly, so it exercises the real ExponentialEnvelope definition -- not a copy. No MLX/TFLite link required since envelope.h has no external dependencies.

  • Test 1 -- ExponentialEnvelope::tick() arithmetic: verifies values stay in [0, 1], are finite, converge toward target, and snap-to-zero works correctly.
  • Test 2 -- concurrent trigger pattern: two threads exercising the release/acquire handshake, designed to run under -fsanitize=thread to verify env.value has no cross-thread access.
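
As a rough idea of what a Test 1 style check involves, the sketch below runs a one-pole envelope and verifies each output is finite and within [0, 1]. It is not the code in envelope_reset_test.cpp: `OnePoleSketch`, `ticks_stay_bounded`, and the coefficient are assumptions, and the snap-to-zero check is omitted because its exact semantics are not described here.

```cpp
#include <cmath>

// Hypothetical minimal envelope with an assumed smoothing coefficient.
struct OnePoleSketch {
    float value = 0.0f;
    float coeff = 0.001f;

    float tick(float target) {
        value += coeff * (target - value);
        return value;
    }
};

// Returns true if n ticks from `start` toward `target` all stay finite
// and inside [0, 1].
bool ticks_stay_bounded(float start, float target, int n) {
    OnePoleSketch env;
    env.value = start;
    for (int i = 0; i < n; ++i) {
        const float v = env.tick(target);
        if (!std::isfinite(v) || v < 0.0f || v > 1.0f) {
            return false;
        }
    }
    return true;
}
```

Starting from an out-of-range value such as 2.0 fails on the first tick, which mirrors the first sabotage case.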

Both tests include a --sabotage mode that injects the bugs being fixed and confirms the assertions catch them:

./envelope_reset_test
Test 1: ExponentialEnvelope tick() arithmetic
  [PASS]
Test 2: concurrent trigger pattern
  [PASS]
[PASS] all envelope reset tests passed.

./envelope_reset_test --sabotage
  [sabotage] forcing env.value = 2.0 instead of 0.0
  [FAIL] tick() out of range: 1.999808 at release sample 0
  [sabotage] UI thread will write env.value = -1.0f directly (no atomic flag -- the pre-fix pattern)
  [FAIL] concurrent tick() produced out-of-range value -0.995209 (block 5, sample 0) -- UI thread corrupted env.value
[PASS] sabotage mode confirmed: tests caught both injected bugs.

Related Issues

Addresses the RT safety violation on the audio callback's envelope path.
Pre-existing depthformer.py annotation bug fixed as a separate commit.

Local Pytests

Note

Use mrt models init to download the necessary resources, then use mrt checkpoints download to download mrt2_small.safetensors and mrt2_base.safetensors.

Commands run:

pytest tests/test_musiccoca.py tests/test_prefill_correctness.py tests/test_bitlevel_parity.py -s

Output:

============================= test session starts ==============================
platform darwin -- Python 3.12.11, pytest-9.0.3, pluggy-1.6.0
collected 24 items

tests/test_musiccoca.py .....................
tests/test_prefill_correctness.py
  Save/Load state bit-exactness verified.
  .
tests/test_bitlevel_parity.py
  depth_logits     max_diff=8.11e-05 (across 12 RVQ levels)
    rvq= 0: max_diff=4.20e-05
    rvq= 1: max_diff=4.77e-05
    rvq= 2: max_diff=5.34e-05
    rvq= 3: max_diff=5.53e-05
    rvq= 4: max_diff=4.58e-05
    rvq= 5: max_diff=4.58e-05
    rvq= 6: max_diff=6.10e-05
    rvq= 7: max_diff=5.72e-05
    rvq= 8: max_diff=5.72e-05
    rvq= 9: max_diff=6.29e-05
    rvq=10: max_diff=8.11e-05
    rvq=11: max_diff=7.82e-05
  . temporal_inputs  max_diff=7.63e-06
  . temporal_outputs max_diff=9.46e-04
  .

======================= 24 passed, 4 warnings in 33.80s ========================

Benchmark Regression Test

Commands run:

./envelope_reset_test
./envelope_reset_test --sabotage

Output:

Test 1: ExponentialEnvelope tick() arithmetic
  [PASS]

Test 2: concurrent trigger pattern (run under TSan for full coverage)
  [PASS]

[PASS] all envelope reset tests passed.

---

  [sabotage] forcing env.value = 2.0 instead of 0.0
  [FAIL] tick() out of range: 1.999808 at release sample 0
  [sabotage] UI thread will write env.value = -1.0f directly (no atomic flag -- the pre-fix pattern)
  [FAIL] concurrent tick() produced out-of-range value -0.995209 (block 18, sample 0) -- UI thread corrupted env.value
=== SABOTAGE MODE: both tests should FAIL ===

Test 1: ExponentialEnvelope tick() arithmetic [sabotage: value forced to 2.0]
  [FAIL]

Test 2: concurrent trigger pattern [sabotage: UI writes env.value directly]
  [FAIL]

[PASS] sabotage mode confirmed: tests caught both injected bugs.

Replace the per-sample compare_exchange_weak loop in ExponentialEnvelope::tick()
with plain float arithmetic. The audio thread is the sole owner of the envelope
state; cross-thread reset signals are now delivered via an atomic bool flag
(reset_env_trigger_) that the audio thread exchanges-to-consume once per block.

Changes:
- ExponentialEnvelope::value: std::atomic<float> -> float
- ExponentialEnvelope::tick(): remove CAS loop, plain one-pole math
- RealtimeRunner: add reset_env_trigger_ (std::atomic<bool>)
- trigger_reset() / trigger_transport_reset(): write flag, not envelope value
- read_audio_stereo: consume trigger at block boundary before the sample loop
- trigger stores: relaxed -> release; exchange: relaxed -> acquire (formal
  happens-before between UI stores and the audio-thread consume)

This eliminates ~96,000 atomic ops/sec in the audio callback and removes a
real-time safety violation (UI thread racing the audio CAS). Double-trigger
semantics are idempotent (bool flag collapses N stores of true to one
snap-to-zero). The per-sample gain loop remains scalar due to loop-carried
IIR dependencies on smoothed_gain_ and the envelope accumulators.
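
A minimal sketch of the double-trigger behaviour, assuming a hypothetical helper name and a fixed number of blocks: however many times the flag is set before the next block boundary, the consumer sees a single reset.

```cpp
#include <atomic>

// Sets the flag n times, then polls it once per "block" for three blocks
// and counts how many resets the consumer would perform.
int resets_after_n_triggers(int n) {
    std::atomic<bool> flag{false};
    for (int i = 0; i < n; ++i) {
        flag.store(true, std::memory_order_release);
    }
    int resets = 0;
    for (int block = 0; block < 3; ++block) {
        if (flag.exchange(false, std::memory_order_acquire)) {
            ++resets;  // snap-to-zero would happen here
        }
    }
    return resets;
}
```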

Memory ordering: the UI thread uses release on both stores in trigger_reset()
and trigger_transport_reset(); the audio thread uses acquire on the exchange.
This gives a formal happens-before edge between the UI stores and the audio
read. The subsequent plain float write to reset_env_.value happens on the
audio thread after the exchange and is covered by single-thread sequencing.

Two tests covering fix/audio-thread-envelope-cas-removal:

1. ExponentialEnvelope tick() arithmetic: verifies values stay in [0, 1],
   are finite, converge toward target, and snap-to-zero works correctly.

2. Concurrent trigger pattern: two threads exercising the release/acquire
   atomic-bool handshake that replaced the direct atomic float writes.
   Run under -fsanitize=thread to verify race-freedom on env.value.

The test is standalone (no MLX/TFLite link) following the same pattern
as numpy_random_state_test.

Adds a --sabotage flag that runs deliberately broken versions of both tests:

- Test 1: forces env.value = 2.0 (out-of-range reset) instead of 0.0;
  the bounds check catches the corrupt value on the first tick.
- Test 2: UI thread writes env.value = -1.0f directly without the atomic
  flag, reproducing the pre-fix cross-thread access pattern; the audio
  thread's bounds check detects the corruption deterministically.

In sabotage mode the binary exits 0 only if both tests FAIL, confirming
the assertions are strong enough to catch the bugs they guard against.
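
The inverted exit logic can be sketched as below. The function name and the nonzero value 1 are assumptions; the description only states that sabotage mode exits 0 when both tests fail.

```cpp
// Maps per-test outcomes to a process exit code.
int exit_code(bool sabotage, bool test1_passed, bool test2_passed) {
    if (sabotage) {
        // Success means both deliberately broken tests were caught.
        return (!test1_passed && !test2_passed) ? 0 : 1;
    }
    // Normal mode: success means both tests passed.
    return (test1_passed && test2_passed) ? 0 : 1;
}
```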

google-cla Bot commented Jun 5, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

mlx.core.Dtype is a nanobind C extension type and does not support
__class_getitem__. The sl.DType[Any] type annotations in this file
caused a TypeError at import time on Python 3.12 with the vendored
sequence-layers. Adding from __future__ import annotations stores
annotations as strings instead of evaluating them at import time, which
prevents the subscript error without changing any runtime behavior.

Aristide021 marked this pull request as ready for review June 5, 2026 02:34

Move ExponentialEnvelope out of realtime_runner.h into its own header
(core/include/magentart/envelope.h) so it can be included and tested
independently without pulling in the full MLX/TFLite machinery.

- envelope.h: defines ExponentialEnvelope in magentart::core namespace
- realtime_runner.h: includes envelope.h, struct definition removed
- envelope_reset_test.cpp: includes envelope.h directly instead of
  mirroring the struct -- test now covers the real production definition
- CMakeLists.txt: add include path for envelope_reset_test target