fix(audio): remove atomic CAS from 48kHz envelope tick#40
Open
Aristide021 wants to merge 5 commits into
Open
Conversation
Replace the per-sample compare_exchange_weak loop in ExponentialEnvelope::tick() with plain float arithmetic. The audio thread is the sole owner of the envelope state; cross-thread reset signals are now delivered via an atomic bool flag (reset_env_trigger_) that the audio thread exchanges-to-consume once per block. Changes: - ExponentialEnvelope::value: std::atomic<float> -> float - ExponentialEnvelope::tick(): remove CAS loop, plain one-pole math - RealtimeRunner: add reset_env_trigger_ (std::atomic<bool>) - trigger_reset() / trigger_transport_reset(): write flag, not envelope value - read_audio_stereo: consume trigger at block boundary before the sample loop - trigger stores: relaxed -> release; exchange: relaxed -> acquire (formal happens-before between UI stores and the audio-thread consume) This eliminates ~96,000 atomic ops/sec in the audio callback and removes a real-time safety violation (UI thread racing the audio CAS). Double-trigger semantics are idempotent (bool flag collapses N stores of true to one snap-to-zero). The per-sample gain loop remains scalar due to loop-carried IIR dependencies on smoothed_gain_ and the envelope accumulators. Memory ordering: the UI thread uses release on both stores in trigger_reset() and trigger_transport_reset(); the audio thread uses acquire on the exchange. This gives a formal happens-before edge between the UI stores and the audio read. The subsequent plain float write to reset_env_.value happens on the audio thread after the exchange and is covered by single-thread sequencing.
Two tests covering fix/audio-thread-envelope-cas-removal: 1. ExponentialEnvelope tick() arithmetic: verifies values stay in [0, 1], are finite, converge toward target, and snap-to-zero works correctly. 2. Concurrent trigger pattern: two threads exercising the release/acquire atomic-bool handshake that replaced the direct atomic float writes. Run under -fsanitize=thread to verify race-freedom on env.value. The test is standalone (no MLX/TFLite link) following the same pattern as numpy_random_state_test.
Adds a --sabotage flag that runs deliberately broken versions of both tests: - Test 1: forces env.value = 2.0 (out-of-range reset) instead of 0.0; the bounds check catches the corrupt value on the first tick. - Test 2: UI thread writes env.value = -1.0f directly without the atomic flag, reproducing the pre-fix cross-thread access pattern; the audio thread's bounds check detects the corruption deterministically. In sabotage mode the binary exits 0 only if both tests FAIL, confirming the assertions are strong enough to catch the bugs they guard against.
|
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request. |
mlx.core.Dtype is a nanobind C extension type and does not support __class_getitem__. The sl.DType[Any] type annotations in this file caused a TypeError at import time on Python 3.12 with the vendored sequence-layers. Adding from __future__ import annotations defers annotation evaluation to strings, which prevents the subscript error without changing any runtime behavior.
Move ExponentialEnvelope out of realtime_runner.h into its own header (core/include/magentart/envelope.h) so it can be included and tested independently without pulling in the full MLX/TFLite machinery. - envelope.h: defines ExponentialEnvelope in magentart::core namespace - realtime_runner.h: includes envelope.h, struct definition removed - envelope_reset_test.cpp: includes envelope.h directly instead of mirroring the struct -- test now covers the real production definition - CMakeLists.txt: add include path for envelope_reset_test target
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
Replaces the per-sample
compare_exchange_weakloop inExponentialEnvelope::tick()with plain float arithmetic and a single atomic bool flag for cross-thread reset signaling.Before:
ExponentialEnvelope::valuewasstd::atomic<float>. On every sample the audio thread ran a CAS retry loop, and the UI thread wrote0.0fdirectly into the atomic -- a race between two writers at 48kHz.After:
valueis a plainfloatowned exclusively by the audio thread. The UI thread setsreset_env_trigger_(std::atomic<bool>) with release ordering; the audio thread exchanges it to false with acquire ordering once per block, then writesvalue = 0.0fon its own thread.Changes
ExponentialEnvelope: extracted intocore/include/magentart/envelope.hso it can be included and tested independently without pulling in MLX/TFLite machineryExponentialEnvelope::value:std::atomic<float>->floatExponentialEnvelope::tick(): remove CAS loop, plain one-pole mathRealtimeRunner: addreset_env_trigger_(std::atomic<bool>)trigger_reset()/trigger_transport_reset():store(true, release)on the flag instead of writing the envelope value directlyread_audio_stereo: consume trigger withexchange(false, acquire)at block boundary before the sample looptrigger_transport_reset(): envelope trigger stored unconditionally even if thereset_request_CAS fails -- intentional; the bool is idempotent and the User-pathtrigger_reset()already armed itdepthformer.py: addfrom __future__ import annotationsto fixTypeError: type 'mlx.core.Dtype' is not subscriptableon Python 3.12 (pre-existing issue, unrelated to the audio fix)Why
Two bugs in the original code:
Real-time safety violation. The UI thread was writing
reset_env_.value.store(0.0f, relaxed)concurrently with the audio thread's CAS loop on the same atomic. The audio thread could win the CAS and have the result immediately overwritten by the UI store with no way to detect it.Unnecessary atomic ops. The audio thread is the sole writer of the envelope during normal operation. The CAS loop existed to handle the UI reset case, but the right fix is to separate ownership, not serialize two writers through an atomic. This eliminates ~96,000 atomic ops/sec (one CAS attempt per sample at 48kHz).
Memory ordering
trigger_reset()andtrigger_transport_reset()usememory_order_releaseon both stores (reset_request_andreset_env_trigger_).read_audio_stereousesmemory_order_acquireon the exchange. This gives the audio thread a formal happens-before edge on all UI-thread state preceding the trigger. The subsequent plain float write toreset_env_.valuehappens on the audio thread after the exchange and is covered by single-thread sequencing -- no additional barrier needed.Note on cost:
releasestores compile tostlron AArch64 vsstrforrelaxed. The pipeline difference is negligible (~1 cycle on M-series) but not zero -- "negligible cost" is the honest framing.Tests
Added
core/src/envelope_reset_test.cpp. The test includesenvelope.hdirectly, so it exercises the realExponentialEnvelopedefinition -- not a copy. No MLX/TFLite link required sinceenvelope.hhas no external dependencies.ExponentialEnvelope::tick()arithmetic: verifies values stay in[0, 1], are finite, converge toward target, and snap-to-zero works correctly.-fsanitize=threadto verifyenv.valuehas no cross-thread access.Both tests include a
--sabotagemode that injects the bugs being fixed and confirms the assertions catch them:Related Issues
Addresses the RT safety violation on the audio callback's envelope path.
Pre-existing
depthformer.pyannotation bug fixed as a separate commit.Local Pytests
Note
Use
mrt models initto download the necessary resources. Then usemrt checkpoints downloadto downloadmrt2_small.safetensorsandmrt2_base.safetensorsCommands run:
Output:
Benchmark Regression Test
Commands run:
Output: