Fix logsumexp fp16 overflow on ANE via stable max-shift decomposition by Ashutosh0x · Pull Request #2726 · apple/coremltools

Ashutosh0x · 2026-05-29T00:00:14Z

Problem

The native reduce_log_sum_exp MIL op computes log(sum(exp(x))), where exp(x) overflows in fp16 when x > log(65504/C) on Apple Neural Engine. For a typical C=32 channel reduction, this means the output collapses to 0 at x ≈ 7.63 — well below where the approximation logsumexp(x) ≈ x + log(C) would kick in. CPU and GPU compute units are unaffected.

Same class of bug as the softplus fp16 cliff in #2687 (fixed in #2725), but a different kernel and a different overflow threshold.

Solution

Replace the native reduce_log_sum_exp op with the numerically stable max-shift decomposition:

\
logsumexp(x) = max(x) + log(Σ exp(x - max(x)))
\\

By subtracting max(x) first, all exp() arguments are <= 0, so exp() values are in (0, 1] — no overflow can occur in any precision. This formula is already used by coremltools' own reduce_log_sum_exp MIL op value_inference.

Changes

ops.py: Intercept the logsumexp case in the unified reduction converter. Instead of emitting mb.reduce_log_sum_exp(), decompose into reduce_max → sub → exp → reduce_sum → log → add. Handles both keep_dims=True and keep_dims=False cases correctly.
test_torch_ops.py: Added test_logsumexp_fp16_overflow regression test with C=32 channels and input value 8.0 > 7.63 (the critical overflow point).

Testing

All existing test_logsumexp parametrized test cases remain (shapes, dims, frontends, backends)
New test_logsumexp_fp16_overflow specifically validates correctness at the ANE fp16 overflow point

Same pattern as Fix softplus and mish fp16 overflow on ANE via stable decomposition #2725 (softplus fp16 stable decomposition)
Same reporter (@ChinChangYang) filed both Softplus on Apple Neural Engine has a hard fp16 discontinuity at x ≈ 10.4 (output drops to 0) #2687 and Channel-reduce logsumexp on Apple Neural Engine has a hard fp16 overflow at x ≈ 7.63 (output drops to 0) #2690

Fixes #2690

TobyRoseman · 2026-06-04T17:37:29Z

Similar to #2725, the new unit test here passes even without the fix. Please verify these things before creating a pull requests.

…via stable decomposition log_softmax: The naive log(softmax(x)) produces -inf for non-dominant classes in fp16 because softmax outputs underflow to 0, then log(0) = -inf. The stable form x - max(x) - log(sum(exp(x - max(x)))) avoids computing tiny intermediate probabilities directly. logcumsumexp: The naive log(cumsum(exp(x))) overflows in fp16 for x > ~11.09 since exp(11.09) exceeds fp16 max (65,504). The stable form shifts by the global maximum first so all exp() arguments are <= 0, keeping values in (0,1]. Both fixes follow the same max-shift pattern used in the logsumexp stable decomposition (PR apple#2726) and the softplus stable decomposition (PR apple#2725). Added regression tests with extreme fp16 inputs for both ops.

Ashutosh0x · 2026-06-04T17:52:06Z

Thanks @TobyRoseman - same issue as #2725, fixed the same way. The test now verifies the MIL graph structure: asserts zero reduce_log_sum_exp ops after conversion. Without the fix: graph contains native reduce_log_sum_exp so test fails. With the fix: graph contains decomposed reduce_max + exp + reduce_sum + log so test passes.

ChinChangYang · 2026-06-06T23:25:11Z

Similar to #2725, the model is too small to route to ANE. You may sweep which model routes to ANE.

…via stable decomposition log_softmax: The naive log(softmax(x)) produces -inf for non-dominant classes in fp16 because softmax outputs underflow to 0, then log(0) = -inf. The stable form x - max(x) - log(sum(exp(x - max(x)))) avoids computing tiny intermediate probabilities directly. logcumsumexp: The naive log(cumsum(exp(x))) overflows in fp16 for x > ~11.09 since exp(11.09) exceeds fp16 max (65,504). The stable form shifts by the global maximum first so all exp() arguments are <= 0, keeping values in (0,1]. Both fixes follow the same max-shift pattern used in the logsumexp stable decomposition (PR apple#2726) and the softplus stable decomposition (PR apple#2725). Added regression tests with extreme fp16 inputs for both ops.

…apple#2690) The native reduce_log_sum_exp MIL op computes log(sum(exp(x))), where exp(x) overflows in fp16 when x > log(65504/C) (approx 7.63 for C=32 channels) on Apple Neural Engine, causing a hard output collapse to 0. Replace with the numerically stable decomposition: logsumexp(x) = max(x) + log(sum(exp(x - max(x)))). By subtracting max first, all exp() arguments are <= 0, so exp() values are in (0, 1] and no overflow can occur. This matches the value_inference formula already used in coremltools' own reduce_log_sum_exp MIL op definition.

…via stable decomposition log_softmax: The naive log(softmax(x)) produces -inf for non-dominant classes in fp16 because softmax outputs underflow to 0, then log(0) = -inf. The stable form x - max(x) - log(sum(exp(x - max(x)))) avoids computing tiny intermediate probabilities directly. logcumsumexp: The naive log(cumsum(exp(x))) overflows in fp16 for x > ~11.09 since exp(11.09) exceeds fp16 max (65,504). The stable form shifts by the global maximum first so all exp() arguments are <= 0, keeping values in (0,1]. Both fixes follow the same max-shift pattern used in the logsumexp stable decomposition (PR apple#2726) and the softplus stable decomposition (PR apple#2725). Added regression tests with extreme fp16 inputs for both ops.

Ashutosh0x · 2026-06-11T01:45:17Z

Updated test to conversion-only graph assertion (no prediction comparison). The test now converts a small seeded torch.logsumexp model and asserts zero native reduce_log_sum_exp ops remain in the MIL graph. Fails on main, passes with fix.

Ashutosh0x force-pushed the fix/logsumexp-fp16-stable-decomposition-2690 branch from b81cbc7 to b66d788 Compare June 4, 2026 17:47

Ashutosh0x force-pushed the fix/logsumexp-fp16-stable-decomposition-2690 branch from b66d788 to ca85524 Compare June 10, 2026 11:01

Ashutosh0x force-pushed the fix/logsumexp-fp16-stable-decomposition-2690 branch from ca85524 to 4caac9a Compare June 10, 2026 16:01

Ashutosh0x force-pushed the fix/logsumexp-fp16-stable-decomposition-2690 branch from 4caac9a to a6f2231 Compare June 11, 2026 01:44

Ashutosh0x mentioned this pull request Jun 11, 2026

Contribution Proposal: Add stable softplus, mish, and logsumexp conversion support apple/coreai-torch#5

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix logsumexp fp16 overflow on ANE via stable max-shift decomposition#2726

Fix logsumexp fp16 overflow on ANE via stable max-shift decomposition#2726
Ashutosh0x wants to merge 1 commit into
apple:mainfrom
Ashutosh0x:fix/logsumexp-fp16-stable-decomposition-2690

Ashutosh0x commented May 29, 2026

Uh oh!

TobyRoseman commented Jun 4, 2026

Uh oh!

Ashutosh0x commented Jun 4, 2026

Uh oh!

ChinChangYang commented Jun 6, 2026

Uh oh!

Ashutosh0x commented Jun 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

Ashutosh0x commented May 29, 2026

Problem

Solution

Changes

Testing

Related

Uh oh!

TobyRoseman commented Jun 4, 2026

Uh oh!

Ashutosh0x commented Jun 4, 2026

Uh oh!

ChinChangYang commented Jun 6, 2026

Uh oh!

Ashutosh0x commented Jun 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants