Metal SSD streaming: tool-call-quality exact path misses routed expert views #455

## Summary

`./ds4_test --tool-call-quality` fails under Metal SSD streaming in the exact / `quality=true` path. The fast path completes first; the exact path stops before tool-call parsing because `ds4_session_eval()` fails after Metal cannot wrap routed-expert model ranges.

I found this while validating #454, but it reproduces on a clean `upstream/main` worktree at `80ebbc3`, so it appears independent of that PR.

## Environment

- Machine: Apple M5 Pro
- RAM: 64 GiB
- Backend: Metal SSD streaming
- Model: `DeepSeek-V4-Flash-IQ2XXS-w2Q2K-AProjQ8-SExpQ8-OutQ8-chat-v2-imatrix.gguf`
- Reproduced on: clean `upstream/main` at `80ebbc3`

## Repro

```sh
make ds4_test

env DS4_TEST_MODEL=/path/to/DeepSeek-V4-Flash-IQ2XXS-w2Q2K-AProjQ8-SExpQ8-OutQ8-chat-v2-imatrix.gguf \
    DS4_TEST_SSD_STREAMING=1 \
    DS4_TEST_SSD_STREAMING_CACHE_GB=16 \
    ./ds4_test --tool-call-quality
```

## Relevant log

```text
tool-call-quality:
ds4-test: tool-call quality fast path
...
ds4-test: tool-call quality exact path
ds4: Metal SSD streaming mode enabled; full model residency and warmup are skipped
ds4: SSD streaming initial metal model map restricted to token embedding (1 spans, 0.99 GiB tensor span)
ds4: metal backend initialized for graph diagnostics
ds4: Metal model range 0.01..0.53 GiB is not covered by mapped model views
ds4: Metal model range 1.19..1.70 GiB is not covered by mapped model views
ds4: Metal model range 0.53..1.19 GiB is not covered by mapped model views
tests/ds4_test.c:2008: assertion failed: decode_ok
tests/ds4_test.c:2010: assertion failed: calls.len > 0
tests/ds4_test.c:2011: assertion failed: calls.len > 0 && !strcmp(calls.v[0].name, "list_files")
tool-call-quality: ERR
```

## Notes

With `DS4_METAL_STREAMING_MAP_TRACE=1`, the exact path maps decode/static spans successfully, then fails on routed expert ranges that are not covered by the current model views.

My read is that the SSD-streaming decode spans intentionally exclude uniform routed expert tensors because the fast path serves those via the streaming expert cache. In `quality=true`, though, the selected-slot fast kernels are disabled, so the exact fallback asks for the full gate/up/down routed tensors through `ds4_gpu_wrap_model_range()`, which requires those ranges to already be covered by mapped model views.
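To make the suspected failure mode concrete, here is a minimal sketch of the kind of coverage check that would produce the "not covered by mapped model views" messages. It is not the ds4 implementation: `model_view` and `range_is_covered` are hypothetical names, and the sketch assumes a range must fit entirely inside a single mapped view (the real check may instead accept a union of views).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one mapped model view: a byte span of the model file. */
typedef struct {
    uint64_t offset; /* start of the view, in bytes from the start of the model */
    uint64_t length; /* size of the view in bytes */
} model_view;

/*
 * Returns true when [offset, offset + length) lies entirely inside one view.
 * Assumption: a wrap request needs a single containing view; ranges that
 * straddle two views or fall in a gap are reported as not covered.
 */
bool range_is_covered(const model_view *views, size_t n_views,
                      uint64_t offset, uint64_t length)
{
    assert(views != NULL || n_views == 0);

    for (size_t i = 0; i < n_views; i++) {
        const uint64_t v_begin = views[i].offset;
        const uint64_t v_end = v_begin + views[i].length;
        if (offset >= v_begin && offset + length <= v_end) {
            return true;
        }
    }
    return false;
}
```

Under that assumption, if the views hold only the decode/static spans, any request for a routed-expert range outside them returns false. That would match the three log lines above, and would be consistent with the exact path failing only because it requests tensors the fast path never needs mapped.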

So this looks like a Metal SSD-streaming exact-path mapping issue rather than a DSML/tool-call parser issue: the tool-call assertions are just cascading after decode stops.

