Bug: `GGML_SCHED_MAX_COPIES=2` produces repeated tokens when graph reuse is enabled #2006

### What happened?

With a CUDA build using `GGML_SCHED_MAX_COPIES=2`, `llama-server` produces corrupted output when graph reuse is enabled. Nearly every token is emitted twice:

```text
The capital capital of of France France is is Paris Paris.. Paris Paris is is not not only only the the capital capital but but also also the the largest largest city city in in France France..
```

Adding `--no-graph-reuse` fixes the output. I expected graph reuse not to change the generated text.
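
To make the symptom easier to compare between runs with and without `--no-graph-reuse`, a small helper along these lines can measure how often a word is immediately repeated. This is only a sketch: the function name `adjacent_repeat_ratio` is hypothetical, and it splits on words and punctuation rather than on the model's tokens, so it only approximates token-level duplication.

```python
import re


def adjacent_repeat_ratio(text: str) -> float:
    """Fraction of adjacent piece pairs in which both pieces are identical."""
    # Split into words and single punctuation marks. This approximates
    # tokens; it is NOT the model's tokenizer (assumption for illustration).
    pieces = re.findall(r"\w+|[^\w\s]", text)
    if len(pieces) < 2:
        return 0.0
    repeats = sum(1 for prev, cur in zip(pieces, pieces[1:]) if prev == cur)
    return repeats / (len(pieces) - 1)


# Excerpt of the bad output from this report, and a clean sentence for contrast.
bad = "The capital capital of of France France is is Paris Paris.."
good = "The capital of France is Paris."

print(adjacent_repeat_ratio(bad))
print(adjacent_repeat_ratio(good))
```

A ratio near 0.5 means roughly every piece is followed by a copy of itself, which matches the output above. Normal text should score close to 0.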

### Name and Version

```text
llama-server.exe --version
version: 4633 (b3dfb785)
built with MSVC 19.51.36248.0
```

### What operating system are you seeing the problem on?

Windows

### Relevant log output

```text
llama_init_from_model: graph_reuse   = 1
llama_init_from_model: pipeline parallelism enabled (n_copies=2)
llama_init_from_model: graph splits = 3
```
