Gemma 3 generation does not stop at <end_of_turn> #25

### Model name

gemma-3-4b-it

### Command run

```shell
# Exporting the model
uv run coreai.llm.export google/gemma-3-4b-it

# Running via command line tool
swift run -c release llm-runner --model ./exports/gemma_3_4b_it_4bit_dynamic --prompt "Hello"
```

### macOS / iOS target

macOS 27 beta 1

### Xcode version

Xcode 27 beta 1

### Python / uv version

uv 0.11.19 (7b2cff1c3 2026-06-03 aarch64-apple-darwin), Python 3.14.5

### Full error output

```shell
Hello there! How can I help you today? 😊 

Do you want to:

*   Talk about something?
*   Get help with a task?
*   Just chat?<end_of_turn>
<end_of_turn>
<end_of_turn>
<end_of_turn>
<end_of_turn>
```

### Anything else?

Gemma 3 Instruct commonly terminates assistant responses with `<end_of_turn>` (token ID 106). The current Swift runtime only stops at the tokenizer’s configured EOS token, `<eos>` (token ID 1).

As a result, Gemma 3 generation may continue beyond the end of the assistant turn until `<eos>` is generated or the maximum token limit is reached.
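
The effect can be reproduced with a minimal sketch of a decode loop. This is not the runtime's actual implementation: the `generate` function and the token stream are hypothetical, and only the two token IDs (1 and 106) come from this report.

```swift
// Token IDs from this report; everything else here is illustrative.
let eosTokenId = 1          // <eos>
let endOfTurnTokenId = 106  // <end_of_turn>

/// Consumes tokens from `stream` until a stop token appears or `maxTokens` is reached.
/// Returns the tokens emitted before stopping (the stop token itself is not included).
func generate(from stream: [Int], stopTokenIds: Set<Int>, maxTokens: Int) -> [Int] {
    var output: [Int] = []
    for token in stream.prefix(maxTokens) {
        if stopTokenIds.contains(token) { break }
        output.append(token)
    }
    return output
}

// Hypothetical model output: three content tokens, then <end_of_turn> repeated.
let stream = [9259, 1030, 236888] + Array(repeating: endOfTurnTokenId, count: 20)

let eosOnly = generate(from: stream, stopTokenIds: [eosTokenId], maxTokens: 8)
let both = generate(from: stream, stopTokenIds: [eosTokenId, endOfTurnTokenId], maxTokens: 8)

print(eosOnly.count) // 8: runs until the token limit
print(both.count)    // 3: stops at the first <end_of_turn>
```

With only `<eos>` in the stop set, the loop keeps emitting `<end_of_turn>` until the limit, which matches the repeated `<end_of_turn>` lines in the output above.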

The `llm-runner` workaround, `--stop-tokens '<end_of_turn>'`, does not work either.

The stop string is encoded with special tokens enabled. Because the Gemma 3 tokenizer automatically prepends `<bos>`, this produces `[2, 106]` instead of `[106]`. That sequence is unlikely to occur during generation, so it never triggers a stop.
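
To illustrate why the prepended `<bos>` defeats the match, here is a small sketch. `ToyTokenizer` and `endsWithStopSequence` are hypothetical stand-ins, not the tokenizer or matcher used by `llm-runner`, and the sketch assumes the stop check compares the tail of the generated tokens against the whole encoded sequence.

```swift
// Toy stand-in for a tokenizer; the IDs for <bos> (2) and <end_of_turn> (106)
// are the ones quoted in this report.
struct ToyTokenizer {
    let bosTokenId = 2
    let vocabulary: [String: Int] = ["<end_of_turn>": 106]

    /// Encodes a string that is a single known token, optionally prepending <bos>.
    func encode(_ text: String, addSpecialTokens: Bool) -> [Int] {
        let ids = vocabulary[text].map { [$0] } ?? []
        return addSpecialTokens ? [bosTokenId] + ids : ids
    }
}

/// Returns true when `generated` ends with the full stop sequence.
func endsWithStopSequence(_ generated: [Int], _ stop: [Int]) -> Bool {
    !stop.isEmpty && generated.suffix(stop.count).elementsEqual(stop)
}

let tokenizer = ToyTokenizer()
let withBos = tokenizer.encode("<end_of_turn>", addSpecialTokens: true)     // [2, 106]
let withoutBos = tokenizer.encode("<end_of_turn>", addSpecialTokens: false) // [106]

// Hypothetical generated tokens ending in <end_of_turn>.
let generated = [9259, 1030, 106]

print(endsWithStopSequence(generated, withBos))    // false
print(endsWithStopSequence(generated, withoutBos)) // true
```

A model does not normally emit `<bos>` mid-response, so the two-token sequence has nothing to match against.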

**Expected behavior**

Gemma 3 generation should stop when either of these tokens is generated:
- `<eos>`: ID 1
- `<end_of_turn>`: ID 106

Custom stop strings should be encoded without automatically adding BOS/EOS tokens.
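
One possible shape for this behavior is sketched below. The `StopTokenResolving` protocol, `makeStopSequences`, and `GemmaLikeTokenizer` are hypothetical names introduced for illustration; the tokenizer API in the repository may look different. The idea is to resolve known end-of-turn tokens by direct vocabulary lookup and to encode custom stop strings with special tokens disabled.

```swift
// Hypothetical tokenizer interface; the real tokenizer type may differ.
protocol StopTokenResolving {
    var eosTokenId: Int? { get }
    func tokenId(for token: String) -> Int?
    func encode(_ text: String, addSpecialTokens: Bool) -> [Int]
}

/// Builds the list of token sequences that should end generation.
func makeStopSequences<T: StopTokenResolving>(
    tokenizer: T,
    endOfTurnTokens: [String],
    customStopStrings: [String]
) -> [[Int]] {
    var sequences: [[Int]] = []
    if let eos = tokenizer.eosTokenId { sequences.append([eos]) }

    // Look special tokens up directly in the vocabulary instead of encoding them.
    for token in endOfTurnTokens {
        if let id = tokenizer.tokenId(for: token), !sequences.contains([id]) {
            sequences.append([id])
        }
    }

    // Encode user-supplied strings without BOS/EOS, skipping duplicates.
    for text in customStopStrings {
        let ids = tokenizer.encode(text, addSpecialTokens: false)
        if !ids.isEmpty && !sequences.contains(ids) { sequences.append(ids) }
    }
    return sequences
}

// Toy tokenizer using the IDs quoted in this report.
struct GemmaLikeTokenizer: StopTokenResolving {
    let eosTokenId: Int? = 1
    let vocabulary = ["<eos>": 1, "<bos>": 2, "<end_of_turn>": 106]

    func tokenId(for token: String) -> Int? { vocabulary[token] }

    func encode(_ text: String, addSpecialTokens: Bool) -> [Int] {
        let ids = vocabulary[text].map { [$0] } ?? []
        return addSpecialTokens ? [2] + ids : ids
    }
}

let stops = makeStopSequences(
    tokenizer: GemmaLikeTokenizer(),
    endOfTurnTokens: ["<end_of_turn>"],
    customStopStrings: ["<end_of_turn>"]
)
print(stops) // [[1], [106]]
```

Where the list of end-of-turn tokens should come from (model configuration, chat template, or a per-model default) is left open here.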

**Relevant code**

- `StopSequences.init(for:)` only includes `tokenizer.eosTokenId`:
[swift/Sources/CoreAILanguageModels/DecodingStrategies/DecodingStrategy.swift](https://github.com/apple/coreai-models/blob/main/swift/Sources/CoreAILanguageModels/DecodingStrategies/DecodingStrategy.swift)

- The Foundation Models integration only checks `tokenizer.eosTokenId`:
[swift/Sources/CoreAILanguageModels/LanguageModel/CoreAILanguageModel.swift](https://github.com/apple/coreai-models/blob/main/swift/Sources/CoreAILanguageModels/LanguageModel/CoreAILanguageModel.swift)

- CLI stop strings are encoded with special tokens enabled:
[swift/Sources/Tools/llm-runner/LLMRunnerMain.swift](https://github.com/apple/coreai-models/blob/main/swift/Sources/Tools/llm-runner/LLMRunnerMain.swift)
