
Gemma 3 generation does not stop at <end_of_turn> #25

@timokoethe

Description


Model name

gemma-3-4b-it

Command run

# Exporting the model
uv run coreai.llm.export google/gemma-3-4b-it

# Running via command line tool
swift run -c release llm-runner --model ./exports/gemma_3_4b_it_4bit_dynamic --prompt "Hello"

macOS / iOS target

macOS 27 beta 1

Xcode version

Xcode 27 beta 1

Python / uv version

uv 0.11.19 (7b2cff1c3 2026-06-03 aarch64-apple-darwin), Python 3.14.5

Full error output

Hello there! How can I help you today? 😊 

Do you want to:

*   Talk about something?
*   Get help with a task?
*   Just chat?<end_of_turn>
<end_of_turn>
<end_of_turn>
<end_of_turn>
<end_of_turn>

Anything else?

Gemma 3 Instruct commonly terminates assistant responses with <end_of_turn> (token ID 106). The current Swift runtime only stops at the tokenizer’s configured EOS token, <eos> (token ID 1).

As a result, Gemma 3 generation may continue beyond the end of the assistant turn until <eos> is generated or the maximum token limit is reached.

There is also an issue with the llm-runner workaround, --stop-tokens '<end_of_turn>':

The stop string is encoded with special tokens enabled. Because the Gemma 3 tokenizer automatically prepends <bos> (ID 2) in that mode, the string encodes to [2, 106] instead of [106]. The model is unlikely to generate that two-token sequence, so the stop condition does not trigger and generation continues.
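The following is a minimal sketch of why the two-token sequence never fires. It assumes, without confirmation from the llm-runner source, that a multi-token stop sequence only matches when it appears as a contiguous suffix of the generated token IDs. The helper `endsWith(_:stop:)` is hypothetical, and 4000/4001 are placeholder IDs; only 2 and 106 correspond to real Gemma 3 tokens.

```swift
/// Hypothetical suffix-based stop check: true when `generated`
/// ends with the complete `stop` sequence.
func endsWith(_ generated: [Int], stop: [Int]) -> Bool {
    guard !stop.isEmpty, generated.count >= stop.count else { return false }
    return generated.suffix(stop.count).elementsEqual(stop)
}

// A reply that ends with <end_of_turn> (106). The model does not emit <bos> (2)
// right before it, so the stream ends with [106] but not with [2, 106].
let generated = [4000, 4001, 106]
print(endsWith(generated, stop: [2, 106]))  // false
print(endsWith(generated, stop: [106]))     // true
```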

Expected behavior

Gemma 3 generation should stop when either of these tokens is generated:

  • <eos>: ID 1
  • <end_of_turn>: ID 106
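A minimal sketch of the expected stop condition, assuming a loop that samples one token ID at a time. The function `generate(maxTokens:stopTokenIDs:nextToken:)` and the simulated model are hypothetical stand-ins rather than the runtime's actual API, and 4000 to 4002 are placeholder IDs for the reply text. The simulated model repeats <end_of_turn> after its reply, mirroring the output above.

```swift
// Gemma 3 special token IDs cited in this issue.
let eosID = 1          // <eos>
let endOfTurnID = 106  // <end_of_turn>

/// Hypothetical generation loop: stops as soon as any stop ID is sampled
/// or `maxTokens` tokens have been produced. The stop token itself is not
/// appended to the output.
func generate(maxTokens: Int, stopTokenIDs: Set<Int>, nextToken: () -> Int) -> [Int] {
    var output: [Int] = []
    while output.count < maxTokens {
        let token = nextToken()
        if stopTokenIDs.contains(token) { break }
        output.append(token)
    }
    return output
}

/// Simulated model: a short reply, then <end_of_turn> on every later step.
func makeSimulatedModel() -> () -> Int {
    let reply = [4000, 4001, 4002]  // placeholder IDs
    var step = 0
    return {
        defer { step += 1 }
        return step < reply.count ? reply[step] : endOfTurnID
    }
}

// EOS only (current behavior): runs until maxTokens, emitting repeated 106s.
let eosOnly = generate(maxTokens: 8, stopTokenIDs: [eosID], nextToken: makeSimulatedModel())
// EOS plus <end_of_turn> (expected behavior): stops at the end of the turn.
let both = generate(maxTokens: 8, stopTokenIDs: [eosID, endOfTurnID], nextToken: makeSimulatedModel())
print(eosOnly)  // [4000, 4001, 4002, 106, 106, 106, 106, 106]
print(both)     // [4000, 4001, 4002]
```

Holding the stop IDs in a `Set` keeps the per-token check constant-time and lets the tokenizer's EOS be merged with model-specific end-of-turn IDs.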

Custom stop strings should be encoded without automatically adding BOS/EOS tokens.
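A sketch of the suggested encoding fix, assuming the tokenizer exposes an encode call with a flag for special tokens. The `StopTokenizer` protocol, its `encode(_:addSpecialTokens:)` method, `stopSequence(for:using:)`, and `ToyGemmaTokenizer` are all hypothetical; the runtime's real tokenizer API may differ. The toy tokenizer only reproduces the behavior described above (<end_of_turn> is 106, and special tokens add a leading <bos> of 2).

```swift
/// Hypothetical tokenizer interface used only for this sketch.
protocol StopTokenizer {
    var bosTokenID: Int? { get }
    var eosTokenID: Int? { get }
    func encode(_ text: String, addSpecialTokens: Bool) -> [Int]
}

/// Encodes a user-supplied stop string without BOS/EOS.
/// The trimming is a defensive fallback for tokenizers that add them anyway;
/// it never empties the sequence, so a stop string of "<eos>" still works.
func stopSequence(for text: String, using tokenizer: any StopTokenizer) -> [Int] {
    var ids = tokenizer.encode(text, addSpecialTokens: false)
    if let bos = tokenizer.bosTokenID, ids.count > 1, ids.first == bos { ids.removeFirst() }
    if let eos = tokenizer.eosTokenID, ids.count > 1, ids.last == eos { ids.removeLast() }
    return ids
}

/// Toy stand-in mimicking the Gemma 3 behavior reported in this issue.
struct ToyGemmaTokenizer: StopTokenizer {
    let bosTokenID: Int? = 2
    let eosTokenID: Int? = 1
    func encode(_ text: String, addSpecialTokens: Bool) -> [Int] {
        let body = text == "<end_of_turn>" ? [106] : []
        return addSpecialTokens ? [2] + body : body
    }
}

let tokenizer = ToyGemmaTokenizer()
print(tokenizer.encode("<end_of_turn>", addSpecialTokens: true))  // [2, 106]
print(stopSequence(for: "<end_of_turn>", using: tokenizer))         // [106]
```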

Relevant code

Metadata

Assignees: none
Labels: bug (Something isn't working)
Type: none
Projects: none
Milestone: none
Relationships: none yet
Development: no branches or pull requests