Model name
gemma-3-4b-it
Command run
# Exporting the model
uv run coreai.llm.export google/gemma-3-4b-it
# Running via command line tool
swift run -c release llm-runner --model ./exports/gemma_3_4b_it_4bit_dynamic --prompt "Hello"
macOS / iOS target
macOS 27 beta 1
Xcode version
Xcode 27 beta 1
Python / uv version
uv 0.11.19 (7b2cff1c3 2026-06-03 aarch64-apple-darwin), Python 3.14.5
Full error output
Hello there! How can I help you today? 😊
Do you want to:
* Talk about something?
* Get help with a task?
* Just chat?<end_of_turn>
<end_of_turn>
<end_of_turn>
<end_of_turn>
<end_of_turn>
Anything else?
Gemma 3 Instruct commonly terminates assistant responses with <end_of_turn> (token ID 106). The current Swift runtime only stops at the tokenizer’s configured EOS token, <eos> (token ID 1).
As a result, Gemma 3 generation may continue beyond the end of the assistant turn until <eos> is generated or the maximum token limit is reached.
There is also an issue with the llm-runner workaround:
--stop-tokens '<end_of_turn>'
The stop string is encoded with special tokens enabled. Because the Gemma 3 tokenizer automatically prepends <bos> (token ID 2), this produces [2, 106] instead of [106]. That two-token sequence is unlikely to occur during generation, so it does not stop it.
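To illustrate why the workaround fails, here is a minimal sketch of suffix matching against generated token IDs. The helper `endsWithStopSequence` and the content token IDs are hypothetical, and the special token IDs are the ones reported in this issue. This is not the actual llm-runner implementation, only an approximation of the behavior described above.

```swift
// Hypothetical helper: true when the generated tokens end with the stop sequence.
func endsWithStopSequence(_ generated: [Int], _ stop: [Int]) -> Bool {
    guard !stop.isEmpty, generated.count >= stop.count else { return false }
    return Array(generated.suffix(stop.count)) == stop
}

// Token IDs as reported in this issue: <bos> = 2, <end_of_turn> = 106.
let bos = 2
let endOfTurn = 106

// A generated assistant turn: content tokens followed by <end_of_turn>.
// The content IDs 9259 and 1041 are made up for illustration.
let generated = [9259, 1041, endOfTurn]

// Stop sequence as currently encoded (special tokens enabled, <bos> prepended).
let encodedWithBOS = [bos, endOfTurn]
// Stop sequence encoded without automatic special tokens.
let encodedWithoutBOS = [endOfTurn]

let matchesWithBOS = endsWithStopSequence(generated, encodedWithBOS)       // false
let matchesWithoutBOS = endsWithStopSequence(generated, encodedWithoutBOS) // true
print(matchesWithBOS, matchesWithoutBOS)
```

Since <bos> normally appears only at the start of the prompt, a stop sequence that begins with it cannot match the tail of the generated tokens.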
Expected behavior
Gemma 3 generation should stop when either of these tokens is generated:
<eos>: ID 1
<end_of_turn>: ID 106
Custom stop strings should be encoded without automatically adding BOS/EOS tokens.
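The expected stopping rule could be sketched roughly as follows. The names `stopTokenIDs` and `truncateAtStopToken` are hypothetical and the content token IDs are invented. Only the two stop IDs come from this report, and the real runtime's types and decoding loop will differ.

```swift
// Stop token IDs reported in this issue for Gemma 3 Instruct.
let eosTokenID = 1          // <eos>
let endOfTurnTokenID = 106  // <end_of_turn>

// Hypothetical stop set: the configured EOS token plus the model's turn terminator.
let stopTokenIDs: Set<Int> = [eosTokenID, endOfTurnTokenID]

// Hypothetical helper: keep tokens up to (not including) the first stop token.
func truncateAtStopToken(_ tokens: [Int], stopTokenIDs: Set<Int>) -> [Int] {
    guard let stopIndex = tokens.firstIndex(where: { stopTokenIDs.contains($0) }) else {
        return tokens // no stop token seen; the caller still applies the max-token limit
    }
    return Array(tokens[..<stopIndex])
}

// Simulated stream: made-up content tokens, then repeated <end_of_turn>.
let stream = [9259, 1041, 106, 106, 106]
let kept = truncateAtStopToken(stream, stopTokenIDs: stopTokenIDs)
print(kept) // [9259, 1041]
```

Treating the stop tokens as a set means a model whose turn terminator differs from its configured EOS token can still end generation at the first of either.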
Relevant code
StopSequences.init(for:) only includes tokenizer.eosTokenId:
swift/Sources/CoreAILanguageModels/DecodingStrategies/DecodingStrategy.swift
The Foundation Models integration only checks tokenizer.eosTokenId:
swift/Sources/CoreAILanguageModels/LanguageModel/CoreAILanguageModel.swift
CLI stop strings are encoded with special tokens enabled:
swift/Sources/Tools/llm-runner/LLMRunnerMain.swift