Feature request
Add an optional response_schema= parameter to add_response_schema and GRPOConfig/GRPOTrainer. When it is passed, TRL skips template fingerprinting and assigns the schema directly onto the inner tokenizer, making first-class what the workaround already does manually today:
GRPOTrainer(..., response_schema=qwen3_5_schema)
# or
add_response_schema(tokenizer, response_schema=qwen3_5_schema)
Motivation
add_response_schema currently identifies tokenizers by exact string equality against TRL's vendored .jinja files. Any fork of a supported model with legitimate edits to the template falls outside that recognition, and the call crashes:
ValueError: Unrecognized chat template, failed to add response schema...
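For context, here is a rough sketch of what that lookup amounts to as described above; the directory path, registry, and helper name are illustrative assumptions, not TRL's actual internals:
from pathlib import Path

# Hypothetical layout, for illustration only: the vendored template directory and
# the template-name -> schema registry are assumptions, not TRL's real code.
VENDORED_DIR = Path("trl/templates")
SCHEMAS = {"qwen3_5": {"type": "object"}}  # placeholder registry

def _lookup_response_schema(tokenizer):
    # Exact string comparison against each vendored .jinja file: any edit to the
    # template, however harmless, means nothing matches and we raise.
    for jinja_file in VENDORED_DIR.glob("*.jinja"):
        if tokenizer.chat_template == jinja_file.read_text():
            return SCHEMAS[jinja_file.stem]
    raise ValueError("Unrecognized chat template, failed to add response schema...")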
Concretely, Qwen/Qwen3.5-4B works because its chat_template matches TRL's vendored file. unsloth/Qwen3.5-4B ships a modified template, so it fails, even though the schema that should be applied is the same.
import json

from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer

from trl.chat_template_utils import add_response_schema

# Works: the stock Qwen template matches TRL's vendored .jinja file.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3.5-4B")
add_response_schema(tokenizer)  # OK

# Fails: swap in the Unsloth fork's edited template and the exact-match lookup misses.
config_path = hf_hub_download("unsloth/Qwen3.5-4B", "tokenizer_config.json")
with open(config_path) as f:
    tokenizer.chat_template = json.load(f)["chat_template"]
add_response_schema(tokenizer)  # ValueError
The workaround is to set tokenizer.response_schema = qwen3_5_schema before constructing the trainer, but this is undiscoverable (nothing in the error message points to it) and it leaks trainer internals to the caller.
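Spelled out, the workaround looks like this; qwen3_5_schema stands in for whatever schema object TRL expects here, the remaining trainer arguments are elided, and the exact call is a sketch rather than a prescribed API:
# Assign the schema by hand before the trainer ever sees the tokenizer.
tokenizer = AutoTokenizer.from_pretrained("unsloth/Qwen3.5-4B")
tokenizer.response_schema = qwen3_5_schema  # what TRL's fingerprinting would have done
trainer = GRPOTrainer(..., processing_class=tokenizer)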
Your contribution
Happy to submit a PR for this :)