
fix: strip <think> tokens from reasoning model output #617

Open
benjamin7007 wants to merge 1 commit into OpenBMB:main from benjamin7007:fix/strip-thinking-tokens

Conversation

@benjamin7007

Problem

Models with thinking/reasoning capabilities (DeepSeek-R1, MiniMax-M2.7, QwQ, Qwen3, etc.) include <think>...</think> blocks in their response content when used via OpenAI-compatible API endpoints. These internal reasoning tokens leak into:

  1. Agent output — downstream nodes receive the thinking tokens as part of their input
  2. Timeline content — execution logs show raw thinking blocks
  3. Final workflow result — end users see <think> tags in the output

Root Cause

OpenAIProvider._deserialize_chat_response() and _append_chat_response_output() pass raw content from model responses without filtering reasoning tokens.

Fix

Add _strip_thinking_tokens() classmethod to OpenAIProvider:

  • Uses the regex <think>.*?</think>\s* with re.DOTALL to strip thinking blocks
  • Fast path: skips the regex entirely if the <think> substring is not found (zero cost for non-thinking models)
  • Applied in both deserialization paths (_deserialize_chat_response and _append_chat_response_output)
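A minimal sketch of the approach described above. The class and method names (`OpenAIProvider`, `_strip_thinking_tokens`) come from this PR; the exact body below is an assumption based on the bullets, not the merged diff:

```python
import re

# Compiled once at module level; DOTALL lets .*? span newlines inside
# the <think>...</think> block, and \s* eats trailing whitespace.
_THINK_RE = re.compile(r"<think>.*?</think>\s*", re.DOTALL)


class OpenAIProvider:
    @classmethod
    def _strip_thinking_tokens(cls, content):
        # Fast path: a plain substring check avoids running the regex
        # at all for models that never emit thinking tokens.
        if not content or "<think>" not in content:
            return content
        return _THINK_RE.sub("", content)
```

The non-greedy `.*?` matters: with a greedy `.*`, a response containing two separate thinking blocks would lose everything between the first `<think>` and the last `</think>`, including legitimate content.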

Testing

Verified with MiniMax-M2.7 (thinking model) in a Writer→Reviewer workflow:

  • Before the fix: <think> blocks leaked into the Reviewer's input and the final output
  • After the fix: clean output with no thinking tokens visible

Notes

  • This is a minimal, targeted fix in the OpenAI provider only
  • The Gemini provider uses a different content structure (MessageBlock) and would need separate handling if Gemini models add thinking tokens
  • No existing tests were broken

Models with thinking/reasoning capabilities (DeepSeek-R1, MiniMax-M2.7,
QwQ, etc.) include <think>...</think> blocks in their response content.
These internal reasoning tokens leak into agent output and downstream
node inputs, corrupting the workflow.

Add _strip_thinking_tokens() classmethod to OpenAIProvider that filters
<think>...</think> blocks via regex. Applied in both:
- _deserialize_chat_response() (Message content)
- _append_chat_response_output() (timeline content)

The fix is zero-cost for models without thinking tokens (fast path
checks for '<think>' substring before regex).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>