Skip to content

fix: propagate context reset to LLM service#276

Closed
bhavik-mangla wants to merge 2 commits into
pipecat-ai:mainfrom
bhavik-mangla:fix/context-reset-propagation
Closed

fix: propagate context reset to LLM service#276
bhavik-mangla wants to merge 2 commits into
pipecat-ai:mainfrom
bhavik-mangla:fix/context-reset-propagation

Conversation

@bhavik-mangla

Copy link
Copy Markdown

When using ContextStrategy.RESET, FlowManager sends an LLMMessagesUpdateFrame. However, it currently omits the run_llm=True flag. In the Pipecat pipeline, the LLMContextAggregator only pushes the updated context downstream if this flag is set. Consequently, the LLM service continues using its internal cached history, making the reset ineffective.

This PR sets run_llm=True on the update frame to ensure the cleared context is properly synchronized with the LLM service.

Sets run_llm=True on LLMMessagesUpdateFrame when resetting context.
This ensures the aggregator pushes the fresh context upstream to the LLM,
preventing the LLM from using its internal cached history.
Copilot AI review requested due to automatic review settings May 29, 2026 21:14

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adjusts how LLM context frames are queued during node updates, distinguishing between update and append frame types with different constructor arguments.

Changes:

  • Replaces the conditional frame type selection with an explicit if/else branch.
  • Passes run_llm=True for LLMMessagesUpdateFrame and omits it for LLMMessagesAppendFrame.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/pipecat_flows/manager.py Outdated
Updates both LLMMessagesUpdateFrame and LLMMessagesAppendFrame to include
run_llm=True. This ensures that any context change (reset or append) is
immediately propagated to the LLM service, maintaining consistency across
all flow transitions.
@bhavik-mangla bhavik-mangla force-pushed the fix/context-reset-propagation branch from 0ddde2a to 6510a0e Compare May 29, 2026 21:37
@bhavik-mangla

Copy link
Copy Markdown
Author

@markbackman Can you review? Thanks

@markbackman

Copy link
Copy Markdown
Contributor

Thanks for the PR! I don't think this change is needed, and I think it would cause a regression.

The reset context already reaches the LLM today. After _update_llm_context() replaces the context via set_messages(), _set_node() queues an LLMRunFrame() that pushes the cleared context downstream, resulting in one inference.

For edge functions, the function result is intentionally sent with run_llm=False so inference is deferred until the transition completes. That gives exactly one inference per transition, not two. Setting run_llm=True on the update/append frame would push the context immediately, and then the existing LLMRunFrame would push it again, causing a double inference.

I ran an example and confirmed ContextStrategy.RESET clears context correctly as-is. If you have a concrete repro where it doesn't, please share it. Otherwise I'd suggest holding off on this change.

@markbackman markbackman closed this Jun 3, 2026
@bhavik-mangla

bhavik-mangla commented Jun 3, 2026

Copy link
Copy Markdown
Author

@markbackman

Thanks for the detailed explanation and for taking the time to review! You are right that for standard transitions with respond_immediately=True, the subsequent LLMRunFrame already pushes the LLMContextFrame downstream, which properly triggers the single inference for stateless models. I missed that secondary trigger in my initial diagnosis, and I agree this PR as written would cause a double-inference regression.

However, the specific failure mode I'm encountering happens when respond_immediately: False is used (which is a supported option in NodeConfig), and the impact is especially noticeable when using stateful, streaming WebRTC models (like OpenAIRealtimeLLMService or GeminiMultimodalLiveService), though it affects stateless models like GoogleLLMService too.

Here is the exact blind spot in _set_node (around line 757):

  1. When respond_immediately: False, FlowManager queues the LLMMessagesUpdateFrame(run_llm=False) but intentionally skips queueing the LLMRunFrame.
  2. The LLMContextAggregator receives the update frame, modifies its internal memory, but pushes nothing downstream (because run_llm is False and there is no LLMRunFrame to flush it).
  3. The Result: The LLM service in the pipeline receives zero frames indicating a context change.

For standard turn-based text bots using stateless models (like GoogleLLMService), this seems fine—the aggregator will eventually push the new context when the user next finishes speaking. But it creates a race condition where the LLM's internal state hasn't actually updated yet.

For streaming realtime services, the bug is critical. Audio frames (InputAudioRawFrame) flow continuously to the LLM. Because the LLM service was never notified of the node transition, it continues processing incoming user audio using its old, "dirty" server-side session and old system prompt until something else happens to trigger a context sync.

Proposed Compromise
To guarantee exactly one synchronization event—and to ensure models are updated immediately even if we defer inference—would you be open to a surgical fix in manager.py?

We could pass run_llm=respond_immediately directly on the update frame, and skip queueing the separate LLMRunFrame.

# manager.py (in _update_llm_context)
frame_type = LLMMessagesUpdateFrame if is_reset else LLMMessagesAppendFrame
frames.append(frame_type(messages=messages, run_llm=respond_immediately))

# Remove the separate LLMRunFrame push in _set_node

This ensures the LLM receives exactly one state update immediately, preventing the double-inference issue while keeping streaming models in sync and respecting the respond_immediately flag. Let me know what you think, and I'd be happy to update the PR to reflect this!

@markbackman

Copy link
Copy Markdown
Contributor

Thanks for digging into this, and for confirming the double-inference point.

On the respond_immediately=False case: that behavior is by design. The whole purpose of respond_immediately=False is to skip inference for that turn, so the LLM intentionally doesn't run and the context isn't pushed. Setting run_llm=True and queueing LLMRunFrame() are really the same trigger, and we deliberately do neither here. The new context isn't lost either, since set_messages() has already updated the aggregator, so it syncs on the next user turn.

In practice respond_immediately=False is meant for the first turn of a conversation, where you want the bot to wait for the user to speak first. I can't think of a mid-conversation case where it applies, and even there the behavior above is correct rather than a bug.

If you have a concrete realtime repro where a node transition leaves the session stale, please open a separate issue with a minimal example and we'll take a look. For this PR I'll keep it closed. Thanks again for the thoughtful investigation!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants