Insert the gt into completions #247

I've noticed that both `grpo_trainer_mmp.py` and `grpo_trainer_aid.py` insert `gt` into `completions`. Does `gt` here refer to the standard ground-truth answer, and does it include the "think" part? Also, what is the purpose of this operation, and can it actually contribute to the model update? As I understand it, simply replacing a sampled completion might not propagate gradients properly.
<img width="477" height="136" alt="Image" src="https://github.com/user-attachments/assets/d5d95066-8360-4212-9cfa-16a5589c5f90" />
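To make the gradient question concrete, here is a toy sketch. It rests on assumptions I have not checked against the repository. First, it assumes the trainer runs the policy to compute log-probabilities on the completion token IDs *after* `gt` has been inserted. Second, it uses a simplified policy-gradient loss, -A * sum_t log pi(y_t), with hypothetical context-independent logits, and it omits GRPO's importance ratio, clipping, and KL term. The helpers `pg_loss_and_grad` and `sgd_step` are illustrative only and do not come from either trainer file.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one position's vocabulary logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pg_loss_and_grad(
    logits_per_pos: List[List[float]], tokens: List[int], advantage: float
) -> Tuple[float, List[List[float]]]:
    """Simplified policy-gradient loss -A * sum_t log pi(tokens[t]).

    Hypothetical toy: logits are context-independent per position, and GRPO's
    importance ratio, clipping and KL penalty are omitted.
    Returns the loss and its gradient with respect to every logit.
    """
    loss = 0.0
    grads = []
    for logits, tok in zip(logits_per_pos, tokens):
        probs = softmax(logits)
        loss -= advantage * math.log(probs[tok])
        # d(-A * log p_tok) / d logit_k = A * (p_k - 1[k == tok])
        grads.append([advantage * (p - (1.0 if k == tok else 0.0))
                      for k, p in enumerate(probs)])
    return loss, grads


def sgd_step(logits_per_pos: List[List[float]], grads: List[List[float]],
             lr: float) -> List[List[float]]:
    """One plain gradient-descent step on the logits."""
    return [[x - lr * g for x, g in zip(row, grow)]
            for row, grow in zip(logits_per_pos, grads)]


# Assumption: gt token IDs replace a sampled completion BEFORE log-probs are computed.
logits = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]  # 2 positions, vocabulary of 3
gt_tokens = [2, 1]       # hypothetical ground-truth token IDs
sampled_tokens = [0, 0]  # hypothetical tokens the policy actually sampled

_, g_gt = pg_loss_and_grad(logits, gt_tokens, advantage=1.0)
_, g_sampled = pg_loss_and_grad(logits, sampled_tokens, advantage=1.0)
print(softmax(sgd_step(logits, g_gt, lr=0.5)[0]))       # mass moves toward token 2
print(softmax(sgd_step(logits, g_sampled, lr=0.5)[0]))  # mass moves toward token 0
```

Under these assumptions, a positive advantage pushes probability toward whichever token IDs were fed through the policy. The inserted `gt` tokens would then receive a gradient, roughly like an advantage-weighted SFT term. If the log-probabilities were instead computed on the originally sampled tokens and only the text or reward were swapped, the gradient would still go to the sampled tokens. That is the case I am worried about.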
