insert the gt to completions #247

Description

@Justarrrrr

I noticed that both grpo_trainer_mmp.py and grpo_trainer_aid.py insert gt into the completions. Does gt here refer to the standard ground truth, and does it include the "think" part? Also, what is the purpose of this operation, and can it contribute to the model update? As I understand it, simply replacing a completion might not have the intended effect on gradient backpropagation.
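To make the question concrete, here is a minimal sketch of how swapping one sampled completion for the ground truth could change group-relative advantages in a GRPO-style setup. It is an illustration under stated assumptions, not the code from grpo_trainer_mmp.py or grpo_trainer_aid.py: the helper names `group_advantages` and `insert_gt`, the reward values, the `eps` constant, and the use of the population standard deviation are all hypothetical.

```python
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-4):
    """Group-relative advantages: (r - group mean) / (group std + eps).

    Uses the population standard deviation; a real trainer may use a
    different estimator or a different eps (assumption).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def insert_gt(rewards, gt_reward=1.0, index=0):
    """Hypothetical helper: treat slot `index` as if it held the ground truth.

    Only the reward is modelled here; in a trainer the completion tokens
    in that slot would be replaced as well.
    """
    patched = list(rewards)
    patched[index] = gt_reward
    return patched


# A group in which every sampled completion is wrong (reward 0).
sampled = [0.0, 0.0, 0.0, 0.0]

before = group_advantages(sampled)            # identical rewards
after = group_advantages(insert_gt(sampled))  # one slot now has reward 1

print("before:", before)
print("after: ", after)
```

Under these assumptions, a group with identical rewards gets zero advantage for every completion, so it produces no policy-gradient signal. With the ground truth in one slot, that slot gets a positive advantage and the others a negative one. In GRPO-style losses the gradient flows through the policy's log-probabilities of the completion tokens rather than through the tokens themselves, so whether the replaced completion contributes to the update depends on whether its log-probabilities are recomputed under the current policy. That is the part this issue asks the maintainers to confirm.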
