Question about topk distillation #10

@Nebularaid2000

In SDPO, the KL is computed over the top-k tokens of the student model. In Openclaw-RL, however, it is computed over the top-k tokens of the teacher model. Which one should we follow, and why?
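
To make the two options concrete, here is a minimal sketch of what I mean, not the actual code from either repo. Everything in it is my assumption: the toy logits, the helper names (`topk_indices`, `restricted_kl`), renormalizing both distributions on the selected tokens, and using KL(student || teacher) as the direction. The only thing that changes between the two calls is whose top-k defines the token set.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def topk_indices(probs, k):
    # Indices of the k largest probabilities, highest first.
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]

def restricted_kl(p, q, support):
    # KL(p || q) after renormalizing both p and q on `support`.
    # Renormalization is an assumption of this sketch.
    p_mass = sum(p[i] for i in support)
    q_mass = sum(q[i] for i in support)
    kl = 0.0
    for i in support:
        pi = p[i] / p_mass
        qi = q[i] / q_mass
        kl += pi * math.log(pi / qi)
    return kl

# Hypothetical logits for one position over a 5-token vocabulary.
teacher = softmax([4.0, 3.0, 0.5, 0.2, -1.0])
student = softmax([0.5, 3.0, 4.0, 0.2, -1.0])
k = 2

student_support = topk_indices(student, k)  # tokens the student prefers
teacher_support = topk_indices(teacher, k)  # tokens the teacher prefers

# Same divergence, two different token sets.
kl_on_student_topk = restricted_kl(student, teacher, student_support)
kl_on_teacher_topk = restricted_kl(student, teacher, teacher_support)
```

With these toy logits the two supports differ (the student's top-2 misses the teacher's favorite token), so the two KL values differ as well, which is why I am asking which choice is intended.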
