Seeking training recipe/advice for SDPO on Math tasks #26

@AzureStarz

Could you please share a training recipe for applying SDPO to math tasks?

I am currently training Qwen2.5-3B-Instruct on a Math training split using your SDPO implementation, but the val-core metric keeps degrading as training progresses. I have already tried other models and different datasets, but training still does not work as expected.

I would appreciate any empirical findings, recommended hyperparameters, or general advice for adapting your method to math-heavy tasks. Thanks!
