grad_norm is close to 0 while loss remains normal, validation performance stagnates on Ascend 910C #38

@ZtZhang-SCUT

Hi authors,
Thank you for the impressive work on SDPO. I am currently trying to reproduce the results on the Tool Use task using the official codebase and the default configuration provided. My experiments are conducted on Ascend 910C accelerators.

Observations:

Image

Loss vs. Gradient Norm Mismatch: The sdpo_loss stays within a seemingly normal range (typically ~0.0-0.2), which initially suggests stable optimization. However, the gradient norm (grad_norm) consistently drops to ~10^-5, orders of magnitude smaller than the values reported in Figure 18 of the paper (where it hovers between 0 and 20, although that figure is for LCBv6) and in the wandb log.

Image

Validation Performance: Correspondingly, validation metrics (accuracy/pass rate) fluctuate randomly without any upward trend, indicating that model parameters are effectively not updating.
Stability Concern: The collapse in gradient flow happens early and persists, suggesting the model is stuck in a plateau rather than converging.
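
To narrow down the observations above, one check that could be run is a per-parameter breakdown of the gradient norm together with the relative parameter change between two steps. The following is only a minimal, framework-agnostic sketch and not part of the SDPO or verl codebase: `grad_norm_report` and `relative_update` are hypothetical helper names, and gradients and weights are assumed to be available as flat sequences of floats (for a PyTorch tensor, for example, via `p.grad.flatten().tolist()`).

```python
import math
from typing import Dict, Sequence, Tuple


def grad_norm_report(grads: Dict[str, Sequence[float]]) -> Tuple[float, Dict[str, float]]:
    """Return (global L2 norm, per-parameter L2 norms).

    `grads` maps a parameter name to its flattened gradient values.
    The global norm is the L2 norm over all gradient entries, which equals
    the square root of the sum of the squared per-parameter norms.
    """
    per_param = {
        name: math.sqrt(sum(g * g for g in values))
        for name, values in grads.items()
    }
    total = math.sqrt(sum(n * n for n in per_param.values()))
    return total, per_param


def relative_update(before: Sequence[float], after: Sequence[float]) -> float:
    """Return ||after - before|| / ||before|| for one flattened parameter."""
    delta = math.sqrt(sum((a - b) ** 2 for a, b in zip(after, before)))
    base = math.sqrt(sum(b * b for b in before))
    if base == 0.0:
        # Avoid dividing by zero for an all-zero parameter.
        return 0.0 if delta == 0.0 else float("inf")
    return delta / base


# Toy values for illustration only (not taken from the actual run).
total, per_param = grad_norm_report({"embed": [3.0, 4.0], "head": [0.0, 0.0]})
print(total, per_param)

# A parameter that did not move at all between two optimizer steps.
print(relative_update([1.0, 0.0], [1.0, 0.0]))
```

If every parameter reports a norm near 10^-5, the gradient is small everywhere. If only a few parameters carry a non-zero gradient, or the relative update is exactly 0 while the gradient is not, that points at a different stage of the pipeline.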

Questions:

Is a grad_norm close to 0 expected behavior in SDPO, or does it indicate a vanishing-gradient/collapse issue? The training curves in the paper show significantly larger norms.

Any insights or suggestions would be greatly appreciated!

Environment:
Hardware: Ascend 910C
Model: Olmo3-7b
Task: Tool Use
Framework: PyTorch + CANN / verl
Config: Default settings from the repo
