grad_norm is close to 0 while loss remains normal, validation performance stagnates on Ascend 910C

Hi authors,
Thank you for the impressive work on SDPO. I am currently trying to reproduce the results on the Tool Use task using the official codebase and the default configuration provided. My experiments are conducted on Ascend 910C accelerators.

Observations:

<img width="2030" height="373" alt="Image" src="https://github.com/user-attachments/assets/05259fb9-357b-4ed2-b7a4-ff633a22b5dc" />

Loss vs. Gradient Norm Mismatch: The sdpo_loss stays within a seemingly normal range (typically ~0.0-0.2), which initially suggests stable optimization. However, the gradient norm (grad_norm) consistently drops to ~10^-5, which is orders of magnitude smaller than the values reported in Figure 18 of the paper (where it hovers between 0 and 20, although that figure is for LCBv6) and in the wandb log.

![Image](https://github.com/user-attachments/assets/cc66709a-a9a2-45d8-a271-fbe8d4d457f0)

Validation Performance: Correspondingly, validation metrics (accuracy/pass rate) fluctuate randomly without any upward trend, indicating that the model parameters are effectively not updating.

Stability Concern: The collapse in gradient flow happens early and persists, suggesting the model is stuck in a plateau rather than converging.

Questions:

Is a grad_norm this close to 0 expected behavior in SDPO, or does it indicate vanishing or collapsed gradients? The training curves in the paper show significantly larger norms.
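
One way to narrow this down would be to check whether the tiny global norm comes from all parameters or only from a few. Below is a minimal sketch, not code from the repo: `summarize_grad_norms` is a hypothetical helper and the example values are made up. It assumes the per-parameter L2 norms have already been collected after the backward pass (for example via `p.grad.norm().item()` in PyTorch; with sharded parameters the per-rank values would need to be reduced across ranks first).

```python
import math


def summarize_grad_norms(per_param_norms, tiny=1e-8):
    """Combine per-parameter L2 gradient norms into one global L2 norm
    and list the parameters whose gradient norm is below `tiny`."""
    # Global L2 norm = sqrt of the sum of squared per-parameter norms.
    total = math.sqrt(sum(n * n for n in per_param_norms.values()))
    # Parameters that receive (almost) no gradient at all.
    dead = sorted(name for name, n in per_param_norms.items() if n < tiny)
    return total, dead


# Hypothetical values for illustration only; in a real run each entry
# would be read from the model after loss.backward().
example = {
    "embed.weight": 0.0,
    "layers.0.attn.weight": 3e-6,
    "lm_head.weight": 4e-6,
}
total, dead = summarize_grad_norms(example)
print(f"global grad norm: {total:.2e}, near-zero params: {dead}")
```

If the independently computed norm matches the logged ~10^-5, the small value is real rather than a logging artifact, and the list of near-zero parameters shows where the gradient is being lost.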

Any insights or suggestions would be greatly appreciated!

Environment:
Hardware: Ascend 910C
Model: Olmo3-7b
Task: Tool Use
Framework: PyTorch + CANN / verl
Config: Default settings from the repo
