AgnesAI-Labs/DSPO

DSPO: Stable and Efficient Policy Optimization for Agentic Search and Reasoning

Official repository for the paper:

DSPO: Stable and Efficient Policy Optimization for Agentic Search and Reasoning

Paper

  • arXiv: https://arxiv.org/abs/2510.09255
  • Title: DSPO: Stable and Efficient Policy Optimization for Agentic Search and Reasoning
  • Authors: Chenyang Gu, Yewen Pu, Bruce Yang, Xiaofan Li, Huan Gao
  • Version: arXiv v4, revised on March 19, 2026

Overview

Dynamic-filter Sequence-level Policy Optimization (DSPO) is a reinforcement learning (RL) algorithm designed for stable and efficient agentic search and reasoning.

DSPO trains models to interleave multi-turn search and reasoning through reinforcement learning, using sequence-level optimization and dynamic sample filtering to improve training stability and performance.
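Since the code is not yet released, the following is only a minimal, hypothetical sketch of what "sequence-level optimization" and "dynamic sample filtering" commonly mean in group-based RL training. It is not the authors' implementation and does not reproduce the paper's objective. All function names (`sequence_level_ratio`, `filter_groups`, `group_advantages`) and the specific choices (a length-normalized ratio, dropping groups with identical rewards) are assumptions made for illustration.

```python
import math
from typing import List, Sequence


def sequence_level_ratio(new_logprobs: Sequence[float],
                         old_logprobs: Sequence[float]) -> float:
    """One importance ratio per response instead of one per token.

    Assumption: the ratio is length-normalized, i.e. exp(mean_t(new_t - old_t)).
    """
    if len(new_logprobs) != len(old_logprobs) or not new_logprobs:
        raise ValueError("log-prob lists must be non-empty and of equal length")
    total_diff = sum(n - o for n, o in zip(new_logprobs, old_logprobs))
    return math.exp(total_diff / len(new_logprobs))


def filter_groups(reward_groups: List[List[float]]) -> List[int]:
    """Return indices of prompt groups that carry a learning signal.

    Assumption: a group whose sampled responses all received the same reward
    yields zero group-relative advantage, so it is dropped from the batch.
    """
    return [i for i, rewards in enumerate(reward_groups)
            if max(rewards) != min(rewards)]


def group_advantages(rewards: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Standardize rewards within one group (mean 0, unit scale)."""
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Three prompts, each with three sampled responses and binary rewards.
    groups = [[1.0, 1.0, 1.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
    kept = filter_groups(groups)
    print("kept groups:", kept)
    print("advantages:", [group_advantages(groups[i]) for i in kept])
    print("ratio:", sequence_level_ratio([-1.0, -2.0], [-1.2, -2.1]))
```

In this sketch, only the second group survives filtering, because the other two contain identical rewards and would contribute no gradient under group-relative advantages. Whether DSPO filters on exactly this criterion is not stated in this README.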

News

  • [2026-03-19] Paper updated to arXiv v4.
  • [2025-10-10] Paper first released on arXiv.

Repository Status

Code and training details will be released soon.

Citation

If you find this work useful, please cite:

@article{gu2025dspo,
  title={DSPO: Stable and Efficient Policy Optimization for Agentic Search and Reasoning},
  author={Gu, Chenyang and Pu, Yewen and Yang, Bruce and Li, Xiaofan and Gao, Huan},
  journal={arXiv preprint arXiv:2510.09255},
  year={2025}
}

License

The paper and figures are licensed under CC BY-SA 4.0.

Code, if released, will be licensed separately.
