DSPO: Stable and Efficient Policy Optimization for Agentic Search and Reasoning

Official repository for the paper:

DSPO: Stable and Efficient Policy Optimization for Agentic Search and Reasoning

Paper

arXiv: https://arxiv.org/abs/2510.09255
Title: DSPO: Stable and Efficient Policy Optimization for Agentic Search and Reasoning
Authors: Chenyang Gu, Yewen Pu, Bruce Yang, Xiaofan Li, Huan Gao
Version: arXiv v4, revised on March 19, 2026

Overview

Dynamic-filter Sequence-level Policy Optimization (DSPO) is a reinforcement learning (RL) algorithm designed for stable and efficient agentic search and reasoning.

DSPO trains models to interleave multi-turn search and reasoning through reinforcement learning, using sequence-level optimization and dynamic sample filtering to improve training stability and performance.
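
The official code has not been released, so the following is only an illustrative sketch of what "dynamic sample filtering" and "sequence-level optimization" could look like in practice. It is not the authors' implementation. It assumes three things that this README does not confirm: that filtering drops rollout groups whose rewards are all identical (they carry no learning signal under group-normalized advantages), that advantages are normalized within each group, and that the sequence-level importance ratio is length-normalized. The names `filter_groups`, `group_advantages`, and `sequence_ratio` are hypothetical.

```python
import math
from statistics import mean, pstdev


def filter_groups(groups):
    """Keep only rollout groups whose rewards are not all identical.

    Assumption: a group with zero reward variance yields zero advantage
    for every sample, so it is dropped before the policy update.
    """
    return [g for g in groups if max(g["rewards"]) != min(g["rewards"])]


def group_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group (assumed formulation)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def sequence_ratio(new_logprobs, old_logprobs):
    """Length-normalized sequence-level importance ratio (assumed formulation).

    Inputs are per-token log-probabilities of the same response under the
    current policy and the policy that generated the rollout.
    """
    n = len(new_logprobs)
    total = sum(a - b for a, b in zip(new_logprobs, old_logprobs))
    return math.exp(total / n)
```

For example, given one group in which every rollout received reward 1.0 and another with rewards 0.0 and 1.0, `filter_groups` keeps only the second group, and `sequence_ratio` returns 1.0 when the current and old policies assign identical log-probabilities.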

News

[2026-03-19] Paper updated to arXiv v4.
[2025-10-10] Paper first released on arXiv.

Repository Status

Code and training details will be released soon.

Citation

If you find this work useful, please cite:

@article{gu2025dspo,
  title={DSPO: Stable and Efficient Policy Optimization for Agentic Search and Reasoning},
  author={Gu, Chenyang and Pu, Yewen and Yang, Bruce and Li, Xiaofan and Gao, Huan},
  journal={arXiv preprint arXiv:2510.09255},
  year={2025}
}

License

The paper and figures are licensed under CC BY-SA 4.0.

Code, if released, will be licensed separately.

