This repository contains the official implementation of VAST (Horizon Adaptive Offline Policy Learning via VAlue STitching), a method designed for long-horizon, complex offline RL tasks.
Traditional TD-based value learning relies on fixed-step backups, which often fail to capture the complex temporal structure of long-horizon, multi-stage tasks. VAST overcomes this limitation by coupling value optimization with:
- a future-conditioned auxiliary value function, and
- a stitching policy that optimally selects the reward-maximizing future.
VAST enables direct estimation and compositional "stitching" of variable-length returns grounded in actionable sub-goal states, providing an accurate and greedily exploitable value-supervision signal for offline policy optimization.
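
To make the contrast between fixed-step backups and variable-length returns concrete, here is a minimal sketch. It is not the VAST implementation: the function names `n_step_target` and `best_horizon_target` are hypothetical, the toy trajectory is invented for illustration, and a plain maximum over candidate horizons stands in for the learned stitching policy and future-conditioned value function described above.

```python
import numpy as np


def n_step_target(rewards, values, t, n, gamma=0.99):
    """Fixed n-step TD target from step t.

    `values` has one more entry than `rewards`; its last entry is the
    value of the state reached after the final reward.
    """
    end = min(t + n, len(rewards))  # truncate at the end of the trajectory
    discounted = sum(gamma ** (k - t) * rewards[k] for k in range(t, end))
    return discounted + gamma ** (end - t) * values[end]


def best_horizon_target(rewards, values, t, horizons, gamma=0.99):
    """Illustrative stand-in for stitching: pick the candidate horizon
    whose n-step target is largest. This is an assumption made for the
    sketch, not the selection rule used by VAST."""
    targets = {n: n_step_target(rewards, values, t, n, gamma) for n in horizons}
    best_n = max(targets, key=targets.get)
    return best_n, targets[best_n]


# Toy sparse-reward trajectory: the only reward arrives at the last step,
# and the value estimates are uninformative (all zeros).
rewards = np.array([0.0, 0.0, 0.0, 1.0])
values = np.zeros(len(rewards) + 1)

one_step = n_step_target(rewards, values, t=0, n=1, gamma=0.9)
best_n, best_target = best_horizon_target(
    rewards, values, t=0, horizons=(1, 2, 4), gamma=0.9
)
print(one_step, best_n, best_target)
```

In this toy case the one-step target carries no signal because the bootstrapped value is zero, while the longest candidate horizon reaches the reward and is therefore selected.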
The code will be released gradually.
- arXiv preprint released
- initial repo
