Whiterrrrr/value-stitching


This repository contains the official implementation of VAST (Horizon Adaptive Offline Policy Learning via VAlue STitching), a method designed for long-horizon, complex offline RL tasks.

Overview

Traditional TD-based value learning relies on fixed-step backups, which often fail to capture the complex temporal structure of long-horizon, multi-stage tasks. VAST overcomes this limitation by coupling value optimization with

  • a future-conditioned auxiliary value function,
  • a stitching policy that optimally selects the reward-maximizing future.

VAST enables direct estimation and compositional "stitching" of variable-length returns grounded in actionable sub-goal states, providing an accurate and greedily exploitable value-supervision signal for offline policy optimization.
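The idea of a variable-length backup can be illustrated with a small toy sketch. This is not the VAST implementation (the code has not been released yet) and it makes several simplifying assumptions: the function name `stitched_target` is hypothetical, a plain list of precomputed value estimates stands in for the learned value functions, and a simple maximum over candidate horizons stands in for the stitching policy. It only shows how a backup target can pick its own horizon instead of using a fixed step count.

```python
from typing import List, Optional, Tuple


def stitched_target(
    rewards: List[float],
    values: List[float],
    t: int,
    gamma: float = 0.99,
    max_horizon: Optional[int] = None,
) -> Tuple[float, int]:
    """Toy horizon-adaptive backup target (illustrative, not the VAST code).

    rewards[i] is the reward observed after step i of one trajectory.
    values[i] is an assumed value estimate for state i, so
    len(values) == len(rewards) + 1.
    Returns (best_target, best_horizon) for time step t.
    """
    num_steps = len(rewards)
    if len(values) != num_steps + 1:
        raise ValueError("values must have exactly one more entry than rewards")
    if not 0 <= t < num_steps:
        raise ValueError("t must index a step that has at least one reward")

    limit = num_steps - t
    if max_horizon is not None:
        limit = min(limit, max_horizon)

    best_target, best_horizon = float("-inf"), 0
    partial_return = 0.0
    for h in range(1, limit + 1):
        # Discounted sum of the first h rewards starting at step t.
        partial_return += gamma ** (h - 1) * rewards[t + h - 1]
        # Bootstrap from the value estimate of the state reached after h steps.
        target = partial_return + gamma ** h * values[t + h]
        if target > best_target:
            best_target, best_horizon = target, h
    return best_target, best_horizon
```

For example, `stitched_target([1.0, 0.0], [0.0, 10.0, 0.0], t=0, gamma=0.5)` prefers the one-step backup because the intermediate state has a high value estimate, whereas a trajectory whose only reward arrives late prefers a longer horizon.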

To Do List

The code will be released gradually.

  • arXiv released
  • initial repo
