Robust Reward-Free Exploration under Distributional Drift

Most reinforcement learning methods rely on dense, well-shaped rewards, which are often unavailable, biased, or expensive to engineer. Reward-free exploration (RFE) instead learns good state representations or exploratory policies without using rewards, and only introduces rewards later for downstream tasks. However, most existing RFE frameworks assume a static environment. In realistic settings, the data distribution can drift over time: goals move, dynamics change, or noise increases. This project explores how reward-free methods behave under such distributional drift.

High-Level Workflow

Build a synthetic MDP (GridWorld) with tunable drift.
Pretrain reward-free representations using UCRL-RFE and baselines.
Introduce downstream tasks with rewards after pretraining.
Compare how quickly and how robustly different representations adapt under drift.

Goals

Environment: Drift-Enabled GridWorld

Implement a small GridWorld-style MDP with:
- Drift strength: how much the transition or reward structure changes.
- Drift schedule: when drift happens (e.g., sudden jump, gradual shift, periodic).
- Examples:
  - Shifting goal locations.
  - Changing transition noise.
  - Altering blocked cells or wall layouts.
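
As a rough illustration of the drift strength and drift schedule ideas above, the following is a minimal sketch, not the project's DriftGridWorld. The names `drift_level` and `DriftGridSketch`, and the `ramp` and `period` parameters, are hypothetical. It assumes drift is modeled as transition noise: with probability equal to the current drift level, the chosen action is replaced by a random one.

```python
import random


def drift_level(step, drift_time, strength, schedule="sudden", ramp=100, period=200):
    """Return the drift magnitude in [0, strength] at a given step."""
    if schedule == "sudden":
        # Jump from 0 to full strength at drift_time.
        return strength if step >= drift_time else 0.0
    if schedule == "gradual":
        # Linear ramp from 0 to strength over `ramp` steps after drift_time.
        progress = (step - drift_time) / ramp
        return strength * min(1.0, max(0.0, progress))
    if schedule == "periodic":
        # Alternate between undrifted and drifted blocks of `period` steps
        # (this variant ignores drift_time).
        return strength if (step // period) % 2 == 1 else 0.0
    raise ValueError(f"unknown schedule: {schedule}")


class DriftGridSketch:
    """Toy grid whose transition noise follows the drift schedule."""

    ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

    def __init__(self, size=10, drift_time=200, strength=0.7, schedule="sudden", seed=0):
        self.size = size
        self.drift_time = drift_time
        self.strength = strength
        self.schedule = schedule
        self.rng = random.Random(seed)
        self.t = 0
        self.pos = (0, 0)

    def step(self, action):
        noise = drift_level(self.t, self.drift_time, self.strength, self.schedule)
        # With probability `noise`, a random action replaces the chosen one.
        if self.rng.random() < noise:
            action = self.rng.randrange(len(self.ACTIONS))
        dr, dc = self.ACTIONS[action]
        row = min(self.size - 1, max(0, self.pos[0] + dr))
        col = min(self.size - 1, max(0, self.pos[1] + dc))
        self.pos = (row, col)
        self.t += 1
        return self.pos
```

Shifting goals or changing wall layouts could be driven by the same `drift_level` value; transition noise is used here only because it is the shortest to write down.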

Reward-Free Exploration with UCRL-RFE

Apply a reward-free exploration algorithm (e.g. UCRL-RFE) to:
- Collect trajectories without rewards.
- Maximize state coverage.
- Produce a replay buffer or dataset for later representation learning.
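
The data-collection steps above can be sketched with a much simpler stand-in. The code below is not UCRL-RFE: it is a count-based heuristic that always tries the least-used action in the current state, on a noise-free grid. The function names `move` and `explore` are hypothetical. It only shows the shape of a reward-free buffer: transitions are stored as (state, action, next_state) with no reward field.

```python
import random
from collections import defaultdict

ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def move(state, action, size):
    """Deterministic grid move, clipped at the borders."""
    dr, dc = ACTIONS[action]
    return (min(size - 1, max(0, state[0] + dr)),
            min(size - 1, max(0, state[1] + dc)))


def explore(size=10, num_steps=10000, seed=0):
    """Collect reward-free transitions, preferring the least-tried action."""
    rng = random.Random(seed)
    counts = defaultdict(int)  # (state, action) -> number of times tried
    buffer = []                # (state, action, next_state); no rewards stored
    state = (0, 0)
    for _ in range(num_steps):
        tried = [counts[(state, a)] for a in range(len(ACTIONS))]
        least = min(tried)
        # Break ties randomly among the least-tried actions.
        action = rng.choice([a for a, n in enumerate(tried) if n == least])
        next_state = move(state, action, size)
        counts[(state, action)] += 1
        buffer.append((state, action, next_state))
        state = next_state
    return buffer
```

The returned list can serve directly as the dataset for representation pretraining.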

Representation Pretraining

Train two families of state encoders:
- Fixed-environment encoder
  - Pretrained on data from a single (or early) environment configuration.
  - Ignores later drift during pretraining.
- Drift-aware encoder
  - Pretrained across time with drifting dynamics.
  - May condition on time, drift index, or inferred context.
  - Goal: learn stable or adaptable features under nonstationarity.
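
One way to picture the difference between the two encoder families is at the input level. The sketch below is an assumption about how conditioning might be done, not the project's encoder code; the function names are hypothetical. The fixed-environment encoder sees only a one-hot state, while the drift-aware variant appends a binary drift index (0 before drift, 1 after) as a context feature.

```python
def one_hot_state(state, size):
    """Flatten a (row, col) grid position into a one-hot vector."""
    vec = [0.0] * (size * size)
    vec[state[0] * size + state[1]] = 1.0
    return vec


def fixed_input(state, size):
    """Input for a fixed-environment encoder: state only."""
    return one_hot_state(state, size)


def drift_aware_input(state, size, step, drift_time):
    """Input for a drift-aware encoder: state plus a drift-index feature."""
    drift_index = 1.0 if step >= drift_time else 0.0
    return one_hot_state(state, size) + [drift_index]
```

A learned or inferred context vector could replace the binary index without changing the rest of the pipeline.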

Downstream Rewarded Tasks

After pretraining:
- Introduce explicit reward functions
- Train simple RL agents
- Evaluate:
  - Learning speed
  - Final performance (reward earned)
  - Representation stability
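
For the downstream stage, a minimal sketch of a sparse goal reward and one tabular Q-learning update is shown below. The reward definition (1.0 on reaching a goal cell, 0.0 otherwise), the function names, and the default `alpha` and `gamma` values are assumptions for illustration; the project's agents may differ.

```python
from collections import defaultdict


def goal_reward(next_state, goal):
    """Sparse reward: 1.0 when the agent lands on the goal cell, else 0.0."""
    return 1.0 if next_state == goal else 0.0


def q_update(q, state, action, reward, next_state, done,
             alpha=0.1, gamma=0.99, num_actions=4):
    """Apply one Q-learning update in place and return the new Q-value."""
    # No bootstrapping from terminal states.
    best_next = 0.0 if done else max(q[(next_state, a)] for a in range(num_actions))
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


# Usage: a table that defaults to 0.0 for unseen (state, action) pairs.
q_table = defaultdict(float)
reward = goal_reward((0, 1), goal=(0, 1))
q_update(q_table, (0, 0), 3, reward, (0, 1), done=True)
```

When the goal location drifts, only `goal_reward` changes; how fast the Q-values recover is what the learning-speed metric is meant to capture.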

Baselines and Comparisons

Metrics to track:
- Coverage of state space over time.
- Downstream sample efficiency.
- Performance drop when drift occurs.
- Representation similarity / drift across environments.
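
Representation similarity can be measured in several ways. As one hedged example, the sketch below averages the cosine similarity between the embeddings of the same state before and after drift. Cosine similarity is an assumption here, not necessarily the measure the project uses, and the function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def mean_representation_similarity(emb_before, emb_after):
    """Average cosine similarity over states embedded in both snapshots.

    Each argument maps a state to its embedding vector.
    """
    shared = emb_before.keys() & emb_after.keys()
    if not shared:
        return 0.0
    total = sum(cosine_similarity(emb_before[s], emb_after[s]) for s in shared)
    return total / len(shared)
```

Values near 1.0 suggest the encoder's features stayed stable across the drift; lower values indicate that the representation itself moved.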

Installation

pip install -r requirements.txt

Usage

python run.py

This runs the complete pipeline:

Reward-free exploration
Representation training (fixed and drift-aware encoders)
Downstream RL training
Evaluation with all metrics

Results are saved to results/final_results.png

Project Structure

rfe-drift-gridworld/
├── rfe_drift/          # Core implementation
│   ├── env/            # DriftGridWorld environment
│   ├── exploration/    # UCRL-RFE algorithm
│   ├── representations/# State encoders
│   ├── rl/             # RL agents (Q-Learning, DQN)
│   └── utils/          # Metrics, rewards, visualization
├── run.py              # Main script
├── requirements.txt    # Dependencies
└── README.md           # This file

Metrics Tracked

State Coverage: Fraction of state space explored during RFE
Sample Efficiency: Learning speed in downstream tasks
Performance Drop: Reward before vs after drift (key metric!)
Robustness: How well agents adapt to environmental changes
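
Two of these metrics are simple enough to sketch directly. The functions below are illustrative, not the project's utilities. They assume coverage is counted over all grid cells (including any blocked ones) and that performance drop compares mean episode reward in a fixed window before and after the drift point; `drift_episode` and `window` are hypothetical parameters.

```python
def state_coverage(visited_states, grid_size):
    """Fraction of the grid_size x grid_size cells visited at least once."""
    return len(set(visited_states)) / (grid_size * grid_size)


def performance_drop(episode_rewards, drift_episode, window=20):
    """Mean reward in the window before drift minus the mean in the window after.

    Positive values mean performance got worse after drift.
    """
    before = episode_rewards[max(0, drift_episode - window):drift_episode]
    after = episode_rewards[drift_episode:drift_episode + window]
    if not before or not after:
        raise ValueError("need at least one episode on each side of the drift")
    return sum(before) / len(before) - sum(after) / len(after)
```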

Configuration

Edit the CONFIG dict in run.py to adjust parameters:

CONFIG = {
    "grid_size": 10,              # Grid dimensions
    "drift_strength": 0.7,         # How much environment changes (0-1)
    "drift_time": 200,             # When drift occurs (in steps)
    "num_exploration_steps": 10000,# Reward-free exploration
    "num_train_episodes": 300,     # Downstream RL training
    "num_eval_episodes": 50,       # Evaluation episodes
}
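
If parameters are changed often, a small helper that merges overrides into the defaults and checks their ranges can catch mistakes early. This is a hypothetical sketch: `DEFAULTS` mirrors the CONFIG dict above, while `make_config` is not part of run.py, and the range checks are assumptions based on the comments in CONFIG.

```python
DEFAULTS = {
    "grid_size": 10,
    "drift_strength": 0.7,
    "drift_time": 200,
    "num_exploration_steps": 10000,
    "num_train_episodes": 300,
    "num_eval_episodes": 50,
}


def make_config(**overrides):
    """Return a copy of DEFAULTS with overrides applied and validated."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"unknown config keys: {sorted(unknown)}")
    config = {**DEFAULTS, **overrides}
    # drift_strength is documented as a value between 0 and 1.
    if not 0.0 <= config["drift_strength"] <= 1.0:
        raise ValueError("drift_strength must be in [0, 1]")
    if config["drift_time"] < 0:
        raise ValueError("drift_time must be non-negative")
    for key in ("grid_size", "num_exploration_steps",
                "num_train_episodes", "num_eval_episodes"):
        if config[key] <= 0:
            raise ValueError(f"{key} must be positive")
    return config


# Usage: a milder drift on a smaller grid.
small = make_config(grid_size=5, drift_strength=0.3)
```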
