v0.8.2: CLI cleanup — push/pull + drop reward/init by adithya-s-k · Pull Request #17 · huggingface/Repo2RLEnv

adithya-s-k · 2026-05-14T09:34:55Z

Closes #16.

End-to-end-validated via a live HF Hub round-trip on pallets/click (no E2B / cloud sandbox).

What's changing

| Action | Command |
|---|---|
| Add | `repo2rlenv push <local-dir> <hf://owner/dataset>` (supports `--private`, `--message`) |
| Add | `repo2rlenv pull <hf://owner/dataset> [<local-dir>]` (supports `--task`, `--force`) |
| Remove | `repo2rlenv reward` (Python function `repo2rlenv.reward.calculate_diff_similarity_reward` stays for training-loop users) |
| Remove | `repo2rlenv init` (replaced by README example) |
| Soft-deprecate | `--out hf://...` magic inside `generate`; emits a warning, to be removed in v0.9 |
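The commit message for this PR notes that both `hf://owner/name` and bare `owner/name` are accepted and that malformed input is rejected. A minimal sketch of that kind of parsing follows. The function name `parse_hf_uri` and the exact character rules in the regex are assumptions made for illustration, not the code shipped in this PR.

```python
import re

# Assumed shape of a Hub repo id: two non-empty segments separated by one slash.
_REPO_ID = re.compile(r"[A-Za-z0-9][\w.-]*/[A-Za-z0-9][\w.-]*")


def parse_hf_uri(uri: str) -> str:
    """Return 'owner/name' from 'hf://owner/name' or a bare 'owner/name'.

    Hypothetical helper: raises ValueError on anything that does not look
    like exactly one owner segment and one dataset segment.
    """
    repo_id = uri.removeprefix("hf://").strip("/")
    if not _REPO_ID.fullmatch(repo_id):
        raise ValueError(f"expected hf://owner/dataset, got {uri!r}")
    return repo_id


def is_malformed(uri: str) -> bool:
    """Convenience wrapper used to show which inputs are rejected."""
    try:
        parse_hf_uri(uri)
    except ValueError:
        return True
    return False
```

Rejecting bad input at argument-parsing time keeps a typo such as `hf://owner` from reaching the Hub client, where the resulting error would be harder to read.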

Resulting CLI (5 commands, every one load-bearing):

repo2rlenv generate    # emit Harbor task spec → local dir
repo2rlenv validate    # check spec conformance
repo2rlenv bootstrap   # build a Docker image only (no synthesis)
repo2rlenv push        # NEW: local dir → HF Hub
repo2rlenv pull        # NEW: HF Hub → local dir

Live round-trip evidence (Phase 4, all green)

Real Hub round-trip captured in plans/v0.8.2_e2e.md:

| Step | Result |
|---|---|
| `generate` (pr_runtime on `pallets/click`) | ✅ 4/8 tasks emitted |
| `validate` (local) | ✅ 4/4 valid |
| `push hf://AdithyaSK/click-r2e-v082` | ✅ commit `82de5c57`, registry.json published |
| `pull hf://AdithyaSK/click-r2e-v082 ... --force` | ✅ 4 tasks materialized |
| `validate` (pulled copy) | ✅ 4/4 valid |
| `harbor run --agent oracle --path <pulled task>` | ✅ Mean = 1.000 in 5s |

Harness bug surfaced + fixed during e2e

The initial Harbor oracle attempt returned Mean = 0.0. Root cause: the bootstrap agent saved `test_cmds` with a trailing `| head -50` pipe, so `targeted_test_cmds_for_pr` appended test files after the pipe and produced a broken shell command. Fix: `normalize_test_cmds_for_runtime` now strips `| head/tail`, `2>&1`, and `> /dev/null` before per-runner normalization. 4 regression tests added.

Test plan

- Implementation complete: push/pull commands wired
- 23 new tests for push/pull argparse + URI parsing + normalize regex
- All 464 unit tests pass; ruff check + ruff format --check clean
- Live round-trip completed successfully against AdithyaSK/click-r2e-v082
- Docs: README.md, CLAUDE.md, docs/quickstart.md, docs/pipelines/pr_diff.md, docs/reference/API.md, docs/reference/SPEC.md all updated
- CI green (lint + matrix test 3.12/3.13/3.14 + build), runs automatically on push

After the v0.8.1 e2e runs, the actual user workflow is repo2rlenv generate → repo2rlenv validate → repo2rlenv push → harbor run, so the existing CLI had three pieces of dead weight:

- `repo2rlenv reward`: diff-similarity scoring as a CLI command is misleading. The real reward signal comes from `harbor run`. Diff similarity is useful for RL training loops, but those should call `repo2rlenv.reward.calculate_diff_similarity_reward()` directly from Python, not shell out per rollout.
- `repo2rlenv init`: wrote a stale 30-line YAML template nobody used. Same job done better by a README example.
- `generate --out hf://...` magic: hidden push baked into the destination string; you couldn't re-push or push an existing local dir without re-running generation.

This commit:

- Adds `repo2rlenv push <local-dir> <hf://owner/dataset>`: explicit publish, supports `--private` and `--message`. Wraps the existing `hub.push_to_hub`.
- Adds `repo2rlenv pull <hf://owner/dataset> [<local-dir>]`: fetch from Hub. Supports `--task <name>` (single task) and `--force`. New `hub.pull_from_hub` helper wraps `huggingface_hub.snapshot_download` and flattens the staged `tasks/<id>/` layout back to `<dir>/<id>/`.
- Removes `repo2rlenv reward` (the Python function stays for training loops; only the CLI wrapper goes away).
- Removes `repo2rlenv init` + the `_SAMPLE_CONFIG` template string.
- Soft-deprecates `generate --out hf://...` with a one-version warning; to be removed in v0.9.

Tests: 19 new in `tests/test_cli_push_pull.py` covering URI parsing (both `hf://owner/name` and bare `owner/name`, plus malformed-input rejection) + cmd_push / cmd_pull argument plumbing with mocked Hub I/O. 461/461 pass; ruff + format clean.

Docs: README, CLAUDE.md, docs/quickstart.md, docs/pipelines/pr_diff.md, docs/reference/API.md, docs/reference/SPEC.md all updated. New flow documented end-to-end (generate → validate → push → pull → harbor).
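The flattening step described for `hub.pull_from_hub` (staged `tasks/<id>/` back to `<dir>/<id>/`) can be illustrated with the standard library alone. This is a sketch under stated assumptions, not the PR's implementation: `flatten_tasks` is a hypothetical name, the snapshot directory stands in for whatever `huggingface_hub.snapshot_download` returns, the `placeholder.txt` files are invented for the demo, and treating `force` as "overwrite an existing task directory" is a guess at what `--force` does.

```python
import shutil
import tempfile
from pathlib import Path


def flatten_tasks(snapshot_dir: Path, dest: Path, force: bool = False) -> list[str]:
    """Copy snapshot_dir/tasks/<id>/ to dest/<id>/ and return the task ids."""
    copied = []
    for task_dir in sorted((snapshot_dir / "tasks").iterdir()):
        if not task_dir.is_dir():
            continue  # skip stray files next to the task directories
        target = dest / task_dir.name
        if target.exists():
            if not force:
                raise FileExistsError(f"{target} exists; pass force=True to overwrite")
            shutil.rmtree(target)  # assumed meaning of --force
        shutil.copytree(task_dir, target)  # also creates missing parents of target
        copied.append(task_dir.name)
    return copied


# Demo on a throwaway directory that mimics the staged layout.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for task_id in ("task-a", "task-b"):
        staged = root / "snapshot" / "tasks" / task_id
        staged.mkdir(parents=True)
        (staged / "placeholder.txt").write_text("")
    ids = flatten_tasks(root / "snapshot", root / "out")
    layout = sorted(p.relative_to(root / "out").as_posix() for p in (root / "out").rglob("*"))
```

After the demo runs, `layout` lists each task directly under the output directory with no `tasks/` prefix, which is the layout `validate` and `harbor run --path` are pointed at in the round-trip table.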

Live e2e on pallets/click surfaced a real bug: the bootstrap agent sometimes saves test_cmds with a tail-truncator pipe like

    python -m pytest -q 2>&1 | head -50

and `targeted_test_cmds_for_pr` then appended PR test files at the end, landing them AFTER the pipe and breaking the shell:

    cd /workspace && python -m pytest -q 2>&1 | head -50 -v tests/...
                                                         ^^^^^^^^^^^^ args land here

Side effect of v0.8.1's STOP CONDITION prompt: telling the agent to SAVE_SETUP early led some agents to embed the same truncator they were using to keep their own diagnostic output short.

Fix in `normalize_test_cmds_for_runtime`: strip `| head/tail [...]`, `2>&1`, `> /dev/null`, `&> /dev/null` BEFORE any per-runner normalization. 4 regression tests added.

Confirmed live: harbor run --agent oracle on the pallets/click task that was returning Mean=0.0 now returns Mean=1.000 in 5s.
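The stripping step can be approximated with a few regular expressions. The sketch below is not the body of `normalize_test_cmds_for_runtime`; `strip_output_truncators` is a hypothetical name, the patterns are one plausible reading of the four cases listed above, and `tests/test_example.py` is a made-up path used only to show where appended arguments land.

```python
import re

# Applied in this order so that `2>&1` is gone before the pipe pattern runs;
# otherwise `| head -50 2>&1` would leave a stray `&1` behind.
_PATTERNS = [
    re.compile(r"2>&1"),                        # stderr merged into stdout
    re.compile(r"(?:&|\b\d)?>\s*/dev/null"),    # > /dev/null, &> /dev/null, 2> /dev/null
    re.compile(r"\|\s*(?:head|tail)\b[^|;&]*"), # | head -50, | tail -n 20
]


def strip_output_truncators(cmd: str) -> str:
    """Remove output-truncating pipes and redirects from a saved test command."""
    for pattern in _PATTERNS:
        cmd = pattern.sub(" ", cmd)
    # Collapse the gaps left by the removals. This also touches whitespace
    # inside quoted arguments, which a real implementation may want to avoid.
    return re.sub(r"\s{2,}", " ", cmd).strip()


saved = "python -m pytest -q 2>&1 | head -50"
cleaned = strip_output_truncators(saved)
# Appending targeted test files now extends the pytest invocation itself
# instead of becoming arguments to `head`.
targeted = f"{cleaned} -v tests/test_example.py"
```

Running the normalization before anything is appended matters because the append is plain string concatenation: whatever program sits last in the saved command receives the extra arguments.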

adithya-s-k added 4 commits May 14, 2026 15:04

v0.8.2 — CLI cleanup (push/pull/drop reward+init): scaffold branch

73e08bd

v0.8.2

b11faa4

adithya-s-k marked this pull request as ready for review May 14, 2026 09:55

tests: ruff format pass on pr_runtime test additions

8c6733a

adithya-s-k merged commit 870d708 into main May 14, 2026
5 checks passed

adithya-s-k merged 5 commits into main from v0.8.2-cli-cleanup
