
v0.8.2: CLI cleanup — push/pull + drop reward/init #17

Merged
adithya-s-k merged 5 commits into main from v0.8.2-cli-cleanup on May 14, 2026


@adithya-s-k adithya-s-k commented May 14, 2026

Closes #16.

End-to-end-validated via a live HF Hub round-trip on pallets/click (no E2B / cloud sandbox).

What's changing

| Action | Command |
|---|---|
| Add | `repo2rlenv push <local-dir> <hf://owner/dataset>`; supports `--private`, `--message` |
| Add | `repo2rlenv pull <hf://owner/dataset> [<local-dir>]`; supports `--task`, `--force` |
| Remove | `repo2rlenv reward` (Python function `repo2rlenv.reward.calculate_diff_similarity_reward` stays for training-loop users) |
| Remove | `repo2rlenv init` (replaced by README example) |
| Soft-deprecate | `--out hf://...` magic inside `generate`; emits a warning, to be removed in v0.9 |

Resulting CLI (5 commands, every one load-bearing):

repo2rlenv generate    # emit Harbor task spec → local dir
repo2rlenv validate    # check spec conformance
repo2rlenv bootstrap   # build a Docker image only (no synthesis)
repo2rlenv push        # NEW: local dir → HF Hub
repo2rlenv pull        # NEW: HF Hub → local dir
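
For reviewers who want to see the new argument surface in one place, here is a minimal `argparse` sketch of the five commands. It is not the code from this PR: `build_parser` and the destination names (`local_dir`, `repo`) are hypothetical, and the options of `generate`, `validate`, and `bootstrap` are left out because they are not shown here.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the five-command CLI listed above."""
    parser = argparse.ArgumentParser(prog="repo2rlenv")
    sub = parser.add_subparsers(dest="command", required=True)

    # Placeholders only: the real options of these commands are not shown in this PR.
    for name in ("generate", "validate", "bootstrap"):
        sub.add_parser(name)

    # push <local-dir> <hf://owner/dataset> [--private] [--message MSG]
    push = sub.add_parser("push", help="local dir -> HF Hub")
    push.add_argument("local_dir")
    push.add_argument("repo")
    push.add_argument("--private", action="store_true")
    push.add_argument("--message", default=None)

    # pull <hf://owner/dataset> [<local-dir>] [--task NAME] [--force]
    pull = sub.add_parser("pull", help="HF Hub -> local dir")
    pull.add_argument("repo")
    pull.add_argument("local_dir", nargs="?", default=None)
    pull.add_argument("--task", default=None)
    pull.add_argument("--force", action="store_true")
    return parser
```

The optional second positional on `pull` is modeled with `nargs="?"`, so `repo2rlenv pull hf://owner/dataset --task <name>` parses without a local directory.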

Live round-trip evidence (Phase 4, all green)

Real Hub round-trip captured in plans/v0.8.2_e2e.md:

Step Result
generate (pr_runtime on pallets/click) ✅ 4/8 tasks emitted
validate (local) ✅ 4/4 valid
push hf://AdithyaSK/click-r2e-v082 ✅ commit 82de5c57, registry.json published
pull hf://AdithyaSK/click-r2e-v082 ... --force ✅ 4 tasks materialized
validate (pulled copy) ✅ 4/4 valid
harbor run --agent oracle --path <pulled task> Mean = 1.000 in 5s

Harness bug surfaced + fixed during e2e

Initial Harbor oracle attempt returned Mean = 0.0 — root cause was the bootstrap agent saving test_cmds with a trailing | head -50 pipe, so targeted_test_cmds_for_pr appended test files after the pipe → broken shell. Fix: normalize_test_cmds_for_runtime now strips | head/tail, 2>&1, > /dev/null before per-runner normalization. 4 regression tests added.

Test plan

  • Implementation complete — push/pull commands wired
  • 23 new tests for push/pull argparse + URI parsing + normalize regex
  • All 464 unit tests pass; ruff check + ruff format --check clean
  • Live round-trip completed successfully against AdithyaSK/click-r2e-v082
  • Docs: README.md, CLAUDE.md, docs/quickstart.md, docs/pipelines/pr_diff.md, docs/reference/API.md, docs/reference/SPEC.md all updated
  • CI green (lint + matrix test 3.12/3.13/3.14 + build) — auto on push

After the v0.8.1 e2e runs, the actual user workflow is

  repo2rlenv generate → repo2rlenv validate → repo2rlenv push → harbor run

so the existing CLI had three pieces of dead weight:

- `repo2rlenv reward` — diff-similarity scoring as a CLI command is
  misleading. The real reward signal comes from `harbor run`. Diff
  similarity is useful for RL training loops, but those should call
  `repo2rlenv.reward.calculate_diff_similarity_reward()` directly from
  Python, not shell out per rollout.
- `repo2rlenv init` — wrote a stale 30-line YAML template nobody used.
  Same job done better by a README example.
- `generate --out hf://...` magic — hidden push baked into the destination
  string; you couldn't re-push or push an existing local dir without
  re-running generation.
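
To illustrate the "call it from Python, don't shell out per rollout" point, here is a rough sketch of in-process scoring. The signature of `calculate_diff_similarity_reward` is not shown in this PR, so the sketch uses a hypothetical stand-in, `diff_similarity`, built on the standard library's `difflib`. It is not the repo2rlenv implementation and may score differently.

```python
import difflib


def diff_similarity(candidate_patch: str, reference_patch: str) -> float:
    """Stand-in scorer (NOT repo2rlenv's function): similarity ratio in [0.0, 1.0]."""
    return difflib.SequenceMatcher(None, candidate_patch, reference_patch).ratio()


def score_rollouts(rollout_patches: list[str], reference_patch: str) -> list[float]:
    """Score every rollout in-process: one function call each, no subprocess spawn."""
    return [diff_similarity(patch, reference_patch) for patch in rollout_patches]
```

In a training loop, the real function would take the place of `diff_similarity`; the shape of the loop is the part this sketch is meant to show.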

This commit:

- Adds `repo2rlenv push <local-dir> <hf://owner/dataset>` — explicit
  publish, supports `--private` and `--message`. Wraps the existing
  `hub.push_to_hub`.
- Adds `repo2rlenv pull <hf://owner/dataset> [<local-dir>]` — fetch from
  Hub. Supports `--task <name>` (single task) and `--force`. New
  `hub.pull_from_hub` helper wraps `huggingface_hub.snapshot_download`
  and flattens the staged `tasks/<id>/` layout back to `<dir>/<id>/`.
- Removes `repo2rlenv reward` (the Python function stays for training
  loops — only the CLI wrapper goes away).
- Removes `repo2rlenv init` + the `_SAMPLE_CONFIG` template string.
- Soft-deprecates `generate --out hf://...` with a one-version warning;
  to be removed in v0.9.
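
The flattening step in `hub.pull_from_hub` can be pictured roughly as follows. This is a sketch under assumptions, not the helper added here: `flatten_tasks` and its parameters are hypothetical, it operates on an already-downloaded snapshot directory (what `huggingface_hub.snapshot_download` would return), and treating `--force` as "overwrite an existing task directory" is a guess at the flag's meaning.

```python
import shutil
from pathlib import Path
from typing import Optional


def flatten_tasks(
    snapshot_dir: Path,
    dest_dir: Path,
    task: Optional[str] = None,
    force: bool = False,
) -> list[str]:
    """Copy <snapshot_dir>/tasks/<id>/ to <dest_dir>/<id>/ and return the ids copied."""
    staged = snapshot_dir / "tasks"
    task_dirs = sorted(p for p in staged.iterdir() if p.is_dir())

    # Optional single-task filter, analogous to `pull --task <name>`.
    if task is not None:
        task_dirs = [p for p in task_dirs if p.name == task]
        if not task_dirs:
            raise FileNotFoundError(f"task {task!r} not found under {staged}")

    copied = []
    for src in task_dirs:
        target = dest_dir / src.name
        if target.exists():
            # Assumption: without force, an existing target is an error.
            if not force:
                raise FileExistsError(f"{target} exists; pass force=True to overwrite")
            shutil.rmtree(target)
        shutil.copytree(src, target)  # also creates dest_dir if it is missing
        copied.append(src.name)
    return copied
```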

Tests: 19 new in `tests/test_cli_push_pull.py` covering URI parsing
(both `hf://owner/name` and bare `owner/name`, plus malformed-input
rejection) + cmd_push / cmd_pull argument plumbing with mocked Hub I/O.
461/461 pass; ruff + format clean.
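
As an illustration of the URI behavior those tests exercise, a parser along these lines would accept both spellings and reject malformed input. The function name `parse_hf_uri` and the exact character rules in the regex are assumptions for this sketch; the real parser may be stricter or looser.

```python
import re

# Assumed shape of a repo id: two non-empty segments separated by a single slash.
_REPO_ID = re.compile(r"[A-Za-z0-9][\w.\-]*/[A-Za-z0-9][\w.\-]*")


def parse_hf_uri(uri: str) -> str:
    """Return 'owner/name' from 'hf://owner/name' or bare 'owner/name'."""
    repo_id = uri[len("hf://"):] if uri.startswith("hf://") else uri
    if not _REPO_ID.fullmatch(repo_id):
        raise ValueError(f"expected hf://owner/name or owner/name, got {uri!r}")
    return repo_id
```

Under these assumptions `hf://AdithyaSK/click-r2e-v082` and `AdithyaSK/click-r2e-v082` both resolve to the same repo id, while inputs such as `hf://owner` or `owner/name/extra` raise `ValueError`.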

Docs: README, CLAUDE.md, docs/quickstart.md, docs/pipelines/pr_diff.md,
docs/reference/API.md, docs/reference/SPEC.md all updated. New flow
documented end-to-end (generate → validate → push → pull → harbor).

Live e2e on pallets/click surfaced a real bug: the bootstrap agent
sometimes saves test_cmds with a tail-truncator pipe like
  python -m pytest -q 2>&1 | head -50
and `targeted_test_cmds_for_pr` then appended PR test files at the end,
landing them AFTER the pipe and breaking the shell:
  cd /workspace && python -m pytest -q 2>&1 | head -50 -v tests/...
                                                       ^^^^^^^^^^^^ args land here

Side effect of v0.8.1's STOP CONDITION prompt: telling the agent to
SAVE_SETUP early led some agents to embed the same truncator they were
using to keep their own diagnostic output short.

Fix in `normalize_test_cmds_for_runtime`: strip `| head/tail [...]`,
`2>&1`, `> /dev/null`, `&> /dev/null` BEFORE any per-runner normalization.
4 regression tests added.
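
The stripping step can be sketched as below. This is an approximation for discussion, not the code in `normalize_test_cmds_for_runtime`: the helper name `strip_output_truncators` and the regexes are assumptions, and it only handles the four forms named above.

```python
import re

# Order matters: "&> /dev/null" must be removed before the plain "> /dev/null"
# pattern, otherwise a stray "&" would be left behind.
_STRIP_PATTERNS = [
    re.compile(r"\s*\|\s*(?:head|tail)\b[^|;&]*"),  # "| head -50", "| tail -n 20"
    re.compile(r"\s*&>\s*/dev/null"),
    re.compile(r"\s*2>&1"),
    re.compile(r"\s*>\s*/dev/null"),
]


def strip_output_truncators(cmd: str) -> str:
    """Drop output truncation/redirection so args can be appended safely afterwards."""
    for pattern in _STRIP_PATTERNS:
        cmd = pattern.sub("", cmd)
    return cmd.strip()
```

With the truncator gone, anything appended by `targeted_test_cmds_for_pr` lands on the pytest invocation itself rather than after the pipe.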

Confirmed live: harbor run --agent oracle on the pallets/click task that
was returning Mean=0.0 now returns Mean=1.000 in 5s.
@adithya-s-k adithya-s-k marked this pull request as ready for review May 14, 2026 09:55
@adithya-s-k adithya-s-k merged commit 870d708 into main May 14, 2026
5 checks passed

Development

Successfully merging this pull request may close these issues.

v0.8.2: CLI cleanup — explicit push/pull + drop reward/init
