
v0.8.2.post1: multi-source pull (HF + Harbor + GitHub) + bare-name #19

Merged
adithya-s-k merged 5 commits into `main` from `v0.8.2.post1-multi-source-pull` on May 14, 2026

adithya-s-k (Collaborator) commented May 14, 2026

Closes #18.

A live four-route e2e run completed against the real HF Hub, GitHub, and the Harbor public registry; results are captured in `plans/v0.8.2.post1_e2e.md`.

CLI changes — one verb, four sources

```bash
# HF Hub (default)
repo2rlenv pull click-r2e                         # bare → whoami auto-resolves owner
repo2rlenv pull AdithyaSK/click-r2e               # owner/name
repo2rlenv pull AdithyaSK/click-r2e@v1.0          # @ revision pinning
repo2rlenv pull hf://AdithyaSK/click-r2e          # explicit-prefix form

# Harbor registry (NEW)
repo2rlenv pull harbor://cookbook/test            # org/name form (verified live)
repo2rlenv pull harbor://swe-bench@lite           # with version tag
repo2rlenv pull harbor://x --registry-url <url>   # custom registry

# GitHub (NEW)
repo2rlenv pull gh://owner/repo                   # scheme form
repo2rlenv pull gh://owner/repo@main              # with branch/tag/SHA pin
repo2rlenv pull https://github.com/owner/repo     # full URL also accepted
```
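For the GitHub forms, the backend is described as a shallow `git clone --depth 1`. A branch or tag pin (`@main`, `@v1.0`) maps directly onto `git clone --branch`, but `--branch` does not accept a commit SHA, so a SHA pin needs a different command sequence. The sketch below shows one way to build those commands; `clone_commands` is a hypothetical helper, not repo2rlenv's code, and it assumes GitHub will serve a single commit fetched by its full 40-character SHA.

```python
import re
from typing import Optional

# Only full 40-hex SHAs: servers generally refuse to fetch by an abbreviated SHA.
_FULL_SHA = re.compile(r"^[0-9a-f]{40}$")


def clone_commands(owner_repo: str, dest: str, ref: Optional[str] = None) -> list[list[str]]:
    """Return git argv lists for a shallow checkout of owner/repo at ref (hypothetical helper)."""
    url = f"https://github.com/{owner_repo}.git"
    if ref is None:
        # Unpinned: default branch, history truncated to one commit.
        return [["git", "clone", "--depth", "1", url, dest]]
    if _FULL_SHA.match(ref):
        # `git clone --branch` only takes branch or tag names, so fetch the commit itself.
        return [
            ["git", "init", dest],
            ["git", "-C", dest, "remote", "add", "origin", url],
            ["git", "-C", dest, "fetch", "--depth", "1", "origin", ref],
            ["git", "-C", dest, "checkout", "--detach", "FETCH_HEAD"],
        ]
    # Branch or tag pin, e.g. @main.
    return [["git", "clone", "--depth", "1", "--branch", ref, url, dest]]
```

Each returned list can be run in order with `subprocess.run(cmd, check=True)`; keeping command construction separate from execution makes the ref handling easy to unit-test offline.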

Push stays HF-only. `push ./local harbor://x` and `push ./local gh://x` emit clear redirects to `harbor publish` / `git push`.

Live e2e evidence

| Route | Outcome |
| --- | --- |
| HF bare-name roundtrip | Push `click-r2e-v082post1` → resolves to `AdithyaSK/click-r2e-v082post1`. Pull back. Validate (2/2). Harbor oracle Mean=1.000 in 7s. |
| GitHub `gh://` | Pull from seeded `adithya-s-k/r2e-tasks-smoke`. Validate (2/2). Harbor oracle Mean=1.000 in 5s. |
| GitHub full URL + `@ref` | `https://github.com/adithya-s-k/r2e-tasks-smoke@main` works identically. |
| Harbor public registry | `harbor://cookbook/test` pulled 1 real task from hub.harborframework.com. Validates as a well-formed Harbor task. |
| Push error UX | `harbor://` and `gh://` schemes emit friendly redirects pointing at the right tool (exit code 2). |

Real finding surfaced + fixed during e2e

Harbor's public registry uses `<org>/<name>` URIs (`cookbook/test`, `scale-ai/swe-atlas-qna`, `cais/swebenchpro`), not bare names. The original parser rejected slashes inside `harbor://`. Fixed: the parser now accepts both bare and org/name forms, covered by 3 new test cases.

Why `0.8.2.post1` (not v0.8.3)

Framing this as completing the v0.8.2 CLI cleanup arc, not a new feature release. Sort order: `0.8.2 < 0.8.2.post1 < 0.8.3` — a future genuine v0.8.3 is unaffected.

Test plan

  • 488/488 unit tests pass (was 464; +24 new — multi-backend URI parser, all 3 backend dispatches, error UX, layout detection, back-compat)
  • `ruff check` clean; `ruff format --check` clean
  • Live HF roundtrip with bare-name resolution
  • Live GitHub pull (gh:// + https URL, with and without @ref)
  • Live Harbor pull from the real public registry
  • Push error UX for harbor:// and gh:// schemes
  • CI green across lint + matrix test 3.12/3.13/3.14 + sdist+wheel build

After v0.8.2 shipped, the natural follow-ups were:
1. Drop the `hf://` ceremony for the common case (`owner/name` should just work)
2. Support bare names (`my-dataset` → resolve owner via whoami)
3. Pull from Harbor's registry + GitHub, not just HF Hub
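Item 2 relies on the Hub telling the CLI who is logged in. A minimal sketch of that resolution step follows, assuming any `@rev` suffix has already been split off. `resolve_repo_id` is a hypothetical name, and `whoami` is passed in so the sketch runs offline; with `huggingface_hub` installed, `HfApi().whoami` is a natural candidate, since its result includes the account's `name`.

```python
from typing import Callable


def resolve_repo_id(ref: str, whoami: Callable[[], dict]) -> str:
    """Expand 'name' to 'owner/name' via whoami(); pass 'owner/name' through unchanged."""
    parts = ref.split("/")
    if len(parts) > 2 or any(not p for p in parts):
        raise ValueError(f"expected 'name' or 'owner/name', got {ref!r}")
    if len(parts) == 2:
        return ref  # already fully qualified; no network call needed
    # Bare name: the owner is whoever the current token belongs to.
    owner = whoami()["name"]
    return f"{owner}/{ref}"
```

With a token for `AdithyaSK`, `resolve_repo_id("click-r2e", whoami)` yields `AdithyaSK/click-r2e`, matching the bare-name row in the e2e table.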

This finishes the CLI cleanup arc that v0.8.2 started.

New URI dispatcher accepts:
  name                       → HF (whoami resolves owner)
  owner/name                 → HF
  owner/name@<rev>           → HF (specific revision)
  hf://owner/name[@rev]      → HF, explicit
  harbor://name[@tag]        → Harbor registry (shells out to harbor)
  gh://owner/repo[@ref]      → GitHub (git clone --depth 1)
  https://github.com/...     → GitHub (full URL accepted)
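One way to express that dispatch table in code is sketched below. `PullSource` and `parse_source` are hypothetical names; the sketch only picks the backend and splits a trailing `@ref`, leaving backend-specific validation (for example, which Harbor names are legal) to each backend, so `harbor://org/name` passes through unchecked.

```python
from dataclasses import dataclass
from typing import Optional

_GITHUB_URL = "https://github.com/"
# Explicit scheme prefixes and the backend each one selects.
_SCHEMES = {"hf://": "hf", "harbor://": "harbor", "gh://": "github", _GITHUB_URL: "github"}


@dataclass(frozen=True)
class PullSource:
    backend: str          # "hf", "harbor", or "github"
    target: str           # repo id, registry name, or owner/repo
    ref: Optional[str]    # revision, tag, or git ref; None when unpinned


def _split_ref(body: str) -> tuple[str, Optional[str]]:
    """Split a trailing '@ref'; reject an empty target or ref such as 'x@'."""
    target, sep, ref = body.rpartition("@")
    if not sep:
        return body, None
    if not target or not ref:
        raise ValueError(f"malformed @ref in {body!r}")
    return target, ref


def parse_source(uri: str) -> PullSource:
    for prefix, backend in _SCHEMES.items():
        if uri.startswith(prefix):
            body = uri[len(prefix):]
            break
    else:
        if "://" in uri:
            raise ValueError(f"unsupported scheme in {uri!r}")
        backend, body = "hf", uri  # bare name or owner/name defaults to HF
    target, ref = _split_ref(body)
    if backend == "github":
        target = target.rstrip("/").removesuffix(".git")
        if target.count("/") != 1:
            raise ValueError(f"expected owner/repo, got {target!r}")
    return PullSource(backend, target, ref)
```

A bare name comes back as `PullSource("hf", "click-r2e", None)`, leaving owner resolution to a later whoami step.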

`cmd_pull` routes to the right backend; new `pull_from_harbor` and
`pull_from_github` helpers in `hub.py` flatten the downloaded layout to
the standard `<local-dir>/<task-id>/...` so the result is immediately
consumable by `repo2rlenv validate` and `harbor run --path`.
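The flattening step might look like the sketch below. The marker file used to recognize a task directory (`task.toml` here) is an assumption, as is copying rather than moving; `flatten_tasks` is a hypothetical helper, not the actual `pull_from_harbor` / `pull_from_github` code.

```python
import shutil
from pathlib import Path

TASK_MARKER = "task.toml"  # assumption: a file present at the root of every task dir


def flatten_tasks(download_dir: Path, local_dir: Path) -> list[str]:
    """Copy every task dir found anywhere under download_dir to local_dir/<task-id>/."""
    local_dir.mkdir(parents=True, exist_ok=True)
    # Collect all task dirs first so copies made below are never rescanned.
    task_dirs = sorted({marker.parent for marker in download_dir.rglob(TASK_MARKER)})
    copied = []
    for task_dir in task_dirs:
        task_id = task_dir.name
        dest = local_dir / task_id
        if dest.exists():
            # Two nested dirs with the same name would collide once flattened.
            raise FileExistsError(f"duplicate task id {task_id!r} at {dest}")
        shutil.copytree(task_dir, dest)
        copied.append(task_id)
    return copied
```

Files outside any task directory (a top-level README, registry metadata) are simply left behind, so `local_dir` contains only `<task-id>/...` trees.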

`cmd_push` uses the same parser but only allows HF as a target.
`push ./local harbor://x` and `push ./local gh://x` get clear redirects
pointing at `harbor publish` / `git push` instead of half-implementing
those flows.
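A minimal sketch of that guard follows. The exit code 2 comes from the e2e table; the message wording and the `check_push_target` name are illustrative assumptions, not the CLI's actual output.

```python
import sys

# Schemes that `push` refuses, mapped to the tool that should be used instead.
_PUSH_REDIRECTS = {
    "harbor://": "harbor publish",
    "gh://": "git push",
}
EXIT_USAGE = 2  # exit code reported in the PR for these redirects


def check_push_target(target: str) -> int:
    """Return 0 for an HF destination; otherwise print a redirect and return 2."""
    for prefix, tool in _PUSH_REDIRECTS.items():
        if target.startswith(prefix):
            print(f"error: push only supports HF Hub targets; for {prefix} use `{tool}`",
                  file=sys.stderr)
            return EXIT_USAGE
    return 0
```

A caller would pass a non-zero result straight to `sys.exit`, so scripts see a usage-style failure instead of a partial upload.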

Pull-specific flag: `--registry-url` for custom Harbor registries.
Pull-specific behavior: `--task <name>` is HF-only (it sets
`allow_patterns` on `snapshot_download`); it is ignored for Harbor / GitHub.
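The HF-only `--task` filter could be built as in the sketch below, where a hypothetical `build_snapshot_kwargs` turns a task name into an `allow_patterns` entry for `huggingface_hub.snapshot_download`. It assumes tasks sit at the top level of a dataset repo as `<task-id>/...` (matching the flattened local layout) and that the repo type is `dataset`; both are assumptions.

```python
from typing import Any, Optional


def build_snapshot_kwargs(repo_id: str, revision: Optional[str] = None,
                          task: Optional[str] = None) -> dict[str, Any]:
    """Keyword arguments for huggingface_hub.snapshot_download (hypothetical helper)."""
    kwargs: dict[str, Any] = {"repo_id": repo_id, "repo_type": "dataset"}  # repo type assumed
    if revision is not None:
        kwargs["revision"] = revision  # branch, tag, or commit from the @rev suffix
    if task is not None:
        # Patterns are fnmatch-style globs, where "*" also spans "/",
        # so "<task>/*" selects every file under that task directory.
        kwargs["allow_patterns"] = [f"{task}/*"]
    return kwargs


# Usage (needs network access and huggingface_hub installed; "some-task" is a placeholder):
# from huggingface_hub import snapshot_download
# snapshot_download(**build_snapshot_kwargs("AdithyaSK/click-r2e", "v1.0", task="some-task"),
#                   local_dir="./tasks")
```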

Tests: rewrote tests/test_cli_push_pull.py to cover all 4 backends +
revision pinning + error UX (40 tests). All 485 unit tests pass; ruff
+ format clean.

Live e2e against Harbor's public registry (hub.harborframework.com)
surfaced that real datasets use the `<org>/<name>` URI form — e.g.
`cookbook/test`, `scale-ai/swe-atlas-qna`, `cais/swebenchpro` — not
bare names. Original parser rejected slashes inside `harbor://`.

Fix: parser now accepts both:
  harbor://name         (bare / legacy / convenience)
  harbor://org/name     (the actual registry form)

Both with optional `@tag` version suffix. 3 new test cases added; full
e2e confirmed working with `harbor://cookbook/test` pulling 1 real task
from the public registry.
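Both accepted forms fit in a single regular expression, sketched below. The character set allowed for org, name, and tag segments is an assumption chosen to cover the registry examples in this PR (`cookbook/test`, `scale-ai/swe-atlas-qna`, `cais/swebenchpro`, `swe-bench@lite`); `HarborRef` and `parse_harbor` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

_SEG = r"[A-Za-z0-9][A-Za-z0-9._-]*"  # assumed legal characters for one segment
_HARBOR_RE = re.compile(
    rf"^harbor://(?:(?P<org>{_SEG})/)?(?P<name>{_SEG})(?:@(?P<tag>{_SEG}))?$"
)


class HarborRef(NamedTuple):
    org: Optional[str]   # None for the bare / legacy form
    name: str
    tag: Optional[str]   # optional @tag version suffix

    @property
    def dataset(self) -> str:
        """Registry identifier: 'org/name' or just 'name'."""
        return f"{self.org}/{self.name}" if self.org else self.name


def parse_harbor(uri: str) -> HarborRef:
    m = _HARBOR_RE.match(uri)
    if m is None:
        raise ValueError(f"expected harbor://name or harbor://org/name[@tag], got {uri!r}")
    return HarborRef(m["org"], m["name"], m["tag"])
```

At most one slash is accepted, so `harbor://a/b/c` is still rejected while `harbor://cookbook/test` and `harbor://swe-bench@lite` both parse.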
adithya-s-k marked this pull request as ready for review May 14, 2026 10:31
adithya-s-k merged commit 35c7efd into main May 14, 2026
5 checks passed
Linked issue: v0.8.2.post1: multi-source pull (HF + Harbor + GitHub) + bare-name resolution