Skip to content

refactor(job): hoist on_sandbox_ready backfill to AbstractTrial (#788)#789

Merged
dengwx2009 merged 5 commits into
masterfrom
feature/job2
Apr 14, 2026
Merged

refactor(job): hoist on_sandbox_ready backfill to AbstractTrial (#788)#789
dengwx2009 merged 5 commits into
masterfrom
feature/job2

Conversation

@dengwx2026

Copy link
Copy Markdown
Collaborator

Summary

  • namespace / experiment_id 回填 + 一致性校验从 HarborTrial.on_sandbox_ready 上移到 AbstractTrial.on_sandbox_ready 默认实现,BashTrial 通过继承自动获得同样行为
  • 错误信息用 type(self._config).__name__ 代替硬编码 "HarborJobConfig"
  • 新增 TestBashTrialOnSandboxReady 3 个用例(TDD:先写失败测试,再提升逻辑到基类)
  • docs/dev/job/bench-replacement.md G4 描述更新

fixes #788

Test plan

  • uv run pytest tests/unit/sdk/job/ -q — 143 passed(原 140 + BashTrial 新增 3)
  • TestHarborTrialOnSandboxReady 3 用例保持绿(无回归)
  • TestBashTrialOnSandboxReady 3 用例新增并通过
  • 修复 _make_mock_sandbox()_namespace / _experiment_id 显式置 None,避免 AsyncMock 自动子 mock 触发 ScatterOperator(size=2) 场景下的假 mismatch
  • uv run ruff check / uv run ruff format 通过

🤖 Generated with Claude Code

dengwx2026 and others added 5 commits April 14, 2026 15:12
* docs(job): clarify G1 approach as union+isinstance in bench-replacement

Align docs/dev/job/bench-replacement.md with the design decision for
Task 1: AbstractTrial.collect returns TrialResult | list[TrialResult],
and Job._build_result uses isinstance to decide extend vs append.
BashTrial stays unchanged (single), HarborTrial returns list.

* chore: baseline for bench-replacement gap fixes (222 passing)

* fix(job/G1a): HarborTrial.collect returns all sub-trial results as list

* fix(job/G1b): flatten list-returning collect() into JobResult.trial_results

* refactor(job/G1): drop BaseTrialResult alias in harbor.py, use TrialResult directly

Per user request: remove the local import alias
'from rock.sdk.job.result import TrialResult as BaseTrialResult'
and use TrialResult directly. No behavior change.

* test(job/G1): add multi-trial timeout exception-injection coverage

Addresses code review I1: exception-injection loop on multi-sub-trial
timeout previously untested. New test asserts:
- every sub-trial without its own exception_info gets ProcessTimeout
- pre-existing exception_info on a sub-trial is preserved

* fix(job/G5): populate JobResult.raw_output and exit_code from sandbox obs

* fix(job/G6): write script/nohup output under USER_DEFINED_LOGS, not /tmp

* fix(job/G7): sync HarborJobConfig.auto_stop with environment.auto_stop

OR semantics: if either the top-level HarborJobConfig.auto_stop or the
nested environment.auto_stop is True, both become True. Preserves legacy
'environment.auto_stop=True' usage while letting new code set auto_stop
at the top level.

* fix(job/G3): HarborJobConfig auto-generates job_name from dataset/task + uuid

* fix(job/G2): HarborJobConfig derives effective timeout from agent + multiplier

* fix(job/G4): AbstractTrial.on_sandbox_ready hook + HarborTrial backfills ns/exp_id

* test(job): blue-green equivalence tests for G1-G7 gap fixes

* chore(job): mark rock.sdk.bench.Job as deprecated; point users to rock.sdk.job.Job

G1-G7 gaps all closed and validated by blue-green equivalence tests.
Emits DeprecationWarning (stacklevel=2) from Job.__init__ so callers
see their own site. Scheduled for removal in 1.7.x.

Refs #783
…th (#785)

Harbor was using timestamp-based directory names because job_name was
excluded from the serialized YAML config. Re-inject job_name in
to_harbor_yaml() so harbor uses it as the job directory name, allowing
result collection via {jobs_dir}/{job_name} to work correctly.

Also migrate harbor_demo.py from deprecated rock.sdk.bench.Job to
rock.sdk.job.Job.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Hoist the namespace/experiment_id backfill + consistency check from
HarborTrial up into AbstractTrial.on_sandbox_ready so BashTrial (and any
future JobConfig-based trial) inherits the same behavior. The error
message uses type(self._config).__name__ instead of a hardcoded class
name. _make_mock_sandbox() in test_job.py now sets _namespace /
_experiment_id to None so the auto-mock children don't trip the default
backfill when two trials share one config.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
on_sandbox_ready is no longer a HarborTrial override — the backfill +
consistency check now lives on AbstractTrial and is shared across
HarborTrial and BashTrial.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@dengwx2009 dengwx2009 merged commit 10f94fe into master Apr 14, 2026
10 checks passed
zhongwen666 pushed a commit to zhongwen666/ROCK that referenced this pull request May 17, 2026
…aba#788) (alibaba#789)

* rename JobConfig and HarborTrialResult

* fix(job): close G1-G7 gaps between bench/job.py and rock.sdk.job (alibaba#784)

* docs(job): clarify G1 approach as union+isinstance in bench-replacement

Align docs/dev/job/bench-replacement.md with the design decision for
Task 1: AbstractTrial.collect returns TrialResult | list[TrialResult],
and Job._build_result uses isinstance to decide extend vs append.
BashTrial stays unchanged (single), HarborTrial returns list.

* chore: baseline for bench-replacement gap fixes (222 passing)

* fix(job/G1a): HarborTrial.collect returns all sub-trial results as list

* fix(job/G1b): flatten list-returning collect() into JobResult.trial_results

* refactor(job/G1): drop BaseTrialResult alias in harbor.py, use TrialResult directly

Per user request: remove the local import alias
'from rock.sdk.job.result import TrialResult as BaseTrialResult'
and use TrialResult directly. No behavior change.

* test(job/G1): add multi-trial timeout exception-injection coverage

Addresses code review I1: exception-injection loop on multi-sub-trial
timeout previously untested. New test asserts:
- every sub-trial without its own exception_info gets ProcessTimeout
- pre-existing exception_info on a sub-trial is preserved

* fix(job/G5): populate JobResult.raw_output and exit_code from sandbox obs

* fix(job/G6): write script/nohup output under USER_DEFINED_LOGS, not /tmp

* fix(job/G7): sync HarborJobConfig.auto_stop with environment.auto_stop

OR semantics: if either the top-level HarborJobConfig.auto_stop or the
nested environment.auto_stop is True, both become True. Preserves legacy
'environment.auto_stop=True' usage while letting new code set auto_stop
at the top level.

* fix(job/G3): HarborJobConfig auto-generates job_name from dataset/task + uuid

* fix(job/G2): HarborJobConfig derives effective timeout from agent + multiplier

* fix(job/G4): AbstractTrial.on_sandbox_ready hook + HarborTrial backfills ns/exp_id

* test(job): blue-green equivalence tests for G1-G7 gap fixes

* chore(job): mark rock.sdk.bench.Job as deprecated; point users to rock.sdk.job.Job

G1-G7 gaps all closed and validated by blue-green equivalence tests.
Emits DeprecationWarning (stacklevel=2) from Job.__init__ so callers
see their own site. Scheduled for removal in 1.7.x.

Refs alibaba#783

* fix(job): pass job_name to harbor YAML and migrate demo to new Job path (alibaba#785)

Harbor was using timestamp-based directory names because job_name was
excluded from the serialized YAML config. Re-inject job_name in
to_harbor_yaml() so harbor uses it as the job directory name, allowing
result collection via {jobs_dir}/{job_name} to work correctly.

Also migrate harbor_demo.py from deprecated rock.sdk.bench.Job to
rock.sdk.job.Job.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor(job): move on_sandbox_ready backfill to AbstractTrial

Hoist the namespace/experiment_id backfill + consistency check from
HarborTrial up into AbstractTrial.on_sandbox_ready so BashTrial (and any
future JobConfig-based trial) inherits the same behavior. The error
message uses type(self._config).__name__ instead of a hardcoded class
name. _make_mock_sandbox() in test_job.py now sets _namespace /
_experiment_id to None so the auto-mock children don't trip the default
backfill when two trials share one config.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs(job): update G4 description after hoisting on_sandbox_ready

on_sandbox_ready is no longer a HarborTrial override — the backfill +
consistency check now lives on AbstractTrial and is shared across
HarborTrial and BashTrial.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: ShixinPeng <58160712+BCeZn@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Refactor] Hoist Trial.on_sandbox_ready backfill to AbstractTrial

3 participants