refactor(job): hoist on_sandbox_ready backfill to AbstractTrial (#788)#789
Merged
Conversation
* docs(job): clarify G1 approach as union+isinstance in bench-replacement Align docs/dev/job/bench-replacement.md with the design decision for Task 1: AbstractTrial.collect returns TrialResult | list[TrialResult], and Job._build_result uses isinstance to decide extend vs append. BashTrial stays unchanged (single), HarborTrial returns list. * chore: baseline for bench-replacement gap fixes (222 passing) * fix(job/G1a): HarborTrial.collect returns all sub-trial results as list * fix(job/G1b): flatten list-returning collect() into JobResult.trial_results * refactor(job/G1): drop BaseTrialResult alias in harbor.py, use TrialResult directly Per user request: remove the local import alias 'from rock.sdk.job.result import TrialResult as BaseTrialResult' and use TrialResult directly. No behavior change. * test(job/G1): add multi-trial timeout exception-injection coverage Addresses code review I1: exception-injection loop on multi-sub-trial timeout previously untested. New test asserts: - every sub-trial without its own exception_info gets ProcessTimeout - pre-existing exception_info on a sub-trial is preserved * fix(job/G5): populate JobResult.raw_output and exit_code from sandbox obs * fix(job/G6): write script/nohup output under USER_DEFINED_LOGS, not /tmp * fix(job/G7): sync HarborJobConfig.auto_stop with environment.auto_stop OR semantics: if either the top-level HarborJobConfig.auto_stop or the nested environment.auto_stop is True, both become True. Preserves legacy 'environment.auto_stop=True' usage while letting new code set auto_stop at the top level. * fix(job/G3): HarborJobConfig auto-generates job_name from dataset/task + uuid * fix(job/G2): HarborJobConfig derives effective timeout from agent + multiplier * fix(job/G4): AbstractTrial.on_sandbox_ready hook + HarborTrial backfills ns/exp_id * test(job): blue-green equivalence tests for G1-G7 gap fixes * chore(job): mark rock.sdk.bench.Job as deprecated; point users to rock.sdk.job.Job G1-G7 gaps all closed and validated by blue-green equivalence tests. Emits DeprecationWarning (stacklevel=2) from Job.__init__ so callers see their own site. Scheduled for removal in 1.7.x. Refs #783
…th (#785) Harbor was using timestamp-based directory names because job_name was excluded from the serialized YAML config. Re-inject job_name in to_harbor_yaml() so harbor uses it as the job directory name, allowing result collection via {jobs_dir}/{job_name} to work correctly. Also migrate harbor_demo.py from deprecated rock.sdk.bench.Job to rock.sdk.job.Job. Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Hoist the namespace/experiment_id backfill + consistency check from HarborTrial up into AbstractTrial.on_sandbox_ready so BashTrial (and any future JobConfig-based trial) inherits the same behavior. The error message uses type(self._config).__name__ instead of a hardcoded class name. _make_mock_sandbox() in test_job.py now sets _namespace / _experiment_id to None so the auto-mock children don't trip the default backfill when two trials share one config. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
on_sandbox_ready is no longer a HarborTrial override — the backfill + consistency check now lives on AbstractTrial and is shared across HarborTrial and BashTrial. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
dengwx2009
approved these changes
Apr 14, 2026
zhongwen666
pushed a commit
to zhongwen666/ROCK
that referenced
this pull request
May 17, 2026
…aba#788) (alibaba#789) * rename JobConfig and HarborTrialResult * fix(job): close G1-G7 gaps between bench/job.py and rock.sdk.job (alibaba#784) * docs(job): clarify G1 approach as union+isinstance in bench-replacement Align docs/dev/job/bench-replacement.md with the design decision for Task 1: AbstractTrial.collect returns TrialResult | list[TrialResult], and Job._build_result uses isinstance to decide extend vs append. BashTrial stays unchanged (single), HarborTrial returns list. * chore: baseline for bench-replacement gap fixes (222 passing) * fix(job/G1a): HarborTrial.collect returns all sub-trial results as list * fix(job/G1b): flatten list-returning collect() into JobResult.trial_results * refactor(job/G1): drop BaseTrialResult alias in harbor.py, use TrialResult directly Per user request: remove the local import alias 'from rock.sdk.job.result import TrialResult as BaseTrialResult' and use TrialResult directly. No behavior change. * test(job/G1): add multi-trial timeout exception-injection coverage Addresses code review I1: exception-injection loop on multi-sub-trial timeout previously untested. New test asserts: - every sub-trial without its own exception_info gets ProcessTimeout - pre-existing exception_info on a sub-trial is preserved * fix(job/G5): populate JobResult.raw_output and exit_code from sandbox obs * fix(job/G6): write script/nohup output under USER_DEFINED_LOGS, not /tmp * fix(job/G7): sync HarborJobConfig.auto_stop with environment.auto_stop OR semantics: if either the top-level HarborJobConfig.auto_stop or the nested environment.auto_stop is True, both become True. Preserves legacy 'environment.auto_stop=True' usage while letting new code set auto_stop at the top level. * fix(job/G3): HarborJobConfig auto-generates job_name from dataset/task + uuid * fix(job/G2): HarborJobConfig derives effective timeout from agent + multiplier * fix(job/G4): AbstractTrial.on_sandbox_ready hook + HarborTrial backfills ns/exp_id * test(job): blue-green equivalence tests for G1-G7 gap fixes * chore(job): mark rock.sdk.bench.Job as deprecated; point users to rock.sdk.job.Job G1-G7 gaps all closed and validated by blue-green equivalence tests. Emits DeprecationWarning (stacklevel=2) from Job.__init__ so callers see their own site. Scheduled for removal in 1.7.x. Refs alibaba#783 * fix(job): pass job_name to harbor YAML and migrate demo to new Job path (alibaba#785) Harbor was using timestamp-based directory names because job_name was excluded from the serialized YAML config. Re-inject job_name in to_harbor_yaml() so harbor uses it as the job directory name, allowing result collection via {jobs_dir}/{job_name} to work correctly. Also migrate harbor_demo.py from deprecated rock.sdk.bench.Job to rock.sdk.job.Job. Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor(job): move on_sandbox_ready backfill to AbstractTrial Hoist the namespace/experiment_id backfill + consistency check from HarborTrial up into AbstractTrial.on_sandbox_ready so BashTrial (and any future JobConfig-based trial) inherits the same behavior. The error message uses type(self._config).__name__ instead of a hardcoded class name. _make_mock_sandbox() in test_job.py now sets _namespace / _experiment_id to None so the auto-mock children don't trip the default backfill when two trials share one config. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs(job): update G4 description after hoisting on_sandbox_ready on_sandbox_ready is no longer a HarborTrial override — the backfill + consistency check now lives on AbstractTrial and is shared across HarborTrial and BashTrial. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: ShixinPeng <58160712+BCeZn@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
namespace/experiment_id回填 + 一致性校验从HarborTrial.on_sandbox_ready上移到AbstractTrial.on_sandbox_ready默认实现,BashTrial通过继承自动获得同样行为type(self._config).__name__代替硬编码 "HarborJobConfig"TestBashTrialOnSandboxReady3 个用例(TDD:先写失败测试,再提升逻辑到基类)docs/dev/job/bench-replacement.mdG4 描述更新fixes #788
Test plan
uv run pytest tests/unit/sdk/job/ -q— 143 passed(原 140 + BashTrial 新增 3)TestHarborTrialOnSandboxReady3 用例保持绿(无回归)TestBashTrialOnSandboxReady3 用例新增并通过_make_mock_sandbox()把_namespace/_experiment_id显式置 None,避免AsyncMock自动子 mock 触发 ScatterOperator(size=2) 场景下的假 mismatchuv run ruff check/uv run ruff format通过🤖 Generated with Claude Code