Skip to content

refactor(cli): rework rock job run for dual-mode input with strict validation#818

Merged
dengwx2009 merged 16 commits into
masterfrom
feature/job_cli
Apr 15, 2026
Merged

refactor(cli): rework rock job run for dual-mode input with strict validation#818
dengwx2009 merged 16 commits into
masterfrom
feature/job_cli

Conversation

@dengwx2026

Copy link
Copy Markdown
Collaborator

Summary

  • rock job run 的 bash-flags 与 harbor-YAML 两种模式统一到同一条 mode-agnostic pipeline,抽取 _fail / _apply_overrides / _config_from_flags / _config_from_yaml 四个 helper,简化后续扩展
  • 收紧输入校验:--config--script/--script-content 互斥、--type harbor 必须带 --config、无输入时给出友好错误;sub-parser 的 --config 重命名为 --job_config 避免与顶层冲突;rock job 跳过 base_url/cluster 的 backfill 让 YAML 胜出
  • 新增 tests/unit/cli/command/test_job.py(674 行)系统覆盖 parser 行为与 dispatch 流程;新增 docs/dev/cli/README.md CLI 开发者文档

Test plan

  • uv run pytest tests/unit/cli/command/test_job.py
  • uv run pytest -m "not need_ray and not need_admin and not need_admin_and_network" --reruns 1
  • 手工验证 rock job run --help 输出包含 bash / harbor 双模式说明
  • 手工验证 rock job run(无参)给出友好错误信息
  • 手工验证 rock job run --job_config ./x.yml --script ./s.sh 命中互斥校验

fixes #817

dengwx2009 and others added 16 commits April 15, 2026 21:01
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Changes --timeout default so YAML timeout survives unless overridden, and
adds a rich RawDescriptionHelpFormatter description that spells out the
two mutually-exclusive modes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replaces the old bash/harbor branching in _job_run with a unified
validate → build (YAML or flags) → apply-overrides → run pipeline.
--config is now a first-class input for both bash and harbor jobs,
with the job type auto-detected by JobConfig.from_yaml.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ollision

The top-level `rock --config <ini>` flag shared dest="config" with the
job sub-parser's --config, so `rock job run --config job.yaml` routed
the YAML path into the CLI's INI loader, which failed with
"File contains no section headers". Renaming the sub-parser's flag to
--job_config (with --job-config alias) separates the two namespaces.

Updates the design spec at docs/dev/cli/README.md to match the new flag
name, and adds a regression test asserting that top-level --config and
sub-parser --job_config stay independent.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…new suite

tests/unit/sdk/job/test_cli_job.py asserted the pre-refactor behavior
(logger.error on validation failure, old --config dest, --type default
of "bash"). Its coverage is fully subsumed by the new
tests/unit/cli/command/test_job.py, which tests the same paths against
the new parser.error() + --job_config interface. The one non-overlapping
case (unknown job_command → logger.error) is moved into TestArun in the
new file.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
load_config_from_file() unconditionally backfilled args.base_url from
~/.rock/config.ini (default http://localhost:8080). That overwrote
values coming from the job YAML because _apply_overrides() sees the
non-None args.base_url and cannot distinguish a user flag from an INI
backfill. Result: user-specified base_url in job YAML was silently
ignored — sandbox requests went to localhost:8080 instead of the
YAML-provided endpoint.

For the `job` command the YAML is the source of truth; users who want
INI defaults can still pass --base-url explicitly. Other commands
(sandbox/admin/etc.) keep the backfill behavior unchanged.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@dengwx2009 dengwx2009 merged commit 82535e8 into master Apr 15, 2026
10 checks passed
BCeZn added a commit to BCeZn/ROCK that referenced this pull request Apr 16, 2026
Remove remaining auto_stop assertions in tests/unit/cli/command/test_job.py
(added by upstream alibaba#818, missed during rebase).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
BCeZn added a commit that referenced this pull request Apr 16, 2026
* refactor: remove auto_stop parameter from EnvironmentConfig

The auto_stop flag was a conditional that controlled whether the sandbox
should close after job completion. Since leaving sandboxes running is
almost never desired, remove the parameter and always close on completion.

closes #819

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fixup! refactor: remove auto_stop parameter from EnvironmentConfig

Sandbox is not closed after job completion — lifecycle is managed by
auto_clear_seconds instead. Remove the unconditional close() calls
introduced in the previous commit.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fixup! refactor: remove auto_stop parameter from EnvironmentConfig

Remove remaining auto_stop assertions in tests/unit/cli/command/test_job.py
(added by upstream #818, missed during rebase).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
zhongwen666 pushed a commit to zhongwen666/ROCK that referenced this pull request May 17, 2026
…validation (alibaba#818)

* test(cli): scaffold rock job run parser smoke test

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor(cli): add _fail() helper for consistent job-run errors

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor(cli): stash run sub-parser on JobCommand for parser.error reuse

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(cli): friendly error when rock job run has no input

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(cli): enforce --config vs --script mutual exclusion

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(cli): route --script/--script-content mutex through _fail

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(cli): reject --type harbor without --config; default --type to None

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor(cli): extract _config_from_flags helper for bash CLI-mode

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor(cli): add _config_from_yaml helper with type-consistency check

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor(cli): add _apply_overrides helper (shared bash/harbor)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(cli): clarify rock job run description; --timeout default None

Changes --timeout default so YAML timeout survives unless overridden, and
adds a rich RawDescriptionHelpFormatter description that spells out the
two mutually-exclusive modes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor(cli): unify rock job run flow through mode-agnostic pipeline

Replaces the old bash/harbor branching in _job_run with a unified
validate → build (YAML or flags) → apply-overrides → run pipeline.
--config is now a first-class input for both bash and harbor jobs,
with the job type auto-detected by JobConfig.from_yaml.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test(cli): assert rock job run --help lists both modes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(cli): rename sub-parser flag to --job_config to avoid top-level collision

The top-level `rock --config <ini>` flag shared dest="config" with the
job sub-parser's --config, so `rock job run --config job.yaml` routed
the YAML path into the CLI's INI loader, which failed with
"File contains no section headers". Renaming the sub-parser's flag to
--job_config (with --job-config alias) separates the two namespaces.

Updates the design spec at docs/dev/cli/README.md to match the new flag
name, and adds a regression test asserting that top-level --config and
sub-parser --job_config stay independent.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test(cli): remove superseded test_cli_job.py; cover arun dispatch in new suite

tests/unit/sdk/job/test_cli_job.py asserted the pre-refactor behavior
(logger.error on validation failure, old --config dest, --type default
of "bash"). Its coverage is fully subsumed by the new
tests/unit/cli/command/test_job.py, which tests the same paths against
the new parser.error() + --job_config interface. The one non-overlapping
case (unknown job_command → logger.error) is moved into TestArun in the
new file.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(cli): skip base_url/cluster backfill for job command so YAML wins

load_config_from_file() unconditionally backfilled args.base_url from
~/.rock/config.ini (default http://localhost:8080). That overwrote
values coming from the job YAML because _apply_overrides() sees the
non-None args.base_url and cannot distinguish a user flag from an INI
backfill. Result: user-specified base_url in job YAML was silently
ignored — sandbox requests went to localhost:8080 instead of the
YAML-provided endpoint.

For the `job` command the YAML is the source of truth; users who want
INI defaults can still pass --base-url explicitly. Other commands
(sandbox/admin/etc.) keep the backfill behavior unchanged.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: dengwx <wanxi.dengwx@alibaba-inc.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
zhongwen666 pushed a commit to zhongwen666/ROCK that referenced this pull request May 17, 2026
)

* refactor: remove auto_stop parameter from EnvironmentConfig

The auto_stop flag was a conditional that controlled whether the sandbox
should close after job completion. Since leaving sandboxes running is
almost never desired, remove the parameter and always close on completion.

closes alibaba#819

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fixup! refactor: remove auto_stop parameter from EnvironmentConfig

Sandbox is not closed after job completion — lifecycle is managed by
auto_clear_seconds instead. Remove the unconditional close() calls
introduced in the previous commit.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fixup! refactor: remove auto_stop parameter from EnvironmentConfig

Remove remaining auto_stop assertions in tests/unit/cli/command/test_job.py
(added by upstream alibaba#818, missed during rebase).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature] Rework rock job run CLI for dual-mode input with strict validation

2 participants