Releases: alibaba/ROCK

v1.9.4

24 Jun 12:41

What's Changed

  • feat: make startup_timeout configurable via YAML / Nacos / SDK by @zhangjaycee in #1156
  • chore(release/v1.9): backport 6 commits from master (archive cleanup, admin guard, fiber overcut, server-first OSS SDK) by @jinbai340997 in #1158
  • feat(deploy): enforce disk quota on anonymous volumes and clean up on removal by @zhangjaycee in #1159
  • update rock version to 1.9.4 by @jinbai340997 in #1160

Full Changelog: v1.9.3...v1.9.4

v1.9.1

16 Jun 12:44
bc89d22

Full Changelog: v1.9.0...v1.9.1

v1.9.0

11 Jun 11:19

What's Changed

  • fix(sdk): drop wget -c so OSS upload overwrites existing target_path by @jinbai340997 in #992
  • docs(v1.8.0): remove internal dingtalk links from release note by @jinbai340997 in #996
  • feat(k8s): Support disk quota limits for K8s operator sandbox by @Generalwin in #994
  • fix(admin): retry SandboxTable ops once on stale connection after DB restart by @zhangjaycee in #987
  • feat(config): support _base inheritance and deep merge in RockConfig by @jinbai340997 in #1003
  • refactor(sandbox): introduce SandboxStateMachine for lifecycle management by @zhangjaycee in #988
  • fix(rocklet): add per-disk usage monitoring for rootfs, log, and kat… by @jake11-oho in #983
  • feat(datasets): make 'rock datasets list' fast on cross-region OSS (#1010) by @xdlkc in #1011
  • fix(sandbox): preserve stop reason in stop() lost in #988 FSM refactor by @zhangjaycee in #1021
  • refactor(deployments): split docker run into docker create + docker start -a by @zhangjaycee in #1012
  • fix(rocklet): use cgroup metrics for container memory instead of psutil by @jake11-oho in #1017
  • docs(v1.8.x): add sandbox concurrent-creation benchmark report & scheduler guide (#1035) by @zhongwen666 in #1036
  • chore(ci): run admin+network tests only on push, skip on PRs by @BCeZn in #1040
  • Feature/add tracking config into sdk job config by @Dengsheng-wzh in #999
  • feat(deployments): share docker rootfs XFS prjid with sandbox log dir by @zhangjaycee in #1013
  • refactor(meta-store): add Redis-merge semantics to archive and filter alive-key fields by @zhangjaycee in #1037
  • feature(sandbox): support sandbox restart by @zhangjaycee in #1001
  • Feat/sdk support gpu params by @zhongwen666 in #1047
  • fix(sandbox): always write stop_time on stop() even when start_time absent by @jinbai340997 in #1020
  • feat(scheduler): RayLogCleanupTask 4-stage cleanup of session_latest/logs by @jinbai340997 in #1029
  • feature(cli): add rock storage get to download archived sandbox logs from OSS by @jinbai340997 in #962
  • docs(v1.8.0): wrap PR numbers in release notes with GitHub links by @jinbai340997 in #1041
  • fix(scheduler): split ImageCleanupTask prune (idempotent) from docuum launch by @jinbai340997 in #1023
  • feat(scheduler): DB-driven SandboxLogArchiveTask, drop sentinel file design by @jinbai340997 in #1025
  • fix(sdk): sanitize generated Harbor job names by @xdlkc in #1031
  • feat(admin): add parameter validation for API endpoints by @jake11-oho in #985
  • refactor(sandbox): make start() delegate to start_async() to fix missing meta store write by @zhangjaycee in #1051
  • perf(scheduler): switch FileCleanupTask to find -delete and add minimal path safety guards by @jinbai340997 in #967
  • docs: refresh README Updates with v1.4.0 – v1.8.0 minor-release lineup by @jinbai340997 in #1034
  • feat(admin): ops-jobs API with DB-persisted state, multi-pod safe by @jinbai340997 in #1027
  • fix(rocklet): UploadResponse returns success=false after successful upload by @jinbai340997 in #1060
  • feat(sandbox): add /delete endpoint + STOPPED → DELETED for --rm containers by @zhangjaycee in #1038
  • fix(sandbox): handle actor not found in RayOperator.get_status() by @zhangjaycee in #1062
  • fix(rocklet): pass sandbox_id in body to /execute and /read_file (#1065) by @jake11-oho in #1066
  • fix(sandbox): handle K8s CRD not found in K8sOperator.get_status() by @zhangjaycee in #1068
  • Fix exclude_dirs not working for empty directory cleanup by @jinbai340997 in #1072
  • Fix docuum not restarting due to PID/TID reuse in check_pid_exists by @jinbai340997 in #1074

Full Changelog: v1.8.0...v1.9.0

v1.8.3

01 Jun 08:06
194f02b

Full Changelog: v1.8.1...v1.8.3

v1.8.1

22 May 16:13

What's Changed

  • fix(config): support _base inheritance and deep merge in RockConfig #1003

v1.8.0

21 May 09:45

What's Changed

  • chore: remove the need_database marker by @zhangjaycee in #901
  • Bump version to 1.7.1 by @zhangjaycee in #909
  • Fix sandbox _get_user_info metrics by @zhangjaycee in #911
  • fix(metrics): read sandbox image from meta_store instead of in-memory dict by @zhangjaycee in #913
  • feat(scheduler): add dynamic config reloading via Nacos #888 by @zhongwen666 in #889
  • fix(metrics): pass rock_config to SandboxTable/SandboxMetaStore so MetricsMonitor uses the correct metrics endpoint by @zhangjaycee in #919
  • docs(1.7.x): clarify install-agent vs Job, reorganize examples/ (#925) by @BCeZn in #926
  • hotfix the v1.7.x release metrics performance issue by reverting the recent metrics changes by @zhangjaycee in #927
  • feat(cli): add -v verbosity control and unify log level management by @berstpander in #907
  • feat: add startup timing instrumentation for sandbox launch stages by @jake11-oho in #924
  • fix(rocklet): mount loop disk to docker data-root instead of hardcode… by @jake11-oho in #933
  • feat(model-service): proxy supports stream + replay, byte passthrough, ForwardBackend/ReplayBackend by @BCeZn in #935
  • feat(rocklet): add Windows PowerShell support #921 by @zhongwen666 in #922
  • fix(rocklet): symlink mount into /bin for nix images with kata runtime by @jake11-oho in #936
  • feat(job): integrate in-sandbox model-service proxy for record/replay by @BCeZn in #938
  • fix(sdk): mkdir target parent dir before wget in OSS upload path by @BCeZn in #940
  • feat(metrics): pass rock_config for correct endpoint, add export/logging visibility, and reduce metastore metric points by @zhangjaycee in #920
  • feat(docker): cleanup XFS project quota on container stop by @zhangjaycee in #941
  • fix: handle exception caused by ray.init during ray reconnecting back… by @Dengsheng-wzh in #905
  • [REFACTOR] Decouple OSS upload/download from client env vars; 3-layer config resolution mechanism (#943) by @BCeZn in #949
  • chore(sdk): make sandbox cluster default configurable via env var by @Issac-Newton in #948
  • docs(scheduler): add scheduler user guide for v1.7.x #974 by @zhongwen666 in #975
  • feat(oss): add dual-account STS and migrate transfer bucket to chatos-rock by @jinbai340997 in #953
  • fix(rocklet): use cgroup metrics for container CPU instead of psutil (#945) by @jake11-oho in #946
  • feature(oss): add archive command builder + OSS config fields for sandbox log archival by @jinbai340997 in #957
  • feature(scheduler): add RayLogCleanupTask and disable worker-to-driver log forwarding by @jinbai340997 in #971
  • feat(sandbox): add CPU overcommit with grayscale rollout, lifecycle summary, and absolute-cores CPU gauge #978 by @zhongwen666 in #979
  • feat(sandbox): support include_all_states parameter in get_status API by @zhangjaycee in #951
  • feat(k8s): GPU support with Jinja2 templates and extensible accelerator types #980 by @zhongwen666 in #981
  • feature(scheduler): add BuildCacheCleanupTask for uv/pip cache pruning by @jinbai340997 in #969
  • feature(scheduler): merge dangling/BuildKit prune into ImageCleanupTask by @jinbai340997 in #970
  • docs: add v1.8.0 release version (CN + EN) by @jinbai340997 in #990
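
The rocklet fix above (#945) switches container CPU metrics from psutil to cgroup data, but the notes do not say which files or fields are read. As a hedged sketch of one common approach, the code below assumes cgroup v2, where a container's cpu.stat file exposes a cumulative usage_usec counter, and derives average cores used from two samples. The function names and sample text are hypothetical and are not taken from ROCK.

```python
def parse_usage_usec(cpu_stat_text: str) -> int:
    """Extract the cumulative usage_usec counter from cgroup v2 cpu.stat text."""
    for line in cpu_stat_text.splitlines():
        key, _, value = line.partition(" ")
        if key == "usage_usec":
            return int(value)
    raise ValueError("usage_usec not found in cpu.stat")


def cpu_cores_used(prev_usec: int, curr_usec: int, elapsed_seconds: float) -> float:
    """Average number of CPU cores consumed between two samples."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return (curr_usec - prev_usec) / (elapsed_seconds * 1_000_000)


# Made-up samples standing in for two reads of a container's cpu.stat, one second apart.
first_sample = "usage_usec 1000000\nuser_usec 600000\nsystem_usec 400000\n"
second_sample = "usage_usec 2500000\nuser_usec 1500000\nsystem_usec 1000000\n"

cores = cpu_cores_used(
    parse_usage_usec(first_sample),
    parse_usage_usec(second_sample),
    elapsed_seconds=1.0,
)
```

Reading the counter from the container's own cgroup scopes the figure to that container, whereas process-level sampling can miss short-lived children.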

Full Changelog: v1.7.0...v1.8.0

v1.7.0

08 May 03:51

What's Changed

  • fix(docker-auth): reconstruct temporary directory authorization scheme and remove traditional scheme (#81160079) by @jinbai340997 in #837
  • docs: replace News with Updates section, trim to 5 latest versions by @dengwx2026 in #850
  • [Chore] database connection unittest and parameter optimizations by @zhangjaycee in #852
  • feat(datasets): add datasets SDK and CLI commands for OSS registry by @dengwx2026 in #859
  • fix(test): clean up leaked timers in model client tests by @guoj14 in #839
  • feat(sandbox): enforce container rootfs disk limit via Docker storage-opt by @zhangjaycee in #860
  • fix(uv-env): copy project to writable dir before install in container by @jinbai340997 in #857
  • fix(proxy): forward whitelisted headers in WebSocket proxy upstream handshake (#865) by @Issac-Newton in #866
  • feat(datasets): add tasks subcommand with file task support and improved output by @berstpander in #875
  • feat(docker): mount host zoneinfo for container timezone (#863) by @Issac-Newton in #877
  • fix(sdk): fix auto_clear_time conversion and wait_interval boundary #882 by @zhongwen666 in #883
  • fix: preserve non-JSON request body in http_proxy endpoint by @Dengsheng-wzh in #881
  • Support meta store and database operation metrics by @zhangjaycee in #887
  • fix(proxy): block cookie header in WebSocket forwarding by @Issac-Newton in #894
  • update version by @Dengsheng-wzh in #893
  • docs: add v1.7.0 release notes (#896) by @Issac-Newton in #897
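
Two proxy fixes in this release concern which headers reach the upstream during the WebSocket handshake: whitelisted headers are forwarded (#865) and the cookie header is blocked (#894). The sketch below shows one plausible shape for such a filter. The allowed and blocked sets and the function name are hypothetical, since the notes do not list the actual headers.

```python
from typing import Dict, FrozenSet

# Hypothetical sets for illustration; the real lists are not given in these notes.
ALLOWED_HEADERS: FrozenSet[str] = frozenset({"authorization", "x-request-id"})
BLOCKED_HEADERS: FrozenSet[str] = frozenset({"cookie"})


def filter_upstream_headers(
    headers: Dict[str, str],
    allowed: FrozenSet[str] = ALLOWED_HEADERS,
    blocked: FrozenSet[str] = BLOCKED_HEADERS,
) -> Dict[str, str]:
    """Keep only whitelisted headers and never forward blocked ones.

    Header names are compared case-insensitively; original casing is preserved.
    """
    forwarded: Dict[str, str] = {}
    for name, value in headers.items():
        lowered = name.lower()
        if lowered in blocked:
            continue  # blocked headers are dropped even if also whitelisted
        if lowered in allowed:
            forwarded[name] = value
    return forwarded


incoming = {"Authorization": "Bearer t", "Cookie": "sid=1", "X-Other": "v"}
outgoing = filter_upstream_headers(incoming)
```

Checking the block list before the allow list means a header that ends up in both sets is still dropped.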

Full Changelog: v1.6.0...v1.7.0

v1.6.0

20 Apr 10:26

What's Changed

  • fix missing redis provider in k8s #764 by @Generalwin in #765
  • fix arun normal mode #767 by @FangwenDave in #768
  • Feature/add bash example by @StephenRi in #772
  • Bump master version to 1.5.1 and update 1.5.1 release note by @zhangjaycee in #774
  • [FEATURE] Refactor Job module: Job/Operator/Executor/Trial abstraction (#779) by @dengwx2026 in #780
  • fix: add SELECT 1 readiness check to pg_container fixture by @zhangjaycee in #778
  • refactor(job): hoist on_sandbox_ready backfill to AbstractTrial (#788) by @dengwx2026 in #789
  • feat: add TemplateConfig and template field to NativeConfig by @sanfeng-lhh in #786
  • feat: truncate path segments in auto-generated job_name by @berstpander in #791
  • fix: enlarge SandboxRecord.image to 512 and disable asyncpg statement cache by @zhangjaycee in #794
  • Feature/xinshi/harbor jobs by @BCeZn in #798
  • refactor(envhub): move JobEnvironmentConfig to envhub as EnvironmentC… by @BCeZn in #800
  • Refactor/envhub uploads by @BCeZn in #802
  • feature: add claw-eval bash job demo by @BCeZn in #804
  • fix: increase default JobConfig timeout from 3600s to 7200s by @BCeZn in #806
  • fix(job): update timeout sentinel and tests after default increased to 7200s by @BCeZn in #810
  • fix(job): BashTrial.collect should populate raw_output and exit_code by @BCeZn in #808
  • chore: apply ruff format and translate Chinese comments to English by @BCeZn in #812
  • feat(job): auto-detect job type from YAML via strict model validation by @BCeZn in #814
  • refactor(cli): rework rock job run for dual-mode input with strict validation by @dengwx2026 in #818
  • refactor: remove auto_stop parameter from EnvironmentConfig by @BCeZn in #820
  • fix(job): JobConfig.experiment_id takes priority over environment.experiment_id (#821) by @dengwx2026 in #822
  • chore: bump version to 1.6.0 by @dengwx2026 in #827
  • docs: update README latest updates table for v1.5.1 and v1.4.7 by @dengwx2026 in #829
  • feat(job): BashJob OSS Mirror — artifact upload after job completion by @BCeZn in #823
  • fix(job): remove redundant shebang and set -e from BashTrial.build() by @BCeZn in #816
  • fix(cli): lazy import psutil in admin stop command by @dengwx2026 in #831
  • feat(ci): add unit test workflow for TS SDK by @guoj14 in #796
  • fix(job): sync experiment_id to environment when environment.experiment_id is None by @dengwx2026 in #835
  • docs: add v1.6.0 release notes and register 1.6.x docs version (#840) by @Issac-Newton in #841
  • fix iflow agent version by @Issac-Newton in #844

Full Changelog: v1.5.0...v1.6.0

v1.5.1

13 Apr 07:18
d76bf55

Full Changelog: v1.5.0...v1.5.1

v1.5.0

13 Apr 07:15
c96c1b9

Full Changelog: v1.4.0...v1.5.0