Releases: alibaba/ROCK
v1.9.4
What's Changed
- feat: make startup_timeout configurable via YAML / Nacos / SDK by @zhangjaycee in #1156
- chore(release/v1.9): backport 6 commits from master (archive cleanup, admin guard, fiber overcut, server-first OSS SDK) by @jinbai340997 in #1158
- feat(deploy): enforce disk quota on anonymous volumes and clean up on removal by @zhangjaycee in #1159
- update rock version to 1.9.4 by @jinbai340997 in #1160
Full Changelog: v1.9.3...v1.9.4
v1.9.1
What's Changed
- feat(k8s): support CPU overcommit via separate limit_cpus (#1113) by @zhongwen666 in #1114
Full Changelog: v1.9.0...v1.9.1
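The `limit_cpus` entry above can be read against the usual Kubernetes mechanism for CPU overcommit: a container's CPU request (what the scheduler reserves) is set lower than its CPU limit (the ceiling it may burst to). The sketch below only illustrates that idea and is not taken from the ROCK code base. The helper name `build_cpu_resources`, the mapping of `cpus` to the request, and the mapping of `limit_cpus` to the limit are assumptions.

```python
from typing import Optional


def build_cpu_resources(cpus: float, limit_cpus: Optional[float] = None) -> dict[str, dict[str, str]]:
    """Build a Kubernetes-style `resources` mapping for one container.

    Hypothetical helper: `cpus` is treated as the CPU request and
    `limit_cpus` as the CPU limit. When `limit_cpus` is omitted, the
    request and limit are equal, which means no overcommit.
    """
    if cpus <= 0:
        raise ValueError("cpus must be positive")
    limit = cpus if limit_cpus is None else limit_cpus
    # Kubernetes rejects a container whose request exceeds its limit.
    if limit < cpus:
        raise ValueError("limit_cpus must not be lower than cpus")

    def to_millicores(value: float) -> str:
        # Express CPU quantities in millicores, e.g. 0.5 -> "500m".
        return f"{round(value * 1000)}m"

    return {
        "requests": {"cpu": to_millicores(cpus)},
        "limits": {"cpu": to_millicores(limit)},
    }


# Reserve 1 core for scheduling but allow bursting up to 4 cores.
resources = build_cpu_resources(cpus=1, limit_cpus=4)
```

With a request of 1 core and a limit of 4, the scheduler can pack more sandboxes onto a node than the limits alone would allow, at the cost of possible CPU contention under load.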
v1.9.0
What's Changed
- fix(sdk): drop wget -c so OSS upload overwrites existing target_path by @jinbai340997 in #992
- docs(v1.8.0): remove internal dingtalk links from release note by @jinbai340997 in #996
- feat(k8s): Support disk quota limits for K8s operator sandbox by @Generalwin in #994
- fix(admin): retry SandboxTable ops once on stale connection after DB restart by @zhangjaycee in #987
- feat(config): support `_base` inheritance and deep merge in RockConfig by @jinbai340997 in #1003
- refactor(sandbox): introduce SandboxStateMachine for lifecycle management by @zhangjaycee in #988
- fix(rocklet): add per-disk usage monitoring for rootfs, log, and kat… by @jake11-oho in #983
- feat(datasets): make 'rock datasets list' fast on cross-region OSS (#1010) by @xdlkc in #1011
- fix(sandbox): preserve stop reason in stop() lost in #988 FSM refactor by @zhangjaycee in #1021
- refactor(deployments): split `docker run` into `docker create` + `docker start -a` by @zhangjaycee in #1012
- fix(rocklet): use cgroup metrics for container memory instead of psutil by @jake11-oho in #1017
- docs(v1.8.x): add sandbox concurrent-creation benchmark report & scheduler guide (#1035) by @zhongwen666 in #1036
- chore(ci): run admin+network tests only on push, skip on PRs by @BCeZn in #1040
- Feature/add tracking config into sdk job config by @Dengsheng-wzh in #999
- feat(deployments): share docker rootfs XFS prjid with sandbox log dir by @zhangjaycee in #1013
- refactor(meta-store): add Redis-merge semantics to archive and filter alive-key fields by @zhangjaycee in #1037
- feature(sandbox): support sandbox restart by @zhangjaycee in #1001
- Feat/sdk support gpu params by @zhongwen666 in #1047
- fix(sandbox): always write stop_time on stop() even when start_time absent by @jinbai340997 in #1020
- feat(scheduler): RayLogCleanupTask 4-stage cleanup of session_latest/logs by @jinbai340997 in #1029
- feature(cli): add `rock storage get` to download archived sandbox logs from OSS by @jinbai340997 in #962
- docs(v1.8.0): wrap PR numbers in release notes with GitHub links by @jinbai340997 in #1041
- fix(scheduler): split ImageCleanupTask prune (idempotent) from docuum launch by @jinbai340997 in #1023
- feat(scheduler): DB-driven SandboxLogArchiveTask, drop sentinel file design by @jinbai340997 in #1025
- fix(sdk): sanitize generated Harbor job names by @xdlkc in #1031
- feat(admin): add parameter validation for API endpoints by @jake11-oho in #985
- refactor(sandbox): make start() delegate to start_async() to fix missing meta store write by @zhangjaycee in #1051
- perf(scheduler): switch FileCleanupTask to find -delete and add minimal path safety guards by @jinbai340997 in #967
- docs: refresh README Updates with v1.4.0 – v1.8.0 minor-release lineup by @jinbai340997 in #1034
- feat(admin): ops-jobs API with DB-persisted state, multi-pod safe by @jinbai340997 in #1027
- fix(rocklet): UploadResponse returns success=false after successful upload by @jinbai340997 in #1060
- feat(sandbox): add /delete endpoint + STOPPED → DELETED for --rm containers by @zhangjaycee in #1038
- fix(sandbox): handle actor not found in RayOperator.get_status() by @zhangjaycee in #1062
- fix(rocklet): pass sandbox_id in body to /execute and /read_file (#1065) by @jake11-oho in #1066
- fix(sandbox): handle K8s CRD not found in K8sOperator.get_status() by @zhangjaycee in #1068
- Fix exclude_dirs not working for empty directory cleanup by @jinbai340997 in #1072
- Fix docuum not restarting due to PID/TID reuse in check_pid_exists by @jinbai340997 in #1074
Full Changelog: v1.8.0...v1.9.0
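The "`_base` inheritance and deep merge" entry in v1.9.0 (#1003) describes a common configuration pattern, which can be sketched roughly as follows. This is not RockConfig's actual implementation: the function names `deep_merge` and `resolve_config`, the in-memory `configs` registry, and the rule that child values override base values are all assumptions made for illustration.

```python
from copy import deepcopy
from typing import Any


def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict with `override` merged recursively into `base`."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars, lists, and type mismatches: the override wins.
            merged[key] = deepcopy(value)
    return merged


def resolve_config(name: str, configs: dict[str, dict[str, Any]]) -> dict[str, Any]:
    """Resolve a hypothetical `_base` key by merging a config over its parent.

    No cycle detection is performed; a real implementation would need it.
    """
    config = dict(configs[name])
    base_name = config.pop("_base", None)
    if base_name is None:
        return deepcopy(config)
    return deep_merge(resolve_config(base_name, configs), config)


# Illustrative data only; these keys are not taken from ROCK.
configs = {
    "default": {"sandbox": {"cpus": 2, "memory": "8g"}, "log_level": "INFO"},
    "prod": {"_base": "default", "sandbox": {"cpus": 4}},
}
resolved = resolve_config("prod", configs)
```

The point of a deep merge, as opposed to a shallow `dict.update`, is that overriding `sandbox.cpus` in the child does not discard `sandbox.memory` inherited from the base.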
v1.8.3
Full Changelog: v1.8.1...v1.8.3
v1.8.1
v1.8.0
- chore: remove the need_database marker by @zhangjaycee in #901
- Bump version to 1.7.1 by @zhangjaycee in #909
- Fix sandbox _get_user_info metrics by @zhangjaycee in #911
- fix(metrics): read sandbox image from meta_store instead of in-memory dict by @zhangjaycee in #913
- feat(scheduler): add dynamic config reloading via Nacos #888 by @zhongwen666 in #889
- fix(metrics): pass rock_config to SandboxTable/SandboxMetaStore so MetricsMonitor uses the correct metrics endpoint by @zhangjaycee in #919
- docs(1.7.x): clarify install-agent vs Job, reorganize examples/ (#925) by @BCeZn in #926
- hotfix the v1.7.x release metrics performance issue by reverting the recent metrics changes by @zhangjaycee in #927
- feat(cli): add -v verbosity control and unify log level management by @berstpander in #907
- feat: add startup timing instrumentation for sandbox launch stages by @jake11-oho in #924
- fix(rocklet): mount loop disk to docker data-root instead of hardcode… by @jake11-oho in #933
- feat(model-service): proxy supports stream + replay, byte passthrough, ForwardBackend/ReplayBackend by @BCeZn in #935
- feat(rocklet): add Windows PowerShell support #921 by @zhongwen666 in #922
- fix(rocklet): symlink mount into /bin for nix images with kata runtime by @jake11-oho in #936
- feat(job): integrate in-sandbox model-service proxy for record/replay by @BCeZn in #938
- fix(sdk): mkdir target parent dir before wget in OSS upload path by @BCeZn in #940
- feat(metrics): pass rock_config for correct endpoint, add export/logging visibility, and reduce metastore metric points by @zhangjaycee in #920
- feat(docker): cleanup XFS project quota on container stop by @zhangjaycee in #941
- fix: handle exception caused by ray.init during ray reconnecting back… by @Dengsheng-wzh in #905
- [REFACTOR] Decouple OSS upload/download from client env vars; 3-layer config resolution mechanism (#943) by @BCeZn in #949
- chore(sdk): make sandbox cluster default configurable via env var by @Issac-Newton in #948
- docs(scheduler): add scheduler user guide for v1.7.x #974 by @zhongwen666 in #975
- feat(oss): add dual-account STS and migrate transfer bucket to chatos-rock by @jinbai340997 in #953
- fix(rocklet): use cgroup metrics for container CPU instead of psutil (#945) by @jake11-oho in #946
- feature(oss): add archive command builder + OSS config fields for sandbox log archival by @jinbai340997 in #957
- feature(scheduler): add RayLogCleanupTask and disable worker-to-driver log forwarding by @jinbai340997 in #971
- feat(sandbox): add CPU overcommit with grayscale rollout, lifecycle summary, and absolute-cores CPU gauge #978 by @zhongwen666 in #979
- feat(sandbox): support include_all_states parameter in get_status API by @zhangjaycee in #951
- feat(k8s): GPU support with Jinja2 templates and extensible accelerator types #980 by @zhongwen666 in #981
- feature(scheduler): add BuildCacheCleanupTask for uv/pip cache pruning by @jinbai340997 in #969
- feature(scheduler): merge dangling/BuildKit prune into ImageCleanupTask by @jinbai340997 in #970
- docs: add v1.8.0 release version (CN + EN) by @jinbai340997 in #990
Full Changelog: v1.8.0...v1.8.0
v1.7.0
What's Changed
- fix(docker-auth): reconstruct temporary directory authorization scheme and remove traditional scheme (#81160079) by @jinbai340997 in #837
- docs: replace News with Updates section, trim to 5 latest versions by @dengwx2026 in #850
- [Chore] database connection unittest and parameter optimizations by @zhangjaycee in #852
- feat(datasets): add datasets SDK and CLI commands for OSS registry by @dengwx2026 in #859
- fix(test): clean up leaked timers in model client tests by @guoj14 in #839
- feat(sandbox): enforce container rootfs disk limit via Docker storage-opt by @zhangjaycee in #860
- fix(uv-env): copy project to writable dir before install in container by @jinbai340997 in #857
- fix(proxy): forward whitelisted headers in WebSocket proxy upstream handshake (#865) by @Issac-Newton in #866
- feat(datasets): add tasks subcommand with file task support and improved output by @berstpander in #875
- feat(docker): mount host zoneinfo for container timezone (#863) by @Issac-Newton in #877
- fix(sdk): fix auto_clear_time conversion and wait_interval boundary #882 by @zhongwen666 in #883
- fix: preserve non-JSON request body in http_proxy endpoint by @Dengsheng-wzh in #881
- Support meta store and database operation metrics by @zhangjaycee in #887
- fix(proxy): block cookie header in WebSocket forwarding by @Issac-Newton in #894
- update version by @Dengsheng-wzh in #893
- docs: add v1.7.0 release notes (#896) by @Issac-Newton in #897
New Contributors
- @jinbai340997 made their first contribution in #837
Full Changelog: v1.6.0...v1.7.0
v1.6.0
What's Changed
- fix missing redis provider in k8s #764 by @Generalwin in #765
- fix arun normal mode #767 by @FangwenDave in #768
- Feature/add bash example by @StephenRi in #772
- Bump master version to 1.5.1 and update 1.5.1 release note by @zhangjaycee in #774
- [FEATURE] Refactor Job module: Job/Operator/Executor/Trial abstraction (#779) by @dengwx2026 in #780
- fix: add SELECT 1 readiness check to pg_container fixture by @zhangjaycee in #778
- refactor(job): hoist on_sandbox_ready backfill to AbstractTrial (#788) by @dengwx2026 in #789
- feat: add TemplateConfig and template field to NativeConfig by @sanfeng-lhh in #786
- feat: truncate path segments in auto-generated job_name by @berstpander in #791
- fix: enlarge SandboxRecord.image to 512 and disable asyncpg statement cache by @zhangjaycee in #794
- Feature/xinshi/harbor jobs by @BCeZn in #798
- refactor(envhub): move JobEnvironmentConfig to envhub as EnvironmentC… by @BCeZn in #800
- Refactor/envhub uploads by @BCeZn in #802
- feature: add claw-eval bash job demo by @BCeZn in #804
- fix: increase default JobConfig timeout from 3600s to 7200s by @BCeZn in #806
- fix(job): update timeout sentinel and tests after default increased to 7200s by @BCeZn in #810
- fix(job): BashTrial.collect should populate raw_output and exit_code by @BCeZn in #808
- chore: apply ruff format and translate Chinese comments to English by @BCeZn in #812
- feat(job): auto-detect job type from YAML via strict model validation by @BCeZn in #814
- refactor(cli): rework `rock job run` for dual-mode input with strict validation by @dengwx2026 in #818
- refactor: remove auto_stop parameter from EnvironmentConfig by @BCeZn in #820
- fix(job): JobConfig.experiment_id takes priority over environment.experiment_id (#821) by @dengwx2026 in #822
- chore: bump version to 1.6.0 by @dengwx2026 in #827
- docs: update README latest updates table for v1.5.1 and v1.4.7 by @dengwx2026 in #829
- feat(job): BashJob OSS Mirror — artifact upload after job completion by @BCeZn in #823
- fix(job): remove redundant shebang and set -e from BashTrial.build() by @BCeZn in #816
- fix(cli): lazy import psutil in admin stop command by @dengwx2026 in #831
- feat(ci): add unit test workflow for TS SDK by @guoj14 in #796
- fix(job): sync experiment_id to environment when environment.experiment_id is None by @dengwx2026 in #835
- docs: add v1.6.0 release notes and register 1.6.x docs version (#840) by @Issac-Newton in #841
- fix iflow agent version by @Issac-Newton in #844
Full Changelog: v1.5.0...v1.6.0
v1.5.1
What's Changed
- [1.5] fix missing redis provider in k8s by @zhangjaycee in #766
- Bump version to 1.5.1 by @zhangjaycee in #769
Full Changelog: v1.5.0...v1.5.1
v1.5.0
What's Changed
- feat: add hostname to metrics tag #636 by @FangwenDave in #637
- chore: add release notes for version 1.4.1 and bump version number #638 by @FangwenDave in #639
- feat: support pool and template mapping by @Generalwin in #635
- upload to specified oss by @zhongwen666 in #642
- Doc/v1.4.2 by @zhongwen666 in #646
- bugfix: correct memory size error message in sandbox manager #647 by @FangwenDave in #648
- add proxy interface by @zhongwen666 in #655
- feat: move pool config from config file to nacos #659 by @Generalwin in #656
- Doc/v1.4.3 by @zhongwen666 in #661
- add image pull task by @zhongwen666 in #668
- The image cleanup task supports pinned images. by @zhongwen666 in #674
- Fix/increas docker check timeout by @zhongwen666 in #678
- Doc/version 1.4.4 by @zhongwen666 in #680
- feat(proxy): websocket proxy support user-specified port; http proxy support all methods by @xdlkc in #666
- Fix/clear deprecated by @zhongwen666 in #687
- support Agent Run by @BCeZn in #681
- feat: support configurable Ray temp directory (#693) by @dengwx2026 in #694
- refactor: move constants and remove unused config fields by @BCeZn in #692
- fix: change Ray temp_dir to .tmp/ray (#695) by @dengwx2026 in #696
- fix tests for can't run by @BCeZn in #700
- refactor: add user defined logs path and ensure directory creation by @BCeZn in #702
- docs: update Latest Updates section with version history (#705) by @dengwx2026 in #706
- Refactor/xinshi/job envs by @BCeZn in #704
- Added TypeScript SDK by @Timandes in #492
- refactor: adjust sandbox resource limits in conftest.py by @BCeZn in #710
- Refactor/xinshi/oss env for harbor by @BCeZn in #715
- feat(sdk): Job OSS artifact mirror (OssMirrorConfig) (#707) by @xdlkc in #708
- feat: validate and sync experiment_id/namespace in JobConfig (#716) by @dengwx2026 in #717
- feat: rock agent support verifier native mode by @sanfeng-lhh in #722
- kata runtime support dind by @zhongwen666 in #731
- feat: restore CI request-triggered workflow configuration by @guoj14 in #728
- feat: add oss_deps field to EnvironmentConfig by @BCeZn in #734
- fix: pin langgraph-prebuilt to 1.0.8 to fix CI error by @zhangjaycee in #745
- fix: pin docusaurus/theme-common version to fix npm build of docs by @zhangjaycee in #747
- Support persisting sandbox metadata to database by @zhangjaycee in #730
- feat: refactor k8s api client informer #712 by @Generalwin in #744
- fix sandbox info and redis info #712 by @Generalwin in #743
- feat: add skills for docs by @kkkky123 in #719
- add v1.4.8 docs by @Issac-Newton in #738
- docs: rename add-doc-version skill to rock-docs by @kkkky123 in #752
- Feature/xinshi/container verify by @BCeZn in #755
- feat: add labels support to JobConfig (#720) by @xdlkc in #721
- feat: auto-generate job_name from dataset and task info by @berstpander in #757
- add auto_delete_seconds in SandboxConfig by @Issac-Newton in #599
- Release note v1.5.0 by @zhangjaycee in #760
New Contributors
- @sanfeng-lhh made their first contribution in #722
- @berstpander made their first contribution in #757
Full Changelog: v1.4.0...v1.5.0