-
Notifications
You must be signed in to change notification settings - Fork 36
[codex] fix vllm prefix caching defaults #263
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -400,6 +400,9 @@ def build_vllm_cmd_and_env(server_args: dict[str, Any]) -> tuple[list[str], dict | |||||||||
| if getattr(args, "use_rollout_routing_replay", False): | ||||||||||
| cmd += ["--enable-return-routed-experts"] | ||||||||||
|
|
||||||||||
| if getattr(args, "vllm_enable_prefix_caching", True) is not False: | ||||||||||
| cmd += ["--enable-prefix-caching"] | ||||||||||
|
Comment on lines
+403
to
+404
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Correctness Issue: Default-on prefix caching is broken in productionIn production, As a result:
The unit test Suggested FixTo fix this robustly, we should set the default value of # In vime/backends/vllm_utils/arguments.py -> vllm_parse_args()
args, _ = parser.parse_known_args()
user_provided, raw_values = _detect_user_provided_dests(parser, sys.argv[1:])
if "vllm_enable_prefix_caching" not in user_provided:
args.vllm_enable_prefix_caching = TrueThis ensures that:
Suggested change
|
||||||||||
|
|
||||||||||
| # gpu_memory_utilization: no vime-forced default. In colocate, training and rollout do not | ||||||||||
| # occupy the GPU simultaneously (sleep/offload cycles), so vLLM's own default is fine. A user | ||||||||||
| # value passed via --vllm-gpu-memory-utilization is auto-forwarded by _forward_vllm_cli_args. | ||||||||||
|
|
||||||||||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test Gap: Mock does not reflect production behavior
This test passes because
vllm_argsis a mockSimpleNamespacethat lacks thevllm_enable_prefix_cachingattribute. However, in production,argsis parsed viavllm_parse_args(), which populatesvllm_enable_prefix_cachingasFalseby default (due tostore_trueaction behavior).To prevent such false positives in the future, consider updating the test fixture or adding a test case that uses parsed arguments (e.g., by calling
vllm_parse_args()or simulating a parsed namespace wherevllm_enable_prefix_cachingisFalsebut not explicitly provided by the user).