[codex] fix vllm prefix caching defaults#263
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces changes to enable prefix caching by default in the vLLM engine command-line arguments, along with corresponding unit tests. However, the feedback highlights a critical correctness issue where the default-on logic fails in production because the parsed arguments default to False for this flag. Additionally, the new unit tests fail to catch this because they use a mock namespace that does not accurately reflect production behavior. It is recommended to set the default value during argument parsing and update the tests accordingly.
| if getattr(args, "vllm_enable_prefix_caching", True) is not False: | ||
| cmd += ["--enable-prefix-caching"] |
There was a problem hiding this comment.
Correctness Issue: Default-on prefix caching is broken in production
In production, args is parsed using vllm_parse_args() (defined in vime/backends/vllm_utils/arguments.py), which imports vLLM's CLI arguments. Since --enable-prefix-caching is a store_true action, its default value in argparse is False. Therefore, when the user does not pass --vllm-enable-prefix-caching on the command line, args.vllm_enable_prefix_caching is populated as False (not missing/None).
As a result:
getattr(args, "vllm_enable_prefix_caching", True)returnsFalse.False is not Falseevaluates toFalse.--enable-prefix-cachingis not added to the command line, leaving prefix caching disabled by default in production.
The unit test test_build_vllm_cmd_enables_prefix_cache_by_default only passes because it uses a mock SimpleNamespace that lacks the vllm_enable_prefix_caching attribute entirely, which does not reflect real-world execution.
Suggested Fix
To fix this robustly, we should set the default value of vllm_enable_prefix_caching to True when it is not explicitly provided by the user on the CLI. This can be done in vllm_parse_args() in vime/backends/vllm_utils/arguments.py:
# In vime/backends/vllm_utils/arguments.py -> vllm_parse_args()
args, _ = parser.parse_known_args()
user_provided, raw_values = _detect_user_provided_dests(parser, sys.argv[1:])
if "vllm_enable_prefix_caching" not in user_provided:
args.vllm_enable_prefix_caching = TrueThis ensures that:
- If the user doesn't pass the flag, it defaults to
True. - If they override it via YAML
vllm_overrides,_apply_vllm_overrideswill correctly set it toFalse. - In
vllm_engine.py, we can then safely use a simple truthiness check.
| if getattr(args, "vllm_enable_prefix_caching", True) is not False: | |
| cmd += ["--enable-prefix-caching"] | |
| if getattr(args, "vllm_enable_prefix_caching", True): | |
| cmd += ["--enable-prefix-caching"] |
| def test_build_vllm_cmd_enables_prefix_cache_by_default(vllm_args): | ||
| server_args = mod._compute_server_args(vllm_args, rank=0, dist_init_addr=None, host="127.0.0.1", port=8000) | ||
|
|
||
| cmd, _ = mod.build_vllm_cmd_and_env(server_args) | ||
|
|
||
| assert "--enable-prefix-caching" in cmd |
There was a problem hiding this comment.
Test Gap: Mock does not reflect production behavior
This test passes because vllm_args is a mock SimpleNamespace that lacks the vllm_enable_prefix_caching attribute. However, in production, args is parsed via vllm_parse_args(), which populates vllm_enable_prefix_caching as False by default (due to store_true action behavior).
To prevent such false positives in the future, consider updating the test fixture or adding a test case that uses parsed arguments (e.g., by calling vllm_parse_args() or simulating a parsed namespace where vllm_enable_prefix_caching is False but not explicitly provided by the user).
What changed
Why
The vLLM subprocess was being launched without
--enable-prefix-cachingon the default path, which let the engine start with APC disabled and produced 0% hit rate in multi-turn rollout.Validation
PYTHONPATH=/home/aoshen/vime/projects/vime_modal_sandbox/vime-arguments-fix-clean pytest -q tests/utils/test_vllm_engine.py -q