[codex] fix vllm prefix caching defaults by aoshen02 · Pull Request #263 · vllm-project/vime

aoshen02 · 2026-06-17T05:35:35Z

What changed

Restore default-on prefix caching in the vLLM launcher when the user does not explicitly disable it.
Add unit coverage for the default-on and explicit-off cases.

Why

The vLLM subprocess was being launched without --enable-prefix-caching on the default path, so the engine started with automatic prefix caching (APC) disabled and reported a 0% prefix-cache hit rate in multi-turn rollout.

Validation

PYTHONPATH=/home/aoshen/vime/projects/vime_modal_sandbox/vime-arguments-fix-clean pytest -q tests/utils/test_vllm_engine.py -q

gemini-code-assist

Code Review

This pull request introduces changes to enable prefix caching by default in the vLLM engine command-line arguments, along with corresponding unit tests. However, the feedback highlights a critical correctness issue where the default-on logic fails in production because the parsed arguments default to False for this flag. Additionally, the new unit tests fail to catch this because they use a mock namespace that does not accurately reflect production behavior. It is recommended to set the default value during argument parsing and update the tests accordingly.

gemini-code-assist · 2026-06-17T05:38:22Z

+    if getattr(args, "vllm_enable_prefix_caching", True) is not False:
+        cmd += ["--enable-prefix-caching"]


Correctness Issue: Default-on prefix caching is broken in production

In production, args is parsed using vllm_parse_args() (defined in vime/backends/vllm_utils/arguments.py), which imports vLLM's CLI arguments. Since --enable-prefix-caching is a store_true action, its default value in argparse is False. Therefore, when the user does not pass --vllm-enable-prefix-caching on the command line, args.vllm_enable_prefix_caching is populated as False (not missing/None).

As a result:

getattr(args, "vllm_enable_prefix_caching", True) returns False.

False is not False evaluates to False.

--enable-prefix-caching is not added to the command line, leaving prefix caching disabled by default in production.
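The chain above can be reproduced with plain argparse. This is a minimal sketch, not the project's code: the parser below is a hypothetical stand-in for whatever vllm_parse_args() builds, and the base command `["vllm", "serve"]` is an assumption for illustration. Only the flag name and the `is not False` check come from the diff.

```python
import argparse

# Hypothetical stand-in for the parser built inside vllm_parse_args();
# only the flag name mirrors the review comment.
parser = argparse.ArgumentParser()
parser.add_argument("--vllm-enable-prefix-caching", action="store_true")

# The user passes nothing, so argparse fills in the store_true default.
args = parser.parse_args([])

# The attribute exists on the namespace, so the getattr fallback is unused.
flag = getattr(args, "vllm_enable_prefix_caching", True)

cmd = ["vllm", "serve"]  # assumed base command, for illustration only
if flag is not False:
    cmd += ["--enable-prefix-caching"]

print(flag)  # False
print(cmd)   # ['vllm', 'serve']
```

The `True` fallback in getattr only applies when the attribute is absent, which a parsed namespace never is.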

The unit test test_build_vllm_cmd_enables_prefix_cache_by_default only passes because it uses a mock SimpleNamespace that lacks the vllm_enable_prefix_caching attribute entirely, which does not reflect real-world execution.

Suggested Fix

To fix this robustly, we should set the default value of vllm_enable_prefix_caching to True when it is not explicitly provided by the user on the CLI. This can be done in vllm_parse_args() in vime/backends/vllm_utils/arguments.py:

# In vime/backends/vllm_utils/arguments.py -> vllm_parse_args()
args, _ = parser.parse_known_args()
user_provided, raw_values = _detect_user_provided_dests(parser, sys.argv[1:])
if "vllm_enable_prefix_caching" not in user_provided:
    args.vllm_enable_prefix_caching = True

This ensures that:

If the user doesn't pass the flag, it defaults to True.

If they override it via YAML vllm_overrides, _apply_vllm_overrides will correctly set it to False.

In vllm_engine.py, we can then safely use a simple truthiness check.
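One way to see the intended behavior end to end is the sketch below. It does not use `_detect_user_provided_dests`, whose signature is not shown here. Instead it swaps in a different technique: pre-seeding the namespace with a sentinel, which argparse leaves untouched when the flag is absent. `parse_with_default_on` and `apply_overrides` are hypothetical names, and `apply_overrides` only imitates what `_apply_vllm_overrides` is described as doing.

```python
import argparse

_UNSET = object()  # sentinel meaning "not provided on the CLI"


def parse_with_default_on(argv):
    """Parse argv and default prefix caching to True when the flag is absent."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--vllm-enable-prefix-caching", action="store_true")
    # argparse only writes an action's default when the namespace lacks the
    # attribute, so a pre-seeded sentinel survives unless the flag is passed.
    seeded = argparse.Namespace(vllm_enable_prefix_caching=_UNSET)
    args, _ = parser.parse_known_args(argv, namespace=seeded)
    if args.vllm_enable_prefix_caching is _UNSET:
        args.vllm_enable_prefix_caching = True
    return args


def apply_overrides(args, overrides):
    """Hypothetical stand-in for YAML overrides applied after parsing."""
    for key, value in overrides.items():
        setattr(args, key, value)
    return args


default_args = parse_with_default_on([])
off_args = apply_overrides(
    parse_with_default_on([]), {"vllm_enable_prefix_caching": False}
)
print(default_args.vllm_enable_prefix_caching)  # True
print(off_args.vllm_enable_prefix_caching)      # False
```

With the default resolved at parse time, the launcher can rely on a plain truthiness check, and an explicit override remains the only way to turn the flag off.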

Suggested change

-    if getattr(args, "vllm_enable_prefix_caching", True) is not False:
-        cmd += ["--enable-prefix-caching"]
+    if getattr(args, "vllm_enable_prefix_caching", True):
+        cmd += ["--enable-prefix-caching"]

gemini-code-assist · 2026-06-17T05:38:22Z

+def test_build_vllm_cmd_enables_prefix_cache_by_default(vllm_args):
+    server_args = mod._compute_server_args(vllm_args, rank=0, dist_init_addr=None, host="127.0.0.1", port=8000)
+
+    cmd, _ = mod.build_vllm_cmd_and_env(server_args)
+
+    assert "--enable-prefix-caching" in cmd


Test Gap: Mock does not reflect production behavior

This test passes because vllm_args is a mock SimpleNamespace that lacks the vllm_enable_prefix_caching attribute. However, in production, args is parsed via vllm_parse_args(), which populates vllm_enable_prefix_caching as False by default (due to store_true action behavior).

To prevent such false positives in the future, consider updating the test fixture or adding a test case that uses parsed arguments (e.g., by calling vllm_parse_args() or simulating a parsed namespace where vllm_enable_prefix_caching is False but not explicitly provided by the user).
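The false positive can be illustrated by running the same assertion against both fixture shapes. This is a sketch under stated assumptions: `build_cmd` is a hypothetical stand-in that copies only the flag check from the diff (it is not `build_vllm_cmd_and_env`), and `mock_args` and `parsed_args` are illustrative helper names.

```python
import argparse
from types import SimpleNamespace


def build_cmd(args):
    """Hypothetical stand-in that copies only the flag check from this PR."""
    cmd = ["vllm", "serve"]  # assumed base command, for illustration only
    if getattr(args, "vllm_enable_prefix_caching", True) is not False:
        cmd += ["--enable-prefix-caching"]
    return cmd


def mock_args():
    # Fixture shape used by the unit test: the attribute is missing entirely.
    return SimpleNamespace(model="dummy")


def parsed_args(argv):
    # Production-like shape: argparse populates the store_true default.
    parser = argparse.ArgumentParser()
    parser.add_argument("--vllm-enable-prefix-caching", action="store_true")
    return parser.parse_args(argv)


mock_ok = "--enable-prefix-caching" in build_cmd(mock_args())
parsed_ok = "--enable-prefix-caching" in build_cmd(parsed_args([]))
print(mock_ok, parsed_ok)  # True False
```

A fixture built from a parsed namespace would have made the default-on test fail before the change reached review.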

fix vllm prefix caching defaults

9bd6542

gemini-code-assist Bot reviewed Jun 17, 2026

aoshen02 wants to merge 1 commit into vllm-project:main from aoshen02:codex/vllm-arguments-fix-clean
