[codex] fix vllm prefix caching defaults #263

Draft
aoshen02 wants to merge 1 commit into vllm-project:main from aoshen02:codex/vllm-arguments-fix-clean

@aoshen02

Collaborator

What changed

  • Restore default-on prefix caching in the vLLM launcher when the user does not explicitly disable it.
  • Add unit coverage for the default-on and explicit-off cases.

Why

The vLLM subprocess was being launched without --enable-prefix-caching on the default path, which let the engine start with APC disabled and produced 0% hit rate in multi-turn rollout.

Validation

  • PYTHONPATH=/home/aoshen/vime/projects/vime_modal_sandbox/vime-arguments-fix-clean pytest -q tests/utils/test_vllm_engine.py -q

@gemini-code-assist (Bot) left a comment

Code Review

This pull request introduces changes to enable prefix caching by default in the vLLM engine command-line arguments, along with corresponding unit tests. However, the feedback highlights a critical correctness issue where the default-on logic fails in production because the parsed arguments default to False for this flag. Additionally, the new unit tests fail to catch this because they use a mock namespace that does not accurately reflect production behavior. It is recommended to set the default value during argument parsing and update the tests accordingly.

Comment on lines +403 to +404
if getattr(args, "vllm_enable_prefix_caching", True) is not False:
    cmd += ["--enable-prefix-caching"]

high

Correctness Issue: Default-on prefix caching is broken in production

In production, args is parsed using vllm_parse_args() (defined in vime/backends/vllm_utils/arguments.py), which imports vLLM's CLI arguments. Since --enable-prefix-caching is a store_true action, its default value in argparse is False. Therefore, when the user does not pass --vllm-enable-prefix-caching on the command line, args.vllm_enable_prefix_caching is populated as False (not missing/None).

As a result:

  1. getattr(args, "vllm_enable_prefix_caching", True) returns False.
  2. False is not False evaluates to False.
  3. --enable-prefix-caching is not added to the command line, leaving prefix caching disabled by default in production.
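
The three steps above can be reproduced with plain argparse. This is a minimal sketch, not the project's code: the one-flag parser is a stand-in for what vllm_parse_args() is described as producing, and only the condition itself is copied from the lines under review.

```python
import argparse

# Stand-in parser (assumption): a single store_true flag whose dest is
# vllm_enable_prefix_caching, mirroring the behaviour described above.
parser = argparse.ArgumentParser()
parser.add_argument("--vllm-enable-prefix-caching", action="store_true")

# The user passes nothing on the command line.
args = parser.parse_args([])

# Step 1: the attribute exists and is False, so the getattr default is unused.
value = getattr(args, "vllm_enable_prefix_caching", True)

# Step 2: "False is not False" evaluates to False.
should_enable = value is not False

# Step 3: the flag never reaches the command line.
cmd = []
if should_enable:
    cmd += ["--enable-prefix-caching"]

print(value, should_enable, cmd)  # False False []
```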

The unit test test_build_vllm_cmd_enables_prefix_cache_by_default only passes because it uses a mock SimpleNamespace that lacks the vllm_enable_prefix_caching attribute entirely, which does not reflect real-world execution.

Suggested Fix

To fix this robustly, we should set the default value of vllm_enable_prefix_caching to True when it is not explicitly provided by the user on the CLI. This can be done in vllm_parse_args() in vime/backends/vllm_utils/arguments.py:

# In vime/backends/vllm_utils/arguments.py -> vllm_parse_args()
args, _ = parser.parse_known_args()
user_provided, raw_values = _detect_user_provided_dests(parser, sys.argv[1:])
if "vllm_enable_prefix_caching" not in user_provided:
    args.vllm_enable_prefix_caching = True

This ensures that:

  • If the user doesn't pass the flag, it defaults to True.
  • If they override it via YAML vllm_overrides, _apply_vllm_overrides will correctly set it to False.
  • In vllm_engine.py, we can then safely use a simple truthiness check.

Suggested change
- if getattr(args, "vllm_enable_prefix_caching", True) is not False:
-     cmd += ["--enable-prefix-caching"]
+ if getattr(args, "vllm_enable_prefix_caching", True):
+     cmd += ["--enable-prefix-caching"]
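
For illustration, the suggested fix and the simplified truthiness check can be sketched end to end. Everything here is an assumption made for the example: build_parser, detect_user_provided_dests, parse_args_default_on, and build_cmd are hypothetical stand-ins, and the real _detect_user_provided_dests and _apply_vllm_overrides may behave differently.

```python
import argparse


def build_parser():
    # Stand-in parser (assumption) with only the flag under discussion.
    parser = argparse.ArgumentParser()
    parser.add_argument("--vllm-enable-prefix-caching", action="store_true")
    return parser


def detect_user_provided_dests(parser, argv):
    # Hypothetical, simplified stand-in for _detect_user_provided_dests:
    # collect the dest of every option whose option string appears in argv.
    provided = set()
    for action in parser._actions:  # private argparse attribute, used for brevity
        for opt in action.option_strings:
            if any(tok == opt or tok.startswith(opt + "=") for tok in argv):
                provided.add(action.dest)
    return provided


def parse_args_default_on(argv):
    parser = build_parser()
    args, _ = parser.parse_known_args(argv)
    # Default the flag to True unless the user supplied it explicitly.
    if "vllm_enable_prefix_caching" not in detect_user_provided_dests(parser, argv):
        args.vllm_enable_prefix_caching = True
    return args


def build_cmd(args):
    # Simple truthiness check, as in the suggested change.
    cmd = []
    if getattr(args, "vllm_enable_prefix_caching", True):
        cmd += ["--enable-prefix-caching"]
    return cmd


default_args = parse_args_default_on([])

overridden_args = parse_args_default_on([])
overridden_args.vllm_enable_prefix_caching = False  # simulates a YAML override

print(build_cmd(default_args))     # ['--enable-prefix-caching']
print(build_cmd(overridden_args))  # []
```

Because the flag is a store_true action, the only way to end up with False in this sketch is an explicit override applied after parsing, which matches the YAML path described in the bullets above.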

Comment on lines +231 to +236
def test_build_vllm_cmd_enables_prefix_cache_by_default(vllm_args):
    server_args = mod._compute_server_args(vllm_args, rank=0, dist_init_addr=None, host="127.0.0.1", port=8000)

    cmd, _ = mod.build_vllm_cmd_and_env(server_args)

    assert "--enable-prefix-caching" in cmd

medium

Test Gap: Mock does not reflect production behavior

This test passes because vllm_args is a mock SimpleNamespace that lacks the vllm_enable_prefix_caching attribute. However, in production, args is parsed via vllm_parse_args(), which populates vllm_enable_prefix_caching as False by default (due to store_true action behavior).

To prevent such false positives in the future, consider updating the test fixture or adding a test case that uses parsed arguments (e.g., by calling vllm_parse_args() or simulating a parsed namespace where vllm_enable_prefix_caching is False but not explicitly provided by the user).
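
The difference between the two fixture shapes can be sketched as follows. This is only an illustration: emits_flag is a hypothetical helper that copies the condition from the lines under review, and the two namespaces are assumed stand-ins for the current mock and for a parsed-style namespace.

```python
from types import SimpleNamespace


def emits_flag(args):
    # Condition copied from the launcher lines under review.
    return getattr(args, "vllm_enable_prefix_caching", True) is not False


# Shape of the current mock: the attribute is missing entirely.
mock_args = SimpleNamespace()

# Shape described for production: attribute present and False,
# even though the user never passed the flag.
parsed_like_args = SimpleNamespace(vllm_enable_prefix_caching=False)

results = {
    "mock": emits_flag(mock_args),
    "parsed_like": emits_flag(parsed_like_args),
}
print(results)  # {'mock': True, 'parsed_like': False}
```

A default-on test built on the parsed-like fixture would fail against the current condition, which is the signal the existing test cannot give.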
