
fix: cache bad_words tokenization to avoid 'Already borrowed' errors under concurrency #45522

Open
Oxygen56 wants to merge 1 commit into vllm-project:main from Oxygen56:fix/issue-45445



@Oxygen56

Contributor

Summary

Fixes #45445: RuntimeError: Already borrowed is raised when bad_words is used with concurrent OpenAI serving requests.

Root Cause

SamplingParams.update_from_tokenizer() calls tokenizer.encode() for each bad_word on every request. When many concurrent requests share the same bad_words list (a common pattern in workloads such as RL), these redundant encode calls race with prompt tokenization on the same HF fast tokenizer pool. This can still trigger "Already borrowed", even after the thread-safe tokenizer wrapper from #41181 was merged.

Fix

Cache tokenized bad_word results on the tokenizer object (lifetime-bound to the tokenizer). When a bad_word has been tokenized before, the cached result is reused, avoiding the redundant tokenizer.encode() call entirely.

  • Cache key: (bad_word, add_prefix_space) tuple
  • Cache storage: tokenizer._bad_words_token_cache dict (created lazily)
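  • Thread-safety: individual dict reads and writes are atomic under the CPython GIL, which is sufficient for this read-heavy workload; concurrent misses on the same key at worst encode it more than once and store identical results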
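
The caching scheme described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: CountingTokenizer and encode_bad_word_cached are hypothetical names, the toy encoder returns one id per character, and the way the prefix space is applied is an assumption. Only the cache key shape and the lazily created _bad_words_token_cache attribute come from the description.

```python
class CountingTokenizer:
    """Hypothetical stand-in for a real tokenizer; counts encode() calls."""

    def __init__(self):
        self.encode_calls = 0

    def encode(self, text, add_special_tokens=False):
        self.encode_calls += 1
        # Toy encoding (assumption): one id per character.
        return [ord(ch) for ch in text]


def encode_bad_word_cached(tokenizer, bad_word, add_prefix_space):
    """Return token ids for bad_word, reusing a cache stored on the tokenizer."""
    # Create the cache lazily so its lifetime follows the tokenizer object.
    cache = getattr(tokenizer, "_bad_words_token_cache", None)
    if cache is None:
        cache = {}
        tokenizer._bad_words_token_cache = cache

    key = (bad_word, add_prefix_space)
    token_ids = cache.get(key)
    if token_ids is None:
        # Cache miss: this is the only path that touches the tokenizer.
        prefix = " " if add_prefix_space else ""
        token_ids = tuple(
            tokenizer.encode(prefix + bad_word.lstrip(), add_special_tokens=False)
        )
        cache[key] = token_ids
    # Hand out a fresh list so callers cannot mutate the cached entry.
    return list(token_ids)


tok = CountingTokenizer()
first = encode_bad_word_cached(tok, "foo", add_prefix_space=False)
second = encode_bad_word_cached(tok, "foo", add_prefix_space=False)
print(first == second, tok.encode_calls)
```

Storing an immutable tuple and returning a copy is a design choice of this sketch: it keeps one request from altering the token ids that later requests will read.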

Validation

The reporter confirmed that locally caching bad_words IDs inside update_from_tokenizer() eliminated all "Already borrowed" failures across their A/B hammer test:

  • Without cache: intermittent HTTP 500s with RuntimeError: Already borrowed
  • With cache: 0 HTTP 500s across the same hammer rounds
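
As a rough, self-contained analogue of that A/B comparison, the sketch below counts how many encode() calls a pool of threads issues with and without a per-tokenizer cache. It does not reproduce the "Already borrowed" error and is not the reporter's hammer test: the stand-in tokenizer, the helper names, the request count, and the bad_words list are all assumptions chosen for illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class CountingTokenizer:
    """Hypothetical stand-in; counts encode() calls under a lock."""

    def __init__(self):
        self.encode_calls = 0
        self._lock = threading.Lock()

    def encode(self, text, add_special_tokens=False):
        with self._lock:
            self.encode_calls += 1
        return [ord(ch) for ch in text]  # toy encoding (assumption)


def tokenize_bad_words(tokenizer, bad_words, use_cache):
    """Encode each bad word, optionally through a per-tokenizer dict cache."""
    out = []
    for word in bad_words:
        for add_prefix_space in (False, True):
            text = (" " if add_prefix_space else "") + word.lstrip()
            if not use_cache:
                out.append(tokenizer.encode(text, add_special_tokens=False))
                continue
            cache = tokenizer.__dict__.setdefault("_bad_words_token_cache", {})
            key = (word, add_prefix_space)
            if key not in cache:
                cache[key] = tokenizer.encode(text, add_special_tokens=False)
            out.append(list(cache[key]))
    return out


def hammer(use_cache, requests=200, workers=16):
    """Run many simulated requests concurrently; return (encode calls, results)."""
    tokenizer = CountingTokenizer()
    bad_words = ["foo", "bar", "baz"]
    if use_cache:
        # Warm the cache once so the concurrent phase is deterministic.
        tokenize_bad_words(tokenizer, bad_words, use_cache=True)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(
            pool.map(
                lambda _: tokenize_bad_words(tokenizer, bad_words, use_cache),
                range(requests),
            )
        )
    return tokenizer.encode_calls, results


calls_without, results_without = hammer(use_cache=False)
calls_with, results_with = hammer(use_cache=True)
print(calls_without, calls_with)
```

In this sketch the uncached run issues one encode() call per bad word, per prefix variant, per request, while the cached run issues them only during warm-up. Fewer calls into the shared tokenizer means fewer opportunities to collide with prompt tokenization.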

Test Plan

  • No functional change to tokenization logic — same result, just cached
  • Cache lifetime is bound to the tokenizer (cleaned up when tokenizer is replaced)
  • Backward compatible — _bad_words_token_cache attribute is lazily created if absent

…der concurrency

Signed-off-by: Willow Lopez <100782273+Oxygen56@users.noreply.github.com>

Development

Successfully merging this pull request may close these issues.

[Bug]: RuntimeError: Already borrowed still occurs with bad_words despite thread-safe tokenizer wrapper
