fix: cache bad_words tokenization to avoid 'Already borrowed' errors under concurrency #45522
Open
Oxygen56 wants to merge 1 commit into
…der concurrency Signed-off-by: Willow Lopez <100782273+Oxygen56@users.noreply.github.com>
Summary
Fixes #45445: `RuntimeError: Already borrowed` when using `bad_words` with concurrent OpenAI serving requests.
Root Cause
`SamplingParams.update_from_tokenizer()` calls `tokenizer.encode()` for each bad_word on every request. When many concurrent requests share the same `bad_words` list (a common workload pattern, e.g. RL), these redundant encode calls race with prompt tokenization on the same HF fast tokenizer pool. This can still trigger "Already borrowed" even after the thread-safe tokenizer wrapper in #41181 was merged.
Fix
Cache tokenized bad_word results on the tokenizer object (lifetime-bound to the tokenizer). When a bad_word has been tokenized before, the cached result is reused, avoiding the redundant `tokenizer.encode()` call entirely.
- Cache key: `(bad_word, add_prefix_space)` tuple
- Storage: `tokenizer._bad_words_token_cache` dict (created lazily)
Validation
The reporter confirmed that locally caching bad_words IDs inside `update_from_tokenizer()` eliminated all "Already borrowed" failures across their A/B hammer test:
`RuntimeError: Already borrowed`
Test Plan
- `_bad_words_token_cache` attribute is lazily created if absent
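For reviewers, the caching idea described under Fix can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `CountingTokenizer` and `encode_bad_word_cached` are hypothetical names, the toy `encode()` stands in for a real HF fast tokenizer, and the prefix-space handling is an assumption. Only the `(bad_word, add_prefix_space)` key and the lazily created `_bad_words_token_cache` attribute come from the description above.

```python
from typing import Dict, List, Optional, Tuple


class CountingTokenizer:
    """Hypothetical stand-in for a tokenizer; counts encode() calls."""

    def __init__(self) -> None:
        self.encode_calls = 0

    def encode(self, text: str, add_special_tokens: bool = False) -> List[int]:
        self.encode_calls += 1
        # Toy encoding (one id per character), not a real vocabulary.
        return [ord(ch) for ch in text]


def encode_bad_word_cached(tokenizer, bad_word: str, add_prefix_space: bool) -> List[int]:
    """Hypothetical helper: encode each (bad_word, add_prefix_space) at most once."""
    cache: Optional[Dict[Tuple[str, bool], List[int]]] = getattr(
        tokenizer, "_bad_words_token_cache", None
    )
    if cache is None:
        # Lazily create the cache on the tokenizer so it shares its lifetime.
        cache = {}
        tokenizer._bad_words_token_cache = cache

    key = (bad_word, add_prefix_space)
    token_ids = cache.get(key)
    if token_ids is None:
        # Cache miss: this is the only path that touches the tokenizer.
        prefix = " " if add_prefix_space else ""  # assumed prefix handling
        token_ids = tokenizer.encode(prefix + bad_word.lstrip(), add_special_tokens=False)
        cache[key] = token_ids
    # Return a copy so callers cannot mutate the cached list.
    return list(token_ids)


tok = CountingTokenizer()
first = encode_bad_word_cached(tok, "foo", True)    # miss: calls encode()
second = encode_bad_word_cached(tok, "foo", True)   # hit: no encode() call
third = encode_bad_word_cached(tok, "foo", False)   # different key: calls encode()
print(tok.encode_calls)
```

In this sketch the first request for a given key still calls `encode()`, so the cache removes repeated calls rather than every call; two threads that miss on the same key at the same moment may both encode once before the entry is stored.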