Cache compiled regex in unicode_regex_split_stl (ByteLevel hotspot) (#197) by joshuuuasu · Pull Request #197 · meta-pytorch/tokenizers

joshuuuasu · 2026-06-15T17:34:23Z

Summary:

The ByteLevel pre-tokenizer's STL regex path in
third-party/llama.cpp-unicode/src/unicode.cpp recompiled its split regex on
every call to unicode_regex_split_stl (~497 compiles per request over a SID
prompt), dominating tokenize latency. A single std::regex/std::wregex compile
is expensive and the set of patterns is small and fixed, so we cache the
compiled regex per pattern.

This diff adds a function-local static
unordered_map<pattern, shared_ptr<const regex>> guarded by a std::mutex in BOTH
unicode_regex_split_stl overloads (std::wregex and std::regex). The compiled
regex is returned as a shared_ptr<const regex> and matched concurrently across
the multi-threaded tokenizer pool; matching on a const std::regex from multiple
threads is thread-safe. Behavior is identical by construction (same pattern +
flags -> same compiled regex -> same matches). Adds <memory> and <mutex>.
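
For readers without access to the diff, the caching scheme described above can be sketched roughly as follows. This is a minimal illustration, not the code in unicode.cpp: the helper names `get_cached_regex` and `find_all` are hypothetical, only the narrow `std::regex` case is shown, and default regex flags are assumed.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <regex>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper (name is an assumption, not from the PR): returns a
// shared, immutable compiled regex, compiling each distinct pattern only once.
static std::shared_ptr<const std::regex> get_cached_regex(const std::string & pattern) {
    // Function-local statics: initialized once, shared by all callers.
    static std::mutex mu;
    static std::unordered_map<std::string, std::shared_ptr<const std::regex>> cache;

    // The lock protects the map only; matching happens outside the lock.
    std::lock_guard<std::mutex> lock(mu);
    auto it = cache.find(pattern);
    if (it != cache.end()) {
        return it->second; // cache hit: no recompilation
    }
    // Cache miss: compile once and store as pointer-to-const.
    std::shared_ptr<const std::regex> compiled = std::make_shared<std::regex>(pattern);
    cache.emplace(pattern, compiled);
    return compiled;
}

// Hypothetical caller: collects every match of `pattern` in `text`,
// reusing the cached compiled regex instead of constructing a new one.
static std::vector<std::string> find_all(const std::string & text, const std::string & pattern) {
    const std::shared_ptr<const std::regex> re = get_cached_regex(pattern);
    assert(re != nullptr);
    std::vector<std::string> out;
    for (std::sregex_iterator it(text.begin(), text.end(), *re), end; it != end; ++it) {
        out.push_back(it->str());
    }
    return out;
}
```

In this sketch the mutex is held while a new pattern compiles, which serializes first-time compiles but keeps the code simple; with a small, fixed set of patterns that cost is paid only a handful of times. Returning a `shared_ptr<const std::regex>` lets each thread keep the compiled object alive and match against it without holding the lock.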

Measured win (model 2119730608, constrained decoding on):

| Metric                         | Before   | After   |
|--------------------------------|----------|---------|
| Tokenizer.encode (bench)       | 144 ms   | 1.37 ms |
| Server tokenize                | ~97 ms   | ~1.7 ms |
| gr_loadgen greedy client p50   | 166.7 ms | 68.9 ms |
| gr_loadgen beam=10 client p50  | 199.2 ms | 91.3 ms |
| gr_loadgen client p99          | 272 ms   | 72 ms   |

This is upstreamable to llama.cpp (MIT) and we intend to send it there.

Differential Revision: D108634865

meta-codesync · 2026-06-15T17:34:31Z

@joshuuuasu has exported this pull request. If you are a Meta employee, you can view the originating Diff in D108634865.

meta-cla Bot added the CLA Signed This label is managed by the Meta Open Source bot. label Jun 15, 2026

meta-codesync Bot added the meta-exported label Jun 15, 2026

meta-codesync Bot changed the title ~~Cache compiled regex in unicode_regex_split_stl (ByteLevel hotspot)~~ Cache compiled regex in unicode_regex_split_stl (ByteLevel hotspot) (#197) Jun 15, 2026

joshuuuasu force-pushed the export-D108634865 branch from 08d703c to e7dfa41 Compare June 15, 2026 17:39

larryliu0820 approved these changes Jun 15, 2026

meta-codesync Bot merged commit dd727c3 into meta-pytorch:main Jun 15, 2026
7 of 8 checks passed

meta-codesync[bot] merged 1 commit into meta-pytorch:main from joshuuuasu:export-D108634865
