perf(utilities): memoize sanitize_tool_name #6322
Summary: This PR memoizes sanitize_tool_name and adds tests for existing sanitization behavior and cache correctness; no authentication, authorization, external I/O, or data boundary changes are introduced.
Risk: Low risk. No exploitable security vulnerabilities were identified because the change is limited to a bounded in-memory cache around a pure string utility function.
sanitize_tool_name is a pure function called many times per agent iteration over the same small, stable set of tool names (tool selection, schema rendering, name comparisons), and each call does an NFKD normalize, an ASCII round-trip and several regex substitutions. Cache results with an lru_cache(maxsize=1024); repeat calls drop from ~5.4us to ~0.04us (~120x).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
f3c034a to 63d41d1
Summary
sanitize_tool_name is a pure function called many times per agent iteration over the same small, stable set of tool names: tool selection, schema/name rendering, and name comparisons (139 call sites; 38 in tool_usage.py alone). Each call does an NFKD Unicode normalize, an ASCII encode/decode round-trip, and several regex substitutions (plus a SHA-256 for over-length names).
Change
Memoize it with functools.lru_cache(maxsize=1024). The function depends only on its (name, max_length) arguments (both hashable) and has no side effects, so caching is safe; maxsize bounds memory, and the module already uses lru_cache for _duplicate_separator_pattern.
Benchmark
Repeated calls over a fixed name (local, timeit): ~120× faster on repeat calls. Within a single agent run the cache hit rate approaches 100% after the first call for each name, since tool names are stable.
Tests
Added TestSanitizeToolName (sanitization rules preserved, memoization via cache_info, max_length cache-key correctness). ruff + mypy clean; test_tool_usage.py and test_base_tool_adapter.py green.
This PR was authored with Claude Code. Per CONTRIBUTING.md, AI-generated contributions require the llm-generated label. I don't have triage permission to set it, so could a maintainer please add it?
🤖 Generated with Claude Code
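The cache checks described under Tests could look roughly like the sketch below. This is not the test file from the PR: the `sanitize_tool_name` defined here is a deliberately simplified, hypothetical stand-in so the example runs on its own, and the test method names are assumptions. Only `cache_info()` and `cache_clear()` are real lru_cache APIs.

```python
import functools
import re


@functools.lru_cache(maxsize=1024)
def sanitize_tool_name(name: str, max_length: int = 64) -> str:
    """Hypothetical simplified stand-in, not the project's implementation."""
    cleaned = re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")
    return cleaned[:max_length]


class TestSanitizeToolName:
    def setup_method(self) -> None:
        # Start every test from an empty cache so hit/miss counts are deterministic.
        sanitize_tool_name.cache_clear()

    def test_repeat_call_is_served_from_cache(self) -> None:
        first = sanitize_tool_name("Web Search")
        second = sanitize_tool_name("Web Search")
        assert first == second == "web_search"
        info = sanitize_tool_name.cache_info()
        assert (info.hits, info.misses) == (1, 1)

    def test_max_length_is_part_of_the_cache_key(self) -> None:
        long_name = "a" * 50
        # Different max_length values must not share a cached result.
        assert len(sanitize_tool_name(long_name, 10)) == 10
        assert len(sanitize_tool_name(long_name, 20)) == 20
        info = sanitize_tool_name.cache_info()
        assert (info.hits, info.misses) == (0, 2)
```

Clearing the cache in `setup_method` matters because the cache is module-level state shared across tests; without it, hit and miss counts would depend on test order.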