-
Notifications
You must be signed in to change notification settings - Fork 7.6k
Add runtime warnings when max_rpm and max_tokens are unset #6371
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -683,6 +683,21 @@ def _validate_llm_fields(cls, data: Any) -> Any: | |
| data["is_anthropic"] = cls._is_anthropic_model(model) | ||
| return data | ||
|
|
||
| @model_validator(mode="after") | ||
| def _warn_tokens_uncapped(self) -> LLM: | ||
| """Warn when neither ``max_tokens`` nor ``max_completion_tokens`` is set. | ||
|
|
||
| Without an explicit cap, a single prompt can produce very large outputs | ||
| (e.g. 128k+ tokens on modern models), leading to runaway spend. This | ||
| warning surfaces the risk at runtime without changing behavior. | ||
| """ | ||
| if self.max_tokens is None and self.max_completion_tokens is None: | ||
| logging.warning( | ||
| "max_tokens/max_completion_tokens not set; LLM responses are uncapped. " | ||
| "Set a limit (e.g., max_tokens=4096) to control token costs and avoid runaway generation." | ||
| ) | ||
|
Comment on lines
+686
to
+698
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win Zero-valued caps bypass this warning but are still treated as uncapped. Line 755 and Line 2168 both collapse 🤖 Prompt for AI Agents |
||
| return self | ||
|
|
||
| @model_validator(mode="after") | ||
| def _init_litellm(self) -> LLM: | ||
| self.is_litellm = True | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
max_rpm=0still disables throttling without triggering this warning.Line 767 and Line 2102 both gate RPM behavior on truthiness, so
max_rpm=0skips controller hookup/cleanup just likeNone. As written, the new description and validator only coverNone, so this PR still misses one uncapped configuration path.Also applies to: 617-630
🤖 Prompt for AI Agents