RFC: Token Budget Observability — live utilization metrics per process #384
zatchbell1311-wq started this conversation in Ideas
Hi @ingim and @shsym — now that #382 is merged, I'd like to take on something more substantial.
Problem
Pie manages token budgets internally via the context manager and admission gating, but operators currently have no visibility into live utilization per process. When a daemon deployment is under load, there's no way to know how close individual processes are to their budget limits without digging into logs.
Proposed contribution
Add a lightweight /metrics or /status endpoint to pie-server that exposes live token budget utilization for each process.
Why I'm interested
My own research (DSPM) is directly about token budget management in LLM deployments, so I have deep context on what operators actually need to observe. Happy to start with a design discussion before writing any code.
Is this something the project would welcome, or is it already on the roadmap?