Would be cool to add support for OpenAI's Flex Processing tier when making requests to OpenAI models.
Flex processing is a service tier OpenAI offers where requests are processed with variable latency in exchange for lower costs (around 50% cheaper). If capacity isn't available, the request fails fast rather than queuing, which could be a worthwhile tradeoff? Or maybe a failed flex request could just be sent again on the standard (non-flex) tier? idk
See the OpenAI docs for more info!
Why this should be a thing
I think this would be useful because OpenAI's models are slowly getting more expensive over time (GPT-5 mini to GPT-5.4 mini), which mostly restricts us to cheaper and OSS models under the $4 daily limit (which isn't an issue, I'm not being ungrateful, y'all are doing an excellent job lol). With flex, GPT-5.4 mini costs around the same as GPT-5 mini without flex, so it's basically an almost free upgrade to a much more capable model. Also, plenty of real-world use cases don't actually need fast responses, such as a news summariser that sends you a morning digest; it doesn't need low latency and would benefit from the lower cost.
It should be a relatively low-effort change too, since you just pass service_tier: "flex" in the request body.
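To make this concrete, here's a rough sketch (Python, stdlib only) of what the change plus the "resend on the non-flex tier" idea could look like. It isn't tied to this project's code: `send` is a hypothetical stand-in for whatever HTTP call already goes to OpenAI, the body follows the Chat Completions shape, and it assumes a lack of flex capacity comes back as HTTP 429, so it's worth checking the OpenAI docs before relying on that.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical transport: takes a JSON-able request body and returns
# (HTTP status, parsed response). In a real integration this would be
# the existing HTTP call to the OpenAI API.
Sender = Callable[[Dict], Tuple[int, Dict]]


def build_body(model: str, messages: List[Dict], flex: bool) -> Dict:
    """Build a Chat Completions-style request body, opting into flex if asked."""
    body = {"model": model, "messages": messages}
    if flex:
        # The only change flex needs: one extra field in the request body.
        body["service_tier"] = "flex"
    return body


def complete_with_flex_fallback(
    send: Sender, model: str, messages: List[Dict]
) -> Tuple[str, Dict]:
    """Try the flex tier first; on a 429 (assumed here to mean no flex
    capacity), resend the same request on the default tier.

    Note: a 429 can also be an ordinary rate limit; this sketch does not
    tell the two apart.
    """
    status, resp = send(build_body(model, messages, flex=True))
    tier = "flex"
    if status == 429:
        # Flex failed fast, so retry once without service_tier.
        status, resp = send(build_body(model, messages, flex=False))
        tier = "default"
    if status != 200:
        raise RuntimeError(f"request failed with status {status}")
    return tier, resp
```

Whether to fall back automatically or just surface the error could be a per-request option, since something like the morning digest could probably afford to simply retry flex later.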
Would be happy to create a PR if it's worthwhile.
Also, let me know if I can plead my case better!