Add cancel() and isBusy to InferenceEngine protocol #32
4e5d422 to f1a8bf3
1fd5faa to 50386a3
private func drain() {
public var isBusy: Bool { generating.withLock { $0 } }

public func cancel() async throws {
My understanding is that cancel() in the static engine means something slightly different than it does in the pipelined engine, and the sequential engine shares the static engine's definition.
The pipelined engine cancels a real background Task, so cancel() actually stops the work. The static and sequential engines have no background Task: generation runs inside the consumer's next() calls, so generating only becomes false when next() runs and sees that a cancel was requested.
So cancel() here effectively means "we ask the consumer to stop, provided it keeps pulling." Should we restructure the static/sequential path so cancellation doesn't depend on the consumer making progress?
The main problem I see is that cancelRequested is only checked inside next(), so cancel means something semantically different in the pipelined engine than in the other engines.
To make this more concrete: if a consumer is paused and not calling next(), the check never runs, so cancel() in that case amounts to throwing a timeout.
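The concern above can be illustrated with a minimal sketch. Everything in it is hypothetical: `PullEngineSketch`, `start()`, and the countdown standing in for generated tokens are not from the PR, and cancel() is written synchronously for brevity even though the PR declares it `async throws`. It only assumes what the comments state, namely that cancelRequested is checked inside next() and nowhere else.

```swift
import Foundation

/// Hypothetical pull-based engine: all generation work happens inside next().
final class PullEngineSketch {
    private let lock = NSLock()
    private var generating = false
    private var cancelRequested = false
    private var remaining: Int

    init(tokenCount: Int) { remaining = tokenCount }

    var isBusy: Bool {
        lock.lock(); defer { lock.unlock() }
        return generating
    }

    func start() {
        lock.lock(); defer { lock.unlock() }
        generating = true
    }

    /// Only records the request. Nothing observes it until next() runs.
    func cancel() {
        lock.lock(); defer { lock.unlock() }
        cancelRequested = true
    }

    /// The single place where the cancel request is checked.
    func next() -> Int? {
        lock.lock(); defer { lock.unlock() }
        if cancelRequested || remaining == 0 {
            generating = false
            return nil
        }
        remaining -= 1
        return remaining
    }
}

let pullEngine = PullEngineSketch(tokenCount: 3)
pullEngine.start()
_ = pullEngine.next()
pullEngine.cancel()
// The consumer has not pulled since cancel(), so the engine still reports busy.
let busyAfterCancel = pullEngine.isBusy
_ = pullEngine.next()
// Only this extra pull lets the engine notice the request and go idle.
let busyAfterPull = pullEngine.isBusy
```

In this sketch a consumer that never calls next() again leaves isBusy stuck at true, which is the dependency on consumer progress described above.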
Yes, that is a valid concern. The iterator and the engine are a bit decoupled, so it might make sense to cancel on the engine side and have the iterator become invalid.
Added a Token concept: only the iterator that holds the token talks to the engine.
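One way such a token scheme could look is sketched below. This is a guess at the shape of the idea, not the code in the PR: `TokenEngineSketch`, `TokenIterator`, `generate(count:)`, `step(token:isLast:)`, and the integer token are all hypothetical names, and cancel() is again synchronous for brevity. The assumption is that the engine hands each iterator a token, cancel() flips engine state immediately and retires the token, and an iterator with a stale token gets nothing more.

```swift
import Foundation

/// Hypothetical engine that owns the cancellation state and a current token.
final class TokenEngineSketch {
    private let lock = NSLock()
    private var currentToken: UInt64 = 0
    private var generating = false

    var isBusy: Bool {
        lock.lock(); defer { lock.unlock() }
        return generating
    }

    /// Starts a generation and returns an iterator holding a fresh token.
    func generate(count: Int) -> TokenIterator {
        lock.lock(); defer { lock.unlock() }
        currentToken += 1
        generating = count > 0
        return TokenIterator(engine: self, token: currentToken, remaining: count)
    }

    /// Engine-side cancel: state changes now, and the outstanding token goes stale.
    func cancel() {
        lock.lock(); defer { lock.unlock() }
        currentToken += 1
        generating = false
    }

    /// Called by iterators; returns true only for the holder of the current token.
    func step(token: UInt64, isLast: Bool) -> Bool {
        lock.lock(); defer { lock.unlock() }
        guard token == currentToken, generating else { return false }
        if isLast { generating = false }
        return true
    }
}

struct TokenIterator: IteratorProtocol {
    let engine: TokenEngineSketch
    let token: UInt64
    var remaining: Int

    mutating func next() -> Int? {
        guard remaining > 0,
              engine.step(token: token, isLast: remaining == 1) else { return nil }
        remaining -= 1
        return remaining
    }
}

let tokenEngine = TokenEngineSketch()
var staleIterator = tokenEngine.generate(count: 5)
let firstValue = staleIterator.next()
tokenEngine.cancel()
// No pull is needed for the engine to become idle.
let busyAfterEngineCancel = tokenEngine.isBusy
let valueAfterCancel = staleIterator.next()
var freshIterator = tokenEngine.generate(count: 2)
let freshValue = freshIterator.next()
let staleAgain = staleIterator.next()
```

Under these assumptions, isBusy no longer depends on the consumer pulling, and an old iterator cannot disturb a newer generation because its token no longer matches.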
342fd81 to f666779
f666779 to 1a754e1
Adds lifecycle management APIs to the InferenceEngine protocol so callers can gracefully cancel in-flight generation and query busy state.
All three engine implementations (Sequential, Pipelined, StaticShape) and MockEngine are updated with concrete implementations.
CoreAILanguageModel now calls cancel() before reset() to ensure clean state transitions.
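As a rough illustration of the API surface described above, the sketch below shows a reduced protocol and the cancel-then-reset ordering. Only `isBusy` and `cancel() async throws` appear in the diff; the protocol name `InferenceEngineSketch`, the `Sendable` refinement, the signature of `reset()`, the `RecordingEngine` stand-in, and the `resetCleanly` helper are assumptions made for the example, and the real protocol presumably has further requirements.

```swift
import Foundation

/// Hypothetical reduction of the protocol; only isBusy and cancel() are taken from the diff.
protocol InferenceEngineSketch: Sendable {
    var isBusy: Bool { get }
    func cancel() async throws
    func reset() async throws   // assumed signature
}

/// Hypothetical stand-in engine that records the order of lifecycle calls.
final class RecordingEngine: InferenceEngineSketch, @unchecked Sendable {
    private let lock = NSLock()
    private var generating = false
    private var calls: [String] = []

    var isBusy: Bool { lock.withLock { generating } }
    var recordedCalls: [String] { lock.withLock { calls } }

    func beginGenerating() {
        lock.withLock { generating = true }
    }

    func cancel() async throws {
        lock.withLock {
            generating = false
            calls.append("cancel")
        }
    }

    func reset() async throws {
        lock.withLock { calls.append("reset") }
    }
}

/// Mirrors the described ordering: cancel in-flight work first, then reset.
func resetCleanly<E: InferenceEngineSketch>(_ engine: E) async throws {
    try await engine.cancel()
    try await engine.reset()
}

let recordingEngine = RecordingEngine()
recordingEngine.beginGenerating()
try await resetCleanly(recordingEngine)
```

Calling cancel() first means reset() never runs while the engine still reports busy, which is one reading of the "clean state transitions" mentioned above.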