Allow clients to opt in to caching AC results in the proxy via a platform property#12563
Allow clients to opt in to caching AC results in the proxy via a platform property#12563iain-macdonald wants to merge 1 commit into
Conversation
6b7723b to
545b5ae
Compare
There was a problem hiding this comment.
PR lets clients opt into having the action-cache proxy cache AC results locally via the cache-action-results-in-proxy platform property, falling through to a hardcoded 15-minute default for actionCacheTTL. The change is small and well-contained, with the BUILD dependency, import, IsTrue/RemoteHeaderOverrides usage, and precedence ordering all correct.
Additional findings (lines outside the diff)
enterprise/server/remote_execution/platform/platform.go:112— Stray blank line inside theconstblock, just below the new property name.
|
|
||
| // Clients may opt into this caching via a platform property as well. | ||
| for _, prop := range platform.RemoteHeaderOverrides(ctx) { | ||
| if strings.EqualFold(prop.GetName(), platform.CacheActionResultsInProxyPropertyName) && platform.IsTrue(prop.GetValue()) { |
There was a problem hiding this comment.
Per-request opt-in makes the local-AC TTL asymmetric between writes and reads, which can break the REv2 consistency guarantee the code claims (lines 393-395) and serve stale results.
With the old experiment-flag mechanism, actionCacheTTL returned a non-zero TTL uniformly for every request from a group, so every UpdateActionResult refreshed the local AC entry (line 397: ttl > 0) and reads always saw the latest write. The new path derives the TTL from a per-request header, so writes and reads can disagree about whether caching is active:
- Invocation A (header set) writes result v1 → remote updated, local entry cached with fresh mtime, TTL 15 min.
- Invocation B (header not set) writes v2 to the same proxy →
ttl == 0, so line 397 skips the local update; remote now holds v2 but the local entry still points at v1. - Invocation A (header set) reads within 15 min →
ttlFastPathEligible && isLocalActionResultFreshis true (line 287), so it serves the stale v1 without revalidating against the remote.
Because --remote_header is set per-invocation, mixed opt-in across clients/invocations sharing one proxy is a realistic configuration, and the comment at lines 393-395 ("ensures the REv2 guarantee that GetActionResult serves the most recent UpdateActionResult for all requests to the same endpoint") no longer holds for that mix. Worth either gating the read fast-path on a consistent signal or updating the comment to scope the guarantee to clients that opt in uniformly.
There was a problem hiding this comment.
Yeah, agreed, don't think I'd like to fix that at this time though.
There was a problem hiding this comment.
This actually seems OK to me. The two invocations live in slightly different worlds. That said, it seems like it would be any easy fix: always write the local action in UpdateActionResult. It will just be ignored most of the time, and shouldn't take up much space in the cache.
|
Converting this to draft because I realized that it will probably break the cache scorecard (as the current experiment does) because the remote hit tracker doesn't report locally-served AC hits, I think. |
Never mind: |
| return 0 | ||
|
|
||
| // Clients may opt into this caching via a platform property as well. | ||
| for _, prop := range platform.RemoteHeaderOverrides(ctx) { |
There was a problem hiding this comment.
does RemoteHeaderOverrides survive multiple cache proxy hops?
| if efp == nil { | ||
| return 0 | ||
| if efp != nil { | ||
| ttlSeconds := efp.Int64(ctx, "cache_proxy.action_cache_ttl_seconds", 0) |
There was a problem hiding this comment.
maybe we should allow experiments to turn off platform overrides by setting a negative value?
| RetryPropertyName = "retry" | ||
| PersistentVolumesPropertyName = "persistent-volumes" | ||
| execrootPathPropertyName = "execroot-path" | ||
| CacheActionResultsInProxyPropertyName = "cache-action-results-in-proxy" |
There was a problem hiding this comment.
this could use a doc to document the behavior and supported values.
| // Clients may opt into this caching via a platform property as well. | ||
| for _, prop := range platform.RemoteHeaderOverrides(ctx) { | ||
| if strings.EqualFold(prop.GetName(), platform.CacheActionResultsInProxyPropertyName) && platform.IsTrue(prop.GetValue()) { | ||
| return clientActionCacheProxyTTLDefault |
There was a problem hiding this comment.
If we're letting the client configure this, seems like we might as well let them configure the TTL as the header value, or default to 15min if they don't set a value?
|
|
||
| // Clients may opt into this caching via a platform property as well. | ||
| for _, prop := range platform.RemoteHeaderOverrides(ctx) { | ||
| if strings.EqualFold(prop.GetName(), platform.CacheActionResultsInProxyPropertyName) && platform.IsTrue(prop.GetValue()) { |
There was a problem hiding this comment.
This actually seems OK to me. The two invocations live in slightly different worlds. That said, it seems like it would be any easy fix: always write the local action in UpdateActionResult. It will just be ignored most of the time, and shouldn't take up much space in the cache.
Fixes: https://github.com/buildbuddy-io/buildbuddy-internal/issues/7611