
Allow clients to opt in to caching AC results in the proxy via a platform property #12563

Open
iain-macdonald wants to merge 1 commit into master from im/275fb7ee5fb

Conversation

@iain-macdonald

@iain-macdonald iain-macdonald commented Jun 25, 2026

Contributor

@iain-macdonald iain-macdonald marked this pull request as ready for review June 25, 2026 20:53

@buildbuddy-io buildbuddy-io Bot left a comment


PR lets clients opt into having the action-cache proxy cache AC results locally via the cache-action-results-in-proxy platform property, falling through to a hardcoded 15-minute default for actionCacheTTL. The change is small and well-contained, with the BUILD dependency, import, IsTrue/RemoteHeaderOverrides usage, and precedence ordering all correct.

Additional findings (lines outside the diff)

  • enterprise/server/remote_execution/platform/platform.go:112 — Stray blank line inside the const block, just below the new property name.


// Clients may opt into this caching via a platform property as well.
for _, prop := range platform.RemoteHeaderOverrides(ctx) {
	if strings.EqualFold(prop.GetName(), platform.CacheActionResultsInProxyPropertyName) && platform.IsTrue(prop.GetValue()) {


Per-request opt-in makes the local-AC TTL asymmetric between writes and reads, which can break the REv2 consistency guarantee the code claims (lines 393-395) and serve stale results.

With the old experiment-flag mechanism, actionCacheTTL returned a non-zero TTL uniformly for every request from a group, so every UpdateActionResult refreshed the local AC entry (line 397: ttl > 0) and reads always saw the latest write. The new path derives the TTL from a per-request header, so writes and reads can disagree about whether caching is active:

  1. Invocation A (header set) writes result v1 → remote updated, local entry cached with fresh mtime, TTL 15 min.
  2. Invocation B (header not set) writes v2 to the same proxy → ttl == 0, so line 397 skips the local update; remote now holds v2 but the local entry still points at v1.
  3. Invocation A (header set) reads within 15 min → ttlFastPathEligible && isLocalActionResultFresh is true (line 287), so it serves the stale v1 without revalidating against the remote.

Because --remote_header is set per-invocation, mixed opt-in across clients/invocations sharing one proxy is a realistic configuration, and the comment at lines 393-395 ("ensures the REv2 guarantee that GetActionResult serves the most recent UpdateActionResult for all requests to the same endpoint") no longer holds for that mix. Worth either gating the read fast-path on a consistent signal or updating the comment to scope the guarantee to clients that opt in uniformly.

Contributor Author


Yeah, agreed, don't think I'd like to fix that at this time though.

Contributor


This actually seems OK to me. The two invocations live in slightly different worlds. That said, it seems like it would be an easy fix: always write the local action result in UpdateActionResult. It will just be ignored most of the time, and shouldn't take up much space in the cache.
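A minimal sketch of that suggestion, assuming a toy model rather than the real proxy: `proxy`, `entry`, `update`, `get`, and `readAfterMixedWrites` are hypothetical names. The only change from the TTL-gated write path is that `update` refreshes the local entry unconditionally, while the read path still decides per request whether to trust it.

```go
package main

import (
	"fmt"
	"time"
)

// entry is a toy stand-in for a locally cached ActionResult.
type entry struct {
	value string
	mtime time.Time
}

// proxy (hypothetical) holds a remote AC and a local AC as plain maps.
type proxy struct {
	remote map[string]string
	local  map[string]entry
}

// update writes the remote and, regardless of the request's TTL, the local
// entry, so the local copy never lags behind the most recent write.
func (p *proxy) update(key, value string, now time.Time) {
	p.remote[key] = value
	p.local[key] = entry{value: value, mtime: now}
}

// get uses the local entry only for requests with a positive TTL and only
// while the entry is fresh; other requests simply ignore it.
func (p *proxy) get(key string, ttl time.Duration, now time.Time) string {
	if e, ok := p.local[key]; ok && ttl > 0 && now.Sub(e.mtime) < ttl {
		return e.value
	}
	return p.remote[key]
}

// readAfterMixedWrites replays an opt-in write, a non-opt-in write, and an
// opt-in read, and returns the value that read observes.
func readAfterMixedWrites() string {
	start := time.Date(2026, 6, 25, 0, 0, 0, 0, time.UTC)
	p := &proxy{remote: map[string]string{}, local: map[string]entry{}}
	p.update("action", "v1", start)                 // opt-in invocation
	p.update("action", "v2", start.Add(time.Minute)) // non-opt-in invocation
	return p.get("action", 15*time.Minute, start.Add(2*time.Minute))
}

func main() {
	fmt.Println(readAfterMixedWrites())
}
```

The cost in this model is one extra local write per update; whether that holds for the real cache depends on details not shown in this thread.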

@iain-macdonald iain-macdonald requested a review from vanja-p June 25, 2026 21:27
@iain-macdonald iain-macdonald marked this pull request as draft June 26, 2026 17:48
@iain-macdonald

Contributor Author

Converting this to draft because I realized that it will probably break the cache scorecard (as the current experiment does) because the remote hit tracker doesn't report locally-served AC hits, I think.

@iain-macdonald

Contributor Author

Converting this to draft because I realized that it will probably break the cache scorecard (as the current experiment does) because the remote hit tracker doesn't report locally-served AC hits, I think.

Never mind:

if ttlFastPathEligible && s.isLocalActionResultFresh(localACMD, ttl) {
	// Skip checking for existence of output files. The app recently
	// validated or updated this result, which refreshed the referenced
	// outputs' atime. With remote_download_minimal, this proxy
	// often won't have every output locally.
	sizeBytes := int64(proto.Size(localResult))
	s.recordLocalACHit(ctx, req, localResult, sizeBytes)
	labels := prometheus.Labels{
		metrics.StatusLabel:           status.MetricsLabel(nil),
		metrics.CacheHitMissStatus:    metrics.HitStatusLabel,
		metrics.CacheProxyRequestType: metrics.DefaultCacheProxyRequestLabel,
	}
	metrics.ActionCacheProxiedReadRequests.With(labels).Inc()
	metrics.ActionCacheProxiedReadBytes.With(labels).Add(float64(sizeBytes))
	return localResult, nil
}

@iain-macdonald iain-macdonald marked this pull request as ready for review June 26, 2026 21:10
return 0

// Clients may opt into this caching via a platform property as well.
for _, prop := range platform.RemoteHeaderOverrides(ctx) {

Contributor


does RemoteHeaderOverrides survive multiple cache proxy hops?

if efp == nil {
return 0
if efp != nil {
ttlSeconds := efp.Int64(ctx, "cache_proxy.action_cache_ttl_seconds", 0)

Contributor


maybe we should allow experiments to turn off platform overrides by setting a negative value?
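One way this suggestion could look, as a sketch only: `resolveTTL` is a hypothetical helper, `experimentSeconds` stands for the value read from `cache_proxy.action_cache_ttl_seconds` (0 when unset), and `clientOptIn` stands for the result of the platform-property check. The 15-minute value for `clientActionCacheProxyTTLDefault` is taken from the review summary; treating a negative experiment value as "off" is the proposal, not current behavior.

```go
package main

import (
	"fmt"
	"time"
)

// clientActionCacheProxyTTLDefault is the default TTL for client opt-in,
// 15 minutes according to the review summary.
const clientActionCacheProxyTTLDefault = 15 * time.Minute

// resolveTTL (hypothetical) picks the local-AC TTL for one request.
//   - negative experiment value: caching is disabled, overriding any client opt-in
//   - positive experiment value: that many seconds
//   - zero (unset): fall through to the client's platform-property opt-in
func resolveTTL(experimentSeconds int64, clientOptIn bool) time.Duration {
	switch {
	case experimentSeconds < 0:
		return 0
	case experimentSeconds > 0:
		return time.Duration(experimentSeconds) * time.Second
	case clientOptIn:
		return clientActionCacheProxyTTLDefault
	default:
		return 0
	}
}

func main() {
	fmt.Println(resolveTTL(-1, true), resolveTTL(60, false), resolveTTL(0, true))
}
```

Keeping the experiment check first preserves the existing precedence while giving operators a kill switch for the client-controlled path.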

	RetryPropertyName                     = "retry"
	PersistentVolumesPropertyName         = "persistent-volumes"
	execrootPathPropertyName              = "execroot-path"
	CacheActionResultsInProxyPropertyName = "cache-action-results-in-proxy"

Contributor


This could use a doc comment describing the behavior and supported values.

// Clients may opt into this caching via a platform property as well.
for _, prop := range platform.RemoteHeaderOverrides(ctx) {
	if strings.EqualFold(prop.GetName(), platform.CacheActionResultsInProxyPropertyName) && platform.IsTrue(prop.GetValue()) {
		return clientActionCacheProxyTTLDefault

Contributor


If we're letting the client configure this, seems like we might as well let them configure the TTL as the header value, or default to 15min if they don't set a value?
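A hedged sketch of what accepting a TTL in the header value might look like. `ttlFromPropertyValue` is a hypothetical helper, not part of the PR; it assumes a value of `true` keeps the 15-minute default and that any other value is interpreted with Go's `time.ParseDuration` (for example `5m`). Values that fail to parse, or that are not positive, leave caching off.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// defaultTTL is the 15 minute default mentioned in the review.
const defaultTTL = 15 * time.Minute

// ttlFromPropertyValue (hypothetical) maps the platform property's value to
// a TTL: "true" selects the default, a Go duration string selects that
// duration, and anything else disables local AC caching for the request.
func ttlFromPropertyValue(value string) time.Duration {
	v := strings.TrimSpace(value)
	if strings.EqualFold(v, "true") {
		return defaultTTL
	}
	d, err := time.ParseDuration(v)
	if err != nil || d <= 0 {
		return 0
	}
	return d
}

func main() {
	for _, v := range []string{"true", "5m", "false", "-1m"} {
		fmt.Printf("%q -> %v\n", v, ttlFromPropertyValue(v))
	}
}
```

A real implementation would probably also want an upper bound on client-supplied TTLs; that is left out here because the thread does not discuss one.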




2 participants