In #5851, we added manual retry logic inside try_get_metric to catch and ignore transient Code::Internal and Code::NotFound errors from the Cloud Monitoring backend.
While this works for now to reduce test flakiness, we should implement a proper retry policy decorator that can automatically wrap these API calls and handle transient errors (like Internal) under the hood.
See #5851 (review)
In #5851, we added manual retry logic inside
try_get_metricto catch and ignore transientCode::InternalandCode::NotFounderrors from the Cloud Monitoring backend.While this works for now to reduce test flakiness, we should implement a proper retry policy decorator that can automatically wrap these API calls and handle transient errors (like
Internal) under the hood.See #5851 (review)