feat: support Redis Sentinel for HA cache and lock connections [#9449] #9451

Draft
fatih-acar wants to merge 1 commit into stable from fac-redis-sentinel-support

@fatih-acar fatih-acar commented Jun 3, 2026

Contributor

What & why

Closes #9449. The Redis cache adapter and lock registry each constructed redis.asyncio.Redis(host=..., port=..., db=...) against a single fixed endpoint, and CacheSettings exposed only scalar address/port. Production HA runs Redis behind Sentinel, so the documented HA deployment had to add a redis-sentinel-proxy sidecar purely because the client could not talk to Sentinel.

This adds a single optional INFRAHUB_CACHE_URL accepting redis://, rediss://, redis+sentinel://, and rediss+sentinel://. When a Sentinel URL is set, Infrahub discovers the current master through the Sentinel nodes via redis.asyncio.Sentinel(...).master_for() and follows failover automatically — letting the recommended HA architecture drop the proxy. Single-node deployments upgrade with zero config change. No new dependency (uses redis-py 6.0.0's bundled Sentinel).
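
For readers less familiar with the redis-py Sentinel client, the discovery path described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: the helper name `master_client`, the node list, and the `socket_timeout` value are assumptions. Only `Sentinel(...)` and `master_for()` come from redis-py.

```python
from __future__ import annotations

from typing import Any


def master_client(sentinels: list[tuple[str, int]], service_name: str, db: int = 0) -> Any:
    """Return an async Redis client that targets the current master.

    Hypothetical helper for illustration; calling it requires redis-py.
    """
    # Imported lazily so this sketch can be loaded without redis-py installed.
    from redis.asyncio.sentinel import Sentinel

    # The Sentinel object is given the Sentinel nodes, not the Redis data nodes.
    sentinel = Sentinel(sentinels, socket_timeout=1.0)
    # master_for() returns a client whose pool asks Sentinel for the master
    # address when it opens a connection, which is how it follows failover.
    return sentinel.master_for(service_name, db=db)
```

A caller would use it along the lines of `client = master_client([("s1", 26379), ("s2", 26379)], "mymaster")` followed by `await client.ping()`.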

Implementation

  • services/adapters/cache/connection.py (new): hand-rolled URL parser (urllib.parse), structured RedisConnectionConfig, a single shared build_redis_connection() used by both the cache adapter and the lock registry (no divergent second parser), and redact_redis_url() for secret redaction. Grammar: redis+sentinel://[user:pass@]host:port[,host2:port2,...]/service_name[/db][?params].
  • config.py: CacheSettings.url (SecretStr) with a fail-fast model_validator — the URL is mutually exclusive with the scalar connection fields (via model_fields_set) and is parsed at startup into a typed RedisUrlError. hide_input_in_errors keeps the URL out of ValidationError reprs; the cache field uses a default_factory to avoid a config↔services import cycle the validator would otherwise trigger at module load.
  • redis.py + lock.py: both collapse to build_redis_connection(config.SETTINGS.cache).
  • exceptions.py: typed RedisUrlError.
  • workflows/initialization.py: Prefect's result-storage Redis URL now honors cache.url — single-node passes through (rebuilt with redis-py-native ssl_* params), Sentinel degrades best-effort to the first member's data port (see note below).
  • Docs/deploy: HA guide rewritten to connect directly via redis+sentinel:// with a manual failover-validation runbook; redis-sentinel-proxy-*.yaml manifests removed; infrahub-values.yaml and the generated configuration.mdx updated.
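
The Sentinel URL grammar listed above can be illustrated with a small standard-library parser. This is a minimal sketch under stated assumptions, not the PR's `connection.py`: `SentinelUrl` and `parse_sentinel_url` are hypothetical names, query parameters are ignored, and plain `ValueError` stands in for the typed `RedisUrlError`.

```python
from __future__ import annotations

from dataclasses import dataclass
from urllib.parse import unquote, urlsplit


@dataclass(frozen=True)
class SentinelUrl:  # hypothetical; not the PR's RedisConnectionConfig
    tls: bool
    username: str | None
    password: str | None
    sentinels: tuple[tuple[str, int], ...]
    service_name: str
    db: int


def parse_sentinel_url(url: str) -> SentinelUrl:
    parts = urlsplit(url)
    if parts.scheme not in ("redis+sentinel", "rediss+sentinel"):
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")

    # urlsplit's .hostname/.port cannot represent a comma-separated host
    # list, so the netloc is split by hand.
    userinfo, _, hostlist = parts.netloc.rpartition("@")
    username = password = None
    if userinfo:
        user, _, pwd = userinfo.partition(":")
        username = unquote(user) or None
        password = unquote(pwd) or None

    sentinels = []
    for member in hostlist.split(","):
        host, sep, port = member.rpartition(":")
        if not sep or not host or not port.isdecimal():
            raise ValueError("each Sentinel member must be host:port")
        sentinels.append((host, int(port)))

    # The path is /service_name or /service_name/db.
    segments = [s for s in parts.path.split("/") if s]
    if not 1 <= len(segments) <= 2:
        raise ValueError("expected /service_name[/db]")
    db = int(segments[1]) if len(segments) == 2 else 0

    return SentinelUrl(
        tls=parts.scheme.startswith("rediss"),
        username=username,
        password=password,
        sentinels=tuple(sentinels),
        service_name=segments[0],
        db=db,
    )
```

Splitting the netloc manually is the main design point here: the generic URL helpers assume a single host, so a multi-member Sentinel list has to be handled explicitly.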

Known limitation (follow-up)

prefect_redis has no Sentinel support (it builds a plain redis.asyncio.Redis from host/port). Both Prefect Redis consumers (helm messaging host + the result-storage builder) now point best-effort at the Sentinel-managed Redis service directly and do not follow master failover. Making Prefect's Redis HA is tracked as a separate follow-up; documented in the HA guide and code comments. (Distributed-lock loss during failover remains pre-existing and tracked in #9450.)

Testing

  • Unit: exhaustive parse_redis_url/redaction suite, config-validator cases, and Prefect-URL cases.
  • Component (real Docker): single-node-URL test and a 1-master/1-sentinel TestContainers topology asserting cache + lock work through master_for(). The original single-node test_redis.py validates the scalar path through the new builder.
  • Failover recovery (SC-001) is a documented manual runbook in the HA guide (induced-promotion timing is too flaky for CI).
  • ruff format/check + mypy clean; 67 unit + 6 redis component tests pass.

🤖 Generated with Claude Code


Summary by cubic

Adds native Redis Sentinel support for cache and distributed locks via a new INFRAHUB_CACHE_URL, enabling automatic master discovery and failover without a redis-sentinel-proxy. Single-node setups continue to work unchanged. Addresses #9449.

  • New Features

    • INFRAHUB_CACHE_URL accepts redis://, rediss://, redis+sentinel://, rediss+sentinel:// (e.g. redis+sentinel://s1:26379,s2:26379/mymaster).
    • Shared Redis builder (build_redis_connection) used by cache and lock; follows failover via redis-py Sentinel.master_for().
    • URL parser with strict validation, TLS knobs, and secret redaction in errors/logs; config enforces URL is exclusive with scalar fields.
    • Prefect result storage honors cache.url; Sentinel degrades to first member’s data port (no Sentinel support in prefect_redis).
    • Updated HA docs and infrahub-values.yaml to connect directly to Sentinel; removed redis-sentinel-proxy manifests. Added unit and component tests (including Sentinel topology).
  • Migration

    • For HA with Sentinel, set INFRAHUB_CACHE_URL=redis+sentinel://<sentinel-hosts>/mymaster and remove any redis-sentinel-proxy.
    • No changes for single-node (INFRAHUB_CACHE_ADDRESS/PORT still work).
    • Note: Prefect messaging/result storage does not follow Sentinel failover; documented as a follow-up.
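
The secret redaction mentioned in the feature list can be approximated with a single regular expression. The sketch below is an assumption about the general technique, not the PR's `redact_redis_url()`: `redact_url` is a hypothetical name, and it only masks the userinfo part of the URL, not secrets passed as query parameters.

```python
import re

# Matches the userinfo part of a URL: everything between "://" and the last
# "@" that appears before the first "/", "?" or "#".
_USERINFO = re.compile(r"(?<=://)[^/?#]*@")


def redact_url(url: str) -> str:
    """Replace any credentials in a URL with a fixed placeholder."""
    return _USERINFO.sub("***@", url, count=1)
```

For example, `redact_url("redis://user:s3cret@cache:6379/0")` yields `redis://***@cache:6379/0`, while a URL without credentials is returned unchanged.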

Written for commit 52ec561. Summary will update on new commits.

@github-actions github-actions Bot added type/documentation Improvements or additions to documentation group/backend Issue related to the backend (API Server, Git Agent) labels Jun 3, 2026

@cubic-dev-ai cubic-dev-ai Bot left a comment

6 issues found across 21 files

Confidence score: 2/5

  • There is clear regression risk in backend/infrahub/services/adapters/cache/connection.py: password-only Redis auth is dropped, which can lead to unauthenticated cache connections when only cache.password is set.
  • development/k8s/infrahub-values.yaml points Prefect messaging at the Sentinel service host on port 6379 in this mode, which can be read-only and may break write operations at runtime.
  • Several config/URL handling issues add startup and connectivity risk (backend/infrahub/config.py validator behavior, credential truthiness and IPv6 URL rebuilding in backend/infrahub/workflows/initialization.py, and missing port-range validation in backend/infrahub/services/adapters/cache/connection.py), so this is not a low-risk merge yet.
  • Pay close attention to backend/infrahub/services/adapters/cache/connection.py, development/k8s/infrahub-values.yaml, backend/infrahub/config.py, backend/infrahub/workflows/initialization.py - authentication, Redis endpoint selection, and URL validation/serialization can cause runtime failures.
Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="backend/infrahub/config.py">

<violation number="1" location="backend/infrahub/config.py:446">
P2: Cache URL validator runs even for non-Redis cache drivers, so `INFRAHUB_CACHE_URL` is not actually ignored when `driver != redis` and can wrongly fail settings load.</violation>
</file>

<file name="backend/infrahub/services/adapters/cache/connection.py">

<violation number="1" location="backend/infrahub/services/adapters/cache/connection.py:116">
P2: Redis URL port range is not validated, allowing invalid ports (e.g. 70000) past startup validation.</violation>

<violation number="2" location="backend/infrahub/services/adapters/cache/connection.py:226">
P1: Password-only scalar Redis auth is dropped, causing unauthenticated connections when only `cache.password` is configured.</violation>
</file>

<file name="backend/infrahub/workflows/initialization.py">

<violation number="1" location="backend/infrahub/workflows/initialization.py:31">
P2: Credential serialization uses truthiness, which drops valid username-only/empty-password URL credentials.</violation>

<violation number="2" location="backend/infrahub/workflows/initialization.py:46">
P2: Rebuilt Redis URLs do not bracket IPv6 hosts, producing invalid connection strings.</violation>
</file>

<file name="development/k8s/infrahub-values.yaml">

<violation number="1" location="development/k8s/infrahub-values.yaml:97">
P1: Prefect messaging is configured to use the Sentinel service host directly, which is read-only on Redis port 6379 in this chart mode and can break writes.</violation>
</file>

Shadow auto-approve: would not auto-approve because issues were found.

Comment on lines +226 to +229
        if settings.username and settings.password:
            credential_provider = UsernamePasswordCredentialProvider(
                username=settings.username, password=settings.password
            )

P1: Password-only scalar Redis auth is dropped, causing unauthenticated connections when only cache.password is configured.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At backend/infrahub/services/adapters/cache/connection.py, line 226:

<comment>Password-only scalar Redis auth is dropped, causing unauthenticated connections when only `cache.password` is configured.</comment>

<file context>
@@ -0,0 +1,251 @@
+
+    if settings.url is None:
+        credential_provider: UsernamePasswordCredentialProvider | None = None
+        if settings.username and settings.password:
+            credential_provider = UsernamePasswordCredentialProvider(
+                username=settings.username, password=settings.password
</file context>
Suggested change
-        if settings.username and settings.password:
-            credential_provider = UsernamePasswordCredentialProvider(
-                username=settings.username, password=settings.password
-            )
+        if settings.password:
+            credential_provider = UsernamePasswordCredentialProvider(
+                username=settings.username or None, password=settings.password
+            )
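
The decision logic behind this suggestion can be isolated into a small pure function. This is a sketch with a hypothetical helper name (`select_credentials`), not code from the PR. It shows that the presence of a password alone should enable authentication, with an empty username normalized to `None` so that Redis falls back to its default user.

```python
from __future__ import annotations


def select_credentials(username: str | None, password: str | None) -> tuple[str | None, str] | None:
    """Return (username, password) when a password is set, else None."""
    if not password:
        return None  # no password configured: connect without auth
    # An empty username is normalized to None (password-only auth).
    return (username or None, password)
```

With the original `username and password` condition, the password-only case would return `None` here and the connection would be opened unauthenticated.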

# NOTE: prefect_redis has no Sentinel support, so it cannot follow master failover on its
# own. It points at the Sentinel-managed Redis service directly; making Prefect's messaging
# Redis highly available is tracked as a separate follow-up.
host: "cache-redis"

P1: Prefect messaging is configured to use the Sentinel service host directly, which is read-only on Redis port 6379 in this chart mode and can break writes.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At development/k8s/infrahub-values.yaml, line 97:

<comment>Prefect messaging is configured to use the Sentinel service host directly, which is read-only on Redis port 6379 in this chart mode and can break writes.</comment>

<file context>
@@ -93,7 +91,10 @@ infrahub:
+          # NOTE: prefect_redis has no Sentinel support, so it cannot follow master failover on its
+          # own. It points at the Sentinel-managed Redis service directly; making Prefect's messaging
+          # Redis highly available is tracked as a separate follow-up.
+          host: "cache-redis"
       affinity:
         podAntiAffinity:
</file context>

Comment on lines +446 to +447
        if self.url is None:
            return self

P2: Cache URL validator runs even for non-Redis cache drivers, so INFRAHUB_CACHE_URL is not actually ignored when driver != redis and can wrongly fail settings load.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At backend/infrahub/config.py, line 446:

<comment>Cache URL validator runs even for non-Redis cache drivers, so `INFRAHUB_CACHE_URL` is not actually ignored when `driver != redis` and can wrongly fail settings load.</comment>

<file context>
@@ -424,6 +441,25 @@ def service_port(self) -> int:
 
+    @model_validator(mode="after")
+    def validate_url_exclusivity(self) -> Self:
+        if self.url is None:
+            return self
+        explicit = self.model_fields_set & CACHE_URL_EXCLUSIVE_FIELDS
</file context>
Suggested change
-        if self.url is None:
-            return self
+        if self.url is None or self.driver != CacheDriver.Redis:
+            return self
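
The effect of the driver guard can be shown without Pydantic. The sketch below is a stand-alone approximation: `Driver`, `EXCLUSIVE_FIELDS`, and `check_url_exclusivity` are hypothetical stand-ins for the project's `CacheDriver`, `CACHE_URL_EXCLUSIVE_FIELDS`, and the model validator, and the field names in the set are assumed.

```python
from __future__ import annotations

from enum import Enum


class Driver(Enum):  # hypothetical stand-in for the project's CacheDriver
    REDIS = "redis"
    NATS = "nats"


# Assumed scalar fields; the PR's CACHE_URL_EXCLUSIVE_FIELDS may differ.
EXCLUSIVE_FIELDS = frozenset({"address", "port", "database", "username", "password"})


def check_url_exclusivity(driver: Driver, url: str | None, fields_set: set[str]) -> None:
    """Raise if a Redis URL is combined with explicitly set scalar fields."""
    if url is None or driver is not Driver.REDIS:
        return  # nothing to validate when the URL is unused
    clash = sorted(fields_set & EXCLUSIVE_FIELDS)
    if clash:
        raise ValueError(f"cache url cannot be combined with: {', '.join(clash)}")
```

With the guard in place, a stray URL set alongside a non-Redis driver no longer fails settings load, while the Redis case still rejects conflicting scalar fields.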

        if not host:
            raise RedisUrlError(f"Missing host in cache URL: {redact_redis_url(url)}")
        try:
            port = int(port_str)

P2: Redis URL port range is not validated, allowing invalid ports (e.g. 70000) past startup validation.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At backend/infrahub/services/adapters/cache/connection.py, line 116:

<comment>Redis URL port range is not validated, allowing invalid ports (e.g. 70000) past startup validation.</comment>

<file context>
@@ -0,0 +1,251 @@
+        if not host:
+            raise RedisUrlError(f"Missing host in cache URL: {redact_redis_url(url)}")
+        try:
+            port = int(port_str)
+        except ValueError as exc:
+            raise RedisUrlError(f"Invalid port {port_str!r} in cache URL: {redact_redis_url(url)}") from exc
</file context>
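
One way to close this gap is to range-check the port right after the integer conversion. The helper below is a hypothetical sketch (`parse_port` is not a name from the PR) and raises plain `ValueError` in place of `RedisUrlError`.

```python
def parse_port(port_str: str) -> int:
    """Convert a port string to an int and reject values outside 1-65535."""
    try:
        port = int(port_str)
    except ValueError as exc:
        raise ValueError(f"invalid port {port_str!r}") from exc
    if not 1 <= port <= 65535:
        raise ValueError(f"port {port} is outside the valid range 1-65535")
    return port
```

A value such as `70000` is then rejected at startup instead of surfacing later as a connection error.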

            query["ssl_ca_certs"] = str(conn["ssl_ca_certs"])

    qs = f"?{urlencode(query)}" if query else ""
    return f"{scheme}://{userinfo}{host}:{port}/{db}{qs}"

P2: Rebuilt Redis URLs do not bracket IPv6 hosts, producing invalid connection strings.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At backend/infrahub/workflows/initialization.py, line 46:

<comment>Rebuilt Redis URLs do not bracket IPv6 hosts, producing invalid connection strings.</comment>

<file context>
@@ -20,41 +20,74 @@
+            query["ssl_ca_certs"] = str(conn["ssl_ca_certs"])
+
+    qs = f"?{urlencode(query)}" if query else ""
+    return f"{scheme}://{userinfo}{host}:{port}/{db}{qs}"
+
 
</file context>
Suggested change
-    return f"{scheme}://{userinfo}{host}:{port}/{db}{qs}"
+    host_for_url = f"[{host}]" if ":" in host and not host.startswith("[") else host
+    return f"{scheme}://{userinfo}{host_for_url}:{port}/{db}{qs}"
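
The bracketing rule in this suggestion can be exercised on its own. The sketch below uses hypothetical helper names (`format_host`, `build_url`) and omits userinfo and query handling. It only demonstrates that an IPv6 literal must be wrapped in brackets so its colons are not read as the port separator.

```python
def format_host(host: str) -> str:
    """Wrap a bare IPv6 literal in brackets; leave other hosts untouched."""
    if ":" in host and not host.startswith("["):
        return f"[{host}]"
    return host


def build_url(scheme: str, host: str, port: int, db: int = 0) -> str:
    """Assemble scheme://host:port/db with IPv6-safe host formatting."""
    return f"{scheme}://{format_host(host)}:{port}/{db}"
```

For example, `build_url("redis", "::1", 6379)` produces `redis://[::1]:6379/0`, whereas the unbracketed form `redis://::1:6379/0` is ambiguous to URL parsers.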

Comment on lines +31 to +34
    if username and password:
        userinfo = f"{quote(str(username), safe='')}:{quote(str(password), safe='')}@"
    elif password:
        userinfo = f":{quote(str(password), safe='')}@"

P2: Credential serialization uses truthiness, which drops valid username-only/empty-password URL credentials.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At backend/infrahub/workflows/initialization.py, line 31:

<comment>Credential serialization uses truthiness, which drops valid username-only/empty-password URL credentials.</comment>

<file context>
@@ -20,41 +20,74 @@
+    userinfo = ""
+    username = conn.get("username")
+    password = conn.get("password")
+    if username and password:
+        userinfo = f"{quote(str(username), safe='')}:{quote(str(password), safe='')}@"
+    elif password:
</file context>
Suggested change
-    if username and password:
-        userinfo = f"{quote(str(username), safe='')}:{quote(str(password), safe='')}@"
-    elif password:
-        userinfo = f":{quote(str(password), safe='')}@"
+    if username is not None and password is not None:
+        userinfo = f"{quote(str(username), safe='')}:{quote(str(password), safe='')}@"
+    elif username is not None:
+        userinfo = f"{quote(str(username), safe='')}@"
+    elif password is not None:
+        userinfo = f":{quote(str(password), safe='')}@"
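
The difference between truthiness and `is not None` checks is easiest to see in a stand-alone function. This is a sketch with a hypothetical name (`build_userinfo`), not the PR's code. It percent-encodes both parts with `urllib.parse.quote` and keeps username-only and empty-password credentials.

```python
from __future__ import annotations

from urllib.parse import quote


def build_userinfo(username: str | None, password: str | None) -> str:
    """Serialize credentials as the userinfo part of a URL (with trailing @)."""
    if username is None and password is None:
        return ""
    user = quote(username, safe="") if username is not None else ""
    if password is None:
        return f"{user}@"  # username-only credentials
    # An empty password is kept as "user:@" rather than silently dropped.
    return f"{user}:{quote(password, safe='')}@"
```

Using `safe=""` ensures characters such as `@`, `:` and `/` inside a credential are encoded and cannot be confused with URL delimiters.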

Add a single optional INFRAHUB_CACHE_URL setting accepting redis://, rediss://,
redis+sentinel:// and rediss+sentinel:// schemes. When a Sentinel URL is configured,
the cache adapter and lock registry discover the current master through the Sentinel
nodes via redis.asyncio.Sentinel(...).master_for() and follow failover automatically,
removing the need for a redis-sentinel-proxy in front of a Sentinel-managed Redis.

- New connection.py with a hand-rolled URL parser (urllib.parse, no new dependency),
  a structured RedisConnectionConfig, a shared build_redis_connection() used by both
  the cache adapter and the lock registry, and secret redaction for logs/errors.
- CacheSettings.url (SecretStr) with a fail-fast validator: URL is mutually exclusive
  with the scalar connection fields, and is parsed at startup with a typed RedisUrlError.
- Existing scalar single-node configuration is preserved unchanged (zero-config upgrade).
- Prefect's result-storage Redis URL honors the cache URL (single-node passthrough,
  Sentinel best-effort to the first member's data port); prefect_redis has no Sentinel
  support, so Prefect Redis HA is documented as a separate follow-up.
- Rewrite the HA guide to connect directly via redis+sentinel://, remove the
  redis-sentinel-proxy manifests, and add a manual failover-validation runbook.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@fatih-acar fatih-acar force-pushed the fac-redis-sentinel-support branch from 85c8312 to 52ec561 Compare June 3, 2026 21:57
@codspeed-hq

codspeed-hq Bot commented Jun 3, 2026

Merging this PR will not alter performance

✅ 12 untouched benchmarks


Comparing fac-redis-sentinel-support (52ec561) with stable (9b08603) [1]

Footnotes

  1. No successful run was found on stable (1bcda1e) during the generation of this report, so 9b08603 was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.
