feat: support Redis Sentinel for HA cache and lock connections [#9449]#9451
feat: support Redis Sentinel for HA cache and lock connections [#9449]#9451fatih-acar wants to merge 1 commit into
Conversation
There was a problem hiding this comment.
6 issues found across 21 files
Confidence score: 2/5
- There is clear regression risk in
backend/infrahub/services/adapters/cache/connection.py: password-only Redis auth is dropped, which can lead to unauthenticated cache connections when onlycache.passwordis set. development/k8s/infrahub-values.yamlpoints Prefect messaging at the Sentinel service host on port 6379 in this mode, which can be read-only and may break write operations at runtime.- Several config/URL handling issues add startup and connectivity risk (
backend/infrahub/config.pyvalidator behavior, credential truthiness and IPv6 URL rebuilding inbackend/infrahub/workflows/initialization.py, and missing port-range validation inbackend/infrahub/services/adapters/cache/connection.py), so this is not a low-risk merge yet. - Pay close attention to
backend/infrahub/services/adapters/cache/connection.py,development/k8s/infrahub-values.yaml,backend/infrahub/config.py,backend/infrahub/workflows/initialization.py- authentication, Redis endpoint selection, and URL validation/serialization can cause runtime failures.
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="backend/infrahub/config.py">
<violation number="1" location="backend/infrahub/config.py:446">
P2: Cache URL validator runs even for non-Redis cache drivers, so `INFRAHUB_CACHE_URL` is not actually ignored when `driver != redis` and can wrongly fail settings load.</violation>
</file>
<file name="backend/infrahub/services/adapters/cache/connection.py">
<violation number="1" location="backend/infrahub/services/adapters/cache/connection.py:116">
P2: Redis URL port range is not validated, allowing invalid ports (e.g. 70000) past startup validation.</violation>
<violation number="2" location="backend/infrahub/services/adapters/cache/connection.py:226">
P1: Password-only scalar Redis auth is dropped, causing unauthenticated connections when only `cache.password` is configured.</violation>
</file>
<file name="backend/infrahub/workflows/initialization.py">
<violation number="1" location="backend/infrahub/workflows/initialization.py:31">
P2: Credential serialization uses truthiness, which drops valid username-only/empty-password URL credentials.</violation>
<violation number="2" location="backend/infrahub/workflows/initialization.py:46">
P2: Rebuilt Redis URLs do not bracket IPv6 hosts, producing invalid connection strings.</violation>
</file>
<file name="development/k8s/infrahub-values.yaml">
<violation number="1" location="development/k8s/infrahub-values.yaml:97">
P1: Prefect messaging is configured to use the Sentinel service host directly, which is read-only on Redis port 6379 in this chart mode and can break writes.</violation>
</file>
Shadow auto-approve: would not auto-approve because issues were found.
Re-trigger cubic
| if settings.username and settings.password: | ||
| credential_provider = UsernamePasswordCredentialProvider( | ||
| username=settings.username, password=settings.password | ||
| ) |
There was a problem hiding this comment.
P1: Password-only scalar Redis auth is dropped, causing unauthenticated connections when only cache.password is configured.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At backend/infrahub/services/adapters/cache/connection.py, line 226:
<comment>Password-only scalar Redis auth is dropped, causing unauthenticated connections when only `cache.password` is configured.</comment>
<file context>
@@ -0,0 +1,251 @@
+
+ if settings.url is None:
+ credential_provider: UsernamePasswordCredentialProvider | None = None
+ if settings.username and settings.password:
+ credential_provider = UsernamePasswordCredentialProvider(
+ username=settings.username, password=settings.password
</file context>
| if settings.username and settings.password: | |
| credential_provider = UsernamePasswordCredentialProvider( | |
| username=settings.username, password=settings.password | |
| ) | |
| if settings.password: | |
| credential_provider = UsernamePasswordCredentialProvider( | |
| username=settings.username or None, password=settings.password | |
| ) |
| # NOTE: prefect_redis has no Sentinel support, so it cannot follow master failover on its | ||
| # own. It points at the Sentinel-managed Redis service directly; making Prefect's messaging | ||
| # Redis highly available is tracked as a separate follow-up. | ||
| host: "cache-redis" |
There was a problem hiding this comment.
P1: Prefect messaging is configured to use the Sentinel service host directly, which is read-only on Redis port 6379 in this chart mode and can break writes.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At development/k8s/infrahub-values.yaml, line 97:
<comment>Prefect messaging is configured to use the Sentinel service host directly, which is read-only on Redis port 6379 in this chart mode and can break writes.</comment>
<file context>
@@ -93,7 +91,10 @@ infrahub:
+ # NOTE: prefect_redis has no Sentinel support, so it cannot follow master failover on its
+ # own. It points at the Sentinel-managed Redis service directly; making Prefect's messaging
+ # Redis highly available is tracked as a separate follow-up.
+ host: "cache-redis"
affinity:
podAntiAffinity:
</file context>
| if self.url is None: | ||
| return self |
There was a problem hiding this comment.
P2: Cache URL validator runs even for non-Redis cache drivers, so INFRAHUB_CACHE_URL is not actually ignored when driver != redis and can wrongly fail settings load.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At backend/infrahub/config.py, line 446:
<comment>Cache URL validator runs even for non-Redis cache drivers, so `INFRAHUB_CACHE_URL` is not actually ignored when `driver != redis` and can wrongly fail settings load.</comment>
<file context>
@@ -424,6 +441,25 @@ def service_port(self) -> int:
+ @model_validator(mode="after")
+ def validate_url_exclusivity(self) -> Self:
+ if self.url is None:
+ return self
+ explicit = self.model_fields_set & CACHE_URL_EXCLUSIVE_FIELDS
</file context>
| if self.url is None: | |
| return self | |
| if self.url is None or self.driver != CacheDriver.Redis: | |
| return self |
| if not host: | ||
| raise RedisUrlError(f"Missing host in cache URL: {redact_redis_url(url)}") | ||
| try: | ||
| port = int(port_str) |
There was a problem hiding this comment.
P2: Redis URL port range is not validated, allowing invalid ports (e.g. 70000) past startup validation.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At backend/infrahub/services/adapters/cache/connection.py, line 116:
<comment>Redis URL port range is not validated, allowing invalid ports (e.g. 70000) past startup validation.</comment>
<file context>
@@ -0,0 +1,251 @@
+ if not host:
+ raise RedisUrlError(f"Missing host in cache URL: {redact_redis_url(url)}")
+ try:
+ port = int(port_str)
+ except ValueError as exc:
+ raise RedisUrlError(f"Invalid port {port_str!r} in cache URL: {redact_redis_url(url)}") from exc
</file context>
| query["ssl_ca_certs"] = str(conn["ssl_ca_certs"]) | ||
|
|
||
| qs = f"?{urlencode(query)}" if query else "" | ||
| return f"{scheme}://{userinfo}{host}:{port}/{db}{qs}" |
There was a problem hiding this comment.
P2: Rebuilt Redis URLs do not bracket IPv6 hosts, producing invalid connection strings.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At backend/infrahub/workflows/initialization.py, line 46:
<comment>Rebuilt Redis URLs do not bracket IPv6 hosts, producing invalid connection strings.</comment>
<file context>
@@ -20,41 +20,74 @@
+ query["ssl_ca_certs"] = str(conn["ssl_ca_certs"])
+
+ qs = f"?{urlencode(query)}" if query else ""
+ return f"{scheme}://{userinfo}{host}:{port}/{db}{qs}"
+
</file context>
| return f"{scheme}://{userinfo}{host}:{port}/{db}{qs}" | |
| host_for_url = f"[{host}]" if ":" in host and not host.startswith("[") else host | |
| return f"{scheme}://{userinfo}{host_for_url}:{port}/{db}{qs}" |
| if username and password: | ||
| userinfo = f"{quote(str(username), safe='')}:{quote(str(password), safe='')}@" | ||
| elif password: | ||
| userinfo = f":{quote(str(password), safe='')}@" |
There was a problem hiding this comment.
P2: Credential serialization uses truthiness, which drops valid username-only/empty-password URL credentials.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At backend/infrahub/workflows/initialization.py, line 31:
<comment>Credential serialization uses truthiness, which drops valid username-only/empty-password URL credentials.</comment>
<file context>
@@ -20,41 +20,74 @@
+ userinfo = ""
+ username = conn.get("username")
+ password = conn.get("password")
+ if username and password:
+ userinfo = f"{quote(str(username), safe='')}:{quote(str(password), safe='')}@"
+ elif password:
</file context>
| if username and password: | |
| userinfo = f"{quote(str(username), safe='')}:{quote(str(password), safe='')}@" | |
| elif password: | |
| userinfo = f":{quote(str(password), safe='')}@" | |
| if username is not None and password is not None: | |
| userinfo = f"{quote(str(username), safe='')}:{quote(str(password), safe='')}@" | |
| elif username is not None: | |
| userinfo = f"{quote(str(username), safe='')}@" | |
| elif password is not None: | |
| userinfo = f":{quote(str(password), safe='')}@" |
Add a single optional INFRAHUB_CACHE_URL setting accepting redis://, rediss://, redis+sentinel:// and rediss+sentinel:// schemes. When a Sentinel URL is configured, the cache adapter and lock registry discover the current master through the Sentinel nodes via redis.asyncio.Sentinel(...).master_for() and follow failover automatically, removing the need for a redis-sentinel-proxy in front of a Sentinel-managed Redis. - New connection.py with a hand-rolled URL parser (urllib.parse, no new dependency), a structured RedisConnectionConfig, a shared build_redis_connection() used by both the cache adapter and the lock registry, and secret redaction for logs/errors. - CacheSettings.url (SecretStr) with a fail-fast validator: URL is mutually exclusive with the scalar connection fields, and is parsed at startup with a typed RedisUrlError. - Existing scalar single-node configuration is preserved unchanged (zero-config upgrade). - Prefect's result-storage Redis URL honors the cache URL (single-node passthrough, Sentinel best-effort to the first member's data port); prefect_redis has no Sentinel support, so Prefect Redis HA is documented as a separate follow-up. - Rewrite the HA guide to connect directly via redis+sentinel://, remove the redis-sentinel-proxy manifests, and add a manual failover-validation runbook. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
85c8312 to
52ec561
Compare
What & why
Closes #9449. The Redis cache adapter and lock registry each constructed
redis.asyncio.Redis(host=..., port=..., db=...)against a single fixed endpoint, andCacheSettingsexposed only scalaraddress/port. Production HA runs Redis behind Sentinel, so the documented HA deployment had to add aredis-sentinel-proxysidecar purely because the client could not talk to Sentinel.This adds a single optional
INFRAHUB_CACHE_URLacceptingredis://,rediss://,redis+sentinel://, andrediss+sentinel://. When a Sentinel URL is set, Infrahub discovers the current master through the Sentinel nodes viaredis.asyncio.Sentinel(...).master_for()and follows failover automatically — letting the recommended HA architecture drop the proxy. Single-node deployments upgrade with zero config change. No new dependency (uses redis-py 6.0.0's bundledSentinel).Implementation
services/adapters/cache/connection.py(new): hand-rolled URL parser (urllib.parse), structuredRedisConnectionConfig, a single sharedbuild_redis_connection()used by both the cache adapter and the lock registry (no divergent second parser), andredact_redis_url()for secret redaction. Grammar:redis+sentinel://[user:pass@]host:port[,host2:port2,...]/service_name[/db][?params].config.py:CacheSettings.url(SecretStr) with a fail-fastmodel_validator— the URL is mutually exclusive with the scalar connection fields (viamodel_fields_set) and is parsed at startup into a typedRedisUrlError.hide_input_in_errorskeeps the URL out ofValidationErrorreprs; thecachefield uses adefault_factoryto avoid a config↔services import cycle the validator would otherwise trigger at module load.redis.py+lock.py: both collapse tobuild_redis_connection(config.SETTINGS.cache).exceptions.py: typedRedisUrlError.workflows/initialization.py: Prefect's result-storage Redis URL now honorscache.url— single-node passes through (rebuilt with redis-py-nativessl_*params), Sentinel degrades best-effort to the first member's data port (see note below).redis+sentinel://with a manual failover-validation runbook;redis-sentinel-proxy-*.yamlmanifests removed;infrahub-values.yamland the generatedconfiguration.mdxupdated.Known limitation (follow-up)
prefect_redishas no Sentinel support (it builds a plainredis.asyncio.Redisfrom host/port). Both Prefect Redis consumers (helm messaging host + the result-storage builder) now point best-effort at the Sentinel-managed Redis service directly and do not follow master failover. Making Prefect's Redis HA is tracked as a separate follow-up; documented in the HA guide and code comments. (Distributed-lock loss during failover remains pre-existing and tracked in #9450.)Testing
parse_redis_url/redaction suite, config-validator cases, and Prefect-URL cases.master_for(). The original single-nodetest_redis.pyvalidates the scalar path through the new builder.ruff format/check+mypyclean; 67 unit + 6 redis component tests pass.🤖 Generated with Claude Code
Summary by cubic
Adds native Redis Sentinel support for cache and distributed locks via a new
INFRAHUB_CACHE_URL, enabling automatic master discovery and failover without aredis-sentinel-proxy. Single-node setups continue to work unchanged. Addresses #9449.New Features
INFRAHUB_CACHE_URLacceptsredis://,rediss://,redis+sentinel://,rediss+sentinel://(e.g.redis+sentinel://s1:26379,s2:26379/mymaster).build_redis_connection) used by cache and lock; follows failover viaredis-pySentinel.master_for().cache.url; Sentinel degrades to first member’s data port (no Sentinel support inprefect_redis).infrahub-values.yamlto connect directly to Sentinel; removedredis-sentinel-proxymanifests. Added unit and component tests (including Sentinel topology).Migration
INFRAHUB_CACHE_URL=redis+sentinel://<sentinel-hosts>/mymasterand remove anyredis-sentinel-proxy.INFRAHUB_CACHE_ADDRESS/PORTstill work).Written for commit 52ec561. Summary will update on new commits.