Leader election fails on managed Kubernetes clusters - need configurable timeouts

Hello :) 

### Description
I'm seeing frequent crashes of the Doppler operator across 5 different OVH managed Kubernetes clusters. The operator loses its leader election lease and restarts every few hours because lease renewal requests to the API server time out.

### Error Message
```
E1122 12:01:41.275002       1 leaderelection.go:361] Failed to update lock: Put "https://10.3.0.1:443/api/v1/namespaces/doppler-operator-system/configmaps/f39fa519.doppler.com": context deadline exceeded
I1122 12:01:41.275125       1 leaderelection.go:278] failed to renew lease doppler-operator-system/xxx.doppler.com: timed out waiting for the condition
2025-11-22T12:01:41.275Z    ERROR   setup   problem running manager {"error": "leader election lost"}
github.com/go-logr/zapr.(*zapLogger).Error
    /go/pkg/mod/github.com/go-logr/zapr@v0.2.0/zapr.go:132
sigs.k8s.io/controller-runtime/pkg/log.(*DelegatingLogger).Error
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.8.3/pkg/log/deleg.go:144
main.main
    /workspace/main.go:103
runtime.main
    /usr/local/go/src/runtime/proc.go:250
```

### Environment
- Doppler Operator version: 1.5.1
- Kubernetes version: 1.30.14
- Cluster type: OVH Managed Kubernetes (affecting 5 different production clusters)
- Pod Resources: 100m CPU / 256Mi RAM

### What I've Tried
- ✅ Reduced API server load by optimizing other operators (Velero sync periods from 1m → 10m)
- ✅ Verified the operator has sufficient CPU/memory resources
- ❌ Tried to increase leader election timeouts but the flags aren't exposed

### Impact
The operator restarts every few hours, causing:
- Brief interruptions in secret synchronization
- Alert noise and operational overhead
- Concerns about reliability in production

### Suggested Fix
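Could you expose the leader election timings as flags? controller-runtime already accepts them through the `LeaseDuration`, `RenewDeadline`, and `RetryPeriod` manager options, so flags like the following would let me test whether increasing the timeouts resolves the issue on managed Kubernetes platforms (the defaults shown are controller-runtime's):
```
--leader-elect-lease-duration (default: 15s)
--leader-elect-renew-deadline (default: 10s)
--leader-elect-retry-period (default: 2s)
```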

The current 10s renew deadline seems too aggressive for managed clusters, where API server latency can occasionally spike. Being able to configure these values (e.g., 30s/20s/5s for lease duration/renew deadline/retry period) would help determine whether this is just a timing issue or a deeper problem.

### Additional Context
This issue appears specific to managed Kubernetes environments where we don't control the API server performance.

Happy to provide more logs or help test a fix if needed!
