
feat(docker): respect restart policy and resolve depends_on on start#956

Open
acouvreur wants to merge 18 commits into main from add-depends-on-restart-policy

acouvreur (Member) commented May 31, 2026

This PR addresses two closely related issues that prevented Sablier from working correctly with real-world multi-service Compose stacks: init/migration containers and ordered startup with depends_on.

Closes #792
Closes #952


What changed

1. Respect the Docker restart policy for exited containers (#952)

The problem. When a container finishes its job and exits with code 0 (a common pattern for one-shot init or schema-migration containers), Sablier reported it as Stopped and restarted it on every readiness poll. The main application, which waits for its service_completed_successfully dependency to finish, therefore never became ready.

The fix. InstanceInspect now consults the container's restart policy before mapping an exit-0 state:

| Restart policy | Exit 0 → Sablier status | Why |
|---|---|---|
| always / unless-stopped | Starting | Docker will restart the container; the exit is transient |
| no / on-failure | Ready (completed) | The container completed its work and won't be restarted |

Non-zero exits are still reported as Error regardless of policy, so real failures are not masked.

// pkg/provider/docker/container_inspect.go
case container.StateExited:
    if spec.Container.State.ExitCode != 0 {
        return errorStatus(...)
    }
    if restartsOnSuccess(restartPolicyMode(spec.Container.HostConfig)) {
        return startingStatus(...) // transient — Docker will bring it back
    }
    return readyStatus(1, ...) // completed one-shot / init container ✅

2. Resolve depends_on before starting a container (#792)

The problem. Sablier starts every container in a group concurrently. When a group contains both an application and its database (or a migration), the app often starts before the database is ready, leading to connection errors.

The fix. InstanceStart now reads the com.docker.compose.depends_on label that Docker Compose writes automatically and resolves the full dependency graph before starting the target container:

Request to start "app"
  └─ reads depends_on label: db:service_healthy, migration:service_completed_successfully
       ├─ start "db" → wait until healthy ✅
       └─ start "migration" → wait until exited 0 ✅
  └─ start "app" ✅
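A minimal sketch of parsing the label in Go. It assumes that recent Compose versions write comma-separated `service:condition:restart` entries (for example `db:service_healthy:false`), and that an entry without a condition means service_started, which is Compose's default for the short `depends_on` list syntax. The `dependency` type and `parseDependsOn` function are hypothetical and are not the PR's `parseComposeDependsOn`.

```go
package main

import (
	"fmt"
	"strings"
)

// dependency is one parsed entry of the com.docker.compose.depends_on label.
// Hypothetical type for illustration.
type dependency struct {
	Service   string
	Condition string
	Restart   bool
}

// parseDependsOn parses a value such as
// "db:service_healthy:false,migration:service_completed_successfully:false",
// assuming the service:condition:restart entry format.
func parseDependsOn(label string) ([]dependency, error) {
	var deps []dependency
	for _, entry := range strings.Split(label, ",") {
		entry = strings.TrimSpace(entry)
		if entry == "" {
			continue // empty label or trailing comma
		}
		parts := strings.Split(entry, ":")
		if parts[0] == "" {
			return nil, fmt.Errorf("depends_on entry %q has no service name", entry)
		}
		// Default to service_started when no condition is given.
		dep := dependency{Service: parts[0], Condition: "service_started"}
		if len(parts) > 1 && parts[1] != "" {
			dep.Condition = parts[1]
		}
		if len(parts) > 2 {
			dep.Restart = parts[2] == "true"
		}
		deps = append(deps, dep)
	}
	return deps, nil
}
```

Being lenient about a missing condition keeps labels written by older tooling usable, while an entry with no service name is rejected outright.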

All four Docker Compose conditions are supported:

| Condition | Behaviour |
|---|---|
| service_started | Waits until State.Running == true |
| service_healthy | Waits until the health check passes |
| service_completed_successfully | Waits until the container has exited (StateExited) with exit code 0 |
| service_running_or_healthy | Waits until running, or until healthy if a health check exists |
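The table reduces to a single predicate over the inspected state. This sketch uses a trimmed-down, hypothetical `containerState` struct rather than the Docker SDK's inspect response, and it does not model paused containers or restart policies.

```go
package main

// containerState is a hypothetical, trimmed-down view of an inspected container.
type containerState struct {
	Running   bool
	Exited    bool
	ExitCode  int
	HasHealth bool   // a health check is configured
	Health    string // e.g. "starting", "healthy", "unhealthy"
}

// conditionMet reports whether a dependency satisfies its depends_on condition.
func conditionMet(condition string, s containerState) bool {
	switch condition {
	case "service_started":
		return s.Running
	case "service_healthy":
		// Without a health check this can never become true.
		return s.HasHealth && s.Health == "healthy"
	case "service_completed_successfully":
		return s.Exited && s.ExitCode == 0
	case "service_running_or_healthy":
		if s.HasHealth {
			return s.Health == "healthy"
		}
		return s.Running
	default:
		return false // unknown condition: never satisfied
	}
}
```

A caller polling this predicate should treat service_healthy on a container with no health check as an error rather than waiting until its deadline.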

Dependencies that are not in the same Compose project (or have no matching container) are skipped with a warning — Sablier never fails a start because of an optional or external dependency.

Cycle protection is built in: a started set is threaded through every recursive instanceStart call; any container that is already being started is skipped.
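A sketch of a started set threaded through the recursion. It uses a hypothetical in-memory graph (`deps`) and a `start` callback in place of the Docker start-and-wait step. Marking a node before recursing is what breaks cycles: on `app -> db -> app`, the second visit to `app` returns immediately, so `db` is started without waiting for `app`.

```go
package main

// startWithDeps starts name after recursively starting its dependencies.
// deps is a hypothetical in-memory graph and start stands in for the Docker
// start-and-wait step; started is shared across the whole recursion.
func startWithDeps(name string, deps map[string][]string, started map[string]bool, start func(string)) {
	if started[name] {
		return // already started or in progress: skipping breaks cycles
	}
	started[name] = true
	for _, dep := range deps[name] {
		startWithDeps(dep, deps, started, start)
	}
	start(name)
}
```

Because the cycle is cut silently rather than reported, a real waiter that blocks on the skipped node's condition can still stall until its timeout.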

No label changes needed on your existing stack. The depends_on label is written by docker compose up automatically. You only need sablier.enable / sablier.group on the container you want Sablier to manage.


How to migrate

No configuration changes are required. The behaviour changes automatically:

Before this PR — Affine / Paperless / any stack with a migration container:

Sablier starts affine-migration ──► exits 0 ──► Sablier reports Stopped
                                                 ──► Sablier restarts it
                                                      ──► affine never starts 🔴

After this PR:

Sablier starts affine-migration ──► exits 0, restart:no ──► Ready (completed)
Sablier starts affine ──────────────────────────────────────────────────────► 🟢

Runnable example

A complete docker compose stack is provided under examples/depends-on/:

services:
  sablier   — manages the "app" group
  db        — mimic with health check (service_healthy dependency)
  migration — one-shot mimic (exit 0, restart:no → service_completed_successfully)
  app       — the managed service; depends_on db (healthy) + migration (completed)
cd examples/depends-on
make up     # start sablier, stop the dependency chain
make start  # blocking request — watch Sablier resolve the graph
make down

Tests

  • Unit: TestParseComposeDependsOn — table-driven tests for the label parser (no daemon required, runs in -short mode).
  • Integration: TestDockerClassicProvider_StartWithDependsOn and TestDockerClassicProvider_StartWithDependsOnHealthy — dind-based tests that create real containers, set com.docker.compose.* labels, call InstanceStart, and assert the full dependency chain was started in the correct order.
  • Inspect regression: the existing "exited container with status code 0" test has been updated to assert Ready (it previously asserted Stopped), and a new on-failure restart-policy variant has been added.

EDIT: Not satisfied with current implementation

This will have to be implemented properly in two phases:

  • Startup dependencies handled at the Sablier level, not inside the provider, with each provider able to give Sablier hints about the dependencies.
  • What is considered "ready", respecting restart policies.

Fixes two related issues in the Docker provider:

1. Respect container restart policy for exited containers (#952)
   A container that exits with code 0 under a non-restarting policy
   (no, on-failure) is now reported as Ready/completed rather than
   Stopped. This prevents Sablier from endlessly restarting init/
   migration containers that have already done their work.

2. Resolve Docker Compose depends_on before starting a container (#792)
   InstanceStart now reads the com.docker.compose.depends_on label and
   starts each declared dependency — recursively — before starting the
   target container. The provider waits for each dependency to satisfy
   its declared condition (service_started, service_healthy,
   service_completed_successfully, service_running_or_healthy) before
   proceeding to the next one.

Also adds:
- Unit tests for parseComposeDependsOn
- Integration tests for service_completed_successfully and service_healthy
- examples/depends-on: runnable Compose stack with walkthrough README
@github-actions github-actions Bot added the provider Issue related to a provider label May 31, 2026
@acouvreur acouvreur requested a review from Copilot May 31, 2026 20:26
github-actions Bot commented May 31, 2026
┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ Diff between sablier and sablier                                                                                       │
├─────────┬──────────────────────────────────────────────────────────────────────────────┬──────────┬──────────┬─────────┤
│ PERCENT │ NAME                                                                         │ OLD SIZE │ NEW SIZE │ DIFF    │
├─────────┼──────────────────────────────────────────────────────────────────────────────┼──────────┼──────────┼─────────┤
│ +9.77%  │ go.opentelemetry.io/otel                                                     │ 969 kB   │ 1.1 MB   │ +95 kB  │
│ +4.20%  │ github.com/sablierapp/sablier                                                │ 557 kB   │ 580 kB   │ +23 kB  │
│ +0.46%  │ golang.org/x/net                                                             │ 898 kB   │ 902 kB   │ +4.2 kB │
│ +0.34%  │ <unnamed:generated>                                                          │ 979 kB   │ 982 kB   │ +3.3 kB │
│ +5.28%  │ go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp                │ 54 kB    │ 56 kB    │ +2.8 kB │
│ +0.07%  │ runtime                                                                      │ 3.3 MB   │ 3.3 MB   │ +2.4 kB │
│ +0.82%  │ go.yaml.in/yaml/v2                                                           │ 275 kB   │ 277 kB   │ +2.3 kB │
│ +0.14%  │ github.com/quic-go/quic-go                                                   │ 1.3 MB   │ 1.4 MB   │ +1.9 kB │
│ +0.21%  │ go.mongodb.org/mongo-driver/v2                                               │ 672 kB   │ 674 kB   │ +1.4 kB │
│ +0.73%  │ golang.org/x/sys                                                             │ 46 kB    │ 46 kB    │ +337 B  │
│ +0.19%  │ go.opentelemetry.io/contrib/instrumentation/github.com/gin-gonic/gin/otelgin │ 36 kB    │ 36 kB    │ +68 B   │
│ +0.02%  │ gopkg.in/yaml.v3                                                             │ 304 kB   │ 304 kB   │ +49 B   │
│ +0.04%  │ k8s.io/klog/v2                                                               │ 124 kB   │ 124 kB   │ +47 B   │
│ +0.02%  │ sigs.k8s.io/json                                                             │ 173 kB   │ 173 kB   │ +38 B   │
│ +0.11%  │ vendor/golang.org/x/net/idna                                                 │ 22 kB    │ 22 kB    │ +25 B   │
│ +0.00%  │ google.golang.org/grpc                                                       │ 1.2 MB   │ 1.2 MB   │ +18 B   │
│ +0.00%  │ github.com/json-iterator/go                                                  │ 464 kB   │ 464 kB   │ +17 B   │
│ +0.00%  │ k8s.io/kube-openapi                                                          │ 467 kB   │ 467 kB   │ +16 B   │
│ +0.00%  │ github.com/gin-gonic/gin                                                     │ 337 kB   │ 337 kB   │ +12 B   │
│ +0.00%  │ github.com/google/go-cmp                                                     │ 298 kB   │ 298 kB   │ +5 B    │
│ +0.00%  │ github.com/go-playground/validator/v10                                       │ 328 kB   │ 328 kB   │ +5 B    │
│ +0.00%  │ github.com/prometheus/client_golang                                          │ 310 kB   │ 310 kB   │ +4 B    │
│ +0.01%  │ github.com/prometheus/common                                                 │ 68 kB    │ 68 kB    │ +4 B    │
│ +0.00%  │ github.com/moby/moby/client                                                  │ 434 kB   │ 434 kB   │ +2 B    │
│ +0.00%  │ crypto                                                                       │ 1.9 MB   │ 1.9 MB   │ +2 B    │
│ +0.00%  │ github.com/leodido/go-urn                                                    │ 55 kB    │ 55 kB    │ +2 B    │
│ +0.00%  │ github.com/spf13/viper                                                       │ 73 kB    │ 73 kB    │ +2 B    │
│ +0.01%  │ github.com/go-openapi/swag                                                   │ 10 kB    │ 10 kB    │ +1 B    │
│ +0.00%  │ k8s.io/apimachinery                                                          │ 1.8 MB   │ 1.8 MB   │ +1 B    │
│ +0.00%  │ golang.org/x/text                                                            │ 162 kB   │ 162 kB   │ +1 B    │
│ +0.01%  │ github.com/pmezard/go-difflib                                                │ 17 kB    │ 17 kB    │ +1 B    │
│ -0.00%  │ github.com/emicklei/go-restful/v3                                            │ 134 kB   │ 134 kB   │ -1 B    │
│ -0.00%  │ github.com/sourcegraph/conc                                                  │ 41 kB    │ 41 kB    │ -1 B    │
│ -0.01%  │ vendor/golang.org/x/net/http2/hpack                                          │ 35 kB    │ 35 kB    │ -4 B    │
│ -0.00%  │ github.com/pelletier/go-toml/v2                                              │ 228 kB   │ 228 kB   │ -5 B    │
│ -0.00%  │ google.golang.org/protobuf                                                   │ 2.0 MB   │ 2.0 MB   │ -7 B    │
│ -0.00%  │ sigs.k8s.io/structured-merge-diff/v6                                         │ 276 kB   │ 276 kB   │ -8 B    │
│ -0.01%  │ github.com/moby/moby/api                                                     │ 149 kB   │ 149 kB   │ -8 B    │
│ -0.00%  │ net                                                                          │ 1.7 MB   │ 1.7 MB   │ -10 B   │
│ -0.16%  │ vendor/golang.org/x/sys/cpu                                                  │ 6.4 kB   │ 6.4 kB   │ -10 B   │
│ -0.31%  │ internal/cpu                                                                 │ 6.1 kB   │ 6.1 kB   │ -19 B   │
│ -0.07%  │ k8s.io/utils                                                                 │ 32 kB    │ 32 kB    │ -24 B   │
│ -0.00%  │ k8s.io/client-go                                                             │ 14 MB    │ 14 MB    │ -47 B   │
│ -0.01%  │ encoding                                                                     │ 418 kB   │ 418 kB   │ -55 B   │
│ -0.01%  │ github.com/goccy/go-yaml                                                     │ 703 kB   │ 703 kB   │ -62 B   │
│ -0.00%  │ k8s.io/api                                                                   │ 17 MB    │ 17 MB    │ -75 B   │
│ -0.53%  │ go.opentelemetry.io/auto/sdk                                                 │ 89 kB    │ 89 kB    │ -470 B  │
│ -0.74%  │ go.yaml.in/yaml/v3                                                           │ 314 kB   │ 312 kB   │ -2.3 kB │
│ -29.52% │ golang.org/x/crypto                                                          │ 91 kB    │ 64 kB    │ -27 kB  │
├─────────┼──────────────────────────────────────────────────────────────────────────────┼──────────┼──────────┼─────────┤
│ +0.07%  │ .rodata                                                                      │ 2.6 MB   │ 2.6 MB   │ +1.8 kB │
│ +0.82%  │ .data                                                                        │ 208 kB   │ 210 kB   │ +1.7 kB │
│ +0.01%  │ .noptrdata                                                                   │ 459 kB   │ 459 kB   │ +32 B   │
├─────────┼──────────────────────────────────────────────────────────────────────────────┼──────────┼──────────┼─────────┤
│ +0.24%  │ sablier                                                                      │ 62 MB    │ 62 MB    │ +148 kB │
│         │ sablier                                                                      │          │          │         │
└─────────┴──────────────────────────────────────────────────────────────────────────────┴──────────┴──────────┴─────────┘

Copilot AI left a comment

Pull request overview

This PR improves the Docker provider’s behavior in real-world Docker Compose stacks by (1) mapping exited containers to Sablier statuses in a way that respects Docker restart policies (particularly for one-shot init/migration containers) and (2) starting containers in Compose dependency order by resolving com.docker.compose.depends_on prior to starting the target container.

Changes:

  • Update Docker inspect status mapping so exit-0 containers with non-restarting policies are treated as “Ready (completed)”, while exit-0 containers that will be restarted by Docker are treated as “Starting”.
  • Add Compose depends_on label parsing + dependency graph resolution so InstanceStart starts and waits for dependencies according to the declared condition.
  • Add unit + integration tests and a runnable examples/depends-on Compose stack to demonstrate/verify behavior.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.

Summary per file

| File | Description |
|---|---|
| pkg/provider/docker/container_start.go | Routes InstanceStart through a recursive start path that resolves Compose depends_on before starting the target container. |
| pkg/provider/docker/container_inspect.go | Adjusts exited-container status mapping based on restart policy; adds helpers for restart policy evaluation. |
| pkg/provider/docker/container_inspect_test.go | Updates/extends regression coverage for exited containers + restart policy behavior. |
| pkg/provider/docker/container_depends_on.go | New dependency-resolution implementation: parse label, locate dependency containers, start recursively, and wait on Compose conditions. |
| pkg/provider/docker/container_depends_on_unit_test.go | New table-driven unit tests for parsing the depends_on label format. |
| pkg/provider/docker/container_depends_on_test.go | New DinD integration tests verifying dependency ordering and condition waiting. |
| examples/depends-on/README.md | Documentation for the runnable example and expected behavior/logs. |
| examples/depends-on/Makefile | Helper targets to run the example stack and trigger a blocking start. |
| examples/depends-on/compose.yml | Example Compose stack demonstrating service_healthy and service_completed_successfully chains. |

Comment thread pkg/provider/docker/container_depends_on.go Outdated
Comment thread pkg/provider/docker/container_depends_on.go Outdated
Comment thread pkg/provider/docker/container_depends_on.go Outdated
github-actions Bot commented May 31, 2026

Test Results

✅ All tests passed! | 557 tests in 183.312s

⚠️ 1 test(s) were flaky (failed then passed on rerun)

  • github.com/sablierapp/sablier/pkg/provider/dockerswarm/TestDockerSwarmProvider_InstanceEvents_Created

Label the long-running db service so it is scaled to zero with the app group, and keep the one-shot migration container unlabeled (a one-shot job must never join a blocking group or it would never become ready). Document restart-policy handling and depends_on ordering, including when to set sablier.enable on a dependency.
@github-actions github-actions Bot added the documentation Improvements or additions to documentation label May 31, 2026
@acouvreur acouvreur requested a review from Copilot May 31, 2026 21:04

Copilot AI left a comment

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.

Comment thread pkg/provider/docker/container_start.go Outdated
Comment thread pkg/provider/docker/container_depends_on.go Outdated
Comment thread pkg/provider/docker/container_depends_on.go Outdated
Comment thread pkg/provider/docker/container_inspect.go Outdated
acouvreur added 2 commits May 31, 2026 17:12
Address PR review feedback on depends_on resolution:
- Break dependency cycles using a separate in-progress recursion stack so a
  cyclic dependency is skipped instead of blocking until context timeout.
- Fail fast with a clear error when a service_healthy dependency has no
  healthcheck configured, instead of looping until the deadline.
- Make findComposeContainer deterministic: prefer a running container,
  otherwise pick the lexicographically smallest name.
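A minimal sketch of the deterministic selection rule in the last bullet, with a hypothetical `candidate` type standing in for the containers matched by the Compose project and service labels. In this sketch, sorting by name first also makes the choice among several running containers deterministic.

```go
package main

import "sort"

// candidate is a hypothetical summary of a container matching the
// Compose project and service labels.
type candidate struct {
	Name    string
	Running bool
}

// pickContainer prefers a running container and otherwise falls back to the
// lexicographically smallest name.
func pickContainer(cands []candidate) (candidate, bool) {
	if len(cands) == 0 {
		return candidate{}, false
	}
	sorted := append([]candidate(nil), cands...) // do not reorder the caller's slice
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Name < sorted[j].Name })
	for _, c := range sorted {
		if c.Running {
			return c, true // smallest-named running container
		}
	}
	return sorted[0], true
}
```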
Replace the per-start skip-based cycle breaker with a global, concurrency-
safe dependency graph. Each instance's depends_on tree is built and committed
to the graph, which must remain a Directed Acyclic Graph. If committing the
tree would introduce a self-loop or cycle, the whole tree is considered invalid
and ignored: the instance is still started on its own and a warning explaining
the reason is logged.
@github-actions github-actions Bot added the ci label May 31, 2026
Comment thread .github/workflows/examples.yml Fixed
Comment thread .github/workflows/examples.yml Fixed
Comment thread .github/workflows/examples.yml Fixed
Comment thread .github/workflows/examples.yml Fixed
Comment thread .github/workflows/examples.yml Fixed
acouvreur added 2 commits May 31, 2026 17:31
Drop the persistent global dependency graph (mutex + edge refcounting +
per-root commit/rollback) in favour of a per-start cycle check that mirrors
Docker Compose's own Graph.HasCycles: build the resolved depends_on tree and
run a three-color DFS over it.

A cycle reachable from a root is always fully contained in that root's tree,
so a local check is equivalent to the global one but far simpler and stateless.
As before, an invalid (cyclic) tree is ignored with a warning naming the
offending path, and the instance is started on its own.
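A three-color DFS sketch over a hypothetical `map[string][]string` adjacency list, not Compose's own `Graph` type. White nodes are unvisited, grey nodes are on the current path, and black nodes are fully explored. Reaching a grey node is a back edge, which means a cycle, and the current path yields the offending path for the warning. A shared dependency (a diamond) is black by the time it is reached again, so it is not flagged.

```go
package main

import "strings"

// DFS colors for a classic three-color cycle check.
const (
	white = iota // not visited yet
	grey         // on the current DFS path
	black        // fully explored, known cycle-free
)

// findCycle returns the first cycle reachable from root, formatted like
// "app -> db -> app", or "" if the dependency tree is acyclic.
func findCycle(root string, deps map[string][]string) string {
	color := map[string]int{} // missing keys read as white (0)
	var path []string
	var visit func(node string) string
	visit = func(node string) string {
		color[node] = grey
		path = append(path, node)
		for _, dep := range deps[node] {
			switch color[dep] {
			case grey: // back edge: dep is on the current path
				for i, p := range path {
					if p == dep {
						cycle := append(append([]string{}, path[i:]...), dep)
						return strings.Join(cycle, " -> ")
					}
				}
			case white:
				if c := visit(dep); c != "" {
					return c
				}
			}
		}
		path = path[:len(path)-1]
		color[node] = black
		return ""
	}
	return visit(root)
}
```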
Break the monolithic container_depends_on.go into focused files:

- compose.go: Compose label/condition constants, label parsing, container
  discovery.
- dependency_graph.go: the resolved dependency tree and cycle detection,
  with no Docker dependencies.
- dependency_condition.go: waiting for and evaluating depends_on conditions.
- container_depends_on.go: orchestration that builds the tree and starts it
  in dependency order.

Rename treeDep to dependency, trim comments to explain intent only, and
rename the test files to match their source files. No behaviour change.
@acouvreur acouvreur force-pushed the add-depends-on-restart-policy branch from 4467823 to 958f382 Compare May 31, 2026 22:01
- Skip depends_on resolution when a container has no compose project label,
  so a service name cannot match a container from an unrelated project.
- Poll dependency conditions with a single reused ticker instead of
  allocating a timer per iteration via time.After.
- Report a completed (exited 0, not restarted) one-shot container with
  CurrentReplicas=0 since it is no longer running, keeping Status=ready.

Copilot AI left a comment

Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 8 comments.

Comment thread pkg/provider/docker/dependency_condition.go Outdated
Comment thread pkg/provider/docker/dependency_condition.go Outdated
Comment thread pkg/provider/docker/container_depends_on.go Outdated
Comment thread pkg/provider/docker/dependency_condition.go Outdated
Comment thread pkg/provider/docker/dependency_condition.go Outdated
Comment thread pkg/provider/docker/dependency_condition.go Outdated
Comment thread pkg/provider/docker/dependency_condition.go Outdated
Comment thread pkg/provider/docker/container_inspect.go
acouvreur added 5 commits May 31, 2026 18:32
- service_completed_successfully now treats an exited(0) dependency with an
  always/unless-stopped restart policy as not yet satisfied, since Docker
  will restart it. This matches InstanceInspect, which reports such a
  container as starting.
- startSingle no longer attempts to start a container that is already
  running and not paused, so depends_on works with always-on dependencies
  that Sablier does not manage. Scale mode still applies its resources.
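Both rules reduce to small predicates. The sketch uses a hypothetical `depState` struct. Docker reports a paused container with both Running and Paused set, which is why `needsStart` checks Paused explicitly, so a paused dependency still goes through the start path.

```go
package main

// depState is a hypothetical, trimmed-down view of an inspected dependency.
type depState struct {
	Running       bool
	Paused        bool
	Exited        bool
	ExitCode      int
	RestartPolicy string // HostConfig restart policy name, "" if unset
}

// completedSuccessfully: exited 0 only counts if Docker will not restart it.
func completedSuccessfully(s depState) bool {
	if !s.Exited || s.ExitCode != 0 {
		return false
	}
	switch s.RestartPolicy {
	case "always", "unless-stopped":
		return false // Docker will bring it back, so it is not done yet
	default:
		return true
	}
}

// needsStart skips containers that are already running and not paused,
// such as always-on dependencies that Sablier does not manage.
func needsStart(s depState) bool {
	return !s.Running || s.Paused
}
```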
Add white-box unit tests for checkDependencyCondition, isHealthy,
restartPolicyMode, restartsOnSuccess and healthStatus using a minimal
fake client, raising docker package coverage above the quality gate.

Also align the podman exited(0) inspect expectation with the docker
provider, which now reports completed one-shot containers as ready.
When a container exits with code 0 and no restart policy was explicitly
set (Docker leaves the name field empty), Sablier now keeps the
historical behavior and reports the container as Stopped so it can be
restarted on demand.

When a restart policy is explicitly set to "no" or "on-failure", the
policy is honored and the container is reported as Ready (one-shot/init
container that has completed its work).

"always" and "unless-stopped" continue to result in Starting (Docker
will restart the container automatically).

A TODO comment notes that we may change the unset-policy behavior in the
future to treat it the same as "no" (Docker's default).

Also update all compose examples and documentation snippets to include
`restart: unless-stopped` on every long-lived service, except for
intentional one-shot/init containers (e.g. the depends-on migration
which keeps `restart: "no"` to demonstrate the explicit policy behavior).
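The resulting mapping, sketched with hypothetical status constants rather than Sablier's real types. An empty policy name (no policy set explicitly) keeps the historical Stopped result described in the commit message above, and non-zero exits remain errors.

```go
package main

// Status is a hypothetical stand-in for Sablier's instance status values.
type Status string

const (
	StatusStopped  Status = "stopped"
	StatusStarting Status = "starting"
	StatusReady    Status = "ready"
	StatusError    Status = "error"
)

// exitedStatus maps an exited container's restart policy name and exit code
// to a status.
func exitedStatus(policy string, exitCode int) Status {
	if exitCode != 0 {
		return StatusError // non-zero exits are never masked
	}
	switch policy {
	case "always", "unless-stopped":
		return StatusStarting // Docker will restart it
	case "no", "on-failure":
		return StatusReady // explicit policy: completed one-shot container
	default:
		// Empty name: no policy set explicitly. Keep the historical Stopped
		// result (the PR's TODO: possibly treat it like "no" later).
		return StatusStopped
	}
}
```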

Copilot AI left a comment

Pull request overview

Copilot reviewed 49 out of 49 changed files in this pull request and generated 3 comments.

Comment thread examples/depends-on/README.md Outdated
Comment on lines +181 to +182
level=DEBUG msg="starting depends_on dependency" dependency=db condition=service_healthy
level=DEBUG msg="starting depends_on dependency" dependency=migration condition=service_completed_successfully
Comment thread examples/depends-on/compose.yml Outdated
app:
  restart: unless-stopped
  image: sablierapp/mimic:v0.3.3
  command:
walk = func(name string) (string, error) {
	spec, err := p.Client.ContainerInspect(ctx, name, client.ContainerInspectOptions{})
	if err != nil {
		return "", fmt.Errorf("cannot inspect container: %w", err)
sonarqubecloud Bot commented Jun 1, 2026

Quality Gate failed

Failed conditions
65.4% Coverage on New Code (required ≥ 80%)

See analysis details on SonarQube Cloud

Labels

  • ci
  • documentation: Improvements or additions to documentation
  • provider: Issue related to a provider

Development

Successfully merging this pull request may close these issues.

  • Sablier should restart a container only once
  • Correct way of setting up a multi service compose stacks with depends_on and traefik labels
