feat(docker): respect restart policy and resolve depends_on on start#956
feat(docker): respect restart policy and resolve depends_on on start#956acouvreur wants to merge 18 commits into
depends_on on start#956Conversation
Fixes two related issues in the Docker provider: 1. Respect container restart policy for exited containers (#952) A container that exits with code 0 under a non-restarting policy (no, on-failure) is now reported as Ready/completed rather than Stopped. This prevents Sablier from endlessly restarting init/ migration containers that have already done their work. 2. Resolve Docker Compose depends_on before starting a container (#792) InstanceStart now reads the com.docker.compose.depends_on label and starts each declared dependency — recursively — before starting the target container. The provider waits for each dependency to satisfy its declared condition (service_started, service_healthy, service_completed_successfully, service_running_or_healthy) before proceeding to the next one. Also adds: - Unit tests for parseComposeDependsOn - Integration tests for service_completed_successfully and service_healthy - examples/depends-on: runnable Compose stack with walkthrough README
|
There was a problem hiding this comment.
Pull request overview
This PR improves the Docker provider’s behavior in real-world Docker Compose stacks by (1) mapping exited containers to Sablier statuses in a way that respects Docker restart policies (particularly for one-shot init/migration containers) and (2) starting containers in Compose dependency order by resolving com.docker.compose.depends_on prior to starting the target container.
Changes:
- Update Docker inspect status mapping so exit-0 containers with non-restarting policies are treated as “Ready (completed)”, while exit-0 containers that will be restarted by Docker are treated as “Starting”.
- Add Compose
depends_onlabel parsing + dependency graph resolution soInstanceStartstarts and waits for dependencies according to the declared condition. - Add unit + integration tests and a runnable
examples/depends-onCompose stack to demonstrate/verify behavior.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| pkg/provider/docker/container_start.go | Routes InstanceStart through a recursive start path that resolves Compose depends_on before starting the target container. |
| pkg/provider/docker/container_inspect.go | Adjusts exited-container status mapping based on restart policy; adds helpers for restart policy evaluation. |
| pkg/provider/docker/container_inspect_test.go | Updates/extends regression coverage for exited containers + restart policy behavior. |
| pkg/provider/docker/container_depends_on.go | New dependency-resolution implementation: parse label, locate dependency containers, start recursively, and wait on Compose conditions. |
| pkg/provider/docker/container_depends_on_unit_test.go | New table-driven unit tests for parsing the depends_on label format. |
| pkg/provider/docker/container_depends_on_test.go | New DinD integration tests verifying dependency ordering and condition waiting. |
| examples/depends-on/README.md | Documentation for the runnable example and expected behavior/logs. |
| examples/depends-on/Makefile | Helper targets to run the example stack and trigger a blocking start. |
| examples/depends-on/compose.yml | Example Compose stack demonstrating service_healthy and service_completed_successfully chains. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Test Results✅ All tests passed! | 557 tests in 183.312s
|
Label the long-running db service so it is scaled to zero with the app group, and keep the one-shot migration container unlabeled (a one-shot job must never join a blocking group or it would never become ready). Document restart-policy handling and depends_on ordering, including when to set sablier.enable on a dependency.
Address PR review feedback on depends_on resolution: - Break dependency cycles using a separate in-progress recursion stack so a cyclic dependency is skipped instead of blocking until context timeout. - Fail fast with a clear error when a service_healthy dependency has no healthcheck configured, instead of looping until the deadline. - Make findComposeContainer deterministic: prefer a running container, otherwise pick the lexicographically smallest name.
Replace the per-start skip-based cycle breaker with a global, concurrency- safe dependency graph. Each instance's depends_on tree is built and committed to the graph, which must remain a Directed Acyclic Graph. If committing the tree would introduce a self-loop or cycle, the whole tree is considered invalid and ignored: the instance is still started on its own and a warning explaining the reason is logged.
Drop the persistent global dependency graph (mutex + edge refcounting + per-root commit/rollback) in favour of a per-start cycle check that mirrors Docker Compose's own Graph.HasCycles: build the resolved depends_on tree and run a three-color DFS over it. A cycle reachable from a root is always fully contained in that root's tree, so a local check is equivalent to the global one but far simpler and stateless. As before, an invalid (cyclic) tree is ignored with a warning naming the offending path, and the instance is started on its own.
Break the monolithic container_depends_on.go into focused files: - compose.go: Compose label/condition constants, label parsing, container discovery. - dependency_graph.go: the resolved dependency tree and cycle detection, with no Docker dependencies. - dependency_condition.go: waiting for and evaluating depends_on conditions. - container_depends_on.go: orchestration that builds the tree and starts it in dependency order. Rename treeDep to dependency, trim comments to explain intent only, and rename the test files to match their source files. No behaviour change.
4467823 to
958f382
Compare
- Skip depends_on resolution when a container has no compose project label, so a service name cannot match a container from an unrelated project. - Poll dependency conditions with a single reused ticker instead of allocating a timer per iteration via time.After. - Report a completed (exited 0, not restarted) one-shot container with CurrentReplicas=0 since it is no longer running, keeping Status=ready.
- service_completed_successfully now treats an exited(0) dependency with an always/unless-stopped restart policy as not yet satisfied, since Docker will restart it. This matches InstanceInspect, which reports such a container as starting. - startSingle no longer attempts to start a container that is already running and not paused, so depends_on works with always-on dependencies that Sablier does not manage. Scale mode still applies its resources.
Add white-box unit tests for checkDependencyCondition, isHealthy, restartPolicyMode, restartsOnSuccess and healthStatus using a minimal fake client, raising docker package coverage above the quality gate. Also align the podman exited(0) inspect expectation with the docker provider, which now reports completed one-shot containers as ready.
When a container exits with code 0 and no restart policy was explicitly set (Docker leaves the name field empty), Sablier now keeps the historical behavior and reports the container as Stopped so it can be restarted on demand. When a restart policy is explicitly set to "no" or "on-failure", the policy is honored and the container is reported as Ready (one-shot/init container that has completed its work). "always" and "unless-stopped" continue to result in Starting (Docker will restart the container automatically). A TODO comment notes that we may change the unset-policy behavior in the future to treat it the same as "no" (Docker's default). Also update all compose examples and documentation snippets to include `restart: unless-stopped` on every long-lived service, except for intentional one-shot/init containers (e.g. the depends-on migration which keeps `restart: "no"` to demonstrate the explicit policy behavior).
| level=DEBUG msg="starting depends_on dependency" dependency=db condition=service_healthy | ||
| level=DEBUG msg="starting depends_on dependency" dependency=migration condition=service_completed_successfully |
| app: | ||
| restart: unless-stopped | ||
| image: sablierapp/mimic:v0.3.3 | ||
| command: |
| walk = func(name string) (string, error) { | ||
| spec, err := p.Client.ContainerInspect(ctx, name, client.ContainerInspectOptions{}) | ||
| if err != nil { | ||
| return "", fmt.Errorf("cannot inspect container: %w", err) |
|


This PR addresses two closely related issues that prevented Sablier from working correctly with real-world multi-service Compose stacks: init/migration containers and ordered startup with
depends_on.Closes #792
Closes #952
What changed
1. Respect the Docker restart policy for exited containers (
#952)The problem. When a container finishes its job and exits with code
0(a common pattern for one-shot init or schema-migration containers), Sablier was reporting it asStopped. This caused Sablier to restart the container on every readiness poll, which in turn prevented the main application from ever becoming ready — because it was always waiting for itsservice_completed_successfullydependency to finish.The fix.
InstanceInspectnow consults the container's restart policy before mapping an exit-0 state:always/unless-stoppedStartingno/on-failureReady(completed)Non-zero exits are still reported as
Errorregardless of policy, so real failures are not masked.2. Resolve
depends_onbefore starting a container (#792)The problem. Sablier starts every container in a group concurrently. When a group contains both an application and its database (or a migration), the app often starts before the database is ready, leading to connection errors.
The fix.
InstanceStartnow reads thecom.docker.compose.depends_onlabel that Docker Compose writes automatically and resolves the full dependency graph before starting the target container:All four Docker Compose conditions are supported:
service_startedState.Running == trueservice_healthyservice_completed_successfullyStateExited+ exit code 0service_running_or_healthyDependencies that are not in the same Compose project (or have no matching container) are skipped with a warning — Sablier never fails a start because of an optional or external dependency.
Cycle protection is built in: a
startedset is threaded through every recursiveinstanceStartcall; any container that is already being started is skipped.How to migrate
No configuration changes are required. The behaviour changes automatically:
Before this PR — Affine / Paperless / any stack with a migration container:
After this PR:
Runnable example
A complete
docker composestack is provided underexamples/depends-on/:Tests
TestParseComposeDependsOn— table-driven tests for the label parser (no daemon required, runs in-shortmode).TestDockerClassicProvider_StartWithDependsOnandTestDockerClassicProvider_StartWithDependsOnHealthy— dind-based tests that create real containers, setcom.docker.compose.*labels, callInstanceStart, and assert the full dependency chain was started in the correct order.exited container with status code 0test has been updated to assertReady(wasStopped); a newon-failurerestart-policy variant is added.EDIT: Not satisfied with current implementation
Will have to properly implement in two phases: