bug: workspace lock heartbeat never pumped during deploy (concurrent-attempt race) #105

@pparage

Problem

The deployment workspace lock has a heartbeat mechanism, but it is never pumped during a running deploy — so any deploy longer than the stale threshold can be preempted by a second concurrent attempt.

Evidence

  • app/core/locks.py — lock is stale after 3 * heartbeat_interval_s (default 30s → 90s); heartbeat() exists at locks.py:89 but is not called from the deploy path.
  • app/core/deploy_trigger.py:80, where acquire_lock(...) is called once at start; there is no heartbeat pump loop for the duration of the ansible-runner execution.
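
To make the 90s figure concrete, here is a minimal sketch of the staleness arithmetic described above. The function names, the timestamp arguments, and the strict `>` comparison are assumptions for illustration; the actual check in locks.py may differ.

```python
def stale_after_s(heartbeat_interval_s: float = 30.0, multiplier: int = 3) -> float:
    """Seconds without a heartbeat after which the lock counts as stale."""
    return multiplier * heartbeat_interval_s


def is_stale(last_heartbeat_ts: float, now_ts: float,
             heartbeat_interval_s: float = 30.0) -> bool:
    """Hypothetical staleness check: True once the last heartbeat is too old.

    Whether the real comparison is strict (>) or inclusive (>=) is an assumption.
    """
    return (now_ts - last_heartbeat_ts) > stale_after_s(heartbeat_interval_s)
```

With the default interval, a deploy that acquired the lock at t=0 and never heartbeats is still protected at t=60 but not at t=120.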

Impact (High / P1)

Real deploys routinely exceed 90s. After the lock goes stale, a second POST to deploy the same workspace can acquire the lock and run concurrently against the same target — racing inventory/state writes.

Suggested fix

Pump heartbeat() on an interval (apscheduler job or a background task tied to the attempt lifecycle) while the runner is active; release on completion. Alternatively bump the stale threshold to exceed max expected deploy time, but a real heartbeat is preferred.
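
One possible shape for the background-task option is a context manager that pumps the heartbeat from a daemon thread for as long as the runner is active. This is a sketch under stated assumptions, not the project's implementation: `heartbeat_pump` is a hypothetical helper, and `heartbeat` is any zero-argument callable (for example, a closure over the lock's heartbeat() method, whose real signature is not shown in this issue).

```python
import threading
from contextlib import contextmanager


@contextmanager
def heartbeat_pump(heartbeat, interval_s: float):
    """Call heartbeat() every interval_s seconds until the with-block exits."""
    stop = threading.Event()

    def _pump():
        # Event.wait returns False on timeout and True once stop is set,
        # so the loop ends promptly when the deploy finishes.
        while not stop.wait(interval_s):
            heartbeat()

    thread = threading.Thread(target=_pump, daemon=True)
    thread.start()
    try:
        yield
    finally:
        # Stop pumping even if the deploy raised; the caller then releases the lock.
        stop.set()
        thread.join()
```

In the deploy path, the runner invocation would sit inside `with heartbeat_pump(...)`, with the lock release in an enclosing `finally`. The interval should stay well under the stale threshold, and a real implementation would likely also catch and log exceptions raised by heartbeat() so one failed refresh does not silently end the pump.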
