Skip to content

fix(git): narrow the repository lock to git working-copy mutations#9506

Draft
polmichel wants to merge 3 commits into
stablefrom
pmi-20260507-longduration
Draft

fix(git): narrow the repository lock to git working-copy mutations#9506
polmichel wants to merge 3 commits into
stablefrom
pmi-20260507-longduration

Conversation

@polmichel

@polmichel polmichel commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

Intent

This branch narrows the repository lock so it only guards the git working-copy mutations (clone, fetch, worktree creation) and no longer wraps the object import into the graph.

The object import reads file content from a per-commit worktree that is pinned while the lock is held, so it does not need the lock itself. By moving the import outside the critical section, repositories can import concurrently while the on-disk git state stays correctly serialized.

A reusable, mock-free lock-recording test adapter underpins the series, so the "import runs outside the lock" guarantee is asserted directly from the recorded acquire/release boundaries.

Intermediate PRs

Each PR below was reviewed and approved before being merged into this branch:

PR Title Approved
#9439 test: reusable lock recording adapter Yes, merged 2026-06-04
#9440 fix(git): run object import outside the repository lock during sync Yes, merged 2026-06-08
#9441 fix(git): run object import outside the repository lock when adding a repository Yes, merged 2026-06-08

🤖 Generated with Claude Code


Summary by cubic

Narrowed the repository lock to only clone/fetch/worktree operations, and moved object imports outside the lock for sync and add-repository to reduce contention and allow concurrent imports. init() now fetches a missing commit under the lock when needed.

  • Bug Fixes

    • Run imports after releasing the repository lock in both sync and add-repo flows; git state mutations remain serialized.
    • InfrahubRepository.init() fetches a missing commit under the repository lock, then retries; still raises if absent.
    • create_commit_worktree() is local-only and raises CommitNotFoundError on a local miss (no implicit fetch).
  • Refactors

    • Introduced RepositorySyncer, RepositoryAdder, RepositoryImporter, and RepositoryFileImporter.
    • Split sync flow: collect_pending_imports() returns PendingObjectImport; sync() executes imports afterward.
    • Updated tasks to use the new components; existing RefreshGitFetch notifications are preserved.
    • Added a lock-recording test adapter and timeline to assert that imports run outside the repository lock.

Written for commit 60a1bbd. Summary will update on new commits.

Review in cubic

polmichel and others added 3 commits June 4, 2026 08:42
* adding new lock test adapter

* make the lock installation a fixture

* use constants

* typing issues

* docs(test): trim LockTimeline docstring to one line

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(test): give RecordingLock an explicit typed constructor

ty (the CI type checker) does not honor the mypy-style `# type: ignore[arg-type]`, so
forwarding `*args: object` into InfrahubLock.__init__ failed ty check. Mirror the base
constructor signature and forward named arguments instead, dropping the ignore.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test: accept a collection of lock names in assert_never_overlap

Generalize assert_never_overlap to take a Collection[str] and verify that
no two of the given locks are ever held simultaneously, instead of being
limited to exactly two lock names.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test: split checkpoint assertion and require injected timeline

- Replace the confusing `expected` flag on assert_held_at_checkpoint with
  two intent-revealing methods (assert_held_at_checkpoint /
  assert_not_held_at_checkpoint) delegating to a shared private helper.
- Make `timeline` a required argument of RecordingLockRegistry instead of
  constructing one internally; install_recording_lock_registry now owns
  the default construction and injects it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…9440)

* create a seam to inject the new test adapter

* refactor(git): split repo sync into git-state collection and import

Extract collect_pending_imports() which performs the on-disk git mutations
(fetch, branch creation, pull, commit-worktree pinning) and returns one entry
per branch to import, instead of importing inline. sync() now runs the returned
imports. Behavior is preserved; this is the seam that lets the import run outside
the repository lock.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(git): run object import outside the repository lock during sync

The repository lock is held to serialize git working-copy mutations, but it was
also kept reserved during the object import, which is the slow phase when a branch
carries conflicting data. Narrow the lock so it covers only the git-state work:

- sync_repository now locks collect_pending_imports() and runs the imports after
  releasing the lock; the import target is injectable for testing.
- sync_remote_repositories no longer wraps the origin sync and the reinit-fallback
  import in its outer repository lock; only the local init/reinit stays locked.

Imports read from immutable per-commit worktrees pinned during the locked phase, so
they no longer need the lock. Resolves the long lock reservation reported in IFC-1566.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(git): assert object import runs outside the repository lock

Drives sync_repository with a recording lock registry and an injected recording
importer, asserting the repository lock is not held when the import runs. Replaces
the earlier pydantic-recast/flow-override scaffold with the injected-importer seam.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor(git): create RepositorySyncer and RepositoryImporter components

Replace the injected callable with a RepositoryImporter implementation that checkpoints
into the lock timeline, asserting the repository lock is not held when the import runs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(git): drop redundant lock-scope comment in sync flow

The narrow async-with scope is self-evident from the structure, and the
intent is pinned by test_repository_lock_released_before_import.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor(git): rename BranchImport to PendingObjectImport

The dataclass models a pending import of repository objects (the target
Infrahub branch and the source commit), not a branch. Rename it and the
corresponding parameter so they read as queued import work, matching
collect_pending_imports / pending_imports.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(git): call assert_not_held_at_checkpoint for the released-lock check

The test invoked assert_held_at_checkpoint with an unsupported expected=
kwarg. Use the adapter's dedicated assert_not_held_at_checkpoint, which
expresses the intent that the repository lock is released before import.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… repository (#9441)

* create a seam to inject the new test adapter

* fix(git): run object import outside the repository lock during sync

The repository lock is held to serialize git working-copy mutations, but it was
also kept reserved during the object import, which is the slow phase when a branch
carries conflicting data. Narrow the lock so it covers only the git-state work:

- sync_repository now locks collect_pending_imports() and runs the imports after
  releasing the lock; the import target is injectable for testing.
- sync_remote_repositories no longer wraps the origin sync and the reinit-fallback
  import in its outer repository lock; only the local init/reinit stays locked.

Imports read from immutable per-commit worktrees pinned during the locked phase, so
they no longer need the lock. Resolves the long lock reservation reported in IFC-1566.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor(git): create RepositorySyncer and RepositoryImporter components

Replace the injected callable with a RepositoryImporter implementation that checkpoints
into the lock timeline, asserting the repository lock is not held when the import runs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(git): run object import outside the repository lock when adding a repository

add_git_repository held the repository lock for the whole flow (local clone, default-branch
object import, and origin sync). Narrow it to cover only the local clone and pinning its
commit worktree; the import (via the injected RepositoryImporter) and the origin sync (via
RepositorySyncer) now run after the lock is released. Imports read from the immutable
per-commit worktree, so they no longer need the lock.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(git): assert add_git_repository imports outside the repository lock

Add a no-mock test driving add_git_repository with a recording lock registry and a recording
RepositoryImporter, asserting the repository lock is not held when the import runs. Update the
existing mock-based create test for the new structure (import via the importer, sync via
RepositorySyncer).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(git): drop redundant add_git_repository lock comment

The narrow async-with scope is self-evident, and the import-outside-lock invariant is pinned by test_add_git_repository_releases_lock_before_import.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor(git): restore create_commit_worktree as a local-only primitive

Revert the on-demand remote fetch that was placed inside the worktree
primitive. The fetch mutates the shared clone and belongs at the
orchestration layer under the repository lock, not in a synchronous local
working-copy primitive. The primitive again raises CommitNotFoundError when
the commit is absent from the local clone.

The fetch-on-miss safety net is re-homed under the repository lock in a
follow-up commit.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(git): fetch a missing commit under the repository lock during init

When a task references a commit that this worker's clone has not fetched
yet, init() now fetches once under the repository lock and retries the
worktree creation before surfacing CommitNotFoundError. The lock serializes
the shared-clone fetch against concurrent resets and merges.

This re-homes, at the orchestration layer, the recovery that previously
lived inside the create_commit_worktree primitive, and keeps the safety net
for the window the eventually-consistent sync fan-out leaves open: a
dispatched task can outrun the fetch broadcast, or run on a restarted or
newly added worker.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor: recording adapter test package + common class RecordingImporter

* refactor: renamed locally overridden object

* test(git): assert CommitNotFoundError message in worktree and init tests

Add match= to the CommitNotFoundError assertions so they verify the commit
SHA and repository identifier carried in the message, not just the exception
type.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(git): tighten init fetch-retry comment

Keep only the non-obvious rationale (the commit may be remote-only; the lock
serializes shared-clone mutations); drop the unknowable database provenance
and the enumerated operation list.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(git): use the recording lock adapter in the add-repo create test

Replace the patched lock mock and the weak assert_any_call with
install_recording_lock_registry and a timeline assertion that the repository
lock was acquired, removing unittest.mock from the lock check in this test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* tests: use proper fixtures

* refactor(git): extract RepositoryAdder and drop the add_git_repository importer param

Move the locked clone/worktree and the outside-lock default-branch import into
a RepositoryAdder component (sibling to RepositorySyncer). The task no longer
needs an injectable importer parameter and always uses RepositoryFileImporter;
the lock-release test injects the recording importer through the component
instead. Invert the active-status check to return early.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added the group/backend Issue related to the backend (API Server, Git Agent) label Jun 8, 2026

@cubic-dev-ai cubic-dev-ai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

5 issues found across 16 files

Confidence score: 3/5

  • There is a concrete regression risk in backend/infrahub/git/tasks.py: running import outside the lock without pinning a commit can import from a moved branch head, causing inconsistent objects.
  • backend/infrahub/git/repository.py has a medium-high reliability risk because postponing imports until after full collection may skip imports already gathered if a later branch operation fails.
  • Lock-scope changes in backend/infrahub/git/sync.py still appear to hold the repository lock across graph/API work (collect_pending_imports and InfrahubRepository.new()), so contention and throughput issues are still likely; the backend/tests/component/git/test_git_rpc.py note is minor/readability-only.
  • Pay close attention to backend/infrahub/git/tasks.py, backend/infrahub/git/repository.py, and backend/infrahub/git/sync.py - race conditions and lock scope here can affect correctness and sync performance.
Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="backend/tests/component/git/test_git_rpc.py">

<violation number="1" location="backend/tests/component/git/test_git_rpc.py:392">
P3: Unused fixture parameter `git_repos_dir`: the variable is declared in the function signature but never referenced in the function body, which is misleading for readers.</violation>
</file>

<file name="backend/infrahub/git/tasks.py">

<violation number="1" location="backend/infrahub/git/tasks.py:296">
P1: Import runs outside the lock without a pinned commit, creating a race where branch head can change before object import.</violation>
</file>

<file name="backend/infrahub/git/repository.py">

<violation number="1" location="backend/infrahub/git/repository.py:82">
P1: Deferring all imports until after full collection can drop imports for already-processed branches when a later branch operation raises.</violation>
</file>

<file name="backend/infrahub/git/sync.py">

<violation number="1" location="backend/infrahub/git/sync.py:51">
P2: `RepositoryAdder` keeps the repo lock during `InfrahubRepository.new()`, which includes async graph commit updates.</violation>

<violation number="2" location="backend/infrahub/git/sync.py:88">
P2: Repository lock still wraps graph/API calls via `collect_pending_imports`, so contention remains higher than intended.</violation>
</file>

Shadow auto-approve: would not auto-approve because issues were found.

Re-trigger cubic

infrahub_branch_name=infrahub_branch,
infrahub_branch_id=branches[infrahub_branch].id,
commit=pinned_commit,
await repo.import_objects_from_files( # type: ignore[call-overload]

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1: Import runs outside the lock without a pinned commit, creating a race where branch head can change before object import.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At backend/infrahub/git/tasks.py, line 296:

<comment>Import runs outside the lock without a pinned commit, creating a race where branch head can change before object import.</comment>

<file context>
@@ -288,55 +283,56 @@ async def sync_remote_repositories() -> None:
-                    infrahub_branch_name=infrahub_branch,
-                    infrahub_branch_id=branches[infrahub_branch].id,
-                    commit=pinned_commit,
+                await repo.import_objects_from_files(  # type: ignore[call-overload]
+                    git_branch_name=default_import_git_branch, infrahub_branch_name=infrahub_branch
                 )
</file context>

GraphQLError: When creating a branch in the graph fails for a reason other than the branch already existing.

"""
for pending in await self.collect_pending_imports(staging_branch=staging_branch):

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1: Deferring all imports until after full collection can drop imports for already-processed branches when a later branch operation raises.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At backend/infrahub/git/repository.py, line 82:

<comment>Deferring all imports until after full collection can drop imports for already-processed branches when a later branch operation raises.</comment>

<file context>
@@ -65,6 +75,23 @@ async def sync(self, staging_branch: str | None = None) -> None:
+            GraphQLError: When creating a branch in the graph fails for a reason other than the branch already existing.
+
+        """
+        for pending in await self.collect_pending_imports(staging_branch=staging_branch):
+            await self.import_objects_from_files(
+                infrahub_branch_name=pending.infrahub_branch_name,
</file context>


async def add(self, model: GitRepositoryAdd) -> InfrahubRepository:
async with self._lock_registry.get(name=model.repository_name, namespace="repository"):
repo = await InfrahubRepository.new(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: RepositoryAdder keeps the repo lock during InfrahubRepository.new(), which includes async graph commit updates.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At backend/infrahub/git/sync.py, line 51:

<comment>`RepositoryAdder` keeps the repo lock during `InfrahubRepository.new()`, which includes async graph commit updates.</comment>

<file context>
@@ -0,0 +1,90 @@
+
+    async def add(self, model: GitRepositoryAdd) -> InfrahubRepository:
+        async with self._lock_registry.get(name=model.repository_name, namespace="repository"):
+            repo = await InfrahubRepository.new(
+                id=model.repository_id,
+                name=model.repository_name,
</file context>


async def sync(self, repo: InfrahubRepository, staging_branch: str | None = None) -> None:
async with self._lock_registry.get(name=repo.name, namespace="repository"):
pending_imports = await repo.collect_pending_imports(staging_branch=staging_branch)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: Repository lock still wraps graph/API calls via collect_pending_imports, so contention remains higher than intended.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At backend/infrahub/git/sync.py, line 88:

<comment>Repository lock still wraps graph/API calls via `collect_pending_imports`, so contention remains higher than intended.</comment>

<file context>
@@ -0,0 +1,90 @@
+
+    async def sync(self, repo: InfrahubRepository, staging_branch: str | None = None) -> None:
+        async with self._lock_registry.get(name=repo.name, namespace="repository"):
+            pending_imports = await repo.collect_pending_imports(staging_branch=staging_branch)
+        for pending_import in pending_imports:
+            await self._importer.import_branch(repo, pending_import)
</file context>

async def test_add_git_repository_releases_lock_before_import(
prefect_test_fixture: None,
git_upstream_repo_01: dict[str, str],
git_repos_dir: Path,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P3: Unused fixture parameter git_repos_dir: the variable is declared in the function signature but never referenced in the function body, which is misleading for readers.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At backend/tests/component/git/test_git_rpc.py, line 392:

<comment>Unused fixture parameter `git_repos_dir`: the variable is declared in the function signature but never referenced in the function body, which is misleading for readers.</comment>

<file context>
@@ -380,3 +384,31 @@ async def test_new_repository(self, setup: None, prefect_test_fixture: None) ->
+async def test_add_git_repository_releases_lock_before_import(
+    prefect_test_fixture: None,
+    git_upstream_repo_01: dict[str, str],
+    git_repos_dir: Path,
+) -> None:
+    """The default-branch import must run after the repository lock held for the clone is released."""
</file context>

@codspeed-hq

codspeed-hq Bot commented Jun 8, 2026

Copy link
Copy Markdown

Merging this PR will not alter performance

✅ 12 untouched benchmarks


Comparing pmi-20260507-longduration (60a1bbd) with stable (2146d13)1

Open in CodSpeed

Footnotes

  1. No successful run was found on stable (83a7695) during the generation of this report, so 2146d13 was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

group/backend Issue related to the backend (API Server, Git Agent)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant