
fix: continue git repository sync when a single branch fails (closes #9463) #9466

Draft
minitriga wants to merge 4 commits into stable from ai-bug-pipeline-9463-branch-sync-abort

Conversation


@minitriga minitriga commented Jun 4, 2026

Contributor

Why

When the git worker syncs a repository and one branch fails (e.g. the remote branch was force-pushed and now diverges from the worker's local copy), the exception escapes the per-branch loop in InfrahubRepository.sync() and all remaining branches are skipped. Because branch iteration is sorted, every 1-minute sync cycle fails on the same branch first, so branches sorting after it stay stale indefinitely. Reproduced against the released 1.9.7 image via testcontainers.
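The starvation effect described above can be illustrated with a small simulation. This is only a sketch of the pre-fix behavior, not the actual `InfrahubRepository.sync()` code: the function name `run_cycle_old`, the branch names, and the use of `RuntimeError` are all hypothetical stand-ins.

```python
def run_cycle_old(branches: list[str], broken: set[str]) -> list[str]:
    """Simulate one sync cycle where the first failure aborts the whole loop."""
    synced: list[str] = []
    try:
        for name in sorted(branches):  # iteration order is sorted, as in the PR description
            if name in broken:
                raise RuntimeError(f"pull failed on {name}")
            synced.append(name)
    except RuntimeError:
        # The cycle ends here; every branch sorting after the broken one is skipped.
        pass
    return synced


branches = ["alpha", "main", "zeta"]  # hypothetical branch names
broken = {"main"}                     # e.g. a force-pushed, diverged branch

# Three consecutive sync cycles all stop at the same place.
cycles = [run_cycle_old(branches, broken) for _ in range(3)]
never_synced = set(branches) - {name for cycle in cycles for name in cycle}
```

In this simulation `zeta` is healthy but is never reached, because every cycle fails on `main` first.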

Goal: a failure scoped to one branch must not prevent the synchronization of the other branches.

Non-goals: surfacing branch sync errors to users in the UI (#9465), auto-resolving divergent branches, fixing sync_status reporting for starved branches.

Closes #9463

What changed

Behavioral changes:

  • A pull or import failure on one branch no longer aborts the sync of the remaining branches; each branch is now synced inside its own error boundary.
  • After all branches are processed, sync() raises a single consolidated RepositoryError listing the branches that failed, so the existing repository-level failure tagging and operational-status handling still fire.
  • RepositoryConnectionError and RepositoryCredentialsError still abort immediately — if the remote itself is unreachable or unauthorized, iterating the remaining branches is pointless.

Implementation notes:

  • The bodies of the new-branches and updated-branches loops moved to two private helpers (_sync_new_branch, _sync_updated_branch); the loops wrap them in try/except, accumulate failed branch names, and continue.
  • The boundary catches Exception (not just RepositoryError) because object imports re-raise arbitrary exceptions after recording the per-branch error-import status.
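The shape of the per-branch error boundary described in these notes might look roughly like the following. It is a minimal sketch under stated assumptions, not the code in `backend/infrahub/git/repository.py`: the exception classes are local stand-ins (their hierarchy is assumed), and `sync_branches`, `fake_sync`, and the error message text are hypothetical.

```python
import asyncio


# Local stand-ins for the exception names used in this PR (hierarchy is an assumption).
class RepositoryError(Exception):
    pass


class RepositoryConnectionError(RepositoryError):
    pass


class RepositoryCredentialsError(RepositoryError):
    pass


async def sync_branches(branch_names, sync_one):
    """Sync every branch, collect per-branch failures, and report them once at the end."""
    failed = []
    for branch_name in branch_names:
        try:
            await sync_one(branch_name)
        except (RepositoryConnectionError, RepositoryCredentialsError):
            # The remote itself is unreachable or unauthorized: abort immediately.
            raise
        except Exception:
            # Broad on purpose: imports may re-raise arbitrary exception types.
            failed.append(branch_name)
    if failed:
        raise RepositoryError(f"Unable to synchronize branches: {', '.join(failed)}")


synced = []


async def fake_sync(branch_name):
    """Hypothetical per-branch helper: one branch fails with a non-repository error."""
    if branch_name == "diverged":
        raise ValueError("import failed")
    synced.append(branch_name)


try:
    asyncio.run(sync_branches(["a-feature", "diverged", "z-feature"], fake_sync))
    message = ""
except RepositoryError as exc:
    message = str(exc)
```

Here `z-feature` is still synced even though `diverged` failed before it, and the caller sees one `RepositoryError` naming the failed branch, so code that already catches `RepositoryError` would not need to change.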

What stayed the same: no schema changes, no API contract changes, branch processing order unchanged, branches synced before a failure were already updated and still are.

How to review

  • backend/infrahub/git/repository.py is the entire production change — review sync() and the two new helpers.
  • Extra scrutiny: the broad except Exception per branch and the connection/credential re-raise — confirm no repo-wide failure mode is silently converted into a per-branch failure.
  • backend/tests/component/git/conftest.py adds the git_repo_07 fixture (one conflicting branch + one fast-forward branch); test_git_repository.py adds the regression test.

How to test

uv run pytest backend/tests/component/git/test_git_repository.py::test_sync_continues_after_branch_pull_failure -x -v
uv run pytest backend/tests/component/git/ -k "sync"

The new test fails on stable (the fast-forward branch stays on its old commit) and passes with this fix. Full git component suite: 102 passed, 1 pre-existing unrelated failure (test_extract_repo_file_information, also failing on clean stable).

Impact & rollout

  • Backward compatibility: sync() now raises a consolidated RepositoryError after processing all branches instead of the first branch's error; both callers already handle RepositoryError.
  • Performance: negligible — same operations, plus continued iteration after a failure.
  • Config/env changes: none.
  • Deployment notes: safe to deploy.

Checklist

  • Tests added/updated
  • Changelog entry added (uv run towncrier create ...)
  • External docs updated (if user-facing or ops-facing change) — N/A
  • Internal .md docs updated (internal knowledge and AI code tools knowledge) — N/A
  • I have reviewed AI generated content

Summary by cubic

Keep syncing git repositories even if one branch fails to pull/import, so other branches don’t get stuck. Fixes the starvation issue described in #9463 by surfacing a single consolidated error after all branches are processed.

  • Bug Fixes
    • Wrapped each branch’s sync in its own try/except; continue on failure. Still abort immediately on RepositoryConnectionError and RepositoryCredentialsError.
    • Extracted logic into _sync_new_branch and _sync_updated_branch; catch broad exceptions to include import errors.
    • Added regression test test_sync_continues_after_branch_pull_failure and fixture git_repo_07 (one conflicted branch, one fast-forward).

Written for commit 7f40f75. Summary will update on new commits.


minitriga and others added 4 commits June 4, 2026 16:12
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A pull or import failure on one branch no longer aborts the per-branch
sync loop. Failed branches are recorded and reported once via a
consolidated RepositoryError after all branches have been processed.
Connection and credential errors still abort immediately since the
remote itself is unavailable.

Closes #9463

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added the group/backend Issue related to the backend (API Server, Git Agent) label Jun 4, 2026

@cubic-dev-ai cubic-dev-ai Bot left a comment

Contributor


0 issues found across 2 files (changes from recent commits).

Shadow auto-approve: would auto-approve. This change is a well-contained bug fix that wraps each branch's synchronization in its own error boundary, so a single branch failure no longer blocks sync of other branches, and it includes a regression test and proper re-raising of connection/credential errors.


@codspeed-hq

codspeed-hq Bot commented Jun 4, 2026


Merging this PR will not alter performance

✅ 12 untouched benchmarks


Comparing ai-bug-pipeline-9463-branch-sync-abort (1ae9c83) with stable (31f082e)


@cubic-dev-ai cubic-dev-ai Bot left a comment

Contributor


1 issue found across 4 files

Confidence score: 3/5

  • There is a concrete regression risk in backend/infrahub/git/repository.py: a broad per-branch except Exception around push() may swallow transport/auth failures instead of aborting immediately.
  • Because this is severity 7/10 with high confidence (8/10) and can mask connectivity or credential errors, the change carries user-impacting operational risk until exception handling is narrowed.
  • Pay close attention to backend/infrahub/git/repository.py - ensure push() connectivity/auth exceptions are surfaced and trigger the required immediate abort behavior.
Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="backend/infrahub/git/repository.py">

<violation number="1" location="backend/infrahub/git/repository.py:100">
P1: Broad per-branch `except Exception` can hide transport/auth failures from `push()`, preventing required immediate abort on repo connectivity/credentials errors.</violation>
</file>

Shadow auto-approve: would not auto-approve because issues were found.


    await self._sync_new_branch(branch_name=branch_name)
except (RepositoryConnectionError, RepositoryCredentialsError):
    raise
except Exception as exc:

Contributor


P1: Broad per-branch except Exception can hide transport/auth failures from push(), preventing required immediate abort on repo connectivity/credentials errors.
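To make the concern concrete, the sketch below shows how a transport failure that is not wrapped in one of the two re-raised exception types would be recorded as a per-branch failure on every branch. Whether `push()` can actually raise such unwrapped errors is not established in this thread; that is exactly what the review asks to verify. All names here (`sync_with_boundary`, `push_fails_unwrapped`, `REPO_WIDE`) are hypothetical, the exception classes are local stand-ins, and widening the abort list is only one possible mitigation.

```python
import asyncio


# Local stand-ins for the exception names used in this PR (hierarchy is an assumption).
class RepositoryError(Exception):
    pass


class RepositoryConnectionError(RepositoryError):
    pass


class RepositoryCredentialsError(RepositoryError):
    pass


# The abort list as quoted in the diff, and a hypothetical wider list that also
# treats Python's built-in network errors as repo-wide failures.
NARROW = (RepositoryConnectionError, RepositoryCredentialsError)
REPO_WIDE = NARROW + (ConnectionError, TimeoutError)


async def sync_with_boundary(branch_names, sync_one, abort_on):
    """Per-branch boundary; exceptions matching abort_on stop the loop immediately."""
    failed = []
    for name in branch_names:
        try:
            await sync_one(name)
        except abort_on:
            raise
        except Exception:
            failed.append(name)
    return failed


async def push_fails_unwrapped(name):
    """Hypothetical push that surfaces a raw ConnectionError instead of a wrapped one."""
    raise ConnectionError("remote unreachable")


# With the narrow list, the outage is retried on every branch and recorded per branch.
swallowed = asyncio.run(sync_with_boundary(["a", "b"], push_fails_unwrapped, NARROW))

# With the wider list, the first occurrence aborts the whole sync.
try:
    asyncio.run(sync_with_boundary(["a", "b"], push_fails_unwrapped, REPO_WIDE))
    aborted = False
except ConnectionError:
    aborted = True
```

If the real `push()` already wraps transport and authentication errors in `RepositoryConnectionError` or `RepositoryCredentialsError`, the narrow list is sufficient and the finding does not apply.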

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At backend/infrahub/git/repository.py, line 100:

<comment>Broad per-branch `except Exception` can hide transport/auth failures from `push()`, preventing required immediate abort on repo connectivity/credentials errors.</comment>

<file context>
@@ -80,49 +84,89 @@ async def sync(self, staging_branch: str | None = None) -> None:
+                    await self._sync_new_branch(branch_name=branch_name)
+                except (RepositoryConnectionError, RepositoryCredentialsError):
+                    raise
+                except Exception as exc:
+                    log.warning(
+                        f"Unable to synchronize the new branch: {exc}", repository=self.name, branch=branch_name
</file context>


Labels

group/backend Issue related to the backend (API Server, Git Agent)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug: git repository sync aborts remaining branches when one branch fails

1 participant