docs: add design doc for operator/operand repo split by Patryk-Stefanski · Pull Request #1078 · Kuadrant/mcp-gateway

Patryk-Stefanski · 2026-06-04T11:04:49Z

Summary

Design doc for splitting mcp-gateway into two repos: mcp-gateway (operand) and mcp-gateway-operator (control-plane)
Covers repository boundary, runtime contracts (config secret, status endpoint, deployment), CI/CD split, OLM/release pipeline, and phased migration plan
Full coupling analysis: 10 compile-time coupling points (5 controller→operand, 5 operand→CRD) to resolve in Phase 1

Relates-to: #1020

Test plan

Review design doc for technical accuracy
Verify coupling analysis matches current codebase (grep -rn 'api/v1alpha1' internal/broker/ cmd/mcp-broker-router/)
Verify config secret schema matches internal/config/types.go

Summary by CodeRabbit

Documentation
- Added a comprehensive design proposing splitting the monorepo into separate operand and operator repositories. Defines clear repo boundaries, runtime and config contracts between operator and operand, deployment and health-check expectations, CI/CD and release-flow guidance, packaging/artifact ownership, cross-repo compatibility/testing workflow, and a three-phase migration plan with compatibility and migration guidance.

Covers repository boundary, operator-operand runtime contracts (config secret schema, status endpoint, deployment contract), CI/CD pipeline split, OLM/release pipeline considerations, phased migration plan, and full coupling analysis. Relates-to: #1020 Signed-off-by: Patryk Stefanski <pstefans@redhat.com>

coderabbitai · 2026-06-04T11:05:02Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

New design document formalizing splitting the mcp-gateway monorepo into operand (mcp-gateway) and operator (mcp-gateway-operator) repos. It defines repo boundaries, the mcp-gateway-config Secret schema, operator /status and deployment contracts, MCP dual-spec expectations, CI/CD and OLM packaging split, and a three-phase migration and artifact ownership plan.

Changes

Repository Split Design

Layer / File(s)	Summary
Problem statement, boundaries, and repository structure `docs/design/repo-split/repo-split-design.md`	Introduces motivation for separating control-plane operator and data-plane operand into independently released repos with no compile-time Go dependencies; specifies directory layout, cmd entrypoints, and package groupings for each repo post-split.
Configuration Secret and /status contract `docs/design/repo-split/repo-split-design.md`	Defines the `mcp-gateway-config` Secret schema and `config.yaml` shape, operand ignore-unknown-fields behavior and `configVersion` warning semantics; specifies operator `/status` polling contract and operator-controlled deployment requirements (image selection, stable CLI/env vars, fixed ports, `/readyz`, and config mount path).
MCP transition, CI/CD split, and OLM movement `docs/design/repo-split/repo-split-design.md`	Documents operand dual-spec requirement during MCP transition; splits CI/CD with cross-repo compatibility jobs, operand Helm e2e and operator envtest, release sequencing with operator pinning operand versions; moves OLM/packaging artifacts to the operator repo.
Phased migration plan, artifact mapping, and post-split expectations `docs/design/repo-split/repo-split-design.md`	Details three-phase migration (decouple compile-time imports, create operator repo with new `go.mod`, clean operand repo), includes "what moves/what stays" ownership tables, and post-split Makefile/docs/test-server responsibilities.
Compile-time coupling analysis and precedent `docs/design/repo-split/repo-split-design.md`	Enumerates compile-time coupling points to resolve in Phase 1 and compares the proposal to Kuadrant precedents (`authorino`/`limitador`).

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related issues

Split operator and operand into separate repositories #1020 — This design document formalizes the repository split request and provides the boundaries, contracts, and phased migration approach to implement it.

Suggested labels

review-effort/large

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately describes the main change: adding a design document for splitting the monorepo into operator and operand repositories.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch worktree-1020-split-design

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

docs/design/repo-split/repo-split-design.md (1)
350-350: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Ensure the file ends with a trailing newline.

Line 350 appears to be EOF without a final newline; please add one to satisfy repo-wide file rules.

As per coding guidelines: "Files must end with a newline."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/design/repo-split/repo-split-design.md` at line 350, The file
docs/design/repo-split/repo-split-design.md is missing a trailing newline at
EOF; open the file and add a single newline character at the end so the file
ends with a newline (ensure your editor/save settings preserve it), then commit
the change so the file complies with the "Files must end with a newline" rule.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/design/repo-split/repo-split-design.md`:
- Around line 41-59: The fenced code blocks containing repository tree/layout
diagrams (the triple-backtick blocks showing "mcp-gateway/ ... go.mod" and the
other similar blocks around the same sections) are missing a language identifier
and should be updated to include one (e.g., add "text" after the opening ```);
locate the three markdown fenced blocks referenced in the diff (the tree/layout
blocks at the earlier block and the other two blocks noted in the comment) and
change their opening fences from ``` to ```text so the linter MD040 is
satisfied.
- Around line 7-349: The design doc docs/design/repo-split/repo-split-design.md
is missing mandated sections and subsections; add top-level sections "Problem",
"Goals/Non-Goals", "Job Stories", "Security Considerations", "Relationship to
Existing Approaches", "Future Considerations", and "Execution", and within the
existing "Design" section add subsections "Prerequisites", "Flow", "Component
Responsibilities", "API Changes", and "Data storage". Insert concise content
matching the repo-split context (referencing symbols like mcp-gateway,
mcp-gateway-operator, Config Secret `mcp-gateway-config`,
StatusResponse/ServerStatus structs, and the three-phase Migration Plan) so each
new section clearly states the issue, objectives, responsibilities, expected
API/format changes (e.g., configVersion and /status contract), storage impacts,
security considerations, and an explicit Execution plan aligned to the existing
Phase 1–3 migration steps.
- Around line 20-218: Add a mermaid flow diagram illustrating the
operator↔operand interactions (deployments, config secret write/read, status
call to GET /status, image/flag/env var/port contracts) and insert it into the
Design section (near "Operator-Operand Contract" or right after "Repository
Structure"); also add persona-based job stories (one-liners using the required
template "When <situation>, <actor> <context> wants <action/outcome> so that
<benefit>") covering the listed personas — Platform Engineer, MCP client user,
MCP Developer, non-interactive agent — and include both happy-path and at least
one edge-case story for each (e.g., unknown configVersion, disabled server,
missing env var, protocol version mismatch), ensuring each story references the
relevant contract pieces like the config secret (mcp-gateway-config), GET
/status, and deployment contract items.

---

Outside diff comments:
In `@docs/design/repo-split/repo-split-design.md`:
- Line 350: The file docs/design/repo-split/repo-split-design.md is missing a
trailing newline at EOF; open the file and add a single newline character at the
end so the file ends with a newline (ensure your editor/save settings preserve
it), then commit the change so the file complies with the "Files must end with a
newline" rule.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 1c955975-7668-490c-ab20-346b152b0417

📥 Commits

Reviewing files that changed from the base of the PR and between 322a175 and 0289a54.

📒 Files selected for processing (1)

docs/design/repo-split/repo-split-design.md

Signed-off-by: Patryk Stefanski <pstefans@redhat.com>

maleck13 · 2026-06-04T14:17:22Z

+
+#### Cross-Repo E2E
+
+The operator repo's e2e tests pull the operand image by tag:


The issue is we want a bunch of e2e tests to execute on changes to the gateway not just on changes to the operator

What if the operand repo's CI builds a temporary image from the PR, deploys the latest released operator (via Helm), and runs the operator repo's e2e suite against that image.

The e2e test definitions stay in the operator repo as the single source of truth — the operand CI pulls them at test time (git checkout or similar). The operator repo continues running the same suite on its own PRs against a released operand image.

This gives full e2e coverage on operand PRs without duplicating the test suite or adding cross-repo dispatch latency.

Well yes that works in many cases. But it doesn't work when the operator and the operand change together.
This is not that un-common, by nature they are tightly coupled components, this is why the single monorepo works so well.

Without the mono repo you end up In a chicken and egg situation where a new flag is added or env var you end up with a chicken and egg situation. (operator needs new image of the operand to properly test , operand needs a new image of the operator to properly test).

To do this properly we probably need the following:

Helm based deployment in the operand repo that deploys the gateway (without operator)

Gateway focused e2e test suite in the operand repo that executes against functionality of the gateway (creates the config secret with expected config, sets the needed env vars / flags to enable the features etc under test , uses the test servers). Main functionality it tested here.

Helm based deployment in the operator repo that deploys the operator

Integration (controller envtest suite )in the operator repo that tests it created the deployment and secret etc correctly based on MCPServerReg , MCPGatewayExt etc...

Manually triggered job that deploys version x of the operand with version y of the operator. (could execute nightly also, and must be executed on release).

So to summerise:

Operand repo — tests the gateway as a gateway. Direct config secret, direct flags, no operator in the loop. This is where most feature work lands and where most e2e coverage lives.

Operator repo — tests the controller as a controller. envtest proves it produces correct Deployments/Secrets/RBAC from CRs. No real gateway needed.

Cross-repo job — the integration point. Manually triggered or nightly, catches version skew. Required gate before release.

Or we just keep it as a mono repo :)

I added the e2e suggestion to the design doc, keeping it a mono repo I think is a wider discussion 😆

maleck13 · 2026-06-04T14:25:56Z

+
+### OLM and Release Pipeline
+
+CI is GitHub Actions-based — no Tekton pipelines to split. The OLM packaging artifacts (`bundle/`, `catalog/`, `bundle.Dockerfile`, `build/olm.mk`, `utils/generate-catalog.sh`) and AppStudio ReleasePlan resources all move to the operator repo. The operand repo has no OLM footprint.


Old name for konflux, will update.

…oduct build repos Signed-off-by: Patryk Stefanski <pstefans@redhat.com>

Signed-off-by: Patryk Stefanski <pstefans@redhat.com>

maleck13 · 2026-06-05T13:00:26Z

+│   ├── mcp-system/      # K8s deployment manifests
+│   ├── test-servers/    # K8s manifests for test servers
+│   └── samples/         # example manifests
+├── tests/e2e/           # full e2e tests (pulls released operand image)


this is out of date now

maleck13 · 2026-06-05T13:02:50Z

+- The operator must not remove a field without a deprecation period spanning at least one operand release, preventing older operands from receiving a config missing a field they expect.
+- New required fields must have sensible defaults so older operators that don't set them still produce valid config.
+- Breaking changes require a coordinated release with both repos tagged.
+- The config includes a top-level `configVersion: 1` field. The operand checks this on load — known versions are processed normally; unknown versions log a warning with the expected and actual values. This enables detection of operator/operand version mismatches without hard failures.


what is deciding the configVersion? Is that based on something like the operator version?

It's a schema version, not tied to the operator release version. Renamed to schemaVersion to make that clearer.

It only bumps when the config secret format has a breaking structural change (field renames, type changes, removed fields). Additive changes like new optional fields don't bump it. This way the operand just checks "do I understand this config shape?" — a simple integer comparison — without needing to know anything about operator release versions. The operator can ship many releases without the schema version changing at all.

maleck13 · 2026-06-05T13:04:45Z

+| Health | HTTP | `GET /readyz` on port 8080 |
+| Config mount | Volume | `/config/config.yaml` |
+
+### MCP Spec Considerations


I think we cab remove this section. It has no bearing on the the actual design imo

maleck13 · 2026-06-05T13:06:57Z

+
+A workflow in the operator repo that deploys both components together and runs full integration tests:
+
+- **Nightly:** Latest released operator + latest released operand. Catches drift.


I think this should run main against main so build a "nigthly" or something like that tag and execute the :nightly operator with the :nightly operand

maleck13 · 2026-06-05T13:08:09Z

+A workflow in the operator repo that deploys both components together and runs full integration tests:
+
+- **Nightly:** Latest released operator + latest released operand. Catches drift.
+- **Manual:** Accepts image overrides for both components (e.g., operand PR image + operator PR image). Used for coordinated changes that touch both repos.


This manual one allows us to do something custom. Like run rc1 of the operand with v1.5 of the operator cause there were no changes to the operator code

maleck13 · 2026-06-05T13:12:55Z

+
+- **Operator repo** gets a new `Makefile` with controller-specific targets: `generate`, `manifests`, `docker-build`, `test-unit`, `test-controller-integration`, `lint`, `helm-*`, and CRD generation (`controller-gen`).
+- **Operand repo** retains the existing `Makefile` but removes controller-specific targets (`docker-build-controller`, `generate`, `manifests`, CRD generation). Keeps `build`, `docker-build`, `test-unit`, `lint`, and performance test targets.
+


We need to have an operator make command that will deploy the operator with a sepecific version of the operand. So from the operand we could run make load-image and then `make deploy-operator OPERAND_IMAGE=xxxx:y

maleck13 · 2026-06-05T13:15:25Z

+
+### Test Server Images
+
+Test server source code (`tests/servers/`) stays in the operand repo and produces test server container images. The operator repo's e2e tests reference these images. The operator repo contains the Kubernetes manifests (`config/test-servers/`) that deploy them but does not build them — it pulls pre-built images from the operand repo's CI.


Do we still need tests/servers in the operator repo? Perhaps we just need one so we can test MCPServerRegistration?

maleck13

few more minor things but getting there

- Update operator repo structure (catalog, rbac, minimal test server) - Clarify configVersion is a schema version, not release version - Remove MCP Spec Considerations section (no bearing on design) - Nightly cross-repo job builds from main, not released images - Add make deploy-operator OPERAND_IMAGE= target to Makefile section - Reduce operator test servers to one minimal server for MCPServerRegistration Signed-off-by: Patryk Stefanski <pstefans@redhat.com>

Signed-off-by: Patryk Stefanski <pstefans@redhat.com>

david-martin · 2026-06-16T10:21:18Z

The doc covers which code moves where, but the user-facing docs boundary could be stated more explicitly. The "What Moves" table lists docs/guides/ and docs/reference/, and the docs.kuadrant.io section mentions updating mkdocs.yml, but a reader (or agent/claude) could miss that all user-facing documentation lives in the operator repo post-split.

A few places where this could be clearer:

Repository Boundary table: no mention of docs ownership. A row for user-facing docs (or a note that the operand repo has none) would make the boundary obvious upfront.
Repository Structure trees: neither tree shows docs/guides/ or docs/reference/. The operator tree should reflect that these live there.
docs.kuadrant.io section: says to update mkdocs.yml references but doesn't explicitly state the principle: all user-facing docs live in the operator repo.
Phase 3 cleanup: lists code to remove from the operand but doesn't mention removing docs/guides/ and docs/reference/.

david-martin · 2026-06-18T16:38:09Z

Let's hold this, pending alignment with Kuadrant/architecture#179
@maleck13 I know you had some thoughts on how including mcp gateway in the umbrella operator work could substitute for this work

maleck13 · 2026-06-19T09:35:02Z

Yes it would be worth thinking about. Rather than splitting out the operator. That said Kuadrant is not currently a multi-tenant deployment but MCP-Gateway is. If we could put the effort we were going to put into this work into the umbrella operator approach that would likely be a better use of time

maleck13 · 2026-06-19T09:35:12Z

cc @mikenairn

Patryk-Stefanski · 2026-06-25T13:46:23Z

/closing for now , superseded by #1198

coderabbitai Bot added the review-effort/medium Medium review effort (3): few files, moderate logic label Jun 4, 2026

coderabbitai Bot reviewed Jun 4, 2026

View reviewed changes

Comment thread docs/design/repo-split/repo-split-design.md

Comment thread docs/design/repo-split/repo-split-design.md

Comment thread docs/design/repo-split/repo-split-design.md Outdated

docs: add language tags to fenced code blocks

526ad85

Signed-off-by: Patryk Stefanski <pstefans@redhat.com>

maleck13 reviewed Jun 4, 2026

View reviewed changes

Comment thread docs/design/repo-split/repo-split-design.md

maleck13 reviewed Jun 4, 2026

View reviewed changes

Comment thread docs/design/repo-split/repo-split-design.md

maleck13 reviewed Jun 4, 2026

View reviewed changes

docs: update release pipeline section with Konflux terminology and pr…

764c00a

…oduct build repos Signed-off-by: Patryk Stefanski <pstefans@redhat.com>

coderabbitai Bot added review-effort/small Low review effort (1-2): straightforward, single file, config/docs and removed review-effort/medium Medium review effort (3): few files, moderate logic labels Jun 4, 2026

docs: rework testing strategy for repo split per review feedback

d80f611

Signed-off-by: Patryk Stefanski <pstefans@redhat.com>

coderabbitai Bot added review-effort/large High review effort (4-5): many files, complex, cross-cutting and removed review-effort/small Low review effort (1-2): straightforward, single file, config/docs labels Jun 5, 2026

maleck13 reviewed Jun 5, 2026

View reviewed changes

Patryk-Stefanski added 2 commits June 9, 2026 09:57

docs: rename configVersion to schemaVersion in config secret contract

0698a91

Signed-off-by: Patryk Stefanski <pstefans@redhat.com>

Patryk-Stefanski closed this Jun 25, 2026


		#### Cross-Repo E2E

		The operator repo's e2e tests pull the operand image by tag:


		### OLM and Release Pipeline

		CI is GitHub Actions-based — no Tekton pipelines to split. The OLM packaging artifacts (`bundle/`, `catalog/`, `bundle.Dockerfile`, `build/olm.mk`, `utils/generate-catalog.sh`) and AppStudio ReleasePlan resources all move to the operator repo. The operand repo has no OLM footprint.


		A workflow in the operator repo that deploys both components together and runs full integration tests:

		- Nightly: Latest released operator + latest released operand. Catches drift.


		- Operator repo gets a new `Makefile` with controller-specific targets: `generate`, `manifests`, `docker-build`, `test-unit`, `test-controller-integration`, `lint`, `helm-*`, and CRD generation (`controller-gen`).
		- Operand repo retains the existing `Makefile` but removes controller-specific targets (`docker-build-controller`, `generate`, `manifests`, CRD generation). Keeps `build`, `docker-build`, `test-unit`, `lint`, and performance test targets.


		### Test Server Images

		Test server source code (`tests/servers/`) stays in the operand repo and produces test server container images. The operator repo's e2e tests reference these images. The operator repo contains the Kubernetes manifests (`config/test-servers/`) that deploy them but does not build them — it pulls pre-built images from the operand repo's CI.

Uh oh!

Conversation

Patryk-Stefanski commented Jun 4, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Estimated code review effort

Possibly related issues

Suggested labels

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

maleck13 left a comment

Choose a reason for hiding this comment

Uh oh!

david-martin commented Jun 16, 2026

Uh oh!

david-martin commented Jun 18, 2026

Uh oh!

maleck13 commented Jun 19, 2026

Uh oh!

maleck13 commented Jun 19, 2026

Uh oh!

Patryk-Stefanski commented Jun 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Patryk-Stefanski commented Jun 4, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 4, 2026 •

edited

Loading